Working with dataframe formats
A dataframe is a 2-dimensional array that represents data in table format with columns (fields) of different data types. The first array index is given by the row counter; the second index is derived from the field labels (the header row in a CSV file) or, in the absence of a header row in a CSV-format file, from a counter of the table columns.
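For readers more familiar with Python, this indexing scheme can be sketched with the standard csv module. This is an illustration only, not part of the Mosel API: the sample data and the dictionary-of-cells representation are assumptions made for the sketch.

```python
import csv
import io

# Hypothetical sample table with a header row
raw = "C_d,C_b\n1,4\n7,\n"

rows = list(csv.reader(io.StringIO(raw)))
header, body = rows[0], rows[1:]

# First index: row counter; second index: field label from the header row.
# Empty cells produce no entry (mirroring a dynamic Mosel array).
df = {(i, header[j]): cell
      for i, row in enumerate(body, start=1)
      for j, cell in enumerate(row)
      if cell != ""}

print(df[(1, "C_d")])        # "1"
print(df[(2, "C_d")])        # "7"
print((2, "C_b") in df)      # False: empty cell, no entry created
```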
Dataframe format for CSV
Dataframe-format CSV reading is provided via the 'diskdata' functionality of the module mmetc: it is supported by the I/O driver mmetc.diskdata and equally by the subroutine diskdata. If the data types are not known upfront, or may vary across the table columns, the Mosel array type needs to be able to cover all possible cases, so typically this will be either text or a union type.
Assuming we wish to read this CSV file mydata.csv (note that this file contains some empty cells):
C_e,C_d,C_c,C_b,C_a,C_s
1,,"3",4,5.5,"r1"
6,7,"8",,10.5,"r2"
We can use the I/O driver mmetc.diskdata (the example also shows output into a newly created CSV file) to read data into fixed-size or dynamic arrays of type text or of a suitable union type (here we simply use the predefined type any):
model 'dataframecsv'
 uses 'mmsystem', 'mmetc'

 declarations
  dfd: dynamic array(r:range, s:set of string) of text
  dff: array(rf:range, sf:set of string) of text
  dfa: dynamic array(ra:range, sa:set of string) of any
 end-declarations

 initialisations from 'mmetc.diskdata:'
  dfd as "dataframe;skiph;mydata.csv"
  dff as "dataframe;skiph;mydata.csv"
  dfa as "dataframe;skiph;mydata.csv"
 end-initialisations

 writeln("dyn:", dfd.size, " fix:", dff.size)
 ! Output displayed: dyn:10 fix:12

 initialisations to 'mmetc.diskdata:'
  dfd as "dataframe;skiph;res.csv"  ! Output CSV file with header line
  dfd as "dataframe;resnh.csv"      ! Output CSV file without header
 end-initialisations
end-model
Alternatively, we can employ the diskdata subroutine to perform the same tasks, as shown in the following code snippet:
model 'dataframecsv'
 uses 'mmsystem', 'mmetc'

 declarations
  dfd: dynamic array(r:range, s:set of string) of text
  dff: array(rf:range, sf:set of string) of text
  dfa: dynamic array(ra:range, sa:set of string) of any
 end-declarations

 diskdata(ETC_DATAFRAME+ETC_SKIPH, "mydata.csv", dfd)
 diskdata(ETC_DATAFRAME+ETC_SKIPH, "mydata.csv", dff)
 diskdata(ETC_DATAFRAME+ETC_SKIPH, "mydata.csv", dfa)

 writeln("dyn:", dfd.size, " fix:", dff.size)
 ! Output displayed: dyn:10 fix:12

 ! Output CSV file with and without header line
 diskdata(ETC_SKIPH+ETC_OUT+ETC_CSV+ETC_DATAFRAME, "res.csv", dfd)
 diskdata(ETC_OUT+ETC_CSV+ETC_DATAFRAME, "resnh.csv", dfd)
end-model
If no header line is present in the CSV input file, we need to adapt the type of the second index of the Mosel array storing the dataframe to be a range:
 declarations
  dfd2: dynamic array(r21:range, r2:range) of text
 end-declarations

 initialisations from 'mmetc.diskdata:'
  dfd2 as "dataframe;resnh.csv"
 end-initialisations
 ! Same as:
 ! diskdata(ETC_DATAFRAME, "resnh.csv", dfd2)

 writeln("size df2=", dfd2.size)
 ! Output displayed: size df2=10
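The headerless case can likewise be sketched in Python (illustrative only; the data below is hypothetical): with no header row, the second index is simply a column counter.

```python
import csv
import io

# Hypothetical headerless CSV data with one empty cell
raw = "1,,3\n6,7,8\n"

# Both indices are counters: (row number, column number), 1-based,
# and empty cells again produce no entry.
df2 = {(i, j): cell
       for i, row in enumerate(csv.reader(io.StringIO(raw)), start=1)
       for j, cell in enumerate(row, start=1)
       if cell != ""}

print(len(df2))      # 5 non-empty cells
print(df2[(2, 2)])   # "7"
```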
With the I/O driver it is possible to select specific fields (this option is not available through the subroutine version):
 declarations
  dfd3: dynamic array(r3:range, s3:set of string) of any
 end-declarations

 initialisations from 'mmetc.diskdata:'
  dfd3 as "dataframe;skiph;csv(C_d,C_b);mydata.csv"
 end-initialisations

 writeln("size df3=", dfd3.size)
 ! Output displayed: size df3=2
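The effect of the csv(C_d,C_b) field selection can be illustrated outside Mosel with a small Python sketch (the dictionary-of-cells representation is an assumption made for illustration): only the chosen fields are read, and empty cells create no entries, which is why only two entries remain.

```python
import csv
import io

# The mydata.csv contents from the example above
raw = ('C_e,C_d,C_c,C_b,C_a,C_s\n'
       '1,,"3",4,5.5,"r1"\n'
       '6,7,"8",,10.5,"r2"\n')

wanted = ("C_d", "C_b")  # keep only these fields, as in csv(C_d,C_b)

# Row 1 has an empty C_d cell and row 2 an empty C_b cell,
# so only two entries are created in total.
df3 = {(i, f): row[f]
       for i, row in enumerate(csv.DictReader(io.StringIO(raw)), start=1)
       for f in wanted
       if row[f] != ""}

print(len(df3))  # 2
```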
Dataframe format for databases
The ODBC interface in mmodbc provides the subroutine SQLdataframe for reading all or selected fields of a database table into a Mosel array without any prior knowledge of data types or table dimensions. The expected argument is a 2-dimensional array whose first index is a range and whose second index is a set of string; typically this array will be of type text or of a suitable union type. A typed array (such as integer) is also accepted, but in this case only fields with matching types are read.
The following code snippet shows how to read all or selected columns of a database table into a Mosel array. It is also possible to work with database commands that combine multiple tables; if this results in multiple occurrences of a field name, unique field names are generated by appending counters.
model 'dataframedb'
 options keepassert
 uses 'mmodbc', 'mmsystem'

 declarations
  dfa: array(ra:range, csa:set of string) of any
  dft: array(rt:range, cst:set of string) of text
  dfa2: array(ra2:range, csa2:set of string) of any
 end-declarations

 SQLconnect("dbtest.sqlite")
 assert(getparam("SQLsuccess"))

 ! Read the entire table 'shirts'
 SQLdataframe("select * from shirts", dfa)
 writeln("csa: ", csa)
 ! Output: csa: {`id',`style',`price',`color',`owner'}
 writeln(sum(i in ra) dfa(i,"owner").integer)

 ! Select some columns and specify a new name for one of them
 SQLdataframe("select id as prod,color as couleur,style from shirts", dft)
 writeln("cst: ", cst)
 ! Output: cst: {`prod',`couleur',`style'}

 ! Can handle data from multiple tables; in the case of multiple occurrences
 ! of field names unique names are generated
 SQLdataframe("select * from shirts inner join shop on id=article", dfa2)
 writeln("csa2: ", csa2)
 ! Output: csa2: {`id',`style',`price',`color',`owner',`article',`owner_7'}

 SQLdisconnect
end-model
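The unique-name generation for joined tables can be approximated by the following Python sketch. Note that the exact suffix scheme used by SQLdataframe is not documented here; the position-based counter below is an assumption inferred from the `owner_7` output in the example (the second `owner` is the 7th column of the combined result).

```python
def unique_names(fields):
    """Rename duplicate field names by appending their (1-based)
    column position -- an assumed scheme, for illustration only."""
    seen, out = set(), []
    for pos, name in enumerate(fields, start=1):
        if name in seen:
            name = f"{name}_{pos}"  # assumed suffix: column position
        seen.add(name)
        out.append(name)
    return out

# Field list produced by joining 'shirts' and 'shop' in the example
joined = ["id", "style", "price", "color", "owner", "article", "owner"]
print(unique_names(joined))
# ['id', 'style', 'price', 'color', 'owner', 'article', 'owner_7']
```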
© 2001-2021 Fair Isaac Corporation. All rights reserved. This documentation is the property of Fair Isaac Corporation (“FICO”). Receipt or possession of this documentation does not convey rights to disclose, reproduce, make derivative works, use, or allow others to use it except solely for internal evaluation purposes to determine whether to purchase a license to the software described in this documentation, or as otherwise set forth in a written software license agreement between you and FICO (or a FICO affiliate). Use of this documentation and the software described in it must conform strictly to the foregoing permitted uses, and no other use is permitted.