Working with dataframe formats
A dataframe is a 2-dimensional array that corresponds to a representation of data in table format with columns (fields) of different data types. The first array index is given by the row counter, the second index is derived from the field labels (the header row in a CSV file) or in the absence of a header row in a CSV-format file by a counter of the table columns.
Dataframe format for CSV
Dataframe-format CSV reading is provided via the 'diskdata' functionality of the module mmetc, it is supported by the I/O driver mmetc.diskdata and equally by the subroutine diskdata. If the data types are not know upfront or may vary across the table columns the Mosel array type needs to be able to cover all possible cases, so typically this will be either text or a union type.
Assuming we wish to read this CSV file mydata.csv (note that this file contains some empty cells):
C_e,C_d,C_c,C_b,C_a,C_s 1,,"3",4,5.5,"r1" 6,7,"8",,10.5,"r2"
We can use the I/O driver mmetc.diskdata (the example also shows output into a newly created CSV file) to read data into fixed size or dynamic arrays of type text or of a suitable union type (here we simply use the predefined type any):
model 'dataframecsv'
uses 'mmsystem', 'mmetc'
declarations
dfd: dynamic array(r:range, s:set of string) of text
dff: array(rf:range, sf:set of string) of text
dfa: dynamic array(ra:range, sa:set of string) of any
end-declarations
initialisations from 'mmetc.diskdata:'
dfd as "dataframe;skiph;mydata.csv"
dff as "dataframe;skiph;mydata.csv"
dfa as "dataframe;skiph;mydata.csv"
end-initialisations
writeln("dyn:", dfd.size, " fix:", dff.size) ! Output displayed: dyn:10 fix:12
initialisations to 'mmetc.diskdata:'
dfd as "dataframe;skiph;res.csv" ! Output CSV file with header line
dfd as "dataframe;resnh.csv" ! Output CSV file without header
end-initialisations
end-model Or we can employ the diskdata subroutine to perform the same tasks as shown in the following code snippet.
model 'dataframecsv'
uses 'mmsystem', 'mmetc'
declarations
dfd: dynamic array(r:range, s:set of string) of text
dff: array(rf:range, sf:set of string) of text
dfa: dynamic array(ra:range, sa:set of string) of any
end-declarations
diskdata(ETC_DATAFRAME+ETC_SKIPH, "mydata.csv", dfd)
diskdata(ETC_DATAFRAME+ETC_SKIPH, "mydata.csv", dff)
diskdata(ETC_DATAFRAME+ETC_SKIPH, "mydata.csv", dfa)
writeln("dyn:", dfd.size, " fix:", dff.size) ! Output displayed: dyn:10 fix:12
! Output CSV file with and without header line
diskdata(ETC_SKIPH+ETC_OUT+ETC_CSV+ETC_DATAFRAME, "res.csv", dfd)
diskdata(ETC_OUT+ETC_CSV+ETC_DATAFRAME, "resnh.csv", dfd)
end-model If no header line is present in the CSV input file we need to adapt the type of the second index of the Mosel array storing the dataframe to be a range
declarations
dfd2: dynamic array(r21:range, r2:range) of text
end-declarations
initialisations from 'mmetc.diskdata:'
dfd2 as "dataframe;resnh.csv"
end-initialisations
! Same as:
! diskdata(ETC_DATAFRAME, "resnh.csv", dfd2)
writeln("size df2=", dfd2.size) ! Output displayed: size df2=10 With the I/O driver it is possible to select specific fields (this option is not available through the subroutine version):
declarations
dfd3: dynamic array(r3:range,s3:set of string) of any
end-declarations
initialisations from 'mmetc.diskdata:'
dfd3 as "dataframe;skiph;csv(C_d,C_b);mydata.csv"
end-initialisations
writeln("size df3=", dfd3.size) ! Output displayed: size df3=2 Dataframe format for databases
The ODBC interface in mmodbc provides the subroutine SQLdataframe for reading all or selected fields of a database table into a Mosel array without any prior knowledge of data types or table dimensions: the expected argument is a 2-dimensional array with its first index a range and the second index a set of string, typically it will be either of type text or of a suitable union type. A typed array (such as integer) is accepted, but in this case only fields with matching types are read.
The following code snippet shows how to read all or selected columns of a database table into a Mosel array. It is possible to work with database commands that combine multiple tables—if this results in multiple occurrences of field names unique field names will be generated by appending counters.
model 'dataframedb'
options keepassert
uses 'mmodbc','mmsystem'
declarations
dfa: array(ra:range,csa:set of string) of any
dft: array(rt:range,cst:set of string) of text
dfa2: array(ra2:range,csa2:set of string) of any
end-declarations
SQLconnect("dbtest.sqlite")
assert(getparam("SQLsuccess"))
! Read the entire table 'shirts'
SQLdataframe("select * from shirts", dfa)
writeln("csa: ", csa) ! Output: csa: {`id',`style',`price',`color',`owner'}
writeln(sum(i in ra) dfa(i,"owner").integer)
! Select some columns and specify new name for one of them
SQLdataframe("select id as prod,color as couleur,style from shirts", dft)
writeln("cst: ", cst) ! Output: cst: {`prod',`couleur',`style'}
! Can handle data from multiple tables, in the case of multiple occurrences
! of field names unique names are generated
SQLdataframe("select * from shirts inner join shop on id=article", dfa2)
writeln("csa2: ", csa2)
! Output: csa2: {`id',`style',`price',`color',`owner',`article',`owner_7'}
SQLdisconnect
end-model © 2001-2022 Fair Isaac Corporation. All rights reserved. This documentation is the property of Fair Isaac Corporation (“FICO”). Receipt or possession of this documentation does not convey rights to disclose, reproduce, make derivative works, use, or allow others to use it except solely for internal evaluation purposes to determine whether to purchase a license to the software described in this documentation, or as otherwise set forth in a written software license agreement between you and FICO (or a FICO affiliate). Use of this documentation and the software described in it must conform strictly to the foregoing permitted uses, and no other use is permitted.
