Working with dataframe formats
A dataframe is a 2-dimensional array that represents data in table format with columns (fields) of different data types. The first array index is given by the row counter; the second index is derived from the field labels (the header row in a CSV file) or, in the absence of a header row in a CSV-format file, from a counter of the table columns.
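For readers more familiar with Python, this indexing scheme can be sketched with the standard csv module. This is an illustration only, not part of the Mosel API: the sample data and the dictionary-of-cells representation are assumptions made for the sketch.

```python
import csv
import io

# Hypothetical sample table with a header row
raw = "C_d,C_b\n1,4\n7,\n"

rows = list(csv.reader(io.StringIO(raw)))
header, body = rows[0], rows[1:]

# First index: row counter; second index: field label from the header row.
# Empty cells produce no entry (mirroring a dynamic Mosel array).
df = {(i, header[j]): cell
      for i, row in enumerate(body, start=1)
      for j, cell in enumerate(row)
      if cell != ""}

print(df[(1, "C_d")])        # "1"
print(df[(2, "C_d")])        # "7"
print((2, "C_b") in df)      # False: empty cell, no entry created
```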
Dataframe format for CSV
Dataframe-format CSV reading is provided via the 'diskdata' functionality of the module mmetc: it is supported by the I/O driver mmetc.diskdata and equally by the subroutine diskdata. If the data types are not known upfront, or may vary across the table columns, the Mosel array type needs to be able to cover all possible cases, so typically this will be either text or a union type.
Assuming we wish to read this CSV file mydata.csv (note that this file contains some empty cells):
C_e,C_d,C_c,C_b,C_a,C_s
1,,"3",4,5.5,"r1"
6,7,"8",,10.5,"r2"
We can use the I/O driver mmetc.diskdata (the example also shows output into a newly created CSV file) to read data into fixed-size or dynamic arrays of type text or of a suitable union type (here we simply use the predefined type any):
model 'dataframecsv'
 uses 'mmsystem', 'mmetc'

 declarations
  dfd: dynamic array(r:range, s:set of string) of text
  dff: array(rf:range, sf:set of string) of text
  dfa: dynamic array(ra:range, sa:set of string) of any
 end-declarations

 initialisations from 'mmetc.diskdata:'
  dfd as "dataframe;skiph;mydata.csv"
  dff as "dataframe;skiph;mydata.csv"
  dfa as "dataframe;skiph;mydata.csv"
 end-initialisations

 writeln("dyn:", dfd.size, " fix:", dff.size)
 ! Output displayed: dyn:10 fix:12

 initialisations to 'mmetc.diskdata:'
  dfd as "dataframe;skiph;res.csv"  ! Output CSV file with header line
  dfd as "dataframe;resnh.csv"      ! Output CSV file without header
 end-initialisations
end-model
Alternatively, we can employ the diskdata subroutine to perform the same tasks, as shown in the following code snippet:
model 'dataframecsv'
 uses 'mmsystem', 'mmetc'

 declarations
  dfd: dynamic array(r:range, s:set of string) of text
  dff: array(rf:range, sf:set of string) of text
  dfa: dynamic array(ra:range, sa:set of string) of any
 end-declarations

 diskdata(ETC_DATAFRAME+ETC_SKIPH, "mydata.csv", dfd)
 diskdata(ETC_DATAFRAME+ETC_SKIPH, "mydata.csv", dff)
 diskdata(ETC_DATAFRAME+ETC_SKIPH, "mydata.csv", dfa)

 writeln("dyn:", dfd.size, " fix:", dff.size)
 ! Output displayed: dyn:10 fix:12

 ! Output CSV file with and without header line
 diskdata(ETC_SKIPH+ETC_OUT+ETC_CSV+ETC_DATAFRAME, "res.csv", dfd)
 diskdata(ETC_OUT+ETC_CSV+ETC_DATAFRAME, "resnh.csv", dfd)
end-model
If no header line is present in the CSV input file, we need to adapt the type of the second index of the Mosel array storing the dataframe to be a range:
 declarations
  dfd2: dynamic array(r21:range, r2:range) of text
 end-declarations

 initialisations from 'mmetc.diskdata:'
  dfd2 as "dataframe;resnh.csv"
 end-initialisations
 ! Same as:
 ! diskdata(ETC_DATAFRAME, "resnh.csv", dfd2)

 writeln("size df2=", dfd2.size)
 ! Output displayed: size df2=10
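The headerless case can likewise be sketched in Python (illustrative only; the data below is hypothetical): with no header row, the second index is simply a column counter.

```python
import csv
import io

# Hypothetical headerless CSV data with one empty cell
raw = "1,,3\n6,7,8\n"

# Both indices are counters: (row number, column number), 1-based,
# and empty cells again produce no entry.
df2 = {(i, j): cell
       for i, row in enumerate(csv.reader(io.StringIO(raw)), start=1)
       for j, cell in enumerate(row, start=1)
       if cell != ""}

print(len(df2))      # 5 non-empty cells
print(df2[(2, 2)])   # "7"
```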
With the I/O driver it is possible to select specific fields (this option is not available through the subroutine version):
 declarations
  dfd3: dynamic array(r3:range, s3:set of string) of any
 end-declarations

 initialisations from 'mmetc.diskdata:'
  dfd3 as "dataframe;skiph;csv(C_d,C_b);mydata.csv"
 end-initialisations

 writeln("size df3=", dfd3.size)
 ! Output displayed: size df3=2
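The effect of the csv(C_d,C_b) field selection can be illustrated outside Mosel with a small Python sketch (the dictionary-of-cells representation is an assumption made for illustration): only the chosen fields are read, and empty cells create no entries, which is why only two entries remain.

```python
import csv
import io

# The mydata.csv contents from the example above
raw = ('C_e,C_d,C_c,C_b,C_a,C_s\n'
       '1,,"3",4,5.5,"r1"\n'
       '6,7,"8",,10.5,"r2"\n')

wanted = ("C_d", "C_b")  # keep only these fields, as in csv(C_d,C_b)

# Row 1 has an empty C_d cell and row 2 an empty C_b cell,
# so only two entries are created in total.
df3 = {(i, f): row[f]
       for i, row in enumerate(csv.DictReader(io.StringIO(raw)), start=1)
       for f in wanted
       if row[f] != ""}

print(len(df3))  # 2
```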
Dataframe format for databases
The ODBC interface in mmodbc provides the subroutine SQLdataframe for reading all or selected fields of a database table into a Mosel array without any prior knowledge of data types or table dimensions. The expected argument is a 2-dimensional array whose first index is a range and whose second index is a set of string; typically this array will be of type text or of a suitable union type. A typed array (such as integer) is also accepted, but in this case only fields with matching types are read.
The following code snippet shows how to read all or selected columns of a database table into a Mosel array. It is also possible to work with database commands that combine multiple tables; if this results in multiple occurrences of a field name, unique field names are generated by appending counters.
model 'dataframedb'
 options keepassert
 uses 'mmodbc', 'mmsystem'

 declarations
  dfa: array(ra:range, csa:set of string) of any
  dft: array(rt:range, cst:set of string) of text
  dfa2: array(ra2:range, csa2:set of string) of any
 end-declarations

 SQLconnect("dbtest.sqlite")
 assert(getparam("SQLsuccess"))

 ! Read the entire table 'shirts'
 SQLdataframe("select * from shirts", dfa)
 writeln("csa: ", csa)
 ! Output: csa: {`id',`style',`price',`color',`owner'}
 writeln(sum(i in ra) dfa(i,"owner").integer)

 ! Select some columns and specify a new name for one of them
 SQLdataframe("select id as prod,color as couleur,style from shirts", dft)
 writeln("cst: ", cst)
 ! Output: cst: {`prod',`couleur',`style'}

 ! Can handle data from multiple tables; in the case of multiple occurrences
 ! of field names unique names are generated
 SQLdataframe("select * from shirts inner join shop on id=article", dfa2)
 writeln("csa2: ", csa2)
 ! Output: csa2: {`id',`style',`price',`color',`owner',`article',`owner_7'}

 SQLdisconnect
end-model
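The unique-name generation for joined tables can be approximated by the following Python sketch. Note that the exact suffix scheme used by SQLdataframe is not documented here; the position-based counter below is an assumption inferred from the `owner_7` output in the example (the second `owner` is the 7th column of the combined result).

```python
def unique_names(fields):
    """Rename duplicate field names by appending their (1-based)
    column position -- an assumed scheme, for illustration only."""
    seen, out = set(), []
    for pos, name in enumerate(fields, start=1):
        if name in seen:
            name = f"{name}_{pos}"  # assumed suffix: column position
        seen.add(name)
        out.append(name)
    return out

# Field list produced by joining 'shirts' and 'shop' in the example
joined = ["id", "style", "price", "color", "owner", "article", "owner"]
print(unique_names(joined))
# ['id', 'style', 'price', 'color', 'owner', 'article', 'owner_7']
```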
© 2001-2021 Fair Isaac Corporation. All rights reserved. This documentation is the property of Fair Isaac Corporation (“FICO”). Receipt or possession of this documentation does not convey rights to disclose, reproduce, make derivative works, use, or allow others to use it except solely for internal evaluation purposes to determine whether to purchase a license to the software described in this documentation, or as otherwise set forth in a written software license agreement between you and FICO (or a FICO affiliate). Use of this documentation and the software described in it must conform strictly to the foregoing permitted uses, and no other use is permitted.