Spreadsheets: Excel and CSV

MS Excel spreadsheets can be accessed directly from a Mosel model with the help of the excel or xls/xlsx I/O drivers. The use of these drivers is similar to what we have seen above for the odbc driver.

Yet another method for accessing spreadsheet data consists in using the CSV driver csv with spreadsheets that have been saved in CSV format. All the dedicated spreadsheet drivers are defined by the module msheet. The main differences between the functionality and usage of the various spreadsheet drivers are summarized in the following table.

**Table 1:** Comparison of spreadsheet I/O drivers
	excel	xls/xlsx	csv
File type	physical file	physical file	extended file
Supported platforms	Windows	Windows, Linux, Mac	all Xpress platforms
Requirements	Excel + open interactive session	none, can be used remotely	none, can be used remotely
File creation for output	no	yes	yes
Output writing mechanism	on-screen display without saving if application running, otherwise data saved into file	data saved into file	data saved into file
Named ranges	yes	yes	no
Multiple worksheets	yes	yes	no
VBA macros	yes	no	no

The diskdata driver defined in the module mmetc also provides functionality for reading and writing CSV format text files. For the items listed in Table Comparison of spreadsheet I/O drivers it has the same charateristics as the csv driver. The requirements on the layout of the data file by diskdata are somewhat more restrictive than those made by the csv I/O driver—most importantly, this driver expects a single data table per file. The diskdata driver works in a linewise fashion (always reading all data rows of a file), as opposed to the I/O drivers of the mmsheet module that load the entire data file into memory when first accessing it and then allow the user to select specific ranges. Its use may be preferrable to csv with respect to memory use and performance when reading large data files, particularly on platforms where loading large files is time-consuming.

Data input using the excel or xls I/O drivers

We shall work with a spreadsheet that has the following contents:


	A	B	C	D	E
1
2		Index_i	Index_j	Value
3		1	1	12.5
4		2	3	5.6
5		10	9	-7.1
6		3	2	1
7

Assuming that we have named 'MyDataTable2' the cell range B3:D6 (that is, selecting just the data, excluding the header row) we can then read in these data with the following model duoexc.mos.

model "Duo input (Excel)"
 uses "mmsheet"

 declarations
  A4: dynamic array(range,range) of real
 end-declarations

! Use the excel driver for reading the data
 initializations from "mmsheet.excel:data.xls"
  A4 as 'MyDataTable2'
 end-initializations

! Print out the data we have read
 writeln('A4 is: ', A4)

end-model

It is also possible to read the data from a range 'MyDataTable' that inludes the header row (that is, selecting the area B2:D6). In this case we need to specify the driver option skiph to skip the header line of the selected data range:

 initializations from "mmsheet.excel:data.xls"
  A4 as 'skiph;MyDataTable'
 end-initializations

mmsheet

Yet another possibility is to use directly the worksheet and cell references instead of defining a named range (NB: the first sheet of the workbook is selected if no worksheet name is specified and the used cells of the selected sheet are assumed if no cell range selection is provided):

 initializations from "mmsheet.excel:data.xls"
  A4 as '[Sheet1$B3:D6]'
 end-initializations

Working with named ranges has the advantage over this explicit form that modifications to the spreadsheet layout repositioning the data range will not make it necessary to modify the model.

Instead of the excel driver that can only be used with an existing MS Excel installation, we can switch to the generic xls driver by modifying the file name to

 "mmsheet.xls:data.xls"

All else, including the driver options and the usage of named ranges, remains unchanged. Note: with a spreadsheet saved in XLSX format (files with the extension .xlsx) we would have to use the driver xlsx.

Data output using excel or xls

The following model duoexc_out.mos writes out the array A into the spreadsheet range F3:H3 that we have called 'MyOutTable3'. These cells denote the first row of the rectangular area into which we wish to write.

model "Duo output (Excel)"
 uses "mmsheet"

 declarations
  A: array(-1..1,5..7) of real
 end-declarations

 A :: [ 2,  4,  6,
       12, 14, 16,
       22, 24, 26]

! Use an initializations block with the excel driver for writing data
 initializations to "mmsheet.excel:data.xls"
  A as "grow;MyOutTable3"
 end-initializations

end-model

In this model we have used the option grow of the excel driver to indicate that the actual output area may grow (= add more rows, the number of selected columns must be sufficient) beyond the specified output range as required by the data. Alternatively, we may also specify the complete output range such as in

 initializations to "mmsheet.excel:data.xls"
  A as '[Sheet1$F3:H11]'
 end-initializations

or more dynamically:

 initializations to "mmsheet.excel:data.xls"
  A as '[Sheet1$F3:H' + (3+A.size-1) + ']'
 end-initializations

If the output range has been defined to include the header row (ODBC-compatible format) we need to use again the option skiph.

 initializations to "mmsheet.excel:data.xls"
  A as 'skiph;grow;MyOutTable1'
 end-initializations

When using the excel or xls/xlsx drivers the definition of the output range in the spreadsheet remains unchanged even if the actual output area exceeds its length. As a consequence, the output from a second model run will start at exactly the same place as the first, overwriting any previous results in the same location (but not deleting any lines if the output from the second run uses fewer rows than the first).

A specific feature of the excel driver is that the Excel spreadsheet file may remain open while writing to it from Mosel. In this case the data written to the spreadsheet does not get saved, enabling the user thus to experiment with model runs without causing any unwanted lasting effects to the output file. However, when using the xls/xlsx drivers, output data is saved directly into the spreadsheet file—the output file needs to be closed if the application used for displaying it locks write access to the file by other programs.

With the excel driver, the output file must exist prior to writing to it from Mosel. The xls/xlsx drivers will create a new spreadsheet file of the corresponding format if the specified file is not found, in which case it is obviously not possible to work with predefined ranges, but the option skiph+ can be used to output a header line, for example as in the following version that writes output into the columns F to H of the first sheet of a new spreadsheet file 'anewfile.xls':

 initializations to "mmsheet.xls:anewfile.xls"
  A as 'skiph+;[1$F:H](Index1,Index2,Value_of_A)'
 end-initializations

Data input using the csv I/O driver

Now assume that the spreadsheet from Section Data input using odbc has been saved in CSV format into the file data.csv. We can then read in these data with the following model duosheet.mos.

model "Duo input (CSV)"
 uses "mmsheet"

 declarations
  A6: dynamic array(range,range) of real
 end-declarations

! Use the csv driver for reading the data
 initializations from "mmsheet.csv:data.csv"
  A6 as '[B3:D6]'
 end-initializations

! Print out the data we have read
 writeln('A6 is: ', A6)

end-model

The CSV format does not support the definition of named ranges and there is no notion of 'worksheet' (when saving an Excel spreadsheet in CSV format the user selects the sheet to be saved). We therefore need to address cell ranges explicitly by indicating their position using the (letter,number) notation as shown above or alternatively, with RC (row-column) notation:

 initializations from "mmsheet.csv:data.csv"
  A6 as '[R3C2:R6C4]'
 end-initializations

Driver options such as skiph to skip the range header line apply as before

 initializations from "mmsheet.csv:data.xls"
  A4 as 'skiph;[B2:D6]'
 end-initializations

As for the spreadsheet drivers, it is also possible to use an empty range definition, in which case all contents of the CSV file is considered as the selected area.

 initializations from "mmsheet.csv:data.csv"
   A6 as '[]'
 end-initializations

If we want to read all the content of columns B to D, skipping the first line, we can use the following shorthand notation (NB: this notation without row numbers, just like the row-column notation works with all mmsheet drivers):

 initializations from "mmsheet.csv:data.xls"
  A4 as 'skiph;[B:D]'
 end-initializations

The main advantage of CSV over Excel format is its greater portability (CSV format is supported on all Xpress platforms) and the possibility to combine the csv driver freely it with other drivers, such as shmem or rmt that are not compatible with the other spreadsheet drivers.

Data output using csv

The following model duosheet_out.mos writes out the array A into the spreadsheet range F3:H3. These cells denote the first row of the rectangular area into which we wish to write.

model "Duo output (CSV)"
 uses "mmsheet"

 declarations
  A: array(-1..1,5..7) of real
 end-declarations

 A :: [ 2,  4,  6,
       12, 14, 16,
       22, 24, 26]

! Use an initializations block with the csv driver for writing data
 initializations to "mmsheet.csv:data.csv"
  A as "grow;[F3:H3]"
 end-initializations

end-model

 initializations to "mmsheet.csv:data.csv"
  A as '[F3:H11]'
 end-initializations

Yet another option that is available with the csv driver is the possibility to output the names of column headers via the option skiph+ (note also that just like xls/xlsx, the csv driver will create the output file if it does not exist). The following version of our output example writes output into the columns F to H of a newly created file 'anewfile.csv' where the first row of each column contains the indicated header text (note that if the file and column headers exist already, the header text is expected to match the existing).

 initializations to "mmsheet.csv:anewfile.csv"
  A as 'skiph+;[F:H](Index1,Index2,Value_of_A)'
 end-initializations

Accessing simple CSV format files via diskdata

The diskdata I/O driver can be used to read or write simple CSV format files. This driver works with a single data table per file and whilst it is possible to make a selection among the columns (selecting columns from left to right only) the driver will always read all rows from the specified file. The diskdata driver operates in a line-wise fashion instead of loading the entire file into memory, which might be an advantage in situations where the available memory is limited. The following model duodd.mos implements the simple example we have seen in the previous sections with the diskdata I/O driver and equally using the diskdata subroutine that supports a more limited set of configuration options than the driver.

model "Duo input (diskdata)"
 uses "mmsheet"

 declarations
  A7,A8: dynamic array(range,range) of real
 end-declarations

! Use the diskdata driver for reading the data
 initializations from "mmetc.diskdata:"
  A7 as 'csv,datadd.csv'
 end-initializations

! Print out the data we have read
 writeln('A7 is: ', A7)

! Use the diskdata subroutine for reading the data
 diskdata(ETC_IN+ETC_CSV, 'datadd.csv', A8)
end-model

With the diskdata I/O driver it is possible to select columns by column numbers or by column headers when reading data:

initializations from "mmetc.diskdata:"
  A7 as 'csv(Index_i,Index_j,Value),skiph,datadd.csv'
  A8 as 'csv(1,2,3),datadd.csv'
 end-initializations

The 'diskdata' functionality can equally be used for writing out basic CSV-format files. With the I/O driver it is possible to configure a field separator or decimal separator character, but it does not support column selection of writing of header as this is the case for the spreadsheet drivers of msheet.

model "Duo output (diskdata)"
 uses "mmetc"

 declarations
  A: array(-1..1,5..7) of real
 end-declarations

 A :: [ 2,  4,  6,
       12, 14, 16,
       22, 24, 26]

! Using the diskdata driver
 initializations to "mmetc.diskdata:"
  A as 'csv,fsep=|;anewfile.csv'
 end-initializations

! Using the diskdata subroutine
 diskdata(ETC_OUT+ETC_CSV, 'anothernewfile.csv', A)
end-model

Working with spreadsheet ranges

In the preceding sections we have seen various forms of how to specify or refer to data ranges in a spreadsheet file. Table Definition of spreadsheet ranges (where applicable with the corresponding formulation for the diskdata I/O driver) provides an overview of the different possibilities supported by the I/O drivers of mmsheet and where applicable also the corresponding form that can be used with the diskadata driver from the module mmetc.

**Table 2:** Definition of spreadsheet ranges (where applicable with the corresponding formulation for the *diskdata* I/O driver)
		excel, xls/xlsx	csv	diskdata
Sheet reference
by name	'[Mysheet$]'	yes	no	no
by counter	'[2$]'	yes	no	no
Named ranges	'Myrange'	yes	no	no
Explicit ranges
entire (first) sheet	'[]'	yes	yes	'csv,myfile.csv'
column range	'[B:D]'	yes	yes	no
rectangular area	'[B3:D6]'	yes	yes	no
row-column notation	'[R3C2:R6C4]'	yes	yes	no
Column selection
by column numbers	'[A:Z](#2,#5,#9)'	yes	yes	'csv(2,5,9),myfile.csv'
(unordered)	'[A:Z](#5,#9,#2)'	yes	yes	no
multiple occurrence	'[A:Z](#2,#2,#9)'	yes	yes	no
by column headers	'skiph;[A:Z](Col1,Col3)'	yes	yes	'csv(Col1,Col3),skiph,myfile.csv'
(unordered)	'skiph;[A:Z](Col3,Col1)'	yes	yes	no

The spreadsheet I/O drivers, including csv, allow users to retrieve size information about the defined ranges by specifying the option 'rangesize' (the forms marked with (*) in the code snippet below can be used with the csv driver)—note that the excel driver always returns all rows, not a row count for the populated area.

declarations
  l,l2,l3,l4,l5,l6,l7,l8,l9: list of integer   ! [height,width] of ranges
end-declarations

initialisations from 'mmsheet.xlsx:datarsz.xlsx'
  l as 'rangesize;[]'                   ! (*) Sizes for populated area in first sheet
  l2 as 'rangesize;[2$]'                ! Sizes for populated area in second sheet
  l3 as 'rangesize;[data$]'             ! Sizes for populated area in sheet 'data'
  l4 as 'rangesize;[a:c]'               ! (*) Sizes for populated area in columns A-C
  l5 as 'rangesize;[A:Z](#2,#7,#5)'     ! (*) Sizes for populated area in columns B-G
  l6 as 'rangesize;myrange'             ! Size of area defined by named range 'myrange'
  l7 as 'rangesize;myrange(#4,#2)'      ! Size of area between col.s 2-4 of 'myrange'
  l8 as 'rangesize;[R1C4:R9C8](#1,#2)'  ! (*) Size of area from selected col.s in range
  l9 as 'rangesize;[D1:H9](#1,#2)'      ! (*) Size of area from selected col.s in range
end-initializations

writeln_("Data rows in spreadsheet range:", l(1), ", number of data columns:", l(2))

© 2001-2020 Fair Isaac Corporation. All rights reserved. This documentation is the property of Fair Isaac Corporation (“FICO”). Receipt or possession of this documentation does not convey rights to disclose, reproduce, make derivative works, use, or allow others to use it except solely for internal evaluation purposes to determine whether to purchase a license to the software described in this documentation, or as otherwise set forth in a written software license agreement between you and FICO (or a FICO affiliate). Use of this documentation and the software described in it must conform strictly to the foregoing permitted uses, and no other use is permitted.

Contents

Index

Glossary