Dense vs. sparse data format

All examples we have seen so far use sparse data format, i.e., every data entry in the database tables or spreadsheets is given with its complete index tuple. If the index set(s) of an array are defined in the model or fixed otherwise it is also possible to work with data in dense format, i.e., just the data entries without their index tuples.

The following example (file indexeg.mos) shows how data tables given in different formats may be read in by a Mosel model. We have enabled the `debug' option to see the SQL statements generated by Mosel.

model ODBCImpEx
 uses "mmodbc"

 declarations
  A: array(1..3, 1..2) of real
  B: array(1..2, 1..3) of real
  C: array(1..2, 1..3) of real
  CSTR: string
 end-declarations

 CSTR:= 'mmodbc.odbc:debug;indexeg.sqlite'

 ! Data must be dense - there are not enough columns to serve as index!
 initializations from CSTR
  A as 'Range3by2'
 end-initializations

 forall(i in 1..3)
  writeln("Row(",i,"): ", A(i,1), " ", A(i,2))

 ! Dense data
 initializations from CSTR
  B as 'noindex;Range2by3'
 end-initializations

 forall(i in 1..2)
  writeln("Row(",i,"): ", B(i,1), " ", B(i,2), " ", B(i,3))

 ! Indexed data
 initializations from CSTR
  C as 'Range2by3i'
 end-initializations

 forall(i in 1..2)
  writeln("Row(",i,"): ", C(i,1), " ", C(i,2), " ", C(i,3))

end-model

The first array, A is read from a table that helds data in dense format—just the data, no indices:


First	Second
1.2	2.2
2.1	2.2
3.1	4.4

This table has two columns and three rows. Since data in sparse format would require at least three columns, there is no confusion possible between the two formats and mmodbc will deduce automatically the correct format to use. (However, if you wish to document clearly which data format is expected you may choose to add the option `noindex' as in the following case.)

If an input table has a sufficiently large number of columns to serve as indices as is the case for the table Range2by3 and array B, then the situation is ambiguous and we need to state explicitly that no indices are specified by using the option `noindex'. Otherwise mmodbc will use its default (namely sparse format), which will lead to an error message in the present example since the data type (real) of the first two columns does not correspond to the type of the indices (integer) indicated in the model:


First	Second	Third
1.2	1.2	1.3
2.1	2.2	2.3

The third case, table Range2by3i and array C, corresponds to the (default) format that we have already seen in the previous examples—each data item is preceded by its index tuple. This table defines exactly the same data as the previous one:


Firsti	Secondi	Value
1	2	1.1
1	2	1.2
1	3	1.3
2	2	2.1
2	2	2.2
2	3	2.3

The same model as above may be rewritten with SQL commands instead of initializations blocks (the driver option `noindex' is replaced by resetting the value of the mmodbc parameter SQLndxcol):

model ODBCImpEx2
 uses "mmodbc"

 declarations
  A: array(1..3, 1..2) of real
  B: array(1..2, 1..3) of real
  C: array(1..2, 1..3) of real
  CSTR: string
 end-declarations

 CSTR:= 'indexeg.sqlite'

 SQLconnect(CSTR)
 setparam("SQLdebug",true)

 ! Data must be dense - there are not enough columns to serve as index!
 SQLexecute("select * from Range3by2 ", A)
 forall(i in 1..3)
  writeln("Row(",i,"): ", A(i,1), " ", A(i,2))

 setparam("SQLndxcol", false)      ! Dense data
 SQLexecute("select * from Range2by3 ", B)
 forall(i in 1..2)
  writeln("Row(",i,"): ", B(i,1), " ", B(i,2), " ", B(i,3))

 setparam("SQLndxcol", true)       ! Indexed data
 SQLexecute("select * from Range2by3i ", C)
 forall(i in 1..2)
  writeln("Row(",i,"): ", C(i,1), " ", C(i,2), " ", C(i,3))

 SQLdisconnect

end-model

If using the excel driver instead of an ODBC connection, we need to use the noindex option also with the first data table ('Range3by2') since this driver does not do any 'guessing' about the table format.

model ODBCImpEx3
 uses "mmodbc"

 declarations
  A: array(1..3, 1..2) of real
  B: array(1..2, 1..3) of real
  C: array(1..2, 1..3) of real
  CSTR: string
 end-declarations

 CSTR:= 'mmsheet.excel:indexeg.xls'

 ! Dense data ('noindex'), skipping the header line ('skiph')
 initializations from CSTR
  A as 'skiph;noindex;Range3by2'
 end-initializations

 forall(i in 1..3)
  writeln("Row(",i,"): ", A(i,1), " ", A(i,2))

 ! Dense data
 initializations from CSTR
  B as 'skiph;noindex;Range2by3'
 end-initializations

 forall(i in 1..2)
  writeln("Row(",i,"): ", B(i,1), " ", B(i,2), " ", B(i,3))

 ! Indexed data
 initializations from CSTR
  C as 'skiph;Range2by3i'
 end-initializations

 forall(i in 1..2)
  writeln("Row(",i,"): ", C(i,1), " ", C(i,2), " ", C(i,3))

end-model

Multidimensional tables in rectangular format

Quite frequently, particularly when working with spreadsheets, multidimensional input data arrays are formatted in 2-dimensional (rectangular) form. For example, take a look at the following table: we want to populate an array A_ijkwith 3 dimensions from the data held in seven columns, where the first 2 columns are indices (i and j) and the remaining columns are data values corresponding to different index values for the third and last index k of the array.


Firsti	Secondi	Value_1	Value_2	Value_3	Value_4	Value_5
2	B	22.1	22.2	22.3	22.4	22.5
1	D	14.1	14.2	14.3	14.4	14.5

This table can be read by the following Mosel code (see example file threedimarr.mos). Notice that the contents of the first two index sets is not defined in the model—their contents is read with the input data. However, the last index k that is written 'across the table columns' needs to be defined in the model and it has to be of type range (that is, using a set type that has an ordering) to make sure that the entries will be read in the same order as expected by the model. With this data format we need to use the partndx option of the I/O drivers to indicate that not all indices are to be read in with the input data. Furthermore, as the input data only defines values for a few index tuples, we use a dynamic array that will only contain those elements for which values are specified in the data file.

model "ThreeDimArr"
 uses "mmodbc"

 declarations
  I: range
  J: set of string
  K = 1..5	              ! The last index set must be defined in the model	
  A: dynamic array(I,J,K) of real
 end-declarations

 initializations from 'mmsheet.excel:partndx;threedim.xls'
  A as 'Tab_23'
 end-initializations

 writeln("A: ")
 forall(i in I, j in J, k in K | exists(A(i,j,k)))
  writeln("A(", i, ",", j, ",", k, "): ", A(i,j,k))

end-model

To read the same data with the odbc driver we can use

 initializations from 'mmodbc.odbc:partndx;threedim.sqlite'
  A as 'Tab_23'
 end-initializations

and to obtain the corresponding version using SQL statements, the initializations block can be replaced by the following lines.

 declarations
  Idx3: text
 end-declarations

 SQLconnect("threedim.sqlite")

 setparam("SQLndxcol", false)   ! Partially indexed data

 forall(k in K) Idx3+= (", Value_"+k )
 SQLexecute("select Firsti, Secondi" + Idx3 + " from Tab_23", A)
 SQLdisconnect

Contents

Index

Glossary

Search Results

Dense vs. sparse data format

Multidimensional tables in rectangular format