Modeling in Extended MPS Format
Basic formulation
Standard MPS format uses a fixed format text file to hold the problem information. Extended MPS format has two main differences from the standard form:
- The records in the file are free-format – that is, the fields are not necessarily in fixed columns or of fixed size, and each field is delimited by one or more spaces.
- The standard MPS format allows only numbers to be used in the "coefficient" fields – extended MPS format allows the use of formulae.
- There is an optional extra section in extended MPS format, holding additional data and structures for Xpress NonLinear.
We shall tend to use a fairly fixed format, to aid readability.
NAME POLYGON |
The first record of any MPS file is the NAME record, which has the name which may be used to create file names where no other name is specified, and is also written into the matrix and solution files.
ROWS |
The ROWS record introduces the list of rows of the problem – this includes the objective function as well as all the constraints.
N OBJ E OBJEQ G T2T1 G T3T2 G T4T3 L V1V2 L V1V3 L V2V3 L V2V4 L V3V4
The first character denotes the type of constraint. The possible values are:
N | not constraining (always used for the objective function, but may be used elsewhere). |
E | equality: the left hand side (LHS) is equal to the right hand side (RHS). |
L | less than or equal to: the LHS is less than or equal to the RHS. |
G | greater than or equal to: the LHS is greater than or equal to the RHS. |
The second field is the name used for the constraint. In MPS file format, everything has a name. Therefore, within each type of entity (rows, columns, etc) each name must be unique. In general, you should try to ensure that names are unique across all entities, to avoid possible confusion.
You should also try to make the names meaningful, so that you can understand what they mean.
In the example: | |
OBJ | is the objective function. |
OBJEQ | is the "equality" version of the objective function which, as explained below, is required because we are trying to optimize a non-linear objective. |
TiTj | is the constraint that will ensure θi ≥ θj (j=i-1). |
ViVj | is the constraint that will ensure that the distance between Vi and Vj is ≤ 1. |
COLUMNS |
The COLUMNS record introduces the list of columns and coefficients in the matrix. In a normal linear problem, all the variables will appear explicitly as columns in this section. However, in non-linear problems, it is possible for variables to appear only in formulae and so they may not appear explicitly. In the example, the variables THETA1 to THETA4 appear explicitly, the variables RHO1 to RHO4 appear only in formulae. Constraints which involve only one variable in a linear way (that is, they limit the value of a variable to a minimum value, a maximum value or both – possibly equal – values) are usually put in a separate "BOUNDS" section which appears later.
OBJX OBJ 1.0 OBJX OBJEQ -1.0
The first field is the name of the column. All "COLUMNS" records for a column must be together. The second field is the name of the row (which was defined in the ROWS section). The third field is the value. It is not necessary to include zero values – only the non-zeros are required
If the coefficients are constant, then it is possible to put two on each record, by putting a second row name and value after the first (as in the example for THETA2 and THETA3 below).
The constraints putting θi in order are all linear – that is, the coefficients are all constant.
THETA1 T2T1 -1 THETA2 T2T1 1 T3T2 -1 THETA3 T3T2 1 T4T3 -1 THETA4 T4T3 1
The RHS of any constraint must be constant. Therefore, to write THETA2 ≥ THETA1, we must actually write THETA2 - THETA1 ≥ 0. The constraint T2T1 has coefficient -1 in THETA1 and +1 in THETA2.
We want to maximise the area of the polygon. The formula for this is the sum of the areas of the triangles with one vertex at V5 – i.e.:
0.5 * RHO1 * RHO2 * SIN ( THETA2 - THETA1 ) + 0.5 * RHO2 * RHO3 * SIN ( THETA3 - THETA2 ) + 0.5 * RHO3 * RHO4 * SIN ( THETA4 - THETA3 )
– which is a non-linear function. Xpress NonLinear does not itself have a problem with non-linear objective functions, but Xpress distinguishes between the original N-type row which contains the objective function coefficients when the matrix is read in, and the objective function which is actually optimized. To avoid any confusion between these two "objectives", Xpress NonLinear also requires that the objective function as passed to Xpress Optimizer is linear. What we want to do is:
We create a new variable – called in this example OBJX – and write:
and then: maximize OBJX, where OBJX is just a variable.
The constraint linking OBJX and AREA was defined as the equality constraint OBJEQ in the ROWS section, and AREA is the formula given above. This is where the coefficient of -1 in column OBJX comes from.
Every item in the matrix has to be in a coefficient – that is, it is the multiplier of a variable. However, the formula for area, as written, is not a coefficient of anything. There are several ways of dealing with this situation. We shall start by breaking the formula up into coefficient form – that is, to write it as X1*formula1 + X2*formula2 + .... Our formula could then be:
RHO1 * ( 0.5 * RHO2 * SIN ( THETA2 - THETA1 ) ) + RHO2 * ( 0.5 * RHO3 * SIN ( THETA3 - THETA2 ) ) + RHO3 * ( 0.5 * RHO4 * SIN ( THETA4 - THETA3 ) )
which is of the right form and can be written in the COLUMNS section as follows:
RHO1 OBJEQ = 0.5 * RHO2 * SIN ( THETA2 - THETA1 ) RHO2 OBJEQ = 0.5 * RHO3 * SIN ( THETA3 - THETA2 ) RHO3 OBJEQ = 0.5 * RHO4 * SIN ( THETA4 - THETA3 )
Notice that the formula begins with an equals sign. When this is used in the coefficient field, it always means that a formula is being used rather than a constant. The formula must be written on one line – it does not matter how long it is – and each token (variable, constant, operator, bracket or function name) must be delimited by spaces.
When a formula is used, you can only write one coefficient on the record – the option of a second coefficient only applies when both coefficients are constants.
The constraints for the distances between pairs of vertices are relationships of the form:
RHO1 * RHO1 + RHO2 * RHO2 - 2 * RHO1 * RHO2 * COS ( THETA2 - THETA1 ) <= 1
These can again be split into coefficients, for example:
RHO1 * ( RHO1 - 2 * RHO2 * COS ( THETA2 - THETA1 ) ) + RHO2 * ( RHO2 )
This looks a little strange, because RHO2 appears as a coefficient of itself, but that is perfectly all right. This section of the matrix contains a set of records (one for each of the ViVj constraints) like this:
RHO1 V1V2 = RHO1 - 2 * RHO2 * COS ( THETA2 - THETA1 ) RHO2 V1V2 = RHO2
Note that because the records for each column must all appear together, the coefficients for – for example – RHO1 in this segment must be merged in with those in the previous (OBJEQ) segment.
RHS |
The RHS record introduces the right hand side section.
The RHS section is formatted very much like a COLUMNS section with constant coefficients. There is a column name – it is actually the name of the right hand side – and then one or two entries per record. Again, only the non-zero entries are actually required.
RHS1 T2T1 .001 T3T2 .001 RHS1 T4T3 .001 V1V2 1 RHS1 V1V3 1 V1V4 1 RHS1 V2V3 1 V2V4 1 RHS1 V3V4 1
RHS1 is the name we have chosen for the right hand side. It is possible – although beyond the scope of this guide – to have more than one right hand side, and to select the one you want. Note that, in order to ensure we do have a polygon with N sides, we have made the relationship between theta(i) and theta(i-1) a strict inequality by adding 0.001 as the right hand side. If we did not, then two of the vertices could coincide and so the polygon would effectively lose one of its sides.
BOUNDS |
The BOUNDS record introduces the BOUNDS section which typically holds the values of constraints which involve single variables.
Like the RHS section, it is possible to have more than one set of BOUNDS, and to select the one you want to use. There is therefore in each record a bound name which identifies the set of bounds to which it belongs. We shall be using only ones set of bounds, called BOUND1.
Bounds constrain a variable by providing a lower limit or an upper limit to its value. By providing a limit of -∞ for the lower bound, it is possible to create a variable which can take on any value – a "free" variable. The following bound types are provided:
LO | a lower bound. |
UP | an upper bound. |
FX | a fixed bound (the upper and lower limits are equal). |
FR | a free variable (no lower or upper limit). |
MI | a "minus infinity" variable – it can take on any non-positive value. |
There are other types of bound which are used with integer programming, which is beyond the scope of this guide.
FR BOUND1 OBJX LO BOUND1 RHO1 0.01 UP BOUND1 RHO1 1 LO BOUND1 RHO2 0.01 UP BOUND1 RHO2 1 LO BOUND1 RHO3 0.01 UP BOUND1 RHO3 1 LO BOUND1 RHO4 0.01 UP BOUND1 RHO4 1 UP BOUND1 THETA4 3.1415926
A record in a BOUNDS section can contain up to four fields. The first one is the bound type (from the list above). The second is the name of the BOUNDS set being used (ours is always BOUND1). The third is the name of the variable or column being bounded. Unless the bound type is FR or MI, there is a fourth field which contains the value of the bound.
Although we know that the area is always positive (or at least non-negative), a more complicated problem might have an objective function which could be positive or negative – you could make a profit or a loss – and so OBJX needs to be able to take on po sitive and negative values. The fact that it is marked as "free" here does not mean that it can actually take on any value, because it is still constrained by the rest of the problem.
The upper bounds on RHO1 to RHO4 provide the rest of the restrictions which ensure that the distances between any two vertices are = 1, and the limit on THETA4 ensures that the whole polygon is above the X-axis. Just to make sure that we do not "lose" a side because the value of RHOi becomes zero, we set a lower bound of 0.01 on all the rhos, performing a similar function to the RHS values of .001 for TiTj.
ENDATA |
The last record in the file is the ENDATA record.
Although this is sufficient to define the model, it is usually better to give Xpress NonLinear some idea of where to start – that is, to provide a set of initial values for the variables. You do not have to provide values for everything, but you should try to provide them for every variable which appears in a non-linear coefficient, or which has a non-linear coefficient. In our current example, that means everything except OBJX.
SLPDATA |
The SLPDATA record introduces a variety of different special items for Xpress NonLinear. It comes as the last section in the model (before the ENDATA record). We are using it at this stage for defining initial values. These are done with an IV record.
IV IVSET1 RHO1 0.555 IV IVSET1 RHO2 0.888 IV IVSET1 RHO3 1 IV IVSET1 RHO4 0.888
Just as with the RHS and BOUNDS sections, it is possible to have more than one set of initial values – perhaps because the same structure is used to solve a whole range of problems where the answers are so different that it does not make much sense to start always from the same place. In this example, we are using only one set – IVSET1.
The IV record contains four fields. The first one is IV, which indicates the type of SLPDATA being provided. The second is the name of the set of initial values. The third is the name of the variable and the fourth is the value being provided.
In the case of IV records, it is possible – and indeed perhaps necessary – to provide initial values which are zero. The default value (which is used if no value is provided) is not zero, so if you want to start with a zero value you must say so.
Using the XSLP console-based interface
XSLP is a data-driven console-based interface for operating Xpress NonLinear, an extension of the Xpress Optimzier console.
The example will use screen-based input and output. You can also put the commands into a file and execute it in batch mode, or use the embedded TCL scripting language.
Commands are not case-sensitive except where the case is important (for example, the name of the objective function). We shall use upper case for commands and lower case for the arguments which would change for other models. Each parameter in a command must be separated by at least one space from the preceding parameter or command.
XSLP |
This starts the XSLP program. This checks for the existence of the Xpress Optimizer and Xpress NonLinear DLLs. If you are using an OEM version of the Xpress DLL, you may need a special password or license file from your usual supplier.
READPROB polygon |
This reads a non-linear problem from the file polygon.mat.
MAXIM |
This form of the maximize command does a non-linear optimization with the default settings of all the parameters (it will recognise the problem as an SLP one automatically).
WRITEPRTSOL |
This will use the normal Xpress function to write to solution in a text form to a file with the same name as the input, but with a ".prt" suffix.
Q |
This (the abbreviation for the QUIT command) terminates the XSLP console program.
Coefficients and terms
So far we have managed to express the formulae as coefficients. However, there are constraints – for example SIN(A) ≤ 0.5 – which cannot be expressed directly using coefficients. The extended MPS format has a special reserved column name – the equals sign – which is effectively a variable with a fixed value of 1.0, and which can be used to hold formulae of any type, whether they can be expressed as coefficients or not. The area formula and distance constraints could all be written in a more readable form by using the "equals column". The area formula is rather long to write in this guide, but the distance constraints look like this:
= V1V2 RHO1 * RHO1 + RHO2 * RHO2 - 2 * RHO1 * RHO2 * COS ( THETA2 - THETA1 ) = V1V3 RHO1 * RHO1 + RHO3 * RHO3 - 2 * RHO1 * RHO3 * COS ( THETA3 - THETA1 )
User functions
In this example, the most complicated function is the area calculation, and it is not a problem to model it explicitly as a formula. However, there are cases when it is not possible to do so, or when it is undesirable to do so – for example, when the formula is very large or contains conditional evaluations, or when it is simply easier to write it as an iterative calculation (in a do-loop) rather than explicitly. This section of the User Guide shows how to extend the Polygon model to calculate the area using a "user function".
A user function is essentially a function which is not built in to Xpress NonLinear. It can be written in a language such as C, and compiled into a DLL; it can be written as a set of formulae in an Excel spreadsheet (with or without a macro as well); it can be written entirely within an Excel macro. This example shows the area function written as an Excel macro.
A user function in an Excel macro
This is a function written as an Excel macro, in the sheet Sheet1 of the Excel workbook C:\xpressmp\examples\slp\spreadsheet\Polygon.xls.
Function Area(Values() As Variant, nArgs() As Variant) As Double n = nArgs(0) i = 3 Total = 0 For Count = 1 To n Rho1 = Values(i - 3) Theta1 = Values(i - 2) Rho2 = Values(i - 1) Theta2 = Values(i) Total = Total + 0.5 * Rho1 * Rho2 * Sin(Theta2 - Theta1) i = i + 2 If i > n Then Exit For Next Count Area = Total End Function
It takes two arguments, both arrays of type Variant (a general-purpose type which can contain any type of data). It returns a single value of type Double.
This calculates the area for a polygon with any number of sides, by iterating through all the adjacent triangles. The array Values contains pairs of items in the order RHO1, THETA1, RHO2, THETA2, etc. The first loop calculates the area between (RHO2,THETA2) and (RHO1,THETA1). Subsequent loops then add the area of the next triangle.
Notice that all the arrays which communicate with Xpress NonLinear count from zero.
In this example, we are calculating only one value, and so there is only one item to return. A more complicated function might calculate and return more than one value (for example, the circumference and the area). In such a case, the function must return an array of type Double, as in the abbreviated example below:
Dim DArray(1) As Double Function ArrayArea(Values() As Variant, nArgs() As Variant) As Double() ... DArray(0) = Total DArray(1) = Circum Area = DArray End Function
Extending the polygon model
The model needs to be modified slightly in order to use the new function. There are two parts – using the function in the model; and declaring the function and explaining how the interface works.
To use the function in the model, we give it a name – say "PolyArea". We can then use it like any other function.
PolyArea ( RHO1 , THETA1 , RHO2 , THETA2 , RHO3 , THETA3 , RHO4 , THETA4 )
The arguments RHO1 up to THETA4 are in the order that the function expects.
If the function returns an array, then we have to specify which item in the array is the one we want. In our case, there is only one value, and it is the first. The formula for the area would then become:
PolyArea ( RHO1 , THETA1 , RHO2 , THETA2 , RHO3 , THETA3 , RHO4 , THETA4 : 1 )
The colon (":") indicates that the next item specifies which array value is required. The number "1" indicates the first item.
The OBJEQ constraint will now have only two items – the OBJX entry and the new PolyArea function, which will be a coefficient of the special equals column. The relevant piece of the MPS file is:
OBJX OBJEQ -1 = OBJEQ = PolyArea ( RHO1 , THETA1 , RHO2 , THETA2 , RHO3 , THETA3 , RHO4 , THETA4 )
The function declaration is made in the SLPDATA section, using a record of type UF. There are several fields which can be used, but not all of them are necessary in this case.
UF PolyArea = Area ( VARIANT , VARIANT ) XLF = C:\Xpress...\Polygon.xls = Sheet1
The fields we have are as follows:
UF | indicates this is a user function declaration. |
PolyArea | the name of the function as used within the model. |
Area | the name of the function as used in the spreadsheet. If it is the same as that used in the model, it can be omitted (in which case the "=" sign is omitted as well). |
VARIANT | the arguments in brackets indicate the number and type of the arguments. For Excel macros, the type is always VARIANT, and the first two arguments are the array of values and the number of items in the array. |
XLF | indicates an Excel macro function (as opposed to spreadsheet formulae or a DLL). |
C:\Xpress.. | the name of the spreadsheet containing the macro (we've had to abbreviate the full path to fit on the page – the full name is in the file in the examples. |
Sheet1 | the name of the sheet containing the macro. |
Notice that the declaration does not itself say whether the function returns an array or a single item. Xpress NonLinear deduces this from the form of the function reference itself (whether or not there is a return item number).
The model can now be run using the Excel macro to calculate the values instead of using a formula inside the model itself.