How can I convert the character encoding of a text file?
Text-format data files that do not use UTF-8 encoding, such as CSV files or files accessed via fopen, must be opened with the 'enc:' prefix so that Mosel converts them when they are accessed from within a model. (Files in the Mosel initializations format can instead declare their encoding with the !@encoding marker.)
Example:
! Encoding names are operating system dependent, e.g. CP1252, ISO88591
fopen("enc:GB18030,testdata.txt", F_INPUT)
It is usually preferable to specify the encoding used by a data file as shown above, but Mosel also implements shorthands for encodings configured on the system running the model.
! Encoding aliases:
! raw, sys, wchar, fname, tty, ttyin, stdin, stdout, stderr
initializations to "enc:sys,mmsheet.csv:testoutput.csv"
  ...
end-initializations
Using the prefix enc:sys means that the default system encoding is employed (which corresponds to the behaviour of Mosel versions prior to Mosel 4).
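Whether a data file needs the 'enc:' prefix at all depends on whether its content is already UTF-8. The following is a minimal C sketch of such a check, written without any Xpress library. The function name is_valid_utf8 is hypothetical and is not part of Mosel or XPRNLS. Treat the result as a heuristic: pure ASCII data always passes, and a file in another encoding can occasionally happen to form valid UTF-8.

```c
#include <stddef.h>

/* Hypothetical helper (not a Mosel/XPRNLS function):
 * return 1 if buf[0..len) is well-formed UTF-8, 0 otherwise. */
int is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        size_t extra;       /* continuation bytes expected after the lead byte */
        unsigned long cp;   /* code point being decoded */
        unsigned long min;  /* smallest code point legal for this length */

        if (c < 0x80) { i++; continue; }                 /* plain ASCII */
        else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; min = 0x10000; }
        else return 0;      /* stray continuation byte or invalid lead byte */

        if (len - i <= extra) return 0;                  /* truncated sequence */

        for (size_t k = 1; k <= extra; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80) return 0;           /* not a continuation */
            cp = (cp << 6) | (unsigned long)(cc & 0x3F);
        }

        if (cp < min) return 0;                          /* overlong form */
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;      /* UTF-16 surrogate */
        if (cp > 0x10FFFF) return 0;                     /* beyond Unicode */

        i += extra + 1;
    }
    return 1;
}
```

A caller would read the file contents into a buffer and pass it to is_valid_utf8; a result of 0 suggests that the file should be opened with an explicit encoding such as "enc:CP1252,...".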
At the API level, you can use the XPRNLS library to convert to and from UTF-8 encoding (see the reference manual "XPRNLS command tool and library" for the full documentation of its functionality):
- this library is platform independent and has no external dependency
- it handles encoding conversions between UTF-8 and local encodings
- it implements UTF-8/16/32(LE+BE), ISO-8859-1/15, ASCII, CP1252
- other supported encodings depend on the operating system
// Open a file using the C function 'fopen' with a file name coming from Mosel
f = fopen(XNLSconvstrto(XNLS_ENC_SYS,filename,-1,NULL),"r");
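To make the idea of converting between a local encoding and UTF-8 more concrete, here is a small hand-written C sketch for the simplest case, ISO-8859-1 to UTF-8. It does not use XPRNLS and is not how the library is implemented; the function name latin1_to_utf8 and its buffer convention are assumptions made for this illustration only. In ISO-8859-1 every byte value equals its Unicode code point, so bytes below 0x80 are copied unchanged and all others become a two-byte UTF-8 sequence.

```c
#include <stddef.h>

/* Hypothetical helper (not part of XPRNLS): convert srclen bytes of
 * ISO-8859-1 text to a NUL-terminated UTF-8 string in dst.
 * Returns the number of bytes written (excluding the NUL), or
 * (size_t)-1 if dst (capacity dstcap) is too small. */
size_t latin1_to_utf8(const unsigned char *src, size_t srclen,
                      char *dst, size_t dstcap)
{
    size_t o = 0;

    for (size_t i = 0; i < srclen; i++) {
        unsigned char c = src[i];
        size_t need = (c < 0x80) ? 1 : 2;

        /* Always keep one byte free for the terminating NUL. */
        if (o + need + 1 > dstcap)
            return (size_t)-1;

        if (c < 0x80) {
            dst[o++] = (char)c;                     /* ASCII: unchanged */
        } else {
            dst[o++] = (char)(0xC0 | (c >> 6));     /* lead byte: 110xxxxx */
            dst[o++] = (char)(0x80 | (c & 0x3F));   /* continuation: 10xxxxxx */
        }
    }

    if (o >= dstcap)          /* only reachable for empty input, zero capacity */
        return (size_t)-1;
    dst[o] = '\0';
    return o;
}
```

Since each input byte expands to at most two output bytes, a destination buffer of 2 * srclen + 1 bytes is always sufficient. Other encodings such as CP1252 or GB18030 need mapping tables, which is the work a conversion library takes over.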
Alternatively, you can use the XPRNLS command tool for converting the character encoding of text files between any two supported encodings:
xprnls conv -f CP1252 -t UTF8 -o outfile.txt myfile.txt
Note: you can display the list of the available xprnls commands by entering
xprnls
at the command prompt.