[DDI-SRG] Data formats: locale and language

Joachim Wackerow joachim.wackerow at gesis.org
Wed Dec 12 08:42:44 EST 2007


Looking at the SAS formats I realized that we would need additional 
information like locale and/or language for specific formats.

One example is a date format such as a string representation of "day of
the week", with strings like "SUN" or "MON" in the data file. This can
be represented by a generic format, but a definition of the language
used would also be necessary.
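A minimal sketch of this idea in Python, assuming a hypothetical lookup
table of abbreviations per language (the table, the language codes, and
the function name are illustrative only and not part of any DDI
proposal):

```python
# Hypothetical abbreviation tables per language; index 0 is Monday.
DAY_ABBREVIATIONS = {
    "en": ["MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"],
    "de": ["MO", "DI", "MI", "DO", "FR", "SA", "SO"],
}


def parse_weekday(value, language):
    """Return the weekday index (Monday = 0) for an abbreviation.

    The generic format is "weekday abbreviation"; the language attribute
    selects which table is used to interpret the string.
    """
    names = DAY_ABBREVIATIONS[language]
    return names.index(value.strip().upper())


print(parse_weekday("SUN", "en"))
print(parse_weekday("SO", "de"))
```

Both calls refer to the same day; without the language attribute the
string "SO" could not be interpreted at all.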

The same applies to dates like 10.12.2007 (read in Germany as 2007-12-10
in ISO format, in the USA as 2007-10-12). With a generic format,
additional information about the locale would be necessary. The
alternative would be to have a specific format definition for each
variation, but then the information that the format is locale dependent
is lost.
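As a rough illustration, the following Python sketch reads the same
string under two locales. The mapping from locale name to field order is
an assumption made for this example; a real implementation would take it
from the locale data of the platform.

```python
from datetime import datetime

# Assumed field order per locale (illustrative, not taken from any standard table).
DATE_PATTERNS = {
    "de_DE": "%d.%m.%Y",  # day first
    "en_US": "%m.%d.%Y",  # month first
}


def parse_local_date(value, locale_name):
    """Parse a generic 'short date' string according to the given locale."""
    return datetime.strptime(value, DATE_PATTERNS[locale_name]).date()


print(parse_local_date("10.12.2007", "de_DE").isoformat())
print(parse_local_date("10.12.2007", "en_US").isoformat())
```

The data format stays generic ("short date"); only the locale attribute
decides which of the two readings applies.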

Reading numeric or monetary values with an embedded grouping (or
thousands) separator and decimal separator is another candidate for
localization. We already have explicit elements for decimal and grouping
separators, but an alternative would be to use a generic numeric format
with a locale.
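The two approaches can be sketched side by side in Python. The separator
table per locale and both function names are assumptions for this
example only:

```python
from decimal import Decimal

# Assumed (grouping separator, decimal separator) per locale.
SEPARATORS = {
    "de_DE": (".", ","),
    "en_US": (",", "."),
}


def parse_number(value, grouping, decimal_sep):
    """Explicit way: the separators are given as attributes of the format."""
    plain = value.replace(grouping, "").replace(decimal_sep, ".")
    return Decimal(plain)


def parse_localized_number(value, locale_name):
    """Generic way: the separators are derived from the locale."""
    grouping, decimal_sep = SEPARATORS[locale_name]
    return parse_number(value, grouping, decimal_sep)


print(parse_number("1.234,56", grouping=".", decimal_sep=","))
print(parse_localized_number("1,234.56", "en_US"))
```

Both calls yield the same value; the second one keeps the information
that the format is locale dependent.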

The locale and language information should be kept in the same place
where the data format is defined. Both can be seen as attributes of the
data format.
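One possible shape for such a definition, sketched as a Python data
class. The class and field names are hypothetical and only meant to show
locale and language living next to the format itself:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataFormat:
    """Hypothetical data format description with optional localization."""

    name: str                       # generic format name, e.g. "date"
    locale: Optional[str] = None    # e.g. "de_DE"
    language: Optional[str] = None  # e.g. "de"

    def is_locale_dependent(self):
        """True if a locale or language is needed to read the data."""
        return self.locale is not None or self.language is not None


short_date = DataFormat(name="date", locale="de_DE")
weekday = DataFormat(name="weekday-abbreviation", language="en")
print(short_date.is_locale_dependent(), weekday.is_locale_dependent())
```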

In general I think both ways can make sense: definition of a specific 
format by a name (for a related type) and definition of a generic format 
with attributes like decimal separator.

SPSS has no NLS (national language support); SAS has NLS support but
also old-style fixed definitions; SQL also has both. When both ways of
defining formats are available, the work of describing the formats seems
to be easier. The mapping table and the applications using it, however,
become more complicated. But doing formats without NLS seems to be a bad
choice.

Achim

