[DDI-SRG] Data formats: locale and language
Wendy Thomas
wlt at pop.umn.edu
Wed Dec 12 10:04:51 EST 2007
I-Lin
I think Achim is referring to describing the data as it is contained in the
data file (over which we have no entry control). While the cases he
discusses are rare, they are definitely a problem with both legacy data and
with data created by organizations other than large-scale ones (creativity
reigns supreme).
Some of these seem to be storage-related issues, but some, like the SUN
MON type of information, seem more related to the Variable description in
the logical product. The question there is: is Sunday, Monday, etc. a
DateTimeRepresentation or a CodeRepresentation? Does this get converted as
it goes into a specific storage format, regardless of the user's creation
of, say, an alternate variable with coding based on the original variable?
I think our use of dates within a DDI instance is covered. The question is
whether the requested changes are an expansion or change of the
DateTimeRepresentation (making sure something is stamped as a specific
representation type rather than a generic category, code, numeric, or text
response domain), or whether the request is to expand or change the
representation of specifics in a physical store.
Can we get some walk through examples of where the problem lies?
Wendy
On Wed, 12 Dec 2007, I-Lin Kuo wrote:
> Hi Joachim,
>
> While I understand the intent, I'm not sure that localization is
> sufficient or the right solution.
>
> First, the ISO date + locale example is not correct. ISO 8601 is
> locale-neutral, and time elements are arranged in descending order. If 12
> represents the month, then the US example 2007-10-12 is not an ISO date.
> Secondly, the DateFormatStandardName + locale (=ISO + US) scheme of
> identifying formats is not expressive enough to cover 10-DEC-2007 and all
> the other possible variations on date that might occur, unless we greatly
> expand the set of allowed identifying formats (ORA-US). If we do allow
> nonstandard formats, do the formats then mean anything? ORA-US to me means
> Oracle date format, US, but might not mean that to someone else. I would
> vote for patterns such as YYYY-MM-DD in date specifications rather than a name.
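A minimal sketch of the pattern-based approach, in Python with only the standard library: a hypothetical token table (YYYY, MON, MM, DD) is translated into `datetime.strptime` directives, so one declared pattern covers both the ISO form and an Oracle-style 10-DEC-2007, and the output is always the locale-neutral ISO 8601 YYYY-MM-DD. The token names and the `pattern_to_strptime` helper are assumptions for illustration, not a DDI or Oracle specification. Note that `%b` matches month abbreviations of the process's LC_TIME locale (English under the default C locale), so the language question does not fully disappear.

```python
import re
from datetime import date, datetime

# Hypothetical pattern tokens mapped to strptime directives.
# "MON" is listed before "MM" so the regex alternation tries it first.
TOKENS = {"YYYY": "%Y", "MON": "%b", "MM": "%m", "DD": "%d"}
TOKEN_RE = re.compile("|".join(TOKENS))  # matches YYYY|MON|MM|DD


def pattern_to_strptime(pattern: str) -> str:
    """Translate a pattern such as 'DD-MON-YYYY' into '%d-%b-%Y'."""
    return TOKEN_RE.sub(lambda m: TOKENS[m.group(0)], pattern)


def parse_with_pattern(text: str, pattern: str) -> date:
    """Parse text by a declared pattern; %b depends on the LC_TIME locale."""
    return datetime.strptime(text, pattern_to_strptime(pattern)).date()


# Both inputs describe 10 December 2007; ISO output is always YYYY-MM-DD.
print(parse_with_pattern("2007-12-10", "YYYY-MM-DD").isoformat())   # 2007-12-10
print(parse_with_pattern("10-DEC-2007", "DD-MON-YYYY").isoformat())  # 2007-12-10
```

Declaring the pattern rather than a name like ORA-US keeps the meaning self-describing: anyone reading DD-MON-YYYY can see the field order.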
>
> In general, I favor specific markup for specific cases rather than a general
> approach of localization. For money and currency, I would simply prefer
> @unitOfCurrency, @decimalDelimiter, and @thousandsDelimiter to solve the
> problem rather than a more general localization approach, if for no
> other reason than that the country of a currency is no longer sufficient to
> specify whether the currency is in marks or euros.
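To make the explicit-attribute proposal concrete, here is a minimal Python sketch that parses an amount from the declared delimiters alone, with no locale lookup. The parameter names only mirror the proposed @decimalDelimiter and @thousandsDelimiter attributes; the `parse_amount` function itself is hypothetical. The currency unit would travel separately, as @unitOfCurrency suggests, rather than being inferred from a country.

```python
from decimal import Decimal


def parse_amount(text: str, decimal_delimiter: str, thousands_delimiter: str) -> Decimal:
    """Parse a number using explicitly declared delimiters (hypothetical helper)."""
    # Remove grouping marks, then normalise the decimal mark to ".".
    cleaned = text.strip().replace(thousands_delimiter, "")
    if decimal_delimiter != ".":
        cleaned = cleaned.replace(decimal_delimiter, ".")
    return Decimal(cleaned)


# The same value written under two conventions.
print(parse_amount("1.234.567,89", decimal_delimiter=",", thousands_delimiter="."))  # 1234567.89
print(parse_amount("1,234,567.89", decimal_delimiter=".", thousands_delimiter=","))  # 1234567.89
```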
>
> The other reason I don't favor the localization approach is that for the
> data format concern, I see date, number, and currency as the only issues.
> The other items on the list at
> http://en.wikipedia.org/wiki/Internationalization_and_localization are all
> already covered.
>
> On Dec 12, 2007 7:42 AM, Joachim Wackerow <joachim.wackerow at gesis.org>
> wrote:
>
>> Looking at the SAS formats I realized that we would need additional
>> information like locale and/or language for specific formats.
>>
>> For example, some date formats, like a string representation of "day in
>> the week", assume strings like "SUN" or "MON" in the data file. This
>> can be represented by a generic format, but additionally a definition of
>> the language used would be necessary.
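A minimal Python sketch of why the language must be declared: the same generic "abbreviated weekday" format needs a different vocabulary per language. The tables and the `weekday_index` helper are hypothetical; the German abbreviations (Mo, Di, Mi, Do, Fr, Sa, So) are the usual ones, but real data files may use other spellings. Treated this way, the values behave much like a code list keyed by language.

```python
# Hypothetical per-language tables of abbreviated weekday names.
# Position 0 = Monday ... 6 = Sunday, matching datetime.date.weekday().
WEEKDAY_ABBREVIATIONS = {
    "en": ["MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"],
    "de": ["MO", "DI", "MI", "DO", "FR", "SA", "SO"],
}


def weekday_index(token: str, language: str) -> int:
    """Map a weekday token to 0..6; raises ValueError if the token is unknown."""
    return WEEKDAY_ABBREVIATIONS[language].index(token.strip().upper())


print(weekday_index("SUN", "en"))  # 6
print(weekday_index("SO", "de"))   # 6  (Sonntag)
print(weekday_index("DO", "de"))   # 3  (Donnerstag, i.e. Thursday)
```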
>>
>> Similarly with dates like 10.12.2007 (in Germany, ISO format 2007-12-10;
>> in the USA, ISO format 2007-10-12): using a generic format, additional
>> information about the locale would be necessary. The alternative would
>> be to have a specific format definition for each variation. But then the
>> information that the format is locale-dependent is lost.
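The German/US example can be sketched in Python as follows. The locale-to-pattern table is hypothetical and deliberately tiny (real locale data, such as CLDR, is much richer), and it assumes, as the example does, that a US file writes month.day.year with dots. The point is that the same string yields two different ISO dates unless the locale is recorded next to the format.

```python
from datetime import datetime

# Hypothetical locale -> strptime pattern for dotted numeric dates.
DOTTED_DATE_PATTERNS = {
    "de_DE": "%d.%m.%Y",  # day.month.year
    "en_US": "%m.%d.%Y",  # month.day.year (assumed, per the example)
}


def to_iso(text: str, locale_name: str) -> str:
    """Interpret a dotted date under the given locale and return ISO 8601."""
    pattern = DOTTED_DATE_PATTERNS[locale_name]
    return datetime.strptime(text, pattern).date().isoformat()


print(to_iso("10.12.2007", "de_DE"))  # 2007-12-10
print(to_iso("10.12.2007", "en_US"))  # 2007-10-12
```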
>>
>> Reading numeric or monetary values with embedded grouping (or thousands)
>> separators and decimal separators is another candidate for localization.
>> We already have explicit elements for decimal and grouping separators,
>> but an alternative would be to use a generic numeric format with a
>> locale.
>>
>> The locale and language information should stay at the same place where
>> the data format is defined. Both can be seen as attributes of data format.
>>
>> In general I think both ways can make sense: definition of a specific
>> format by a name (for a related type) and definition of a generic format
>> with attributes like decimal separator.
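One way to hold both styles side by side is sketched below in Python: a single format description that may carry a name, explicit separators, or a locale, resolved in that (assumed) order. The table entries (`NUMERIC_COMMA_DECIMAL`, `NUMERIC_POINT_DECIMAL`) and the precedence rule are hypothetical illustrations, not existing DDI, SAS, or SPSS definitions. Keeping the locale as an attribute of the format, as suggested above, means the fact that a format is locale-dependent is not lost.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical tables; each value is (decimal separator, grouping separator).
NAMED_FORMATS: Dict[str, Tuple[str, str]] = {
    "NUMERIC_COMMA_DECIMAL": (",", "."),
    "NUMERIC_POINT_DECIMAL": (".", ","),
}
LOCALE_SEPARATORS: Dict[str, Tuple[str, str]] = {
    "de_DE": (",", "."),
    "en_US": (".", ","),
}


@dataclass
class NumericFormat:
    """A data format given by name, by explicit separators, or by locale."""
    name: Optional[str] = None
    decimal_separator: Optional[str] = None
    grouping_separator: Optional[str] = None
    locale: Optional[str] = None

    def separators(self) -> Tuple[str, str]:
        """Resolve (decimal, grouping): name first, then explicit values, then locale."""
        if self.name is not None:
            return NAMED_FORMATS[self.name]
        if self.decimal_separator is not None and self.grouping_separator is not None:
            return self.decimal_separator, self.grouping_separator
        if self.locale is not None:
            return LOCALE_SEPARATORS[self.locale]
        raise ValueError("format needs a name, both separators, or a locale")


# Three ways of describing the same German-style numeric format.
print(NumericFormat(name="NUMERIC_COMMA_DECIMAL").separators())                    # (',', '.')
print(NumericFormat(decimal_separator=",", grouping_separator=".").separators())  # (',', '.')
print(NumericFormat(locale="de_DE").separators())                                 # (',', '.')
```

The cost Achim mentions shows up directly: every consumer of this description has to implement the resolution order.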
>>
>> SPSS has no NLS support; SAS has NLS support but also old-style fixed
>> definitions; SQL also has both. When both ways of defining formats are
>> available, the work of describing the formats seems to be easier, but the
>> mapping table and the applications using it become more complicated.
>> Still, doing formats without NLS seems to be a bad choice.
>>
>> Achim
>
> --
> I-Lin Kuo
>
Wendy L. Thomas Phone: +1 612.624.4389
Data Access Core Director Fax: +1 612.626.8375
Minnesota Population Center Email: wlt at pop.umn.edu
University of Minnesota
50 Willey Hall
225 19th Avenue South
Minneapolis, MN 55455