[DDI-SRG] Data formats: locale and language
Wendy Thomas
wlt at pop.umn.edu
Wed Dec 12 10:50:51 EST 2007
Thanks Achim
The case you are stating is much clearer now. I think the examples will
help clarify what changes are needed and where they are needed.
Wendy
On Wed, 12 Dec 2007, Joachim Wackerow wrote:
> I-Lin,
>
> As Wendy pointed out, the requirement is to describe data in data files,
> where we have no control over the representation used; see my email on
> data types/data format from last week.
>
> I think it is not a question of going one way (detailed description) or
> another (NLS support for generic description). My suggestion is to
> provide both ways.
>
> Regarding the small example there is probably a misunderstanding:
> 10.12.2007 is understood in Germany as a date corresponding to the ISO
> format 2007-12-10.
> 10.12.2007 can be understood in the USA as a date corresponding to the ISO
> format 2007-10-12, because dates are often written there in the order
> month/day/year.
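A minimal sketch of the ambiguity described above, assuming Python's standard datetime module. The two format strings are illustrative choices for a "German" and a "US" reading; they are not part of any DDI proposal:

```python
from datetime import datetime

raw = "10.12.2007"

# Day-first reading (the German convention described above).
german = datetime.strptime(raw, "%d.%m.%Y").date()

# Month-first reading (the US convention described above).
us = datetime.strptime(raw, "%m.%d.%Y").date()

print(german.isoformat())  # 2007-12-10
print(us.isoformat())      # 2007-10-12
```

The same string yields two different ISO dates, so the stored value cannot be interpreted without either an explicit pattern or a locale.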
>
> Indeed these are probably very specific cases, but we want to cover
> these as well.
>
> Wendy:
> I'm not sure if things like SUN MON should be described on the logical
> level. You are right, SUN can be understood as a code, but to be able to
> make computations with it, this code must be converted by an
> appropriate date format into a numeric representation, which is done
> in statistical packages or database systems. The example with a
> representation for only the week day is probably poor. Imagine a
> representation year-week-weekday; this can be converted into a numeric
> date for computation. I'm not sure if the definition at the logical
> level would be sufficient for that.
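A rough sketch of the year-week-weekday conversion mentioned above, assuming ISO week numbering (Monday = 1) and Python 3.8+ for date.fromisocalendar. The WEEKDAY_CODES table and both helper names are hypothetical, made up for illustration:

```python
from datetime import date

# Hypothetical lookup tables: weekday abbreviation -> ISO weekday number.
WEEKDAY_CODES = {
    "en": {"MON": 1, "TUE": 2, "WED": 3, "THU": 4, "FRI": 5, "SAT": 6, "SUN": 7},
    "de": {"MO": 1, "DI": 2, "MI": 3, "DO": 4, "FR": 5, "SA": 6, "SO": 7},
}

def weekday_number(code, language):
    """Translate a weekday string into a number, given its language."""
    return WEEKDAY_CODES[language][code.strip().upper()]

def to_date(year, week, weekday_code, language):
    """Convert a year-week-weekday triple into a calendar date."""
    return date.fromisocalendar(year, week, weekday_number(weekday_code, language))

print(to_date(2007, 50, "WED", "en"))  # 2007-12-12
print(to_date(2007, 50, "MI", "de"))   # 2007-12-12
```

Without the language, "MI" or "SO" could not be mapped to a number, which is the point about needing the language alongside the generic format.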
>
>
>
> In general I have the impression that data formats should be describable
> by their characteristics but also by NLS attributes. This point I didn't
> mention last week.
>
> Which date formats are common in legacy or current data files? Is the
> use of date/time variables with specific representations common?
>
> The examples approach is reasonable. I can try some, but I'm not sure
> if this will happen this week.
>
> Achim
>
> Wendy Thomas wrote:
>> I-Lin
>>
>> I think Achim is referring to describing the data as it is contained in
>> the data file (over which we have no entry control). While the cases he
>> discusses are rare, they are definitely a problem with both legacy data
>> and with data created by others than large scale organizations
>> (creativity reigns supreme).
>>
>> Some of these seem to be storage related issues, but some, like the SUN
>> MON type of information, seem more related to the Variable description
>> in logical. The question there is, is Sunday Monday etc a
>> DateTimeRepresentation or a CodeRepresentation? Does this get converted
>> as it goes into a specific storage format regardless of the user's
>> creation of, say, an alternate variable with coding based on the original
>> variable?
>>
>> I think our use of dates within a DDI instance is covered. The question
>> is, are the requested changes for an expansion or change of the
>> DateTimeRepresentation (making sure something is stamped as a specific
>> representation type rather than a generic category, code, numeric, or
>> text response domain)? Or is the request to expand or change the
>> representation of specifics to a physical store?
>>
>> Can we get some walk through examples of where the problem lies?
>>
>> Wendy
>>
>>
>>
>>
>>
>> On Wed, 12 Dec 2007, I-Lin Kuo wrote:
>>
>>> Hi Joachim,
>>>
>>> While I understand the intent, I'm not sure that localization is
>>> sufficient or the right solution.
>>>
>>> First, the ISO date + locale example is not correct. ISO 8601 is locale
>>> neutral, and time elements are arranged in descending order. If 12
>>> represents the month, then the US example 2007-10-12 is not an ISO date.
>>> Secondly, the DateFormatStandardName + locale (=ISO + US) scheme of
>>> identifying formats is not expressive enough to cover 10-DEC-2007 and all
>>> the other possible variations on date that might occur, unless we greatly
>>> expand the set of allowed identifying formats (ORA-US). If we do allow
>>> nonstandard formats, do the formats then mean anything? ORA-US to me
>>> means Oracle date format, US, but might not mean that to someone else.
>>> I would vote for YYYY-MM-DD for date specifications rather than a name.
>>>
>>> In general, I favor specific markup for specific cases rather than a
>>> general approach of localization. For money and currency, I would simply
>>> prefer @unitOfCurrency and @decimalDelimiter @thousandsDelimiter to solve
>>> the problem rather than a more general localization approach. This may be
>>> for no other reason than that the country of currency is no longer
>>> sufficient to specify whether the currency is in marks or euros.
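To illustrate the explicit-attribute approach, here is a small sketch in Python. The function and parameter names are hypothetical and merely echo the @decimalDelimiter and @thousandsDelimiter attributes named above; this is not an existing DDI API:

```python
from decimal import Decimal

def parse_number(text, decimal_delimiter=".", thousands_delimiter=","):
    """Parse a numeric string using explicitly declared delimiters."""
    # Drop the grouping separator first, then normalize the decimal mark.
    cleaned = text.replace(thousands_delimiter, "")
    cleaned = cleaned.replace(decimal_delimiter, ".")
    return Decimal(cleaned)

# The same value written with German-style and US-style separators.
print(parse_number("1.234,56", decimal_delimiter=",", thousands_delimiter="."))  # 1234.56
print(parse_number("1,234.56", decimal_delimiter=".", thousands_delimiter=","))  # 1234.56
```

With the delimiters stated explicitly, no locale lookup is needed to read the value; the unit of currency would still have to be recorded separately.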
>>>
>>> The other reason I don't favor the localization approach is that for the
>>> data format concern, I see date, number, and currency as the only issues.
>>> The other items on the list at
>>> http://en.wikipedia.org/wiki/Internationalization_and_localization are
>>> all already covered.
>>>
>>> On Dec 12, 2007 7:42 AM, Joachim Wackerow <joachim.wackerow at gesis.org>
>>> wrote:
>>>
>>>> Looking at the SAS formats I realized that we would need additional
>>>> information like locale and/or language for specific formats.
>>>>
>>>> For example, some date formats use a string representation of "day in
>>>> the week", with strings like "SUN" or "MON" in the data file. This
>>>> can be represented by a generic format, but additionally a definition of
>>>> the language used would be necessary.
>>>>
>>>> Similarly with dates like 10.12.2007 (in Germany in ISO format 2007-12-10,
>>>> in the USA in ISO format 2007-10-12): using a generic format, additional
>>>> information about the locale would be necessary. The alternative would
>>>> be to have a specific format definition for each variation. But then the
>>>> information that the format is locale dependent is lost.
>>>>
>>>> Reading numeric or monetary values with embedded grouping (or thousands)
>>>> separator and decimal separator is another candidate for localization.
>>>> We already have explicit elements for decimal and grouping separators.
>>>> But an alternative way would be to use a generic numeric format with a
>>>> locale.
>>>>
>>>> The locale and language information should stay at the same place where
>>>> the data format is defined. Both can be seen as attributes of data
>>>> format.
>>>>
>>>> In general I think both ways can make sense: definition of a specific
>>>> format by a name (for a related type) and definition of a generic format
>>>> with attributes like decimal separator.
>>>>
>>>> SPSS has no NLS support; SAS has NLS support, but also old style fixed
>>>> definitions; SQL also has both. When both ways of definition are
>>>> available, the work of describing the formats seems to be easier. The
>>>> mapping table and the applications using the mapping table become
>>>> more complicated. But doing formats without NLS seems to be a bad choice.
>>>>
>>>> Achim
>>>> _______________________________________________
>>>> DDI-SRG mailing list
>>>> DDI-SRG at icpsr.umich.edu
>>>> http://www.icpsr.umich.edu/mailman/listinfo/ddi-srg
>>>>
>>>
>>>
>>>
>>> --
>>> I-Lin Kuo
>>>
>>
>> Wendy L. Thomas Phone: +1 612.624.4389
>> Data Access Core Director Fax: +1 612.626.8375
>> Minnesota Population Center Email: wlt at pop.umn.edu
>> University of Minnesota
>> 50 Willey Hall
>> 225 19th Avenue South
>> Minneapolis, MN 55455
>
>
> --
> GESIS - German Social Science Infrastructure Services
> http://www.gesis.org/en/
>
Wendy L. Thomas Phone: +1 612.624.4389
Data Access Core Director Fax: +1 612.626.8375
Minnesota Population Center Email: wlt at pop.umn.edu
University of Minnesota
50 Willey Hall
225 19th Avenue South
Minneapolis, MN 55455