[DDI-SRG] Data formats: locale and language
Pascal Heus
pascal.heus at gmail.com
Wed Dec 12 12:15:44 EST 2007
Wendy:
I assume this will all take place at the format level in the
PhysicalDataProduct. I agree that we may have a language issue when it
comes to alphanumeric days or months (like JAN,FEV,MAR,AVR or
lundi,mardi,mercredi in French...). Could we have a Format = something like
DD-MMM-YYYY with an extra xml:lang="FR"?
A more formal option would be to have a mechanism to declare enumerations
for the date type (and possibly currencies). Maybe something like:
<DateFormat expression="MMM" type="month" lang="FR">
<Jan>JAN</Jan><Feb>FEV</Feb>....
</DateFormat>
This could be reusable, though, so it could also be stored elsewhere and
referenced (in a translation section?).
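To make the idea concrete, here is a rough sketch (in Python, not a proposal for the schema) of how an application might use such a month enumeration to read a DD-MMM-YYYY value. The table name, the function name, and the French abbreviations beyond JAN/FEV/MAR/AVR are my own assumptions for illustration only.

```python
from datetime import date

# Hypothetical month enumerations keyed by language code. Only
# JAN/FEV/MAR/AVR come from the discussion; the rest are assumed.
MONTH_ENUMERATIONS = {
    "FR": ["JAN", "FEV", "MAR", "AVR", "MAI", "JUN",
           "JUL", "AOU", "SEP", "OCT", "NOV", "DEC"],
    "EN": ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
           "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"],
}


def parse_dd_mmm_yyyy(value, lang):
    """Parse a DD-MMM-YYYY string using the month enumeration for lang."""
    day_text, month_text, year_text = value.split("-")
    months = MONTH_ENUMERATIONS[lang.upper()]
    # list.index raises ValueError if the abbreviation is not declared
    month = months.index(month_text.upper()) + 1
    return date(int(year_text), month, int(day_text))


print(parse_dd_mmm_yyyy("10-FEV-2007", "FR").isoformat())  # 2007-02-10
```

The same string read with the "EN" enumeration fails with a ValueError, which is the point: the format expression alone is not enough without the language.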
For separators (comma/dot), I think we already have that in the
PhysicalDataProduct.
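For what it is worth, explicitly declared separators are enough for an application to read such values without knowing the locale. A minimal sketch in Python, where the function and parameter names are hypothetical and not DDI element names:

```python
from decimal import Decimal


def parse_number(text, decimal_separator=".", grouping_separator=","):
    """Convert text to a Decimal given explicitly declared separators."""
    # Drop the grouping (thousands) separator first, if one is declared
    if grouping_separator:
        text = text.replace(grouping_separator, "")
    # Normalize the declared decimal separator to the dot Decimal expects
    if decimal_separator != ".":
        text = text.replace(decimal_separator, ".")
    return Decimal(text)


print(parse_number("1.234,56", decimal_separator=",", grouping_separator="."))  # 1234.56
print(parse_number("1,234.56"))  # 1234.56
```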
later,
*P
Wendy Thomas wrote:
> Thanks Achim
>
> The case you are stating is much clearer now. I think the examples will
> help clarify what changes are needed and where they are needed.
>
> Wendy
>
> On Wed, 12 Dec 2007, Joachim Wackerow wrote:
>
>
>> I-Lin,
>>
>> As Wendy pointed out the requirement is to describe data in data files,
>> where we have no control over the used representation, see my email on
>> data types/data format from last week.
>>
>> I think it is not a question of going one way (detailed description) or
>> another (NLS support for generic description). My suggestion is to
>> provide both ways.
>>
>> Regarding the small example there is probably a misunderstanding:
>> 10.12.2007 is understood in Germany as a date corresponding to the ISO
>> format 2007-12-10.
>> 10.12.2007 can be understood in USA as a date corresponding to the ISO
>> format 2007-10-12, because dates are often written in this order
>> month/day/year.
>>
>> Indeed these are probably very specific cases, but we want to cover
>> these as well.
>>
>> Wendy:
>> I'm not sure if things like SUN MON should be described on the logical
>> level. You are right, SUN can be understood as a code, but to be able to
>> make computations with it, this code must be converted by an
>> appropriate date format into a numeric representation, which is done
>> in statistical packages or database systems. The example with a
>> representation for only the week day is probably poor. Imagine a
>> representation year-week-weekday; this can be converted into a numeric
>> date for computation. I'm not sure if the definition at the logical
>> level would be sufficient for that.
>>
>>
>>
>> In general I have the impression that data formats should be describable
>> by their characteristics but also by NLS attributes. This point I didn't
>> mention last week.
>>
>> Which date formats are common in legacy or current data files? Is the
>> use of date/time variables with specific representations common?
>>
>> The examples approach is reasonable. I can try some, but I'm not sure
>> if this will happen this week.
>>
>> Achim
>>
>> Wendy Thomas wrote:
>>
>>> I-Lin
>>>
>>> I think Achim is referring to describing the data as it is contained in
>>> the data file (over which we have no entry control). While the cases he
>>> discusses are rare, they are definitely a problem with both legacy data
>>> and with data created by others than large scale organizations
>>> (creativity reigns supreme).
>>>
>>> Some of these seem to be storage related issues, but some, like the SUN
>>> MON type of information, seem more related to the Variable description
>>> in logical. The question there is, is Sunday Monday etc a
>>> DateTimeRepresentation or a CodeRepresentation? Does this get converted
>>> as it goes into a specific storage format regardless of the user's
>>> creation of, say, an alternate variable with coding based on the original
>>> variable?
>>>
>>> I think our use of dates within a DDI instance is covered. The question
>>> is: are the requested changes for an expansion or change of the
>>> DateTimeRepresentation (making sure something is stamped as a specific
>>> representation type rather than a generic category, code, numeric, or
>>> text response domain)? Or is the request to expand or change the
>>> representation of specifics to a physical store?
>>>
>>> Can we get some walk through examples of where the problem lies?
>>>
>>> Wendy
>>>
>>>
>>>
>>>
>>>
>>> On Wed, 12 Dec 2007, I-Lin Kuo wrote:
>>>
>>>
>>>> Hi Joachim,
>>>>
>>>> While I understand the intent, I'm not sure that localization is
>>>> sufficient or the right solution.
>>>>
>>>> First, the ISO date + locale example is not correct. ISO 8601 is locale
>>>> neutral, and time elements are arranged in descending order. If 12
>>>> represents the month, then the US example 2007-10-12 is not an ISO date.
>>>> Secondly, the DateFormatStandardName + locale (=ISO + US) scheme of
>>>> identifying formats is not expressive enough to cover 10-DEC-2007 and all
>>>> the other possible variations on date that might occur, unless we greatly
>>>> expand the set of allowed identifying formats (ORA-US). If we do allow
>>>> nonstandard formats, do the formats then mean anything? ORA-US to me
>>>> means Oracle date format, US, but might not mean that to someone else. I
>>>> would vote for YYYY-MM-DD for date specifications rather than a name.
>>>>
>>>> In general, I favor specific markup for specific cases rather than a
>>>> general approach of localization. For money and currency, I would simply
>>>> prefer @unitOfCurrency and @decimalDelimiter @thousandsDelimiter to solve
>>>> the problem rather than a more general localization approach. This may be
>>>> for no other reason than that the country of currency is no longer
>>>> sufficient to specify whether the currency is in marks or euros.
>>>>
>>>> The other reason I don't favor the localization approach is that for the
>>>> data format concern, I see date, number, and currency as the only issues.
>>>> The other items on the list at
>>>> http://en.wikipedia.org/wiki/Internationalization_and_localization are
>>>> all already covered.
>>>>
>>>> On Dec 12, 2007 7:42 AM, Joachim Wackerow <joachim.wackerow at gesis.org>
>>>> wrote:
>>>>
>>>>
>>>>> Looking at the SAS formats I realized that we would need additional
>>>>> information like locale and/or language for specific formats.
>>>>>
>>>>> For example, some date formats use a string representation of "day in
>>>>> the week". Assume strings like "SUN" or "MON" in the data file. This
>>>>> can be represented by a generic format, but additionally a definition of
>>>>> the language used would be necessary.
>>>>>
>>>>> Similarly with dates like 10.12.2007 (in Germany in ISO format 2007-12-10,
>>>>> in USA in ISO format 2007-10-12); using a generic format, additional
>>>>> information about the locale would be necessary. The alternative would
>>>>> be to have a specific format definition for each variation. But then the
>>>>> information that the format is locale dependent is lost.
>>>>>
>>>>> Reading numeric or monetary values with embedded grouping (or thousands)
>>>>> separator and decimal separator is another candidate for localization.
>>>>> We already have explicit elements for decimal and grouping separators.
>>>>> But an alternative way would be to use a generic numeric format with a
>>>>> locale.
>>>>>
>>>>> The locale and language information should stay at the same place where
>>>>> the data format is defined. Both can be seen as attributes of data
>>>>> format.
>>>>>
>>>>> In general I think both ways can make sense: definition of a specific
>>>>> format by a name (for a related type) and definition of a generic format
>>>>> with attributes like decimal separator.
>>>>>
>>>>> SPSS has no NLS support; SAS has NLS support but also old style fixed
>>>>> definitions; SQL also has both. When both ways of definition are
>>>>> available, the work of describing the formats seems to be easier. The
>>>>> mapping table and the applications using the mapping table get
>>>>> more complicated. But doing formats without NLS seems to be a bad choice.
>>>>>
>>>>> Achim
>>>>> _______________________________________________
>>>>> DDI-SRG mailing list
>>>>> DDI-SRG at icpsr.umich.edu
>>>>> http://www.icpsr.umich.edu/mailman/listinfo/ddi-srg
>>>>>
>>>>>
>>>>
>>>> --
>>>> I-Lin Kuo
>>>>
>>>>
>>> Wendy L. Thomas Phone: +1 612.624.4389
>>> Data Access Core Director Fax: +1 612.626.8375
>>> Minnesota Population Center Email: wlt at pop.umn.edu
>>> University of Minnesota
>>> 50 Willey Hall
>>> 225 19th Avenue South
>>> Minneapolis, MN 55455
>>>
>> --
>> GESIS - German Social Science Infrastructure Services
>> http://www.gesis.org/en/
>>
>>
>