[DDI-SRG] In-line data / defined range of values / data display format
Wendy Thomas
wlt at pop.umn.edu
Wed Aug 15 10:12:22 EDT 2007
I agree to an extent about the measurement unit, except that often we are
trying to replicate the terminology of legacy data. I will check on this
further, as I think we handle it differently in NCubes and I want to be
sure. If this allows for clearer, programmable specification, that should
be the goal. However, in general, a measurement unit can be absolutely
anything. In effect, a measurement unit has both a structural and a content
component. It is a percent of something, and the something is what changes.
I still have to get the stuff out on data types in all their glorious
mutations and will add this question to the mix.
wendy
On Wed, 15 Aug 2007, Joachim Wackerow wrote:
> Thanks for the quick answer. Some comments are below in the text.
>
> Wendy Thomas wrote:
>> On Wed, 15 Aug 2007, Joachim Wackerow wrote:
>>
>>> In the discussion with Larry Hoyle about a SAS converter I noticed
>>> several things. Now I have three questions:
>>>
>>>
>>> DataSet, in-line data
>>>
>>> Both transpositions of a data matrix are possible to represent in
>>> DataSet. A rectangular data file has normally the variables as columns
>>> and the cases as rows (or records). A similar representation is possible
>>> in DataSet, but then for every data item a VariableReference is
>>> necessary. The markup gets a bit wordy. For a transposed version of the
>>> data matrix only one VariableReference per variable is necessary and the
>>> Value element is repeated.
>>>
>>> For an application it would be helpful to differentiate between both
>>> cases. That should be indicated in DataSet. Otherwise the application
>>> has to check the chosen representation. This can be error prone. Both
>>> approaches can be used in one representation, which wouldn't make sense.
>>>
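The two layouts described above can be sketched in Python. This is only an illustration with plain lists and dicts, not DDI markup; the variable names "V1" and "V2", the sample values, and both helper functions are hypothetical stand-ins for VariableReference and Value.

```python
# Hypothetical in-memory stand-ins for the two layouts; not DDI markup.

# Case-wise layout: every data item carries its own variable reference.
case_wise = [
    [("V1", 1), ("V2", 23.4)],
    [("V1", 2), ("V2", 31.0)],
]

# Variable-wise (transposed) layout: one reference per variable,
# followed by the repeated values.
variable_wise = {
    "V1": [1, 2],
    "V2": [23.4, 31.0],
}


def to_variable_wise(records):
    """Transpose case-wise records into one value list per variable."""
    columns = {}
    for record in records:
        for var_ref, value in record:
            columns.setdefault(var_ref, []).append(value)
    return columns


def to_case_wise(columns):
    """Transpose per-variable value lists back into case-wise records."""
    var_refs = list(columns)
    n_cases = len(columns[var_refs[0]]) if var_refs else 0
    return [[(v, columns[v][i]) for v in var_refs] for i in range(n_cases)]
```

Both layouts carry the same information, so an application that is told which one to expect can convert freely; without such an indicator it would have to inspect the structure first.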
>>
>> Arofan pretty much borrowed this from something else and so a discussion
>> should wait for his input (after this week).
> OK. This should be clarified. Unless I am overlooking something, this is a
> candidate for filing a bug.
>
>>
>>>
>>> Defined range of values, CodeScheme / CategoryScheme
>>>
>>> Variables with interval or ratio measurement can have ranges of data
>>> with different code values but same category labels.
>>>
>>> Example:
>>> BMI
>>> low-<18.5 = "Underweight"
>>> 18.5-24.9 = "Normal weight"
>>> 25-29.9 = "Overweight"
>>> 30-high = "Obesity"
>>>
>>> For such a variable it would make sense to define ranges of values
>>> associated with the same category. A derived variable with a related
>>> recode would then not be necessary.
>>>
>>> As far as I understand, we have no way to represent this approach.
>>>
>>> A solution would be in Code of CodeScheme to have Range as a choice for
>>> Value.
>>
>> I am uncomfortable using "ranges" as a code scheme. This is, in effect,
>> an unrealized recode, simply providing the definitions of these terms in
>> relation to the ranges. In analysing the data it is designed to use, in
>> this case, the actual BMI. If I were using it as a category "obese" etc
>> I would recode this information either on the fly or by creating a new
>> variable (or overwriting the BMI variable). I would make a separate
>> scheme of virtual recodes and create a variable without a physical
>> representation.
> In SAS this kind of flexible labeling is possible. For one variable
> different "views" are possible. It is realized by so-called user-defined
> formats. They can be associated permanently (only one) or dynamically
> for one procedure run.
>
> I checked it now. You are completely right: it is indeed a virtual recode
> that is made on the fly when running the procedure. So it should be
> defined in DDI as a new variable which is derived and defined in the way
> you are proposing.
>
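The virtual recode discussed above might look like this in Python. It is a rough sketch, not SAS format syntax or DDI markup. It assumes half-open intervals (for example, 18.5 up to but not including 25), which closes the small gaps in the ranges as listed; the names BMI_CUTS, BMI_LABELS, and bmi_category are hypothetical.

```python
from bisect import bisect_right

# Lower bounds of the second, third, and fourth categories.
# Assumption: intervals are half-open, e.g. 18.5 <= bmi < 25.0.
BMI_CUTS = [18.5, 25.0, 30.0]
BMI_LABELS = ["Underweight", "Normal weight", "Overweight", "Obesity"]


def bmi_category(bmi):
    """Return the category label for a BMI value without changing the data."""
    return BMI_LABELS[bisect_right(BMI_CUTS, bmi)]


# The stored variable keeps the actual BMI values.
bmi_values = [17.0, 22.3, 27.5, 31.2]

# The derived variable is computed on the fly and has no physical
# representation of its own.
bmi_labels = [bmi_category(b) for b in bmi_values]
```

The original BMI variable stays untouched for analysis, while the category labels exist only as a derived view.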
>>
>>>
>>>
>>> Data Display Format
>>>
>>> A need for a display format exists for example for variables whose
>>> values are proportions (percent). Another example would be currency. For
>>> these types of variables no CodeScheme and no CategoryScheme is necessary.
>>> How can we define this? It seems to be an attribute of the variable
>>> itself in LogicalProduct. Is this also a missing feature?
>>> We have a data format in PhysicalInstance. But that is the data format
>>> of the data itself.
>>>
>>
>> Basically, the short answer is DDI doesn't deal with "display". What you
>> want is a data type that indicates Percent, Dollar, Euro etc etc etc. So
>> I'd have an integer with 2 decimal places that is a US Currency.
>> Actually this information is there in Measurement Unit. This is a string
>> field because it can be "housing units" "percent of tepees", basically
>> anything. There is/was also aggregation type which is currently missing
>> percent (long story..won't go into it because I can't help getting catty
>> about it).
> "Display" is somewhat misleading here. It is not only a display format
> but a characteristic of the variable. An application has a
> chance to choose the appropriate format if the data type is string or
> numeric, but not in the cases mentioned above.
>
> I overlooked MeasurementUnit; this is really the right place. But it should
> perhaps have an extensible controlled vocabulary, not only a string. A string
> is too loose for an application to choose an appropriate format.
>
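One way to picture the suggestion above is a small controlled vocabulary with a free-text fallback, sketched here in Python. The vocabulary entries ("percent", "USD", "EUR"), the UnitKind enum, and the output formats are all assumptions made for illustration; none of them come from DDI.

```python
from enum import Enum


class UnitKind(Enum):
    """Hypothetical controlled vocabulary for measurement units."""
    PERCENT = "percent"
    US_DOLLAR = "USD"
    EURO = "EUR"
    OTHER = "other"  # anything outside the vocabulary, e.g. "housing units"


def parse_unit(text):
    """Map a measurement-unit string onto the vocabulary, if possible."""
    try:
        return UnitKind(text)
    except ValueError:
        return UnitKind.OTHER


def format_value(value, unit_text):
    """Choose a display format from the unit; fall back to the raw string."""
    kind = parse_unit(unit_text)
    if kind is UnitKind.PERCENT:
        return f"{value:.1f}%"
    if kind is UnitKind.US_DOLLAR:
        return f"${value:,.2f}"
    if kind is UnitKind.EURO:
        return f"{value:,.2f} EUR"
    return f"{value} {unit_text}"
```

Known terms give the application something to act on, while unknown strings still pass through unchanged, so legacy terminology is not lost.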
> Achim
> _______________________________________________
> DDI-SRG mailing list
> DDI-SRG at icpsr.umich.edu
> http://www.icpsr.umich.edu/mailman/listinfo/ddi-srg
>
Wendy L. Thomas Phone: +1 612.624.4389
Data Access Core Director Fax: +1 612.626.8375
Minnesota Population Center Email: wlt at pop.umn.edu
University of Minnesota
50 Willey Hall
225 19th Avenue South
Minneapolis, MN 55455