[DDI-SRG] In-line data / defined range of values / data display format

Joachim Wackerow joachim.wackerow at gesis.org
Wed Aug 15 10:04:33 EDT 2007


Thanks for the quick answer. Some comments are below in the text.

Wendy Thomas wrote:
> On Wed, 15 Aug 2007, Joachim Wackerow wrote:
> 
>> In the discussion with Larry Hoyle about a SAS converter I noticed
>> several things. Now I have three questions:
>>
>>
>> DataSet, in-line data
>>
>> Both transpositions of a data matrix are possible to represent in
>> DataSet. A rectangular data file has normally the variables as columns
>> and the cases as rows (or records). A similar representation is possible
>> in DataSet, but then for every data item a VariableReference is
>> necessary. The markup gets a bit wordy. For a transposed version of the
>> data matrix only one VariableReference per variable is necessary and the
>> Value element is repeated.
>>
>> For an application it would be helpful to differentiate between both
>> cases. That should be indicated in DataSet. Otherwise the application
>> has to check the chosen representation. This can be error prone. Both
>> approaches can be used in one representation, which wouldn't make sense.
>>
> 
> Arofan pretty much borrowed this from something else and so a discussion 
> should wait for his input (after this week).
OK. This should be clarified. Unless I am overlooking something, this is a 
candidate for filing a bug.
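
To illustrate why an explicit indicator would help, here is a minimal 
sketch in Python. It is not DDI markup: the two lists are hypothetical 
in-memory stand-ins for the two in-line layouts, and the "orientation" 
argument is an assumed flag that DataSet does not currently provide.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-ins for in-line data, not DDI element structures.

# Case-wise layout: one record per case; every value carries its own
# variable reference.
case_wise: List[List[Tuple[str, float]]] = [
    [("age", 34), ("bmi", 22.5)],
    [("age", 51), ("bmi", 31.0)],
]

# Variable-wise (transposed) layout: one variable reference per variable,
# followed by the repeated values.
variable_wise: List[Tuple[str, List[float]]] = [
    ("age", [34, 51]),
    ("bmi", [22.5, 31.0]),
]


def columns_from_case_wise(records) -> Dict[str, list]:
    """Collect the values of each variable across all case records."""
    columns: Dict[str, list] = {}
    for record in records:
        for name, value in record:
            columns.setdefault(name, []).append(value)
    return columns


def columns_from_variable_wise(items) -> Dict[str, list]:
    """Each item already holds all values of one variable."""
    return {name: list(values) for name, values in items}


def load_columns(data, orientation: str) -> Dict[str, list]:
    """Dispatch on an explicit (assumed) orientation flag, so the reader
    does not have to guess the layout from the data itself."""
    if orientation == "case":
        return columns_from_case_wise(data)
    if orientation == "variable":
        return columns_from_variable_wise(data)
    raise ValueError(f"unknown orientation: {orientation!r}")
```

With the flag, both layouts load into the same column structure. Without 
it, the application would have to inspect the data to decide which reader 
to use.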

> 
>>
>> Defined range of values, CodeScheme / CategoryScheme
>>
>> Variables with interval or ratio measurement can have ranges of data
>> with different code values but same category labels.
>>
>> Example:
>> BMI
>>     low-<18.5 =  "Underweight"
>>     18.5-24.9 =  "Normal weight"
>>     25-29.9 =  "Overweight"
>>     30-high =  "Obesity"
>>
>> For such a variable it would make sense to define ranges of values
>> associated with the same category. A derived variable with a related
>> recode would not be necessary.
>>
>> As far as I understand, we have no way to represent this approach.
>>
>> A solution would be in Code of CodeScheme to have Range as a choice for
>> Value.
> 
> I am uncomfortable using "ranges" as a code scheme. This is, in effect, 
> an unrealized recode, simply providing the definitions of these terms in 
> relation to the ranges. In analysing the data it is designed to use, in 
> this case, the actual BMI. If I were using it as a category "obese" etc 
> I would recode this information either on the fly or by creating a new 
> variable (or overwriting the BMI variable). I would make a separate 
> scheme of virtual recodes and create a variable without a physical 
> representation.
In SAS this kind of flexible labeling is possible. Different "views" of 
one variable are possible. This is realized by so-called user-defined 
formats. They can be associated with a variable permanently (only one at 
a time) or dynamically for a single procedure run.

I have checked it now, and you are completely right: it is indeed a 
virtual recode that is performed on the fly when the procedure runs. So it 
should be defined in DDI as a new derived variable, in the way you are 
proposing.
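
As a rough sketch of what such an on-the-fly recode amounts to, here is 
the BMI example in Python (not SAS). The names are hypothetical, and the 
cut points assume that each range runs up to the next lower bound, so a 
value such as 24.95 falls into "Normal weight".

```python
from bisect import bisect_right

# Lower bounds of the second, third and fourth range, taken from the
# BMI example (assumption: ranges are contiguous, with no gaps).
BMI_BOUNDS = [18.5, 25.0, 30.0]
BMI_LABELS = ["Underweight", "Normal weight", "Overweight", "Obesity"]


def bmi_label(bmi: float) -> str:
    """Return the category label for a BMI value.

    bisect_right counts how many bounds are <= bmi, which is exactly the
    index of the matching label.
    """
    return BMI_LABELS[bisect_right(BMI_BOUNDS, bmi)]


# The stored variable keeps the actual BMI values ...
stored_bmi = [17.2, 18.5, 24.9, 27.0, 33.1]

# ... and the labels form a derived variable computed on the fly.
derived_category = [bmi_label(value) for value in stored_bmi]
```

The stored values stay untouched. The category exists only as a derived 
view, which matches the idea of a variable without a physical 
representation.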

> 
>>
>>
>> Data Display Format
>>
>> A need for a display format exists for example for variables whose
>> values are proportions (percent). Another example would be currency. For
>> these types of variables no CodeScheme and no CategoryScheme is necessary.
>> How can we define this? It seems to be an attribute of the variable
>> itself in LogicalProduct. Is this also a missing feature?
>> We have a data format in PhysicalInstance. But that is the data format
>> of the data itself.
>>
> 
> Basically, the short answer is DDI doesn't deal with "display". What you 
> want is a data type that indicates Percent, Dollar, Euro etc etc etc. So 
> I'd have an integer with 2 decimal places that is a US currency. 
> Actually this information is there in Measurement Unit. This is a string 
> field because it can be "housing units" "percent of tepees", basically 
> anything. There is/was also aggregation type which is currently missing 
> percent (long story..won't go into it because I can't help getting catty 
> about it).
"Display" is somewhat misleading here. It is not only a display format 
but a characteristic of the variable itself. An application can choose an 
appropriate format when the data type is string or numeric, but not for 
cases such as percent or currency.

I overlooked MeasurementUnit; this is really the right place. But it 
should perhaps have an extensible controlled vocabulary rather than only a 
string. A free string is too loose for an application to choose an 
appropriate format.
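
A minimal sketch of the difference, in Python: the vocabulary terms and 
the function name below are made up for illustration and are not part of 
DDI. The sketch also assumes that percent values are stored as percent, 
not as proportions.

```python
from enum import Enum


class MeasurementUnit(Enum):
    """Hypothetical controlled vocabulary; the terms are illustrative."""
    PERCENT = "percent"
    US_DOLLAR = "USD"
    EURO = "EUR"


def format_value(value: float, unit: str) -> str:
    """Choose a format from the unit if it is a known vocabulary term."""
    try:
        known = MeasurementUnit(unit)
    except ValueError:
        # Free-text unit such as "housing units": the application can
        # only echo the string, it cannot pick a specific format.
        return f"{value} {unit}"
    if known is MeasurementUnit.PERCENT:
        return f"{value:.1f}%"
    if known is MeasurementUnit.US_DOLLAR:
        return f"${value:,.2f}"
    return f"{value:,.2f} EUR"
```

Known terms let the application select a format reliably. Anything 
outside the vocabulary still works as a plain string, so the list could 
be extended over time.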

Achim

