[DDI-SRG] In-line data / defined range of values / data display format

Wendy Thomas wlt at pop.umn.edu
Wed Aug 15 09:30:58 EDT 2007


On Wed, 15 Aug 2007, Joachim Wackerow wrote:

> In a discussion with Larry Hoyle about a SAS converter, I noticed
> several things. Now I have three questions:
>
>
> DataSet, in-line data
>
> Both orientations (transpositions) of a data matrix can be represented
> in DataSet. A rectangular data file normally has the variables as
> columns and the cases as rows (or records). A similar representation is
> possible in DataSet, but then a VariableReference is needed for every
> data item, which makes the markup rather wordy. For a transposed version
> of the data matrix, only one VariableReference per variable is needed
> and the Value element is repeated.
>
> For an application it would be helpful to distinguish between the two
> cases, and that should be indicated in DataSet. Otherwise the application
> has to detect which representation was chosen, which can be error prone.
> Both approaches can also be mixed in one representation, which wouldn't
> make sense.
>

Arofan pretty much borrowed this from something else and so a discussion 
should wait for his input (after this week).
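
A minimal Python sketch of the two layouts Joachim describes, to make the trade-off concrete. Only VariableReference and Value come from the discussion; the DataSet/Case/DataItem nesting and the `orientation` attribute are hypothetical stand-ins (the attribute models his suggestion that DataSet should declare its layout) and are not taken from the DDI 3.0 schema. The reader normalizes both layouts to columns and rejects markup that mixes them.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: only VariableReference and Value are named in the
# thread; the DataSet/Case/DataItem nesting and "orientation" are illustrative.
CASE_WISE = """<DataSet orientation="case">
  <Case>
    <DataItem><VariableReference>age</VariableReference><Value>34</Value></DataItem>
    <DataItem><VariableReference>sex</VariableReference><Value>1</Value></DataItem>
  </Case>
  <Case>
    <DataItem><VariableReference>age</VariableReference><Value>51</Value></DataItem>
    <DataItem><VariableReference>sex</VariableReference><Value>2</Value></DataItem>
  </Case>
</DataSet>"""

VARIABLE_WISE = """<DataSet orientation="variable">
  <DataItem><VariableReference>age</VariableReference><Value>34</Value><Value>51</Value></DataItem>
  <DataItem><VariableReference>sex</VariableReference><Value>1</Value><Value>2</Value></DataItem>
</DataSet>"""


def read_columns(xml_text):
    """Normalize either layout to {variable: [value, ...]} in case order."""
    root = ET.fromstring(xml_text)
    orientation = root.get("orientation")  # hypothetical declared layout
    columns = {}
    if orientation == "case":
        # One VariableReference per data item: cases * variables references.
        if root.find("DataItem") is not None:
            raise ValueError("case-wise DataSet must not contain top-level DataItem elements")
        for case in root.findall("Case"):
            for item in case.findall("DataItem"):
                values = item.findall("Value")
                if len(values) != 1:
                    raise ValueError("case-wise DataItem must hold exactly one Value")
                var = item.findtext("VariableReference")
                columns.setdefault(var, []).append(values[0].text)
    elif orientation == "variable":
        # One VariableReference per variable; Value repeats once per case.
        if root.find("Case") is not None:
            raise ValueError("variable-wise DataSet must not contain Case elements")
        for item in root.findall("DataItem"):
            var = item.findtext("VariableReference")
            if var in columns:
                raise ValueError(f"variable {var!r} referenced more than once")
            columns[var] = [v.text for v in item.findall("Value")]
    else:
        raise ValueError(f"missing or unknown orientation: {orientation!r}")
    return columns


def count_references(xml_text):
    """Count VariableReference elements, a rough measure of markup size."""
    return len(ET.fromstring(xml_text).findall(".//VariableReference"))


print(read_columns(CASE_WISE) == read_columns(VARIABLE_WISE))  # True
print(count_references(CASE_WISE), count_references(VARIABLE_WISE))  # 4 2
```

The reference count grows with cases times variables in the first layout but only with the number of variables in the second, which is the wordiness mentioned above.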

>
> Defined range of values, CodeScheme / CategoryScheme
>
> Variables measured on an interval or ratio scale can have ranges of
> distinct values that share the same category label.
>
> Example:
> BMI
>     low-<18.5 =  "Underweight"
>     18.5-24.9 =  "Normal weight"
>     25-29.9 =  "Overweight"
>     30-high =  "Obesity"
>
> For such a variable it would make sense to define ranges of values
> associated with the same category. A derived variable with a related
> recode would then not be necessary.
>
> As I understand it, we currently have no way to represent this approach.
>
> One solution would be to allow Range as an alternative (choice) to
> Value in the Code element of CodeScheme.

I am uncomfortable using "ranges" as a code scheme. This is, in effect, an 
unrealized recode that simply provides the definitions of these terms in 
relation to the ranges. For analysis, the data is designed to be used, in 
this case, as the actual BMI. If I were using it as a category ("obese", 
etc.), I would recode this information either on the fly or by creating a 
new variable (or overwriting the BMI variable). I would make a separate 
scheme of virtual recodes and create a variable without a physical 
representation.
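
A minimal Python sketch of the virtual-recode alternative: BMI stays stored as a number, and the category labels exist only as a rule applied on the fly, i.e. a variable with no physical representation. The cut points come from the BMI example; reading them as half-open intervals [lower, upper) is an assumption made here so that values such as 24.95, which fall between "18.5-24.9" and "25-29.9" as written, still receive a category. `bmi_category` and `VirtualVariable` are hypothetical names, not DDI constructs.

```python
import bisect

# Cut points from the BMI example. Assumption: intervals are half-open,
# [lower, upper), so every real value falls into exactly one category.
BMI_BREAKS = [18.5, 25.0, 30.0]
BMI_LABELS = ["Underweight", "Normal weight", "Overweight", "Obesity"]


def bmi_category(bmi):
    """Map a numeric BMI to its label; bisect_right places 25.0 in 'Overweight'."""
    return BMI_LABELS[bisect.bisect_right(BMI_BREAKS, bmi)]


class VirtualVariable:
    """Hypothetical derived variable: it keeps a source and a recode rule,
    never a stored copy of the recoded values."""

    def __init__(self, name, source, recode):
        self.name = name
        self.source = source  # the physically stored numeric values
        self.recode = recode  # rule applied each time values are requested

    def values(self):
        return [self.recode(v) for v in self.source]


bmi = [17.9, 18.5, 24.95, 25.0, 31.2]
bmi_class = VirtualVariable("BMI_CLASS", bmi, bmi_category)
print(bmi_class.values())
# ['Underweight', 'Normal weight', 'Normal weight', 'Overweight', 'Obesity']
```

Because only the rule is kept, the numeric BMI remains untouched and available for analysis.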

>
>
> Data Display Format
>
> A display format is needed, for example, for variables whose values
> are proportions (percent). Another example would be currency. For
> this type of variable no CodeScheme and no CategoryScheme is necessary.
> How can we define this? It seems to be an attribute of the variable
> itself in LogicalProduct. Is this also a missing feature?
> We have a data format in PhysicalInstance, but that is the format
> of the stored data itself.
>

Basically, the short answer is that DDI doesn't deal with "display". What 
you want is a data type that indicates Percent, Dollar, Euro, etc. So I'd 
have an integer with 2 decimal places that is US currency. Actually, this 
information is already there in Measurement Unit. This is a string field 
because it can be "housing units", "percent of tepees", basically anything. 
There is/was also an aggregation type, which is currently missing percent 
(long story... I won't go into it because I can't help getting catty about 
it).
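
A minimal Python sketch of this division of labour: the metadata records the data type (here a stored integer with an implied number of decimal places) and a free-text measurement unit, and any "display" is left to the application. Reading "integer with 2 decimal places" as implied decimals, and the `render` function and its parameters, are assumptions for illustration, not DDI elements.

```python
from decimal import Decimal


def render(raw, decimals, unit):
    """Format a stored integer using its implied decimals and measurement unit.

    `decimals` and `unit` are hypothetical stand-ins for the data type and the
    free-text Measurement Unit; the formatting itself is the application's job.
    """
    value = Decimal(raw).scaleb(-decimals)  # exact decimal scaling, no float rounding
    return f"{value:.{decimals}f} {unit}"


print(render(12345, 2, "US dollars"))  # 123.45 US dollars
print(render(250, 1, "percent"))  # 25.0 percent
print(render(7, 0, "housing units"))  # 7 housing units
```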


Wendy

> Any comments?
>
> Achim

Wendy L. Thomas                          Phone: +1 612.624.4389
Data Access Core Director                Fax:   +1 612.626.8375
Minnesota Population Center              Email: wlt at pop.umn.edu
University of Minnesota
50 Willey Hall
225 19th Avenue South
Minneapolis, MN 55455
