[DDI-SRG] Missing value range and interval measurement

Wendy Thomas wlt at pop.umn.edu
Fri May 16 09:32:21 EDT 2008


Hi all,
We also need to differentiate between "missing values" and invalid ranges. 
Invalid ranges in numeric response domains are already available in the 
definition of NumericRepresentation. Invalid content for CodeSchemes is 
whatever is NOT in the CodeScheme. In addition, CodeScheme should contain 
the missing value codes and definitions.

I see two problems in the use of an attribute in Representation for 
missing values. The first is the inability to define a range, and the 
second is the inability to provide a meaning for the value (refused, 
don't know, no response, etc.) anywhere other than in a CodeScheme.
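
As a rough illustration of a definition that covers both points, here is a
minimal Python sketch. It is not DDI syntax: the class and function names
are hypothetical, the ranges are the ones from the example quoted below,
and the meanings attached to them are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MissingRange:
    """One missing-value definition: a closed range plus its meaning."""
    low: Optional[float]   # None means no lower bound
    high: Optional[float]  # None means no upper bound
    meaning: str

    def contains(self, value: float) -> bool:
        above_low = self.low is None or value >= self.low
        below_high = self.high is None or value <= self.high
        return above_low and below_high


def missing_meaning(value: float, definitions: List[MissingRange]) -> Optional[str]:
    """Return the meaning of the first matching definition, or None."""
    for definition in definitions:
        if definition.contains(value):
            return definition.meaning
    return None


# Ranges from the thread's example; the meanings are assumptions.
TEMPERATURE_MISSING = [
    MissingRange(None, 0, "refused"),
    MissingRange(9, 9, "don't know"),
    MissingRange(20, 25, "no response"),
    MissingRange(99, None, "not applicable"),
]

print(missing_meaning(20.17, TEMPERATURE_MISSING))  # no response
print(missing_meaning(50.0, TEMPERATURE_MISSING))   # None
```

A single value such as 9 is treated as a range whose bounds are equal, so
single values, closed ranges, and open-ended ranges share one structure.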

I do not think we should limit DDI to the current constraints of a piece 
of statistical software. This part of the definition in DDI is not linked 
to a specific storage package nor should it be. We may need to provide a 
means of describing multiple sets of missing value definitions that can be 
used by multiple types of Representations so they don't need to be 
described multiple times, but defined once and reused as they are for 
CodeSchemes.
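
The "define once, reuse by reference" idea could look roughly like the
following Python sketch. Everything here is an assumption made for
illustration: the registry, the "mv-standard" identifier, the codes 97-99,
and the two classes (which only borrow the Representation naming and are
not the DDI schema types).

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical registry: each missing-value set is defined once under an ID.
MISSING_SETS: Dict[str, Dict[int, str]] = {
    "mv-standard": {97: "refused", 98: "don't know", 99: "no response"},
}


@dataclass
class NumericRepresentation:
    low: float
    high: float
    missing_ref: str  # ID of a shared missing-value set


@dataclass
class CodeRepresentation:
    codes: Dict[int, str]
    missing_ref: str  # the same kind of reference, pointing at the same set


def describe(value, representation) -> str:
    """Report whether a value is missing (with its meaning) or regular."""
    missing = MISSING_SETS[representation.missing_ref]
    if value in missing:
        return "missing: " + missing[value]
    return "regular value"


age = NumericRepresentation(low=0, high=96, missing_ref="mv-standard")
sex = CodeRepresentation(codes={1: "male", 2: "female"}, missing_ref="mv-standard")

print(describe(98, age))  # missing: don't know
print(describe(99, sex))  # missing: no response
print(describe(42, age))  # regular value
```

Both representation types carry only a reference, so changing the shared
set changes the missing-value definition everywhere it is used.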

Wendy

P.S. This is the reason we didn't fix this in 3.0. It has several possible 
approaches and a number of issues linked to it.


On Fri, 16 May 2008, Joachim Wackerow wrote:

> Hi Sigbjoern,
>
> Yes, you are right: missingValue="<=0 9 20..25 >=99" would be possible
> to define. But as you assumed, that was not the intention, because it is
> not good for machine-actionability. It should not be necessary to parse
> the content of a field.
>
> This field should be more restricted by a regular expression to numbers
> and a restricted set of strings.
>
> We should add something for the missing value range in 3.1. I'll file a bug.
>
> I like the idea to invent missing schemes. This should be considered in
> the related discussion.
>
> Achim
>
>> In addition to SPSS, NSDstat and Nesstar Publisher use missing ranges.
>> In addition you have unbounded missing ranges with either a lower limit
>> or an upper limit (e.g. <= 0  or >= 99). It was possible to represent
>> this in DDI 2 using the <invalrng> element. I guess you can represent
>> this in the missingValue attribute of r:RepresentationType if you do it
>> like this: missingValue="<=0 9 20..25 >=99"
>> This will define any value less than or equal to 0 as missing.
>> The value 9 will be missing.
>> Any value in the range 20 to 25 will be missing.
>> And any value greater than or equal to 99 will be missing.
>>
>> The definition of the missingValue attribute does not prevent you from
>> defining it like this, but I guess it was not the way it was intended to
>> be used so it should probably not be used that way.
>>
>> In SAS/Stata and SPSS (unless you use a range) there is a limit on how
>> many missing values you can define; in DDI 3.0 there is no limit. I think
>> one should try to be compatible with the market-leading statistical
>> packages for social science. For a future version of Nesstar Publisher
>> we are planning to add compatibility for the SAS/Stata type of missing
>> (at present they are recoded on import); we currently have SPSS compatibility.
>> The user will, for each variable, have to choose which missing scheme to
>> use, SAS type or SPSS type. Maybe this is the way to go for DDI as well?
>> I.e. you have to define which missing scheme to use (SAS or SPSS). Are
>> there other missing schemes used in other statistical packages that
>> aren't a subset of the SAS or SPSS schemes and are more extensive than them?
>>
>> Sigbjoern
>>
>>> Joachim Wackerow wrote:
>>>
>>>> Currently I'm working again on the SPSS converter.
>>>>
>>>> I'm wondering how to express in DDI a missing value range of a variable
>>>> with an interval measurement level.
>>>>
>>>> For example the variable temperature. A missing value range is 20-25
>>>> Celsius. The values are expressed as floating-point numbers like 20.17
>>>> etc., which are not all known in advance. This can be expressed in SPSS.
>>>> In DDI I don't see a way. Am I missing something?
>>>>
>>>> For variables with ordinal measurement (i.e. with categories like
>>>> occupation) a workaround can be used for expressing a missing value
>>>> range in DDI. Each category in the missing value range must be defined
>>>> as missing. This can produce some overhead. Category entries with just
>>>> the missing definition can be produced without labels depending on the
>>>> definition in files of the statistical packages.
>>>>
>>>> Any ideas? I think we already discussed a similar issue.
>>>>
>>>> Achim
>>>> _______________________________________________
>>>> DDI-SRG mailing list
>>>> DDI-SRG at icpsr.umich.edu
>>>> http://www.icpsr.umich.edu/mailman/listinfo/ddi-srg
>
> -- 
> GESIS - German Social Science Infrastructure Services
> http://www.gesis.org/en/

Wendy L. Thomas                          Phone: +1 612.624.4389
Data Access Core Director                Fax:   +1 612.626.8375
Minnesota Population Center              Email: wlt at pop.umn.edu
University of Minnesota
50 Willey Hall
225 19th Avenue South
Minneapolis, MN 55455

