[DDI-SRG] Missing value range and interval measurement

Joachim Wackerow joachim.wackerow at gesis.org
Fri May 16 05:11:13 EDT 2008


Hi Sigbjoern,

Yes, you are right: missingValue="<=0 9 20..25 >=99" would be possible 
to define. But as you assumed, that was not the intention, because it 
does not give good machine-actionability. It should not be necessary to 
parse the content of a field.
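
To illustrate what parsing the content would involve, here is a minimal 
Python sketch that reads the notation from the example above. The token 
grammar is inferred only from that single example, and the names 
parse_missing and is_missing are made up for illustration; none of this 
is defined by DDI 3.0.

```python
import math
import re

# Token grammar assumed from the example "<=0 9 20..25 >=99";
# this is an illustration, not something defined by DDI 3.0.
NUM = r"-?\d+(?:\.\d+)?"
TOKEN = re.compile(
    rf"^(?:<=(?P<upper>{NUM})|>=(?P<lower>{NUM})"
    rf"|(?P<lo>{NUM})\.\.(?P<hi>{NUM})|(?P<single>{NUM}))$"
)


def parse_missing(spec):
    """Turn a whitespace-separated spec into closed (low, high) intervals."""
    intervals = []
    for token in spec.split():
        m = TOKEN.match(token)
        if m is None:
            raise ValueError(f"unrecognised token: {token!r}")
        if m["upper"] is not None:      # "<=x": everything up to x
            intervals.append((-math.inf, float(m["upper"])))
        elif m["lower"] is not None:    # ">=x": everything from x upwards
            intervals.append((float(m["lower"]), math.inf))
        elif m["lo"] is not None:       # "a..b": bounded range
            intervals.append((float(m["lo"]), float(m["hi"])))
        else:                           # single value
            value = float(m["single"])
            intervals.append((value, value))
    return intervals


def is_missing(value, intervals):
    """True if value falls inside any of the parsed intervals."""
    return any(low <= value <= high for low, high in intervals)
```

Every consuming application would need something equivalent to this, 
which is the parsing step referred to above.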

This field should be restricted more tightly, by a regular expression, 
to numbers and a restricted set of strings.
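
A rough sketch of such a restriction, written in Python only so that it 
can be tried out. The set of allowed strings is not settled in this 
thread, so SYSMIS below is only a placeholder assumption, as is the 
function name; in the schema itself this would presumably be a pattern 
restriction rather than code.

```python
import re

# Hypothetical placeholder for the "restricted set of strings";
# the actual set is not defined in this thread.
ALLOWED_STRINGS = ("SYSMIS",)

NUMBER = r"-?\d+(?:\.\d+)?"
KEYWORDS = "|".join(re.escape(s) for s in ALLOWED_STRINGS)
ITEM = rf"(?:{NUMBER}|{KEYWORDS})"

# One or more items separated by single spaces, and nothing else.
MISSING_VALUE = re.compile(rf"{ITEM}(?: {ITEM})*")


def is_valid_missing_value(text):
    """True if text is a plain list of numbers and allowed strings."""
    return MISSING_VALUE.fullmatch(text) is not None
```

With a pattern like this, "9 99 -1" would be accepted, while the 
compound form "<=0 9 20..25 >=99" would be rejected.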

We should add something for the missing value range in 3.1. I'll file a bug.

I like the idea of introducing missing schemes. This should be 
considered in the related discussion.

Achim

> In addition to SPSS, NSDstat and Nesstar Publisher use missing ranges. 
> In addition you have unbounded missing ranges with either a lower limit 
> or an upper limit (e.g. <= 0  or >= 99). It was possible to represent 
> this in DDI 2 using the <invalrng> element. I guess you can represent 
> this in the missingValue attribute of r:RepresentationType if you do it 
> like this: missingValue="<=0 9 20..25 >=99"
> This will define any value lower than or equal to 0 as missing.
> The value 9 will be missing.
> Any value in the range 20 to 25 will be missing.
> And any value larger than or equal to 99 will be missing.
> 
> The definition of the missingValue attribute does not prevent you from 
> defining it like this, but I guess it was not the way it was intended to 
> be used so it should probably not be used that way.
> 
> In SAS/Stata and SPSS (unless you use a range) there is a limit on how 
> many missing values you can define; in DDI 3.0 there is no limit. I 
> think one should try to be compatible with the market-leading 
> statistical packages for social science. For a future version of 
> Nesstar Publisher we are planning to add compatibility for the 
> SAS/Stata type of missing values (at present they are recoded on 
> import); we currently have SPSS compatibility. 
> The user will, for each variable, have to choose which missing scheme to 
> use, SAS type or SPSS type. Maybe this is the way to go for DDI as well? 
> I.e. you have to define which missing scheme to use (SAS or SPSS). Are 
> there other missing schemes used in other statistical packages that 
> aren't a subset of the SAS or SPSS schemes and are more extensive than them?
> 
> Sigbjoern
> 
>> Joachim Wackerow wrote:
>>   
>>> Currently I'm working again on the SPSS converter.
>>>
>>> I'm wondering how to express in DDI a missing value range of a variable 
>>> with an interval measurement level.
>>>
>>> For example the variable temperature. A missing value range is 20-25 
>>> Celsius. The values are expressed as floating numbers like 20.17 etc., 
>>> which are not all known in advance. This can be expressed in SPSS. In 
>>> DDI I don't see a way. Am I missing something?
>>>
>>> For variables with ordinal measurement (i.e. with categories like 
>>> occupation) a workaround can be used for expressing a missing value 
>>> range in DDI. Each category in the missing value range must be defined 
>>> as missing. This can produce some overhead. Category entries with just 
>>> the missing definition can be produced without labels depending on the 
>>> definition in files of the statistical packages.
>>>
>>> Any ideas? I think we already discussed a similar issue?
>>>
>>> Achim
>>> _______________________________________________
>>> DDI-SRG mailing list
>>> DDI-SRG at icpsr.umich.edu
>>> http://www.icpsr.umich.edu/mailman/listinfo/ddi-srg


-- 
GESIS - German Social Science Infrastructure Services
http://www.gesis.org/en/

