[DDI-SRG] NCube: valid N and missing N of cube (fwd)

Joachim Wackerow joachim.wackerow at gesis.org
Tue Dec 9 05:23:59 EST 2008


Yes, this makes sense if the whole process is to be documented.

The background of my question is the use of ncubes for on-line tabulation
purposes. Here inline ncubes function as a transport and exchange medium.
Only the kind of information contained in the SPSS sample I provided needs
to be described and exposed. For this purpose it is important to know the
overall total and the valid total (the missing total can then be computed,
or vice versa). All that matters here is whether a value is missing or not;
how a value came to be missing is not important.
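
To make the three numbers concrete, here is a minimal Python sketch of
such a "case processing summary" for a crosstab with case-wise deletion.
The function name, the use of None as the missing marker, and the sample
cases are my own assumptions for illustration; they come neither from
SPSS nor from DDI.

```python
def case_processing_summary(cases):
    """Return valid N, missing N and total N for a crosstab.

    `cases` holds one tuple per case with one value per table dimension.
    None marks a missing value (an assumption of this sketch). A case is
    valid only if none of its values is missing (case-wise deletion).
    """
    total_n = len(cases)
    valid_n = sum(1 for case in cases if all(v is not None for v in case))
    return {"valid": valid_n, "missing": total_n - valid_n, "total": total_n}


# Hypothetical sample: five cases, two dimensions (sex, education).
cases = [
    ("m", "low"),
    ("f", "high"),
    ("f", None),    # missing on the second dimension
    (None, "low"),  # missing on the first dimension
    ("m", "high"),
]
summary = case_processing_summary(cases)
print(summary)
```

Only two of the three numbers need to be transported; the third follows
from the other two.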

Currently I see no other way than to define the universe including the
missings and to use special categories in each dimension to define the
valid total and missing total. The overall valid total and missing
total can then be described in an ncube DataItem, as you noted.
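
A rough Python sketch of this workaround, assuming a two-dimensional
cube held as a dictionary of cells. The codes 0 and 9 for the special
total categories and the cell values are hypothetical; the reading
program has to know them by convention.

```python
# Hypothetical codes, known to the program only by convention.
VALID_TOTAL = 0    # special category "valid total" in every dimension
MISSING_TOTAL = 9  # special category "missing total" in every dimension

# Two-dimensional cube; keys are (code in dimension 1, code in dimension 2).
cells = {
    (1, 1): 1, (1, 2): 1,
    (2, 1): 0, (2, 2): 1,
    (VALID_TOTAL, VALID_TOTAL): 3,      # overall valid total
    (MISSING_TOTAL, MISSING_TOTAL): 2,  # overall missing total
}


def table_totals(cells, n_dims):
    """Read valid N and missing N from the special cells, derive total N."""
    valid_n = cells[(VALID_TOTAL,) * n_dims]
    missing_n = cells[(MISSING_TOTAL,) * n_dims]
    return valid_n, missing_n, valid_n + missing_n


valid_n, missing_n, total_n = table_totals(cells, n_dims=2)
print(valid_n, missing_n, total_n)
```

Nothing in the data itself says that 0 and 9 are totals; that knowledge
sits in the program.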

The drawback of this approach is its poor semantics in a machine-actionable
sense. A program knows only by convention that the special categories
are totals. Furthermore, only the totals for the ncube are important here,
not the totals per dimension. It seems a complicated way to describe a
common number. The "case processing summary" of SPSS is a common approach
to describing the core numbers of a table from the perspective of data
analysis.

I would propose two machine-actionable additions: a way of defining
categories for totals, and a way of defining totals for the ncube
instance as overall numbers. These would also be usable as checksums.

Suggestion:
- Introduce a boolean attribute "total" on "Category". This way
categories can be defined for valid total and missing total in
combination with the existing attribute "missing".
(This is related to the past discussion on codes and categories, totals,
including external codes. It would make sense not just for ncubes but
also for hierarchical coding/category schemes.)
- Introduce two attributes or elements on NCubeInstance for defining
the valid total and missing total of the ncube.
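
The suggestion could be modelled roughly as follows. This is a Python
sketch of the idea only, not DDI schema: the class layout, the field
names valid_total and missing_total, and the sample data are
assumptions for illustration. It also shows the use of the totals as a
checksum.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Category:
    code: str
    label: str
    missing: bool = False  # existing attribute
    total: bool = False    # proposed attribute


@dataclass
class NCubeInstance:
    dimensions: Tuple[Tuple[Category, ...], ...]
    cells: Dict[Tuple[str, ...], int]
    valid_total: int    # proposed: valid N of the whole ncube
    missing_total: int  # proposed: missing N of the whole ncube

    @property
    def total_n(self) -> int:
        return self.valid_total + self.missing_total

    def checksum_ok(self) -> bool:
        """Check that the cells of ordinary categories sum to valid_total."""
        ordinary = [
            {c.code for c in dim if not c.missing and not c.total}
            for dim in self.dimensions
        ]
        cell_sum = sum(
            n for coord, n in self.cells.items()
            if all(code in codes for code, codes in zip(coord, ordinary))
        )
        return cell_sum == self.valid_total


# Hypothetical two-dimensional table: sex by education.
sex = (Category("m", "male"), Category("f", "female"))
edu = (Category("low", "low"), Category("high", "high"))
cube = NCubeInstance(
    dimensions=(sex, edu),
    cells={("m", "low"): 1, ("m", "high"): 1, ("f", "high"): 1},
    valid_total=3,
    missing_total=2,
)
print(cube.checksum_ok(), cube.total_n)
```

With the totals stated at the instance, a program no longer needs
special cells or conventions to find them.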

Achim

Wendy Thomas wrote:
> Ahh...well....this is generally reflected in your universe field (if each 
> dimension has a total with a code of 0, then it is cell 0,0,0,0 in a 
> 4-dimensional cube). That's where the valid N should go. The information 
> about missing or imputed would go in an imputation table. The thing is, if 
> you are aggregating and have missing values you generally want to do 
> something to deal with them (imputation). Your goal is a standardly 
> defined universe.
> 
> So are they missing because they are missing by definition or just missing 
> data?
> 
> Cleaning and processing of missing data comes prior to aggregation.
> 
> If you handle them like, say, a national statistical agency, you will have a 
> separate set of tables expressing the number of missing items and whether 
> they are handled by substitution or imputation. Data holes just really aren't 
> acceptable. Your only other alternative is to use the 0,0,0,0 approach and 
> adjust your universe to read blahblahblah who responded to each of the 
> following questions OR include missing as an identifiable value in each 
> dimension.
> 
> Wendy
> 
> 
> On Mon, 8 Dec 2008, Joachim Wackerow wrote:
> 
>> I'm talking about cases, not cells. Yes, I'm talking about case-wise deletion 
>> in crosstabs (excluding missing values). It would be nice to have the missing 
>> N in addition to the table, which already has the valid N (in an implicit or 
>> explicit way). For example, SPSS prints out the "case processing summary" with 
>> valid N, missing N, and total N. Suppressing the missing values is usually 
>> the default.
>>
>> See attached SPSS printout sample.
>>
>> Achim
>>
>> Wendy Thomas wrote:
>>> Are you talking about the valid number of cells in an NCube? This is in the 
>>> NCube structure. @cellCount  @isClean=true implies no missing. Missing by 
>>> default of the structure are identified using the attribute and attaching it 
>>> to a definition of the cells.
>>>
>>> Are you talking about a specific instance of an NCube in terms of the 
>>> application of cell-wise suppression to specific content? (currently not 
>>> doable although you can identify cell level suppression code as an attribute 
>>> and additional measure)
>>>
>>> Wendy
>>>
>>> On Mon, 8 Dec 2008, Joachim Wackerow wrote:
>>>
>>>> I'm wondering where to store the valid N, missing N, and total N of a
>>>> table represented in an ncube. I'm talking about the valid N and missing
>>>> N of the table, not just of one dimension. The total N of a table can be
>>>> stored in pi:CaseQuantity, but actually this can be misleading.
>>>>
>>>> Am I missing something, or should this be added in a future version?
>>>>
>>>> Achim
>>>>
>>>> -- 
>>>> GESIS - Leibniz Institute for the Social Sciences
>>>> Postal address: P.O. Box 122155, 68072 Mannheim, Germany
>>>> Visiting address: B2 1, 68159 Mannheim, Germany
>>>> Phone: +49 (0)621 1246 262
>>>> Fax: +49 (0)621 1246 100
>>>> E-mail: joachim.wackerow at gesis.org
>>>> www.gesis.org/en/institute/
>>>> _______________________________________________
>>>> DDI-SRG mailing list
>>>> DDI-SRG at icpsr.umich.edu
>>>> http://www.icpsr.umich.edu/mailman/listinfo/ddi-srg
>>>>
>>> Wendy L. Thomas                          Phone: +1 612.624.4389
>>> Data Access Core Director         Fax:   +1 612.626.8375
>>> Minnesota Population Center              Email: wlt at pop.umn.edu
>>> University of Minnesota
>>> 50 Willey Hall
>>> 225 19th Avenue South
>>> Minneapolis, MN 55455
>>
>>
> 



