[DDI-SRG] NCube: valid N and missing N of cube (fwd)

Wendy Thomas wlt at pop.umn.edu
Tue Dec 9 10:03:41 EST 2008


Achim

Sounds like a reasonable idea that we should hash out with examples and
discussion through a normal bug debate. Please put this recommendation in
Mantis and include our discussions as background information.

Wendy


On Tue, 9 Dec 2008, Joachim Wackerow wrote:

> Yes, this makes sense when the whole process should be documented.
>
> The background of my question is using ncubes for on-line tabulation
> purposes. Here inline ncubes function as a transport and exchange medium.
> Only information similar to that in the provided SPSS sample needs to
> be described and exposed. For this purpose it is important to know the
> overall total and the valid total (the missing total can then be computed,
> or vice versa). All that matters here is whether a value is missing or not;
> how the value came to be missing is not important.
>
> Currently I see no other way than defining the universe including the
> missings and using special categories in each dimension to define the
> valid total and missing total. Then the overall valid total and missing
> total can be described in an ncube DataItem as you noted.
>
> The drawback of this approach is poor semantics in a machine-actionable
> sense. The program knows only by convention that the special categories
> are totals. Furthermore, only the totals for the ncube are important here,
> not the totals per dimension. It seems to be a complicated way to
> describe a common number. The "case processing summary" of SPSS is a
> common approach to describe core numbers of a table from the perspective
> of data analysis.
>
> I would propose two machine-actionable additions: a way of defining
> categories for totals and a way of defining totals for the ncube
> instance as overall numbers. These would also be usable as checksums.
>
> Suggestion:
> - Introduction of a boolean attribute "total" at "Category". This way
> categories can be defined for valid total and missing total in
> combination with the existing attribute "missing".
> (This is related to the past discussion on codes and categories, totals,
> including external codes. It would make sense not just for ncubes but
> also for hierarchical coding/category schemes.)
> - Introduction of two attributes or elements at NCubeInstance for defining
> the valid total and missing total of the ncube.
>
> Achim
>
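
The two proposed additions can be illustrated with a minimal sketch. This is not DDI syntax: the Python class and field names below (Category.total, NCubeInstance.valid_total, NCubeInstance.missing_total, checksum_ok) are hypothetical stand-ins for the suggested attributes, and the counts are invented for illustration. The sketch only shows how a flagged total category and explicit overall totals could act as a checksum.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Category:
    code: str
    label: str
    missing: bool = False  # stands in for the existing "missing" attribute
    total: bool = False    # stands in for the proposed "total" attribute (hypothetical)


@dataclass
class NCubeInstance:
    # Counts keyed by one category per dimension.
    cells: Dict[Tuple[Category, ...], int]
    valid_total: int    # proposed overall valid N (hypothetical name)
    missing_total: int  # proposed overall missing N (hypothetical name)

    @property
    def overall_total(self) -> int:
        """Total N, derived from the two declared totals."""
        return self.valid_total + self.missing_total

    def checksum_ok(self) -> bool:
        """Compare the declared valid total with the sum of substantive cells.

        Cells that involve a total or missing category are skipped, so
        marginal totals are not counted twice.
        """
        valid_sum = sum(
            count
            for key, count in self.cells.items()
            if not any(cat.total or cat.missing for cat in key)
        )
        return valid_sum == self.valid_total


# Invented example data for a two-dimensional table.
male = Category("1", "male")
female = Category("2", "female")
all_sexes = Category("0", "total", total=True)
yes = Category("1", "yes")
no = Category("2", "no")

cube = NCubeInstance(
    cells={
        (male, yes): 10,
        (male, no): 5,
        (female, yes): 7,
        (female, no): 8,
        (all_sexes, yes): 17,  # marginal total, ignored by the checksum
    },
    valid_total=30,
    missing_total=4,
)
print(cube.checksum_ok(), cube.overall_total)
```

Because the total category is flagged explicitly, a program does not have to rely on a code convention to know which cells to skip.
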
> Wendy Thomas wrote:
>> Ahh...well....this is generally reflected in your universe field (if each
>> dimension has a total with a code of 0, then that is cell 0,0,0,0 in a
>> 4-dimensional cube). That's where the valid N should go. The information
>> about missing or imputed would go in an imputation table. The thing is, if
>> you are aggregating and have missing values you generally want to do
>> something to deal with them (imputation). Your goal is a standardly defined
>> universe.
>>
>> So are they missing because they are missing by definition or just missing
>> data?
>>
>> Cleaning and processing of missing data comes prior to aggregation.
>>
>> If you handle them like, say, a national statistical agency, you will have a
>> separate set of tables expressing the number of missing items and whether
>> they are handled by substitution or imputation. Data holes just really aren't
>> acceptable. Your only other alternative is to use the 0,0,0,0 approach and
>> adjust your universe to read "blahblahblah who responded to each of the
>> following questions" OR include missing as an identifiable value in each
>> dimension.
>>
>> Wendy
>>
>>
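
As a rough illustration of the 0,0,0,0 approach, the sketch below builds a small cube in which code 0 is the total category of every dimension, so the all-zero cell holds the valid N. It uses two dimensions instead of four to stay short; the function name build_cube, the TOTAL constant, and the case data are assumptions made for this example only.

```python
from collections import Counter
from itertools import product

TOTAL = 0  # assumed convention for this sketch: code 0 is the total category


def build_cube(cases):
    """Aggregate complete cases into a cube that includes total categories.

    Each case is counted in every cell obtained by replacing any subset of
    its codes with TOTAL, so the all-TOTAL cell ends up holding the valid N.
    """
    cube = Counter()
    for case in cases:
        for cell in product(*[(code, TOTAL) for code in case]):
            cube[cell] += 1
    return cube


# Invented complete cases for two dimensions (codes 1 and 2 are real categories).
cases = [(1, 1), (1, 2), (2, 1), (2, 1)]
cube = build_cube(cases)
valid_n = cube[(TOTAL, TOTAL)]
print(valid_n, cube[(TOTAL, 1)], cube[(2, 1)])
```

The same construction extends to four dimensions, where the valid N sits in cell (0, 0, 0, 0). Cases with a missing value would have to be excluded beforehand or given their own category code, as described above.
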
>> On Mon, 8 Dec 2008, Joachim Wackerow wrote:
>>
>>> I'm talking about cases, not cells. Yes, I'm talking about case-wise deletion
>>> in crosstabs (excluding missing values). It would be nice to have the missing
>>> N in addition to the table which already has the valid N (in an implicit or
>>> explicit way). For example SPSS prints out the "case processing summary" with
>>> valid N, missing N, and total N. Suppressing the missing values is usually
>>> the default.
>>>
>>> See attached SPSS printout sample.
>>>
>>> Achim
>>>
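
The three numbers of such a summary can be derived with case-wise deletion as in the following minimal sketch. It does not reproduce SPSS output or use any SPSS API; the function name case_processing_summary and the sample cases are hypothetical, and None stands for a missing value.

```python
from typing import Optional, Sequence


def case_processing_summary(cases: Sequence[Sequence[Optional[str]]]) -> dict:
    """Count valid, missing and total cases for a crosstab.

    A case is valid only if none of its dimension values is missing (None),
    which corresponds to case-wise deletion.
    """
    total_n = len(cases)
    valid_n = sum(
        1 for case in cases if all(value is not None for value in case)
    )
    return {"valid": valid_n, "missing": total_n - valid_n, "total": total_n}


# Invented respondents: (sex, employment status); None marks a missing value.
cases = [
    ("f", "employed"),
    ("m", None),
    ("f", "unemployed"),
    (None, "employed"),
    ("m", "employed"),
]
summary = case_processing_summary(cases)
print(summary)
```

Only two of the three numbers need to be stored; the third follows by addition or subtraction.
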
>>> Wendy Thomas wrote:
>>>> Are you talking about the valid number of cells in an NCube? This is in the
>>>> NCube structure (@cellCount); @isClean=true implies no missing cells. Cells
>>>> missing by default of the structure are identified using the attribute and
>>>> attaching it to a definition of the cells.
>>>>
>>>> Are you talking about a specific instance of an NCube in terms of the
>>>> application of cell-wise suppression to specific content? (currently not
>>>> doable although you can identify cell level suppression code as an attribute
>>>> and additional measure)
>>>>
>>>> Wendy
>>>>
>>>> On Mon, 8 Dec 2008, Joachim Wackerow wrote:
>>>>
>>>>> I'm wondering where to store valid N, missing N, and total N of a table
>>>>> represented in an ncube. I'm talking about the valid N and missing N of
>>>>> the table, not just of one dimension. The total N of a table can be
>>>>> stored in pi:CaseQuantity, but actually this can be misleading.
>>>>>
>>>>> Am I missing something or should this be added in a future version?
>>>>>
>>>>> Achim
>>>>>
>>>>> --
>>>>> GESIS - Leibniz Institute for the Social Sciences
>>>>> Postal address: P.O. Box 122155, 68072 Mannheim, Germany
>>>>> Visiting address: B2 1, 68159 Mannheim, Germany
>>>>> Phone: +49 (0)621 1246 262
>>>>> Fax: +49 (0)621 1246 100
>>>>> E-mail: joachim.wackerow at gesis.org
>>>>> www.gesis.org/en/institute/
>>>>> _______________________________________________
>>>>> DDI-SRG mailing list
>>>>> DDI-SRG at icpsr.umich.edu
>>>>> http://www.icpsr.umich.edu/mailman/listinfo/ddi-srg
>>>>>
>>>> Wendy L. Thomas                          Phone: +1 612.624.4389
>>>> Data Access Core Director         Fax:   +1 612.626.8375
>>>> Minnesota Population Center              Email: wlt at pop.umn.edu
>>>> University of Minnesota
>>>> 50 Willey Hall
>>>> 225 19th Avenue South
>>>> Minneapolis, MN 55455
>>>
>>>
>>
>
>
>

Wendy L. Thomas                          Phone: +1 612.624.4389
Data Access Core Director                Fax:   +1 612.626.8375
Minnesota Population Center              Email: wlt at pop.umn.edu
University of Minnesota
50 Willey Hall
225 19th Avenue South
Minneapolis, MN 55455

