[DDI-SRG] NCube: valid N and missing N of cube (fwd)
Joachim Wackerow
joachim.wackerow at gesis.org
Tue Dec 9 05:23:59 EST 2008
Yes, this makes sense when the whole process needs to be documented.
The background of my question is the use of ncubes for on-line
tabulation. Here inline ncubes function as a transport and exchange
medium. Only information similar to that in the SPSS sample I provided
needs to be described and exposed. For this purpose it is important to
know the overall total and the valid total (the missing total can then
be computed, or vice versa). All that matters here is whether a value is
missing or not; how it came to be missing is not important.
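To make the three numbers concrete, here is a minimal Python sketch. It
assumes that a case is a tuple of dimension values and that None marks a
missing value; both conventions, and the function name, are only
illustrative and not part of DDI or SPSS.

```python
from typing import Dict, Optional, Sequence


def case_processing_summary(
    cases: Sequence[Sequence[Optional[int]]],
) -> Dict[str, int]:
    """Count total, valid, and missing cases under case-wise deletion."""
    total_n = len(cases)
    # A case is valid only if none of its dimension values is missing.
    valid_n = sum(1 for case in cases if all(v is not None for v in case))
    # The missing total follows from the other two numbers.
    return {"total": total_n, "valid": valid_n, "missing": total_n - valid_n}


# Hypothetical sample data: five cases on two dimensions.
cases = [(1, 2), (1, None), (2, 1), (None, None), (2, 2)]
summary = case_processing_summary(cases)
```

Any two of the three numbers determine the third, so it would be enough
to transport two of them.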
Currently I see no other way than defining the universe to include the
missings and using special categories in each dimension to define the
valid total and missing total. The overall valid total and missing
total can then be described in an ncube DataItem, as you noted.
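A rough Python sketch of this workaround, under my own assumptions: the
reserved codes "VT" and "MT" are hypothetical, a cell is addressed by a
tuple of category codes, and None marks a missing value. It is meant as
an illustration, not as a DDI structure.

```python
from collections import Counter

VALID_TOTAL = "VT"    # hypothetical reserved code, a total only by convention
MISSING_TOTAL = "MT"  # hypothetical reserved code, a total only by convention


def build_cube(cases, n_dims):
    """Tabulate cases into cells, adding the two overall-total cells."""
    cells = Counter()
    for case in cases:
        if any(value is None for value in case):
            # Case-wise deletion: the case counts only toward missing N.
            cells[(MISSING_TOTAL,) * n_dims] += 1
        else:
            cells[tuple(case)] += 1
            cells[(VALID_TOTAL,) * n_dims] += 1
    return cells


# Hypothetical sample data: five cases on two dimensions.
cases = [(1, 2), (1, None), (2, 1), (None, None), (2, 2)]
cube = build_cube(cases, n_dims=2)
valid_n = cube[(VALID_TOTAL, VALID_TOTAL)]
missing_n = cube[(MISSING_TOTAL, MISSING_TOTAL)]
```

Nothing in the data itself says that "VT" and "MT" are totals; a
consuming program has to know the reserved codes in advance.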
The drawback of this approach is its poor semantics in a
machine-actionable sense. The program knows only by convention that the
special categories are totals. Furthermore, only the totals for the
ncube as a whole are important here, not the totals per dimension. It
seems a complicated way to describe a common number. The "case
processing summary" of SPSS is a common way to describe the core numbers
of a table from the perspective of data analysis.
I would propose two machine-actionable additions: a way of defining
categories for totals, and a way of defining totals for the ncube
instance as overall numbers. These could also be used as checksums.
Suggestion:
- Introduce a boolean attribute "total" on "Category". This way,
categories can be defined for the valid total and the missing total in
combination with the existing attribute "missing".
(This is related to the past discussion on codes and categories, totals,
including external codes. It would make sense not just for ncubes but
also for hierarchical coding/category schemes.)
- Introduce two attributes or elements on NCubeInstance for defining
the valid total and missing total of the ncube.
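The two suggestions, and their use as checksums, could be sketched in
Python as follows. The class and field names mirror the proposal but are
hypothetical; "total", "valid_total" and "missing_total" do not exist in
the schema today, and the sample counts are invented.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Category:
    code: str
    missing: bool = False  # the existing attribute mentioned above
    total: bool = False    # proposed attribute (hypothetical)


@dataclass
class NCubeInstance:
    dimensions: List[List[Category]]
    cells: Dict[Tuple[str, ...], int]  # category codes -> case count
    valid_total: int    # proposed (hypothetical name)
    missing_total: int  # proposed (hypothetical name)


def totals_match(cube: NCubeInstance) -> bool:
    """Recompute both totals from the non-total cells and compare them."""
    lookup = [{c.code: c for c in dim} for dim in cube.dimensions]
    valid = missing = 0
    for coords, count in cube.cells.items():
        cats = [lookup[i][code] for i, code in enumerate(coords)]
        if any(c.total for c in cats):
            continue  # skip aggregate cells to avoid double counting
        if any(c.missing for c in cats):
            missing += count
        else:
            valid += count
    return valid == cube.valid_total and missing == cube.missing_total


sex = [Category("1"), Category("2"), Category("9", missing=True),
       Category("T", total=True)]
vote = [Category("1"), Category("2"), Category("9", missing=True)]
cube = NCubeInstance(
    dimensions=[sex, vote],
    cells={("1", "1"): 10, ("1", "2"): 5, ("2", "1"): 7,
           ("2", "9"): 3, ("9", "9"): 1, ("T", "1"): 17},
    valid_total=22,
    missing_total=4,
)
ok = totals_match(cube)
```

With the "total" flag a program can recognise aggregate cells without
knowing any reserved codes, and the two overall numbers can be verified
against the cell contents.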
Achim
Wendy Thomas wrote:
> Ahh...well....this is generally reflected in your universe field (if each
> dimension has a total with a code of 0, then it is cell 0,0,0,0 in a
> 4-dimensional cube). That's where the valid N should go. The information about
> missing or imputed would go in an imputation table. The thing is, if you are
> aggregating and have missing values you generally want to do something to deal
> with them (imputation). Your goal is a standardly defined universe.
>
> So are they missing because they are missing by definition, or just missing
> data?
>
> Cleaning and processing of missing data comes prior to aggregation.
>
> If you handle them like say a national statistical agency, you will have a
> separate set of tables expressing the number of missing items and if they are
> handled by substitution or imputation. Data holes just really aren't
> acceptable. Your only other alternative is to use the 0,0,0,0 approach and
> adjust your universe to read blahblahblah who responded to each of the
> following questions OR include missing as an identifiable value in each
> dimension.
>
> Wendy
>
>
> On Mon, 8 Dec 2008, Joachim Wackerow wrote:
>
>> I'm talking about cases, not cells. Yes, I'm talking about case-wise deletion
>> in crosstabs (excluding missing values). It would be nice to have the missing
>> N in addition to the table which already has the valid N (in an implicit or
>> explicit way). For example SPSS prints out the "case processing summary" with
>> valid N, missing N, and total N. Suppressing the missing values is usually
>> the default.
>>
>> See attached SPSS printout sample.
>>
>> Achim
>>
>> Wendy Thomas wrote:
>>> Are you talking about the valid number of cells in an NCube? This is in the
>>> NCube structure. @cellCount @isClean=true implies no missing. Missing by
>>> default of the structure are identified using the attribute and attaching it
>>> to a definition of the cells.
>>>
>>> Are you talking about a specific instance of an NCube in terms of the
>>> application of cell-wise suppression to specific content? (currently not
>>> doable although you can identify cell level suppression code as an attribute
>>> and additional measure)
>>>
>>> Wendy
>>>
>>> On Mon, 8 Dec 2008, Joachim Wackerow wrote:
>>>
>>>> I'm wondering where to store valid N, missing N, and total N of a table
>>>> represented in an ncube. I'm talking about the valid N and missing N of
>>>> the table, not just of one dimension. The total N of a table can be
>>>> stored in pi:CaseQuantity, but actually this can be misleading.
>>>>
>>>> Am I missing something, or should this be added in a future version?
>>>>
>>>> Achim
>>>>
>>>> --
>>>> GESIS - Leibniz Institute for the Social Sciences
>>>> Postal address: P.O. Box 122155, 68072 Mannheim, Germany
>>>> Visiting address: B2 1, 68159 Mannheim, Germany
>>>> Phone: +49 (0)621 1246 262
>>>> Fax: +49 (0)621 1246 100
>>>> E-mail: joachim.wackerow at gesis.org
>>>> www.gesis.org/en/institute/
>>>> _______________________________________________
>>>> DDI-SRG mailing list
>>>> DDI-SRG at icpsr.umich.edu
>>>> http://www.icpsr.umich.edu/mailman/listinfo/ddi-srg
>>>>
>>> Wendy L. Thomas Phone: +1 612.624.4389
>>> Data Access Core Director Fax: +1 612.626.8375
>>> Minnesota Population Center Email: wlt at pop.umn.edu
>>> University of Minnesota
>>> 50 Willey Hall
>>> 225 19th Avenue South
>>> Minneapolis, MN 55455