[DDI-SRG] NCube: valid N and missing N of cube (fwd)
Joachim Wackerow
joachim.wackerow at gesis.org
Tue Dec 9 12:29:46 EST 2008
It is now in Mantis as issues 228 (related to 122) and 229.
Achim
Wendy Thomas wrote:
> Achim
>
> Sounds like a reasonable idea that we should hash out with examples and
> discussion through a normal bug debate. Please put this recommendation
> in Mantis and include our discussions as background information.
>
> Wendy
>
>
> On Tue, 9 Dec 2008, Joachim Wackerow wrote:
>
>> Yes, this makes sense when the whole process is to be documented.
>>
>> The background of my question is the use of ncubes for online
>> tabulation. Here inline ncubes function as a transport and exchange
>> medium. Information similar to that in the SPSS sample I provided should
>> be described and exposed. For this purpose it is important to know the
>> overall total and the valid total (the missing total can then be
>> computed, or vice versa). All that matters here is whether a value is
>> missing or not; how the value came to be missing is not important
>> here.
>>
>> Currently I see no other way than defining the universe to include the
>> missing cases and using special categories in each dimension for the
>> valid total and the missing total. The overall valid total and missing
>> total can then be described in an ncube DataItem, as you noted.
>>
>> The drawback of this approach is poor semantics in a machine-actionable
>> sense: a program knows only by convention that the special categories
>> are totals. Furthermore, only the totals for the ncube as a whole are
>> important here, not the totals per dimension. It seems a complicated
>> way to describe a common number. The SPSS "case processing summary" is
>> a common way to describe the core numbers of a table from the
>> perspective of data analysis.
>>
>> I would propose two machine-actionable additions: a way of defining
>> categories for totals, and a way of defining totals for the ncube
>> instance as overall numbers. These would also be usable as checksums.
>>
>> Suggestion:
>> - Introduce a boolean attribute "total" on "Category". This way,
>> categories can be defined for the valid total and the missing total in
>> combination with the existing attribute "missing".
>> (This is related to the past discussion on codes and categories and
>> totals, including external codes. It would make sense not just for
>> ncubes but also for hierarchical coding/category schemes.)
>> - Introduce two attributes or elements on NCubeInstance for defining
>> the valid total and missing total of the ncube.
>>
>> Achim
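A minimal Python sketch of how the two proposed additions could be used as checksums. The names `Category.total`, `NCubeInstanceTotals`, and `check_totals` are hypothetical and only illustrate the proposal; they are not part of any released DDI schema. The sketch assumes each dimension codes its total category as "0" and that cells are keyed by one category code per dimension.

```python
from dataclasses import dataclass


@dataclass
class Category:
    """Hypothetical category carrying the proposed boolean 'total' flag
    alongside the existing 'missing' flag."""
    code: str
    missing: bool = False
    total: bool = False


@dataclass
class NCubeInstanceTotals:
    """Hypothetical overall totals for one ncube instance (the proposed
    valid total and missing total); not part of any DDI schema."""
    valid_total: int
    missing_total: int

    @property
    def overall_total(self) -> int:
        return self.valid_total + self.missing_total


def check_totals(cells: dict[tuple[str, ...], int],
                 dimensions: list[list[Category]],
                 totals: NCubeInstanceTotals,
                 case_quantity: int) -> list[str]:
    """Use the declared totals as checksums; return the problems found."""
    problems: list[str] = []
    # Codes of ordinary categories per dimension (neither totals nor missing).
    valid_codes = [{c.code for c in dim if not c.missing and not c.total}
                   for dim in dimensions]
    # Sum only cells whose every coordinate is an ordinary valid category.
    inner_sum = sum(count for key, count in cells.items()
                    if all(code in valid_codes[i] for i, code in enumerate(key)))
    if inner_sum != totals.valid_total:
        problems.append(
            f"valid cells sum to {inner_sum}, expected {totals.valid_total}")
    if totals.overall_total != case_quantity:
        problems.append(
            f"valid + missing = {totals.overall_total}, expected {case_quantity}")
    return problems


# Illustrative 2-dimensional cube: code "0" is the total, "9" is missing.
sex = [Category("1"), Category("2"), Category("0", total=True),
       Category("9", missing=True)]
vote = [Category("1"), Category("2"), Category("0", total=True)]
cells = {("1", "1"): 10, ("1", "2"): 5, ("2", "1"): 7, ("2", "2"): 8,
         ("9", "1"): 2, ("9", "2"): 2, ("0", "0"): 30}

print(check_totals(cells, [sex, vote], NCubeInstanceTotals(30, 4), 34))  # []
```

The category-level flag and the instance-level totals are kept separate, mirroring the two suggestions: the flags tell a program which cells to skip when summing, and the totals give it numbers to check that sum against.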
>>
>> Wendy Thomas wrote:
>>> Ahh... well... this is generally reflected in your universe field. If
>>> each dimension has a total with a code of 0, then cell 0,0,0,0 in a
>>> 4-dimensional cube holds the overall total. That's where the valid N
>>> should go. The information about missing or imputed values would go in
>>> an imputation table. The thing is, if you are aggregating and have
>>> missing values, you generally want to do something to deal with them
>>> (imputation). Your goal is a standardly defined universe.
>>>
>>> So are they missing because they are missing by definition, or is it
>>> just missing data?
>>>
>>> Cleaning and processing of missing data comes prior to aggregation.
>>>
>>> If you handle them as, say, a national statistical agency does, you
>>> will have a separate set of tables expressing the number of missing
>>> items and whether they are handled by substitution or imputation. Data
>>> holes just really aren't acceptable. Your only other alternatives are
>>> to use the 0,0,0,0 approach and adjust your universe to read "... who
>>> responded to each of the following questions", OR to include missing
>>> as an identifiable value in each dimension.
>>>
>>> Wendy
>>>
>>>
>>> On Mon, 8 Dec 2008, Joachim Wackerow wrote:
>>>
>>>> I'm talking about cases, not cells. Yes, I'm talking about case-wise
>>>> deletion in crosstabs (excluding missing values). It would be nice to
>>>> have the missing N in addition to the table, which already has the
>>>> valid N (implicitly or explicitly). For example, SPSS prints a "case
>>>> processing summary" with valid N, missing N, and total N. Excluding
>>>> the missing values is usually the default.
>>>>
>>>> See attached SPSS printout sample.
>>>>
>>>> Achim
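The case processing summary described in this message can be sketched in Python, assuming cases are tuples with one value per crosstab dimension and `None` standing in for a missing value (real data would use declared missing-value codes). The function names are illustrative and are not taken from SPSS or DDI.

```python
from collections import Counter
from typing import Optional, Sequence

# One value per crosstab dimension; None marks a missing value
# (an assumption for illustration only).
Case = tuple[Optional[str], ...]


def is_valid(case: Case) -> bool:
    """A case survives case-wise deletion only if no dimension is missing."""
    return all(value is not None for value in case)


def case_processing_summary(cases: Sequence[Case]) -> dict[str, int]:
    """Return valid N, missing N, and total N under case-wise deletion."""
    total_n = len(cases)
    valid_n = sum(1 for case in cases if is_valid(case))
    return {"valid": valid_n, "missing": total_n - valid_n, "total": total_n}


def crosstab(cases: Sequence[Case]) -> Counter:
    """Count table cells using only the cases that survive deletion."""
    return Counter(case for case in cases if is_valid(case))


cases = [
    ("male", "yes"),
    ("female", "no"),
    ("female", None),  # missing in the second dimension
    (None, "yes"),     # missing in the first dimension
    ("male", "yes"),
]
summary = case_processing_summary(cases)
print(summary)  # {'valid': 3, 'missing': 2, 'total': 5}
# The cell counts of the table add up to the valid N, not the total N.
assert sum(crosstab(cases).values()) == summary["valid"]
```

This shows why the table alone carries the valid N only implicitly: the missing N has to be stored somewhere else if it is to be exposed.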
>>>>
>>>> Wendy Thomas wrote:
>>>>> Are you talking about the number of valid cells in an NCube? That
>>>>> is in the NCube structure: @cellCount, and @isClean=true implies no
>>>>> missing cells. Cells that are missing by default of the structure are
>>>>> identified using the attribute and attaching it to a definition of
>>>>> the cells.
>>>>>
>>>>> Are you talking about a specific instance of an NCube in terms of
>>>>> applying cell-wise suppression to specific content? (This is currently
>>>>> not doable, although you can identify a cell-level suppression code as
>>>>> an attribute and an additional measure.)
>>>>>
>>>>> Wendy
>>>>>
>>>>> On Mon, 8 Dec 2008, Joachim Wackerow wrote:
>>>>>
>>>>>> I'm wondering where to store the valid N, missing N, and total N of
>>>>>> a table represented in an ncube. I mean the valid N and missing N of
>>>>>> the table as a whole, not just of one dimension. The total N of a
>>>>>> table can be stored in pi:CaseQuantity, but this can actually be
>>>>>> misleading.
>>>>>>
>>>>>> Am I missing something, or should this be added in a future version?
>>>>>>
>>>>>> Achim
>>>>>>
>>>>>> --
>>>>>> GESIS - Leibniz Institute for the Social Sciences
>>>>>> Postal address: P.O. Box 122155, 68072 Mannheim, Germany
>>>>>> Visiting address: B2 1, 68159 Mannheim, Germany
>>>>>> Phone: +49 (0)621 1246 262
>>>>>> Fax: +49 (0)621 1246 100
>>>>>> E-mail: joachim.wackerow at gesis.org
>>>>>> www.gesis.org/en/institute/
>>>>>> _______________________________________________
>>>>>> DDI-SRG mailing list
>>>>>> DDI-SRG at icpsr.umich.edu
>>>>>> http://www.icpsr.umich.edu/mailman/listinfo/ddi-srg
>>>>>>
>>>>> Wendy L. Thomas Phone: +1 612.624.4389
>>>>> Data Access Core Director Fax: +1 612.626.8375
>>>>> Minnesota Population Center Email: wlt at pop.umn.edu
>>>>> University of Minnesota
>>>>> 50 Willey Hall
>>>>> 225 19th Avenue South
>>>>> Minneapolis, MN 55455
>>>>
>>>>
>>>
>>
>>
>>
>