[DDI-ADG] nCube elements

Jostein Ryssevik Jostein.Ryssevik at nsd.uib.no
Tue Aug 30 04:52:51 EDT 2005


At 16:20 29.08.2005 -0400, Mary Vardigan wrote:

>
><measure> 4.4.13  Measure
>
>
>    * Optional
>    * Repeatable
>    * Attributes: 
> <http://webapp.icpsr.umich.edu/cocoon/DDI-LIBRARY/?element-definition=codeBook>ID, 
> xml:lang, source, varRef, aggrMeth, measUnit, scale, origin, additivity
>
>Description: The element measure indicates the measurement features of the 
>cell content: type of aggregation used, measurement unit, and measurement 
>scale. An origin point is recorded for anchored scales, to be used in 
>determining relative movement along the scale. Additivity indicates 
>whether an aggregate is a stock (like the population at a given point in 
>time) or a flow (like the number of births or deaths over a certain period 
>of time). The non-additive flag is to be used for measures that for 
>logical reasons cannot be aggregated to a higher level - for instance, 
>data that only make sense at a certain level of aggregation, like a 
>classification. Two nCubes may be identical except for their measure - for 
>example, a count of persons by age and percent of persons by age. Measure 
>is an empty element that includes the following attributes: "varRef" is an 
>IDREF; "aggrMeth" indicates the type of aggregation method used, for 
>example 'sum', 'average', 'count'; "measUnit" records the measurement 
>unit, for example 'km', 'miles', etc.; "scale" records unit of scale, for 
>example 'x1', 'x1000'; "origin" records the point of origin for anchored 
>scales;"additivity" records type of additivity such as 'stock', 'flow', 
>'non-additive'.
>
>We are assuming that this element will be replaced by a "container" of 
>attributes or characteristics that will apply at all levels, some of which 
>J will be supplying from SDMX. Thus, the existing attributes like 
>"Aggregation Method" will become part of this larger set of 
>characteristics. "Measure" is a particularly problematic term because 
>Nesstar uses it in a different way to mean the variable itself.

Let me explain why Nesstar uses the measure element in this way, and 
why I think it makes sense to think twice before we make radical changes to 
this construct.

In the nCube element, the dmns- and measure-elements play similar 
roles. Both point back to var-elements in the variable description 
section and in this way indicate how concrete variables are used to 
construct a multidimensional table. The dmns-elements list the variables 
that establish the dimensionality of the table/cube, and the 
measure-elements list the variables that populate the cells of the cube. 
This can be a single variable, or multiple variables in a 
multi-measure cube. In addition, both elements hold a series of 
attributes that add cube-specific variable information that is missing 
from the var-elements. For dimensions, the most important is cohort, 
which describes which parts of a variable/classification are actually 
used in the dimension. For measures, the most important attributes 
relate to the logical and mathematical properties of the measure, like 
aggregation method, additivity etc. In oo-terms you could say that 
measure as well as dmns inherit from their var-elements and add a few 
more attributes that are specific to the role the variables play in the 
cube.
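
To make the oo analogy concrete, here is a minimal Python sketch. The 
class names, field names, IDs and example values are illustrative 
assumptions and not part of the DDI schema; in the DTD itself the link 
is an IDREF (varRef) rather than real inheritance.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Var:
    """Stands in for a var-element in the variable description section."""
    id: str
    label: str


@dataclass
class Dmns(Var):
    """A variable in the role of dimension; adds cube-specific info."""
    cohort: Optional[str] = None  # which part of the classification is used


@dataclass
class Measure(Var):
    """A variable in the role of measure; adds its mathematical properties."""
    aggr_meth: Optional[str] = None   # e.g. 'sum', 'average', 'count'
    additivity: Optional[str] = None  # e.g. 'stock', 'flow', 'non-additive'


@dataclass
class NCube:
    dmns: List[Dmns] = field(default_factory=list)
    measures: List[Measure] = field(default_factory=list)


# Hypothetical two-dimensional, multi-measure cube.
age = Dmns("V1", "Age group", cohort="18-67")
gender = Dmns("V2", "Gender")
population = Measure("V3", "Population", aggr_meth="count", additivity="stock")
births = Measure("V4", "Births", aggr_meth="sum", additivity="flow")

cube = NCube(dmns=[age, gender], measures=[population, births])
```

Both Dmns and Measure are still ordinary variables (they are instances 
of Var); what differs is only the extra attributes tied to their role 
in the cube.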

Please note that this use of the terms (as well as the logic) is fully 
consistent with the way the concepts dimension and measure are used in 
the OLAP, data warehouse and data mining communities.

One reason it is easy to forget that a measure really is a variable lies 
in the relationship between crosstabs and cubes. If you use SPSS to 
create a crosstab from micro data, you only specify the dimension 
variables, but you are still creating a cube. The reason is that the 
measure variable, which in this case is the count/frequency of the sample 
or population that the micro data describe, is implicit (by crossing 
gender and age in a census, you get the population count for age- and 
gender-groups). Population is not a defined variable in the micro 
dataset, but it is still a variable in statistical terms. This can 
easily be seen if, instead of running a standard crosstab, you run an 
old-fashioned crossbreak and add a variable like income as a "summary" 
variable (and ask for the aggregation method "mean"). You will then get 
another table/cube displaying mean income for age- and gender-groups. 
The cube has the same dimensionality as the previous one, but another 
measure variable.
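
The crosstab/crossbreak point can be illustrated with a small Python 
sketch. The micro data, the function name build_cube and its parameters 
are made up for illustration; this is not how SPSS or Nesstar implement 
it, only a plain restatement of the logic.

```python
from collections import defaultdict

# Hypothetical micro data: one record per person.
persons = [
    {"gender": "F", "age": "young", "income": 100},
    {"gender": "F", "age": "old", "income": 300},
    {"gender": "M", "age": "young", "income": 200},
    {"gender": "M", "age": "young", "income": 400},
    {"gender": "M", "age": "old", "income": 500},
]


def build_cube(records, dims, summary=None):
    """Aggregate records into cells keyed by their dimension values.

    Without a summary variable the measure is the implicit count of
    records (a standard crosstab). With a summary variable the measure
    is the mean of that variable (a crossbreak).
    """
    cells = defaultdict(list)
    for rec in records:
        key = tuple(rec[d] for d in dims)
        cells[key].append(1 if summary is None else rec[summary])
    if summary is None:
        return {key: sum(values) for key, values in cells.items()}
    return {key: sum(values) / len(values) for key, values in cells.items()}


# Same dimensions, two different measure variables.
counts = build_cube(persons, ["gender", "age"])
mean_income = build_cube(persons, ["gender", "age"], summary="income")
```

Both results have exactly the same cells (gender by age); only the 
content of the cells, i.e. the measure, differs.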

So, please do not make radical changes to the measure-element that would 
prevent us from meeting this very basic and standard requirement.

All the best,
Jostein