[DDI-ADG] nCube elements
Jostein Ryssevik
Jostein.Ryssevik at nsd.uib.no
Tue Aug 30 04:52:51 EDT 2005
At 16:20 29.08.2005 -0400, Mary Vardigan wrote:
>
><measure> 4.4.13 Measure
>
>
> * Optional
> * Repeatable
> * Attributes:
> ID, xml:lang, source, varRef, aggrMeth, measUnit, scale, origin,
> additivity
>
>Description: The element measure indicates the measurement features of the
>cell content: type of aggregation used, measurement unit, and measurement
>scale. An origin point is recorded for anchored scales, to be used in
>determining relative movement along the scale. Additivity indicates
>whether an aggregate is a stock (like the population at a given point in
>time) or a flow (like the number of births or deaths over a certain period
>of time). The non-additive flag is to be used for measures that for
>logical reasons cannot be aggregated to a higher level - for instance,
>data that only make sense at a certain level of aggregation, like a
>classification. Two nCubes may be identical except for their measure - for
>example, a count of persons by age and percent of persons by age. Measure
>is an empty element that includes the following attributes: "varRef" is an
>IDREF; "aggrMeth" indicates the type of aggregation method used, for
>example 'sum', 'average', 'count'; "measUnit" records the measurement
>unit, for example 'km', 'miles', etc.; "scale" records unit of scale, for
>example 'x1', 'x1000'; "origin" records the point of origin for anchored
>scales; "additivity" records type of additivity such as 'stock', 'flow',
>'non-additive'.
>
>We are assuming that this element will be replaced by a "container" of
>attributes or characteristics that will apply at all levels, some of which
>J will be supplying from SDMX. Thus, the existing attributes like
>"Aggregation Method" will become part of this larger set of
>characteristics. "Measure" is a particularly problematic term because
>Nesstar uses it in a different way to mean the variable itself.
Let me explain why Nesstar is using the measure element in this way, and
why I think it makes sense to think twice before we make radical changes to
this construct.
In the nCube element, the dmns- and measure-elements play similar
roles. Both point back to var-elements in the variable description
section, and in this way indicate how concrete variables are used to
construct a multidimensional table. The dmns-elements list the
variables that establish the dimensionality of the table/cube, and the
measure-elements list the variables that populate the cells of the
cube. This can be a single variable, or multiple variables in a
multi-measure cube. In addition, both elements hold a series of
attributes that add cube-specific variable information that is
missing from the var-elements. For dimensions, the most important is
cohort, which describes which parts of a variable/classification are
actually used in the dimension. For measures, the most important
attributes relate to the logical and mathematical properties of the
measure, like aggregation method, additivity etc. In OO terms you could say
that measure and dmns both inherit from their var-elements and add a few
more attributes that are specific to the role the variables play in the cube.
Please note that this use of the terms (as well as the logic) is fully
consistent with the way the concepts dimension and measure are used in the
OLAP, data warehouse and data mining communities.
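To make the inheritance analogy concrete, here is a minimal sketch in Python.
It is an illustration only, not DDI syntax: the class names and field names
are hypothetical, loosely mirroring the attributes in the quoted definition
(aggrMeth, measUnit, additivity) and the cohort attribute mentioned above.
In the DTD itself the link to the variable is a reference (varRef), which is
modelled here as subclassing purely to follow the analogy.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Var:
    """A variable as described in the variable description section."""
    id: str
    label: str


@dataclass
class Dmns(Var):
    """A variable in the role of a cube dimension (hypothetical model)."""
    cohort: Optional[str] = None  # which part of the classification is used


@dataclass
class Measure(Var):
    """A variable in the role of cell content (hypothetical model)."""
    aggr_meth: Optional[str] = None   # e.g. 'sum', 'average', 'count'
    meas_unit: Optional[str] = None   # e.g. 'km', 'miles'
    additivity: Optional[str] = None  # 'stock', 'flow' or 'non-additive'


@dataclass
class NCube:
    """A cube is a list of dimension variables plus a list of measure variables."""
    dimensions: List[Dmns] = field(default_factory=list)
    measures: List[Measure] = field(default_factory=list)


# Population by gender and age: two dimensions, one measure.
cube = NCube(
    dimensions=[Dmns("V1", "Gender"), Dmns("V2", "Age", cohort="5-year groups")],
    measures=[Measure("V3", "Population", aggr_meth="count", additivity="stock")],
)
```

Both Dmns and Measure are still variables in this sketch; they only add the
attributes that matter for their role in the cube.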
One reason it is easy to forget that a measure really is a variable lies in the
relationship between crosstabs and cubes. If you use SPSS to create a
crosstab from micro data you only specify the dimension variables,
but you are still creating a cube. The reason is that the measure variable,
which in this case is the count/frequency of the sample or population
that the micro data describe, is implicit (by crossing gender and age in a
census, you get the population count for age and gender groups).
Population is not a defined variable in the micro dataset, but it is still a
variable in statistical terms. This can easily be seen if, instead of
running a standard crosstab, you run an old-fashioned crossbreak and add a
variable like income as a "summary" variable (and ask for the aggregation
method "mean"). You will then get another table/cube displaying mean income
for age and gender groups. The cube has the same dimensionality as the
previous one, but another measure variable.
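The crosstab/crossbreak comparison can be sketched in a few lines of Python.
This is an illustration under stated assumptions, not SPSS output: the micro
data records, the function name build_cube and its parameters are all made up
for the example.

```python
from collections import defaultdict

# Hypothetical micro data: one record per person.
people = [
    {"gender": "F", "age": "young", "income": 100},
    {"gender": "F", "age": "young", "income": 300},
    {"gender": "F", "age": "old", "income": 400},
    {"gender": "M", "age": "young", "income": 200},
    {"gender": "M", "age": "old", "income": 500},
    {"gender": "M", "age": "old", "income": 700},
]


def build_cube(records, dims, measure=None):
    """Aggregate records into cells keyed by the dimension values.

    With measure=None the cell content is the implicit count measure
    (a plain crosstab); otherwise it is the mean of the named summary
    variable (a crossbreak with aggregation method "mean").
    """
    cells = defaultdict(list)
    for rec in records:
        key = tuple(rec[d] for d in dims)
        cells[key].append(1 if measure is None else rec[measure])
    if measure is None:
        return {key: sum(vals) for key, vals in cells.items()}
    return {key: sum(vals) / len(vals) for key, vals in cells.items()}


# Same dimensions, two different measure variables.
counts = build_cube(people, ["gender", "age"])
mean_income = build_cube(people, ["gender", "age"], measure="income")
```

Both results have exactly the same cell keys (the same dimensionality); only
the variable that populates the cells differs.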
So, please do not make radical changes to the measure-element that will
prevent us from meeting this very basic and standard requirement.
All the best,
Jostein