[DDI-ADG] More on aggregate data.
Sanda Ionescu
sandai at icpsr.umich.edu
Fri Aug 26 15:16:20 EDT 2005
Hi, all.
First of all, thank you, Jostein, for your messages - I think they're
really helpful in moving us along.
While we talk about aggregate data, I think it is important to keep in mind
the modular structure we envisage for Version 3.0.
A desirable scenario for covering aggregate data might be to end up with
three different modules:
1) a module documenting the logical structure of the data (provided for in
the Logical Product - nCube package in the V 3.0 spreadsheet) and including
dimension and cube descriptions.
2) a module primarily designed for data exchange, containing both data and
some metadata, modeled after the SDMX specification - I particularly liked
the example I sent yesterday, extracted from the Generic Sample, although I
am not sure what level of validation we would actually need.
3) finally, a module describing the physical structure of an external data
file that we (the archive) might choose to describe and distribute in a
legacy format (like Census data, etc.). This would be an (improved?) version
of the Phys. Rec. Structure Package in the V 3.0 spreadsheet.
Obviously, there will be links (cross-references) between the modules,
particularly between 1) and 3), and between 1) and 2).
With these three modules, producers or distributors would have the
flexibility to use any combination of data and metadata they would find
suitable to their purposes, and the data could sit either within or outside
the DDI instance.
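To make the cross-referencing idea a bit more concrete, here is a rough
sketch in Python of how the three modules might point at each other. All
class and field names (LogicalCube, ExchangeDataset, PhysicalFile, cube_ref,
and so on) are made up for illustration and are not taken from the V 3.0
spreadsheet or from SDMX.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Dimension:
    name: str
    codes: List[str]


@dataclass
class LogicalCube:
    """Module 1 (hypothetical shape): logical structure only, no data."""
    cube_id: str
    dimensions: List[Dimension]


@dataclass
class ExchangeDataset:
    """Module 2 (hypothetical shape): data plus a reference to a cube."""
    cube_ref: str
    observations: Dict[Tuple[str, ...], float] = field(default_factory=dict)


@dataclass
class PhysicalFile:
    """Module 3 (hypothetical shape): an external file tied to a cube."""
    cube_ref: str
    uri: str
    delimiter: str = ","


def unresolved_refs(cubes, others):
    """Return cube references in modules 2/3 that match no module 1 cube."""
    known = {c.cube_id for c in cubes}
    return [o.cube_ref for o in others if o.cube_ref not in known]


cube = LogicalCube(
    "pop_by_sex_year",
    [Dimension("sex", ["male", "female"]), Dimension("year", ["2000", "2001"])],
)
exchange = ExchangeDataset("pop_by_sex_year", {("male", "2000"): 10.0})
legacy = PhysicalFile("pop_by_sex_year", "pop.csv")
```

Since modules 2) and 3) only carry a reference back to module 1), a
producer could ship any combination of them, and a check like
unresolved_refs would flag references that do not resolve.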
Some questions:
Module 1) -- what do we need to add to make it more functional (and
SDMX-compatible)? J ??
Also, while I'm looking at the above-mentioned spreadsheet, I notice that
variables are described twice, once as "variables" and once as "variable
dimensions". I think that's probably a mistake -- in this module we only
need to describe "dimensions."
Module 2) -- I fully agree with Jostein's remark that time variables need
to be accounted for as "dimensions". Other than that, what other
changes/adjustments do we need? And, I'm sure others will agree, even if we
adopt a structure similar to SDMX, we might want tag names that are more
suggestive of their contents. (J, I'm afraid we might need to rely on you
to provide an outline of this section, when we agree on what goes in.)
Module 3) -- Right now the LocMap only provides for identifying cells in a
flat delimited file. Do we want to add anything here?
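For reference, identifying a cell in a flat delimited file could look
roughly like the Python sketch below. This is only my assumption of the
general idea: the loc_map layout (cell coordinates mapped to a zero-based
row and column) and the read_cell helper are hypothetical and do not
reproduce the actual LocMap definition.

```python
import csv
import io

# Hypothetical locator map: cube cell coordinates -> (row, column),
# both zero-based, in a flat delimited file.
loc_map = {
    ("male", "2000"): (0, 0),
    ("male", "2001"): (0, 1),
    ("female", "2000"): (1, 0),
    ("female", "2001"): (1, 1),
}


def read_cell(text, coords, loc_map, delimiter=","):
    """Look up one cell value in delimited text via the locator map."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    row, col = loc_map[coords]
    return rows[row][col]


# Two records, one per sex; two fields, one per year (made-up values).
sample = "10,12\n11,13\n"
value = read_cell(sample, ("female", "2001"), loc_map)
```

Anything beyond this (fixed-width columns, multiple record types, several
cells per field) would need more than a row/column pair per cell.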
Sanda.
Sanda Ionescu,
Research Associate
Inter-university Consortium for Political and Social Research (ICPSR)
The University of Michigan
P.O. Box 1248
Ann Arbor, MI 48106
Phone: (734) 615-7890
Fax: (734) 615-7890
(734) 647-8200