[DDI-ADG] progress on aggregate data?
Wendy Thomas
wlt at pop.umn.edu
Wed Aug 24 12:35:06 EDT 2005
I just wanted to note that being able to collapse a dimension or create
totals does not depend on being able to describe a 2- or 3-dimensional
storage unit. We can currently do the following with the 2.0 nCube
description:
Identify a specific cell of data based on its matrix coordinate.
For example, locate male, american indian, 18-24 years of age by determining
the catValu values of those labels in each dimension (1,3,3), then locating
the data item for the specified nCube with that coordinate pattern and using
the phyLoc info to pull the cell content.
Collapse along any dimension. This is simply a function of whether the
cell contents are "additive" (stock) and can currently be handled by DDI.
The problem arises with nested categories. With the new nested category
added to 2.1 we can do "clean" nested categories using the levels.
However, there is no explicit information concerning the additive nature
of these and there is no way to deal well with ragged hierarchies that
don't "behave well".
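
To make the two operations above concrete, here is a rough sketch in Python. It is not DDI markup or any existing tool: the category labels and codes, the in-memory `cells` dictionary standing in for the physical store, and the `locate` and `collapse` helpers are all assumptions made up for illustration. It only shows that cell lookup and collapsing work from the logical coordinates alone, provided the cell contents are additive.

```python
from itertools import product

# Hypothetical category codes for each dimension; labels and codes are
# illustrative, chosen so the example cell lands on coordinate (1, 3, 3).
SEX = {"male": 1, "female": 2}
RACE = {"white": 1, "black": 2, "american indian": 3}
AGE = {"0-11": 1, "12-17": 2, "18-24": 3}

# Stand-in for the physical store: coordinate tuple -> cell value.
# In practice the value would be read via the cell's location info.
cells = {
    coord: 100 * coord[0] + 10 * coord[1] + coord[2]
    for coord in product(SEX.values(), RACE.values(), AGE.values())
}


def locate(sex, race, age):
    """Translate labels to category codes and return (coordinate, value)."""
    coord = (SEX[sex], RACE[race], AGE[age])
    return coord, cells[coord]


def collapse(cube, axis):
    """Sum out one dimension; only meaningful for additive cell contents."""
    totals = {}
    for coord, value in cube.items():
        reduced = coord[:axis] + coord[axis + 1:]
        totals[reduced] = totals.get(reduced, 0) + value
    return totals


coord, value = locate("male", "american indian", "18-24")
by_sex_race = collapse(cells, axis=2)  # totals over all age groups
```

Note that `collapse` simply assumes every cell is additive and that the categories in a dimension are flat. Nothing in the sketch handles nested or ragged hierarchies, which is exactly where the explicit information is missing.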
So the issue of describing a 2 or 3 dimensional physical store is simply
the lack of a way to address it...not what you can do with the data
described by the nCube.
Wendy
On Wed, 24 Aug 2005, Katherine McNeill-Harman wrote:
> At the end of yesterday's phone conversation, I mentioned to J that I
> thought that we had the least documentation on what changes/structures to
> suggest for aggregate data (i.e. no single spreadsheet from which we were
> working). So he's going to try to compile something but, as he said in his
> other email, we need to pool our thoughts on this. So I'm starting the
> ball rolling.
>
> Following are the goals for aggregate data we'd outlined in Edinburgh; what
> changes have we agreed upon that will accomplish these?
> - Accommodate self-documenting data files in formats w/ integrated data
> and metadata (e.g. Excel files).
> - Evaluate broad utility of nCubes
> - Need ability to describe method of aggregation
> - Need of additional tags to describe aggregate data (not nCubes)
> - Review tag names
> - Role of modules for different kinds of data
> - Align to SDMX
>
> I've been looking over my notes on our changes/proposals; here is what's
> been said in principle, but again, we'll want to document this and consider
> how to accomplish it (I put the date when I have it discussed):
>
> - DDI is missing a way to describe the physical structure of a spreadsheet;
> need physical description for rows/columns/layers to say how they relate to
> each other; this would enable machine-actionable collapsing of rows or
> columns (e.g. collapsing of age groups) and creation of subtotals and
> totals; it should also accommodate various levels and irregular nested
> categories, and be able to identify the lowest level (8/9)
> - (8/9)
> - need to be able to mark up and represent existing tables (e.g. from print
> volumes) (8/16)
> - enable creation of a single file containing both data and metadata; this
> format would be optional and could be applied when appropriate (8/16)
> - unlike SDMX, enable a single file that contains both the data and the
> structure (8/23)
> - be able to apply attribute information at all levels (from cells on up);
> could add to n-cube in the measure element a sub-element that defines
> attributes that can be attached to any level; provide a structure by which
> authors could define these. However, it's not the case as with other 3.0
> features that items at a lower level override things at a higher level;
> therefore, the structure will need to be such that it's clear that
> attributes can be defined only at one chosen level (i.e. can't have
> conflicting attributes at different levels). (8/23)
> - ability to locate the desired cell within the cube (8/23)
> - hinging is important, yet may be addressed by comparative data group; SRG
> liaison will check (8/23)
>
> I'd ask others to help add to and clarify these. In addition, many of the
> above I just have articulated as goals, so I'm not clear if we've yet
> figured out how to accomplish these.
>
> Kate
>
> ___________________________________________
> Katherine McNeill-Harman
> Data Services Librarian
> Dewey Library for Management and Social Sciences
> Massachusetts Institute of Technology
> 77 Massachusetts Avenue, E53-100
> Cambridge, MA 02139
> mcneillh at mit.edu
> 617-253-0787
>
> _______________________________________________
> DDI-ADG mailing list
> DDI-ADG at icpsr.umich.edu
> http://www.icpsr.umich.edu/mailman/listinfo/ddi-adg
>
Wendy L. Thomas Phone: +1 612.624.4389
Data Access Core Director Fax: +1 612.626.8375
Minnesota Population Center Email: wlt at pop.umn.edu
University of Minnesota
50 Willey Hall
225 19th Avenue South
Minneapolis, MN 55455