[DDI-ADG] progress on aggregate data?

Mary Vardigan maryv at icpsr.umich.edu
Thu Aug 25 08:45:50 EDT 2005


Thanks much, Kate. Responses below. --Mary and Sanda

At 10:29 AM 8/24/2005, Katherine McNeill-Harman wrote:
>At the end of yesterday's phone conversation, I mentioned to J that I 
>thought that we had the least documentation on what changes/structures to 
>suggest for aggregate data (i.e. no single spreadsheet from which we were 
>working).  So he's going to try to compile something but, as he said in 
>his other email, we need to pool our thoughts on this.  So I'm starting 
>the ball rolling.
>
>Following are the goals for aggregate data we'd outlined in Edinburgh; 
>what changes have we agreed upon that will accomplish these?
>- Accommodate data files in formats with integrated data and metadata 
>(e.g., Excel files), i.e., self-documenting files.

We agree that this is a worthy goal and suggest that data values be 
incorporated into the cell description (Data Item within LocMap), which is 
where Wendy placed the physical table cell description. This sets up 
two options: (1) describing the position of the data value in a separate 
rectangular file (already in the spec), and (2) describing the position of 
the row and column, plus the data value itself, to produce a self-documenting 
table. Wendy, you mentioned that defining row and column is not enough to 
specify a physical table. Could you, J, and Jostein perhaps identify all the 
elements needed?
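To make the two options concrete, here is a minimal Python sketch of how a processor might resolve a cell's value under each one. The class and function names (FileLocation, TableCell, resolve_value) are hypothetical and do not correspond to DDI elements, and the fixed-width layout assumed for option (1) is only an illustration:

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical sketch: these classes are illustrative only and do not
# mirror actual DDI element or attribute names.


@dataclass
class FileLocation:
    """Option (1): the value lives in a separate rectangular (fixed-width) file."""
    record: int     # 0-based line number in the data file
    start_col: int  # 1-based starting column, as in fixed-width layouts
    width: int      # number of characters occupied by the value


@dataclass
class TableCell:
    """Option (2): row/column position plus the value itself (self-documenting)."""
    row: int
    column: int
    value: str


DataItem = Union[FileLocation, TableCell]


def resolve_value(item: DataItem, data_lines: list[str]) -> str:
    """Return the data value for a cell under either option."""
    if isinstance(item, TableCell):
        # The value travels with its description; no data file is consulted.
        return item.value
    # Otherwise slice the value out of the referenced fixed-width record.
    line = data_lines[item.record]
    start = item.start_col - 1
    return line[start:start + item.width].strip()
```

With a one-line data file "  1234  5678", FileLocation(0, 7, 6) resolves to "5678", while a TableCell carries its value directly and needs no file at all.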

>- Evaluate broad utility of nCubes

Consensus seemed to be that the logical description of the aggregate data 
nCubes is fine. What we need to improve is the physical description, which 
we addressed above.

>- Need ability to describe method of aggregation

We have this -- an existing attribute called aggMeth.

>- Need of additional tags to describe aggregate data (not nCubes)

Not sure what this means. SDMX has attributes, which J mentioned in the 
last call, that carry information about the measurement, such as source and 
observation status (projection, actual count, etc.). Adding this 
information could help make the nCubes more robust and make the DDI more 
compatible with SDMX.

>- Review tag names

This is important and we haven't done it yet. Sanda and I will try to take 
a stab at this next week, and then we can all review it. For example, 
"cohort" is not exactly the right term for how it is used.

>- Role of modules for different kinds of data

Not sure what is meant by this. Have we covered the two main possibilities 
through options (1) and (2) above?

>- Align to SDMX

J, we need your help in determining what is needed here. If we include the 
values and add the attributes you mentioned, are there other things we need 
to do?


>I've been looking over my notes on our changes/proposals; here is what's 
>been said in principle, but again, we'll want to document this and 
>consider how to accomplish it (I've noted the date of discussion where I 
>have it):
>
>- DDI is missing a way to describe the physical structure of a 
>spreadsheet; need physical description for rows/columns/layers to say how 
>they relate to each other; this would enable machine-actionable collapsing 
>of rows or columns (e.g. collapsing of age groups) and creation of 
>subtotals and totals; it should also accommodate various levels and 
>irregular nested categories, and be able to identify the lowest level (8/9)

Not sure what layers are.
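As a minimal sketch of what the first item in Kate's list could enable, assume each category in a dimension records its parent category (the age codes and the function names below are hypothetical, not DDI elements). Subtotals, totals, the lowest level, and the collapsing of age groups then follow mechanically, even with irregular nesting:

```python
from collections import defaultdict

# Hypothetical age dimension: each category code maps to its parent code
# (None for the top-level total). Nesting may be irregular.
PARENT = {
    "total": None,
    "0-17": "total",
    "18-64": "total",
    "65+": "total",
    "0-4": "0-17",
    "5-17": "0-17",
}


def children_of(parent_map):
    """Invert the parent map: category code -> list of child codes."""
    kids = defaultdict(list)
    for code, parent in parent_map.items():
        if parent is not None:
            kids[parent].append(code)
    return kids


def lowest_level(parent_map):
    """Categories with no children, i.e. the most detailed level."""
    kids = children_of(parent_map)
    return {code for code in parent_map if code not in kids}


def rollup(parent_map, leaf_values):
    """Compute every subtotal and the total by summing leaves up the tree."""
    kids = children_of(parent_map)
    totals = {}

    def value(code):
        if code not in totals:
            if code in kids:
                totals[code] = sum(value(child) for child in kids[code])
            else:
                totals[code] = leaf_values[code]
        return totals[code]

    for code in parent_map:
        value(code)
    return totals


def collapse(parent_map, leaf_values, group):
    """Replace the rows directly under `group` with one row for `group`.

    Assumes the children of `group` are themselves leaves.
    """
    kids = children_of(parent_map)
    if group not in kids:
        raise ValueError(f"{group!r} has no child categories to collapse")
    merged = {c: v for c, v in leaf_values.items() if c not in kids[group]}
    merged[group] = sum(leaf_values[c] for c in kids[group])
    return merged
```

For example, with leaf counts of 10, 20, 50, and 15 for 0-4, 5-17, 18-64, and 65+, rollup gives 30 for 0-17 and 95 for the total, and collapsing 0-17 merges its two rows into one.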

>- (8/9)
>- need to be able to mark up and represent existing tables (e.g. from 
>print volumes) (8/16)
>- enable creation of a single file containing both data and metadata; this 
>format would be optional and could be applied when appropriate (8/16)
>- unlike SDMX, enable a single file that contains both the data and the 
>structure (8/23)
>- be able to apply attribute information at all levels (from cells on up); 
>we could add to the measure element of the nCube a sub-element that defines 
>attributes that can be attached to any level, providing a structure by which 
>authors could define these.  However, unlike other 3.0 features, items at a 
>lower level would not override those at a higher level; therefore, the 
>structure will need to make clear that each attribute can be defined at only 
>one chosen level (i.e., there can't be conflicting attributes at different 
>levels). (8/23)

J, please help us determine the level at which the SDMX type attributes 
should apply.
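Kate's note proposes that each attribute be definable at only one chosen level, so that values at different levels can never conflict. One way a processor might enforce that rule is sketched below, assuming each attribute definition names its single level. The level names and the attribute names (including the SDMX-style observation status) are illustrative assumptions, not agreed DDI or SDMX terms:

```python
# Hypothetical level names, from the whole cube down to individual cells.
LEVELS = ("cube", "dimension", "category", "cell")


def validate_attachments(definitions, attachments):
    """Check attribute attachments against single-level definitions.

    definitions: attribute name -> the one level where it may be attached
    attachments: list of (attribute name, level, target id) tuples
    Returns a list of error messages; an empty list means the instance is valid.
    """
    errors = []
    # Every definition must name a known level.
    for name, level in definitions.items():
        if level not in LEVELS:
            errors.append(f"{name}: unknown level {level!r}")
    # Every attachment must use a defined attribute at its declared level.
    for name, level, target in attachments:
        allowed = definitions.get(name)
        if allowed is None:
            errors.append(f"{name} on {target}: attribute not defined")
        elif level != allowed:
            errors.append(f"{name} on {target}: defined for {allowed}, not {level}")
    return errors
```

Because an attribute can only ever appear at its one declared level, the question of which value wins never arises, which is the property the proposal asks for.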

>- ability to locate the desired cell within the cube (8/23)
>- hinging is important, yet may be addressed by comparative data group; 
>SRG liaison will check (8/23)

Do layers relate to hinging? Hinging is possible now but only within one 
DDI instance. Between two instances, we have to establish comparability at 
the variable level.

One other point, related to Wendy's earlier message: We are still confused 
when the discussion moves to 2- versus 3-dimensional structures. We need 
examples. Is Census data format 2 or 3 dimensional? How about an Excel 
spreadsheet? What are the differences in dimensionality?
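On the item about locating the desired cell within the cube (8/23): if cells are stored as a flat sequence in a known order, a cell's coordinates determine its position arithmetically, and the number of dimensions only changes how many coordinates are supplied. A minimal sketch, assuming row-major order with the last dimension varying fastest (the function names are hypothetical):

```python
from math import prod


def cell_offset(shape, coords):
    """Row-major position of a cell in a flattened n-dimensional cube.

    shape:  number of categories in each dimension, e.g. (rows, cols)
    coords: 0-based category index along each dimension
    The last dimension varies fastest (an assumed storage order).
    """
    if len(shape) != len(coords):
        raise ValueError("need one coordinate per dimension")
    offset = 0
    for size, index in zip(shape, coords):
        if not 0 <= index < size:
            raise IndexError(f"index {index} out of range for size {size}")
        offset = offset * size + index
    return offset


def cell_coords(shape, offset):
    """Inverse of cell_offset: recover coordinates from a flat position."""
    if not 0 <= offset < prod(shape):
        raise IndexError("offset out of range")
    coords = []
    for size in reversed(shape):
        offset, index = divmod(offset, size)
        coords.append(index)
    return tuple(reversed(coords))
```

For a hypothetical 2 x 3 x 4 cube, cell_offset((2, 3, 4), (1, 2, 3)) is 23, the last cell, and cell_coords((2, 3, 4), 23) returns (1, 2, 3); a two-dimensional table uses the same functions with a two-element shape.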



>I'd ask others to help add to and clarify these.  In addition, I have only 
>articulated many of the above as goals, so I'm not clear whether we've yet 
>figured out how to accomplish them.
>
>Kate
>
>___________________________________________
>Katherine McNeill-Harman
>Data Services Librarian
>Dewey Library for Management and Social Sciences
>Massachusetts Institute of Technology
>77 Massachusetts Avenue, E53-100
>Cambridge, MA 02139
>mcneillh at mit.edu
>617-253-0787

Mary Vardigan
Assistant Director
Inter-university Consortium for Political and Social Research (ICPSR)
University of Michigan
P.O. Box 1248, Ann Arbor, MI 48106-1248
Phone: 734-615-7908
Fax: 734-647-8200
www.icpsr.umich.edu 


