[DDI-ADG] Concerns about nCube spec?

Wendy Thomas wlt at pop.umn.edu
Fri Mar 19 12:06:39 EST 2004


Actually, I discussed this with the SRG yesterday during our weekly
meeting. This is really a storage issue and not an aggregate data issue.
Any data can be stored in a variety of formats not currently defined
within DDI. Even I-lin (the author of the statement in question) agreed
that it has no more to do with aggregate data description than it has to
do with microdata description; it's just that aggregate data was the
context within which he first heard about it.

There are a number of things about the basic aggregate description model
that can be improved. I can certainly try to come up with a quick list as
we've been using it for a few years now and have run into a lot of
refinements we'd like to have. Basically I suggest that the group
brainstorm functionality that they'd like to have, relationships that need
to be defined, etc., and work from that to determine what the current model
doesn't handle or what it could handle better. Also, there should be a
Manual for Proposal Development coming out from Mary, Tom and SRG within a
month which should also help the group frame the discussion and work.

I'd really say let the description of other storage methods sit for a bit.
The group already has three big chunks to deal with (aggregate, geography,
time) and, believe me, having made a stab at describing generic 2- and
3-dimensional storage (whence we have the element <blsht> "basic layer
sheet" or supply your own vowels), you can get a sense of the framer's
state of mind by the end of the day. It's really possible that the
discussions of the SRG over the next few months will provide a better
framework for dealing with the question of storage description.

Don't intend to throw a wet blanket on this. If you all feel that this has
priority over the other areas, go for it. You should just be aware that
given the length of the review process and the date of Version 3.0 you
would need to have any proposals described and ready to begin the review
process by September 2004 for inclusion in Version 3.0 (assuming it moves
smoothly through review). I think it would be good to have dealt with at
least 2 of the 3 areas of this group in Version 3.0.

Wendy



On Fri, 19 Mar 2004, Julie Linden wrote:

> Thanks for the response. I'm just trying to understand the issues here; I
> apologize to others for whom these issues may be very clear and familiar.
> Would one approach for our group be to try to figure out how to describe
> these other types of storage systems that the DDI does not currently cover?
> And if so, would we try to do so within the DDI framework? How would our
> efforts fit into the work that the Structural Reform Group is doing -- do
> we need to wait for the SRG to get further along in its work? If the DDI is
> going to end up as more modular -- and maybe that's not the right word or
> even the right concept, so I hope someone will correct me! -- then
> aggregate data could potentially be described by the appropriate aggregate
> data "module" -- e.g. there would be one module for fixed / delimited
> records, one module for bundled arrays, one for CUBE storage, etc?
>
> Thanks
> Julie
>
> At 01:14 PM 3/17/2004 -0600, Wendy Thomas wrote:
> >Hi,
> >
> >I could be wrong but I believe this is addressing the issue that aggregate
> >data is frequently held in 2 and 3 dimensional storage systems
> >(spreadsheets, layered spreadsheets) or bundled as data objects where a
> >"cell" contains an array of items in a fixed order. The locMap does not
> >address any new types of storage and DDI only addresses fixed and delimited
> >records. This is really a separate issue from aggregate data description
> >as any type of file could be stored in these alternative formats. So what
> >the locMap supplies is the link to the data item (cell) description by
> >giving you its matrix (nCube) number and cell coordinates. Currently the
> >phyLoc line can only provide a pointer to a fixed format or delimited
> >file. This is one of the problems for the "European aggregate" data. CBS
> >was using CUBE storage (3 dimensional) and I know that Jostein had
> >described their data storage system as one of those using bundled arrays.
> >
> >NHGIS is making use of the current aggregate description to search for
> >data items and tables, create the table template on the fly and populate
> >it with data from a fixed format data file containing multiple nCubes per
> >record of data. NESSTAR uses it for describing and manipulating files of
> >single nCubes for multiple locations.
> >
> >Wendy Thomas
> >
> >On Wed, 17 Mar 2004, Julie Linden wrote:
> >
> > > Hello everyone,
> > >
> > > I've been thinking about how to address the "aggregate" part of our
> > > Working Group's charge. The "Possible Configuration of DDI Working Groups"
> > > document that was distributed at the Expert Committee meeting in October
> > > states: "While considerable time and effort have already gone into the
> > > creation of an aggregate/tabular extension to the existing DDI
> > > specification (nCubes), there is concern that the aggregate model may be
> > > overly complex. The group needs to take a fresh look at this issue."
> > >
> > > As someone who is just beginning to get familiar with how the current DDI
> > > handles aggregate data, it's hard for me to begin envisioning how it could
> > > be simplified or overhauled. I thought that perhaps a starting point would
> > > be to review what concerns have been raised. I read through the Structural
> > > Reform Group's postings on ezboard, and found one comment that suggests a
> > > concern, but doesn't spell it out:
> > >
> > > "Logical/Physical file format mapping: How are the logical concepts in
> > > the DDI mapped to the underlying physical files? What kinds of physical
> > > file formats are there (rectangular, cards, SPSS, STATA, SAS, Census
> > > aggregate data, European aggregate data)? Should DDI even be tackling this
> > > question? There is an existing difference of opinion already regarding
> > > this in the nCubes specification."
> > >
> > > Can someone on this group describe the issues/concerns explicitly?
> > >
> > > thanks,
> > > Julie
> > >
> > >
> > >
> > >
> > > _______________________________________________
> > > DDI-ADG mailing list
> > > DDI-ADG at icpsr.umich.edu
> > > http://www.icpsr.umich.edu/mailman/listinfo/ddi-adg
> > >
> >
> >Wendy L. Thomas                          Phone: +1 612.624.4389
> >Data Access Core Director               Fax:   +1 612.626.8375
> >Minnesota Population Center              Email: wlt at pop.umn.edu
> >University of Minnesota
> >537 Heller Hall
> >271 19th Avenue South
> >Minneapolis, MN 55455
>
>

Wendy L. Thomas                          Phone: +1 612.624.4389
Data Access Core Director                Fax:   +1 612.626.8375
Minnesota Population Center              Email: wlt at pop.umn.edu
University of Minnesota
537 Heller Hall
271 19th Avenue South
Minneapolis, MN 55455


