[DDI-ADG] Latest Spreadsheet

Katherine McNeill-Harman mcneillh at MIT.EDU
Wed Oct 12 10:02:33 EDT 2005


For others' information, here's the response I got from J on the issue of 
how to describe module 3:

"The point I was trying to avoid is that module 3 *has* to exist in the 
same instance as the meta-data.  Although it most likely often will, it may 
not be the case.  I believe the model is constructed such that one could 
have a separate XML instance for each component (imagine the course of the 
life cycle of a study).  One could have defined the logical structure, and 
everything else in one instance when designing the study.  Some time later, 
the data may actually be collected, at which point they may create another 
instance (with module 3 containing the data), which references the earlier 
file containing the meta data. So they are separate files.  The key to 
module 3 is that the data doesn't exist external to the DDI instance 
describing the physical structure.  Nonetheless, I am not worried about 
this point being lost on the SRG.  I will make sure that the final 
description accurately describes our intent."

Does this explain it to others?  I guess I understand it a bit more, but 
that does still seem to say that the data has to exist in the same instance 
as at least part of the metadata (the physical structure), even if you end 
up w/other DDI files applying to the study.
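
To make that reading concrete, here is a minimal sketch of the rule as I understand it from J's note: inline (module 3) data may sit in a different instance from the logical structure, but not apart from the instance describing the physical structure. The `Instance` class, its field names, and the file names are all hypothetical, invented for illustration; none of them come from the DDI model or the spreadsheet.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instance:
    """Hypothetical stand-in for one DDI XML instance (one file)."""
    name: str
    has_logical_structure: bool = False
    has_physical_structure: bool = False
    has_inline_data: bool = False
    references: Optional[str] = None  # name of an earlier instance, if any


def module3_ok(inst: Instance) -> bool:
    """Inline data is acceptable only alongside its physical structure."""
    return (not inst.has_inline_data) or inst.has_physical_structure


# Study design time: logical structure only, no data yet.
design = Instance("design.xml", has_logical_structure=True)

# Later: data collected inline, with the physical structure, pointing back
# at the earlier metadata file.
collected = Instance("collected.xml", has_physical_structure=True,
                     has_inline_data=True, references="design.xml")

# Not allowed under this reading: inline data with no physical structure.
data_only = Instance("data_only.xml", has_inline_data=True,
                     references="design.xml")

for inst in (design, collected, data_only):
    print(inst.name, module3_ok(inst))
```

Under this reading, `collected.xml` is a valid second instance, while `data_only.xml` would have to be described under module 1 or 2 instead.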

I just want to make sure that everyone is comfortable with what we're sending.
Kate

At 05:24 PM 10/11/2005 -0400, Katherine McNeill-Harman wrote:
>Understood--and others, don't be confused by the order of responses; I 
>believe J was responding to my question about a
>spreadsheet that contains multiple sheets, not to Ilona's comment about 
>Module 3.
>
>And I see that J also sent out a "final" version of the package to us; I 
>sent out a separate email directly to him suggesting that we stick w/what 
>seems to be our collective understanding of module 3 referring to a single 
>combined ddi/data file, and that--given the time pressure--it's late to be 
>recommending a change.  So I hope that feedback is taken and incorporated 
>into the truly final version sent on.
>
>Kate
>
>At 05:03 PM 10/11/2005 -0400, J Gager wrote:
>>No, module 2 describes a spreadsheet containing data, but it is a little
>>more limited than what you described.  Wendy and I discussed the case
>>you described below, and it is out of scope for our group's work, but it
>>is on the radar for the SRG.  This sort of information would be better
>>suited to the gross file description.
>>
>>Hypothetically, the spreadsheet you described below could contain a
>>section of data that could be used in module 2, however, the table must
>>contain data for only 1 nCube (see my response to Kate earlier for
>>clarification on this).
>>
>>-----Original Message-----
>>From: Ilona Einowski [mailto:ilona_e at berkeley.edu]
>>Sent: Tuesday, October 11, 2005 5:01 PM
>>To: 'Katherine McNeill-Harman'; 'Wendy Thomas'; jgager at umich.edu
>>Cc: 'DDI-ADG'
>>Subject: RE: [DDI-ADG] Latest Spreadsheet
>>
>>
>>OK...and here is my 2cents....
>>
>>I thought Module 3 was an example of a spreadsheet where the whole
>>shebang - table titles, row labels, column headers, data cells,
>>footnotes, etc were represented....
>>
>>Did I miss the boat on this????
>>
>>Ilona
>>
>>-----Original Message-----
>>From: ddi-adg-bounces at icpsr.umich.edu
>>[mailto:ddi-adg-bounces at icpsr.umich.edu] On Behalf Of Katherine
>>McNeill-Harman
>>Sent: Tuesday, October 11, 2005 1:37 PM
>>To: Wendy Thomas; jgager at umich.edu
>>Cc: 'DDI-ADG'
>>Subject: RE: [DDI-ADG] Latest Spreadsheet
>>
>>Based on J's response, I guess I would wonder when the data would not be
>>in the same DDI instance of the meta data.  You would have two DDI
>>metadata files and only one would have the data?  I'm having a hard time
>>conceptualizing this.  Know we're short on time, but think this is a
>>basic thing we should try to agree on quickly if possible (i.e. to get
>>on the same page ourselves about our recommendation as opposed to
>>waiting to talk to the SRG).
>>
>>Kate
>>
>>At 03:21 PM 10/11/2005 -0500, Wendy Thomas wrote:
>> >Module 1 describes data that resides in an external file of data only
>> >Module 2 describes data that resides in an external file that has both
>> >data and some level of metadata (category labels, title line etc)
>> >Module 3 describes data that resides in the metadata; there is no
>> >external file
>> >
>> >At least that's the way it reads to me.
>> >
>> >wendy
>> >
>> >
>> >
>> >
>> >On Tue, 11 Oct 2005, J Gager wrote:
>> >
>> > > Module 3 *is* designed to hold data inline.  The point I was trying
>> > > to make is that I am not sure we want to say the data always has to
>> > > be in the same DDI instance of the meta data.  Module 3 does not
>> > > support any external data files.
>> > >
>> > > -----Original Message-----
>> > > From: Mary Vardigan [mailto:vardigan at umich.edu]
>> > > Sent: Tuesday, October 11, 2005 4:07 PM
>> > > To: Katherine McNeill-Harman; jgager at umich.edu; DDI-ADG
>> > > Subject: RE: [DDI-ADG] Latest Spreadsheet
>> > >
>> > >
>> > >
>> > > Kate, J, and others,
>> > >
>> > >
>> > >
>> > > I hesitate to put in my two cents since I haven't been as involved
>> > > in this lately and may have missed some critical information, but I
>> > > was under the same impression as Kate that Module 3 was designed to
>> > > hold data values inline and not point to an external file. I know we
>> > > are really pressed for time, though, so rather than discuss this
>> > > over email or in a phone call, perhaps Sanda and I can raise it
>> > > during the SRG meeting next week and get clarification there. We
>> > > will then report back after the meeting. Does this work?
>> > >
>> > >
>> > >
>> > > Mary
>> > >
>> > >
>> > >   _____
>> > >
>> > >
>> > > From: ddi-adg-bounces at icpsr.umich.edu
>> > > [mailto:ddi-adg-bounces at icpsr.umich.edu] On Behalf Of Katherine
>> > > McNeill-Harman
>> > > Sent: Tuesday, October 11, 2005 2:29 PM
>> > > To: jgager at umich.edu; 'DDI-ADG'
>> > > Subject: RE: [DDI-ADG] Latest Spreadsheet
>> > >
>> > >
>> > >
>> > > Comments w/in (others, please comment as well; the most significant
>> > > item starts w/a *** below):
>> > >
>> > > At 12:59 PM 10/11/2005 -0400, J Gager wrote:
>> > >
>> > >
>> > >
>> > > Kate -
>> > >
>> > > Thanks for your comments.  Please see responses below.  In general,
>> > > I didn't think any change was significant enough to warrant further
>> > > discussion.  If anyone is still uncomfortable with these changes
>> > > after this discussion, then we can schedule another meeting, but
>> > > time is very very short for me, and the write ups for this aggregate
>> > > piece are far more complicated and time consuming than I anticipated
>> > > (there are a lot of details that need to be clearly explained).
>> > >
>> > > J
>> > >
>> > > -----Original Message-----
>> > >
>> > > From: Katherine McNeill-Harman [mailto:mcneillh at MIT.EDU]
>> > >
>> > > Sent: Tuesday, October 11, 2005 12:03 PM
>> > >
>> > > To: jgager at umich.edu; DDI-ADG
>> > >
>> > > Subject: Re: [DDI-ADG] Latest Spreadsheet
>> > >
>> > > J and others,
>> > >
>> > > Have a couple of questions/concerns about this new sheet; would be
>> > > interested in others' opinions:
>> > >
>> > > 1) Can you explain a bit the reasons for changing the descriptions
>> > > at the top of each module sheet?  I don't care that we use exactly
>> > > the words I drafted, but yours seem to have different meaning and I
>> > > want to make sure we're all on the same page.  Namely,
>> > >
>> > > - For modules 2 and 3, you seem to be emphasizing that it's for a
>> > > "single nCube structure"--can you expand upon what you mean by that?
>> > >
>> > >
>> > >
>> > > What is meant by this for module 2 is that the file cannot contain
>> > > multiple cubes.  For instance if there were 2 cubes, say population
>> > > by region, gender, and age (cube 1) and population by region and
>> > > gender (cube 2), the combination of these 2 cubes in the data file
>> > > would look something like this.
>> > >
>> > > MN    M    50-    5300
>> > >
>> > > MN    M    50+   6700
>> > >
>> > > MN    M    12000
>> > >
>> > > MN    F    50-    6800
>> > >
>> > > MN    F    50+    5000
>> > >
>> > > MN    F    11800
>> > >
>> > > Module 2 does not support this.  Its intention is to describe a file
>> > > where all rows describe the same cube data.
>> > >
>> > >
>> > >
>> > > The same thing applies for module 3, since it is grouped by nCube.
>> > > You would use module 3 to describe a single nCube at a time, and
>> > > not a mix of nCubes and non cubed data.
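
A rough sketch of the single-nCube rule described above, using the sample rows from J's example. Treating the number of fields in a row as a proxy for which cube the row belongs to is an assumption made only for illustration; the function and variable names are hypothetical and not part of any DDI tooling.

```python
# Rows from the example above: cube 1 is region/gender/age/count,
# cube 2 is region/gender/count.
rows = [
    ["MN", "M", "50-", "5300"],
    ["MN", "M", "50+", "6700"],
    ["MN", "M", "12000"],
    ["MN", "F", "50-", "6800"],
    ["MN", "F", "50+", "5000"],
    ["MN", "F", "11800"],
]


def describes_single_cube(table):
    """True when every row has the same number of fields.

    Assumption: rows from different cubes differ in field count, as they
    do in the example. Real files may need a stricter check.
    """
    return len({len(row) for row in table}) <= 1


# Splitting out the four-field rows leaves a table module 2 could describe.
cube1_only = [row for row in rows if len(row) == 4]

print(describes_single_cube(rows))
print(describes_single_cube(cube1_only))
```

The mixed file fails the check, while the subset holding only the region/gender/age rows passes.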
>> > >
>> > >
>> > > That's more clear, however, in module 2, how would one treat, e.g.,
>> > > a spreadsheet file containing multiple sheets with a different cube
>> > > on each sheet?
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >  - Also, I'd like to ask you to consider putting back some of the
>> > > wording I'd written for module 3 that makes it clear that the
>> > > metadata and data are in one single DDI file; I don't think that
>> > > comes across in your phrasing.
>> > >
>> > >
>> > >
>> > > I think it is too constricting.  It does allow for all to be in one
>> > > file, but don't we want to allow the data to also be used this way
>> > > in a separate file?
>> > >
>> > >
>> > > ***I believe that it's only the former, that if it lives in a
>> > > separate file it would be under module 1 or 2.  Others, please confirm.
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > - Lastly, for module 1, I believe it'd be helpful to include some
>> > > explicit reference to the fact that the external file contains no
>> > > metadata (to distinguish it from module 2).
>> > >
>> > >
>> > >
>> > > That may be misleading, since we are saying that attributes can
>> > > exist in there, which are technically metadata.  The distinction
>> > > lies in the fact that module 1 data files do not state any of the
>> > > cube coordinate values in them.
>> > >
>> > >
>> > > I can live w/that if others don't have any other suggestions.
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > 2) I'm not sure about including only the additions in the first
>> > > sheet, as I think the other fields provide helpful context.  Might
>> > > there be a way to distinguish the new fields (as you had done, e.g.,
>> > > w/color) while still keeping all of them?
>> > >
>> > >
>> > >
>> > > It was really a matter of time, and of focusing attention.  The model
>> > > was incomplete to start with, and the effort of fleshing out all
>> > > existing things from the tag library, and putting in definitions for
>> > > them, is more work than it would be worth (in my opinion).
>> > >
>> > >
>> > > Understand.  I'd still lean the other way but will happily go w/the
>> > > group consensus.
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > 3) Your change to module 1, while I have no objections, seems to be
>> > > significant enough that we should discuss it as a group.  I don't
>> > > quite understand the purpose of this.  Is this something you think
>> > > you could explain/we could discuss in more detail over email (or
>> > > maybe phone)?
>> > >
>> > >
>> > >
>> > > The purpose of the change is to basically not change what is already
>> > > in place.  In speaking with Wendy, she pointed out how important it
>> > > is to many people marking up data files to do so in the order in
>> > > which the data occurs in the file.  So there may be a mix of cubed
>> > > and non cubed data.  Furthermore, module 1 did not allow for any
>> > > non cubed data (everything was grouped into an nCube container).  The
>> > > change simply replaced the inclusion of a data item into an nCube by
>> > > containership, with inclusion by reference.  The concept we
>> > > initially had is still there, just represented differently.
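
One way to picture the difference between containership and reference, as a minimal sketch: the record layout stays in file order, and each data item optionally names the nCube it belongs to. The `DataItem` class, the `ncube_ref` field, and the item names are hypothetical, chosen only to illustrate the idea; they are not taken from the spreadsheet or the tag library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataItem:
    """Hypothetical data item in a record layout."""
    name: str
    ncube_ref: Optional[str] = None  # None: the item is not part of any nCube


# Items are listed in the order they occur in the data file, so cubed and
# non cubed items can be interleaved freely.
record_layout = [
    DataItem("case_id"),
    DataItem("cell_a", ncube_ref="cube1"),
    DataItem("comment"),
    DataItem("cell_total", ncube_ref="cube2"),
    DataItem("cell_b", ncube_ref="cube1"),
]


def items_for(cube_id, layout):
    """Recover the members of one nCube by following the references."""
    return [item.name for item in layout if item.ncube_ref == cube_id]


def uncubed(layout):
    """Items that sit outside every nCube."""
    return [item.name for item in layout if item.ncube_ref is None]


print(items_for("cube1", record_layout))
print(uncubed(record_layout))
```

The grouping that a container used to express can still be recovered, but the markup no longer forces cube members to be adjacent.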
>> > >
>> > >
>> > > Sounds OK to me; I'll leave others to comment.
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > 4) Plus a couple of other questions about the elements:
>> > >
>> > > -- in describing the attribute location choice, you refer separately
>> > > to a data file vs. a spreadsheet.  I understand what you're trying
>> > > to do, but am a little concerned about the mutually-exclusive manner
>> > > in which they're described (b/c other places we use the term "data
>> > > file" to include all sorts of formats, including spreadsheets, and
>> > > think it should still keep that broad meaning).  So I'd suggest
>> > > changing the terms to say something like "fixed-format/delimited
>> > > data file" and "spreadsheet data file" to distinguish the types to
>> > > clarify that we consider them both data files.
>> > >
>> > > - In module 2 F18, what do you mean by " the structure describes all
>> > > data and meta data for the cube"--that sounds to me more like module 3.
>> > >
>> > > - Module 3 F19; the notes contain a question; can that be deleted
>> > > or should it be moved to G19?
>> > >
>> > >
>> > >
>> > > I will make these corrections.
>> > >
>> > >
>> > >
>> > > When I agreed to cancelling today's meeting, I didn't realize that
>> > > you'd have such significant changes, so if it's best to discuss
>> > > these over the phone, maybe we can arrange another call.
>> > >
>> > >
>> > >
>> > > Time is VERY critical.  We are presenting this to the SRG in one
>> > > week, and need to send this out ASAP.  I think the important thing
>> > > is that we have something for the group to work with.  I just don't
>> > > have the time to finish the proposals AND meet.
>> > >
>> > >
>> > >
>> > > Kate
>> > >
>> > > P.S.  Plus a couple of typos
>> > >
>> > > - Module 1, F25, should be "measurement"--also applies to M2 F31
>> > >
>> > > - Module 1, F20, ID should be capitalized
>> > >
>> > > - Module 2, F20, should be "coordinates"
>> > >
>> > >
>> > >
>> > > Will fix.
>> > >
>> > >
>> > >
>> > > At 09:29 AM 10/11/2005 -0400, J Gager wrote:
>> > >
>> > >
>> > >
>> > > All -
>> > >
>> > >
>> > >
>> > > Here is the latest spreadsheet.  Note there are a few significant
>> > > changes that stemmed from a long discussion Wendy and I had.
>> > >
>> > >
>> > >
>> > > The first is the Logical sheet.  I have gone back to just including
>> > > the new fields.  I felt it best to do this, since we weren't
>> > > changing any existing fields, and I want the focus to only be on
>> > > these additions.
>> > >
>> > >
>> > >
>> > > The second is the Physical Sheets - I have changed the name of these
>> > > to Record Layout, since that is what we are truly representing.
>> > >
>> > >
>> > >
>> > > Finally, I have changed module 1, to allow for data items to exist
>> > > outside of nCubes.  Basically what I have done is create a way to
>> > > reference an nCube and its attached attributes.  The basic concept
>> > > that we had originally is still there, it is just less deviant from
>> > > the original, and oft used structure.
>> > >
>> > >
>> > >
>> > > Please let me know of any structural issues ASAP as the samples and
>> > > write up are based on this.
>> > >
>> > >
>> > >
>> > > J
>> > >
>> > > _______________________________________________
>> > >
>> > > DDI-ADG mailing list
>> > >
>> > > DDI-ADG at icpsr.umich.edu
>> > >
>> > > http://www.icpsr.umich.edu/mailman/listinfo/ddi-adg
>> > >
>> > >
>> > >
>> > > ___________________________________________
>> > >
>> > > Katherine McNeill-Harman
>> > >
>> > > Data Services Librarian
>> > >
>> > > Dewey Library for Management and Social Sciences
>> > >
>> > > Massachusetts Institute of Technology
>> > >
>> > > 77 Massachusetts Avenue, E53-100
>> > >
>> > > Cambridge, MA 02139
>> > >
>> > > mcneillh at mit.edu
>> > >
>> > > 617-253-0787
>> > >
>> > >
>> > >
>> >
>> >Wendy L. Thomas                          Phone: +1 612.624.4389
>> >Data Access Core Director               Fax:   +1 612.626.8375
>> >Minnesota Population Center              Email: wlt at pop.umn.edu
>> >University of Minnesota
>> >50 Willey Hall
>> >225 19th Avenue South
>> >Minneapolis, MN 55455
>>
>

___________________________________________
Katherine McNeill-Harman
Data Services Librarian
Dewey Library for Management and Social Sciences
Massachusetts Institute of Technology
77 Massachusetts Avenue, E53-100
Cambridge, MA 02139
mcneillh at mit.edu
617-253-0787 


