[DDI-ADG] Latest Spreadsheet
Mary Vardigan
vardigan at umich.edu
Tue Oct 11 16:07:17 EDT 2005
Kate, J, and others,
I hesitate to put in my two cents since I haven't been as involved in
this lately and may have missed some critical information, but I was
under the same impression as Kate that Module 3 was designed to hold
data values inline and not point to an external file. I know we are
really pressed for time, though, so rather than discuss this over email
or in a phone call, perhaps Sanda and I can raise it during the SRG
meeting next week and get clarification there. We will then report back
after the meeting. Does this work?
Mary
________________________________
From: ddi-adg-bounces at icpsr.umich.edu
[mailto:ddi-adg-bounces at icpsr.umich.edu] On Behalf Of Katherine
McNeill-Harman
Sent: Tuesday, October 11, 2005 2:29 PM
To: jgager at umich.edu; 'DDI-ADG'
Subject: RE: [DDI-ADG] Latest Spreadsheet
Comments w/in (others, please comment as well; the most significant item
starts w/a *** below):
At 12:59 PM 10/11/2005 -0400, J Gager wrote:
Kate -
Thanks for your comments. Please see responses below. In general, I
didn't think any change was significant enough to warrant further
discussion. If anyone is still uncomfortable with these changes after
this discussion, then we can schedule another meeting, but time is very
very short for me, and the write ups for this aggregate piece are far
more complicated and time consuming than I anticipated (there are a lot
of details that need to be clearly explained).
J
-----Original Message-----
From: Katherine McNeill-Harman [mailto:mcneillh at MIT.EDU]
Sent: Tuesday, October 11, 2005 12:03 PM
To: jgager at umich.edu; DDI-ADG
Subject: Re: [DDI-ADG] Latest Spreadsheet
J and others,
Have a couple of questions/concerns about this new sheet; would be
interested in others' opinions:
1) Can you explain a bit the reasons for changing the descriptions at
the top of each module sheet? I don't care that we use exactly the
words I drafted, but yours seem to have different meaning and I want to
make sure we're all on the same page. Namely,
- For modules 2 and 3, you seem to be emphasizing that it's for a
"single nCube structure"--can you expand upon what you mean by that?
What is meant by this for module 2 is that the file cannot contain
multiple cubes. For instance if there were 2 cubes, say population by
region, gender, and age (cube 1) and population by region and gender
(cube 2), the combination of these 2 cubes in the data file would look
something like this.
MN  M  50-   5300
MN  M  50+   6700
MN  M       12000
MN  F  50-   6800
MN  F  50+   5000
MN  F       11800
Module 2 does not support this. Its intention is to describe a file where
all rows describe the same cube's data.
The same thing applies for module 3, since it is grouped by nCube. You
would use module 3 to describe a single nCube at a time, and not a mix
of nCubes and non cubed data.
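
As an illustration of the mixed-cube file described above, here is a minimal Python sketch. It assumes whitespace-delimited rows in which the last field is the measure and every preceding field is a coordinate value, and it uses the number of coordinate fields as a stand-in for "which cube a row belongs to". The function names (split_by_cube, is_single_cube) are hypothetical and are not part of the DDI proposal.

```python
from collections import defaultdict

# The sample rows from the message: cube 1 is population by region,
# gender, and age; cube 2 is population by region and gender.
ROWS = """\
MN M 50- 5300
MN M 50+ 6700
MN M 12000
MN F 50- 6800
MN F 50+ 5000
MN F 11800
"""


def split_by_cube(text):
    """Group rows by how many coordinate values precede the measure.

    Assumption for illustration only: rows with the same number of
    coordinate fields belong to the same cube.
    """
    cubes = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        *coords, measure = fields
        cubes[len(coords)].append((tuple(coords), int(measure)))
    return dict(cubes)


def is_single_cube(text):
    """True when every row has the same number of coordinate fields."""
    return len(split_by_cube(text)) <= 1


cubes = split_by_cube(ROWS)
print(sorted(cubes))          # dimension counts found in the file
print(is_single_cube(ROWS))   # whether the file looks like a single cube
```

Under these assumptions the sample file splits into two groups of rows (three coordinates and two coordinates), which is the situation module 2 is not meant to describe.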
That's clearer. However, in module 2, how would one treat, e.g., a
spreadsheet file containing multiple sheets with a different cube on
each sheet?
- Also, I'd like to ask you to consider putting back some of the
wording I'd written for module 3 that makes it clear that the metadata
and data are in one single DDI file; I don't think that comes across in
your phrasing.
I think it is too constricting. It does allow for all to be in one
file, but don't we want to allow the data to also be used this way in a
separate file?
***I believe that it's only the former, that if it lives in a separate
file it would be under module 1 or 2. Others, please confirm.
- Lastly, for module 1, I believe it'd be helpful to include some
explicit reference to the fact that the external file contains no
metadata (to distinguish it from module 2).
That may be misleading, since we are saying that attributes can exist in
there, which are technically metadata. The distinction lies in the fact
that module 1 data files do not state any of the cube coordinate values
in them.
I can live w/that if others don't have any other suggestions.
2) I'm not sure about including only the additions in the first sheet,
as I think the other fields provide helpful context. Might there be a
way to distinguish the new fields (as you had done, e.g., w/color) while
still keeping all of them?
It was really an issue of time and of focusing attention. The model was
incomplete to start with, and the effort of fleshing out all existing
things from the tag library, and putting in definitions for them, is more
work than it would be worth (in my opinion).
Understand. I'd still lean the other way but will happily go w/the
group consensus.
3) Your change to module 1, while I have no objections, seems to be
significant enough that we should discuss it as a group. I don't quite
understand the purpose of this. Is this something you think you could
explain/we could discuss in more detail over email (or maybe phone)?
The purpose of the change is to basically not change what is already in
place. In speaking with Wendy, she pointed out how important it is to
many people marking up data files to do so in the order in which the
data occurs in the file. So there may be a mix of cubed and non cubed
data. Furthermore, module 1 did not allow for any non cubed data
(everything was grouped into a nCube container). The change simply
replaced the inclusion of a data item into an nCube by containership,
with inclusion by reference. The concept we initially had is still
there, just represented differently.
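
The difference between inclusion by containership and inclusion by reference can be sketched roughly as follows. This is only an illustration under assumed names: NCube, DataItem, ncube_ref, and the sample items are hypothetical and do not correspond to actual elements in the spreadsheet or the DDI tag library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NCube:
    """A cube definition, declared once and referred to by id."""
    id: str
    dimensions: tuple


@dataclass
class DataItem:
    """One item in the record layout, listed in file order.

    ncube_ref is None for non-cubed data; otherwise it names the
    nCube the item belongs to (inclusion by reference).
    """
    name: str
    ncube_ref: Optional[str] = None


ncubes = {"NC1": NCube("NC1", ("region", "gender"))}

# Items stay in the order they occur in the data file, so cubed and
# non-cubed items can be interleaved.
record_layout = [
    DataItem("case_id"),
    DataItem("pop_mn_m", ncube_ref="NC1"),
    DataItem("notes"),
    DataItem("pop_mn_f", ncube_ref="NC1"),
]


def items_in_cube(layout, cube_id):
    """Recover a cube's members by following the references."""
    return [item.name for item in layout if item.ncube_ref == cube_id]


def non_cubed_items(layout):
    """Items that belong to no nCube at all."""
    return [item.name for item in layout if item.ncube_ref is None]


print(items_in_cube(record_layout, "NC1"))
print(non_cubed_items(record_layout))
```

With containership, the two cubed items would have to be nested inside the nCube and the non-cubed items would have no place to live; with references, the layout keeps file order and the cube membership is still recoverable.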
Sounds OK to me; I'll leave others to comment.
4) Plus a couple of other questions about the elements:
-- in describing the attribute location choice, you refer separately to
a data file vs. a spreadsheet. I understand what you're trying to do,
but am a little concerned about the mutually-exclusive manner in which
they're described (b/c in other places we use the term "data file" to include
all sorts of formats, including spreadsheets, and think it should still
keep that broad meaning). So I'd suggest changing the terms to say
something like "fixed-format/delimited data file" and "spreadsheet data
file" to distinguish the types to clarify that we consider them both
data files.
- In module 2 F18, what do you mean by " the structure describes all
data and meta data for the cube"--that sounds to me more like module 3.
- Module 3 F19; the notes contains a question; can that be deleted or
should it be moved to G19?
I will make these corrections.
When I agreed to cancelling today's meeting, I didn't realize that you'd
have such significant changes, so if it's best to discuss these over the
phone, maybe we can arrange another call.
Time is VERY critical. We are presenting this to the SRG in one week,
and need to send this out ASAP. I think the important thing is that we
have something for the group to work with. I just don't have the time
to finish the proposals AND meet.
Kate
P.S. Plus a couple of typos
- Module 1, F25, should be "measurement"--also applies to M2 F31
- Module 1, F20, ID should be capitalized
- Module 2, F20, should be "coordinates"
Will fix.
At 09:29 AM 10/11/2005 -0400, J Gager wrote:
All -
Here is the latest spreadsheet. Note there are a few significant changes
that stemmed from a long discussion Wendy and I had.
The first is the Logical sheet. I have gone back to just including the
new fields. I felt it best to do this, since we weren't changing any
existing fields, and I want the focus to only be on these additions.
The second is the Physical Sheets - I have changed the name of these to
Record Layout, since that is what we are truly representing.
Finally, I have changed module 1, to allow for data items to exist
outside of nCubes. Basically what I have done is create a way to
reference an nCube and its attached attributes. The basic concept that
we had originally is still there; it just deviates less from the
original, oft-used structure.
Please let me know of any structural issues ASAP as the samples and
write up are based on this.
J
_______________________________________________
DDI-ADG mailing list
DDI-ADG at icpsr.umich.edu
http://www.icpsr.umich.edu/mailman/listinfo/ddi-adg
___________________________________________
Katherine McNeill-Harman
Data Services Librarian
Dewey Library for Management and Social Sciences
Massachusetts Institute of Technology
77 Massachusetts Avenue, E53-100
Cambridge, MA 02139
mcneillh at mit.edu
617-253-0787