From ilona_e at uclink.berkeley.edu Mon Mar 1 13:55:08 2004
From: ilona_e at uclink.berkeley.edu (Ilona Einowski)
Date: Mon Mar 1 13:55:23 2004
Subject: [DDI-ADG] Resending message with reports
Message-ID:

Skipped content of type multipart/alternative
-------------- next part --------------
A non-text attachment was scrubbed...
Name: DDI-GEO1 from Atle.doc
Type: application/msword
Size: 79360 bytes
Desc: not available
Url : http://lion.icpsr.umich.edu/pipermail/ddi-adg/attachments/20040301/2acdf57d/DDI-GEO1fromAtle-0001.doc
-------------- next part --------------

Relevance of SDMX to the DDI Expert Committee
Fredric Gey
1/26/2004

BACKGROUND:
During the first meeting of the DDI Expert Committee in mid-October 2003, an XML expert from the SDMX project (Arofan Gregory, AEON Consulting) attended. Most members of the Expert Committee were unaware of SDMX, so I volunteered to investigate its relevance to the DDI.

What is SDMX?
SDMX stands for Statistical Data and Metadata Exchange. It is an initiative of the statistical agencies of six organizations: BIS (Bank for International Settlements), ECB (European Central Bank), EuroStat (European Commission Statistical Agency), IMF (International Monetary Fund), OECD (Organisation for Economic Co-operation and Development) and UN (United Nations). Its goal is to create standards for the exchange of data (and accompanying metadata) between these agencies.

From a DDI perspective, its two most relevant projects are:
- Batch time series data exchange
- Metadata common vocabulary

Both of these projects seem to have been underway for almost two years and have produced extensive draft documents on the process and terminology of statistical data exchange (see http://www.sdmx.org/General/Projects/GesmesTS_rel3.pdf and http://www.sdmx.org/General/Projects/MCV-draft-20031001.pdf, documents of 192 and 111 pages respectively).
In addition, the initiative has a project that has developed a detailed information model to support the data exchange efforts, something which DDI experts agree needs to be added to the DDI.

FOCUS OF SDMX:
The focus of the agencies is primarily the exchange of economic data that is time-sequenced (i.e., time series data). As such, most (but not all) of the data is aggregated from primary sources (i.e., it is not microdata). Thus SDMX is most applicable to the aggregated nCube specification of the DDI, although some of its detailed attributes (unit of measure; collection details, such as when the data were collected and whether they were averaged over a period) may be applicable to microdata documentation as well.

The Batch time series data exchange document presents concepts that do not seem to have been dealt with in the DDI. Among them are frequency (i.e., whether the data are collected daily, monthly, annually, etc.), unit of measure (e.g., currency unit; different from, but related to, the DDI Analysis Unit) and unit multiplier (i.e., whether the data are recorded in thousands, millions, etc.). To quote from the first document above:

"In general, some statistical concepts are necessary across all key families to qualify the contained information. These are:
· Reference area
· Frequency (always a dimension)
· Descriptive title (see also comment below)
· Collection (e.g., end of period, averaged, or summed over period)
· Unit (e.g., currency of denomination)
· Unit multiplier (e.g., expressed in millions)
· Availability (which institutions can a series become available to)
· Decimals (i.e., number of decimal digits used in a time series)
· Observation status (e.g., estimate, provisional, normal)
Therefore, those concepts that are not dimensions within a key family have to be present in that key family as mandatory attributes."

RELEVANCE OF SDMX:
The SDMX initiative has direct relevance to the DDI because data may become available to Data Archives from these agencies in the near future.
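The cross-cutting concepts in the Batch time series data exchange passage quoted above can be pictured as attributes carried by every series. The Python sketch below is only an illustration under stated assumptions: the class and field names, the frequency codes, the "public" availability value, and the reading of the unit multiplier as a power of ten are hypothetical choices, not taken from the SDMX documents, and the example values are made up.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from enum import Enum


class Frequency(Enum):
    # Hypothetical code list for illustration; SDMX maintains its own code lists.
    ANNUAL = "A"
    QUARTERLY = "Q"
    MONTHLY = "M"
    DAILY = "D"


@dataclass
class Observation:
    period: str                 # e.g. "2003-Q4"
    value: Decimal              # value as recorded, before applying the unit multiplier
    status: str = "normal"      # observation status: "normal", "estimate", "provisional"


@dataclass
class Series:
    # Concepts the quoted passage lists as needed across all key families.
    reference_area: str
    frequency: Frequency
    title: str
    collection: str             # e.g. "end of period", "averaged", "summed over period"
    unit: str                   # e.g. currency of denomination
    unit_multiplier: int        # assumed power of ten: 6 means "expressed in millions"
    decimals: int               # number of decimal digits used in the series
    availability: str = "public"  # hypothetical value: which institutions may receive it
    observations: list[Observation] = field(default_factory=list)

    def base_value(self, obs: Observation) -> Decimal:
        """Return the observation in base units by applying the unit multiplier."""
        return obs.value.scaleb(self.unit_multiplier)

    def display(self, obs: Observation) -> str:
        """Render the recorded value with the series' declared number of decimals."""
        return f"{obs.value:.{self.decimals}f} {self.unit} x 10^{self.unit_multiplier}"


# Usage with made-up values: a quarterly series recorded in millions of euros.
gdp = Series("DE", Frequency.QUARTERLY, "GDP at current prices",
             "summed over period", "EUR", unit_multiplier=6, decimals=1)
q4 = Observation("2003-Q4", Decimal("553.2"), status="provisional")
gdp.observations.append(q4)
print(gdp.display(q4))                                # 553.2 EUR x 10^6
print(gdp.base_value(q4) == Decimal("553200000"))     # True
```

Keeping the recorded value and the multiplier as separate facts, rather than storing a pre-scaled number, preserves the figure exactly as the agency published it while still letting software compute base units.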
We have a chance to influence the development of SDMX by commenting on draft documents and by incorporating into future versions of the DDI those SDMX concepts and definitions which extend its usefulness for archiving and data exchange. In addition, SDMX has developed some prototype information models which the DDI should be able to utilize in its own model development.

From julie.linden at yale.edu Wed Mar 17 14:03:01 2004
From: julie.linden at yale.edu (Julie Linden)
Date: Wed Mar 17 14:03:07 2004
Subject: [DDI-ADG] Concerns about nCube spec?
Message-ID:

Hello everyone,

I've been thinking about how to address the "aggregate" part of our Working Group's charge. The "Possible Configuration of DDI Working Groups" document that was distributed at the Expert Committee meeting in October states: "While considerable time and effort have already gone into the creation of an aggregate/tabular extension to the existing DDI specification (nCubes), there is concern that the aggregate model may be overly complex. The group needs to take a fresh look at this issue."

As someone who is just beginning to get familiar with how the current DDI handles aggregate data, it's hard for me to begin envisioning how it could be simplified or overhauled. I thought that perhaps a starting point would be to review what concerns have been raised. I read through the Structural Reform Group's postings on ezboard, and found one comment that suggests a concern, but doesn't spell it out:

"Logical Physical file format mapping: How are the logical concepts in the DDI mapped to the underlying physical files? What kinds of physical file formats are there (rectangular, cards, SPSS, STATA, SAS, Census aggregate data, European aggregate data)? Should DDI even be tackling this question? There is an existing difference of opinion already regarding this in the nCubes specification."

Can someone on this group describe the issues/concerns explicitly?
thanks,
Julie

From wlt at pop.umn.edu Wed Mar 17 14:14:59 2004
From: wlt at pop.umn.edu (Wendy Thomas)
Date: Wed Mar 17 14:15:16 2004
Subject: [DDI-ADG] Concerns about nCube spec?
In-Reply-To:
Message-ID:

Hi,

I could be wrong but I believe this is addressing the issue that aggregate data is frequently held in 2 and 3 dimensional storage systems (spreadsheets, layered spreadsheets) or bundled as data objects where a "cell" contains an array of items in a fixed order. The locMap does not address any new types of storage and DDI only addresses fixed and delimited records. This is really a separate issue from aggregate data description, as any type of file could be stored in these alternative formats. So what the locMap supplies is the link to the data item (cell) description by giving you its matrix (nCube) number and cell coordinates. Currently the phyLoc line can only provide a pointer to a fixed format or delimited file. This is one of the problems for the "European aggregate" data. CBS was using CUBE storage (3 dimensional) and I know that Jostein had described their data storage system as one of those using bundled arrays.

NHGIS is making use of the current aggregate description to search for data items and tables, create the table template on the fly and populate it with data from a fixed format data file containing multiple nCubes per record of data. NESSTAR uses it for describing and manipulating files of single nCubes for multiple locations.

Wendy Thomas

_______________________________________________
DDI-ADG mailing list
DDI-ADG@icpsr.umich.edu
http://www.icpsr.umich.edu/mailman/listinfo/ddi-adg

Wendy L. Thomas                    Phone: +1 612.624.4389
Data Access Core Director          Fax: +1 612.626.8375
Minnesota Population Center        Email: wlt@pop.umn.edu
University of Minnesota
537 Heller Hall
271 19th Avenue South
Minneapolis, MN 55455

From julie.linden at yale.edu Fri Mar 19 11:47:31 2004
From: julie.linden at yale.edu (Julie Linden)
Date: Fri Mar 19 11:47:36 2004
Subject: [DDI-ADG] Concerns about nCube spec?
In-Reply-To:
References:
Message-ID: <5.2.0.9.2.20040319114226.01b976a0@netid.mail.yale.edu>

Thanks for the response. I'm just trying to understand the issues here; I apologize to others for whom these issues may be very clear and familiar. Would one approach for our group be to try to figure out how to describe these other types of storage systems that the DDI does not currently cover? And if so, would we try to do so within the DDI framework?
How would our efforts fit into the work that the Structural Reform Group is doing -- do we need to wait for the SRG to get further along in its work? If the DDI is going to end up as more modular -- and maybe that's not the right word or even the right concept, so I hope someone will correct me! -- then aggregate data could potentially be described by the appropriate aggregate data "module" -- e.g. there would be one module for fixed / delimited records, one module for bundled arrays, one for CUBE storage, etc?

Thanks
Julie

From wlt at pop.umn.edu Fri Mar 19 12:06:39 2004
From: wlt at pop.umn.edu (Wendy Thomas)
Date: Fri Mar 19 12:06:51 2004
Subject: [DDI-ADG] Concerns about nCube spec?
In-Reply-To: <5.2.0.9.2.20040319114226.01b976a0@netid.mail.yale.edu>
Message-ID:

Actually, I discussed this with the SRG yesterday during our weekly meeting. This is really a storage issue and not an aggregate data issue. Any data can be stored in a variety of formats not currently defined within DDI. Even I-lin (the author of the statement in question) agreed that it has no more to do with aggregate data description than it has to do with microdata description; it's just that aggregate data was the context within which he first heard about it.

There are a number of things about the basic aggregate description model that can be improved. I can certainly try to come up with a quick list as we've been using it for a few years now and have run into a lot of refinements we'd like to have. Basically I suggest that the group brainstorm functionality that they'd like to have, relationships that need to be defined etc. and work from that to determine what the current model doesn't handle or what it could handle better. Also, there should be a Manual for Proposal Development coming out from Mary, Tom and SRG within a month which should also help the group frame the discussion and work.

I'd really say let the description of other storage methods sit for a bit. The group already has three big chunks to deal with (aggregate, geography, time) and believe me, having made a stab at describing generic 2 and 3 dimensional storage (hence we have the element "basic layer sheet" or supply your own vowels) you can get a sense of the framer's state of mind by the end of the day.
It's really possible that the discussions of the SRG over the next few months will provide a better framework for dealing with the question of storage description.

Don't intend to throw a wet blanket on this. If you all feel that this has priority over the other areas, go for it. You should just be aware that given the length of the review process and the date of Version 3.0 you would need to have any proposals described and ready to begin the review process by September 2004 for inclusion in Version 3.0 (assuming it moves smoothly through review). I think it would be good to have dealt with at least 2 of the 3 areas of this group in Version 3.0.

Wendy

From julie.linden at yale.edu Fri Mar 19 12:37:50 2004
From: julie.linden at yale.edu (Julie Linden)
Date: Fri Mar 19 12:38:06 2004
Subject: [DDI-ADG] Concerns about nCube spec?
In-Reply-To:
Message-ID:

Thanks again for your comments, Wendy. I don't feel that it's a wet blanket at all -- instead, it helps to clarify where the AGT group can best focus its efforts.
I particularly like your suggestion:

> Basically I suggest that the group
> brainstorm functionality that they'd like to have, relationships that need
> to be defined etc. and work from that to determine what the current model
> doesn't handle or what it could handle better.

...and would be very happy to see your list of desired refinements. Do others think this is a good way to start? If so, perhaps we can move the conversation over to ezBoard.

thanks,
Julie
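Wendy Thomas's description earlier in this thread, in which the locMap links each cell, identified by its nCube number and cell coordinates, to a position in a fixed-format file (as NHGIS does with several nCubes per record), can be illustrated with a small Python sketch. It is not DDI markup: the names PhysicalLocation, LocationMap, build_row_major_map and read_cell, the 1-based start columns, and the assumption that each nCube's cells are stored contiguously in row-major order are all hypothetical, and the record contents are made up.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class PhysicalLocation:
    """Position of one cell value in a fixed-format record (1-based start column)."""
    start: int
    width: int


# Hypothetical location map: (nCube id, cell coordinates) -> physical location.
# Each coordinate is a 1-based category number along one dimension of the nCube.
LocationMap = dict[tuple[str, tuple[int, ...]], PhysicalLocation]


def build_row_major_map(ncube: str, dims: tuple[int, ...],
                        first_col: int, width: int) -> LocationMap:
    """Assume one nCube's cells are stored contiguously, last dimension varying fastest."""
    loc_map: LocationMap = {}
    cells = product(*(range(1, n + 1) for n in dims))
    for i, coords in enumerate(cells):
        loc_map[(ncube, coords)] = PhysicalLocation(first_col + i * width, width)
    return loc_map


def read_cell(record: str, loc_map: LocationMap, ncube: str, coords: tuple[int, ...]) -> int:
    """Look up a cell's position and extract its value from one fixed-format record."""
    loc = loc_map[(ncube, coords)]
    return int(record[loc.start - 1 : loc.start - 1 + loc.width])


# One made-up record: a 5-character area code, then nCube "NT1" (2 x 3 cells,
# width 4, columns 6-29), then nCube "NT2" (2 cells, width 5, columns 30-39).
record = "27053" + "".join(f"{v:04d}" for v in (10, 20, 30, 40, 50, 60)) + "0010000200"
loc_map = build_row_major_map("NT1", (2, 3), first_col=6, width=4)
loc_map.update(build_row_major_map("NT2", (2,), first_col=30, width=5))
print(read_cell(record, loc_map, "NT1", (2, 1)))  # 40
print(read_cell(record, loc_map, "NT2", (2,)))    # 200
```

A bundled-array or layered-spreadsheet store would need a different kind of physical location than a start column and a width, which is the storage-description gap discussed in this thread.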