From wackerow at zuma-mannheim.de Fri Feb 6 10:03:10 2004
From: wackerow at zuma-mannheim.de (Joachim Wackerow)
Date: Tue Mar 8 09:07:59 2005
Subject: [DDI-CDG] Proposal of a structure for collections of surveys
Message-ID: <4023ACAE.5090401@zuma-mannheim.de>

DDI Expert Alliance
Substantive Content Working Group
Comparative Data/Families of Datasets Subgroup

Joachim Wackerow, 2004-02-06

Proposal for a structure for collections of surveys

Intention

The intention is to provide a higher level of description for families of datasets, integrating the current DDI structure and using common W3C XML technologies in a modular way. The collection concept sits on top of the current DDI structure. The current DDI structure should be changed only as far as necessary to integrate these technologies.

Concept

Families of datasets could be collections of surveys along one specific axis such as time or space. The current DDI structure describes independent surveys. A collection is a set of several survey descriptions; its members are the individual surveys of a family. The collection describes the context of all members and what connects the individual members. The relation of the collection to the surveys is that of a common subset of several sets to the sets themselves.

Collection (common subset)
---------------------
|                   |
|   ------------    |
|   | Survey 1 |    |
|   ------------    |
|                   |
|   ------------    |
|   | Survey 2 |    |
|   ------------    |
|                   |
---------------------

The collection has a list of the individual surveys. The collection schema itself integrates the current DDI structure to describe the information that is common to the individual members of the collection. This could be a part of the study description as well as the description of common variables. With a central description of the common information, there is no need for redundant information in the individual DDI descriptions.
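
As a rough illustration of keeping the common description in one place, here is a minimal sketch. It is not taken from the attached example files. It assumes two hypothetical documents, collection_common.xml and survey_1996.xml, held in memory, and it uses XInclude, the mechanism the proposal names further on, as processed by Python's standard xml.etree.ElementInclude module. The element names are DDI 2.0 names; their content is invented for the example.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical documents, kept in a dict so the sketch needs no files on disk.
DOCUMENTS = {
    # Common information, stored once at the collection level.
    "collection_common.xml": """
<stdyDscr>
  <citation><titlStmt><titl>Common study description</titl></titlStmt></citation>
</stdyDscr>
""".strip(),
    # An individual survey that pulls in the common part via xi:include.
    "survey_1996.xml": """
<codeBook xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="collection_common.xml" parse="xml"/>
  <dataDscr>
    <var name="v1"><labl>Survey-specific variable</labl></var>
  </dataDscr>
</codeBook>
""".strip(),
}


def load(href, parse, encoding=None):
    """Loader for ElementInclude that reads from DOCUMENTS instead of disk."""
    text = DOCUMENTS[href]
    return ET.fromstring(text) if parse == "xml" else text


def resolve(name):
    """Parse the named document and expand its xi:include elements in place."""
    root = ET.fromstring(DOCUMENTS[name])
    ElementInclude.include(root, loader=load)
    return root


survey = resolve("survey_1996.xml")
# The resolved codebook now holds the common stdyDscr plus its own dataDscr.
print(ET.tostring(survey, encoding="unicode"))
```

As far as I know, the standard-library module only pulls in whole documents and does not evaluate XPointer expressions, so including just one part of the collection file would need a fuller XInclude processor.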

                 Collection
    (description of common information)
----------------------------------------------
           |                    |
           |                    |
       Survey 1             Survey 2
(description of specific information on the individual surveys)

To integrate a part of one XML file into another, one could use the XInclude technology. With this mechanism the common information stored in the collection file can be integrated into a DDI file. In addition, XPointer/XPath expressions make it possible to integrate only a part of the information.

Technical issues

To realize this structure in XML it is necessary to have an XML Schema of DDI. This schema could be integrated into the new XML Schema of the collection. Furthermore, the DDI XML Schema needs to be extended for the XInclude mechanism.

The DDI schema used in the collection schema must be changed in a few points. The element "codeBook" should have the new attribute "level" with the value "collection_of_surveys" to distinguish it from codebooks on individual surveys. The existing element "varGrp" with a variable list in the attribute "var" does not seem appropriate for this use; a new element "NEW_varGrp" would be necessary to describe variable groups across the individual DDI files. This element has the new subelement "NEW_variable_identifier" to point to variables in the individual DDI files.

Attached are XML example files to illustrate the concept. These files are well-formed, working examples (for XInclude). Because of the new extensions it is not possible to validate them against the current DDI DTD.

See attached example files:
collection_of_surveys.xml
MZ1996.xml

Both schemas (collection and existing codebook) could also be integrated in a single XML schema.

DocBook has a similar structure (set, book, chapter).
See: http://www.docbook.org/tdg/en/html/set.html#d0e181280

MARC has the hierarchy: collection, record.
See: http://www.loc.gov/standards/marcxml///xml/spy/spy.html

Collection of collections

Additionally, the described structure would make it possible to have a collection of collections (recursive pattern), which themselves have surveys as members. For example, a collection on the time axis could have as members collections in the geography domain; these collections have individual surveys as members.

                  Super Collection
                         |
                  ---------------
                  |             |
            Collection 1   Collection 2
              |     |        |     |
Survey Level: 1a    1b       2a    2b

See attached example files:
collection_of_collections.xml

Notes on XInclude

In general, the use of the XInclude mechanism has other benefits. It would be possible to organize parts of a single DDI description in separate files. This could be desirable when editing extensive surveys: the study description and the data description could be edited and validated by different persons in separate files. For organising large amounts of DDI-structured information (XML databases, XML transformation, XML-based searching), it would also be more flexible to have parts of a DDI file. To realize this mechanism it would be necessary to allow elements like "stdyDscr" as root element in addition to "codeBook". Similarly, DocBook allows different elements as root element.

Construction of new variables

For the comparison of data it would be nice to have a mechanism for describing newly constructed variables. Often variables of different surveys are comparable only by means of newly constructed variables. To describe this construction (often conditional), it is possible to integrate a foreign DTD/XML Schema like MathML into the DDI structure. Furthermore, such a mechanism would make it possible to generate, for example, an SPSS command setup with if-statements.
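
To make the last point concrete, here is a minimal sketch of turning a condition written in Content MathML into an SPSS IF statement. The contents of the attached example file are not shown in this message, so the MathML fragment, the variable names age and agegrp, and the helper functions are all made up for illustration. Only a handful of MathML operators are handled.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "{http://www.w3.org/1998/Math/MathML}"

# Content MathML operator elements mapped to SPSS keyword operators.
RELATIONS = {"eq": "EQ", "neq": "NE", "lt": "LT", "leq": "LE", "gt": "GT", "geq": "GE"}
LOGICAL = {"and": "AND", "or": "OR"}


def local(elem):
    """Element name without the MathML namespace prefix."""
    return elem.tag.replace(MATHML_NS, "")


def is_logical(node):
    """True for an <apply> whose operator is a logical connective."""
    return local(node) == "apply" and local(node[0]) in LOGICAL


def to_spss(node):
    """Render a small subset of Content MathML as an SPSS logical expression."""
    name = local(node)
    if name == "math":
        return to_spss(node[0])
    if name in ("ci", "cn"):  # identifier or number
        return node.text.strip()
    if name == "apply":
        op, operands = local(node[0]), list(node[1:])
        if op in RELATIONS and len(operands) == 2:
            left, right = (to_spss(o) for o in operands)
            return f"{left} {RELATIONS[op]} {right}"
        if op in LOGICAL and len(operands) >= 2:
            # Parenthesise nested AND/OR so the grouping survives.
            parts = [f"({to_spss(o)})" if is_logical(o) else to_spss(o) for o in operands]
            return f" {LOGICAL[op]} ".join(parts)
    raise ValueError(f"unsupported MathML element: {name}")


def if_statement(mathml, target, value):
    """Build one SPSS IF statement assigning value to target (hypothetical helper)."""
    condition = to_spss(ET.fromstring(mathml))
    return f"IF ({condition}) {target} = {value}."


# Hypothetical rule: respondents aged 18 to 29 fall into age group 1.
RULE = """<math xmlns="http://www.w3.org/1998/Math/MathML">
  <apply><and/>
    <apply><geq/><ci>age</ci><cn>18</cn></apply>
    <apply><lt/><ci>age</ci><cn>30</cn></apply>
  </apply>
</math>"""

print(if_statement(RULE, "agegrp", 1))
# prints: IF (age GE 18 AND age LT 30) agegrp = 1.
```

The same tree walk could emit syntax for another statistics package by swapping the two operator tables, which is one argument for storing the condition in a neutral XML form.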
See the discussion thread "Adding tags to the DDI" in January on the DDI users list, especially:
http://lion.icpsr.umich.edu/pipermail/ddi-users/2004-January/000153.html
http://lion.icpsr.umich.edu/pipermail/ddi-users/2004-January/000147.html
http://lion.icpsr.umich.edu/pipermail/ddi-users/2004-January/000151.html
http://lion.icpsr.umich.edu/pipermail/ddi-users/2004-January/000158.html

One remark was that MathML is complex and hard to read. Perhaps another XML language exists for expressing formal conditional expressions like this. I think it would be better to use an existing DTD/schema than to build a new language for this purpose.

See attached example:
single_study_mathml_ddi.xml

I'm looking forward to your comments and to the discussion.

Best regards,
Achim

PS: After one week of cross-country skiing I'll be back at work on 2004-02-16.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: collection_of_surveys.xml
Type: text/xml
Size: 1959 bytes
Desc: not available
Url : http://www.icpsr.umich.edu/pipermail/ddi-cdg/attachments/20040206/77316822/collection_of_surveys.xml
-------------- next part --------------
A non-text attachment was scrubbed...
Name: MZ1996.xml
Type: text/xml
Size: 1787 bytes
Desc: not available
Url : http://www.icpsr.umich.edu/pipermail/ddi-cdg/attachments/20040206/77316822/MZ1996.xml
-------------- next part --------------
A non-text attachment was scrubbed...
Name: collection_of_collections.xml
Type: text/xml
Size: 1588 bytes
Desc: not available
Url : http://www.icpsr.umich.edu/pipermail/ddi-cdg/attachments/20040206/77316822/collection_of_collections.xml
-------------- next part --------------
A non-text attachment was scrubbed...
Name: single_study_mathml_ddi.xml
Type: text/xml
Size: 2327 bytes
Desc: not available
Url : http://www.icpsr.umich.edu/pipermail/ddi-cdg/attachments/20040206/77316822/single_study_mathml_ddi.xml

From maryv at icpsr.umich.edu Wed Feb 18 14:49:03 2004
From: maryv at icpsr.umich.edu (Mary Vardigan)
Date: Tue Mar 8 09:08:00 2005
Subject: [DDI-CDG] DDI Developments
Message-ID: <5.1.0.14.0.20040211164448.02922890@icpsr.umich.edu>

Dear Expert Committee members,

I hope you are all doing well as we near the end of February. This feels like such a long winter here in Michigan! There is a great deal of DDI activity to report:

(1) Authoritative version of the DDI DTD. Mark Diggory has set up a Sourceforge site for the DDI project. The authoritative version of the specification will reside on the Sourceforge site, although the DDI Alliance site (www.ddialliance.org) will continue to serve as the distribution site. The Sourceforge site has excellent CVS (Concurrent Versions System) and issue tracking capabilities permitting tracking of revisions.
https://sourceforge.net/projects/ddi-alliance

(2) Process for handling minor changes. The Steering Committee is authorized through the Alliance Bylaws to set up an expedited process for handling minor changes to the specification. We have designed such a process and also have created a form to facilitate the submission of all requests for changes to the DDI specification. The form (a Word template), which contains an explanation of the expedited process, is available at:
http://www.icpsr.umich.edu/DDI/users/dtd/index.html

(3) DDI Licensing. The consensus in the Steering and Expert Committees is that this is an important issue to address. Mark Diggory continues to pursue this and has identified some organizations to contact for legal advice.

(4) IASSIST 2004 DDI sessions. Several sessions and workshops at the upcoming IASSIST meeting (May 25-28) in Madison, WI, will be devoted to the DDI.
There will be two companion workshops on Tuesday, May 25, intended to provide both introductory instruction in understanding and using the DDI (morning) and more advanced training (afternoon). The morning workshop needs volunteers to act as roaming experts to help participants in hands-on markup exercises. If you are interested in helping, please contact Bill Block (block@hist.umn.edu).

Also, Wendy Thomas and the Structural Reform Group are organizing a session about the Alliance and its activities. This session will feature speakers from the Structural Reform, Substantive Content, and Usability Working Groups. Wendy's description of the session can be viewed under the Structural Reform Group's forum on ezboard. IASSIST 2004 will also feature some sessions planned by individuals not on the Expert Committee.

(5) SRG Workplan and timeline. The Structural Reform Group has worked extremely hard to prepare a detailed workplan with an associated timeline. The workplan and related notes are now available on ezboard under the Structural Reform Group's forum. Please post comments on ezboard or send them to ddi-srg@icpsr.umich.edu. The workplan will be condensed and posted on the DDI site in coming weeks.

(6) Upcoming DDI meeting. The Expert Committee is scheduled to meet after IASSIST on Saturday, May 29, in Madison. This is Memorial Day weekend in the U.S., which may make attendance difficult for some, but we do encourage everyone to try to attend. More details about the agenda will be available in the spring.

I will send another DDI Digest message soon to update everyone on specific projects of the various Working Groups.

Regards,
Mary

Mary Vardigan
Director, Collection Delivery
Inter-university Consortium for Political and Social Research (ICPSR)
University of Michigan
P.O.
Box 1248, Ann Arbor, MI 48106-1248
Phone: 734-615-7908
Fax: 734-647-8200
www.icpsr.umich.edu

From wackerow at zuma-mannheim.de Thu Feb 19 13:28:11 2004
From: wackerow at zuma-mannheim.de (Joachim Wackerow)
Date: Tue Mar 8 09:08:00 2005
Subject: [DDI-CDG] Proposal of a structure for collections of surveys
In-Reply-To: <4023ACAE.5090401@zuma-mannheim.de>
References: <4023ACAE.5090401@zuma-mannheim.de>
Message-ID: <4035003B.1020009@zuma-mannheim.de>

In addition to the proposal of 2004-02-06 I made a 'quick and dirty' example CDG DDI DTD. It was useful for me to clarify some points. It is not necessary to use XML Schema, as I had thought. I think the Structural Reform Group is planning to use XML Schema anyway.

The example CDG DDI DTD is a superset of the DDI DTD V2.0. XInclude is included in two places. With this DTD it is possible to validate the attached example CDG DDI files. The XPointer expression is simple if an ID attribute and the DTD are used.

The attached ZIP file includes the following files:

DDI-CDG.dtd - superset of DDI DTD version 2.0
DTD_diff.txt - difference file to the original DTD
Tables.dtd - original Tables DTD, necessary for the DDI DTD
MZ1996_dtd.xml - individual codebook according to DDI-CDG.dtd
collection_of_surveys_dtd.xml - collection of surveys, codebook description as a subset of individual codebooks, according to DDI-CDG.dtd
collection_of_collections_dtd.xml - the same for collections of collections
example.xsl - example stylesheet to extract information on a variable from an individual codebook; the information is actually stored in the collection codebook
example.bat - the command for the XSL transformer xsltproc (http://www.xmlsoft.org/)

Kind regards
Achim

-------------- next part --------------
A non-text attachment was scrubbed...
Name: CDG_proposal_2004-02-17.zip
Type: application/x-zip-compressed
Size: 45120 bytes
Desc: not available
Url : http://www.icpsr.umich.edu/pipermail/ddi-cdg/attachments/20040219/71be6322/CDG_proposal_2004-02-17.bin

From wlt at pop.umn.edu Thu Feb 19 13:37:19 2004
From: wlt at pop.umn.edu (Wendy Thomas)
Date: Tue Mar 8 09:08:00 2005
Subject: [DDI-CDG] Proposal of a structure for collections of surveys
In-Reply-To: <4035003B.1020009@zuma-mannheim.de>
Message-ID:

Hi Everyone,

I've been buried in work lately (and the work of the Structural Reform Group) so I haven't had a chance yet to delve into all your writings. I just wanted to provide an update on the direction the SRG is going in terms of schema. We are basically going towards multiple technical implementations of the DDI structure. Whatever can be represented in both DTD and Schema will be. There may be more, or more tightly defined, extensions in the Schema implementation, but we are not close to fleshing this out yet.

How is everything in the various parts of Germany? I had hoped to get there this spring but there is no way I can get away. Maybe over the summer if everything works out.

Now that I have a few things out of the way for SRG tasks, I'll turn my attention to this group. It will be a nice break for me :-)

Best to all
Wendy

On Thu, 19 Feb 2004, Joachim Wackerow wrote:

> In addition to the proposal of 2004-02-06 I made a 'quick and dirty'
> example CDG DDI DTD. It was useful for me to clarify some points.
> It is not necessary to use XML Schema, as I had thought. I think the
> Structural Reform Group is planning to use XML Schema anyway.
>
> The example CDG DDI DTD is a superset of the DDI DTD V2.0. XInclude is
> included in two places. With this DTD it is possible to validate the
> attached example CDG DDI files. The XPointer expression is simple if an
> ID attribute and the DTD are used.
>
> The attached ZIP file includes the following files:
>
> DDI-CDG.dtd - superset of DDI DTD version 2.0
> DTD_diff.txt - difference file to the original DTD
> Tables.dtd - original Tables DTD, necessary for the DDI DTD
> MZ1996_dtd.xml - individual codebook according to DDI-CDG.dtd
> collection_of_surveys_dtd.xml - collection of surveys, codebook
> description as a subset of individual codebooks, according to DDI-CDG.dtd
> collection_of_collections_dtd.xml - the same for collections of collections
> example.xsl - example stylesheet to extract information on a variable
> from an individual codebook; the information is actually stored in the
> collection codebook
> example.bat - the command for the XSL transformer xsltproc
> (http://www.xmlsoft.org/)
>
> Kind regards
> Achim
>

Wendy L. Thomas                Phone: +1 612.624.4389
Data Access Core Director      Fax:   +1 612.626.8375
Minnesota Population Center    Email: wlt@pop.umn.edu
University of Minnesota
537 Heller Hall
271 19th Avenue South
Minneapolis, MN 55455

From wackerow at zuma-mannheim.de Fri Feb 20 09:17:35 2004
From: wackerow at zuma-mannheim.de (Joachim Wackerow)
Date: Tue Mar 8 09:08:01 2005
Subject: [DDI-CDG] Proposal of a structure for collections of surveys
In-Reply-To: <4023ACAE.5090401@zuma-mannheim.de>
References: <4023ACAE.5090401@zuma-mannheim.de>
Message-ID: <403616FF.9030906@zuma-mannheim.de>

To complete the technical proof of concept, find attached the files for a 'quick and dirty' example CDG DDI DTD with MathML included for the DDI element catgry.

DDI-MathML.dtd - DDI DTD version 2.0 including MathML for catgry
DDI-MathML_diff.txt - difference file to the original DTD
Tables.dtd - original Tables DTD, necessary for the DDI DTD
single_study_mathml_ddi_dtd.xml - example codebook markup with a constructed variable, to validate with DDI-MathML.dtd

With this mechanism it should be possible to use a foreign DTD/Schema like MathML for:
- documenting skip patterns
- constructing new variables
- modifying existing variables to adjust categories for comparison with other studies

Again, it was useful for me to see the working example.

Kind regards,
Achim

-------------- next part --------------
A non-text attachment was scrubbed...
Name: CDG_proposal_2004-02-20.zip
Type: application/x-zip-compressed
Size: 41466 bytes
Desc: not available
Url : http://www.icpsr.umich.edu/pipermail/ddi-cdg/attachments/20040220/6734537f/CDG_proposal_2004-02-20.bin