From watteler at za.uni-koeln.de Fri May 7 08:36:14 2004
From: watteler at za.uni-koeln.de (Oliver Watteler)
Date: Tue Mar 8 09:08:03 2005
Subject: [DDI-CDG] ACTION
Message-ID: <200405071236.i47CaFdL225468@unix4.za.uni-koeln.de>

Dear all,

unfortunately, you haven't heard from me in a while. Sorry for that, but we have undergone a kind of internal revision by the German Federal Government. Well, this is a reason but no excuse, as we say.

Two things need to be done urgently, since the IASSIST Conference is around the corner.

First, I would like you to send me answers to the three questions I posed in January, which were the following:

1. Which of the comparative studies you are currently working with are supposed to be documented with DDI?
2. Which topics do you want to be integrated into the DDI standard?
3. Do we want an enhanced DTD, or could we think of a meta-DTD, which works as a kind of "parenthesis" for the single XML documents of a study group or family? (In German something can be a parenthesis for subordinate things, meaning that it holds them together. I could not think of an English equivalent for this phrase.)

Second, I am preparing a draft of the scope paper we planned last year. This will be closely related to the proposals of the other groups, the revision manual and timeline drawn up by the Alliance, and to the ideas disseminated by Achim Wackerow, Meinhard Moschner and me.

Greetings from Cologne,
Oliver.

---
Central Archive for Empirical Social Res.
Archiving and Documentation
Bachemer Str.40
50931 Köln
Germany
Tel.: ++49-221-47694-76
Fax.: ++49-221-47694-44
http://www.gesis.org/za/

From wackerow at zuma-mannheim.de Mon May 10 12:45:22 2004
From: wackerow at zuma-mannheim.de (Joachim Wackerow)
Date: Tue Mar 8 09:08:03 2005
Subject: [DDI-CDG] XInclude and DDI, unique ID's over several XML files
Message-ID: <409FB1A2.5070407@zuma-mannheim.de>

Dear all,

Regarding my notes to "XInclude and DDI"
[1] Tom Piazza pointed me to the problem that ID's must be unique across all the members of a compound document composed by XInclude.

I see the following ways to make sure that several files combined by XInclude have unique ID's:

1. Organisational approach
   The codebook editor must obey the rule that every ID attribute must carry a study prefix (IDNo). For example:
   ID="V7" -> ID="MZ1996_V7"
   I think this approach is only practical in small working groups. We are practicing this alternative in another project. There we document survey instruments with DocBook. Each instrument is documented in a separate DocBook article instance. All documents are combined with XInclude in one DocBook book instance.

2. Formal approach by XSL
   A DDI instance could be rewritten by an XSL stylesheet to rename all ID's in the way mentioned above. For an example stylesheet, see the attachment.

3. Formal approach by XML Schema
   With XML Schema it is possible to set constraints on attributes [2]. Perhaps we could formalize the creation of the prefix by this means. I don't know of a similar technique for a DTD. Perhaps it is possible to define an identity constraint on the attribute "ID" with "keyref" of XML Schema.

Logic:
New_ID = /codeBook/stdyDscr/citation/titlStmt/IDNo + ID
Example:
"MZ1996_V7" = "MZ1996" + "_" + "V7"

References
[1] Notes on XInclude and DDI
    http://lion.icpsr.umich.edu/pipermail/ddi-cdg/2004-March/000010.html
[2] XML Schema Part 1: Structures, Identity-constraint Definitions
    http://www.w3.org/TR/xmlschema-1/#cIdentity-constraint_Definitions

Regards,
Achim

-------------- next part --------------
A non-text attachment was scrubbed...
Name: prefix_ID_with_IDNo.xsl
Type: text/xml
Size: 1519 bytes
Desc: not available
Url : http://www.icpsr.umich.edu/pipermail/ddi-cdg/attachments/20040510/e1ac874f/prefix_ID_with_IDNo.xml

From watteler at za.uni-koeln.de Wed May 12 08:25:48 2004
From: watteler at za.uni-koeln.de (Oliver Watteler)
Date: Tue Mar 8 09:08:04 2005
Subject: [DDI-CDG] Draft for message to Alliance
Message-ID: <200405121225.i4CCPndL409636@unix4.za.uni-koeln.de>

Dear colleagues,

as announced last week, I am sending you a brief summary of our past "work". It does not include all the technical details that still need to be discussed, but lists 11 tasks I deem necessary in order to set up our proposal.

I know that some of you are involved in major projects and others are preparing workshops or presentations for IASSIST. Still, I would appreciate a quick answer or comment on this summary, because it needs to be on Ilona Einowsky's desk on Friday.

Thank you.

Greetings from Cologne,
yours Oliver.

---
Central Archive for Empirical Social Res.
Archiving and Documentation
Bachemer Str.40
50931 Köln
Germany
Tel.: ++49-221-47694-76
Fax.: ++49-221-47694-44
http://www.gesis.org/za/

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Proposal 1_draft_12052004.doc
Type: application/msword
Size: 53760 bytes
Desc: not available
Url : http://www.icpsr.umich.edu/pipermail/ddi-cdg/attachments/20040512/b581eb11/Proposal1_draft_12052004.doc

From wackerow at zuma-mannheim.de Wed May 19 04:07:58 2004
From: wackerow at zuma-mannheim.de (Joachim Wackerow)
Date: Tue Mar 8 09:08:04 2005
Subject: [DDI-CDG] Thoughts on relation of files / overlap of CDG and CF
Message-ID: <40AB15DE.8050106@zuma-mannheim.de>

Dear all,

I would like to express some quick thoughts in preparation for the discussion of the proposal of the complex file group.
As Tom Piazza already mentioned months ago, an important overlap exists between the CDG and the CF group (unfortunately I did not have enough time to try an integrated model).

The CDG group's primary task is to deal with the combination of data files on the variable level, with the goal of comparing groups of cases. This is done in the vertical direction by harmonizing variables with the same meaning.

The CF group's primary task is to deal with the combination of data files on the case level. This is done mainly in the horizontal direction, with more or less parallel data files, by matching common ID variables.

(A data file in this sense is a logical rectangular file, which could consist of several physical files.)

I would suggest the following goals for describing the relation of files:
- Clear description possibilities, easy to use for authors.
- Use of commonly accepted formal language fragments with known semantics.
- Precise formulation of the relation, with the intention that an application could generate the combined data file.

The second point means that we should build on common formal expressions of relations like SQL (join and union clause) or e.g. SPSS (match files, add files).

From my perspective I see the following central questions:
- Should a codebook describe only a single study (what is a study)?
- Should a codebook describe the relation of several studies?
  In addition:
  - Description of the outcome of the relation?
  - Integration of all the details of the included studies (e.g. with XInclude)?

This raises the question of whether we should differentiate clearly between a simple codebook for one study and a "super codebook" for the relation of several studies.

It is not clear to me how these approaches could be integrated into the structure of the current DTD. Against this background, I have trouble understanding why the current main parts (stdyDscr, dataDscr) are repeatable elements.
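
The two directions of combination named earlier in this message (vertical, by harmonized variables; horizontal, by matching common ID variables) can be sketched in a few lines of Python. This is only a minimal illustration and not part of any DDI proposal: the toy data, the variable names and the helper functions add_files, match_files and harmonize_b are all hypothetical. match_files is written as an inner-join variant, which is only roughly comparable to what SQL JOIN or SPSS MATCH FILES offer.

```python
# Hypothetical toy data; all names and values are illustrative assumptions.
study_a = [{"id": 1, "sex": "m"}, {"id": 2, "sex": "f"}]
study_b = [{"id": 1, "gender": 2}, {"id": 2, "gender": 1}]

persons = [
    {"person": 1, "hh_id": 10, "age": 34},
    {"person": 2, "hh_id": 11, "age": 51},
    {"person": 3, "hh_id": 99, "age": 20},  # no matching household
]
households = [
    {"hh_id": 10, "region": "North"},
    {"hh_id": 11, "region": "South"},
]


def harmonize_b(row):
    """Map study B's numeric 'gender' code onto study A's 'sex' coding."""
    return {"id": row["id"], "sex": {1: "m", 2: "f"}[row["gender"]]}


def add_files(*files):
    """Vertical combination (compare SQL UNION ALL, SPSS ADD FILES).

    Each argument is a (study_label, rows) pair; cases are stacked and
    tagged with the study they came from.
    """
    combined = []
    for study, rows in files:
        for row in rows:
            combined.append({"study": study, **row})
    return combined


def match_files(left, right, key):
    """Horizontal combination (compare SQL inner JOIN, SPSS MATCH FILES).

    Rows of 'left' are extended with the variables of the 'right' row
    that shares the same key value; unmatched left rows are dropped.
    """
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


stacked = add_files(("A", study_a), ("B", [harmonize_b(r) for r in study_b]))
merged = match_files(persons, households, "hh_id")
```

Here stacked holds four cases with one harmonized variable "sex", while merged holds the two persons whose household could be matched, each extended by the household's "region". A formal relation description in a codebook would need to carry at least the information passed to these two functions: which files, which harmonization rule, and which key.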

I would prefer a structure for a "super codebook" like this:

   description of relation (optional reference to other codebooks) ...
   (in case of integration of description of every included study)
   Codebook 1 ...
   Codebook 2 ...

The element "set" (or collection) could also be expressed as "codeBook type='set'".

For harmonizing variables (comparison purposes) and building new ID variables (matching cases purposes) I see the need for a means to describe the construction of new variables. Perhaps we could use MathML instead of a language of our own for this purpose.

Referencing other files
If other "files" such as codebooks or data files are to be referenced, we should in general think in terms of URIs, not only local files. For the naming of URI attributes we should use the W3C standard XLink, e.g. the simple link attribute "xlink:href".

Kind regards,
Achim