From wackerow at zuma-mannheim.de Thu Mar 4 09:20:44 2004
From: wackerow at zuma-mannheim.de (Joachim Wackerow)
Date: Tue Mar 8 09:08:02 2005
Subject: [DDI-CDG] Notes on XInclude and DDI
Message-ID: <40473B3C.6040007@zuma-mannheim.de>

Hi,

I would like to lay out some background on the XInclude mechanism and its relevance for DDI in general, and especially for the subjects "Family of data sets" and "Complex files".

XInclude is an inclusion mechanism defined by the W3C. Citation from http://www.w3.org/TR/xinclude/

"Many programming languages provide an inclusion mechanism to facilitate modularity. Markup languages also often have need of such a mechanism. This specification introduces a generic mechanism for merging XML documents (as represented by their information sets) for use by applications that need such a facility. The syntax leverages existing XML constructs - elements, attributes, and URI references."

The general idea is to include a whole XML document "b" in another XML document "a" without having to copy the information physically into document "a". The include is processed by the XML parser in use. With a generic XSL stylesheet (a so-called identity transformation) it would be possible to produce a document "a2" with all information physically in it. But this should be necessary only in rare cases, and no editing should be done in this generated document.

With this include mechanism each piece of information can be stored just once, in one XML document, and still be used in other XML documents. There is no risk of duplicating the same information in several files, which is an important issue for updating DDI documents.
See: http://www.w3.org/TR/xinclude/#basic-example

Similar mechanisms exist in programming languages, like the header files ".h" in "C". Microsoft Word also has a similar mechanism for including graphic files.
One can decide whether a whole graphic file is embedded in a Word document (in which case the document size often increases dramatically) or whether only a pointer to the external graphic file is included. With the second alternative, each change in the external graphic file is immediately reflected in the Word document. In SPSS it is possible to include external command files in another command file with the command "INCLUDE FILE=".

In addition, XInclude makes it possible to include only parts of other XML files through the XPointer mechanism, which itself uses XPath to address the fragments. A relatively simple XPointer expression is possible if one uses the IDs of the other XML file (see the example included with the proposal).
See: http://www.w3.org/TR/xinclude/#range-example

For complex files and families of datasets, I think one DDI codebook should exist for every "study" in a logical sense. With hierarchical files: one codebook for the person-level "study", one codebook for the household-level "study", and one codebook for the newly constructed person-level "study" with the added household information. With time series: one codebook for each point in time ("individual study") and one codebook for the whole collection, the new "study" (for comparative analyses).

If a piece of information is specific to a single "study", it should be stored in that study's DDI codebook. If it is generic to a collection of studies or is specific to a newly created "study", it should be stored in the corresponding DDI codebook. In this way it would be possible to weave a distributed set of DDI codebooks which include parts of other codebooks or refer to parts of other codebooks.

To refer to another XML file or a part of it, there is the XLink/XPointer mechanism, which is a generalization of the HTML link concept (address: name="top1", link: href="url#tops"). So it is not necessary to develop a specific programming tool.
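
To illustrate what an XInclude-aware processor does with such a distributed set of codebooks, here is a minimal sketch. It is not one of the examples attached to the proposal. It uses Python's standard library only as a stand-in for the processor, and the file names, the heavily simplified DDI fragments and the helper names (load_part, expand) are invented for illustration. As far as I know this module does not evaluate xpointer attributes, so including fragments by ID is not shown.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical, heavily simplified DDI parts, kept in memory instead of on disk.
DOCUMENTS = {
    "stdyDscr.xml": "<stdyDscr><citation><titl>Example Study</titl></citation></stdyDscr>",
    "dataDscr.xml": '<dataDscr><var ID="V1" name="age"/></dataDscr>',
}

# Hypothetical master codebook that only points to its parts.
MASTER = """\
<codeBook xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="stdyDscr.xml"/>
  <xi:include href="dataDscr.xml"/>
</codeBook>
"""


def load_part(href, parse, encoding=None):
    """Loader callback: return the parsed root element of the referenced part."""
    if parse != "xml":
        raise ValueError("this sketch only handles parse='xml'")
    return ET.fromstring(DOCUMENTS[href])


def expand(master_xml):
    """Return the master codebook with every xi:include replaced by its target."""
    root = ET.fromstring(master_xml)
    ElementInclude.include(root, loader=load_part)
    return root


if __name__ == "__main__":
    codebook = expand(MASTER)
    # Prints the codebook with all information physically in it.
    print(ET.tostring(codebook, encoding="unicode"))
```

The expanded result corresponds to the generated document "a2" described earlier: the parts stay in their own files for editing, and the combined codebook is produced only when needed.
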
An XInclude-aware XML parser/XSL processor is able to do the work if an XSL stylesheet is applied. Obviously it is necessary to develop stylesheets to transform the information in the DDI XML files into other formats like HTML, FO/PDF, or whatever else is needed.

Attached you will find a generic stylesheet that produces a DDI codebook with all information physically in it (you could use it with the examples attached to the proposal); it is called an identity transformation. I used the XSL processor xsltproc, which integrates the functionality of the whole XInclude specification.
See: http://xmlsoft.org/XSLT/
for Windows: http://www.zlatkovic.com/libxml.en.html

In general it seems desirable to allow XInclude for at least the major parts of DDI (docDscr, stdyDscr, dataDscr). This would make it possible to let several people edit separate, valid parts of the document. The separate parts could be integrated into one master document per study by means of XInclude.

From the application perspective it would also be more flexible to have different files, for example for the study description and for the data description. A search in the study description would have to load only the study information, not the whole DDI information.

Including the XInclude mechanism in DDI would be one step further in the direction of standards, especially W3C standards. It would not be a step toward an application-specific solution.

One current problem could be that, to my knowledge, only some XSL processors / software packages are able to process XInclude commands: xsltproc, cocoon, xalan (no fragments), ... But while the new version of DDI is being developed, hopefully more XSL processors will come to support the XInclude mechanism.

I'm looking forward to your comments and to the discussion.

Achim

-------------- next part --------------
A non-text attachment was scrubbed...
Name: identity_transformation.xsl
Type: text/xml
Size: 433 bytes
Desc: not available
Url : http://www.icpsr.umich.edu/pipermail/ddi-cdg/attachments/20040304/4783af87/identity_transformation.xml

From wlt at pop.umn.edu Thu Mar 4 09:48:54 2004
From: wlt at pop.umn.edu (Wendy Thomas)
Date: Tue Mar 8 09:08:02 2005
Subject: [DDI-CDG] Notes on XInclude and DDI
In-Reply-To: <40473B3C.6040007@zuma-mannheim.de>
Message-ID:

I just wanted to let you know that I forwarded this to the SRG list. I thought they would be interested in the approach this group is taking.

And Oliver, while I haven't been active in the discussion I am seriously lurking and catching up to where the group is. I've just been swamped with the SRG activities....having discovered that being a "vice chair" in this group means that, in actuality I'm chairing it and doing a big chunk of the writing.

wendy

Wendy L. Thomas                Phone: +1 612.624.4389
Data Access Core Director      Fax: +1 612.626.8375
Minnesota Population Center    Email: wlt@pop.umn.edu
University of Minnesota
537 Heller Hall
271 19th Avenue South
Minneapolis, MN 55455