[DDI-CDG] Notes on XInclude and DDI

Joachim Wackerow wackerow at zuma-mannheim.de
Thu Mar 4 09:20:44 EST 2004


Hi,

I would like to lay out some background on the XInclude mechanism and 
its relevance for DDI in general, and especially for the topics "Family 
of data sets" and "Complex files".

XInclude is an inclusion mechanism specified by the W3C. Quoting from 
http://www.w3.org/TR/xinclude/:
“Many programming languages provide an inclusion mechanism to facilitate 
modularity. Markup languages also often have need of such a mechanism. 
This specification introduces a generic mechanism for merging XML 
documents (as represented by their information sets) for use by 
applications that need such a facility. The syntax leverages existing 
XML constructs - elements, attributes, and URI references.”

The general idea is to include a whole XML document "b" in another XML 
document "a" without having to copy the information physically into 
document "a". The inclusion is performed by the XML parser in use. With a 
generic XSL stylesheet (a so-called identity transformation) it is 
possible to produce a document "a2" that physically contains all the 
information. But this should be necessary only in rare cases, and no 
editing should be done in such a generated document. With this inclusion 
mechanism each piece of information can be stored just once, in one XML 
document, and still be used in other XML documents. There is no risk of 
duplicating the same information in several files, which is an important 
issue when updating DDI documents. See:
http://www.w3.org/TR/xinclude/#basic-example
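To make the basic example concrete, here is a minimal Python sketch. It 
uses the standard library module xml.etree.ElementInclude instead of 
xsltproc (an assumption for illustration only), and the documents 
"a.xml" and "b.xml" are hypothetical, held in a dictionary so that the 
sketch runs without files on disk. The custom loader follows the 
(href, parse, encoding) calling convention that ElementInclude uses.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Two hypothetical documents kept in memory instead of on disk.
# "a.xml" pulls in the whole of "b.xml" with an xi:include element.
DOCS = {
    "a.xml": """<doc xmlns:xi="http://www.w3.org/2001/XInclude">
  <title>Document a</title>
  <xi:include href="b.xml"/>
</doc>""",
    "b.xml": "<section><para>Stored only once, in b.</para></section>",
}


def memory_loader(href, parse, encoding=None):
    """Loader with the signature ElementInclude expects: return an Element
    for parse="xml" and a plain string for parse="text"."""
    text = DOCS[href]
    return ET.fromstring(text) if parse == "xml" else text


def expand(name):
    """Parse a document and replace its xi:include elements in place."""
    root = ET.fromstring(DOCS[name])
    ElementInclude.include(root, loader=memory_loader)
    return root


a2 = expand("a.xml")
print(ET.tostring(a2, encoding="unicode"))
```

The resulting tree corresponds to document "a2": the content of "b" now 
sits physically inside it, while the editable sources remain "a" and "b".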

Similar mechanisms exist in programming languages, for example the 
#include directive for header files (“.h”) in “C”. Microsoft Word has a 
similar mechanism for graphic files: one can decide whether a whole 
graphic file is embedded in a Word document (in which case the document 
size often increases dramatically) or whether only a link to the 
external graphic file is inserted. With the second alternative, every 
change in the external graphic file is immediately reflected in the Word 
document. In SPSS it is possible to include external command files in 
another command file with the command “INCLUDE FILE=”.

In addition, XInclude can include just parts of other XML files by 
means of XPointer, which itself builds on XPath to address the 
fragments. A relatively simple XPointer expression is possible if one 
uses the IDs of the other XML file (see the example included with the 
proposal). See: 
http://www.w3.org/TR/xinclude/#range-example
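Python's xml.etree.ElementInclude ignores the xpointer attribute, so 
the following sketch resolves fragment inclusion by hand. It handles 
only the shorthand form (a bare ID) and assumes, as in DDI, that 
identifiers are stored in an attribute named ID. The file names, 
variables and the resolve helper are hypothetical, there is no guard 
against circular includes, and this is not a full XPointer 
implementation.

```python
import copy
import xml.etree.ElementTree as ET

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"

# Hypothetical codebooks; each <var> is addressable through its ID attribute.
DOCS = {
    "household.xml": """<codeBook>
  <dataDscr>
    <var ID="V1" name="hhsize"><labl>Household size</labl></var>
    <var ID="V2" name="region"><labl>Region</labl></var>
  </dataDscr>
</codeBook>""",
    "person.xml": """<codeBook xmlns:xi="http://www.w3.org/2001/XInclude">
  <dataDscr>
    <var ID="P1" name="age"><labl>Age</labl></var>
    <xi:include href="household.xml" xpointer="V1"/>
  </dataDscr>
</codeBook>""",
}


def find_by_id(root, ident):
    """Shorthand-pointer lookup, ASSUMING the ID attribute is named 'ID'."""
    for el in root.iter():
        if el.get("ID") == ident:
            return el
    raise LookupError(f"no element with ID={ident!r}")


def resolve(parent):
    """Replace every xi:include below parent with the whole target document
    or, if an xpointer is given, with the element carrying that ID."""
    for i, child in enumerate(list(parent)):
        if child.tag == XI_INCLUDE:
            target = ET.fromstring(DOCS[child.get("href")])
            ptr = child.get("xpointer")
            node = copy.deepcopy(find_by_id(target, ptr) if ptr else target)
            resolve(node)  # the included fragment may contain includes itself
            node.tail = child.tail  # keep the surrounding whitespace
            parent[i] = node
        else:
            resolve(child)


person = ET.fromstring(DOCS["person.xml"])
resolve(person)
print([v.get("name") for v in person.findall("dataDscr/var")])
```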

For complex files and families of data sets, I think one DDI codebook 
should exist for every "study" in the logical sense. With hierarchical 
files: one codebook for the person-level "study", one codebook for the 
household-level "study", and one codebook for the newly constructed 
person-level "study" with the added household information. With time 
series: one codebook for each point in time ("individual study") and 
one codebook for the whole collection, the new "study" (for comparative 
analyses).

If a piece of information is specific to a single "study", it should be 
stored in that study's DDI codebook. If a piece of information is common 
to a collection of studies or specific to a newly created "study", it 
should be stored in the corresponding DDI codebook, i.e. that of the 
collection or of the new "study".
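A minimal sketch of this "store it once" rule for a time series, under 
stated assumptions: the collection-level series statement (DDI serStmt) 
lives in one hypothetical file, series.xml, and two hypothetical wave 
codebooks include it. To keep the sketch short, a whole small document 
is included rather than a fragment of the collection codebook, and 
inclusion is done with Python's xml.etree.ElementInclude.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Template for a hypothetical wave codebook that includes the shared
# series statement instead of repeating it.
WAVE = """<codeBook xmlns:xi="http://www.w3.org/2001/XInclude">
  <stdyDscr>
    <citation>
      <titlStmt><titl>{title}</titl></titlStmt>
      <xi:include href="series.xml"/>
    </citation>
  </stdyDscr>
</codeBook>"""

DOCS = {
    # Information generic to the collection: stored in exactly one place.
    "series.xml": "<serStmt><serName>Example Panel Survey</serName></serStmt>",
    "wave1.xml": WAVE.format(title="Wave 1"),
    "wave2.xml": WAVE.format(title="Wave 2"),
}


def loader(href, parse, encoding=None):
    """In-memory loader using the calling convention of ElementInclude."""
    return ET.fromstring(DOCS[href]) if parse == "xml" else DOCS[href]


def series_names():
    """Expand each wave and report the series name it ends up with."""
    names = []
    for wave in ("wave1.xml", "wave2.xml"):
        root = ET.fromstring(DOCS[wave])
        ElementInclude.include(root, loader=loader)
        names.append(root.findtext("stdyDscr/citation/serStmt/serName"))
    return names


before = series_names()
# Correct the series name once, in the collection-level file ...
DOCS["series.xml"] = "<serStmt><serName>Example Panel Study</serName></serStmt>"
# ... and every wave picks up the change on its next expansion.
after = series_names()
print(before, after)
```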

With this mechanism it would be possible to weave a distributed set of 
DDI codebooks that include parts of other codebooks or refer to parts of 
other codebooks. For referring to another XML file or a part of it, 
there is the XLink/XPointer mechanism, which is a generalization of the 
HTML link concept (anchor: name="top1", link: href="url#top1").
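Referring differs from including: the referring codebook keeps only a 
pointer, and the target is looked up when the link is followed. The 
sketch below assumes a hypothetical <derivedFrom> element carrying an 
xlink:href of the form "file.xml#ID" (it is not a DDI element, it only 
marks where such a reference could sit) and resolves it with 
urllib.parse.urldefrag, again assuming that the target is identified by 
an attribute named ID.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical distributed codebooks, keyed by file name.
CODEBOOKS = {
    "household.xml": """<codeBook>
  <dataDscr><var ID="V1" name="hhsize"><labl>Household size</labl></var></dataDscr>
</codeBook>""",
    "person_hh.xml": """<codeBook xmlns:xlink="http://www.w3.org/1999/xlink">
  <dataDscr>
    <var ID="C1" name="hhsize_p">
      <labl>Household size (attached to person)</labl>
      <derivedFrom xlink:type="simple" xlink:href="household.xml#V1"/>
    </var>
  </dataDscr>
</codeBook>""",
}


def follow(ref):
    """Resolve 'file.xml#ID' on demand, like an HTML link to a named anchor.
    ASSUMES the target element carries an attribute named 'ID'."""
    doc, frag = urldefrag(ref)
    root = ET.fromstring(CODEBOOKS[doc])
    if not frag:
        return root  # no fragment: the link points at the whole document
    for el in root.iter():
        if el.get("ID") == frag:
            return el
    raise LookupError(f"{ref!r} does not resolve")


person = ET.fromstring(CODEBOOKS["person_hh.xml"])
link = person.find(".//derivedFrom").get(XLINK_HREF)
source_var = follow(link)
print(link, "->", source_var.get("name"), source_var.findtext("labl"))
```

Nothing from household.xml is copied into person_hh.xml; the household 
variable is only read when the reference is followed.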

So it is not necessary to develop a specific programming tool. An 
XInclude-aware XML parser/XSL processor is able to do the work when an 
XSL stylesheet is applied. Of course, stylesheets still have to be 
developed to transform the information in the DDI XML files into other 
formats such as HTML, FO/PDF, and so on.

Attached you will find a generic stylesheet that produces a DDI codebook 
with all information physically in it (you can use it with the examples 
attached to the proposal); it is a so-called identity transformation. I 
used the XSL processor xsltproc, which implements the complete XInclude 
specification.
See: http://xmlsoft.org/XSLT/
for Windows: http://www.zlatkovic.com/libxml.en.html
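The attached stylesheet is not reproduced in the archive. As a rough 
stand-in, the sketch below does in Python what an identity 
transformation run through an XInclude-aware processor does: it writes a 
copy of a master document with every include resolved. With xsltproc the 
comparable call would be something like 
`xsltproc --xinclude identity_transformation.xsl master.xml`. The file 
names and the expand_file helper are hypothetical, and the loader 
resolves hrefs relative to the master document's directory.

```python
import tempfile
from pathlib import Path
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude


def expand_file(master: Path, out: Path) -> None:
    """Write a copy of `master` in which every xi:include has been replaced
    by the included content, i.e. a document with all information in it."""
    base_dir = master.parent

    def loader(href, parse, encoding=None):
        # Resolve relative hrefs against the master's directory, then delegate
        # to the standard library's default file loader.
        return ElementInclude.default_loader(str(base_dir / href), parse, encoding)

    tree = ET.parse(master)
    ElementInclude.include(tree.getroot(), loader=loader)
    tree.write(out, encoding="utf-8", xml_declaration=True)


# Usage with two hypothetical files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    d = Path(tmp)
    (d / "stdyDscr.xml").write_text(
        "<stdyDscr><citation><titlStmt><titl>Example Study</titl></titlStmt>"
        "</citation></stdyDscr>", encoding="utf-8")
    (d / "master.xml").write_text(
        '<codeBook xmlns:xi="http://www.w3.org/2001/XInclude">'
        '<xi:include href="stdyDscr.xml"/></codeBook>', encoding="utf-8")
    expand_file(d / "master.xml", d / "master_expanded.xml")
    expanded = (d / "master_expanded.xml").read_text(encoding="utf-8")

print(expanded)
```

As with the stylesheet, the expanded file is a generated product for 
reading or exchange; edits belong in master.xml and stdyDscr.xml.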

In general it seems desirable to allow XInclude at least for the major 
parts of DDI (docDscr, stdyDscr, dataDscr). This way several people 
could each edit a separate, valid part of the document, and the parts 
could be integrated into one master document per study by means of 
XInclude. From the application perspective it would also be more 
flexible to have separate files, for example for the study description 
and for the data description: a search in the study description would 
have to load only the study information, not the whole DDI document.
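The split into major parts might look like the following minimal 
sketch, in which each DDI section is a hypothetical separate file and a 
master document assembles them with xml.etree.ElementInclude. The 
search function parses only stdyDscr.xml; the loads list is an 
illustrative device that merely records which files were parsed, so the 
difference is visible.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical layout: one file per major DDI section, possibly maintained
# by different people, plus a master document that assembles them.
PARTS = {
    "docDscr.xml": "<docDscr><citation><titlStmt><titl>Codebook for Example "
                   "Study</titl></titlStmt></citation></docDscr>",
    "stdyDscr.xml": "<stdyDscr><citation><titlStmt><titl>Example Study</titl>"
                    "</titlStmt></citation><stdyInfo><subject>"
                    "<keyword>income</keyword></subject></stdyInfo></stdyDscr>",
    "dataDscr.xml": "<dataDscr><var ID='V1' name='inc'><labl>Net income</labl>"
                    "</var></dataDscr>",
    "master.xml": """<codeBook xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="docDscr.xml"/>
  <xi:include href="stdyDscr.xml"/>
  <xi:include href="dataDscr.xml"/>
</codeBook>""",
}

loads = []  # records which files were parsed, in order


def parse_part(name):
    loads.append(name)
    return ET.fromstring(PARTS[name])


def loader(href, parse, encoding=None):
    return parse_part(href) if parse == "xml" else PARTS[href]


def search_study(term):
    """Search keywords in the study description only; the other parts
    are never parsed."""
    stdy = parse_part("stdyDscr.xml")
    return [k.text for k in stdy.iter("keyword") if term.lower() in k.text.lower()]


hits = search_study("income")
loaded_for_search = list(loads)

# The full codebook is assembled only when it is actually needed.
full = parse_part("master.xml")
ElementInclude.include(full, loader=loader)
print(hits, loaded_for_search, [child.tag for child in full])
```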

Including the XInclude mechanism in DDI would be a further step in the 
direction of standards, especially W3C standards, rather than a step 
toward an application-specific solution.

One current problem could be that, to my knowledge, only a few XSL 
processors / software packages are able to process XInclude 
instructions: xsltproc, Cocoon, Xalan (no fragments), … But while the new 
version of DDI is being developed, hopefully more XSL processors will 
add support for XInclude.
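Given this uneven processor support, it may help to check whether a 
processing chain actually expanded the includes. The hypothetical helper 
below lists any xi:include elements (with their href and xpointer) still 
present in a processor's output; a non-empty list suggests that 
inclusion, or fragment inclusion, was not performed. The two sample 
outputs are made up for illustration.

```python
import xml.etree.ElementTree as ET

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"


def unresolved_includes(xml_text):
    """Return (href, xpointer) for every xi:include left in a processor's
    output; xpointer is None for whole-document includes."""
    root = ET.fromstring(xml_text)
    return [(el.get("href"), el.get("xpointer")) for el in root.iter(XI_INCLUDE)]


# Hypothetical outputs of two processing chains.
untouched = ('<codeBook xmlns:xi="http://www.w3.org/2001/XInclude">'
             '<xi:include href="stdyDscr.xml"/>'
             '<xi:include href="household.xml" xpointer="V1"/></codeBook>')
expanded = "<codeBook><stdyDscr/><var ID='V1'/></codeBook>"

for label, text in (("untouched", untouched), ("expanded", expanded)):
    print(label, unresolved_includes(text))
```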

I'm looking forward to your comments and to the discussion.
Achim
-------------- next part --------------
A non-text attachment was scrubbed...
Name: identity_transformation.xsl
Type: text/xml
Size: 433 bytes
Desc: not available
Url : http://www.icpsr.umich.edu/pipermail/ddi-cdg/attachments/20040304/4783af87/identity_transformation.xml

