[DDI-CDG] Notes on XInclude and DDI
Joachim Wackerow
wackerow at zuma-mannheim.de
Thu Mar 4 09:20:44 EST 2004
Hi,
I would like to give some background on the XInclude mechanism and
its relevance for DDI in general, and especially for the topics "Family
of data sets" and "Complex files".
XInclude is an inclusion mechanism defined by the W3C. Quoting from
http://www.w3.org/TR/xinclude/
“Many programming languages provide an inclusion mechanism to facilitate
modularity. Markup languages also often have need of such a mechanism.
This specification introduces a generic mechanism for merging XML
documents (as represented by their information sets) for use by
applications that need such a facility. The syntax leverages existing
XML constructs - elements, attributes, and URI references.”
The general idea is to include a whole XML document "b" in another XML
document "a" without having to copy the information physically into
document "a". The include is resolved by the XML parser in use. With a
generic XSL stylesheet (a so-called identity transformation) it is
possible to produce a document "a2" that physically contains all the
information. This should be necessary only in rare cases, and no
editing should be done in the generated document. With this include
mechanism each piece of information can be stored just once, in one XML
document, and still be used in other XML documents. There is then no
risk of duplicating the same information in several files, which is an
important issue when updating DDI documents. See:
http://www.w3.org/TR/xinclude/#basic-example
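As a rough sketch of the idea above, the following Python snippet uses
the standard library module xml.etree.ElementInclude to expand a document
"a" that includes a document "b". The two documents, the file name
"b.xml" and the in-memory loader are assumptions made for illustration;
the element nesting is simplified and is not valid DDI.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical document "b": stored once, reused by other documents.
DOC_B = "<stdyDscr><titl>Example study</titl></stdyDscr>"

# Hypothetical document "a": points at "b" instead of copying it.
DOC_A = (
    '<codeBook xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="b.xml"/>'
    "</codeBook>"
)


def loader(href, parse, encoding=None):
    """Return the parsed document for href (in-memory stand-in for a file)."""
    if href == "b.xml" and parse == "xml":
        return ET.fromstring(DOC_B)
    raise OSError(f"cannot load {href!r}")


def expand(document_text):
    """Parse a document and replace every xi:include by its target."""
    root = ET.fromstring(document_text)
    ElementInclude.include(root, loader=loader)
    return root


# "a2": the same content as "a", with the included information
# physically present in the tree.
a2 = expand(DOC_A)
print(ET.tostring(a2, encoding="unicode"))
```

Serializing the expanded tree corresponds to the generated document "a2";
the source documents "a" and "b" remain the only places that are edited.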
Similar mechanisms exist in programming languages, for example the
header files “.h” in “C”. Microsoft Word has a similar mechanism for
including graphic files. One can decide whether a whole graphic file is
embedded in a Word document (in which case the document size often
increases dramatically) or whether only a pointer to the external
graphic file is stored. With the second alternative each change in the
external graphic file is immediately reflected in the Word document. In
SPSS it is possible to include external command files in another command
file with the command “INCLUDE FILE=”.
In addition, XInclude makes it possible to include only parts of other
XML files by means of XPointer, which itself uses XPath to
address the fragments. A relatively simple XPointer expression is
sufficient if one uses the IDs in the other XML file (see the example
included with the proposal). See:
http://www.w3.org/TR/xinclude/#range-example
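The ID-based case can be sketched in a few lines of Python. This is not
a real XPointer implementation: it handles only a bare ID in the
xpointer attribute, and it assumes that the ID is carried in an attribute
named "ID" (a real processor learns which attributes are IDs from the
DTD or schema). The file name "person.xml" and both documents are
hypothetical.

```python
import xml.etree.ElementTree as ET

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"

# Hypothetical person-level codebook fragment, keyed by file name.
SOURCES = {
    "person.xml": (
        "<dataDscr>"
        '<var ID="V1"><labl>Age</labl></var>'
        '<var ID="V2"><labl>Sex</labl></var>'
        "</dataDscr>"
    )
}

# Hypothetical codebook that reuses only the variable with ID "V2".
COLLECTION = (
    '<dataDscr xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="person.xml" xpointer="V2"/>'
    "</dataDscr>"
)


def find_by_id(root, wanted):
    """Return the first element whose ID attribute equals wanted."""
    for element in root.iter():
        if element.get("ID") == wanted:
            return element
    raise LookupError(f"no element with ID {wanted!r}")


def expand(parent):
    """Replace xi:include children of parent, recursing into other children."""
    for index, child in enumerate(list(parent)):
        if child.tag != XI_INCLUDE:
            expand(child)
            continue
        source = ET.fromstring(SOURCES[child.get("href")])
        pointer = child.get("xpointer")
        target = source if pointer is None else find_by_id(source, pointer)
        target.tail = child.tail  # keep the text that followed the include
        parent[index] = target


collection = ET.fromstring(COLLECTION)
expand(collection)
included_ids = [var.get("ID") for var in collection.findall("var")]
print(included_ids)
```

Included fragments are not expanded recursively here, and full XPath-based
pointers are out of scope for this sketch.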
For complex files and families of data sets I think one DDI codebook
should exist for every "study" in a logical sense. With hierarchical
files this means one codebook for the person-level "study", one codebook
for the household-level "study", and one codebook for the newly
constructed person-level "study" with the added household information.
With time series it means one codebook for each point in time
("individual study") and one codebook for the whole collection, the new
"study" (for comparative analyses).
If a piece of information is specific to a single "study", it should be
stored in that study's DDI codebook. If it is generic to a collection of
studies or specific to a newly created "study", it should be stored in
the corresponding DDI codebook.
With this mechanism it would be possible to weave a distributed set of
DDI codebooks that include parts of other codebooks or refer to parts of
other codebooks. For referring to another XML file or a part of it
there is the XLink/XPointer mechanism, which is a generalization of
the HTML link concept (address: name="top1", link: href="url#top1").
So it is not necessary to develop a specific programming tool. An
XInclude-aware XML parser or XSL processor can do the work when an
XSL stylesheet is applied. It is of course still necessary to develop
stylesheets that transform the information in the DDI XML files into
other formats such as HTML or FO/PDF.
Attached you will find a generic stylesheet, a so-called identity
transformation, that produces a DDI codebook with all information
physically in it (you can use it with the examples attached to the
proposal). I used the XSL processor xsltproc, which implements the
whole XInclude specification.
See: http://xmlsoft.org/XSLT/
for Windows: http://www.zlatkovic.com/libxml.en.html
In general it seems desirable to allow XInclude for at least the major
parts of DDI (docDscr, stdyDscr, dataDscr). This would make it
possible for several people to edit separate, individually valid parts
of the document. The parts could then be integrated into one master
document per study by means of XInclude. From the application
perspective it would also be more flexible to have different files, for
example, for the study description and for the data description. A
search in the study description would then have to load only the study
information, not the whole DDI information.
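A minimal sketch of such a layout, again in Python with the standard
library only: three part files and one master document are written to a
temporary directory, the master is expanded via XInclude, and a search
that only needs the study description reads just that one file. The file
names, the content of the parts and the helper functions are assumptions
for illustration; the nesting is simplified and not valid DDI.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path
from xml.etree import ElementInclude

# Hypothetical part files, each editable on its own.
PARTS = {
    "docDscr.xml": "<docDscr><titl>Codebook for the example study</titl></docDscr>",
    "stdyDscr.xml": "<stdyDscr><titl>Example study</titl></stdyDscr>",
    "dataDscr.xml": '<dataDscr><var ID="V1"><labl>Age</labl></var></dataDscr>',
}

# Hypothetical master document: one per study, containing only includes.
MASTER = (
    '<codeBook xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="docDscr.xml"/>'
    '<xi:include href="stdyDscr.xml"/>'
    '<xi:include href="dataDscr.xml"/>'
    "</codeBook>"
)


def write_study(directory):
    """Write the part files and the master document into directory."""
    for name, text in PARTS.items():
        (directory / name).write_text(text, encoding="utf-8")
    (directory / "master.xml").write_text(MASTER, encoding="utf-8")


def load_master(directory):
    """Parse master.xml and resolve its includes relative to directory."""
    def loader(href, parse, encoding=None):
        return ET.parse(str(directory / href)).getroot()

    root = ET.parse(str(directory / "master.xml")).getroot()
    ElementInclude.include(root, loader=loader)
    return root


def study_title(directory):
    """Read the study title without touching the other part files."""
    return ET.parse(str(directory / "stdyDscr.xml")).getroot().findtext("titl")


with tempfile.TemporaryDirectory() as tmp:
    study_dir = Path(tmp)
    write_study(study_dir)
    full = load_master(study_dir)
    children = [child.tag for child in full]
    title = study_title(study_dir)

print(children, title)
```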
Including the XInclude mechanism in DDI would be one step further in
the direction of standards, especially W3C standards. It would not be a
step towards an application-specific solution.
One current problem could be that (to my knowledge) only some XSL
processors and software packages are able to process XInclude
instructions: xsltproc, Cocoon, Xalan (no fragments), … But by the time
the new version of DDI has been developed, hopefully more XSL
processors will support the XInclude mechanism.
I'm looking forward to your comments and to the discussion.
Achim
-------------- next part --------------
A non-text attachment was scrubbed...
Name: identity_transformation.xsl
Type: text/xml
Size: 433 bytes
Desc: not available
Url : http://www.icpsr.umich.edu/pipermail/ddi-cdg/attachments/20040304/4783af87/identity_transformation.xml