[DDI-CDG] Proposal of a structure for collections of surveys

Joachim Wackerow wackerow at zuma-mannheim.de
Fri Feb 6 10:03:10 EST 2004


DDI Expert Alliance
Substantive Content Working Group
Comparative Data/Families of Datasets Subgroup
Joachim Wackerow, 2004-02-06


Proposal for a structure for collections of surveys


Intention

The intention is to provide a higher level of description for families
of datasets by integrating the current DDI structure and using common
W3C XML technologies in a modular way. The collection concept sits on
top of the current DDI structure.

The current DDI structure should be changed only as far as necessary to
integrate these technologies.


Concept

Families of datasets could be collections of surveys along one specific
axis such as time or space. The current DDI structure describes
independent surveys. A collection is a set of several survey
descriptions; its members are the individual surveys of one family.
The collection describes the context shared by all members and the
connection between them. The relation of a collection to its surveys is
that of a common subset of several sets to the sets themselves.

Collection
(common subset)
---------------------
|                   |
|           ------------
|           | Survey 1 |
|           ------------
|                   |
|             ------------
|             | Survey 2 |
|             ------------
|                   |
---------------------

The collection has a list of the individual surveys. The collection
schema itself integrates the current DDI structure to describe the
information that is common to the individual members of the
collection. This could be a part of the study description as well as
the description of common variables. With a central description of the
common information there is no need for redundant information in the
individual DDI descriptions.

Collection (description of common information)
----------------------------------------------
    |                 |
    |                 |
  Survey 1          Survey 2
(description of specific information on the individual surveys)

To integrate a part of one XML file into another, one could use the
XInclude technology. With this mechanism the common information stored
in the collection file can be included in an individual DDI file. In
addition, XPointer/XPath expressions make it possible to include only a
part of the referenced file.
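
A minimal sketch of this inclusion step, using Python's standard
xml.etree.ElementInclude module. The file names, the in-memory FILES
table and the simplified element content are assumptions made for
illustration; they are not taken from the attached examples. The
standard-library module includes whole documents only, so selecting a
part with XPointer would need a fuller XInclude processor.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical in-memory "files"; names and content are illustrative only.
FILES = {
    # Common information kept once, at the collection level.
    "common_study.xml": (
        "<stdyDscr><citation><titlStmt>"
        "<titl>Common title</titl>"
        "</titlStmt></citation></stdyDscr>"
    ),
    # An individual survey description that pulls in the common part.
    "survey.xml": (
        '<codeBook xmlns:xi="http://www.w3.org/2001/XInclude">'
        '<xi:include href="common_study.xml" parse="xml"/>'
        '<dataDscr><var name="v1"/></dataDscr>'
        "</codeBook>"
    ),
}


def loader(href, parse, encoding=None):
    """Resolve an href against the FILES table instead of the file system."""
    if parse == "xml":
        return ET.fromstring(FILES[href])
    return FILES[href]


def resolve(name):
    """Parse a document and expand its xi:include elements in place."""
    root = ET.fromstring(FILES[name])
    ElementInclude.include(root, loader=loader)
    return root


survey = resolve("survey.xml")
print(ET.tostring(survey, encoding="unicode"))
```

After resolve() the survey tree contains the shared stdyDscr element in
place of the xi:include element, followed by its own dataDscr.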


Technical issues

To realize this structure in XML it is necessary to have an XML Schema
of DDI. This schema could be integrated into the new XML Schema of the
collection. Furthermore, the DDI XML schema needs to be extended for
the XInclude mechanism.

The DDI schema used in the collection schema must be changed in a few
places. The element "codeBook" should have the new attribute "level"
with the value "collection_of_surveys" to distinguish it from codebooks
on individual surveys. The existing element "varGrp" with a variable
list in the attribute "var" does not seem appropriate for this use; a
new element "NEW_varGrp" would be necessary to describe variable groups
across the individual DDI files. This element has the new subelement
"NEW_variable_identifier" to point to variables in the individual DDI
files.

Attached are XML example files to illustrate the concept. These files 
are well-formed, working examples (for XInclude). Because of the new 
extensions it is not possible to validate them against the current DDI DTD.

See attached example files:
collection_of_surveys.xml
MZ1996.xml
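
The schema changes described above can be sketched as follows. The
element and attribute names "codeBook", "level", "NEW_varGrp" and
"NEW_variable_identifier" come from the proposal; the attributes "name",
"file" and "var", the wrapping "dataDscr" element, the variable IDs and
the file MZ1997.xml are assumptions made only for this illustration.

```python
import xml.etree.ElementTree as ET


def build_collection(groups):
    """Build a collection-level codeBook.

    groups maps a group name to a list of (ddi_file, variable_id) pairs.
    """
    root = ET.Element("codeBook", {"level": "collection_of_surveys"})
    data = ET.SubElement(root, "dataDscr")
    for name, members in groups.items():
        group = ET.SubElement(data, "NEW_varGrp", {"name": name})
        for ddi_file, var_id in members:
            # "file" and "var" are assumed attribute names, not part of
            # the proposal text.
            ET.SubElement(
                group,
                "NEW_variable_identifier",
                {"file": ddi_file, "var": var_id},
            )
    return root


collection = build_collection(
    {"age": [("MZ1996.xml", "V12"), ("MZ1997.xml", "V14")]}
)
print(ET.tostring(collection, encoding="unicode"))
```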

Both schemas (collection and existing codebook) could also be
integrated into a single XML schema. DocBook has a similar structure
(set, book, chapter).
See: http://www.docbook.org/tdg/en/html/set.html#d0e181280
MARC has the hierarchy: collection, record.
See: http://www.loc.gov/standards/marcxml///xml/spy/spy.html


Collection of collections

Additionally, the described structure would allow a collection of
collections (recursive pattern), whose member collections in turn have
surveys as members. For example, a collection on the time axis could
have collections in the geography domain as members; these collections
have individual surveys as members.

                  Super Collection
                          |
                   ---------------
                   |             |
             Collection 1  Collection 2
                 |    |        |    |
Survey Level:  1a   1b       2a   2b


See attached example files:
collection_of_collections.xml
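
The recursive pattern can be sketched as a walk over nested
collections. The element names "collection" and "survey" and the
attributes "name" and "href" are assumptions for illustration; the
attached example file may use different names.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup mirroring the diagram above.
SUPER = """<collection name="Super Collection">
  <collection name="Collection 1">
    <survey href="1a.xml"/><survey href="1b.xml"/>
  </collection>
  <collection name="Collection 2">
    <survey href="2a.xml"/><survey href="2b.xml"/>
  </collection>
</collection>"""


def surveys(node, path=()):
    """Yield (path of collection names, survey href) for every survey."""
    path = path + (node.get("name"),)
    for child in node:
        if child.tag == "collection":
            # A member that is itself a collection: recurse into it.
            yield from surveys(child, path)
        elif child.tag == "survey":
            yield path, child.get("href")


listing = list(surveys(ET.fromstring(SUPER)))
for path, href in listing:
    print(" / ".join(path), "->", href)
```

Because the function calls itself for every nested collection, the same
code handles any depth of nesting.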


Notes on XInclude

In general, the use of the XInclude mechanism has other benefits. It
would be possible to organize parts of a single DDI description in
separate files. This could be desirable when editing extensive
surveys: the study description and the data description could be edited
and validated by different persons in separate files. For organising
large amounts of DDI-structured information (XML databases, XML
transformation, XML-based searching), it would also be more flexible to
have parts of a DDI file. To realize this mechanism it would be
necessary to allow elements like "stdyDscr" as root element in addition
to "codeBook". Similarly, DocBook allows different elements as root
element.
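
A minimal sketch of such a split, assuming each top-level section of a
codeBook is written to its own file with that section as root element.
The function name split_codebook and the file naming scheme are
hypothetical.

```python
import xml.etree.ElementTree as ET

XI = "http://www.w3.org/2001/XInclude"


def split_codebook(root, stem):
    """Return (master_xml, parts) for a codeBook element.

    parts maps a file name to the XML text of one top-level section;
    master_xml is a codeBook that only holds xi:include references.
    """
    ET.register_namespace("xi", XI)
    parts = {}
    master = ET.Element(root.tag, root.attrib)
    for index, child in enumerate(root):
        # The index keeps names unique when a tag occurs more than once.
        href = f"{stem}_{index}_{child.tag}.xml"
        parts[href] = ET.tostring(child, encoding="unicode")
        ET.SubElement(master, f"{{{XI}}}include", {"href": href})
    return ET.tostring(master, encoding="unicode"), parts


codebook = ET.fromstring(
    "<codeBook><stdyDscr><citation/></stdyDscr>"
    "<dataDscr><var name='v1'/></dataDscr></codeBook>"
)
master_xml, parts = split_codebook(codebook, "MZ1996")
print(master_xml)
print(sorted(parts))
```

Each entry in parts is a document with "stdyDscr" or "dataDscr" as its
root, which is why the schema would have to permit these elements at
the top level.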


Construction of new variables

For the comparison of data it would be nice to have a mechanism for
describing newly constructed variables. Often variables of different
surveys are comparable only by means of newly constructed variables. To
describe this construction (which is often conditional) it is possible
to integrate a foreign DTD/XML Schema like MathML into the DDI
structure. Furthermore, such a mechanism would make it possible to
generate, for example, an SPSS command setup with IF statements.
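
A minimal sketch of the generation step. A plain list of
(condition, value) pairs stands in here for whatever formal description
is eventually chosen; it is not MathML. The function name spss_setup
and the variable names educ96 and educ_cmp are hypothetical.

```python
def spss_setup(target, rules):
    """Turn (condition, value) pairs into SPSS IF commands for target."""
    lines = [f"IF ({condition}) {target} = {value}." for condition, value in rules]
    lines.append("EXECUTE.")
    return "\n".join(lines)


# Hypothetical recode of an education variable into a comparable one.
setup = spss_setup(
    "educ_cmp",
    [("educ96 <= 2", 1), ("educ96 > 2", 2)],
)
print(setup)
```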

See the discussion thread "Adding tags to the DDI" in January on the DDI 
users list, especially:
http://lion.icpsr.umich.edu/pipermail/ddi-users/2004-January/000153.html
http://lion.icpsr.umich.edu/pipermail/ddi-users/2004-January/000147.html
http://lion.icpsr.umich.edu/pipermail/ddi-users/2004-January/000151.html
http://lion.icpsr.umich.edu/pipermail/ddi-users/2004-January/000158.html

One remark was that MathML is complex and hard to read. Perhaps there
is another XML language for expressing formal conditional expressions
like this. I think it would be better to use an existing DTD/schema
than to build a new language for this purpose.

See attached example:
single_study_mathml_ddi.xml


I'm looking forward to your comments and to the discussion.

Best regards, Achim

PS: After one week of cross-country skiing I'll be back at work on 2004-02-16.
Attachments (text/xml):
collection_of_surveys.xml (1959 bytes)
http://www.icpsr.umich.edu/pipermail/ddi-cdg/attachments/20040206/77316822/collection_of_surveys.xml
MZ1996.xml (1787 bytes)
http://www.icpsr.umich.edu/pipermail/ddi-cdg/attachments/20040206/77316822/MZ1996.xml
collection_of_collections.xml (1588 bytes)
http://www.icpsr.umich.edu/pipermail/ddi-cdg/attachments/20040206/77316822/collection_of_collections.xml
single_study_mathml_ddi.xml (2327 bytes)
http://www.icpsr.umich.edu/pipermail/ddi-cdg/attachments/20040206/77316822/single_study_mathml_ddi.xml

