[DDI-CDG] Proposal of a structure for collections of surveys
Joachim Wackerow
wackerow at zuma-mannheim.de
Fri Feb 6 10:03:10 EST 2004
DDI Expert Alliance
Substantive Content Working Group
Comparative Data/Families of Datasets Subgroup
Joachim Wackerow, 2004-02-06
Proposal for a structure for collections of surveys
Intention
The intention is to provide a higher level of description for families
of datasets that integrates the current DDI structure and uses common
W3C XML technologies in a modular way. The collection concept sits on
top of the current DDI structure, which should be changed only as far
as necessary to integrate these technologies.
Concept
Families of datasets could be collections of surveys along one specific
axis, such as time or space. The current DDI structure describes
independent surveys. A collection is a set of several survey
descriptions: its members are the individual surveys of a family. The
collection describes the context of all members and what connects them.
The collection relates to its surveys as the common subset (the
intersection) of several sets relates to the sets themselves.
Collection
(common subset)
---------------------
|                   |
|             ------------
|             | Survey 1 |
|             ------------
|                   |
|             ------------
|             | Survey 2 |
|             ------------
|                   |
---------------------
The collection contains a list of the individual surveys. The collection
schema itself integrates the current DDI structure to describe the
information that is common to all members of the collection. This could
be part of the study description as well as the description of common
variables. Because common information is described centrally, it does
not have to be repeated in the individual DDI descriptions.
Collection (description of common information)
----------------------------------------------
|                                            |
|                                            |
Survey 1                              Survey 2
(description of specific information on the individual surveys)
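As a minimal sketch of describing common information once at the collection level, the Python below treats each survey description as a flat dict from element path to value (a simplification: real DDI descriptions are nested XML). The collection part is computed as the set intersection of all (path, value) pairs, matching the "common subset" relation described above, and each survey keeps only the remainder. The survey names MZ1995/MZ1996, the paths and the values are hypothetical.

```python
# Minimal sketch: factor information shared by all surveys into a collection.
# Survey descriptions are flattened to {element path: value} dicts; the names,
# paths and values below are hypothetical, not taken from real DDI files.

SURVEYS = {
    "MZ1995": {
        "stdyDscr/citation/prodStmt/producer": "Example producer",
        "stdyDscr/method/dataColl/sampProc": "1% household sample",
        "stdyDscr/stdyInfo/sumDscr/collDate": "1995",
    },
    "MZ1996": {
        "stdyDscr/citation/prodStmt/producer": "Example producer",
        "stdyDscr/method/dataColl/sampProc": "1% household sample",
        "stdyDscr/stdyInfo/sumDscr/collDate": "1996",
    },
}


def split_common(surveys):
    """Return (common, specific).

    common: the (path, value) pairs present in every survey, i.e. the
            intersection of the surveys viewed as sets of pairs.
    specific: per survey, only the pairs not already in `common`.
    """
    pair_sets = [set(desc.items()) for desc in surveys.values()]
    common = dict(set.intersection(*pair_sets)) if pair_sets else {}
    specific = {
        name: {k: v for k, v in desc.items() if (k, v) not in common.items()}
        for name, desc in surveys.items()
    }
    return common, specific


def merge(common, specific_part):
    """Rebuild a full survey description from the collection and its own part."""
    return {**common, **specific_part}


if __name__ == "__main__":
    common, specific = split_common(SURVEYS)
    print("collection:", common)
    for name, part in specific.items():
        print(name, part)
        assert merge(common, part) == SURVEYS[name]  # nothing is lost
```

Storing only the remainder per survey is what removes the redundancy; `merge` shows that each full description can still be rebuilt from the two parts.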
To integrate part of one XML file into another, one could use the
XInclude technology. XInclude integrates the information stored in one
XML file into another XML file; in addition, XPointer/XPath expressions
make it possible to integrate only part of that information. With this
mechanism, the common information stored in the collection file can be
integrated into each DDI file.
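Python's standard library ships a basic XInclude processor in `xml.etree.ElementInclude`. The sketch below uses it to pull a common study description into a survey codebook. The file names, the in-memory `FILES` dict and the `memory_loader` function are hypothetical stand-ins for real files. As far as I know, this module performs only whole-document (or text) inclusion and does not evaluate the `xpointer` attribute, so partial inclusion via XPointer would need a processor that implements it, such as libxml2.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

XI_NS = "http://www.w3.org/2001/XInclude"

# Hypothetical file contents kept in memory so the sketch runs anywhere.
# "collection_common.xml" stands in for the common part of a collection file.
FILES = {
    "collection_common.xml": (
        "<stdyDscr><citation><titlStmt>"
        "<titl>Common study description</titl>"
        "</titlStmt></citation></stdyDscr>"
    ),
    "MZ1996.xml": (
        f'<codeBook xmlns:xi="{XI_NS}">'
        '<xi:include href="collection_common.xml" parse="xml"/>'
        '<dataDscr><var ID="V1" name="age"/></dataDscr>'
        "</codeBook>"
    ),
}


def memory_loader(href, parse, encoding=None):
    """Loader with the (href, parse, encoding) signature ElementInclude expects."""
    text = FILES[href]
    return ET.fromstring(text) if parse == "xml" else text


def resolve(name):
    """Parse a file and replace every xi:include with the referenced content."""
    root = ET.fromstring(FILES[name])
    ElementInclude.include(root, loader=memory_loader)
    return root


if __name__ == "__main__":
    print(ET.tostring(resolve("MZ1996.xml"), encoding="unicode"))
```

For files on disk, `ElementInclude.include(root)` without a custom loader falls back to the module's `default_loader`, which opens the `href` as a file path.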
Technical issues
To realize this structure in XML, an XML Schema of DDI is needed. This
schema could be integrated into the new XML Schema of the collection.
Furthermore, the DDI XML Schema needs to be extended for the XInclude
mechanism.
The DDI schema used in the collection schema must be changed in a few
points. The element "codeBook" should have the new attribute "level"
with the value "collection_of_surveys" to distinguish it from codebooks
of individual surveys. The existing element "varGrp", which lists its
variables in the attribute "var", does not seem appropriate for this
use, because an IDREFS list such as "var" can only refer to IDs within
the same document. A new element "NEW_varGrp" would be necessary to
describe variable groups across the individual DDI files. This element
has the new subelement "NEW_variable_identifier" to point to variables
in the individual DDI files.
Attached are XML example files illustrating the concept. They are
well-formed, working examples for XInclude. Because of the new
extensions, they cannot be validated against the current DDI DTD.
See attached example files:
collection_of_surveys.xml
MZ1996.xml
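The text above does not spell out the syntax of "NEW_variable_identifier". The Python sketch below assumes, purely for illustration, that it carries a `file` attribute naming a survey's DDI file and a `var` attribute holding the ID of a `var` element in that file. It checks for level="collection_of_surveys" and then resolves each "NEW_varGrp" to the variables it groups. All file names, IDs and variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical collection codebook. The "file" and "var" attributes on
# NEW_variable_identifier are assumptions made for this sketch.
COLLECTION = """<codeBook level="collection_of_surveys">
  <dataDscr>
    <NEW_varGrp ID="VG_age">
      <NEW_variable_identifier file="MZ1995.xml" var="V12"/>
      <NEW_variable_identifier file="MZ1996.xml" var="V14"/>
    </NEW_varGrp>
  </dataDscr>
</codeBook>"""

# Hypothetical individual survey codebooks, kept in memory.
SURVEYS = {
    "MZ1995.xml": '<codeBook><dataDscr><var ID="V12" name="age_1995"/></dataDscr></codeBook>',
    "MZ1996.xml": '<codeBook><dataDscr><var ID="V14" name="age_1996"/></dataDscr></codeBook>',
}


def resolve_var_groups(collection_xml, surveys):
    """Map each NEW_varGrp ID to a list of (file, <var> element) pairs."""
    root = ET.fromstring(collection_xml)
    if root.get("level") != "collection_of_surveys":
        raise ValueError("not a collection-level codebook")
    parsed = {name: ET.fromstring(text) for name, text in surveys.items()}
    groups = {}
    for grp in root.iter("NEW_varGrp"):
        members = []
        for ident in grp.findall("NEW_variable_identifier"):
            file_name, var_id = ident.get("file"), ident.get("var")
            # Look up the variable by ID inside the referenced survey file.
            var = parsed[file_name].find(f".//var[@ID='{var_id}']")
            if var is None:
                raise KeyError(f"{file_name}: no var with ID {var_id}")
            members.append((file_name, var))
        groups[grp.get("ID")] = members
    return groups


if __name__ == "__main__":
    for group_id, members in resolve_var_groups(COLLECTION, SURVEYS).items():
        for file_name, var in members:
            print(group_id, file_name, var.get("name"))
```

Unlike an IDREFS list, the pair (file, ID) stays meaningful across documents, which is the point of the proposed new element.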
Both schemas (collection and existing codebook) could also be integrated
into a single XML schema. DocBook has a similar structure (set, book,
chapter).
See: http://www.docbook.org/tdg/en/html/set.html#d0e181280
MARC has the hierarchy: collection, record.
See: http://www.loc.gov/standards/marcxml///xml/spy/spy.html
Collection of collections
The described structure would additionally allow a collection of
collections (a recursive pattern) whose member collections in turn have
surveys as members. For example, a collection along the time axis could
have as members collections along the geographic axis; these
collections have individual surveys as members.
                    Super Collection
                            |
                  ---------------------
                  |                   |
             Collection 1        Collection 2
              |        |          |        |
Survey level: 1a       1b         2a       2b
See attached example files:
collection_of_collections.xml
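The recursive pattern can be sketched in Python as a depth-first walk that expands collections until it reaches individual surveys. The sketch assumes a hypothetical `<member href="..."/>` element for listing members (the text only says that a collection lists its members) and uses the proposed level="collection_of_surveys" attribute to tell collections from surveys. The in-memory files mirror the diagram above; a cycle check guards against a collection that (indirectly) contains itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory files mirroring the diagram above. The
# <member href="..."/> element is an assumption made for this sketch.
COLL = '<codeBook level="collection_of_surveys">{}</codeBook>'
FILES = {
    "super.xml": COLL.format('<member href="c1.xml"/><member href="c2.xml"/>'),
    "c1.xml": COLL.format('<member href="1a.xml"/><member href="1b.xml"/>'),
    "c2.xml": COLL.format('<member href="2a.xml"/><member href="2b.xml"/>'),
    "1a.xml": "<codeBook/>",
    "1b.xml": "<codeBook/>",
    "2a.xml": "<codeBook/>",
    "2b.xml": "<codeBook/>",
}


def is_collection(root):
    """A codebook is a collection if it carries the proposed level attribute."""
    return root.get("level") == "collection_of_surveys"


def leaf_surveys(name, files, _path=()):
    """Return the survey files reachable from `name`, depth first.

    Collections may contain collections; `_path` holds the chain of files
    currently being expanded so that cyclic membership is reported.
    """
    if name in _path:
        raise ValueError("cyclic membership: " + " -> ".join(_path + (name,)))
    root = ET.fromstring(files[name])
    if not is_collection(root):
        return [name]
    surveys = []
    for member in root.findall("member"):
        surveys.extend(leaf_surveys(member.get("href"), files, _path + (name,)))
    return surveys


if __name__ == "__main__":
    print(leaf_surveys("super.xml", FILES))
```

Because a collection and a survey share the same root element, the same function handles any nesting depth without special cases.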
Notes on XInclude
The XInclude mechanism has other benefits in general. It would make it
possible to keep parts of a single DDI description in separate files.
This could be desirable when editing extensive surveys: the study
description and the data description could be edited and validated by
different people in separate files. For organizing large amounts of
DDI-structured information (XML databases, XML transformation,
XML-based searching), it would also be more flexible to work with parts
of a DDI file. To realize this, elements like "stdyDscr" would have to
be allowed as root elements in addition to "codeBook". Similarly,
DocBook allows different elements to be the root element.
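A minimal Python sketch of this split-file workflow, again using the standard library's `xml.etree.ElementInclude`: the study description and the data description live in separate files with `stdyDscr` and `dataDscr` as their root elements, and a small `codeBook.xml` pulls them together. File names and contents are hypothetical, and schema validation of the individual parts is not shown.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

XI_NS = "http://www.w3.org/2001/XInclude"

# Hypothetical split of one DDI description into three files; the two parts
# have their own root elements (stdyDscr, dataDscr) as proposed above.
PARTS = {
    "stdyDscr.xml": (
        "<stdyDscr><citation><titlStmt><titl>Example study</titl>"
        "</titlStmt></citation></stdyDscr>"
    ),
    "dataDscr.xml": '<dataDscr><var ID="V1" name="sex"/></dataDscr>',
    "codeBook.xml": (
        f'<codeBook xmlns:xi="{XI_NS}">'
        '<xi:include href="stdyDscr.xml"/>'
        '<xi:include href="dataDscr.xml"/>'
        "</codeBook>"
    ),
}


def write_parts(directory):
    """Write each part to its own file, as separate editors might."""
    for name, text in PARTS.items():
        with open(os.path.join(directory, name), "w", encoding="utf-8") as fh:
            fh.write(text)


def assemble(directory, main="codeBook.xml"):
    """Parse the main file and expand its xi:include elements."""
    def loader(href, parse, encoding=None):
        # Resolve relative hrefs against `directory`, then use the stock loader.
        path = os.path.join(directory, href)
        return ElementInclude.default_loader(path, parse, encoding)

    root = ET.parse(os.path.join(directory, main)).getroot()
    ElementInclude.include(root, loader=loader)
    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        write_parts(tmp)
        # A part can be parsed and checked on its own ...
        print(ET.parse(os.path.join(tmp, "stdyDscr.xml")).getroot().tag)
        # ... and the complete codebook is assembled on demand.
        print([child.tag for child in assemble(tmp)])
```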
Construction of new variables
For the comparison of data it would be nice to have a mechanism for
describing newly constructed variables. Variables from different
surveys are often comparable only by means of newly constructed
variables. To describe this construction, which is often conditional, a
foreign DTD/XML Schema such as MathML could be integrated into the DDI
structure. Such a mechanism would also make it possible to generate,
for example, an SPSS command setup with IF statements.
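To sketch how a conditional construction written in MathML might be turned into an SPSS setup, the Python below translates a Content MathML `piecewise` expression into an SPSS `DO IF` / `ELSE IF` / `ELSE` / `END IF` block. It is a hypothetical illustration, not the content of the attached example: it handles only `ci`, `cn`, the relations `lt`, `leq`, `gt`, `geq`, `eq`, `neq` and the connectives `and`, `or`, and it assumes the pieces are meant to be tested in document order (first match wins). The variable names `age` and `agegrp` are invented.

```python
import xml.etree.ElementTree as ET

MML = "{http://www.w3.org/1998/Math/MathML}"

# SPSS operators for the Content MathML elements this sketch supports.
RELATIONS = {"lt": "<", "leq": "<=", "gt": ">", "geq": ">=", "eq": "=", "neq": "~="}
CONNECTIVES = {"and": "AND", "or": "OR"}

# Hypothetical construction rule: age groups from a numeric age variable.
RULE = """<math xmlns="http://www.w3.org/1998/Math/MathML">
  <piecewise>
    <piece><cn>1</cn><apply><lt/><ci>age</ci><cn>18</cn></apply></piece>
    <piece><cn>2</cn>
      <apply><and/>
        <apply><geq/><ci>age</ci><cn>18</cn></apply>
        <apply><lt/><ci>age</ci><cn>65</cn></apply>
      </apply>
    </piece>
    <otherwise><cn>3</cn></otherwise>
  </piecewise>
</math>"""


def local(tag):
    """Strip the MathML namespace from an element tag."""
    return tag[len(MML):] if tag.startswith(MML) else tag


def to_spss(node):
    """Translate a supported Content MathML expression to SPSS syntax."""
    tag = local(node.tag)
    if tag in ("ci", "cn"):
        return node.text.strip()
    if tag == "apply":
        op, *args = list(node)
        name = local(op.tag)
        parts = [to_spss(arg) for arg in args]
        if name in RELATIONS and len(parts) == 2:
            return f"{parts[0]} {RELATIONS[name]} {parts[1]}"
        if name in CONNECTIVES and parts:
            return f" {CONNECTIVES[name]} ".join(f"({p})" for p in parts)
        tag = name  # report the unsupported operator below
    raise ValueError(f"unsupported MathML element: {tag}")


def piecewise_to_spss(target, piecewise):
    """Emit a DO IF block that assigns `target` piece by piece."""
    pieces = piecewise.findall(MML + "piece")
    if not pieces:
        raise ValueError("piecewise without piece elements")
    lines = []
    for i, piece in enumerate(pieces):
        value, condition = list(piece)
        keyword = "DO IF" if i == 0 else "ELSE IF"
        lines.append(f"{keyword} ({to_spss(condition)}).")
        lines.append(f"  COMPUTE {target} = {to_spss(value)}.")
    otherwise = piecewise.find(MML + "otherwise")
    if otherwise is not None:
        lines.append("ELSE.")
        lines.append(f"  COMPUTE {target} = {to_spss(otherwise[0])}.")
    lines.append("END IF.")
    return "\n".join(lines)


if __name__ == "__main__":
    rule = ET.fromstring(RULE).find(MML + "piecewise")
    print(piecewise_to_spss("agegrp", rule))
```

A `DO IF` block rather than independent `IF` commands was chosen here so that overlapping conditions resolve the same way as the assumed first-match reading of the pieces.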
See the discussion thread "Adding tags to the DDI" in January on the DDI
users list, especially:
http://lion.icpsr.umich.edu/pipermail/ddi-users/2004-January/000153.html
http://lion.icpsr.umich.edu/pipermail/ddi-users/2004-January/000147.html
http://lion.icpsr.umich.edu/pipermail/ddi-users/2004-January/000151.html
http://lion.icpsr.umich.edu/pipermail/ddi-users/2004-January/000158.html
One remark was that MathML is complex and hard to read. Perhaps another
XML language exists for expressing formal conditional expressions like
these. I think it would be better to use an existing DTD/schema than to
build a new language for this purpose.
See attached example:
single_study_mathml_ddi.xml
I'm looking forward to your comments and to the discussion.
Best regards, Achim
PS: After a week of cross-country skiing, I'll be back at work on 2004-02-16.
Attachments (archived copies):
collection_of_surveys.xml (text/xml, 1959 bytes)
http://www.icpsr.umich.edu/pipermail/ddi-cdg/attachments/20040206/77316822/collection_of_surveys.xml
MZ1996.xml (text/xml, 1787 bytes)
http://www.icpsr.umich.edu/pipermail/ddi-cdg/attachments/20040206/77316822/MZ1996.xml
collection_of_collections.xml (text/xml, 1588 bytes)
http://www.icpsr.umich.edu/pipermail/ddi-cdg/attachments/20040206/77316822/collection_of_collections.xml
single_study_mathml_ddi.xml (text/xml, 2327 bytes)
http://www.icpsr.umich.edu/pipermail/ddi-cdg/attachments/20040206/77316822/single_study_mathml_ddi.xml