Metadata are critical to effective data use because they convey the information needed to fully exploit the analytic potential of the data. Because it is often impossible for secondary researchers to ask questions of the original data producers, metadata become the de facto form of communication between them. Complete and thorough metadata make possible a fuller understanding of a dataset, searches of data, including searches of individual variables, and a variety of options for display on the Web.

For a detailed discussion of metadata, see Phase 3: Data Collection and File Creation in ICPSR's Guide to Social Science Data Preparation and Archiving. What follows here is a shortened version of that material.

Data Documentation Initiative (DDI)

In documenting its data holdings, ICPSR uses the Data Documentation Initiative metadata specification, a standard for the content, presentation, transport, and preservation of documentation expressed in XML (eXtensible Markup Language). XML permits the markup, or tagging, of documentation content for retrieval and repurposing across the data life cycle.
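
To make the idea of tagged documentation concrete, the following minimal sketch builds a small study-level record with Python's standard library. The element names (codeBook, stdyDscr, citation, titlStmt, titl, rspStmt, AuthEnty) and the namespace are assumed to follow DDI Codebook 2.5 and should be checked against the schema; the helper names and the study title and investigator are hypothetical. It is an illustration only, not ICPSR's own tooling.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace identifier used by DDI Codebook 2.5.
DDI_NS = "ddi:codebook:2_5"
ET.register_namespace("", DDI_NS)  # serialize it as the default namespace


def q(tag):
    """Return the namespace-qualified form of a DDI tag name."""
    return f"{{{DDI_NS}}}{tag}"


def build_study_record(title, investigator):
    """Build a tiny study description: a title and one principal investigator."""
    code_book = ET.Element(q("codeBook"))
    stdy_dscr = ET.SubElement(code_book, q("stdyDscr"))
    citation = ET.SubElement(stdy_dscr, q("citation"))
    titl_stmt = ET.SubElement(citation, q("titlStmt"))
    ET.SubElement(titl_stmt, q("titl")).text = title
    rsp_stmt = ET.SubElement(citation, q("rspStmt"))
    ET.SubElement(rsp_stmt, q("AuthEnty")).text = investigator
    return code_book


# Hypothetical study details, for illustration only.
record = build_study_record("Example Household Survey", "Doe, Jane")
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Because each piece of content sits inside a named tag, a program can later pull out just the title or just the investigator, which is what makes retrieval and repurposing possible.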

ICPSR encourages data depositors to generate documentation that conforms to DDI, but will convert existing metadata in other formats to the DDI standard when necessary.

Several tools are available for writing XML, including Nesstar Publisher and Colectica. The DDI website also has a list of appropriate tools and other XML resources.

Study- and variable-level metadata

ICPSR creates study-level metadata records in DDI format using information supplied by data depositors and other sources. These study-level records may be exported as XML compliant with these standards: DDI Codebook, DDI Lifecycle, Dublin Core, and MARCXML.
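
As a rough illustration of what exporting a study-level record to another standard involves, the sketch below maps a few study-level fields onto Dublin Core elements. The Dublin Core namespace is the standard one; the field names, the mapping table, the wrapper element, and the sample study are all hypothetical and do not describe ICPSR's actual export process.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)

# Hypothetical mapping from study-level field names to Dublin Core elements.
FIELD_TO_DC = {
    "title": "title",
    "principal_investigator": "creator",
    "summary": "description",
    "geographic_coverage": "coverage",
}


def to_dublin_core(study):
    """Convert a dict of study-level fields into a Dublin Core record."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for field, dc_name in FIELD_TO_DC.items():
        values = study.get(field, [])
        if isinstance(values, str):
            values = [values]  # treat a single value as a one-item list
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{dc_name}").text = value
    return root


# Hypothetical study record, for illustration only.
study = {
    "title": "Example Household Survey",
    "principal_investigator": ["Doe, Jane", "Roe, Richard"],
    "geographic_coverage": "United States",
}
dc_record = to_dublin_core(study)
print(ET.tostring(dc_record, encoding="unicode"))
```

Fields missing from the input (here, the summary) are simply skipped, and repeatable fields such as investigators produce one element per value.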

Variable-level DDI metadata are generated from statistical package files and enhanced with question text. These metadata then become part of the Social Science Variables Database.
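
The sketch below suggests how variable-level metadata might be assembled once variable names, labels, and value labels have been read from a statistical package file, and then enhanced with question text. The element names (var, labl, qstn, qstnLit, catgry, catValu) are assumed to follow DDI Codebook; the namespace is omitted for brevity, and the function, variable name, and question wording are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_var(name, label, value_labels, question_text=None):
    """Build one variable description from package-file labels.

    value_labels maps each code to its meaning; question_text is the
    optional enhancement added from the data collection instrument.
    """
    var = ET.Element("var", {"name": name})
    ET.SubElement(var, "labl").text = label
    if question_text:
        qstn = ET.SubElement(var, "qstn")
        ET.SubElement(qstn, "qstnLit").text = question_text
    for code, code_label in sorted(value_labels.items()):
        catgry = ET.SubElement(var, "catgry")
        ET.SubElement(catgry, "catValu").text = str(code)
        ET.SubElement(catgry, "labl").text = code_label
    return var


# Hypothetical variable, as it might be read from a statistical package file.
employed = build_var(
    name="EMPLOYED",
    label="Currently employed",
    value_labels={1: "Yes", 2: "No", 9: "Refused"},
    question_text="Are you currently working for pay?",
)
print(ET.tostring(employed, encoding="unicode"))
```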

Important metadata elements

DDI metadata should cover a number of elements, including but not limited to:

  • Principal investigator
  • Funding sources
  • Data collector/producer
  • Project description
  • Sample and sampling procedures
  • Weighting
  • Substantive, temporal, and geographic coverage of the data collection
  • Data source(s)
  • Unit(s) of analysis/observation
  • Variables
    • Exact question wording or the exact meaning of the datum
    • Universe information
    • Exact meaning of codes
    • Missing data codes
    • Unweighted frequency distribution or summary statistics
    • Imputation and editing information
    • Details on constructed and weight variables
    • Location in the data file when relevant
    • Variable groupings
  • Technical information on files
  • Data collection instruments
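
Several of the variable-level elements listed above (question wording, universe, code meanings, missing data codes, and unweighted frequencies) can be sketched together as a small data structure. The class, its field names, and the sample responses are hypothetical and only illustrate how these pieces of documentation relate to one another.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class VariableDoc:
    name: str
    question: str   # exact question wording
    universe: str   # who was eligible to answer
    codes: dict     # code -> exact meaning
    missing_codes: set = field(default_factory=set)

    def frequencies(self, values):
        """Return unweighted counts per code, split into (valid, missing)."""
        counts = Counter(values)
        valid = {c: n for c, n in counts.items() if c not in self.missing_codes}
        missing = {c: n for c, n in counts.items() if c in self.missing_codes}
        return valid, missing

    def undocumented(self, values):
        """Return codes that appear in the data but have no documented meaning."""
        return sorted(set(values) - set(self.codes))


# Hypothetical variable and responses, for illustration only.
doc = VariableDoc(
    name="EMPLOYED",
    question="Are you currently working for pay?",
    universe="Respondents aged 16 and older",
    codes={1: "Yes", 2: "No", 9: "Refused"},
    missing_codes={9},
)
responses = [1, 2, 2, 1, 9, 2, 3]
valid, missing = doc.frequencies(responses)
print(valid, missing, doc.undocumented(responses))
```

In this example the code 3 appears in the data without a documented meaning, which is the kind of gap that complete variable-level metadata is meant to prevent.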

Archives also value receiving the following information:

  • Citations to related publications
  • Technical information on files, e.g., information on file formats, file linking
  • Data collection instruments
  • Interviewer guide
  • Coding instrument