Guide to Social Science Data Preparation and Archiving

Importance of Data Sharing and Archiving

Archives and domain repositories that preserve and disseminate social and behavioral data perform a critical service to the scholarly community and to society at large, ensuring that these culturally significant materials are accessible in perpetuity. The success of the archiving endeavor, however, ultimately depends on researchers’ willingness to deposit their data and documentation for others to use.

In recent years, several national scientific organizations have issued statements and policies underscoring the need for prompt archiving of data, and some funding agencies have begun to require that data from the research they fund be deposited in a public archive. The National Institutes of Health (NIH) now requires a data sharing plan for large projects, and in 2011 the National Science Foundation (NSF) began to require a data management plan as part of every grant application. The National Endowment for the Humanities (NEH) and the Institute of Museum and Library Services (IMLS) have followed suit. The NSF’s Social, Behavioral & Economic Sciences directorate recently released a data archiving policy.

These statements from leading research funding agencies demonstrate that the data sharing ethic is integral to maximizing the impact and benefit of research dollars. Experience has demonstrated that the durability of the data increases and the cost of processing and preserving the data decreases when deposits are timely. Further, archived data result in a greater number of publications and a higher profile for data producers (Pienta, 2010).

Data sharing also allows scientists to test and replicate each other’s findings. “The replication standard holds that sufficient information exists with which to understand, evaluate, and build upon a prior work if a third party can replicate the results without any additional information from the author” (King, 1995). “[S]ome departments now require students writing dissertations and senior theses to submit a replication data set that, after an optional embargo period, gets made public and is permanently archived” (King, 2006).

There are many benefits to data sharing that go beyond replication. Fienberg (1994) argues that data sharing:

  • Reinforces open scientific inquiry. When data are widely available, the self-correcting features of science work most effectively.
  • Encourages diversity of analysis and opinions. Researchers having access to the same data can challenge each other’s analyses and conclusions.
  • Promotes new research and allows for the testing of new or alternative methods. Examples of data being used in ways that the original investigators had not envisioned are numerous.
  • Improves methods of data collection and measurement through the scrutiny of others. Making data publicly available allows the scientific community to reach consensus on methods.
  • Reduces costs by avoiding duplicate data collection efforts. Some standard datasets, such as the General Social Survey and the National Election Studies, have produced literally thousands of papers that could not have been published if the authors had to collect their own data.
  • Makes known to the field what data have been collected, so that additional resources are not spent gathering essentially the same information.
  • Provides an important resource for training in research. Secondary data are extremely valuable to students, who then have access to high-quality data as a model for their own work.

Early archiving may enable a researcher to enhance the impact (and certainly the visibility) of a project.

Planning Ahead for Archiving and Preservation of Data

Data management and sharing plans should be developed in conjunction with an archive to maximize the utility of the data and to ensure the availability of the data in the future. We recommend that researchers consult as early as possible with the data archive in which they plan to deposit data; this will facilitate preservation and dissemination of the research data.

Data archives are committed to maintaining social science research data for the long term, for the benefit of future researchers, and to assisting data creators in meeting the stipulations of their grantors. There are several factors to consider when selecting a data archive or domain repository for deposit with a view toward long-term access to your data. These include evidence of an explicit institutional commitment to preservation and indicators that the preservation program is sustainable, credible, and able to offer preservation and access services that meet your short-term and long-term requirements. Compliance with the OAIS Reference Model is also an important factor to consider when selecting an archive for deposit. ICPSR’s Digital Preservation section has more information on digital preservation standards and a glossary of terms.

The Data Life Cycle

Researchers should plan for eventual archiving and dissemination of project data before the data even come into existence. According to Jacobs and Humphrey (2004), “Data archiving is a process, not an end state where data is simply turned over to a repository at the conclusion of a study. Rather, data archiving should begin early in a project and incorporate a schedule for depositing products over the course of a project’s life cycle and for the creation and preservation of accurate metadata, ensuring the usability of the research data itself. Such practices would incorporate archiving as part of the research method.”

We offer here a schematic diagram illustrating key considerations germane to archiving at each step in the data creation process. (A text version of this schematic diagram is also available.) The actual process may not be as linear as the diagram suggests, but it is important to develop a plan to address the archival considerations that come into play across all stages of the data life cycle.

[Figure 1: The data life cycle]

Phases 1-6 in Figure 1 are covered in the respective sections of the Guide.

Using the Guide

The Guide to Social Science Data Preparation and Archiving is aimed at those engaged in the cycle of research, from applying for a research grant, through the data collection phase, and ultimately to preparation of the data for deposit in a public archive. The Guide is a compilation of best practices gleaned from the experience of many archivists and investigators. The reader should note that the Guide does not attempt to address policies and procedures specific to certain archives, as they vary. Most public social science archives encourage investigators to contact them at any point in the research process to discuss their plans with respect to the design and preparation of public-use datasets.