Data Sharing for Demographic Research
A data archive for demography and population sciences

Tips for Writing a Data Sharing or Data Management Plan

An effective data sharing plan in the social and behavioral sciences must address the following principles:

  • Protection of human subjects - Show evidence of a disclosure risk limitation strategy that protects respondent privacy and confidentiality across the research data life cycle. The strategy should cover not only informed consent and the IRB submission for data collection but also the evaluation and limitation of disclosure risk in the final analytic data files. If a restricted-use file will be created, describe the planned administrative controls on data access and use.
  • Comprehensive standardized documentation - Provide a plan for creating study-level and variable-level metadata that documents all variables, including derived, imputed, and recoded items, clearly and transparently. Documentation should conform to the international documentation standards of the scientific field in which the data are collected, should facilitate interoperability and metadata reuse, and should not depend on proprietary software.
  • Enduring access - Present a plan for broad and fair access to the data for all eligible users. The plan should clearly state the eligibility criteria, the legal terms of data reuse, and any proposed licensing. The technology underlying the dissemination mechanism should be designed to ensure stable, continuous, and secure access to the data files.
  • Long-term preservation - Provide a plan for preserving the data over time, migrating them to new formats as appropriate, and keeping them usable for a reasonable period. This plan can include depositing the data in a digital or institutional repository. Note any barriers to archival storage of the data in specific locations.
  • Usage metrics - Outline a method for tracking data use and the characteristics of data users. Tracking matters because effective data sharing must address the needs of secondary users; without information about the volume and nature of the user community, data dissemination strategies may be costly and ineffective.