Prepare Your Data for Deposit with ICPSR

Preparing your data for deposit can be smooth and straightforward if you follow a few best practices. Think of it like packing a suitcase for a trip: being organized and including all the essentials will make the journey easier for everyone who uses your data. This page walks you through preparing your data for archiving with ICPSR. For additional information on data management, see ICPSR’s Guide to Social Science Data Preparation and Archiving.

Deposits should include all data and documentation necessary for others to independently read and interpret the data. At minimum, ICPSR requires that you submit data files, documentation files (such as codebooks, user guides, or questionnaires), and descriptive information about your study and methodology. Some important considerations and guidelines are below. For a quick checklist of everything you’ll need to start your deposit, see the Depositor Checklist. As you begin preparing your deposit, double-check your informed consent or Institutional Review Board (IRB) documentation (or the Terms of Use if you gathered your data from existing sources) to ensure your data can be shared.

Don’t have time or expertise to clean your data?

Learn how our Professional Curation Team handles the heavy lifting for you.

Preparer Checklist

At-a-glance

  • Organize Files: Use “One record per case” and consistent naming conventions.
  • Clean & Label: Define missing values; use numeric codes for software interoperability.
  • De-identify: Remove PII (Personally Identifiable Information). ICPSR staff also perform a secondary disclosure review.
  • Document: Include codebooks, questionnaires, and “readme” files.
  • Review Data Format Submission Requirements

Full list

Data Accuracy and Completeness

Plan for data input and format (numeric or character). Determine how you will check for errors and inconsistencies, and how you will manage versions. For example, archives increasingly use checksums and other techniques to ensure integrity.
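As one illustration of the checksum technique mentioned above, the sketch below records a SHA-256 checksum for every file in a folder and compares two such records to spot files that changed. The function names and the folder layout are hypothetical, and this is not a description of ICPSR's own tooling.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file's path (relative to folder) to its checksum."""
    root = Path(folder)
    return {
        str(p.relative_to(root)): sha256_of_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(old_manifest, new_manifest):
    """List files added, removed, or modified between two manifests."""
    names = set(old_manifest) | set(new_manifest)
    return sorted(n for n in names if old_manifest.get(n) != new_manifest.get(n))
```

Saving the manifest alongside each version of your files gives you a simple way to confirm that the copy you deposit matches the copy you analyzed.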

Data Structure

ICPSR can accept data organized as:

“Flat” or Rectangular Files:

  • Data organized in long records, often starting with an ID followed by variables.
  • Suitable for most datasets and easy to read by analytic programs.

Hierarchical Files:

  • Efficient for large datasets with many empty fields, like detailed surveys with varying numbers of respondents’ children.
  • Stores data in a header record and multiple secondary records, saving space but needing more complex programming.
  • Alternatively, separate files can be used for different record types (e.g., respondents and children), providing flexibility and easier analysis.
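The separate-files alternative can be sketched as follows: nested respondent records are split into one rectangular table of respondents and one of children, linked by a shared ID. The records and the field names (resp_id, child_num, child_age) are invented for illustration.

```python
import csv
import io

# Hypothetical nested records: each respondent has zero or more children.
respondents = [
    {"resp_id": 1, "age": 41, "children": [{"child_age": 9}, {"child_age": 12}]},
    {"resp_id": 2, "age": 35, "children": []},
]


def split_records(records):
    """Return (respondent_rows, child_rows) as two rectangular tables."""
    resp_rows, child_rows = [], []
    for rec in records:
        resp_rows.append({"resp_id": rec["resp_id"], "age": rec["age"]})
        for number, child in enumerate(rec["children"], start=1):
            # Each child row carries the respondent ID as the linking key.
            child_rows.append(
                {"resp_id": rec["resp_id"], "child_num": number, **child}
            )
    return resp_rows, child_rows


def to_csv(rows, fieldnames):
    """Serialize a list of row dictionaries to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Because every child row repeats the respondent ID, the two files can be merged again in any analytic package without the empty fields a single flat file would need.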

Relational Databases:

  • Collections of linked data tables using key variables (e.g., “Family ID”).
  • Allow for specific queries and data combinations from multiple tables.
  • We recommend exporting the tables as flat files and using SQL to preserve the table relationships.
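A minimal sketch of that export, assuming the database is SQLite (other database systems have their own export tools): each table is written to its own CSV file, and the CREATE TABLE statements, which record the key relationships, are saved to a separate SQL file. The function name and the output file names are hypothetical.

```python
import csv
import sqlite3
from pathlib import Path


def export_database(conn, out_dir):
    """Write each table to <table>.csv and the schema to schema.sql."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    tables = conn.execute(
        "SELECT name, sql FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for name, _ in tables:
        cursor = conn.execute(f'SELECT * FROM "{name}"')
        with open(out / f"{name}.csv", "w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            # First row holds the column names, then one row per record.
            writer.writerow([col[0] for col in cursor.description])
            writer.writerows(cursor)
    # The CREATE TABLE statements document primary and foreign keys.
    schema = ";\n".join(sql for _, sql in tables) + ";\n"
    (out / "schema.sql").write_text(schema, encoding="utf-8")
    return [name for name, _ in tables]
```

Depositing the schema file next to the CSV files lets a secondary user see which key variables link the tables, even without access to the original database.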

Longitudinal/Multi-Wave Study Files:

  • Data collected from the same participants at multiple time points or waves, usually organized as hierarchical files.
  • Must maintain consistent file information, use linking identifiers, and align variable labels and values across waves for ease of data comparison.
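Two of these cross-wave checks can be sketched in a few lines: one lists participant IDs that appear in only one wave, and the other lists shared variables whose labels differ. The IDs, variable names, and labels are invented for illustration.

```python
def unmatched_ids(wave_a_ids, wave_b_ids):
    """Return IDs that appear in only one of two waves."""
    a, b = set(wave_a_ids), set(wave_b_ids)
    return {"only_in_a": sorted(a - b), "only_in_b": sorted(b - a)}


def label_mismatches(labels_a, labels_b):
    """Return shared variables whose labels differ between two waves."""
    shared = set(labels_a) & set(labels_b)
    return sorted(v for v in shared if labels_a[v] != labels_b[v])


# Hypothetical example data for two waves.
wave1_ids = [101, 102, 103]
wave2_ids = [101, 103, 104]
wave1_labels = {"Q1": "Self-rated health", "Q2": "Household size"}
wave2_labels = {"Q1": "Self-rated health", "Q2": "Number in household"}

id_report = unmatched_ids(wave1_ids, wave2_ids)
label_report = label_mismatches(wave1_labels, wave2_labels)
```

Unmatched IDs are not necessarily errors, since participants drop out of and join panels, but each one is worth reviewing and documenting before deposit.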

Data File(s)

For quantitative data, submit files in SAS, SPSS, Stata, or ASCII (with setup syntax files). For qualitative data, submit files in plain text (*.txt), rich text (*.rtf), scanned images of text with OCR (*.pdf), or Microsoft Word (*.doc, *.docx). Other formats are also accepted. Ensure each variable has clear, exclusive codes and labels. Define any missing data codes. Follow the Depositor Checklist or contact ICPSR staff at ICPSR-help@umich.edu for guidance in preparing your data.
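The labeling checks described above can be sketched as a small review of a data dictionary before deposit. The dictionary layout, the variable names, and the codes below are assumptions made for illustration; they are not an ICPSR format.

```python
def check_codebook(codebook):
    """Return a list of human-readable problems found in a codebook dict."""
    problems = []
    for name, spec in codebook.items():
        if not spec.get("label"):
            problems.append(f"{name}: missing variable label")
        values = spec.get("values", {})
        labels = [text.strip().lower() for text in values.values()]
        # Two codes with the same label are not mutually exclusive.
        if len(labels) != len(set(labels)):
            problems.append(f"{name}: two codes share one label")
        for code in spec.get("missing", []):
            if code not in values:
                problems.append(f"{name}: missing-data code {code} has no label")
    return problems


# Hypothetical data dictionary with one clean and one problematic variable.
example_codebook = {
    "EMPLOY": {
        "label": "Employment status",
        "values": {1: "Employed", 2: "Unemployed", -9: "Refused"},
        "missing": [-9],
    },
    "INCOME": {
        "label": "",
        "values": {1: "Low", 2: "low"},
        "missing": [-8],
    },
}

problems = check_codebook(example_codebook)
```

A check like this only flags items for review; whether a flagged item is a real problem still depends on the study.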

Documentation

Provide full documentation such as codebooks, data collection instruments, summary statistics, and project summaries. Documentation should integrate question text with variable information where possible. Common documentation formats are PDF, .doc, .xls(x), etc.

It is crucial to handle research data with care to protect participant confidentiality. During the planning phase, ensure that data sharing complies with participants’ consent and IRB requirements. Keep data secure. When preparing for data sharing, deidentify any variables that might compromise confidentiality. The good news is: ICPSR reviews all data it receives for disclosure risk. Read more about data confidentiality at ICPSR.

ICPSR’s Disclosure Risk Guide for Data Depositors includes remediation suggestions to handle both indirect and direct identifiers. For an overview of the management of restricted-use data, please refer to ICPSR’s Restricted-use Data Deposit and Dissemination Procedures (pdf).

If your data include sensitive questions or contextual details that are analytically important but might increase the chance that a participant could be reidentified, ICPSR will recommend releasing a restricted-use version of the data.

Now that you’ve started the deposit process, please fill out the details like principal investigators, title, funding, project description, and methodology. This helps others find and use your data responsibly. The goal is to make your data discoverable and usable for future research.

Description

In this section, provide a detailed description of your data, including study design, sample, and methodology. This information is crucial for others to understand and use your data. Check the ICPSR Metadata Documentation Portal for guidance on what to include. Providing quality information in the deposit form helps us connect users with your data.

Data

We prefer to receive data formatted in one of the statistical packages (R, SAS, SPSS, or Stata), but delimited or ASCII files are accepted with proper documentation like a data dictionary or codebook. Please ask ICPSR staff about other acceptable file formats. Ensure variables have clear labels, value labels, and missing data codes.

For qualitative data, refer to the Guide for Sharing Qualitative Data at ICPSR.

Documentation

Documentation files are integral to the reuse of a data collection. That is why it’s important to submit documentation that explains your data collection thoroughly. Documentation includes codebooks, user guides, questionnaires, README files, and more. Documentation is best when it integrates question text with variable information when possible.

Identifiers and protections

Remove direct identifiers before depositing your data. If your data contain identifiers, state this explicitly during the deposit submission process. ICPSR can handle such data under certain conditions, following study participants’ consent and IRB approval. For additional information about how we handle sensitive data, check out Preserving Respondent Confidentiality.
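As a minimal sketch of removing direct identifiers, the function below copies a CSV file while omitting a list of identifying columns. The column names in DIRECT_IDENTIFIERS are assumptions made for illustration; which variables count as direct identifiers depends on your study.

```python
import csv

# Assumed for illustration; the real list depends on your study.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def strip_identifiers(in_path, out_path, identifiers=DIRECT_IDENTIFIERS):
    """Copy a CSV file without the listed columns; return the dropped names."""
    with open(in_path, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        kept = [c for c in reader.fieldnames if c.lower() not in identifiers]
        dropped = [c for c in reader.fieldnames if c.lower() in identifiers]
        with open(out_path, "w", newline="", encoding="utf-8") as dst:
            # extrasaction="ignore" skips the columns that are not in kept.
            writer = csv.DictWriter(dst, fieldnames=kept, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(reader)
    return dropped
```

Dropping columns addresses direct identifiers only. Indirect identifiers, such as combinations of variables that could single out a participant, need separate review.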

If you choose curation, ICPSR will review your data for disclosure risks and work with you to address them. This might involve creating public and/or restricted-use versions of your data. More details are available at Restricted-Use Data Management at ICPSR. For specific questions about how we handle sensitive data, contact ICPSR User Support.

Metadata, or detailed information about data collections, are crucial for maximizing their usefulness. They allow users to understand and use the data without needing to contact the data producers. Good metadata standardize data descriptions, improve understanding, facilitate searches, and enhance web display.

At ICPSR, metadata are created primarily from information provided by data producers and metadata specialists. Data producers should submit the following at minimum:

  • Clear and consistent titles (e.g., a Title, Location, and Years would appear on the ICPSR site as “Aging in Women, United States, 2005-2006”).
  • Project description, including goals, main topics, and methodology
  • Principal Investigator (PI) names and organizational affiliations
  • Dates of data collection
  • Intended unit of analysis (who or what is being studied)
  • Sample description
  • Universe description
  • Project/study website, if available
  • Funding source(s) and grant number(s)
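Before you open the deposit form, the minimum items listed above can be gathered and checked with a short script. The field names below are hypothetical and are not the names used in ICPSR's deposit form; collapsing a single-year study to one year in the title is also an assumption.

```python
# Hypothetical field names for the minimum items; the website is optional.
REQUIRED_FIELDS = [
    "title",
    "description",
    "principal_investigators",
    "collection_dates",
    "unit_of_analysis",
    "sample",
    "universe",
    "funding",
]


def missing_fields(record):
    """Return required fields that are absent or empty in a metadata record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


def display_title(title, location, start_year, end_year):
    """Combine Title, Location, and Years in the pattern shown above."""
    # Assumption: a single-year study shows one year, not a range.
    if start_year == end_year:
        years = str(start_year)
    else:
        years = f"{start_year}-{end_year}"
    return f"{title}, {location}, {years}"


example_title = display_title("Aging in Women", "United States", 2005, 2006)
```

Here example_title holds "Aging in Women, United States, 2005-2006", matching the pattern in the first item of the list.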

Please review the ICPSR Metadata Documentation Portal to learn more about the study-level metadata to include with your data deposit.

Keep a list of any publications related to your data – these references can be included with your deposit. Any publications included in your deposit will be added to the ICPSR Bibliography of Data-related Literature, helping others find your data.

Additional Information

Looking for more information about data sharing? Check out the resources below.

  • FAIR Principles – internationally accepted guidelines for managing and sharing scientific data.
  • Data Documentation Initiative (DDI) – an international standard for describing data produced by surveys and other observational methods in the social, behavioral, economic, and health sciences. DDI can document and manage different stages in the research data lifecycle, such as conceptualization, collection, processing, distribution, discovery, and archiving.

Contact us if you have questions about preparing your deposit.