Deposit Data

The National Addiction and Health Data Archive Program (NAHDAP) accepts data on substance use and health-related topics. NAHDAP is funded by the National Institute on Drug Abuse (NIDA) but also accepts high-priority data collections funded by other sources. NAHDAP’s mission is to preserve deposited files in perpetuity. Below, you will find a link to NAHDAP’s online deposit form as well as guidance on how to prepare your data for deposit.

To complete a data deposit, click on the “Deposit Data” button above. Please make sure that “NAHDAP” appears in the “Archive” field so that your data are archived with us. Please contact us at icpsr-help@umich.edu if you have any questions.

  1. Final version of each dataset generated during the project, including scale or other derived variables created for published analyses (required)
    • Provide quantitative data in SPSS, SAS, Stata, or R formats with all variable and value names and labels embedded within the file. If you provide the data in SAS format, include formats file(s).
    • Provide qualitative data in plain text (*.txt), rich text (*.rtf), scanned image of text with OCR (*.pdf), or Microsoft Word (*.doc, *.docx).
  2. Codebook(s) listing the variable names, variable labels, value labels, and missing value designations (an SPSS dictionary with these elements can suffice) (required)
  3. User guide, manual, or other metadata or documents (such as research articles) that describe the methodology and data collection protocols (required)
  4. Programming code necessary to reproduce all constructed measures and to link/merge files as needed (nice to have)
  5. Blank copy of each data collection instrument (nice to have)
  6. IRB approval and blank copy of consent form(s) used (nice to have)
  7. Inventory of the files deposited (nice to have)
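
For the file inventory in item 7, one possible approach is to generate the list automatically so that it matches the files actually deposited. The following is a minimal sketch in Python, not a NAHDAP requirement or tool: the function names build_inventory and inventory_to_csv, the column names, and the choice to record a SHA-256 checksum per file are all illustrative assumptions.

```python
import csv
import hashlib
import io
from pathlib import Path


def build_inventory(deposit_dir):
    """Return one row per file under deposit_dir: relative path, size, SHA-256."""
    root = Path(deposit_dir)
    files = sorted(
        (p for p in root.rglob("*") if p.is_file()),
        key=lambda p: p.relative_to(root).as_posix(),
    )
    rows = []
    for path in files:
        rows.append({
            "file": path.relative_to(root).as_posix(),
            "bytes": path.stat().st_size,
            # Checksum lets the archive confirm the file arrived unchanged.
            "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
        })
    return rows


def inventory_to_csv(rows):
    """Serialize the inventory rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["file", "bytes", "sha256"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Calling inventory_to_csv(build_inventory("my_deposit")) on a hypothetical folder named my_deposit yields text that can be saved as an inventory file and included with the deposit. Reading each file fully into memory keeps the sketch short; very large files would call for reading in chunks.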

NAHDAP accepts quantitative, qualitative, imaging, video, and other types of data. All data accepted at NAHDAP must include information on substance use. You can find more information about our current priorities below:

Studies with one or more of the following characteristics are considered high-priority:

  • Data that advance knowledge or address gaps identified in NIDA’s strategic priorities.
  • Data from probability samples, that is, from studies that use some form of random selection to increase representativeness.
  • Longitudinal data from studies that take place over time, with at least two waves of data collection using the same measures.
  • Studies that include common/standardized measures and/or overlap in methods used or measures included with NAHDAP studies.

To support data science and advanced analytics, NAHDAP is interested in datasets that are:

  • Available in machine-readable, standardized formats, with clear documentation and metadata.
  • High-quality and have undergone quality assurance for missingness, outliers, and validity.
  • De-identified to protect privacy.
  • Sufficiently large and scalable for robust analysis.
  • Rich in features relevant to AI approaches, such as structured survey responses, time-stamped event logs, free text (for NLP), sensor data, or imaging (with de-identified metadata).

NAHDAP’s Selection Criteria:

  • Demonstrate high scientific, methodological, and ethical standards.
  • Enable secondary analysis and/or replication.
  • Accompanied by sufficient documentation (metadata, codebooks, protocols).
  • Protect participant privacy and adhere to informed consent, especially for sensitive populations.
  • Include diverse populations that reflect the lived experiences of all people within communities being studied.
  • Accessible in formats supporting interoperability and broad reuse (e.g., SPSS, SAS, Stata, R, etc.).
  • Promote open science. Datasets with embargoes or limited access will be reviewed carefully to ensure public value.

NAHDAP is open to discussions about any dataset that may be of use to the substance use research field. However, the following types of data are not considered a high priority. If archived, they may be preserved with only minimal processing by NAHDAP staff:

  • Data that better fit another domain repository, such as dbGaP
  • R21 experimental studies (i.e., pilot data) without plans to archive follow-up data
  • Data without social science content
  • Data from non-human subjects (e.g., animals)

The following are best practices for preparing your data and documentation for archiving at NAHDAP. For more information, please refer to ICPSR’s Guide to Social Science Data Preparation and Archiving, Guide for Sharing Qualitative Data, and Depositor Checklist.

To facilitate secondary use, it is important to fully document variables in the context of the data file as well as in the codebook. When preparing the data and documentation for archiving, please review the best practices below:

Data Best Practices

  1. Address confidentiality regarding the data before deposit by at minimum removing direct identifiers and addressing indirect identifiers as you see fit. If treating the data will unduly impact the analytic utility of the data, please contact NAHDAP staff to discuss releasing the data as a restricted-use dataset.
  2. Retain ID or case identification variables in character format. Most statistical software packages sort and match character format IDs faster than numeric IDs. Also, character IDs are better retained between software packages. If the data collection consists of two or more related datasets, clearly identify all IDs needed to link data files together, and explain the relationship among the files and the variables in the documentation.
  3. Keep dates in character or simple numeric format (i.e., no dashes or slashes). Statistical software packages use different start dates for their date variables. For example, JUN2020 is not recognized as a date format by all software packages. Alternatively, dates can be replaced by time lapse variables (e.g., days or century months between events or age at time of event).
  4. Except for ID and date variables, define all variables as numeric whenever possible. A wider range of analyses can be performed with numeric variables.
  5. Convert data files to one record per case. Complex record structures (i.e., multiple records per case, hierarchical, mixed record types) are difficult for most users and cause problems both for NAHDAP’s automation process and for software interoperability. For example, if individuals or institutions experience more than one “event” such as a hospital visit or doctor’s visit, create a person file that includes all of the events on the person record. Likewise, if a doctor visit file includes lab tests with individual results attached to those lab tests, structure the data so that the doctor visit is the observation and each lab test and its results are recorded as variables on the doctor visit record.
  6. Assign a set of exhaustive, mutually exclusive codes to each variable and use the same codes across variables recording the same type of responses (e.g., 0 No, 1 Yes). Provide each variable and each code with descriptive labels in the data file (e.g., the SPSS, SAS, or Stata file) to aid proper understanding of the data content. Secondary analysts rely on the data file to provide the majority of the information they use to analyze the data. Despite the best attempts to convince users to read documentation carefully, often they do not. Review labels for comprehensibility and to make sure that they clearly describe the information or question recorded in that variable. If labels in the data must be abbreviated due to length limitations, the full information should be provided in a codebook, data collection instrument, or other documentation.
  7. Assign separate missing codes for not applicable, non-response, refusals, and other types of missing data. Use numeric codes because special missing characters are often lost when converting between software packages and in preservation formats such as ASCII. If retaining blanks (i.e., “system missing”), the documentation must identify what type of missing data the blanks represent.
  8. Deposit transformed variables (i.e., variables constructed or derived from variables collected using the questionnaire). For example, scale items may be scored using the scale algorithm and stored in one or more summary scale variables. Assign labels to the transformed variables in the recode statements used to create the variables. Clearly identify data files composed only of transformed or analysis variables, and clearly mark transformed variables stored with the original variables. Describe the source of the transformed variable and the method for deriving it by depositing the recode statements and/or a more extensive explanation in the codebook or other documentation. In some cases, only the recode statements and explanation need be deposited if secondary analysts can use them to faithfully reproduce the transformed variable.
  9. Include the technical variables in the data file that are needed in order for statistical inference to be valid, such as weights, non-response adjustment, survey design variables, case disposition indicators, and other related variables. The documentation and labels should clearly describe how the variables were constructed and how they should be used, especially if different analyses require different weights or disposition variables. Imputation flags should also be keyed to the corresponding variable and the method of imputation should be fully explained in the documentation.
  10. Reconcile univariate statistics on each variable. Secondary data users confirm that they are reading the data properly by comparing the documentation with univariate statistics they produce from the data. Data producers are best positioned to reconcile case counts, out-of-range codes, skip patterns, and univariate distributions before deposit. Secondary analysts resort to unsupported assumptions if they are unable to reconcile the data with the documentation.
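
Several of the practices above can be illustrated together. The following is a minimal sketch in Python, offered as one possible approach rather than NAHDAP's own procedure. It keeps IDs as character strings (item 2), stores a date as a plain character string or replaces it with a time lapse in days (item 3), reshapes a file with multiple visit rows per person into one record per person (item 5), uses separate numeric missing codes (item 7), and produces a frequency table for checking against the codebook (item 10). The variable names (person_id, visit, drinks), the codes -8 and -9, and all function names are hypothetical and chosen only for illustration.

```python
from collections import Counter
from datetime import date

# Hypothetical numeric missing-data codes (item 7); document the codes you actually use.
NOT_APPLICABLE = -8  # no such visit took place
REFUSED = -9         # respondent declined to answer


def date_to_char(d):
    """Store a date as a plain YYYYMMDD character string (item 3)."""
    return d.strftime("%Y%m%d")


def days_between(first, second):
    """Time-lapse alternative to storing dates: whole days from first to second (item 3)."""
    return (second - first).days


def to_one_record_per_case(visits, max_visits):
    """Reshape long visit rows into one wide record per person (item 5)."""
    people = {}
    for row in visits:
        if not 1 <= row["visit"] <= max_visits:
            raise ValueError(f"visit number out of range: {row['visit']}")
        pid = str(row["person_id"])  # keep IDs in character format (item 2)
        if pid not in people:
            # Every visit slot starts as "not applicable" until a visit fills it.
            people[pid] = {"person_id": pid}
            for v in range(1, max_visits + 1):
                people[pid][f"drinks_v{v}"] = NOT_APPLICABLE
        value = row["drinks"]
        people[pid][f"drinks_v{row['visit']}"] = REFUSED if value is None else value
    return [people[pid] for pid in sorted(people)]


def frequencies(records, variable):
    """Univariate frequency table to compare against the codebook (item 10)."""
    return dict(sorted(Counter(r[variable] for r in records).items()))


# Illustrative long-format input: one row per visit, None marks a refusal.
visits = [
    {"person_id": "0001", "visit": 1, "drinks": 3},
    {"person_id": "0001", "visit": 2, "drinks": None},
    {"person_id": "0002", "visit": 1, "drinks": 0},
]
wide = to_one_record_per_case(visits, max_visits=2)
print(wide)
print(frequencies(wide, "drinks_v2"))
```

In this example, person 0001 ends up with drinks_v2 coded -9 (refused) and person 0002 with drinks_v2 coded -8 (not applicable), so the frequency table for drinks_v2 shows one case under each code. Comparing such tables with the counts reported in the codebook is one way to catch out-of-range codes or skip-pattern errors before deposit.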

Documentation Best Practices

  1. Provide all documentation needed for others to sufficiently understand the data. NAHDAP often distributes the original project documentation with data. Relevant documentation includes:
    • Data collection design documents which include study rationale, data collection strategies, and a description of post-processing decisions such as weighting, imputation, and recodes,
    • Questionnaires or data collection instruments,
    • Original data documentation that is not embedded in the data file or instruments, and
    • Any working papers, technical reports, or publications associated with the data collection or substantive aspects of the data.

For data that contain indirect identifiers that may pose disclosure risk for respondents, NAHDAP staff can assist with preparing a restricted-use archiving and dissemination plan. For institutions or Institutional Review Boards that want a legal document to cover such situations, NAHDAP has a Restricted-Use Deposit and Dissemination Agreement available. The agreement is not required to deposit such data with NAHDAP.

NAHDAP staff will initially review your deposit to ensure it is complete. We may reach out to you following your deposit if any information is missing, if we have difficulty accessing any of your files, or if we need to request the files in a different format.

After the initial review, NAHDAP staff conduct a confidentiality review and begin data curation on all deposited data. From this evaluation, staff recommend a method of data release that protects respondents from re-identification while retaining the analytic utility of the data. Data release options include:

  • Public-use data release
  • Restricted-use data release
    • Secure download, Virtual Data Enclave, or Physical Data Enclave release options are available depending on the sensitivity and risk of harm present in the data. All of these options require end users to submit an application and Restricted Data Use Agreement prior to access.
  • Combination of public-use and restricted-use data release

Please note that we may reach out to you with questions during the curation process, so we recommend you build time for questions into your data sharing timeline if possible.

For more information about what steps we take during data curation, please refer to ICPSR’s Deposit Your Data page.

We recommend the following guides to help you prepare your data for archiving:

  1. ICPSR’s Guide to Social Science Data Preparation and Archiving
  2. ICPSR’s Guide for Sharing Qualitative Data
  3. ICPSR’s Depositor Checklist
  4. Additional Guidance for HEAL Depositors

If you are working on a grant application or have further questions about working with NAHDAP before you select us as your repository, please take a look at the following resources:

  1. ICPSR’s Data Management Plans & Grant Support page
  2. NAHDAP’s Depositor FAQs

In addition, you may find it helpful to review the video below for more information on depositing your data and a tutorial on using NAHDAP’s online data deposit form.