Optional Recommendations for Preparing Quantitative, Tabular Data for Deposit

In addition to the required ICPSR data submission materials, the following optional recommendations may help ensure that submitted data and documentation are complete, well organized, and easier to review and reuse.

Supply Additional Documentation/Files

  • Include original survey/data collection instruments
  • Include IRB approval and a sample informed consent statement
  • Include raw and derived variables (and coding used to produce derived variables), ensuring variables related to published results are included
  • Include design variables (stratum, cluster, final weights) and linking variables (where files can be combined)
  • Include all citations for publications related to data submission

Variables and Variable Labeling

  • Keep each variable name shorter than 32 characters
  • Use a unique variable label for each variable
  • Approximate question text in the label
  • Do not use periods (.) or dollar signs ($) within labels
  • Do not start a label with a number
  • Do not use spaces within labels (use - or _)
  • Variable label length should not exceed 256 characters, when possible
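
The variable checks above can be automated before deposit. The following is a minimal Python sketch, not an ICPSR tool: the function name check_variable and the problem messages are hypothetical, and the rules are applied literally as they are listed in this section.

```python
def check_variable(name, label, seen_labels):
    """Return a list of problems for one variable name/label pair.

    seen_labels is a set of labels already checked; it is updated in place
    so that duplicate labels can be detected across variables.
    """
    problems = []
    if len(name) >= 32:
        problems.append("name is 32 characters or longer")
    if label in seen_labels:
        problems.append("label is not unique")
    if "." in label or "$" in label:
        problems.append("label contains a period or dollar sign")
    if label[:1].isdigit():
        problems.append("label starts with a number")
    if " " in label:
        problems.append("label contains a space")
    if len(label) > 256:
        problems.append("label is longer than 256 characters")
    seen_labels.add(label)
    return problems


# Example: check two variables that share the same label.
seen = set()
print(check_variable("AGE", "Respondent_age_in_years", seen))
print(check_variable("AGE2", "Respondent_age_in_years", seen))
```

An empty list means the variable passed every check. Whether a label approximates the question text still needs a manual review.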

Values and Value Labeling

  • Numeric codes should not exceed 10 digits
  • Use a unique value label for each discrete category
  • Omit value labels for variables with non-integer values
  • Omit value labels for date and time variables
  • Omit value labels for string variables, if possible
  • Value label length should not exceed 120 characters, when possible
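
A similar check can be sketched for value labels. This example assumes the value labels for one variable are available as a Python dictionary that maps each code to its label. The function name check_value_labels and that dictionary layout are assumptions made for illustration.

```python
def check_value_labels(value_labels):
    """Check one variable's value labels, e.g. {1: "Yes", 2: "No"}."""
    problems = []
    labels = list(value_labels.values())
    if len(set(labels)) != len(labels):
        problems.append("value labels are not unique")
    for code, label in value_labels.items():
        if isinstance(code, float) and not code.is_integer():
            # Non-integer values should not carry value labels.
            problems.append(f"code {code} is a non-integer value with a label")
        elif isinstance(code, (int, float)) and len(str(abs(int(code)))) > 10:
            problems.append(f"code {code} has more than 10 digits")
        if len(label) > 120:
            problems.append(f"label for code {code} exceeds 120 characters")
    return problems


print(check_value_labels({1: "Yes", 2: "No"}))
print(check_value_labels({1: "Yes", 2: "Yes"}))
```

Date, time, and string variables are not detected here; the sketch assumes they were already excluded from labeling.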

Missing Data

  • Create consistent missing data codes/values that are used across all variables
  • Recode any alphanumeric missing data codes to numeric codes (applies to SAS and Stata data files)
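
One way to keep missing data codes consistent is to apply a single recoding table to every variable. The sketch below assumes the data have been read into plain Python lists and that letter-based missing codes appear as strings such as ".d" and ".r". The specific codes, the numeric targets, and the name recode_missing are illustrative assumptions, not ICPSR requirements.

```python
# Hypothetical recoding table: letter-based missing codes to numeric codes.
MISSING_CODES = {".d": -8, ".r": -9}


def recode_missing(columns, mapping=MISSING_CODES):
    """Apply the same missing-data recoding to every column.

    columns maps each variable name to its list of values. Values that
    are not in the mapping are returned unchanged.
    """
    return {
        name: [mapping.get(value, value) for value in values]
        for name, values in columns.items()
    }


data = {"income": [52000, ".d", 48000], "age": [34, ".r", ".d"]}
print(recode_missing(data))
```

Using one shared table, rather than per-variable rules, is what keeps the codes identical across all variables.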

Column Widths

  • For all numeric variables, 15 characters or fewer
  • For string variables, 250 characters or fewer**

**Some statistical packages allow longer string variables, but when the files are converted for use in other packages, those values are truncated.
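
The column width limits can also be screened with a short script. This is a rough sketch under stated assumptions: the name check_widths is hypothetical, and the width of a numeric value is approximated by the length of its default Python text form (including any sign or decimal point), which may differ from the width a statistical package reports.

```python
def check_widths(columns, numeric_limit=15, string_limit=250):
    """Flag columns whose values exceed the recommended widths.

    columns maps each variable name to its list of values. Each column
    is reported at most once.
    """
    problems = []
    for name, values in columns.items():
        for value in values:
            if isinstance(value, str):
                if len(value) > string_limit:
                    problems.append(
                        f"{name}: string value longer than {string_limit} characters"
                    )
                    break
            elif len(str(value)) > numeric_limit:
                problems.append(
                    f"{name}: numeric value wider than {numeric_limit} characters"
                )
                break
    return problems


print(check_widths({"id": [1, 22], "comment": ["x" * 251]}))
```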