enabling social science research over time

Digital Preservation Requirements Applied to ICPSR

Prepared by Nancy Y. McGovern, DPO, February 2007

Submission Information Package (SIP)

The SIP at ICPSR consists of the submitted deposit form, the original files, and the associated study-level and variable-level metadata (e.g., codebooks) received from the depositor. All of the information on the deposit form is relevant to digital preservation, but the essential information documenting the deposit transaction is: information that identifies the depositor and describes the deposit, an exact listing of the files received (the original file name and a checksum are good identifiers), and the date of the deposit. When a depositor submits files on physical media rather than online, the original media need not be retained. When applicable, the SIP also includes any additional metadata, files, or replacement files that were requested from or supplied by the depositor to complete the deposit. Persistent identifiers for the submission and for each file should be assigned on arrival or as soon as possible thereafter. The SIP forms the basis of the Archival Information Package (AIP). As the scope of digital content received by ICPSR expands (e.g., project Web sites, related audio and video files), the level and nature of the metadata captured for new types of content will be adjusted accordingly.
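The essential deposit-transaction information (who deposited, what was deposited, an exact file listing with checksums, and the date) fits naturally into a small structured record. The sketch below is a minimal Python illustration under stated assumptions: SHA-256 is assumed as the checksum algorithm, and random UUIDs stand in for persistent identifiers. The names `SubmissionRecord`, `FileEntry`, `sha256_of`, and `register_files`, and the `sip-`/`file-` prefixes, are hypothetical and do not describe an actual ICPSR system.

```python
import hashlib
import uuid
from dataclasses import dataclass, field
from datetime import date
from pathlib import Path


@dataclass
class FileEntry:
    original_name: str  # file name exactly as received from the depositor
    sha256: str         # checksum of the received bytes (algorithm is an assumption)
    file_id: str        # hypothetical persistent identifier for this file


@dataclass
class SubmissionRecord:
    depositor: str       # identifies the depositor
    description: str     # short description of the deposit
    deposit_date: date   # date of the deposit
    # Hypothetical persistent identifier for the submission, assigned on creation.
    submission_id: str = field(default_factory=lambda: f"sip-{uuid.uuid4()}")
    files: list[FileEntry] = field(default_factory=list)


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a SHA-256 checksum, reading the file in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def register_files(record: SubmissionRecord, paths: list[Path]) -> None:
    """Add each received file to the exact listing, with name, checksum, and identifier."""
    for p in paths:
        record.files.append(
            FileEntry(
                original_name=p.name,
                sha256=sha256_of(p),
                file_id=f"file-{uuid.uuid4()}",
            )
        )
```

Local UUIDs keep the sketch self-contained; a repository would more likely mint identifiers from a managed persistent-identifier scheme. Replacement files requested later could be appended to the same record with `register_files`, preserving a single listing for the whole deposit.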

Archival Information Package (AIP)

The AIP at ICPSR includes, or refers to, all of the content of the SIP (original files, study-level metadata, and variable-level metadata), supplemented by normalized versions of the original files (when needed), enhanced study-level and variable-level metadata, and a record of preservation actions over time. The Archival Information Collection (AIC) record, established upon receipt of the deposit form and the files, brings together all of the information about the lifecycle of the deposit: the assignment of study identifiers, the processing history (e.g., study folders, study folder database, study tracking database), the preservation actions over time, and the distribution of the digital content received by ICPSR.
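One way to picture the AIP's record of preservation actions over time is as an append-only event log attached to the package, with normalized files kept alongside (not in place of) the originals. The following Python sketch is hypothetical and loosely follows the event concept in the PREMIS data dictionary; the class names, fields, and event labels are illustrative assumptions, not ICPSR's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class PreservationEvent:
    event_type: str      # illustrative labels, e.g. "ingest", "normalization"
    detail: str          # free-text description of what was done
    timestamp: datetime  # when the action was recorded (UTC)


@dataclass
class ArchivalPackage:
    submission_id: str          # refers back to the originating SIP
    original_files: list[str]   # original files are retained, never replaced
    # Maps each original file name to its normalized counterpart, when one exists.
    normalized_files: dict[str, str] = field(default_factory=dict)
    events: list[PreservationEvent] = field(default_factory=list)

    def record(self, event_type: str, detail: str) -> None:
        """Append an event; existing events are never modified or removed."""
        self.events.append(
            PreservationEvent(event_type, detail, datetime.now(timezone.utc))
        )

    def normalize(self, original: str, normalized: str) -> None:
        """Register a normalized version of an original file and log the action."""
        if original not in self.original_files:
            raise ValueError(f"{original} is not part of this package")
        self.normalized_files[original] = normalized
        self.record("normalization", f"{original} -> {normalized}")
```

Because `normalize` only adds a mapping and an event, the package still contains every original file, and the event list accumulates the preservation history that the AIC record draws on.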

Dissemination Information Package (DIP)

The DIP at ICPSR includes derivative versions of the processed files in acceptable distribution formats, the relevant setup files, and the study-level and variable-level metadata required to read and use the files. The DIP is typically a subset of the AIP: not all of the study-level and variable-level metadata required for preservation is needed for current use of the studies and files, and the accumulated record of preservation actions is not usually distributed. ASCII-based versions of the data are made available to users. A record of the files that have been made available for distribution, and basic tracking of the use (downloads) of studies and files, are both required. Files superseded after release are removed from the online access system (the Web site) but remain available on demand from the AIPs in the repository. Superseded files are noted on the Web site and documented in the relevant AIPs.
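Selecting a DIP from an AIP can be sketched as a filter: keep only current distributable files and the metadata needed for use, list superseded files so they can be noted on the Web site, and leave the preservation history out. This Python sketch assumes a simple in-memory representation; `StoredFile`, `build_dip`, `record_download`, and the metadata keys in the usage are hypothetical names chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredFile:
    name: str
    distributable: bool       # True for derivatives prepared for users (e.g. ASCII data, setup files)
    superseded: bool = False  # True once a newer version has replaced this file


def build_dip(aip_files: list[StoredFile], aip_metadata: dict[str, str],
              release_keys: set[str]) -> dict[str, object]:
    """Select the subset of an AIP that is offered to users."""
    # Current distributable files are offered through the Web site.
    current = [f.name for f in aip_files if f.distributable and not f.superseded]
    # Superseded files are listed for a notice but not offered for download;
    # they remain retrievable on demand from the AIP itself.
    superseded = [f.name for f in aip_files if f.distributable and f.superseded]
    # Only metadata needed for use is released; preservation-only entries
    # (such as the preservation history) are simply absent from release_keys.
    metadata = {k: v for k, v in aip_metadata.items() if k in release_keys}
    return {"files": current, "superseded_notice": superseded, "metadata": metadata}


def record_download(log: Counter, file_name: str) -> None:
    """Basic usage tracking: count downloads per distributed file."""
    log[file_name] += 1
```

For example, given an original `.sav` file that is not distributable, two versions of an ASCII data file (the first superseded), and a setup file, `build_dip` would offer only the newer data file and the setup file, while listing the superseded version for a notice.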

Summary of Components by Preservation Stage

SIP                                 AIP                                     DIP
Original files [3]                  Original files [3] + normalized files   Distributed files [4]
Metadata (study [1], variable [5])  Enhanced metadata [1,5]                 Released metadata [1,5]
Deposit Form [2]                    Deposit Form + processing history [2]