The following glossary is provided as a resource for data producers, data librarians, and data users, and is
based on a glossary prepared by James Jacobs, formerly at the University of California, San Diego. Some
terms were also added from "XML Terms: Jargon terms in XML and what they mean."
To supplement this glossary with more terms related to computing, consult the Encyclopedia of Computer
Science, Fourth Edition, edited by Anthony Ralston and Edwin D. Reilly.
- Access
The act of making information available. Digital preservation is a requirement for providing long-term access to
digital content. Access is "the OAIS entity that contains the services and functions which make the archival information
holdings and related services visible to Consumers." OAIS requires that an archive be able to find and deliver digital
content to authorized users; delivery may be to an individual or to an access delivery system.
The Collection Delivery Unit is responsible for providing access services, and the digital preservation function preserves
the capability to regenerate the DIPs (Dissemination Information Packages) as needed over time.
- Administration
"The OAIS entity that contains the services and functions needed to
control the operation of the other OAIS functional entities on a day-to-day basis." The OAIS Reference Model identifies
the policies and other documents that are the responsibility of Administration and are required by an OAIS.
The Administration function is currently provided by Computing and Network Services, which oversees the work of the Data
Library staff, in conjunction with the Digital Preservation Officer, who develops requisite policies and guidance for
digital preservation operations. Digital preservation policy development at ICPSR is informed by OAIS.
- aggregate
(noun) A total created from smaller units. For instance, the population of a county is an aggregate of the
populations of the cities, rural areas, etc., that comprise the county.
(verb) To total data from smaller units into a large unit. Example: "The Census Bureau aggregates
data to preserve the confidentiality of individuals."
- aggregate data
Data that have been aggregated. Contrast with microdata.
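A minimal Python sketch of aggregation, using invented microdata records (the county names and field layout are hypothetical): individual-level records are totaled into one count per county, which yields aggregate data.

```python
from collections import Counter

# Hypothetical microdata: one record per individual respondent.
microdata = [
    {"person_id": 1, "county": "Washtenaw"},
    {"person_id": 2, "county": "Washtenaw"},
    {"person_id": 3, "county": "Wayne"},
]

def aggregate_by(records, key):
    """Total individual records into one count per value of `key`."""
    return Counter(record[key] for record in records)

county_totals = aggregate_by(microdata, "county")
print(dict(county_totals))
```

The totals no longer identify any individual, which is why aggregation is also used to protect confidentiality.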
- Archival Information Collection (AIC)
"An Archival Information Package whose Content
Information is an aggregation of other Archival Information Packages."
The Collection Delivery Unit is
responsible for providing access services, and the digital preservation function preserves the capability to regenerate
the DIPs (Dissemination Information Packages) as needed over time.
- Archival Information Package (AIP)
"An Information Package, consisting of the
Content Information and the associated Preservation Description Information (PDI), which is preserved within an
OAIS."
The AIP consists of the original files deposited, processed versions of data files and documentation,
normalized files, and associated metadata.
- Archival Storage
"The OAIS entity that contains the services and
functions used for the storage and retrieval of Archival Information Packages."
The Archival Storage function
provides onsite and offsite redundancy through online copies (and a tape copy as extra backup) of ICPSR's digital
content, both the archival copies and the access copies. ICPSR preserves the ability to regenerate the Dissemination
Information Package (DIP); we do not preserve the software-dependent files (e.g., SAS, SPSS, Stata) that are
distributed. Archival storage contributes to ensuring business continuity for ICPSR and is a component of the disaster
planning at ICPSR.
- archive
(noun) A data archive is a site where machine-readable materials are stored, preserved, and possibly
redistributed to individuals interested in using the materials. (verb) To place or store in an archive.
- ASCII
A character-encoding scheme used by many computers. The ASCII standard uses 7 of the 8 bits in a byte to define
the codes for 128 characters. Example: In ASCII, the number "7" is treated as a character and is encoded as: 00110111.
Because a byte can have a total of 256 possible values, there are an additional 128 possible characters that can be
encoded into a byte, but there is no formal ASCII standard for those additional 128 characters. Most IBM-compatible
personal computers do use an IBM "extended" character set that includes international characters, line and box drawing
characters, Greek letters, and mathematical symbols. (ASCII stands for American Standard Code for Information
Interchange.) See also EBCDIC.
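As an illustrative sketch, Python's built-in codecs can show how the character "7" is stored under each scheme. The choice of `cp037` is an assumption: it is one common EBCDIC code page among several, and the helper name `bits` is invented for this example.

```python
def bits(byte_value: int) -> str:
    """Format one byte value as an 8-character string of ones and zeros."""
    return format(byte_value, "08b")

ascii_seven = "7".encode("ascii")[0]    # the byte that stores "7" in ASCII
ebcdic_seven = "7".encode("cp037")[0]   # cp037 is one common EBCDIC code page

print(bits(ascii_seven))
print(bits(ebcdic_seven))
```

The same character yields two different bytes, which is why software must know a file's encoding scheme before it can interpret the contents.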
- attributes (XML)
XML elements can have attributes that further describe them, such as the following:
In the example above, "currency" is an attribute of "Price", and the attribute's value is
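As a hedged illustration, the sketch below reads an attribute from a "Price" element using Python's standard `xml.etree.ElementTree` module. The element content and the attribute value "USD" are hypothetical, chosen only for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML fragment; the element text and attribute value are invented.
fragment = '<Price currency="USD">9.95</Price>'

element = ET.fromstring(fragment)
print(element.tag)              # element name
print(element.attrib)           # all attributes as a dictionary
print(element.get("currency"))  # value of one attribute
```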
- binary format
Any file format in which information is encoded in some format other than a standard character-encoding scheme. A
file written in binary format contains information that is not displayable as characters. Software capable of
understanding the particular binary format method of encoding information must be used to interpret the information in a
binary-formatted file. Binary formats are often used to store more information in less space than possible in a
character format file. They can also be searched and analyzed more quickly by appropriate software. A file written in
binary format could store the number "7" as a binary number (instead of as a character) in as little as 3 bits (i.e.,
111), but would more typically use 4 bits (i.e., 0111). Binary formats are not normally portable, however. Software
program files are written in binary format. Examples of numeric data files distributed in binary format include the
IBM-binary versions of the Center for Research in Security Prices files and the U.S. Department of Commerce's National
Trade Data Bank on CD-ROM. The International Monetary Fund distributes International Financial Statistics in a
mixed-character format and binary (packed-decimal) format. SAS and SPSS store their system files in binary format.
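A small Python sketch of the space trade-off described in this entry. The 4-byte big-endian integer layout is an assumption chosen for illustration and is not the layout of any of the files named above.

```python
import struct

value = 1234567

as_characters = str(value).encode("ascii")   # character format: one byte per digit
as_binary = struct.pack(">i", value)         # binary format: 4-byte big-endian signed integer

print(len(as_characters), as_characters)
print(len(as_binary), as_binary.hex())

# Reading the binary form back requires knowing its layout in advance.
(restored,) = struct.unpack(">i", as_binary)
print(restored)
```

The binary form is shorter, but it is only meaningful to software that knows the field is a 4-byte big-endian integer, which is the portability cost noted in this entry.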
- binary number
A number written using binary notation which only uses zeros and ones. Example: Decimal number 7 in
binary notation is: 111.
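A minimal sketch of converting between decimal and binary notation in Python; the helper names are invented for this example.

```python
def to_binary(number: int, width: int = 0) -> str:
    """Return the binary notation of a non-negative integer, zero-padded to `width`."""
    return format(number, "b").zfill(width)

def from_binary(digits: str) -> int:
    """Return the integer written in binary notation as `digits`."""
    return int(digits, 2)

print(to_binary(7))       # three bits
print(to_binary(7, 4))    # padded to four bits
print(from_binary("111"))
```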
- bit
A bit is the smallest unit of information that a computer can work with. Each bit is either a "1" or a "0". Often
computers work with groups of bits rather than one bit at a time; the smallest group of bits a computer usually works
with is a byte, which is 8 bits.
- Born Digital
A descriptor for information that is created in digital form, as opposed
to digitized from analog sources.
The majority of deposits consist of born digital content. There are some
examples of hard copy and analog materials that might be made digital (digitized) by ICPSR. For example, the Data-PASS
project is identifying older social science data that include documentation and other components in hard copy format and
there are some deposits that contain video in VHS format.
(See skip pattern.)
- Business Continuity
"Describes the processes and procedures an organization puts in
place to ensure that essential functions can continue during and after a disaster." [SearchStorage.com] A note
regarding preservation: "Backups vs Preservation: Disaster recovery strategies and backup systems are not sufficient to
ensure survival and access to authentic digital resources over time. A backup is a short-term data recovery solution
following loss or corruption and is fundamentally different to an electronic preservation archive." ["Continued access
to authentic digital assets," JISC
Digital Preservation Paper, Nov 26, 2006.]
We are addressing business continuity requirements by ensuring
redundant backup of the preservation and access copies of ICPSR's digital content, by the establishment of a warm backup
for the ICPSR Web server, by identifying our core functions for business continuity, by assessing the current backup and
storage measures for our institutional records that support core functions to diminish the risk of loss in most
emergency situations, by conducting a self-assessment of our information security program to comply with relevant
standards, and by developing the requisite policies and procedures for business continuity.
- byte
Eight bits. A byte is simply a chunk of 8 ones and zeros. For example: 01000001 is a byte. A computer often works
with groups of bits rather than individual bits and the smallest group of bits that a computer usually works with is a
byte. A byte is equal to one column in a file written in character format. Most data files distributed by ICPSR are in
- Canonical Formats
"In information technology, canonicalization is the process of making
something [conform] with some specification... and is in an approved format. Canonicalization may sometimes mean
generating canonical data from noncanonical data." [Clifford Lynch, "Canonicalization: A Fundamental Tool to Facilitate Preservation
and Management of Digital Information," D-Lib Magazine, September 1999, volume 5, Number 9.]
Canonical formats are widely supported and considered to be optimal for long-term preservation.
(See card image.)
- card image
(1) Eighty characters of data stored as a single physical record. (2) A file storage format of 80 characters or
bytes per record. The card-image format is a remnant of the time when data were literally input on punch cards that had
physical limits of 80 characters per card. Usually a case or all the data for a single respondent is stored on several
80-character "cards." Each "card" is numbered and stored in numerical sequence. Cards with the same sequence number
(i.e., having a common format for the layout and contents of variables) are called a "deck"; thus cards are often
referred to in documentation by their "deck number." Example: "The variable for age is stored in Deck 01 in
columns 10-11 and the variable for race is stored in Deck 02 in column 10."
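The lookup in the example above can be sketched in Python. The column locations for age and race come from the example; storing the deck number in columns 1-2 of each card, the helper names, and the data values are assumptions made for this sketch.

```python
CARD_WIDTH = 80

def field(card: str, start: int, end: int) -> str:
    """Return columns start..end (1-based, inclusive) of one card."""
    return card[start - 1:end]

def make_card(deck: str, values: dict) -> str:
    """Build an 80-column card; `values` maps a 1-based start column to text."""
    columns = list(deck.ljust(CARD_WIDTH))
    for start, text in values.items():
        columns[start - 1:start - 1 + len(text)] = text
    return "".join(columns)

# One case spread over two cards. Assumption: the deck number is in columns 1-2.
case_cards = [make_card("01", {10: "34"}), make_card("02", {10: "3"})]

decks = {field(card, 1, 2): card for card in case_cards}
age = int(field(decks["01"], 10, 11))   # Deck 01, columns 10-11
race = field(decks["02"], 10, 10)       # Deck 02, column 10
print(age, race)
```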
- case
In survey research, an individual respondent. Contrast with unit of analysis.
- CATI/CAPI
(See Computer-Assisted Telephone [Personal] Interviewing.)
- CD-ROM
Compact Disc Read-Only Memory. A storage medium. Data are "stamped" onto the disc during the manufacturing
process. The disc is read-only. A variant has appeared that is rewritable, but this variant is not in use for the
dissemination of data.
- character-encoding scheme
A method of encoding characters including alphabetic characters (A-Z, uppercase and lowercase), numbers 0-9,
punctuation and other marks (e.g., comma, period, space, &, *), and various "control characters" (e.g., tab,
carriage return, linefeed) using binary numbers. For a computer to print a capital "A" or a number "7" on the computer
screen, for instance, we must have a way of telling the computer that a particular group of bits represents an "A" or a
"7". There are standards, commonly called "character sets," that establish that a particular byte stands for an "A" and
a different byte stands for a "7". The two most common standards for representing characters in bytes are
ASCII and EBCDIC.
- character format
Any file format in which information is encoded as characters using only a standard character-encoding scheme. A
file written in "character format" contains only those bytes that are prescribed in the encoding scheme as corresponding
to the characters in the scheme (e.g., alphabetic and numeric characters, punctuation marks, and spaces). A file written
in the ASCII character format, for instance, would store the number "7" in eight bits (i.e., one byte): 00110111. A file
written in EBCDIC would store the number "7" in eight bits as 11110111. Contrast with binary format.
- character sets
(See character-encoding scheme.)
- cleaning
Process to check data for adherence to standards, internal consistency, referential integrity, valid domain, and
to replace/repair incorrect data with correct data.
To "clean" a data file is to check for wild codes and
inconsistent responses (see consistency check); to verify that the file has the correct and expected number of records,
cases, and cards or records per case; and to correct errors found.
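A minimal sketch of one cleaning step, the search for wild codes. The variable name and the set of valid codes are hypothetical; in practice they would come from the study's documentation.

```python
# Hypothetical valid codes for one variable (9 stands for missing data here).
VALID_CODES = {"RELIGION": {1, 2, 3, 9}}

def find_wild_codes(cases, valid_codes):
    """Return (case index, variable, value) for every value outside its valid set."""
    problems = []
    for index, case in enumerate(cases):
        for variable, allowed in valid_codes.items():
            if case.get(variable) not in allowed:
                problems.append((index, variable, case.get(variable)))
    return problems

cases = [{"RELIGION": 1}, {"RELIGION": 7}, {"RELIGION": 9}]
wild = find_wild_codes(cases, VALID_CODES)
print(wild)
```

The check only reports problems; deciding how to correct each one still requires going back to the source materials.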
- code
In most numeric data files, answers to questions are recorded with numbers rather than text, and often even
numeric answers are recorded with numbers other than the actual response. The numbers used in the data file are called
"codes." Thus, for instance, when a respondent identifies herself as a member of a particular religion, a code of "1"
might be used for Catholic, a "2" for Jewish, etc. Likewise, a person's age of 18 might be coded as a 2 indicating "18
or over." The codes that are used and their correspondence to the actual responses are listed in a
codebook.
- codebook
Generically, any information on the structure, contents, and layout of a data file. Typically, a codebook
includes: column locations and widths for each variable; definitions of different record types; response codes for each
variable; codes used to indicate nonresponse and missing data; exact questions and skip patterns used in a survey; and
other indications of the content of each variable. Many codebooks also include frequencies of response. Codebooks vary
widely in quality and amount of information included.
- Codec
"A codec is the means by which sound and video files are compressed for storage
and transmission purposes. There are various forms of compression: 'lossy' and 'lossless', but most codecs perform
lossy compression because of the much larger data reduction ratios that occur [with lossy compression]. Most codecs
are software, although in some areas codecs are hardware components of image and sound systems. Codecs are necessary for
playback, since they uncompress [or decompress] the moving image and sound files and allow them to be rendered."
ICPSR will have to specify which type of codec it would like to use in creating digital files of video materials.
Preferred codecs can change as frequently as preferred file formats; it will be important to conduct current research to
know which codecs are most appropriate.
- column
In a data file, a single vertical column, each being one byte in length. Fixed format data files are
traditionally described as being arranged in lines and columns. In a fixed format file, column locations describe the
locations of variables.
- column location
The precise location in a data file of a variable expressed in column numbers, beginning with the first column in
a physical record as column number 1.
- Common Services
"The supporting services such as inter-process communication, name
services, temporary storage allocation, exception handling, security, and directory services necessary to support the
Computing and Network Services (CNS) provides or acquires requisite services to provide Common Services to
meet the requirements of digital preservation.
- compression
A method of reducing the size of computer files. There are several compression programs available, such as gzip
- Compression ratio or reduction ratio
The ratio that is used to discuss the quantity of
original data versus the quantity of data after compression.
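A short sketch of computing a compression ratio with Python's standard `gzip` module. The sample data are invented and highly repetitive, so the ratio shown is far more favorable than a typical data file would achieve.

```python
import gzip

original = b"0123456789" * 1000           # highly repetitive sample data
compressed = gzip.compress(original)

ratio = len(original) / len(compressed)   # original size : compressed size
restored = gzip.decompress(compressed)    # decompression recovers the original bytes

print(f"{len(original)} bytes -> {len(compressed)} bytes, ratio {ratio:.1f}:1")
```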
- Computer-Assisted Telephone Interviewing (CATI)/Personal Interviewing (CAPI)
A method of coding information from telephone or personal interviews directly into a computer during the
interview. CATI/CAPI software usually has built-in consistency checks, will not allow wild codes to be entered, and
automatically prompts the interviewer for correct skip pattern questions.
- consistency check
A process of data cleaning that eliminates inappropriate responses to branched questions. For instance, one
question might ask if the respondent attended church last week; a response of "no" should indicate that questions about
church attendance should be coded as "inapplicable." If those questions were coded any other way than "inapplicable,"
this would be inconsistent with the skip patterns of the survey instrument.
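The church-attendance example can be sketched as a simple check. The variable names and the numeric codes for "no" and "inapplicable" are hypothetical.

```python
INAPPLICABLE = 0  # hypothetical code for "inapplicable"
NO = 2            # hypothetical code for a "no" answer

def inconsistent_cases(cases, filter_var, follow_up_vars):
    """Return indexes of cases that answered "no" to the filter question
    but have a follow-up coded as anything other than "inapplicable"."""
    return [
        index
        for index, case in enumerate(cases)
        if case[filter_var] == NO
        and any(case[var] != INAPPLICABLE for var in follow_up_vars)
    ]

cases = [
    {"ATTENDED": 1, "SERVICE_TYPE": 3},             # attended; follow-up answered
    {"ATTENDED": 2, "SERVICE_TYPE": INAPPLICABLE},  # skipped correctly
    {"ATTENDED": 2, "SERVICE_TYPE": 3},             # violates the skip pattern
]
flagged = inconsistent_cases(cases, "ATTENDED", ["SERVICE_TYPE"])
print(flagged)
```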
- Consumer
"The role played by those persons, or client systems, who
interact with OAIS services to find preserved information of interest and to access that information in detail. This can
include other OAISs, as well as internal OAIS persons or systems."
Member institutions and other users are the
Consumers of ICPSR digital assets.
- control cards
(See setup files.)
- cross-sectional study
In survey research, a study in which data from particular subjects are obtained only once. Contrast with
longitudinal studies, in which a panel of individuals is interviewed repeatedly over a period of time.
Note that questions in a cross-sectional study can apply to previous time periods.
- DAT
Digital Audio Tape. A high-density storage medium.
- Data
In social science, data are generally numeric files originating from social
research methodologies or administrative records, from which statistics are produced.
At ICPSR, the majority of
digital content matches this definition of data. ICPSR's collections are expanding to include audio, video, geospatial,
Web-based and other digital content that pertains to social science research.
- data definition statements
(See setup files.)
- Data Documentation Initiative (DDI)
An effort to develop a specification for documenting data files in XML. The DDI Alliance is the organization that
created the specification, though "DDI" is often used to refer to the actual DTD created by the DDI Committee. More
information can be found on the DDI website.
- data entry
The process of converting verbal or written responses to electronic form.
- Data Management
"The OAIS entity that contains the services and functions for
populating, maintaining, and accessing a wide variety of information. Some examples of this information are catalogs and
inventories on what may be retrieved from Archival Storage, processing algorithms that may be run on retrieved data,
Consumer access statistics, Consumer billing, Event Based Orders, security controls, and OAIS schedules, policies, and
The pipeline incorporates a diagram and visualization of the Data Management function of OAIS for
ICPSR. The increasingly comprehensive Oracle system provides Data Management services and content defined in OAIS,
including information from the Deposit Form, the Study Tracking System, the metadata record, the current data library
system, the growing preservation system, the turnover system, and other components of the lifecycle as they are
automated. The process improvement initiative is reviewing and revising the lifecycle process at ICPSR.
- Data Processing
Within the field of information technology, data processing typically
means the processing of information by machines.
Data processing is defined by procedures designed to make a data
collection easier to use, ensure its accuracy, enhance its utility, optimize its format, protect confidentiality, etc.
For archival purposes, the process and results of data processing must be systematically and comprehensively captured so
that the process applied to the data is transparent to users.
- data file
Or "data set." A collection of data records. In the SAS statistical software, a "SAS data set" is the internal
representation of data.
- DDI
(See Data Documentation Initiative.)
- DDI instance
An XML document marked up according to the DDI DTD. In other words, a codebook or catalog record marked up in
- deck
(See card image.)
- decompression
Used to restore data to uncompressed form after compression.
- Designated Community
An OAIS concept describing the constituency for which the archived
information should be relevant and understandable.
The Designated Community includes depositors (Producers) and
users (Consumers) who are typically members of the social science research community or extensions of that community,
e.g., data librarians, digital archivists.
- dictionary file
A special form of machine-readable codebook that contains information about the structure of a data file and the
locations and, often, the names of variables in the data file. Typically, a researcher uses a dictionary file and a data
file together with statistical software; the statistical software uses the dictionary to specify variables by name,
rather than specifying their locations in the file.
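A minimal sketch of the idea: a dictionary maps variable names to column locations so that values can be requested by name. The variable names, column positions, and sample record are hypothetical and do not reflect any particular dictionary file format.

```python
# Hypothetical dictionary: variable name -> (start column, end column), 1-based inclusive.
dictionary = {"CASEID": (1, 4), "AGE": (5, 6), "SEX": (7, 7)}

def read_record(line: str, dictionary: dict) -> dict:
    """Extract every variable named in the dictionary from one fixed-format record."""
    return {name: line[start - 1:end] for name, (start, end) in dictionary.items()}

record = read_record("0001342", dictionary)
print(record["AGE"])   # requested by name, not by column location
```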
- Digital Curation
"Digital curation is all about maintaining and adding value to a
trusted body of digital information for future and current use; specifically, the active management and appraisal of
data over the entire life cycle. Digital curation builds upon the underlying concepts of digital preservation whilst
emphasizing opportunities for added value and knowledge through annotation and continuing resource management.
Preservation is a curation activity, although both are concerned with managing digital resources with no significant (or
only controlled) changes over time."
Digital curation is a fairly new term. Curation of social science research
data has always been the mission and purpose of ICPSR, if not the term used to describe what we do. ICPSR is
formalizing its data stewardship services at the University of Michigan and for member institutions.
- Digital Preservation
A term that encompasses all of the activities required to ensure
that the digital content designated for long-term preservation is maintained in usable formats, for as long as access to
that content is needed or desired, and can be made available in meaningful ways to current and future users.
Digital preservation is a distributed function that includes the Digital Preservation Officer, who develops and
promulgates requisite policies that reflect prevailing standards and practice in the digital preservation community;
Computing and Network Services, which oversees the archival storage function, the day-to-day operations of digital
preservation, and develops tools and procedures to perform digital preservation activities and meet archival
- Digital Videotape Formats
"A related family of open bitstream encoding formats for
recording digital video on physical media (tapes, hard disks) through digital video devices (digicams, camcorders)."
Currently DV, DVCAM, and DVCPRO are the most widely used digital videotape formats.
These digital video formats
are different and distinct from the digital video file formats that will comprise the main thrust of ICPSR's digital
video preservation program. However, it is likely that many depositors will use these formats and ICPSR must be prepared
to convert them. This will be somewhat challenging because it can be difficult to transcode these formats to data
- Disclosure Limitation
Procedures undertaken to limit the risk of disclosure of
individual identities in data files.
The techniques used for disclosure limitation include data masking,
recoding, topcoding, swapping, and perturbation (see other ICPSR sources for definitions of these terms). Like data
processing, the process and results of disclosure limitation need to be systematically, comprehensively, and
transparently documented for users.
- Dissemination Information Package (DIP)
"The Information Package, derived from one or
more AIPs, received by the Consumer in response to a request to the OAIS." An archive works with Consumers over time to ensure that DIPs remain useful.
The DIPs are the access copies of
files (data, documentation, supporting files, and related metadata) that are made available to users by download via the
ICPSR website; by CD via the mail, for a subset of files that require a user agreement; or in the ICPSR data enclave
onsite, for files that contain sensitive information and cannot otherwise be made available.
- Document Type Definition (DTD)
A set of rules that applies SGML (Standard Generalized Markup Language) or XML (eXtensible Markup Language) to
the markup of documents of a particular type. A DTD provides a list of the elements, attributes, comments, notes, and
entities that may be used in the document, as well as their relationships to one another.
- Documentation
Generically, any information on the structure, contents, and layout of a
data file. Sometimes called "technical documentation" or "a codebook". Documentation may be considered a specialized
form of metadata.
Documentation has arrived in a wide array of formats since the establishment of ICPSR in 1962.
To meet preservation requirements, documentation must be complete, correct, comprehensive, current, and compliant (to
content and preservation standards). ICPSR produces documentation that conforms with the Data Documentation Initiative (DDI). (See the DDI website for current
information about the version and current status of DDI.) As an XML-based format, DDI provides a preferred preservation
format for documentation.
- download
Downloading is the transmission of a file from one computer system to another, usually to a smaller computer
system. From the Internet user's point-of-view, to download a file is to request it from a Web page on another computer
and to receive it.
- DTD
(See Document Type Definition.)
- Management
"The role played by those who set overall OAIS policy as one component in a
broader policy domain."
The Director, the Digital Preservation Officer, and the Director's Group perform the role
of Management in the OAIS context, with input from ICPSR Advisory Council and approval of the highest level
- margin of error
A measurement of the accuracy of the results of a survey. Example: A margin of error of plus or minus
3.5% at a 95% confidence level means that there is a 95% chance that the responses of the target population as a whole would fall somewhere
between 3.5% more or 3.5% less than the responses of the sample (a 7% spread).
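As a rough sketch, a margin of error for a proportion can be computed with the usual normal-approximation formula. This assumes a simple random sample and a 95% confidence level (z = 1.96), neither of which is stated in the entry above; the sample size of 784 is invented so that the result matches the 3.5% example.

```python
import math

def margin_of_error(sample_size: int, proportion: float = 0.5, z: float = 1.96) -> float:
    """Margin of error for a proportion from a simple random sample.

    z = 1.96 corresponds to a 95% confidence level; proportion = 0.5 gives
    the widest (most conservative) margin.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

moe = margin_of_error(784)
print(f"+/- {moe:.1%}")
```

Surveys with complex sample designs generally need design-specific variance estimates, so this formula should be read as a first approximation only.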
- markup
The characters and codes that change a text document into an XML or other Markup Language document. This includes
the < and > characters as well as the elements and attributes of a document.
- Metadata
A term that refers to structured data about data. Metadata is an old concept
(e.g., card catalogs and indexes), but metadata is often essential for digital content to be useful and meaningful.
Metadata can capture general or specific information about digital content that may define administrative, technical, or
structural characteristics of the digital content. "Preservation metadata" is the term for a broader set of
metadata that documents the lifecycle of digital content from creation through processing, storage, preservation, and
use over time. Preservation metadata is required at the aggregate (e.g., collection and study level) and at the item
(e.g., file and variable) level. All preservation actions that are applied to digital content over time should be
captured in preservation metadata, for example. The Preservation Metadata Implementation Strategies (PREMIS) data
dictionary is a digital preservation community development that is moving towards being a standard. There are
additional format-specific (e.g., NISO Still Image data dictionary) and other standards that define additional metadata
We prepare a metadata record for each data collection, and we present a searchable database of
metadata records on our public website. ICPSR has defined a set of file-level metadata elements for preservation and
intends to comply with PREMIS as it develops. The process improvement initiative at ICPSR includes the identification
of metadata at each stage of the pipeline.
- microdata
Microdata files are those that contain information on individuals rather than aggregate data. The U.S. Census
Bureau's "Summary Files" contain aggregate data and consist of totals of individuals with various specified attributes
in a particular geographic area. They are, in a sense, tables of totals. The Bureau's PUMS (Public Use Microdata Sample)
files, however, contain the data from the original census survey instrument with certain information removed to protect
the anonymity of the respondent.
- missing data
Missing data values are assigned when the information being collected is missing, which can happen for several
reasons -- among them, the respondent refused to answer or the question was inapplicable for that particular respondent
because of previous responses.
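A brief sketch of why missing-data codes must be set aside before analysis. The codes 8 and 9 and the response values are hypothetical; real codes are listed in each study's codebook.

```python
# Hypothetical missing-data codes for one variable.
MISSING_CODES = {8: "refused", 9: "inapplicable"}

def valid_values(values, missing_codes):
    """Drop values that are missing-data codes rather than real responses."""
    return [value for value in values if value not in missing_codes]

responses = [3, 5, 9, 4, 8, 4]
kept = valid_values(responses, MISSING_CODES)

mean_with_codes = sum(responses) / len(responses)   # distorted by codes 8 and 9
mean_without_codes = sum(kept) / len(kept)          # based on real responses only
print(mean_with_codes, mean_without_codes)
```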
- OAIS
The Open Archival Information System (OAIS) Reference Model, an ISO
standard that formally expresses the roles (producer, management, consumer, and implicitly archives), functions (common
services, ingest, archival storage, data management, administration, preservation planning, and access), and content
(submission information package, archival information collection, archival information package, and dissemination
information package) of an archive. It was approved as an ISO standard in 2003. OAIS is undergoing a five-year review
The digital preservation policies program, system, and function are being developed in conformance with
- operating system
The special software required to make a computer work. It provides the link between the user and the hardware.
Popular operating systems include DOS, MacOS, VMS, VM, MVS, UNIX, and OS/2. (Note that "Windows 3.x" is not an operating
system as such, since it must have DOS to work, while Windows NT and Windows 98 are operating systems.)
- OSIRIS
Statistical software similar to SPSS and SAS with strong data management features. In the past ICPSR distributed
many studies in OSIRIS format with special machine-readable codebooks and dictionary files readable by the OSIRIS
software. (Note: OSIRIS has been officially decommissioned by its sponsor, the Institute for Social Research, University
- OSIRIS codebook
A machine-readable codebook written in binary format for use with OSIRIS software.
- OSIRIS dictionary
A machine-readable data dictionary usable with OSIRIS software. ICPSR distributes only "Type 1" OSIRIS
dictionaries, which are in a binary format and must be written in EBCDIC. OSIRIS "Type 5" dictionaries are character
- packed decimal
A method of encoding two pieces of information in a single byte. For instance, instead of storing a digit in one
byte and a sign in another byte using a traditional character encoding scheme, a packed decimal format might use a
binary number to indicate the value of the digit in 4 bits of the byte and a code indicating whether the digit is
positive or negative in the other 4 bits. The International Monetary Fund distributes data in packed decimal
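The entry above describes one digit and one sign sharing a byte. A widely used variant, taken here as an assumption rather than as the documented layout of any particular distributor's files, packs two decimal digits per byte and reserves the final 4 bits for the sign (hexadecimal D for negative, any other value treated as non-negative). A minimal decoder might look like this:

```python
def unpack_decimal(data: bytes) -> int:
    """Decode a packed-decimal field: two 4-bit digits per byte, sign in the final 4 bits."""
    nibbles = []
    for byte in data:
        nibbles.append(byte >> 4)      # high 4 bits
        nibbles.append(byte & 0x0F)    # low 4 bits
    *digits, sign = nibbles
    if any(digit > 9 for digit in digits):
        raise ValueError("invalid digit in packed-decimal field")
    value = int("".join(str(digit) for digit in digits))
    return -value if sign == 0x0D else value

print(unpack_decimal(bytes([0x7C])))          # one digit plus a sign in a single byte
print(unpack_decimal(bytes([0x12, 0x3D])))    # three digits plus a sign in two bytes
```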
- panel
A group of individuals who are interviewed more than once over time in a longitudinal survey.
- panel study
A longitudinal study in which a panel of individuals is interviewed at intervals over a period of time. In
general usage, the definitions of longitudinal study and panel study overlap. At least one author says that the term
"panel study" is sometimes used for studies that are restricted to a short period of time or are limited to two or three
interviews, and "longitudinal study" is used for studies that last longer or include more interviews; but there are
significant examples where this distinction is not accurate. In general, longitudinal studies involve panels of
respondents and panel studies are longitudinal studies. Examples of panel studies include the Survey of Income and
Program Participation (SIPP) and the Panel Study of Income Dynamics (PSID).
- parser
An algorithm or program to determine the syntactic structure of a sentence or string of symbols in some language.
Essentially, a program that analyzes the structure of text, looking for particular patterns and extracting/editing based
upon pre-established rules.
- PDF
(See Portable Document Format.)
- physical record
A segment of data that has a specified and constant size in bytes or that is clearly delimited from other records
by a newline character or sector of a disk or other means identifiable to a computer program reading the file. For
example, a card-image data file has physical records of 80 bytes each, by definition. In a file in
logical record length structure, each physical record is the same number of bytes in length as the
"logical record length." See also line.
- PI
An abbreviation for principal investigator.
- Pipeline
In computer science, pipeline processing is "a category of
techniques that provide simultaneous, or parallel, processing within the computer. It refers to overlapping operations
by moving data or instructions into a conceptual pipe with all stages of the pipe processing simultaneously. For
example, while one instruction is being executed, the computer is decoding the next instruction." The term pipeline
calls to mind the assembly line approach in manufacturing.
The pipeline refers to the flow of digital content
from reception through processing to public release, with embedded preservation milestones.
- portable file
In computer usage, a file or program is "portable" if it can be used by a variety of software on a variety of
hardware platforms. SPSS portable files can be produced using the "export" command.
- Portable Document Format (PDF)
A universal file format that retains the page layout, typography, and graphics of the original document and can
be viewed, printed, and searched with viewer software such as Adobe Acrobat.
- Preservation Planning
The OAIS entity that "provides the services and functions for
monitoring the environment of the OAIS and providing recommendations to ensure that the information stored in the OAIS
remains accessible to the Designated User Community over the long term, even if the original computing environment
becomes obsolete. Preservation Planning functions include evaluating the contents of the archive and periodically
recommending archival information updates to migrate current archive holdings, developing recommendations for archive
standards and policies, and monitoring changes in the technology environment and in the Designated Community's service
requirements and Knowledge Base. Preservation Planning also designs IP templates and provides design assistance and
review to specialize these templates into SIPs and AIPs for specific submissions. Preservation Planning also develops
detailed Migration plans, software prototypes and test plans to enable implementation of Administration migration goals."
The Digital Preservation Officer is primarily responsible for Preservation Planning with the programming
and technical infrastructure support of Computing and Network Services (CNS).
- principal investigator
The person or organization responsible for a study; equivalent to "author" in bibliographic citations.
"The role played by those persons, or client systems, who
provide the information to be preserved. This can include other OAISs or internal OAIS persons or systems."
Producers include principal investigators, project managers, federal agencies, other data archives, and others; a
Producer is anyone who authorizes (or requires) ICPSR to preserve digital content.
Often a data analyst or data producer will produce new data values from raw data and include these in a data
file; this process is called "recoding." For instance, an age variable might contain a respondent's actual age in years,
but this information might be "recoded" to produce a new variable, "eligible voter," with a code of "1" for all those 18
and over and a code of "2" for all those under 18.
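The age example can be sketched in Python as follows. The function name `recode_eligible_voter` and the sample ages are hypothetical; statistical packages such as SPSS, SAS, and Stata provide their own recode facilities.

```python
def recode_eligible_voter(age: int) -> int:
    """Return 1 for respondents aged 18 and over, 2 for those under 18."""
    return 1 if age >= 18 else 2


# Hypothetical raw ages for four respondents.
ages = [17, 18, 42, 5]

# The new "eligible voter" variable, derived from the raw age variable.
eligible_voter = [recode_eligible_voter(age) for age in ages]
# eligible_voter is [2, 1, 1, 2]
```

The original age values are left intact; the recoded values are stored as a new variable alongside them.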
Depending on the context, "record" may refer to a physical record or a logical record. See also
- record length
Depending on the context, the length in bytes (i.e., columns) of a physical record or a logical record.
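For fixed-length physical records, the record length is what lets a program find where one record ends and the next begins. A minimal Python sketch, assuming 80-byte card-image records; the function name `split_physical_records` and the sample data are hypothetical:

```python
def split_physical_records(data: bytes, record_length: int = 80) -> list[bytes]:
    """Split raw bytes into fixed-length physical records."""
    if len(data) % record_length != 0:
        raise ValueError("data length is not a multiple of the record length")
    return [data[i:i + record_length] for i in range(0, len(data), record_length)]


# Two hypothetical card images, each padded with blanks to 80 bytes.
cards = b"0001 25 1".ljust(80) + b"0002 17 2".ljust(80)

records = split_physical_records(cards)
# len(records) is 2, and every record is exactly 80 bytes long
```

Files whose records are delimited by newline characters would instead be split on the delimiter rather than on a byte count.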
- record type
A record that has a consistent logical structure. In files that include different units of analysis, for
instance, different record types are needed to hold the different variables. For example, one record type might have a
variable for income in one column and another record type might have a variable for household size in that same column.
The codebook will describe these different structures and how to determine which is which so that the user can tell
statistical software how to interpret that particular column as income or household size.
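A minimal sketch of how a program might use record types, assuming a made-up layout in which column 1 holds a record-type code ("H" for household, "P" for person) and columns 2-7 hold household size or income depending on that code. The layout, codes, and function name are hypothetical; a real study's codebook defines the actual structure.

```python
def parse_record(line: str) -> dict:
    """Interpret columns 2-7 according to the record type in column 1."""
    record_type = line[0]
    field = int(line[1:7])
    if record_type == "H":
        # Household record: the shared columns hold household size.
        return {"type": "household", "household_size": field}
    if record_type == "P":
        # Person record: the same columns hold income.
        return {"type": "person", "income": field}
    raise ValueError(f"unknown record type: {record_type!r}")


lines = ["H000004", "P052000"]
parsed = [parse_record(line) for line in lines]
# parsed[0] is {'type': 'household', 'household_size': 4}
# parsed[1] is {'type': 'person', 'income': 52000}
```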
- rectangular file
A physical file structure. A rectangular file is one that contains the same number of card images or the same
physical record length for each respondent or unit of analysis. Contrast with hierarchical
- relational structure
A study that includes different units of analysis, particularly when those units are not arranged in a strict
hierarchy as they are in a hierarchical file, has a relational structure. Note that the data could be arranged in
several different physical structures to handle such a data structure. For instance, each unit of analysis might be
stored in a separate rectangular file with identification numbers linking each case to the other units; or, the
different units of analysis might be stored in one large file with a hierarchical file structure; or the different units
could be stored in a special database structure used by a relational database management system such as INGRES. An
example of a study with a relational structure is the Survey of Income and Program Participation, which has eight or
more record types; these record types are related to each other but are not all members of a hierarchy of membership.
For instance, there are record types for household, family, person, wage and salary job, and general income
- respondent
In survey research, the person responding to the survey questions.
- response rate
The ratio of actual responses (for example, returned questionnaires) to the number of subjects the survey was designed
to reach. The denominator is the number of designed subjects, whether a sample or a population, who were approached by
mail, telephone, or other channel of investigation. The numerator is the number of actual responses.
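The calculation can be sketched in Python as follows. The function name `response_rate` and the figures are hypothetical, chosen only to illustrate the numerator and denominator described above.

```python
def response_rate(responses: int, approached: int) -> float:
    """Return actual responses divided by the number of subjects approached."""
    if approached <= 0:
        raise ValueError("the number of subjects approached must be positive")
    return responses / approached


# Hypothetical survey: 1,000 subjects approached, 600 responses received.
rate = response_rate(600, 1000)
# rate is 0.6, i.e., a 60 percent response rate
```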
- response codes
Typically, responses to questions are "coded" by assigning a numeric code to each possible response. Thus, a "yes"
might be coded "1" and a "no" "2"; female respondents might be coded "1" and male respondents "2"; each state or
county might be assigned a numeric code.
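A minimal sketch of such a coding scheme in Python, using the yes/no codes from the example above. The dictionary name `YES_NO_CODES` and the sample answers are hypothetical.

```python
# Code assignments for a yes/no question: "yes" is coded 1, "no" is coded 2.
YES_NO_CODES = {"yes": 1, "no": 2}

# Hypothetical answers from three respondents.
answers = ["yes", "no", "yes"]

# Replace each text answer with its numeric response code.
coded = [YES_NO_CODES[answer] for answer in answers]
# coded is [1, 2, 1]
```

A codebook plays the role of the dictionary here: it records which numeric code stands for which response.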
- Restricted-Use Data
Data that contain sensitive information (usually about human
subjects) that could permit the identification of individuals.
To obtain access to these data through ICPSR, a
user must complete a legal contract or in some cases travel to where the data are stored. The presence of sensitive
information in deposited digital content presents a management challenge for long-term preservation to ensure that
archival storage requirements for achieving distributed redundancy address confidentiality requirements, for
- secondary analysis
The process of reexamining existing data to address new questions or use methods not previously
- setup files
A character format file written in a statistical software language (SAS, SPSS, Stata, etc.) describing a data
file. These files are useful because they provide variable locations, names, and labels. Software-specific code must be
added to perform analysis.
- SGML
Standard Generalized Markup Language. A generic language for document representation. SGML is an international
standard that describes the relationship between a document's content and its structure.
- skip pattern
In survey research, the sequence of questions asked and skipped. For instance, if a respondent answers a question
that indicates he did not vote in the last election, his data record should "skip" items regarding how he voted in the
All the information collected at a single time or for a single purpose or by a single principal investigator. A
study consists of one or more files. Examples: the General Social Survey; a Gallup Poll; the 1990 Census of
Population and Housing STF 1A.
A document that explains the rules governing how another document (or group of documents) should be displayed. For
HTML pages, stylesheets are written as Cascading Style Sheets (CSS); for XML files, we use XSLT.
- Submission Information Package (SIP)
"An Information Package that is delivered by the
Producer to the OAIS for use in the construction of one or more AIPs."
The SIP includes the original files and associated metadata and documentation, including
information provided on the ICPSR Deposit Form.
- system file
A generic term for the native or internal storage format used by statistical software. When statistical software
reads a "raw" character format data file consisting of ASCII or EBCDIC characters, it must read each byte in sequence.
It can be more efficient in its storage, retrieval, and calculations by storing a data file in a special binary format
called a system file. Typically, a system file for one brand of software cannot be read by another brand of software or
by the same brand on another hardware platform. Some software is capable of creating a portable file that can then be
read by other software or on other platforms.