ICPSR Collection Development Policy

Executive Summary

ICPSR maintains an extensive archive of data to support research and knowledge building in the social and behavioral sciences. This policy describes the characteristics of data that ICPSR is interested in adding to its collection. ICPSR intentionally casts a broad net to acquire a wide range of data of interest to the diverse fields that make up the social and behavioral sciences. At the same time, however, it applies additional appraisal criteria to determine the appropriate level of curatorial investment needed to ensure long-term, effective use of the data.

By balancing broad collecting interests with a more focused investment in curation, ICPSR can select a wide array of social and behavioral science data while spending member resources strategically to anticipate the future needs of the research community and the broader public.

ICPSR seeks to acquire, archive, and disseminate data of interest to researchers in the social and behavioral sciences. Our definition of the domain is necessarily broad and encompassing as we recognize that traditional boundaries between disciplines are blurring, research is becoming more integrative, and our designated community is expanding. We focus our collecting efforts on data that can address key questions about the human experience in all of its diversity and richness.

To guide the growth and management of ICPSR’s collection, we have created a policy that offers clear goals but also a degree of flexibility to enable ICPSR to respond to the rapid pace of change in the research environment. We present the policy here to inform ICPSR membership, users, core partners, prospective depositors, and potential funders of the principles governing our collection development activities.

The Inter-university Consortium for Political and Social Research began building a collection of data to be shared across its member institutions in 1962. The early archive included the American National Election Study and other sample survey data. By the 1970s, several large-scale social science surveys had been added, including some conducted by the various centers of the Institute for Social Research at the University of Michigan. ICPSR’s collection of survey data grew stronger in the following decades, but the archive also expanded in new directions, partly because of Federal grants and contracts awarded to ICPSR to archive special collections, such as data on criminal justice and aging. In later years, additional topical archives were added: substance/drug use, HIV, education (including early education and childcare), health and medical care, demography, racial and ethnic minorities, and several others (see the thematic collections page for a full listing).

Over time, ICPSR began to add new data types resulting from a wide range of quantitative and qualitative methods to its collection. As ICPSR’s capacity for curating new kinds of data about the human experience has grown, there has been simultaneous, albeit sometimes slow, growth in the culture of data sharing throughout the scientific research community. ICPSR has positioned itself to invite data from a full range of social, behavioral, and health science disciplines in recognition that social relationships and status are closely intertwined with biological, cognitive, and clinical processes and experiences, to name but a few.

Data from ICPSR are used primarily by the academic research community, which includes researchers and students around the world. ICPSR data and data-related products and services are also used by policymakers, consultants, service providers, journalists and other professionals. As ICPSR continues to increase its supply of free, open-access data, the broader public will increasingly be able to take advantage of ICPSR data.

Building on broad, inclusive collection development policies from the past, and acknowledging the growing importance of research infrastructure that supports cross-disciplinary research, ICPSR seeks data from many disciplines, produced by many methods, of many types, and covering wide-ranging geographic areas and population groups, as described below. These lists are not exhaustive; rather, they illustrate the wide range of data that ICPSR considers in scope. As user demand for data broadens, and to better anticipate future trends in research, ICPSR is also willing to consider kinds of data not listed below. See also Out of Scope and High-Priority Areas.

Disciplines

  • Sociology
  • Political Science
  • Government
  • Economics
  • Public Policy
  • Law
  • Business
  • Demography
  • Education
  • Communication/Media Studies
  • Environmental Studies
  • Criminology
  • Geography
  • Anthropology
  • Archeology
  • Rural Studies/Urban Studies
  • Psychology
  • Human Development
  • Family Studies
  • Gender/Women’s Studies
  • Gerontology
  • Epidemiology
  • Public Health
  • Nursing
  • Health Care/Medicine
  • Social Work
  • Arts & Humanities

Methods

  • Survey Techniques (including online)
  • Interviewing Techniques
  • Qualitative
  • Experiments
  • Content Analysis
  • Textual Analysis
  • Field Research and Observational Techniques
  • Historical Methods
  • Clinical Trials
  • Interventions
  • Policy Analysis
  • Administrative Databases
  • Web Scraping
  • Data Mining

Data Types

  • Replication Data
  • Teaching Packages with Data
  • Code and Syntax to apply to data
  • Data visualization
  • Video
  • Audio
  • Geospatial
  • Biomarker

Geographic Coverage

  • U.S.
  • International/Cross-National/Comparative
  • State/Regional/Local

Populations

  • Criminal Justice Populations
  • Children & Adolescents
  • Adults and the Elderly
  • Various Race & Ethnic Groups
  • Families, Couples, & Households

Out of Scope

ICPSR is frequently asked to define what types of data it will not accept. The following list outlines some of the criteria ICPSR uses to identify data that are out of scope.

  • Non-Social and Behavioral Research Data: Data that cannot be connected with or used to expand upon the scientific investigation of the social dimensions of human lives (both antecedents and consequences) will not be acquired. For example, most data in the physical sciences are out of scope for ICPSR.
  • Cost of Data: ICPSR generally does not purchase data or pass along the costs of access to proprietary data to the user community. Therefore, data with associated fees may be considered out of scope for ICPSR.
  • Limited Access Data: ICPSR generally does not accept data requiring limitations on use, with the exception of data with access conditions intended to protect the privacy and identity of the human subjects. For example, ICPSR generally does not accept data in cases where access would be governed by an outside entity (e.g., external approval for use, publication review, authorship requirements).
  • Availability Elsewhere: ICPSR prefers to be the archive of record for a data collection. Data that are permanently available from another trusted repository may instead be linked to from the ICPSR catalog.
  • Directly Identifying: ICPSR generally does not accept data that contain direct identifiers.
  • Copyright: ICPSR accepts data only when the data contributor grants ICPSR the rights to curate, disseminate, and preserve a copy of the data.

Government Data

Historically, ICPSR has acquired and processed government data collections either with the support of the ICPSR membership or through topical archives at ICPSR that make data freely available to the public. Because the U.S. government has increasingly become an authoritative and reliable source for the data it collects, ICPSR will acquire government data only when it believes it can (1) add significant value for its users, (2) ensure long-term preservation of the data, and/or (3) add value through data curation (especially metadata following the Data Documentation Initiative, or DDI, standard) that increases the accessibility, discoverability, and correct use of the data.

For other important government collections, ICPSR will provide links to the original data sources in its catalog. ICPSR will continue to consider user requests for government data that are difficult to locate or use, or that are of such high interest that their acquisition by ICPSR is justified.

High-Priority Areas

ICPSR identifies high-priority data by reviewing and analyzing user demand (user behavior and recommendations from the ICPSR Council, Official Representatives, and the membership) and by scanning the research landscape (scholarly publications, grant award databases, and trending research topics in the news). The high-priority list is updated annually, with topics added and dropped as priorities evolve. Its purpose is to encourage the identification, nomination, and deposit of high-value data that users consider important and that are limited in ICPSR’s current holdings. The current high-priority areas are:

  • Sexual Orientation: As the United States makes major policy decisions regarding the legality of same-sex marriage, ICPSR has observed strong interest in data about sexual orientation.
  • Bullying: Bullying in schools and other organizations (e.g., places of employment) is a trending research topic. Researchers seek to understand bullying, its consequences for those affected, and which interventions (e.g., school-based or workplace programs) are effective at reducing bullying and its effects. ICPSR would like to identify and acquire research data about bullying in schools, cyberbullying, and related topics.
  • Social Media: ICPSR users frequently search for social media data, which are becoming an increasingly important source of information in the social and behavioral sciences. ICPSR is particularly interested in acquiring studies on the frequency and type of participation in social media and the impact of social media, as well as on internet use and behavior. Adding data collections built from social media data (e.g., blogs, posts, profiles, search behavior) would both support secondary analysis of these data and potentially spur the collection of new social media data.
  • Immigration: Immigration reform policy and the experiences of legal and illegal immigrants remain front and center in American public life, reflecting the country’s roots as a nation built by immigrants. ICPSR seeks to archive data that yield new understanding of immigrant populations and further our knowledge of the impact of immigration.
  • Individual Well-Being: Understanding how men and women achieve and maintain psychological and social well-being in contemporary settings remains an important research goal, yet data sharing in this area has been somewhat limited. ICPSR seeks data on a wide range of topics about mental or psychological well-being, social well-being, happiness, depression, and demoralization.
  • Longitudinal Data: Longitudinal data are often seen as a gold standard in human development research and in many social and behavioral science disciplines, in part because repeated observations of the same subjects over time strengthen causal inference. Because of the strengths of this approach, ICPSR remains very interested in archiving longitudinal data.
  • International Data: ICPSR seeks data originating from one or more non-U.S. countries. We are especially interested in data from Asian countries and from countries and regions of the world that lack a national structure for archiving, disseminating, and preserving research data. We also seek comparative data that can support cross-national research.

Data Format

  1. ICPSR prefers data in a readily usable format (see the Library of Congress’ Recommended Format Specifications) that is accessible in a variety of computing and technological settings.
  2. ICPSR prefers data formats that promote easy access and use without compromising research value.
  3. ICPSR prefers that data files deposited in a raw format be transformable or convertible into formats usable by a variety of statistical or analytical software.
  4. ICPSR prefers data files unaccompanied by value-added software.
  5. Data in obsolete, proprietary, or hard-to-use formats may still be accepted by ICPSR, although these characteristics may compromise any future use of the data other than as-is, bit-level access.

Confidentiality and Access

  1. ICPSR requires that data deposited in the archive meet recognized standards for the privacy and confidentiality of study subjects. (For information on these standards, see the University of Michigan’s Human Research Protection Program.)
  2. ICPSR prefers to acquire data that can reside in the public domain.
  3. ICPSR requires that data intended for public use be formatted so that identifiers inadvertently included in the data can be removed using standard practices without reducing the research value of the original data.
  4. Any access limitations that ICPSR might apply to specific data collections (e.g., a requirement that restricted-use agreements must be signed) should be legally justified and manageable given ICPSR’s resources, goals, and mission.

Levels of Curation

ICPSR maintains a broad policy of inclusion based on two levels of curation service. It offers both an option to make data available to the user community as deposited (no curation) and an option for member-funded curation, which involves reviewing, enhancing, and quality-checking the data to ensure usability and findability.

The selection criteria employed for the two levels of curation services are:

  • No Curation: Data that receive no curation form the least restrictive stream of data entering ICPSR. Any depositor with data meeting the terms of ICPSR’s broad Collection Development Policy may deposit and publish data in openICPSR, an open-access repository. Fees may be collected from non-member institutions for this service. Confidential data in openICPSR that require restricted access will be supported by fees collected from users. Contributors may also pay for curation services.
  • Member-Funded Curation: ICPSR also accepts and curates data considered valuable to the ICPSR membership, either now or in the future. Data curated for the membership are subject to additional selection criteria. Potential value is determined by the following criteria:
    • Popularity: Data in openICPSR with evidence of significant usage will undergo review for possible member-funded curation.
    • Series: ICPSR maintains longstanding series and will continue to curate new data that are part of these series to maximize the historic investment in the data by funding agencies, data producers, researchers, and ICPSR itself.
    • Methodological Rigor: Data that are methodologically sound, including but not limited to nationally representative sampling designs, will be identified and acquired for member-funded curation. Data stemming from an ineffective or flawed research design will not be curated for the membership and instead deferred to openICPSR.
    • Scientific Reputation: Highly cited data collections, data collected by frequently cited scientists, and data resulting in high quality citations (impact) will be identified and acquired for member-funded curation.
    • Data and Documentation Quality: The quality of the data and documentation is considered when reviewing incoming data. If documentation is inadequate and/or the data are of poor quality, the data will not be curated for the membership because of the higher cost of curation and the prospect of more limited use. Data not meeting this criterion will be deferred to openICPSR.
    • High-Priority: Data that are in demand and/or that fill known gaps in ICPSR holdings will be acquired for curation for the ICPSR membership (see High-Priority Areas above). ICPSR understands that new areas of research may be more experimental and, as such, their data may not otherwise meet the criteria for curation. ICPSR considers these high-risk, high-reward data worthy of curation for the membership even at lower levels of quality, methodological rigor, and reputation.

ICPSR also holds several grants and contracts to provide data archiving services to specific research communities through topical archives. Each of these projects has developed its own selection criteria, which fit within ICPSR’s broad Collection Development Policy.

ICPSR's investment in curation ensures the usability, findability, and long-term accessibility of the data it anticipates will be of greatest value today and in the future. Curation services include reviewing incoming data and documentation for accuracy, consistency, and meaning, and ensuring that the data can be understood by users who did not collect them. The rich metadata that ICPSR develops during curation also support data delivery, so that data can be found and understood by the widest possible audience. Techniques for minimizing disclosure risk are applied to the data during curation. Curated data are preserved in an archival format to ensure long-term access and are also distributed in multiple current formats for easier, immediate access by users. In addition to curation services funded by the ICPSR membership or by government grants and contracts, data contributors may purchase curation services.

This policy is subject to review and reissue every five years by the Collection Development Committee (last approved on 07/07/2015). Members may review and comment on the policy at any time.