Human Subject Protection and Disclosure Risk Analysis

 


Project II: Estimation of Disclosure Risk and Statistical Methods for Disclosure Limitation

PRINCIPAL INVESTIGATOR
T. E. Raghunathan
Senior Research Scientist
Institute for Social Research
University of Michigan

CO-INVESTIGATORS
Ben B. Hansen
Visiting Research Investigator
Institute for Social Research
University of Michigan

Roderick J. A. Little
Richard Remington Collegiate Professor of Biostatistics
Department of Biostatistics
University of Michigan

Richard Valliant
Senior Research Scientist
Institute for Social Research
University of Michigan

Project Summary

The recent explosion in demand for microdata from researchers and policy makers, especially when the data collection is paid for with public funds, has increased concerns about confidentiality protection. Confidentiality of responses is a serious commitment made by data-collecting agencies to participants in a study. A similar commitment is expected of any agency that disseminates the data to researchers and policy makers. Along with the increased demand for microdata, a number of commercial databases containing identifying information, such as names, addresses, and demographic information, have also become accessible. These databases raise the concern that an intruder could link the anonymous survey data released for public use by data collection agencies with the commercial databases to identify one or more respondents to the survey.

This research proposal has three primary objectives:

  1. To assess the risk of disclosure using data from four test-bed national probability surveys covering a wide variety of topics. The risk will be addressed using two broad classes of intruder models: Type I, where an individual, personally known to the intruder, is known by the intruder to be in the survey; and Type II, where an intruder with access to an external database with names and addresses is seeking to identify respondents in the survey and hence gain access to confidential information;
  2. To develop and evaluate new methods to avoid disclosure; and
  3. To develop strategies for replacing variables in public-use data sets deemed to increase the risk of disclosure by summary variables that allow users to adjust or control for these variables without knowing their actual values.
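The two intruder models in objective 1 can be illustrated with a small sketch. It is not the project's method: the key variables (`age`, `sex`, `zip3`), the toy records, and both function names are hypothetical, and exact matching on keys is a simplifying assumption. The first function relates to the intruder who knows a target is in the survey; the second relates to the intruder who links against an external database of named records.

```python
from collections import Counter

def key_of(record, keys):
    """Tuple of the record's values on the chosen key variables."""
    return tuple(record[k] for k in keys)

def sample_unique_rate(survey, keys):
    """Share of survey records whose key combination is unique in the survey.

    Relevant to the first intruder model: if the target is known to be in
    the survey and their keys are unique there, the record is identified.
    """
    counts = Counter(key_of(r, keys) for r in survey)
    uniques = sum(1 for r in survey if counts[key_of(r, keys)] == 1)
    return uniques / len(survey)

def unique_link_rate(survey, external, keys):
    """Share of survey records that are unique in the survey AND match
    exactly one external record on the keys (second intruder model)."""
    survey_counts = Counter(key_of(r, keys) for r in survey)
    external_counts = Counter(key_of(r, keys) for r in external)
    linked = sum(
        1
        for r in survey
        if survey_counts[key_of(r, keys)] == 1
        and external_counts[key_of(r, keys)] == 1
    )
    return linked / len(survey)

# Hypothetical toy data: a public-use survey file and a named external file.
KEYS = ("age", "sex", "zip3")
survey = [
    {"age": 34, "sex": "F", "zip3": "481"},
    {"age": 34, "sex": "F", "zip3": "481"},
    {"age": 71, "sex": "M", "zip3": "482"},
    {"age": 52, "sex": "F", "zip3": "481"},
]
external = [
    {"name": "A", "age": 71, "sex": "M", "zip3": "482"},
    {"name": "B", "age": 52, "sex": "F", "zip3": "481"},
    {"name": "C", "age": 52, "sex": "F", "zip3": "481"},
    {"name": "D", "age": 34, "sex": "F", "zip3": "481"},
]

type1_risk = sample_unique_rate(survey, KEYS)
type2_risk = unique_link_rate(survey, external, KEYS)
```

In practice an intruder would face measurement error and incomplete coverage in both files, so rates from exact matching like these are only a rough starting point.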

Specific Aims of the Project

Increasing demands for microdata from researchers and policy makers, especially when the data collection is paid for with public funds, have increased concerns about confidentiality protection. Simultaneously, tragic deaths of biomedical research subjects have recently led to increased focus on human subject protection. Confidentiality of responses is a serious commitment made by data-collecting agencies to research participants. A similar commitment is expected of any agency that disseminates the data to researchers and policy makers. Along with the increased demands for microdata, a number of commercial databases containing identifying information, such as names, addresses, and demographic information, have also become accessible. These databases raise the concern that an intruder could link anonymous survey data released for public use by data collection agencies with the commercial databases to identify one or more respondents to the survey. In response to this concern, many data-collection agencies withhold geographic, demographic, and other potential key information from public-use data sets. Such actions limit the use of data by researchers and policy makers.

Rather than restricting the amount of data released, researchers can instead restrict access to the full data sets. Several data enclaves have been established in which persons wishing to use the microdata must perform their analyses, either using computers on site or through remote access. The latter option requires several passes through "checkpoints" to ensure that the output does not contain any potentially identifying information or raw microdata. These solutions limit the major positive social, medical, health, economic and policy advances that can accrue from the release of data to users from a broad spectrum of society.

As with all projects in this application, we view harm to human subjects resulting from a breach of the pledge of confidentiality as resting on a sequential set of probabilities and expectations: the probability that one or more intruders will attempt to identify persons in the data set, the probability that they succeed, and the magnitude of harm to a subject upon disclosure. This project concentrates on estimating the probability of disclosure given an attempt at intrusion. Further, it focuses on the threat of disclosure posed by publicly available data sets that could be matched to publicly released microdata sets whose survey respondents were given a pledge of confidentiality.
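One way to write this sequential view down is as a product of the three components. The notation is an assumption introduced here for illustration, not taken from the proposal: $A$ is the event that an intrusion is attempted, $D$ the event that a respondent is identified, and $H$ the harm to the subject.

```latex
\mathbb{E}[H]
  \;=\; \underbrace{P(A)}_{\text{attempt}}
  \;\times\; \underbrace{P(D \mid A)}_{\text{success given attempt}}
  \;\times\; \underbrace{\mathbb{E}[H \mid D]}_{\text{harm upon disclosure}}
```

Under this reading, the project's focus is the middle term, $P(D \mid A)$, and the decomposition assumes that harm arises only when disclosure follows an attempt.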

This research proposal has three primary objectives:

  1. To assess the risk of disclosure using data from four test-bed national probability surveys covering a wide variety of topics;
  2. To develop and evaluate new methods to protect microdata sets from disclosure; and
  3. To develop strategies for replacing variables in public-use data sets deemed to increase the risk of disclosure by summary variables that allow users to adjust or control for these variables without knowing their actual values.

We will develop and evaluate strategies to augment public-use data files with summaries of variables that otherwise have been suppressed to preserve confidentiality. The new methods will be based on generalizations of propensity score and pair matching methods.
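As a rough illustration of how a propensity-score summary could stand in for suppressed variables, the sketch below fits a logistic regression of a binary analysis variable `z` on two suppressed covariates and keeps only coarse score strata for release. This is a textbook propensity score, not the generalization the project proposes; the data, the variable names, the hand-rolled gradient-ascent fit, and the choice of five strata are all assumptions made for the example.

```python
import math

def fit_logistic(X, z, steps=2000, lr=0.1):
    """Fit logistic regression by batch gradient ascent.

    Returns [intercept, coef_1, ..., coef_p].
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(steps):
        grad = [0.0] * (p + 1)
        for xi, zi in zip(X, z):
            row = [1.0] + list(xi)
            eta = sum(b * v for b, v in zip(beta, row))
            resid = zi - 1.0 / (1.0 + math.exp(-eta))
            for j, v in enumerate(row):
                grad[j] += resid * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def propensity(beta, xi):
    """Estimated probability that z = 1 given covariates xi."""
    eta = beta[0] + sum(b * v for b, v in zip(beta[1:], xi))
    return 1.0 / (1.0 + math.exp(-eta))

def to_strata(scores, n_strata=5):
    """Rank-based strata labelled 0..n_strata-1, of roughly equal size."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    strata = [0] * len(scores)
    for rank, i in enumerate(order):
        strata[i] = rank * n_strata // len(scores)
    return strata

# Hypothetical suppressed covariates (e.g. a rescaled continuous measure and
# a binary indicator) and a binary variable z that analysts want to study.
X = [(0.1, 0), (0.2, 0), (0.3, 1), (0.4, 0), (0.5, 1),
     (0.6, 0), (0.7, 1), (0.8, 1), (0.9, 0), (1.0, 1)]
z = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]

beta = fit_logistic(X, z)
scores = [propensity(beta, xi) for xi in X]

# Only the stratum label would go into the public-use file; X stays withheld.
released_strata = to_strata(scores)
```

An analyst could then compare outcomes between `z = 1` and `z = 0` within each released stratum, adjusting for the withheld covariates without seeing their values. Whether strata this coarse give adequate adjustment, and whether they leak information about `X`, are exactly the kinds of questions such a strategy would need to evaluate.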

The results of our Specific Aims will be developed in parallel with the research of the team working on Project I, so that their investigation of informed consent techniques can study how best to describe the disclosure limitation techniques for maximal comprehension by survey respondents. The project will also work in conjunction with Project III as part of its aim of assessing best practices, and with Project IV on the creation of training and dissemination materials.

 
    
   


© 2005 ICPSR
