Human Subject Protection and Disclosure Risk Analysis

 


   
 

Project III: Statistical Disclosure Control: Best Practices and Tools for the Social Sciences

PRINCIPAL INVESTIGATOR
JoAnne McFarland O'Rourke
Research Investigator, Archivist
Inter-university Consortium for Political and Social Research
Institute for Social Research
University of Michigan

CO-INVESTIGATOR
Myron P. Gutmann
Director and Senior Research Scientist
Inter-university Consortium for Political and Social Research
Institute for Social Research
University of Michigan

Project Summary

In Project 3, research objectives turn to the practical application of statistical disclosure assessment and limitation. The ultimate goals of this project are to delineate and define best practices for disclosure limitation and to develop and test tools for applying disclosure limitation techniques. Achieving these goals begins with a thorough synthesis of the disclosure literature and documentation (Phase 1). Next, an investigation of current disclosure practices, knowledge, resources, and barriers will be performed through the administration of a survey that begins with the Principal Investigators drawn from a sample of funded NIH and NSF studies and is then extended to other participants involved in disclosure decisions for the sampled studies (Phase 2). We follow the survey with in-depth interviews of a purposive sample of survey respondents to obtain a nuanced understanding of how disclosure concepts translate into practice (Phase 3). Based on the results of Phases 1-3, a set of best practices will be identified that integrates the findings of Projects 1 and 2 (Phase 4). Finally, we develop, test, and refine tools that incorporate best practices (Phases 5 and 6).

Specific Aims of the Project

Despite a significant body of literature on statistical disclosure limitation published over the last 25 years, social science data producers continue to approach the disclosure limitation task in very different ways, and disclosure limitation practices vary widely, even for similar types of data. Some data preparers do not do enough to limit disclosure risk, while others suppress data unnecessarily, with corresponding reductions in analytic utility. The level of knowledge regarding risk assessment and the lack of useful tools to assist in decision-making regarding disclosure limitation also contribute to these variations in practice.

This project will identify a set of best practices for assessing and limiting disclosure risk for microdata and develop tools for researchers and archivists to assist them in applying these best practices. We will begin by conducting a thorough review of available information on disclosure limitation practices, including the practices of the U.S. statistical agencies, and the contexts in which different practices are employed.

Next, we will conduct a survey of researchers and archivists to provide the first systematic measure of the proportion of social science data originally collected or acquired that are ultimately released. Through this survey, we will also determine how decisions to protect respondent confidentiality are actually made, how decision-makers' risk aversion impacts which data are released, and what the specific barriers to data release are. Once the survey is complete, we will conduct semi-structured interviews with respondents for a purposive sample of research studies to allow a nuanced understanding of how respondents think about confidentiality protection and how disclosure decisions are reached.

Third, we will construct models of best practices within specific contexts, applying the results of this project as well as the results of Project 2. Finally, using ICPSR data, we will develop and test tools that incorporate best practices. These tools will translate the theory of disclosure risk and disclosure limitation into broadly applicable practice. In addition, they will help facilitate the dissemination of more data in the public realm while providing greater levels of protection in public-use data than are currently available. We will also investigate the possibility of converting the tools to a stand-alone software package.

Project Outcomes

Disclosure Risk Analysis Article Published in Ethics Journal

JoAnne McFarland O'Rourke of ICPSR along with colleagues from the disclosure committee of the Substance Abuse and Mental Health Data Archive (SAMHDA) at ICPSR published an article in the September 2006 issue of the Journal of Empirical Research on Human Research Ethics (JERHRE). The article is titled, "Solving Problems of Disclosure Risk While Retaining Key Analytic Uses of Publicly Released Microdata." Coauthors are Stephen Roehrig of Carnegie Mellon University, Steven G. Heeringa of the Institute for Social Research's Survey Research Center, Beth Glover Reed and William C. Birdsall of the School of Social Work at the University of Michigan, and Margaret Overcashier and Kelly Zidar of ICPSR.

When creating a data protection plan, it is important to strike a balance between human subject protection and data that retain analytic utility, say the authors. They write that the public-use version of the data is very important because it is likely to be the only one to which most researchers, policy analysts, teaching faculty, and students will ever have access. Hence, it is the version from which much of the utility of the data is extracted and often it effectively becomes the historical record of the data collection.

In the article, the authors analyze the disclosure risks and discuss the data protection plans for two national studies and one administrative data system. Taking key uses of each of the data collections into consideration, they employ three distinct disclosure limitation methods to protect respondents while still providing statistically accurate and highly useful public-use data: data swapping, microaggregation, and suppression of detailed geographic data. They describe the characteristics of the data sets that led to the selection of these methods, provide measures of the statistical impact, and give details of their implementation.
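As a rough illustration of one of the methods named above, the following Python sketch shows a simple univariate form of microaggregation: values are sorted, grouped into clusters of at least k records, and each value is replaced by its cluster mean. This is not the authors' implementation. The function name `microaggregate`, the parameter `k`, and the sample income values are assumptions made for illustration only.

```python
from statistics import mean


def microaggregate(values, k=3):
    """Replace each value with the mean of its group of at least k records.

    Illustrative univariate sketch only: records are grouped by sorted
    order, which is one simple way to form groups, not the method used
    in the article.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    n = len(values)
    if n < k:
        raise ValueError("need at least k values to form a group")

    # Indices of the records, ordered by their value.
    order = sorted(range(n), key=lambda i: values[i])
    result = [0.0] * n

    start = 0
    while start < n:
        end = start + k
        # Fold a short tail into the current group so that every
        # group contains at least k records.
        if n - end < k:
            end = n
        group = order[start:end]
        group_mean = mean(values[i] for i in group)
        for i in group:
            result[i] = group_mean
        start = end
    return result


# Hypothetical income values (in thousands), for illustration only.
incomes = [21, 25, 90, 23, 95, 400, 88]
masked = microaggregate(incomes, k=3)
print(masked)
```

Because each group is replaced by its own mean, the overall total (and therefore the overall mean) of the variable is preserved, while no released value corresponds to fewer than k records. The outlying value of 400 is no longer visible on its own.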

They also describe the composition of their disclosure committee, highlight the important disciplines and experience represented by the group, and describe the group process. The authors end by suggesting best practices for data users, principal investigators, and distributors; possible research agendas; and educational implications.

The article is available online.

Project Bibliography

Gutmann, MP, O'Rourke, JM, and Colyer, CJ. "Human Subjects Protection and Disclosure Risk Analysis." Paper presented at the annual meeting of the American Sociological Association. San Francisco, CA, August, 2004.

Gutmann, MP, O'Rourke, JM, Witkowski, K, Colyer, C, and McNally, J. "Providing Spatial Data for Secondary Analysis: Issues and Current Practices Relating to Confidentiality." Paper presented at the 2005 Population Association of America Meeting, Philadelphia, PA, and at the Statistics Canada Methodology Workshop, Ottawa, October 2005. Submitted for publication to Population Research and Policy Review.

O'Rourke, JM. "Assessing Disclosure Risk and Preserving Analytic Utility Using Disclosure Analysis," workshop presented for ICPSR's summer program course Data Sharing and Dissemination: Making Your Data a Resource for Others, June, 2005.

O'Rourke, JM. "Disclosure Risk Analysis," Survey Research Center Seminar Series, Protection of Human Subjects, Institute for Social Research, University of Michigan, May, 2006.

O'Rourke, JM, Roehrig, S, Heeringa, SG, Reed, BG, Birdsall, WC, Overcashier, M, and Zidar, K. "Solving Problems of Disclosure Risk While Retaining Key Analytic Uses of Publicly Released Microdata." Journal of Empirical Research on Human Research Ethics, 1(3), Sep. 2006, 63-84.

 
    
   


© 2005 ICPSR
