Building Data Registries with Privacy and Confidentiality for Patient-Centered Outcomes Research (PCOR) [Methods Study], 2020 (ICPSR 39579)
Version Date: Nov 24, 2025
Principal Investigator(s):
Li Xiong, Emory University
https://doi.org/10.3886/ICPSR39579.v1
Version V1
Summary
Researchers can use patient health data to compare treatments. But these data may include information, like names or social security numbers, that could identify patients. Researchers use different methods to remove such information and protect patients' privacy. Some methods work well to protect privacy but may make data less useful for research. Other methods don't protect privacy well enough.
Current methods for protecting privacy don't work well when:
- The number of patients in the data set is smaller than the number of data fields, such as patient traits or health conditions, and data are updated many times
- Patients' health and treatments are measured at more than one point in time
- Data are displayed as a graph to better capture some types of content
In this study, the research team created three new methods. The team wanted to see if the new methods better protect patient privacy but also make sure data remain useful for research.
To access the methods and software, please visit the AIMS Group at Emory University.
Study Purpose
To develop and evaluate algorithms for building patient-centered and privacy-preserving data registries. The project had three specific aims: (1) develop methods for establishing registries of private data, (2) develop methods for establishing registries that contain both private and consented data, and (3) develop methods for evaluating and tracking patient privacy risks and establishing data registries that take into account fine-grained patient privacy preferences.
Study Design
In this study, researchers designed algorithms based on the differential privacy (DP) framework to build data registries. The DP framework requires that any statistical aggregations and computations published using patient data limit or eliminate the possibility of determining if the calculations included a particular patient's record. Researchers developed three algorithms:
- Distance-based sampling with adaptive threshold (DSAT) for high dimensional and dynamic data
- Differentially private frequent sequence mining (PFS2) via sampling-based candidate pruning for correlated sequential data
- Differentially private frequent subgraph mining (DFG) for correlated graph data, where correlated data are modeled as graphs representing correlations of co-occurrences of health events
Researchers compared the performance of the new algorithms with existing algorithms.
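As a generic illustration of the DP framework described above, the following minimal sketch releases a noisy patient count using the standard Laplace mechanism. It is not an implementation of DSAT, PFS2, or DFG. The function names (`laplace_noise`, `dp_count`), the toy records, and the choice of epsilon are assumptions made for this example only.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution."""
    # The difference of two i.i.d. exponential draws with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records that satisfy `predicate`."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for record in records if predicate(record))
    # Adding or removing one patient's record changes a count by at most 1,
    # so the sensitivity of a counting query is 1.
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical toy records, not drawn from any data source in this study.
patients = [
    {"age": 34, "diabetic": True},
    {"age": 58, "diabetic": False},
    {"age": 71, "diabetic": True},
    {"age": 45, "diabetic": False},
]

noisy_diabetic_count = dp_count(
    patients, lambda p: p["diabetic"], epsilon=0.5, rng=random.Random(0)
)
```

A smaller epsilon adds more noise, which strengthens the privacy guarantee and reduces accuracy. This is the trade-off between privacy and usefulness that the summary describes.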
Existing DP algorithms assume that patient data at different time points are independent or that temporal correlations do not risk re-identification of data. To quantify and control the privacy loss from existing DP algorithms for correlated sequential data, researchers developed the Control Temporal Privacy Leakage (ConTPL) method.
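For context, the baseline way to account for privacy loss across repeated releases is sequential composition: the epsilons of the individual releases add up. The sketch below shows only that baseline accounting, which ignores temporal correlations. It does not reproduce ConTPL, and the helper names `split_budget` and `composed_epsilon` are hypothetical.

```python
def split_budget(total_epsilon, num_releases):
    """Split a total privacy budget evenly across a number of releases."""
    if total_epsilon <= 0:
        raise ValueError("total_epsilon must be positive")
    if num_releases < 1:
        raise ValueError("num_releases must be at least 1")
    return [total_epsilon / num_releases] * num_releases


def composed_epsilon(epsilons):
    """Upper bound on total privacy loss under basic sequential composition."""
    # Basic sequential composition: the per-release epsilons simply add up.
    return sum(epsilons)


# Example: a total budget of 1.0 spread over four time points.
per_release = split_budget(1.0, 4)
total_loss = composed_epsilon(per_release)
```

Under this accounting, each of the four releases gets an epsilon of 0.25. The paragraph above concerns the additional loss that can arise when a patient's records at different time points are correlated, which this simple sum does not capture.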
A nine-member stakeholder panel including patient privacy advocates, patients, privacy compliance officers, and biomedical informaticians gave input on the design and conduct of the study.
Data Source
Simulated data created using Emory Analytic Information Warehouse, Clinical Data Warehouse for Research at University of California San Diego, Adult Data Set from US Census, MSNBC data set obtained from UCI Machine Learning Repository, and Open National Cancer Institute Database
Notes
The public-use data files in this collection are available for access by the general public. Access does not require affiliation with an ICPSR member institution.
ICPSR usually offers files in multiple formats for researchers to be able to access data and documentation in formats that work well within their needs. If you have questions about the accessibility of materials distributed by ICPSR or require further assistance, please visit ICPSR’s Accessibility Center.
