Building Data Registries with Privacy and Confidentiality for Patient-Centered Outcomes Research (PCOR) [Methods Study], 2020 (ICPSR 39579)

Version Date: Nov 24, 2025

Principal Investigator(s):
Li Xiong, Emory University

https://doi.org/10.3886/ICPSR39579.v1

Version V1

Researchers can use patient health data to compare treatments. But these data may include information, like names or social security numbers, that could identify patients. Researchers use different methods to remove such information and protect patients' privacy. Some methods work well to protect privacy but may make data less useful for research. Other methods don't protect privacy well enough.

Current methods for protecting privacy don't work well when:

  • The number of patients in the data set is smaller than the number of data fields, such as patient traits or health conditions, and data are updated many times
  • Patients' health and treatments are measured at more than one point in time
  • Data are displayed as a graph to better capture some types of content

In this study, the research team created three new methods. The team wanted to see whether the new methods protect patient privacy better than current methods while keeping the data useful for research.

To access the methods and software, please visit the AIMS Group at Emory University.

Xiong, Li. Building Data Registries with Privacy and Confidentiality for Patient-Centered Outcomes Research (PCOR) [Methods Study], 2020. Inter-university Consortium for Political and Social Research [distributor], 2025-11-24. https://doi.org/10.3886/ICPSR39579.v1

Funding: Patient-Centered Outcomes Research Institute (PCORI) (ME-1310-07058)
Distributor: Inter-university Consortium for Political and Social Research

The study aimed to develop and evaluate algorithms for building patient-centered, privacy-preserving data registries. The project had three specific aims: (1) develop methods for establishing registries of private data, (2) develop methods for establishing registries that contain both private and consented data, and (3) develop methods for evaluating and tracking patient privacy risks and establishing data registries that take into account fine-grained patient privacy preferences.

In this study, researchers designed algorithms based on the differential privacy (DP) framework to build data registries. The DP framework requires that any statistical aggregations and computations published using patient data limit or eliminate the possibility of determining if the calculations included a particular patient's record. Researchers developed three algorithms:

  • Distance-based sampling with adaptive threshold (DSAT) for high dimensional and dynamic data
  • Differentially private frequent sequence mining (PFS2) via sampling-based candidate pruning for correlated sequential data
  • Differentially private frequent subgraph mining (DFG) for correlated graph data, where correlated data are modeled as graphs representing correlations of co-occurrences of health events

Researchers compared the performance of the new algorithms with existing algorithms.
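
As a rough illustration of the DP framework described above, the sketch below applies the standard Laplace mechanism to a single counting query. It is not DSAT, PFS2, or DFG, whose details are not given here. The function names, the toy registry, and the epsilon value are all assumptions made for the example.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def dp_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Return a noisy count of the records that satisfy `predicate`.

    Adding or removing one patient changes a count by at most 1
    (sensitivity 1), so Laplace noise with scale 1/epsilon gives
    epsilon-DP for this single query.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical toy registry; the fields and values are made up.
registry = [
    {"age": 54, "diabetes": True},
    {"age": 61, "diabetes": False},
    {"age": 47, "diabetes": True},
    {"age": 70, "diabetes": True},
]

# Fixed seed only so the example is repeatable; a real release would
# not use a fixed seed or a non-cryptographic generator.
rng = random.Random(0)
noisy = dp_count(registry, lambda r: r["diabetes"], epsilon=0.5, rng=rng)
print(f"noisy diabetes count: {noisy:.2f}")  # the true count is 3
```

A smaller epsilon means more noise and stronger privacy, which is the privacy-versus-usefulness trade-off the study description refers to.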

Existing DP algorithms assume that patient data at different time points are independent or that temporal correlations do not risk re-identification of data. To quantify and control the privacy loss from existing DP algorithms for correlated sequential data, researchers developed the Control Temporal Privacy Leakage (ConTPL) method.
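
When releases at several time points draw on the same patient's records, their privacy losses cannot be treated as independent. The internals of ConTPL are not described here, so the sketch below is not that method. It only illustrates the standard sequential composition bound, under which k releases that are each epsilon-DP are together at most k times epsilon-DP. The `PrivacyBudget` class and the numbers are hypothetical.

```python
class PrivacyBudget:
    """Track cumulative epsilon under basic sequential composition."""

    def __init__(self, total_epsilon: float) -> None:
        if total_epsilon <= 0:
            raise ValueError("total_epsilon must be positive")
        self.total_epsilon = total_epsilon
        self.spent = 0.0

    def spend(self, epsilon: float) -> float:
        """Record one epsilon-DP release and return the remaining budget."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Small tolerance guards against floating-point rounding.
        if self.spent + epsilon > self.total_epsilon + 1e-12:
            raise RuntimeError("privacy budget exceeded")
        self.spent += epsilon
        return self.total_epsilon - self.spent


# Four releases over time, each assumed to be 0.25-DP on its own.
budget = PrivacyBudget(total_epsilon=1.0)
for visit in range(4):
    remaining = budget.spend(0.25)
    print(f"release {visit + 1}: spent={budget.spent:.2f}, remaining={remaining:.2f}")
```

Under this accounting a fifth release at the same epsilon would be refused, which shows why repeated releases over time need explicit control.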

A nine-member stakeholder panel including patient privacy advocates, patients, privacy compliance officers, and biomedical informaticians gave input on the design and conduct of the study.

Simulated data created using Emory Analytic Information Warehouse, Clinical Data Warehouse for Research at University of California San Diego, Adult Data Set from US Census, MSNBC data set obtained from UCI Machine Learning Repository, and Open National Cancer Institute Database

2025-11-24

Notes

  • The public-use data files in this collection are available for access by the general public. Access does not require affiliation with an ICPSR member institution.

  • ICPSR usually offers files in multiple formats for researchers to be able to access data and documentation in formats that work well within their needs. If you have questions about the accessibility of materials distributed by ICPSR or require further assistance, please visit ICPSR’s Accessibility Center.