Privacy-Preserving Analytic and Data-Sharing Methods for Clinical and Patient-Powered Data Networks [Methods Study], California, Colorado, and Washington, 2014-2018 (ICPSR 39563)

Version Date: Nov 18, 2025

Principal Investigator(s):
Sengwee Toh, Harvard Pilgrim Health Care

https://doi.org/10.3886/ICPSR39563.v1

Version V1

Sometimes a study can get better results using data from different sites. In these cases, researchers may want to share patient data, including personal and private information such as dates of birth and addresses. However, researchers may not want to share data across sites because of worries about patient privacy. Some statistical methods can change patients' sensitive individual data into summary data that hides individuals' personal information. These privacy-protecting methods, or PPMs, make it safe to share data across sites. But researchers don't know if PPMs produce accurate results.

In this study, the research team compared combinations of PPMs with methods that use patients' individual data.

To access the methods, software, and R package, please visit the project's GitHub repository.

Toh, Sengwee. Privacy-Preserving Analytic and Data-Sharing Methods for Clinical and Patient-Powered Data Networks [Methods Study], California, Colorado, and Washington, 2014-2018. Inter-university Consortium for Political and Social Research [distributor], 2025-11-18. https://doi.org/10.3886/ICPSR39563.v1

Funding: Patient-Centered Outcomes Research Institute (PCORI) (ME-1403-11305)
Distributor: Inter-university Consortium for Political and Social Research

2014 -- 2018

This project aimed to (1) assess stakeholders' understanding of and preference for privacy-preserving analytic and data-sharing methods, and assess the benefits and limitations of implementing them in multisite PCOR studies; (2) develop or enhance a suite of privacy-preserving methods to perform rigorous analysis without sharing individual-level data; and (3) create freely available dissemination tools, including analytic code, educational materials, technical documentation, and user guides for these methods.

In this study, the research team developed a set of privacy-preserving analytic methods made up of data-sharing methods, confounder summary scores, and confounding adjustment approaches. The team compared the performance of privacy-preserving methods that use summary-level data with that of methods that use individual-level data.

The research team conducted multiple simulations to generate individual-level data that resembled patient data from an observational study. Next, the team transformed the individual-level simulated data to create three different summary-level data sets representing increasing levels of summarization: risk-set data, summary-table data, and effect-estimate data.
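As a rough illustration of what moving from individual-level to summary-level data can look like, the sketch below collapses a handful of made-up patient records into one plausible form of summary-table data: counts of patients, events, and person-time per site, confounder-score stratum, and treatment group. The record layout, field names, and values are assumptions for illustration only; the study's actual data formats are not described here.

```python
from collections import defaultdict

# Hypothetical individual-level records (not study data):
# (site, treated, score_stratum, event, follow_up_days)
records = [
    ("A", 1, 1, 1, 120),
    ("A", 1, 1, 0, 365),
    ("A", 0, 1, 0, 365),
    ("A", 0, 2, 1, 200),
    ("B", 1, 2, 0, 365),
    ("B", 0, 2, 1, 90),
    ("B", 0, 2, 0, 365),
]


def summarize(rows):
    """Collapse patient rows into one cell per (site, stratum, treated)."""
    table = defaultdict(lambda: {"patients": 0, "events": 0, "person_days": 0})
    for site, treated, stratum, event, days in rows:
        cell = table[(site, stratum, treated)]
        cell["patients"] += 1
        cell["events"] += event
        cell["person_days"] += days
    return dict(table)


# Only these aggregated cells, not the patient rows, would leave a site.
summary_table = summarize(records)
for key in sorted(summary_table):
    print(key, summary_table[key])
```

Each output cell describes a group rather than a person, which is the sense in which summary-level data hide individual information. In practice, very small cells can still be identifying and may need further handling.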

The research team then analyzed each data set using different privacy-preserving methods to estimate treatment effects on health outcomes. To control for confounding in the analyses, the team applied confounder summary scores such as propensity scores (PSs) and disease risk scores (DRSs). The team incorporated these scores into the analyses for confounding adjustment via matching, stratification, or weighting. To evaluate the performance of the various privacy-preserving methods on survival outcomes, the team compared the treatment effect estimates with the true treatment effect used to generate the simulated data.
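The evaluation logic described above (simulate data with a known treatment effect, adjust for confounding with a summary score, compare the estimate with the truth) can be sketched in a few lines. The example below is a deliberately simplified stand-in, not the team's implementation: it assumes a single binary confounder, a binary outcome instead of a survival outcome, a propensity score estimated as the observed treated fraction within each confounder level, and inverse probability of treatment weighting as the adjustment. All function names and parameter values are hypothetical.

```python
import random


def simulate(n, true_rd, seed=0):
    """Simulate (confounder, treated, outcome) rows with a known risk difference."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0          # binary confounder
        p_treat = 0.7 if x else 0.3                 # confounder affects treatment
        t = 1 if rng.random() < p_treat else 0
        p_out = 0.10 + 0.20 * x + true_rd * t       # confounder affects outcome too
        y = 1 if rng.random() < p_out else 0
        rows.append((x, t, y))
    return rows


def propensity_scores(rows):
    """Estimate P(treated | x) as the treated fraction within each level of x."""
    n = {0: 0, 1: 0}
    treated = {0: 0, 1: 0}
    for x, t, _ in rows:
        n[x] += 1
        treated[x] += t
    return {x: treated[x] / n[x] for x in n}


def crude_rd(rows):
    """Unadjusted risk difference: treated risk minus untreated risk."""
    y1 = [y for _, t, y in rows if t == 1]
    y0 = [y for _, t, y in rows if t == 0]
    return sum(y1) / len(y1) - sum(y0) / len(y0)


def iptw_rd(rows):
    """Risk difference using inverse probability of treatment weights."""
    ps = propensity_scores(rows)
    num1 = den1 = num0 = den0 = 0.0
    for x, t, y in rows:
        if t:
            w = 1.0 / ps[x]
            num1 += w * y
            den1 += w
        else:
            w = 1.0 / (1.0 - ps[x])
            num0 += w * y
            den0 += w
    return num1 / den1 - num0 / den0


TRUE_RD = 0.10
rows = simulate(100_000, TRUE_RD)
crude = crude_rd(rows)
adjusted = iptw_rd(rows)
print(f"true={TRUE_RD:.3f} crude={crude:.3f} adjusted={adjusted:.3f}")
```

Because the confounder raises both the chance of treatment and the chance of the outcome, the crude estimate should overstate the effect, while the weighted estimate should land close to the value used to generate the data.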

The research team also tested the performance of the privacy-preserving methods using real data from two comparative effectiveness studies on obesity and rheumatoid arthritis from three sites within a clinical data research network. The team applied the same analytic methods used with the simulated data.
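When each site shares only effect-estimate data, the site-level results have to be combined centrally. A common way to do that is fixed-effect inverse-variance pooling, sketched below for three sites. Whether the study used this particular pooling rule is an assumption, and the log hazard ratios and standard errors are invented for illustration; they are not results from the study.

```python
import math

# Hypothetical site-level results: (log hazard ratio, standard error)
site_estimates = {
    "site_1": (-0.22, 0.10),
    "site_2": (-0.15, 0.08),
    "site_3": (-0.30, 0.12),
}


def pool_fixed_effect(estimates):
    """Inverse-variance weighted average of site-level log effect estimates."""
    weights = {site: 1.0 / se ** 2 for site, (_, se) in estimates.items()}
    total = sum(weights.values())
    pooled = sum(weights[site] * beta for site, (beta, _) in estimates.items()) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se


pooled_log_hr, pooled_se = pool_fixed_effect(site_estimates)
pooled_hr = math.exp(pooled_log_hr)
ci_95 = (
    math.exp(pooled_log_hr - 1.96 * pooled_se),
    math.exp(pooled_log_hr + 1.96 * pooled_se),
)
print(f"pooled HR = {pooled_hr:.3f}, 95% CI = ({ci_95[0]:.3f}, {ci_95[1]:.3f})")
```

More precise sites receive more weight, and the pooled standard error is smaller than that of any single site. This rule assumes the sites estimate a common effect.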

To design and implement this study, the research team worked with patients, health system administrators, other researchers, and experts in governance and regulatory compliance.

Patients from integrated delivery systems: Kaiser Permanente Colorado, Kaiser Permanente Northern California, and Kaiser Permanente Washington

Simulated data designed to resemble comparative effectiveness research data; empirical data from the administrative and clinical databases of the Kaiser Permanente and Strategic Partners Patient Outcomes Research to Advance Learning (PORTAL) network.

2025-11-18

Notes

  • The public-use data files in this collection are available for access by the general public. Access does not require affiliation with an ICPSR member institution.

  • ICPSR usually offers files in multiple formats so that researchers can access data and documentation in the format that best suits their needs. If you have questions about the accessibility of materials distributed by ICPSR or require further assistance, please visit ICPSR’s Accessibility Center.

This study is maintained and distributed by the Patient-Centered Outcomes Data Repository (PCODR). PCODR is the official data repository of the Patient-Centered Outcomes Research Institute (PCORI).