Incremental Privacy-Preserving Record Linkage (iPPRL) to Reduce Barriers to Data Sharing and Improve Data Quality [Methods Study], Colorado, 2011-2022 (ICPSR 39738)

Version Date: Mar 23, 2026

Principal Investigator(s):
Toan Ong, University of Colorado Anschutz Medical Campus

https://doi.org/10.3886/ICPSR39738.v1

Version V1

Researchers often have trouble collecting complete information on patient health, as patients may receive care at different places. Linking patient records from different places may help researchers get a more complete picture.

One way to link records is through personal information, such as names and birth dates. But this method increases risks to patient privacy. Another way, known as privacy-preserving record linkage, or PPRL, masks personal information. But current PPRL methods only work when linking entire sets of patient data, including data that have already been shared and linked. Linking entire data sets takes a long time. Also, sharing the same records multiple times increases data privacy risks.

In this study, the research team developed and tested a new PPRL method called incremental PPRL. This method links only new or updated data rather than re-linking entire data sets.

Ong, Toan. Incremental Privacy-Preserving Record Linkage (iPPRL) to Reduce Barriers to Data Sharing and Improve Data Quality [Methods Study], Colorado, 2011-2022. Inter-university Consortium for Political and Social Research [distributor], 2026-03-23. https://doi.org/10.3886/ICPSR39738.v1

Funding: Patient-Centered Outcomes Research Institute (PCORI) (ME-2018C1-11287)

Distributor: Inter-university Consortium for Political and Social Research

2011 -- 2022
2011 -- 2013
(1) To develop and implement a novel iPPRL method; (2) To compare iPPRL with existing linkage methods and validate its accuracy and effectiveness

The research team extended existing PPRL methods to develop a new iPPRL method. The method successively linked incremental data sets to an initial data set; linkage ended when no new data could be added. The team applied the iPPRL method to a simulated data set containing 115,000 records that mimicked real-world data quality issues.
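The incremental idea described above can be illustrated with a minimal sketch. It is not the study's implementation: the keyed-hash masking (`mask`), the exact-match rule, the `IncrementalLinker` class, the shared key, and the sample records are all assumptions made for illustration. Practical PPRL methods typically tolerate typos and missing values, which this sketch does not.

```python
import hashlib
import hmac

# Hypothetical shared secret; in practice the sites would agree on a key out of band.
SECRET_KEY = b"shared-secret-for-illustration-only"


def mask(first, last, dob, key=SECRET_KEY):
    """Return a keyed hash of normalized identifiers (assumed masking step)."""
    normalized = "|".join(part.strip().lower() for part in (first, last, dob))
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


class IncrementalLinker:
    """Keeps prior linkage results so each new batch is linked on its own."""

    def __init__(self):
        self.token_to_entity = {}   # masked token -> entity id
        self.record_to_entity = {}  # record id -> entity id
        self._next_entity = 0

    def link_batch(self, batch):
        """Link only the records in `batch`, an iterable of (record_id, token)."""
        for record_id, token in batch:
            entity = self.token_to_entity.get(token)
            if entity is None:
                # Unseen token: start a new entity instead of re-linking old data.
                entity = self._next_entity
                self._next_entity += 1
                self.token_to_entity[token] = entity
            self.record_to_entity[record_id] = entity
        return self.record_to_entity


# Initial data set, followed by one incremental data set (made-up records).
linker = IncrementalLinker()
linker.link_batch([
    ("siteA-1", mask("Ana", "Diaz", "1990-01-02")),
    ("siteA-2", mask("Bo", "Lee", "1985-07-30")),
])
links = linker.link_batch([
    ("siteB-9", mask(" ana", "DIAZ ", "1990-01-02")),
])
```

In this sketch only the masked tokens of the new batch are processed on each call; records linked earlier are neither re-shared nor re-processed.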

Then, using real patient data, the research team compared the performance of the iPPRL method with two existing methods that require re-linking whole data sets. The team first linked data from five health systems in the Colorado Congenital Heart Disease registry. They manually reviewed the linked records to create a reference data set containing 4,940 linked records. Next, the team linked the same records using the iPPRL method and the two existing methods. They compared the linkage results from the iPPRL and existing methods with the reference data set.
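One common way to compare linkage output with a manually reviewed reference is pairwise precision, recall, and F-measure. The description above does not state which metrics the team used, so the sketch below is an assumption; the function names and the toy data are hypothetical.

```python
from itertools import combinations


def linked_pairs(record_to_entity):
    """Expand a record -> entity assignment into the set of linked record pairs."""
    groups = {}
    for record_id, entity in record_to_entity.items():
        groups.setdefault(entity, []).append(record_id)
    pairs = set()
    for members in groups.values():
        # Sorting makes each pair's orientation independent of input order.
        pairs.update(combinations(sorted(members), 2))
    return pairs


def pair_metrics(predicted, reference):
    """Return pairwise (precision, recall, F1) of predicted links against a reference."""
    pred, ref = linked_pairs(predicted), linked_pairs(reference)
    true_pos = len(pred & ref)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example: the reference links r1-r2 and r3-r4; the method finds only r1-r2.
reference = {"r1": "A", "r2": "A", "r3": "B", "r4": "B"}
predicted = {"r1": 0, "r2": 0, "r3": 1, "r4": 2}
precision, recall, f1 = pair_metrics(predicted, reference)
```

Here every predicted link is correct (precision 1.0), but one of the two reference links is missed (recall 0.5).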

Patients, a patient representative, and researchers provided input throughout the study.

  • A simulated data set with 115,000 records

  • Colorado Congenital Heart Disease registry data from 2011-2013 for 4,940 patients ages 11-64 during

2026-03-23

Notes

  • The public-use data files in this collection are available for access by the general public. Access does not require affiliation with an ICPSR member institution.

  • ICPSR usually offers files in multiple formats for researchers to be able to access data and documentation in formats that work well within their needs. If you have questions about the accessibility of materials distributed by ICPSR or require further assistance, please visit ICPSR’s Accessibility Center.

This study is maintained and distributed by the Patient-Centered Outcomes Data Repository (PCODR). PCODR is the official data repository of the Patient-Centered Outcomes Research Institute (PCORI).