Improving Clinical Effectiveness Research (CER)/Patient-Centered Outcomes Research (PCOR) Methods for Analyzing Linked Data Sources in the Absence of Unique Identifiers [Methods Study], United States, 2011-2022 (ICPSR 39731)

Version Date: Mar 16, 2026

Principal Investigator(s):
Roee Gutman, Brown University

https://doi.org/10.3886/ICPSR39731.v1

Version V1


Researchers often combine data from different sources, such as insurance claims and health records, to get a better picture of patients' health and use of health care. Researchers use unique identifiers, like Social Security numbers, to connect patient records and make them more complete. But sometimes this approach doesn't work well, especially when records don't have much personal information. Having limited personal data can lead to errors when linking records.

In this study, the research team created new methods to link data sets that contain limited personal information. The team then compared the new methods with existing ones and applied the new methods to real patient data.

Gutman, Roee. Improving Clinical Effectiveness Research (CER)/Patient-Centered Outcomes Research (PCOR) Methods for Analyzing Linked Data Sources in the Absence of Unique Identifiers [Methods Study], United States, 2011-2022. Inter-university Consortium for Political and Social Research [distributor], 2026-03-16. https://doi.org/10.3886/ICPSR39731.v1

Patient-Centered Outcomes Research Institute (PCORI) (ME-2017C3-9241)
Inter-university Consortium for Political and Social Research

2011 -- 2022
2011-01-01 -- 2015-09-30

To develop and test new methods to link two data sets and account for limited patient identifiers

To improve linkage accuracy when few identifiers are available, the research team developed two new Bayesian record linkage algorithms:

  • Bayesian Record Linkage with Variable in One File (BRLVOF), an algorithm to link two data sets that considers non-linking variables in one data set and uses relationships between non-linking variables from each data set
  • Multilayer Bayesian Record Linkage (MLBRL), an algorithm that links data sources by simultaneously accounting for patient identifiers and grouping entities in the data set, such as the provider for a group of patients
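
For context on what record linkage with few identifiers involves, the sketch below scores candidate record pairs with a classical Fellegi-Sunter-style match weight. This is a simplified stand-in for the kind of existing probabilistic method such studies compare against, not an implementation of BRLVOF or MLBRL. The field names, the example records, and the m- and u-probabilities are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical agreement probabilities per linking field (illustrative only):
#   m = P(field agrees | records belong to the same person)
#   u = P(field agrees | records belong to different people)
FIELD_PARAMS = {
    "gender": (0.98, 0.50),
    "zip": (0.90, 0.01),
    "dob": (0.95, 0.001),
}


def match_weight(rec_a, rec_b, params=FIELD_PARAMS):
    """Sum the log2 likelihood ratios of agreement or disagreement per field."""
    total = 0.0
    for field, (m, u) in params.items():
        if rec_a[field] == rec_b[field]:
            total += math.log2(m / u)              # agreement raises the weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


# Made-up example records; only gender agrees between a and c.
a = {"gender": "F", "zip": "02912", "dob": "1940-05-17"}
b = {"gender": "F", "zip": "02912", "dob": "1940-05-17"}
c = {"gender": "F", "zip": "02906", "dob": "1951-11-02"}

print(match_weight(a, b) > match_weight(a, c))
```

With only three weakly discriminating fields, many non-matching pairs can receive similar weights, which is the difficulty that motivates drawing on additional information such as non-linking variables or grouping entities.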

The research team conducted simulation analyses to compare the new methods with existing record linkage methods under different scenarios, such as varying error levels for linking variables and model misspecification.
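
A minimal sketch of one such scenario, assuming a deliberately simple setup: two synthetic files share the same people, ZIP codes in the second file are corrupted at a chosen error rate, and a naive exact-match linkage is scored by the share of true pairs it recovers. The function name, the record layout, and the error mechanism are assumptions made for illustration; the study's actual simulation design is not described here.

```python
import random


def simulate_recall(n_records, error_rate, seed=0):
    """Share of true pairs recovered by exact matching on (gender, ZIP, DOB)."""
    rng = random.Random(seed)

    # File A: synthetic records; the integer is a stand-in for date of birth.
    file_a = [
        (rng.choice("MF"), f"{rng.randrange(100000):05d}", rng.randrange(20000))
        for _ in range(n_records)
    ]

    # File B: the same people, with each ZIP code corrupted with
    # probability error_rate (the corrupted value always differs).
    file_b = []
    for gender, zip_code, dob in file_a:
        if rng.random() < error_rate:
            shifted = (int(zip_code) + rng.randrange(1, 100000)) % 100000
            zip_code = f"{shifted:05d}"
        file_b.append((gender, zip_code, dob))

    # Index file B by the full key, then count records in file A whose key
    # matches exactly one record in file B, and that record is the true match.
    index = {}
    for j, key in enumerate(file_b):
        index.setdefault(key, []).append(j)
    correct = sum(1 for i, key in enumerate(file_a) if index.get(key) == [i])
    return correct / n_records


for rate in (0.0, 0.1, 0.3):
    print(rate, simulate_recall(1000, rate))
```

Repeating the run across error rates shows how quickly exact matching degrades as linking variables become less reliable, which is the kind of comparison a simulation study makes across methods.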

The research team then applied the methods to link the National Trauma Data Bank (NTDB) data set to Medicare claims data for patients who went to inpatient care facilities after a traumatic brain injury. The team examined the linked data set to identify factors associated with recovery outcomes. Clinicians provided input during the study.

Simulated data sets with gender, ZIP code, and date of birth; NTDB and Medicare claims data; Medicare beneficiaries ages 66 and older who were hospitalized following a traumatic brain injury and admitted to an inpatient rehabilitation facility between January 1, 2011, and September 30, 2015


2026-03-16


Notes

  • The public-use data files in this collection are available for access by the general public. Access does not require affiliation with an ICPSR member institution.

  • ICPSR usually offers files in multiple formats so that researchers can access data and documentation in the formats that best suit their needs. If you have questions about the accessibility of materials distributed by ICPSR or require further assistance, please visit ICPSR’s Accessibility Center.