Improving Clinical Effectiveness Research (CER)/Patient-Centered Outcomes Research (PCOR) Methods for Analyzing Linked Data Sources in the Absence of Unique Identifiers [Methods Study], United States, 2011-2022 (ICPSR 39731)
Version Date: Mar 16, 2026
Principal Investigator(s):
Roee Gutman, Brown University
https://doi.org/10.3886/ICPSR39731.v1
Version V1
Summary
Researchers often combine data from different sources, such as insurance claims and health records, to get a better picture of patients' health and use of health care. Researchers use unique identifiers, like Social Security numbers, to connect patient records and make them more complete. But sometimes this approach doesn't work well, especially when records don't have much personal information. Having limited personal data can lead to errors when linking records.
In this study, the research team created new methods to link data sets with limited personal information. Then they compared the new methods with existing ones. They also applied the new methods with real patient data.
Citation
Funding
Subject Terms
Distributor(s)
Time Period(s)
Date of Collection
Study Purpose
To develop and test new methods to link two data sets and account for limited patient identifiers
Study Design
To improve linkage accuracy when few identifiers are available, the research team developed two new Bayesian record linkage algorithms:
- Bayesian Record Linkage with Variable in One File (BRLVOF), an algorithm to link two data sets that considers non-linking variables in one data set and uses relationships between non-linking variables from each data set
- Multilayer Bayesian Record Linkage (MLBRL), an algorithm that links data sources by simultaneously accounting for patient identifiers and grouping entities in the data set, such as the provider for a group of patients
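As background for the two algorithms above, the following is a minimal Python sketch of the generic field-comparison scoring that probabilistic record linkage commonly builds on (Fellegi-Sunter style). It is not BRLVOF or MLBRL: it ignores non-linking variables, grouping entities, and one-to-one matching constraints. The field names, the m- and u-probabilities, and the prior match rate are all illustrative assumptions, not values from the study.

```python
import math

# Hypothetical agreement probabilities (assumed, not estimated from any data):
# m = P(field agrees | the two records are a true match)
# u = P(field agrees | the two records are not a match)
M_PROBS = {"gender": 0.98, "zip": 0.90, "dob": 0.95}
U_PROBS = {"gender": 0.50, "zip": 0.01, "dob": 0.001}


def comparison_vector(rec_a, rec_b, fields=("gender", "zip", "dob")):
    """Return a 1/0 agreement indicator for each linking field."""
    return {f: int(rec_a[f] == rec_b[f]) for f in fields}


def match_weight(gamma):
    """Sum the per-field log-likelihood ratios, assuming the fields
    are conditionally independent given match status."""
    total = 0.0
    for field, agree in gamma.items():
        m, u = M_PROBS[field], U_PROBS[field]
        total += math.log(m / u) if agree else math.log((1 - m) / (1 - u))
    return total


def match_probability(gamma, prior=0.01):
    """Posterior probability that a record pair is a match, given an
    assumed prior match rate for a randomly chosen pair."""
    log_odds = math.log(prior / (1 - prior)) + match_weight(gamma)
    return 1 / (1 + math.exp(-log_odds))


# Two made-up records that agree on every field, and one that agrees on none.
rec_a = {"gender": "F", "zip": "02912", "dob": "1940-05-17"}
rec_b = {"gender": "F", "zip": "02912", "dob": "1940-05-17"}
rec_c = {"gender": "M", "zip": "48104", "dob": "1938-11-02"}

p_same = match_probability(comparison_vector(rec_a, rec_b))
p_diff = match_probability(comparison_vector(rec_a, rec_c))
```

With only three coarse identifiers, many non-matching pairs can agree on one or two fields by chance, which is the limited-information setting that the study's methods are meant to address.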
The research team conducted simulation analyses to compare the new methods with existing record linkage methods under different scenarios, such as varying error levels for linking variables and model misspecification.
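A simulation of this general kind can be sketched in a few lines of Python. The sketch below is an illustration only and does not reproduce the study's simulation design: it generates a synthetic file, makes a noisy copy with a chosen per-field error rate, links the two by exact agreement on all fields, and reports the share of true pairs recovered. The value ranges, the error mechanism, and the function names are assumptions, and exact matching stands in for a simple baseline method.

```python
import random
from collections import Counter

FIELDS = ("gender", "zip", "dob")


def random_record(rng):
    """Draw one synthetic record; the value ranges are arbitrary assumptions."""
    return {
        "gender": rng.choice(["F", "M"]),
        "zip": f"{rng.randrange(1000):05d}",
        "dob": (rng.randrange(1920, 1950), rng.randrange(1, 13), rng.randrange(1, 29)),
    }


def simulate_files(n, error_rate, seed=0):
    """Create file A and a noisy copy, file B. Record i in A truly matches
    record i in B. Each field in B is, with probability error_rate, replaced
    by a fresh random draw (which may coincide with the original value)."""
    rng = random.Random(seed)
    file_a = [random_record(rng) for _ in range(n)]
    file_b = []
    for rec in file_a:
        noisy = dict(rec)
        for field in FIELDS:
            if rng.random() < error_rate:
                noisy[field] = random_record(rng)[field]
        file_b.append(noisy)
    return file_a, file_b


def key(rec):
    """Combine the linking fields into a single hashable key."""
    return tuple(rec[f] for f in FIELDS)


def exact_link(file_a, file_b):
    """Link records whose full key is identical and unique within both files."""
    count_a = Counter(key(r) for r in file_a)
    count_b = Counter(key(r) for r in file_b)
    index_b = {key(r): j for j, r in enumerate(file_b)}
    links = []
    for i, rec in enumerate(file_a):
        k = key(rec)
        if count_a[k] == 1 and count_b[k] == 1:
            links.append((i, index_b[k]))
    return links


def recall(links, n):
    """Share of the n true pairs (i, i) that the linkage recovered."""
    return sum(1 for i, j in links if i == j) / n


if __name__ == "__main__":
    for rate in (0.0, 0.1, 0.3):
        a, b = simulate_files(500, rate)
        print(rate, recall(exact_link(a, b), 500))
```

Running the loop at several error rates shows how quickly exact matching loses true pairs as the linking variables become noisier, which is the kind of scenario in which probabilistic methods are typically compared.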
The research team then applied the methods to link the National Trauma Data Bank (NTDB) data set to Medicare claims data for patients who went to inpatient care facilities after a traumatic brain injury. The team examined the linked data set to identify factors associated with recovery outcomes. Clinicians provided input during the study.
Data Source
Simulated data sets with gender, ZIP code, and date of birth
NTDB and Medicare claims data; Medicare beneficiaries ages 66 and older who were hospitalized following a traumatic brain injury and admitted to an inpatient rehabilitation facility between January 1, 2011, and September 30, 2015
Notes
The public-use data files in this collection are available for access by the general public. Access does not require affiliation with an ICPSR member institution.
ICPSR usually offers files in multiple formats so that researchers can access data and documentation in the formats that best suit their needs. If you have questions about the accessibility of materials distributed by ICPSR or require further assistance, please visit ICPSR’s Accessibility Center.
