Statistical Methods and Designs for Addressing Correlated Errors in Outcomes and Covariates in Studies Using Electronic Health Records Data [Methods Study], Tennessee, 2016-2021 (ICPSR 39726)

Version Date: Mar 12, 2026

Principal Investigator(s):
Bryan E. Shepherd, Vanderbilt University Medical Center

https://doi.org/10.3886/ICPSR39726.v1

Version V1

Electronic health records, or EHRs, have data on patient traits, health problems, and treatments. Researchers can use EHR data to study how treatments work or which patient traits affect health outcomes. But EHR data can have errors.

The best way to get accurate EHR data is to closely review patients' original records. But reviewing every patient's records isn't possible when a study includes many patients. In such cases, researchers can review and correct records for a small group of patients and use those corrected records to adjust the data for all patients. But existing methods for using corrected records don't address some kinds of errors, such as errors that are related to each other (correlated errors). For example, an error in the date a treatment started can also cause an error in the recorded length of treatment.
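As a minimal illustration of how such errors are linked, the sketch below uses three hypothetical records (all dates and error sizes are made up) and assumes the treatment end dates are recorded correctly. Under that assumption, an error of d days in the recorded start date produces an error of exactly -d days in the recorded length of treatment, so the two errors are perfectly (negatively) correlated.

```python
from datetime import date, timedelta

# Hypothetical records: true treatment start dates and end dates.
# Assumption: end dates are recorded correctly in the EHR.
true_starts = [date(2018, 3, 1), date(2019, 1, 15), date(2020, 6, 10)]
ends = [date(2018, 9, 1), date(2019, 7, 1), date(2020, 12, 31)]
start_errors_days = [30, -10, 0]  # hypothetical errors in recorded start dates

duration_errors_days = []
for start, end, err in zip(true_starts, ends, start_errors_days):
    recorded_start = start + timedelta(days=err)
    true_duration = (end - start).days
    recorded_duration = (end - recorded_start).days
    # The duration error is driven entirely by the start-date error
    duration_errors_days.append(recorded_duration - true_duration)

print(duration_errors_days)  # [-30, 10, 0]
```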

In this project, the research team created and tested new methods to improve the accuracy of studies that use EHR data. The team first reviewed and corrected records for some patients. Then the team used those corrections to address related errors in the data for all patients.

To access the methods and software, please visit the MeasurementErrorMethods GitHub repository.

Shepherd, Bryan E. Statistical Methods and Designs for Addressing Correlated Errors in Outcomes and Covariates in Studies Using Electronic Health Records Data [Methods Study], Tennessee, 2016-2021. Inter-university Consortium for Political and Social Research [distributor], 2026-03-12. https://doi.org/10.3886/ICPSR39726.v1

Funding: Patient-Centered Outcomes Research Institute (PCORI) (ME-1609-36207)
Inter-university Consortium for Political and Social Research

Time Period: 2016-2021

  1. Develop novel statistical methods that reduce or eliminate bias caused by correlated errors in time-to-event outcomes and covariates, an important setting for which few methods are available
  2. Design optimal multiwave validation strategies, where one divides the validation sample into multiple waves and decides which records to validate in later sampling waves based on results learned from earlier sampling waves
  3. Apply the methods and designs to a study investigating the association between maternal weight gain during pregnancy and childhood health outcomes using EHR data
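The second aim describes multiwave validation. The sketch below shows one simple way a later wave could use what was learned earlier: Neyman allocation, which assigns validation slots to strata in proportion to N_h × S_h (stratum size times the standard deviation of the quantity being estimated), with S_h estimated from wave-1 validated records. This textbook allocation rule is an assumption chosen for illustration; the project's optimal multiwave designs may target other quantities (for example, the variance of a regression coefficient) and use different criteria. The function names and numbers are hypothetical, and the sketch assumes no stratum is allocated more records than it has left to validate.

```python
import numpy as np


def largest_remainder_round(raw, total):
    """Round nonnegative reals to integers that sum exactly to `total`."""
    floors = np.floor(raw).astype(int)
    leftover = int(round(total - floors.sum()))
    # Give the remaining units to strata with the largest fractional parts
    order = np.argsort(-(raw - floors), kind="stable")
    floors[order[:leftover]] += 1
    return floors


def next_wave_allocation(stratum_sizes, wave1_sds, already_validated, n_next):
    """Split n_next new validations across strata (hypothetical helper).

    Targets a Neyman allocation, n_h proportional to N_h * S_h, for the
    combined sample, using standard deviations S_h estimated from wave 1.
    Assumes no stratum runs out of unvalidated records.
    """
    sizes = np.asarray(stratum_sizes, dtype=float)
    sds = np.asarray(wave1_sds, dtype=float)
    done = np.asarray(already_validated, dtype=float)
    n_total = done.sum() + n_next
    target = n_total * sizes * sds / np.sum(sizes * sds)
    # Shortfall relative to the target; strata already above target get none
    shortfall = np.clip(target - done, 0.0, None)
    raw = n_next * shortfall / shortfall.sum()
    return largest_remainder_round(raw, n_next)


alloc = next_wave_allocation(
    stratum_sizes=[5000, 3000, 2000],
    wave1_sds=[1.0, 2.0, 4.0],
    already_validated=[50, 50, 50],
    n_next=150,
)
print(alloc)  # [29 45 76]
```

With 50 records already validated in each stratum, the largest share of the 150 new slots (76) goes to the smallest but most variable stratum, which is the kind of adaptive decision a multiwave design makes between waves.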

The research team developed methods to address correlated errors in statistical analyses of EHR data. To do this, the team first manually validated data for a subsample of patients. The team then developed four new methods using the validated data to reduce bias caused by correlated errors:

  1. Multiple imputation (MI)
  2. Regression calibration (RC)
  3. Generalized raking (Raking)
  4. Sieve maximum likelihood estimation (SMLE)

The team then ran simulations to compare the robustness and efficiency of the four methods, and created open-source software implementing all four.
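As a rough sketch of the simplest of the four approaches, regression calibration, the code below fits a calibration model E[X | X*] on a validated subsample, replaces the error-prone covariate X* with its predicted value for every record, and then fits the outcome model. It assumes a linear outcome model, classical (independent) error in a single covariate, and an error-free outcome, so it does not handle the correlated outcome and covariate errors or the time-to-event outcomes the project targets. All data are simulated, and the function name `regression_calibration` is hypothetical.

```python
import numpy as np


def regression_calibration(y, x_star, x_true, validated):
    """Linear regression calibration with a validated subsample (sketch).

    x_true only needs valid entries where `validated` is True.
    Returns (intercept, slope) of the outcome model y ~ x.
    """
    y = np.asarray(y, dtype=float)
    x_star = np.asarray(x_star, dtype=float)
    x_true = np.asarray(x_true, dtype=float)
    v = np.asarray(validated, dtype=bool)

    # Step 1: calibration model E[X | X*], fit only on validated records
    design_v = np.column_stack([np.ones(v.sum()), x_star[v]])
    gamma, *_ = np.linalg.lstsq(design_v, x_true[v], rcond=None)

    # Step 2: predicted true covariate for every record
    x_hat = gamma[0] + gamma[1] * x_star

    # Step 3: outcome model using the calibrated covariate
    design = np.column_stack([np.ones_like(x_hat), x_hat])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


rng = np.random.default_rng(2021)
N, n_val = 10_000, 1_000
x = rng.normal(size=N)                      # true covariate
x_star = x + rng.normal(size=N)             # error-prone EHR version
y = 1.0 + 2.0 * x + rng.normal(size=N)      # outcome, assumed error-free

validated = np.zeros(N, dtype=bool)
validated[rng.choice(N, size=n_val, replace=False)] = True
x_obs = np.where(validated, x, np.nan)      # true x known only after review

naive_slope = np.polyfit(x_star, y, 1)[0]
rc_slope = regression_calibration(y, x_star, x_obs, validated)[1]
print(f"naive slope: {naive_slope:.2f}, calibrated slope: {rc_slope:.2f}")
```

Because the error variance here equals the variance of X, the naive slope is attenuated toward about 1 (the true slope is 2), while the calibrated slope lands near 2. Standard errors from the second-stage fit ignore uncertainty in the calibration step, so a bootstrap or sandwich variance estimator would be needed in practice.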

Based on the simulation results, the research team further tested the Raking method on real patient data. Using EHR data from 10,335 mother-child pairs, 996 of which were validated by chart review, the team examined whether weight gain during pregnancy predicted a child's risk of obesity.
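Raking can be sketched as calibration weighting: each validated record starts with its sampling weight (here N/n under simple random sampling), and the weights are adjusted multiplicatively, w_i = d_i exp(a_i'λ), until weighted totals of auxiliary variables in the validated subsample match their totals in the full cohort. The outcome model is then fit on the validated values with those weights. In the measurement-error literature the auxiliary variables are often influence functions from a naive analysis of the error-prone data; to keep this sketch short, it calibrates on an intercept and the error-prone covariate X* instead. The data are simulated, the function name `rake_weights` is hypothetical, and this is not a substitute for the project's released software.

```python
import numpy as np


def rake_weights(design_w, aux_sample, aux_totals, tol=1e-10, max_iter=100):
    """Raking calibration: w_i = d_i * exp(a_i @ lam) with sum_i w_i a_i = totals."""
    d = np.asarray(design_w, dtype=float)
    A = np.asarray(aux_sample, dtype=float)
    totals = np.asarray(aux_totals, dtype=float)
    lam = np.zeros(A.shape[1])
    for _ in range(max_iter):
        w = d * np.exp(A @ lam)
        gap = A.T @ w - totals                 # calibration equations
        if np.max(np.abs(gap)) <= tol * np.max(np.abs(totals)):
            return w
        jac = A.T @ (w[:, None] * A)           # Jacobian of A.T @ w w.r.t. lam
        lam -= np.linalg.solve(jac, gap)       # Newton step
    raise RuntimeError("raking did not converge")


rng = np.random.default_rng(7)
N, n = 10_000, 1_000
x = rng.normal(size=N)                         # true covariate
x_star = x + rng.normal(size=N)                # error-prone EHR version
y = 1.0 + 2.0 * x + rng.normal(size=N)         # outcome, assumed error-free
val = rng.choice(N, size=n, replace=False)     # simple random validation sample

aux_all = np.column_stack([np.ones(N), x_star])
totals = aux_all.sum(axis=0)                   # known for the whole cohort
w = rake_weights(np.full(n, N / n), aux_all[val], totals)

# Weighted least squares of y on the validated (true) x
sw = np.sqrt(w)
X_val = np.column_stack([np.ones(n), x[val]])
beta, *_ = np.linalg.lstsq(X_val * sw[:, None], y[val] * sw, rcond=None)
print(f"raking-weighted slope: {beta[1]:.2f}")
```

Because the calibrated weights reproduce the full-cohort total of X*, which is correlated with the true X, the weighted estimate typically has lower variance than an unweighted analysis of the validated records alone, while relying only on validated values for the outcome model.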

Patients, caregivers, and doctors gave input throughout the study.

Data Sources: Simulated data; EHR data from Vanderbilt University Medical Center on 10,335 mother-child pairs

Release Date: 2026-03-12

Notes

  • The public-use data files in this collection are available for access by the general public. Access does not require affiliation with an ICPSR member institution.

  • ICPSR usually offers files in multiple formats so that researchers can access data and documentation in formats that suit their needs. If you have questions about the accessibility of materials distributed by ICPSR or require further assistance, please visit ICPSR’s Accessibility Center.