Statistical Methods for Phenotype Estimation and Analysis Using Electronic Health Records [Methods Study], 2016-2021 (ICPSR 39724)
Researchers can use data from electronic health records, or EHRs, in studies that compare two or more treatments. In these studies, researchers need to identify all patients with the same phenotype. Phenotypes are a person's known traits, like height and weight, or known health problems, like diabetes. However, in EHR data, some data on patient traits or health problems may be missing for some patients.
Missing data in EHRs make it hard to correctly identify all patients with the same phenotype. It's even harder when data are missing because of a patient's health status. For example, patients with uncontrolled diabetes may need more lab tests than patients with controlled diabetes, so patients with controlled diabetes may have fewer lab results in their records. As a result, researchers who rely on lab tests may fail to identify patients with controlled diabetes as having diabetes.
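The diabetes example can be illustrated with a small simulation. The sketch below is not the research team's method. It is a hypothetical illustration in which every simulated patient has diabetes, and a simple rule identifies a patient only if at least one lab test is on record. The function name, the testing probabilities (0.9 and 0.4), and the identification rule are all assumptions chosen for illustration.

```python
import random


def simulate_identification(n_patients=10000, seed=0):
    """Return the share of diabetes patients identified, by disease status.

    Hypothetical setup: every simulated patient has diabetes, but a patient
    is only identified if at least one lab test appears in the record.
    """
    rng = random.Random(seed)
    # status -> [number identified, number of patients]
    counts = {"controlled": [0, 0], "uncontrolled": [0, 0]}

    for _ in range(n_patients):
        status = "uncontrolled" if rng.random() < 0.5 else "controlled"

        # Assumed probabilities that a lab test is recorded. Testing depends
        # on health status, so missingness is related to the phenotype.
        p_tested = 0.9 if status == "uncontrolled" else 0.4
        has_lab_test = rng.random() < p_tested

        counts[status][1] += 1
        if has_lab_test:
            counts[status][0] += 1

    return {
        status: identified / total
        for status, (identified, total) in counts.items()
    }


if __name__ == "__main__":
    for status, rate in simulate_identification().items():
        print(f"{status}: {rate:.1%} of patients identified")
```

Under these assumed probabilities, the rule misses a much larger share of patients with controlled diabetes, even though everyone in the simulation has the condition.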
In this project, the research team developed and tested a new statistical method that accounts for missing EHR data to estimate patient phenotypes.
To access the methods and software, please visit the bias_correction GitHub repository.