Handling of Missing Data Induced by Time-Varying Covariates in Comparative Effectiveness Research HIV Patients [Methods Study], 2013-2018 (ICPSR 39528)
Version Date: Oct 9, 2025
Principal Investigator(s):
Manisha Desai, Stanford University
https://doi.org/10.3886/ICPSR39528.v1
Version V1
Summary
Researchers can use data from health registries or electronic health records to compare two or more treatments. Registries store data about patients with a specific health problem. These data include how well those patients respond to treatments and information about patient traits, such as age, weight, or blood pressure. But sometimes data about patient traits are missing.
Missing data about patient traits can lead to incorrect study results, especially when traits change over time. For example, weight can change over time, and the patient may not report their weight at some points along the way. Researchers use statistical methods to fill in these missing data.
In this study, the research team compared a new statistical method for filling in missing data against traditional methods. Traditional methods either remove patients with missing data or fill in each missing number with a single estimate. The new method creates multiple possible estimates to fill in each missing number.
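The three ways of handling a missing value described above can be sketched in a few lines of Python. This is a minimal illustration only, not the research team's software: the function names, the simulated weights, and the missingness rate are all hypothetical, and the random draws ignore uncertainty in the estimated mean and spread, so this is a simplification of proper multiple imputation. The pooling step uses Rubin's rules, the standard way to combine results across completed data sets.

```python
import random
import statistics

def complete_case(values):
    """Drop missing entries (None) and analyze what is left."""
    return [v for v in values if v is not None]

def single_imputation(values):
    """Replace every missing entry with one estimate: the observed mean."""
    fill = statistics.mean(complete_case(values))
    return [fill if v is None else v for v in values]

def multiple_imputation(values, m, rng):
    """Create m completed copies; each missing entry gets a fresh random draw."""
    obs = complete_case(values)
    mu, sd = statistics.mean(obs), statistics.stdev(obs)
    return [[rng.gauss(mu, sd) if v is None else v for v in values]
            for _ in range(m)]

def pool_means(datasets):
    """Rubin's rules for a mean: pooled estimate and total variance."""
    m = len(datasets)
    estimates = [statistics.mean(d) for d in datasets]
    # Average sampling variance of the mean within each completed data set.
    within = statistics.mean([statistics.variance(d) / len(d) for d in datasets])
    # Variance of the estimates across completed data sets.
    between = statistics.variance(estimates)
    return statistics.mean(estimates), within + (1 + 1 / m) * between

# Hypothetical example: 100 simulated weights (kg), about 25% set to missing.
rng = random.Random(1)
weights = [rng.gauss(70, 10) for _ in range(100)]
weights = [None if rng.random() < 0.25 else w for w in weights]

cc = complete_case(weights)
si = single_imputation(weights)
mi = multiple_imputation(weights, m=5, rng=rng)
estimate, total_var = pool_means(mi)
```

Filling every gap with the same number makes the data look less variable than they are, which is one reason a single estimate can understate uncertainty. The multiple-imputation copies differ from one another, and the between-copy term in `pool_means` carries that extra uncertainty into the final variance.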
To access the methods, software, and R package, please visit the SimulateCER GitHub repository and the SimTimeVar CRAN page.
Study Purpose
To evaluate statistical approaches for handling missing data in longitudinal studies of comparative effectiveness research (CER) through the following goals: (1) creating a tool to simulate studies observed in CER for method evaluation, (2) determining how to perform multiple imputation (MI) of derived predictors such as interactions, (3) determining how to perform MI of outcomes derived from repeated measures, and (4) assessing the performance of MI and commonly applied approaches when describing relationships between time-varying covariates, with and without missing values, and a right-censored outcome.
Study Design
The research team conducted 3 simulation studies to address the key questions, assessing performance with metrics including mean squared error, bias, and standard errors. To investigate imputation of interactions, the team evaluated active and passive multiple imputation (MI) strategies: the active strategy imputes the interaction term as if it were any other variable, whereas the passive strategy imputes only the main effects and then derives the interaction term from them (ie, by simply taking the product of the main effects). They evaluated these strategies under the joint modeling (JM) approach, in which a joint parametric distribution is assumed for the imputation model, and the fully conditional specification (FCS) approach, in which specification of a joint model is bypassed and conditional models for each variable are assumed instead. The team assessed similar approaches for imputing an outcome, rate of change, when a 2-stage linear model was employed. Finally, the team investigated commonly applied methods for handling missing time-varying covariates, including complete case (CC) analysis, single imputation (SI), and MI that ignores the clustered data structure (MI Naïve). For the latter, they established a comprehensive R package to simulate data including time-varying covariates with a complex correlation structure to represent realistic CER studies.
Data Source
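The difference between the active and passive strategies for an interaction term can be made concrete with a short Python sketch. It is illustrative only and rests on several assumptions that are not from the study: the variable names `x1` and `x2`, the simulated data, the 30% missingness rate, and a simple stochastic regression on `x2` alone as the imputation model. A real imputation model would also condition on the outcome and would be repeated to produce multiple completed data sets; a single draw is shown here to keep the contrast visible.

```python
import random

def fit_line(xs, ys):
    """Least-squares intercept, slope, and residual SD for y ~ x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in resid) / (n - 2)) ** 0.5
    return intercept, slope, sd

def draw(model, x, rng):
    """One stochastic-regression draw: fitted value plus normal noise."""
    intercept, slope, sd = model
    return intercept + slope * x + rng.gauss(0.0, sd)

def impute_interaction(rows, strategy, rng):
    """rows: dicts with x1 (None when missing) and x2 (always observed)."""
    observed = [r for r in rows if r["x1"] is not None]
    x2_obs = [r["x2"] for r in observed]
    x1_model = fit_line(x2_obs, [r["x1"] for r in observed])
    # Used by the active strategy: the product is modeled as its own variable.
    prod_model = fit_line(x2_obs, [r["x1"] * r["x2"] for r in observed])
    completed = []
    for r in rows:
        missing = r["x1"] is None
        if not missing:
            x1, x1x2 = r["x1"], r["x1"] * r["x2"]
        else:
            x1 = draw(x1_model, r["x2"], rng)
            if strategy == "passive":
                x1x2 = x1 * r["x2"]                 # derived after imputing x1
            else:
                x1x2 = draw(prod_model, r["x2"], rng)  # imputed directly
        completed.append({"x1": x1, "x2": r["x2"], "x1x2": x1x2,
                          "imputed": missing})
    return completed

# Hypothetical data: x1 depends on x2, and about 30% of x1 is missing.
rng = random.Random(0)
rows = []
for _ in range(200):
    x2 = rng.gauss(0, 1)
    x1 = 0.5 * x2 + rng.gauss(0, 1)
    rows.append({"x1": None if rng.random() < 0.3 else x1, "x2": x2})

passive = impute_interaction(rows, "passive", rng)
active = impute_interaction(rows, "active", rng)
```

Under the passive strategy the completed interaction always equals the product of the completed main effects. Under the active strategy the imputed interaction is drawn from its own model, so in imputed rows it generally does not equal that product.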
Simulated data that resemble the complexity of an empirical study to identify antiretroviral therapies associated with increased risk of cardiovascular disease among patients with HIV
Notes
The public-use data files in this collection are available for access by the general public. Access does not require affiliation with an ICPSR member institution.
