Improving Causal Inference Methods via Statistical Learning with High-Dimensional Data [Methods Study], 2016-2021 (ICPSR 39713)

Version Date: Mar 12, 2026

Principal Investigator(s):
Zhiqiang Tan, Rutgers University

https://doi.org/10.3886/ICPSR39713.v1

Version V1

A randomized controlled trial, or RCT, is often the best way to learn if one treatment works better than another. RCTs assign patients to different treatments by chance. But RCTs are not always feasible. In such cases, researchers can use observational studies. In observational studies, researchers look at what happens when patients and their doctors choose the treatments. Traits such as age, gender, or health status may affect treatment choices. These traits may also affect patients' health, making it hard to know if changes in patients' health are due to treatment or to patient traits.

To figure out whether changes in patients' health result from treatment or something else, researchers use statistical methods. Two of these methods are:

  • Propensity score, or PS. PS methods compare the health of patients who have similar measured traits but received different treatments. These traits are in patient health records.
  • Instrumental variable, or IV. IV methods account for things that may affect treatment choice and patients' health but aren't in the patients' health records, such as personal preference about treatment.
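The PS idea can be illustrated with a small simulation. The sketch below is not the study's new method. It uses a standard technique, inverse probability weighting, on made-up data with one measured trait. All names and numbers (`simulate`, `sick`, a true treatment effect of 1.0) are assumptions chosen for illustration.

```python
import random

def simulate(n=20000, seed=0):
    """Made-up patients: one measured trait affects both treatment and outcome."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        sick = rng.random() < 0.5            # measured trait (a confounder)
        p_treat = 0.8 if sick else 0.2       # sicker patients are treated more often
        treated = rng.random() < p_treat
        # Assumed outcome model: treatment helps by 1.0, sickness hurts by 3.0
        y = 1.0 * treated - 3.0 * sick + rng.gauss(0, 1)
        rows.append((sick, treated, y))
    return rows

def mean(xs):
    return sum(xs) / len(xs)

def naive_difference(rows):
    """Compare treated and untreated patients, ignoring the trait."""
    treated_y = [y for _, t, y in rows if t]
    untreated_y = [y for _, t, y in rows if not t]
    return mean(treated_y) - mean(untreated_y)

def ipw_difference(rows):
    """Weight each patient by the inverse of the estimated propensity score."""
    # Propensity score: estimated chance of treatment at each trait level
    ps = {}
    for level in (False, True):
        group = [t for s, t, _ in rows if s == level]
        ps[level] = sum(group) / len(group)
    # Normalized weighted means for treated and untreated patients
    num1 = sum(y / ps[s] for s, t, y in rows if t)
    den1 = sum(1 / ps[s] for s, t, _ in rows if t)
    num0 = sum(y / (1 - ps[s]) for s, t, y in rows if not t)
    den0 = sum(1 / (1 - ps[s]) for s, t, _ in rows if not t)
    return num1 / den1 - num0 / den0

rows = simulate()
print("naive:", round(naive_difference(rows), 2))   # biased: treatment looks harmful
print("IPW:  ", round(ipw_difference(rows), 2))     # close to the assumed effect of 1.0
```

In this toy setup the naive comparison is negative because treated patients are sicker on average, while the weighted comparison recovers an estimate near the assumed effect. With only one trait, the propensity score is easy to estimate. With many traits, this estimation step becomes the hard part.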

But existing PS and IV methods don't work well when data sets include a lot of traits and health conditions for each patient. Such data sets are called high-dimensional data. In this study, the research team created and tested one PS method and one IV method for use with high-dimensional data.

Tan, Zhiqiang. Improving Causal Inference Methods via Statistical Learning with High-Dimensional Data [Methods Study], 2016-2021. Inter-university Consortium for Political and Social Research [distributor], 2026-03-12. https://doi.org/10.3886/ICPSR39713.v1

Patient-Centered Outcomes Research Institute (PCORI) (ME-1511-32740)
Inter-university Consortium for Political and Social Research

2016 -- 2021

To develop and test a new set of PS and IV methods that account for model misspecification when estimating causal effects of treatments using high-dimensional data

The research team developed a PS method and an IV method for use with high-dimensional data that account for model misspecification. The PS method estimates treatment effects in the absence of unmeasured confounders. The IV method estimates treatment effects when the data do not include all confounders.
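To show why an IV approach helps when a confounder is missing from the data, here is a minimal sketch on made-up data. It is not the study's IV method. It uses the textbook single-instrument estimator (the ratio of two covariances), and every name and number in it (`simulate_iv`, a true effect of 2.0) is an assumption for illustration.

```python
import random

def simulate_iv(n=50000, seed=1):
    """Made-up data with an unmeasured confounder u and an instrument z."""
    rng = random.Random(seed)
    z, d, y = [], [], []
    for _ in range(n):
        u = rng.gauss(0, 1)                       # unmeasured confounder, not in the data
        zi = 1.0 if rng.random() < 0.5 else 0.0   # instrument: shifts treatment only
        di = 1.0 * zi + u + rng.gauss(0, 1)       # treatment depends on z and u
        yi = 2.0 * di + 3.0 * u + rng.gauss(0, 1) # assumed true effect of d on y is 2.0
        z.append(zi)
        d.append(di)
        y.append(yi)
    return z, d, y

def cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (w - mb) for x, w in zip(a, b)) / len(a)

def ols_slope(d, y):
    """Ordinary regression slope of y on d; biased here because u is omitted."""
    return cov(d, y) / cov(d, d)

def iv_estimate(z, d, y):
    """Single-instrument IV estimate: cov(z, y) / cov(z, d)."""
    return cov(z, y) / cov(z, d)

z, d, y = simulate_iv()
print("OLS:", round(ols_slope(d, y), 2))      # overstates the assumed effect
print("IV: ", round(iv_estimate(z, d, y), 2)) # close to the assumed effect of 2.0
```

The IV estimate works in this sketch only because the instrument is built to affect the outcome solely through the treatment and to be unrelated to the unmeasured confounder. In real data, those conditions have to be argued for rather than assumed.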

The research team compared the new and existing methods using simulation and empirical analyses with varying degrees of model misspecification. To empirically test the new PS method, the team used data from a medical study about the effects of right heart catheterization. The team tested the IV method with survey data to estimate the causal effect of education on earnings.

Data from: Connors AF, Speroff T, Dawson NV, et al. The effectiveness of right heart catheterization in the initial care of critically ill patients. DOI: 10.1001/jama.276.11.889

2026-03-12

Notes

  • The public-use data files in this collection are available for access by the general public. Access does not require affiliation with an ICPSR member institution.

  • ICPSR usually offers files in multiple formats so that researchers can access data and documentation in the formats that best suit their needs. If you have questions about the accessibility of materials distributed by ICPSR or require further assistance, please visit ICPSR’s Accessibility Center.