Developing and Testing New Methods for Estimating Treatment Effectiveness in Observational Studies Using High-Dimensional Data [Methods Study], 2023 (ICPSR 39090)

Version Date: Apr 18, 2024

Principal Investigator(s):
Zhiqiang Tan, Rutgers University

https://doi.org/10.3886/ICPSR39090.v1

Version V1


Propensity scores (PS) and instrumental variables (IV) are methods used to assess treatment effects in observational studies when randomized controlled trials (RCTs) are not feasible. However, these methods have limitations, especially with high-dimensional data, that is, data with numerous variables or many non-linear and interaction terms. Choices about which variables and which non-linear and interaction terms to include may lead to model misspecification. The objective of this study was to develop and test a set of PS and IV methods that account for model misspecification when estimating causal effects of treatments using high-dimensional data.

First, the research team created the two new methods for use with high-dimensional data. The team then used a computer program to create test data that look like real patient data. The team applied the new methods to the test data. Next, the research team applied the new methods to real data from previous studies. They applied the PS method to data from Connors et al. (1996) and applied the IV method to data used by Card (1995). Using both test and real data, the research team compared findings from the new methods with those from existing PS and IV methods and checked to see if findings from the new methods were accurate when including different patient traits and health conditions in the analysis.
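The workflow described above (simulate test data, apply a PS method, compare against a known answer) can be illustrated with a minimal sketch. This is not the RCAL package or the study's simulation design: the data-generating process, the function names (`simulate`, `fit_logistic`, `naive_diff`, `ipw_ate`), and the true effect of 2 are all assumptions chosen for illustration, and the estimator shown is ordinary inverse probability weighting with a maximum-likelihood logistic propensity score.

```python
import numpy as np

def expit(v):
    return 1.0 / (1.0 + np.exp(-v))

def simulate(n, seed=0):
    """Hypothetical test data: two confounders, binary treatment, true ATE = 2."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, 2))
    # Treatment is more likely when the covariates are large (confounding).
    t = rng.binomial(1, expit(x[:, 0] + 0.5 * x[:, 1]))
    y = 2.0 * t + x[:, 0] + x[:, 1] + rng.normal(size=n)
    return x, t, y

def fit_logistic(x, t, iters=25):
    """Propensity scores from logistic regression fitted by Newton's method."""
    z = np.column_stack([np.ones(len(t)), x])
    beta = np.zeros(z.shape[1])
    for _ in range(iters):
        p = expit(z @ beta)
        hess = (z * (p * (1 - p))[:, None]).T @ z
        beta += np.linalg.solve(hess, z.T @ (t - p))
    return expit(z @ beta)

def naive_diff(t, y):
    """Unadjusted difference in mean outcomes (ignores confounding)."""
    return y[t == 1].mean() - y[t == 0].mean()

def ipw_ate(t, y, p):
    """Normalized inverse-probability-weighted estimate of the ATE."""
    w1, w0 = t / p, (1 - t) / (1 - p)
    return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)

x, t, y = simulate(20000)
p_hat = fit_logistic(x, t)
print("naive:", naive_diff(t, y))
print("IPW:  ", ipw_ate(t, y, p_hat))
```

Because the simulated effect is known, the gap between each estimate and 2 shows how much confounding bias the weighting removes. Here the propensity score model is correctly specified; the study is concerned with the harder case where it is not.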

This collection contains the R software package RCAL and accompanying documentation. The package source (a .tar.gz file) and six platform-specific versions of the package are available in a zipped package. Files have been released as received by ICPSR from the depositor:

  • For R version 4.2, created April 24, 2022 (Windows, r-oldrel)
  • For R version 4.3, created October 20, 2023 (Windows, r-release)
  • For R version 4.4, created March 14, 2024 (Windows, r-devel)
  • For R version 4.2, created April 1, 2023 (Mac, arm64, r-oldrel)
  • For R version 4.3, created April 6, 2023 (Mac, arm64, r-release)
  • For R version 4.3, created April 11, 2023 (Mac, x86_64, r-release)

Tan, Zhiqiang. Developing and Testing New Methods for Estimating Treatment Effectiveness in Observational Studies Using High-Dimensional Data [Methods Study], 2023. Inter-university Consortium for Political and Social Research [distributor], 2024-04-18. https://doi.org/10.3886/ICPSR39090.v1

Patient-Centered Outcomes Research Institute (ME-1511-32740)
Inter-university Consortium for Political and Social Research

The purpose of this research was to develop and evaluate methods using propensity scores and instrumental variables that account for model misspecification when estimating treatment effects using high-dimensional data. The specific aims were:

  1. Develop new statistical theory and methods for estimating propensity scores and for drawing inference about treatment effects from observational data, with possibly a large number of covariates or regressors.
  2. Develop new statistical theory and methods using instrumental variables for drawing inference about treatment effects from observational data, with possibly a large number of covariates or regressors.
  3. Develop and disseminate user-friendly software, including accessible and transparent documentation, for implementation of the new methods.

The research team first derived numerical algorithms to implement the proposed methods: regularized calibrated estimation for estimating propensity scores in high-dimensional data, and model-assisted inference about average treatment effects (ATE) in propensity score and instrumental variable methods. The team then tested the methods with both simulated data and real observational data from previous studies. The real data for the propensity score methods came from Connors et al. (1996), while the real data for the instrumental variable methods came from the National Longitudinal Survey (NLS) of Young Men as used by Card (1995). Finally, the team compared the accuracy of the new methods to that of existing propensity score and instrumental variable methods. The proposed methods were implemented in the R software package RCAL. Please refer to the study vignettes and final report for technical details.
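To give a rough sense of what calibrated estimation of propensity scores means, the sketch below fits a logistic propensity score by minimizing a calibration loss rather than the usual likelihood. It rests on several assumptions: the loss is taken to be the mean of T·exp(-γ'f(X)) + (1-T)·γ'f(X), the form commonly given for calibrated estimation; the regularization that the study adds for high-dimensional covariates is omitted; and the function names (`simulate_treatment`, `cal_loss`, `fit_calibrated`) and simulated data are hypothetical. It is not the RCAL implementation.

```python
import numpy as np

def simulate_treatment(n, seed=0):
    """Hypothetical covariates and binary treatment for illustration."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, 2))
    t = rng.binomial(1, 1.0 / (1.0 + np.exp(-(x[:, 0] + 0.5 * x[:, 1]))))
    return x, t

def cal_loss(gamma, z, t):
    """Assumed calibration loss: exp(-lin) for treated units, lin for controls."""
    lin = z @ gamma
    return np.mean(np.where(t == 1, np.exp(-lin), lin))

def fit_calibrated(x, t, iters=50, tol=1e-8):
    """Minimize the (unpenalized) calibration loss by damped Newton steps."""
    z = np.column_stack([np.ones(len(t)), x])
    gamma = np.zeros(z.shape[1])
    for _ in range(iters):
        w = np.where(t == 1, np.exp(-(z @ gamma)), 0.0)
        grad = z.T @ ((1 - t) - w) / len(t)
        if np.max(np.abs(grad)) < tol:
            break
        hess = (z * w[:, None]).T @ z / len(t)
        step = np.linalg.solve(hess, grad)
        # Halve the step until the loss does not increase.
        size, f0 = 1.0, cal_loss(gamma, z, t)
        while size > 1e-8 and not cal_loss(gamma - size * step, z, t) <= f0:
            size *= 0.5
        gamma = gamma - size * step
    return gamma, 1.0 / (1.0 + np.exp(-(z @ gamma)))

x, t = simulate_treatment(5000)
gamma_hat, p_hat = fit_calibrated(x, t)
print("weighted treated means:", (t / p_hat) @ x / len(t))
print("full-sample means:     ", x.mean(axis=0))
```

Setting the gradient of this loss to zero is the same as requiring that the inverse-probability-weighted covariate averages among treated units equal the full-sample averages, which is why the two printed lines should agree closely.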

Connors, Jr., Alfred F., et al. "The Effectiveness of Right Heart Catheterization in the Initial Care of Critically Ill Patients." JAMA. 1996;276(11):889-897. [data used for methods evaluation]

National Longitudinal Survey (NLS) of Young Men as used in Card, David (1995): "Using Geographic Variation in College Proximity to Estimate the Return to Schooling," in Aspects of Labour Market Behavior: Essays in Honour of John Vanderkamp, ed. by Louis N. Christofides, E. Kenneth Grant, and Robert Swidinsky. Toronto: University of Toronto Press, 201-222. [data used for methods evaluation]


2024-04-18


Notes

  • The public-use data files in this collection are available for access by the general public. Access does not require affiliation with an ICPSR member institution.


This study is maintained and distributed by the Patient-Centered Outcomes Data Repository (PCODR). PCODR is the official data repository of the Patient-Centered Outcomes Research Institute (PCORI).