Natural Language Processing (NLP) for Medication Adherence: Complex Semantics and Negation [Methods Study], United States, 2015-2022 (ICPSR 39736)

Version Date: Mar 23, 2026

Principal Investigator(s):
Kirk Roberts, University of Texas Health Science Center at Houston

https://doi.org/10.3886/ICPSR39736.v1

Version V1

Clinical notes in electronic health records, or EHRs, can help researchers study treatments. For example, EHR notes may contain information about whether patients take their medicines as directed. But it takes researchers a lot of time to find this information.

Natural language processing, or NLP, methods can help researchers find information in EHR notes. With NLP, computer programs read and identify written language to make it easier to sort and study. But current NLP methods don't work well to find and label text about medicine use.

In this study, the research team created and tested a new NLP method to find and label EHR notes on patients' medicine use.

Roberts, Kirk. Natural Language Processing (NLP) for Medication Adherence: Complex Semantics and Negation [Methods Study], United States, 2015-2022. Inter-university Consortium for Political and Social Research [distributor], 2026-03-23. https://doi.org/10.3886/ICPSR39736.v1

Patient-Centered Outcomes Research Institute (PCORI) (ME-2018C1-10963)
Inter-university Consortium for Political and Social Research

2015 -- 2022

To develop and evaluate a new NLP method for studying medication adherence using EHR data

First, the research team developed a new NLP method for identifying, extracting, and categorizing text on medication adherence from clinical notes. Using 2,250 manually transcribed sentences from EHR notes, the team trained the NLP method to identify and extract text about medication adherence; categorize adherence as full, partial, or nonadherence; and identify reasons for nonadherence. Then the team tested the NLP method using 600 different transcribed sentences.

Next, the research team evaluated the new NLP method in three experiments. In the first experiment, the team compared the NLP method's output with the standard method of measuring medication adherence using prescription claims data. Second, the team used the method to analyze notes from psychiatric evaluations to study the correlation between nonadherence and psychiatric readmissions. Third, the team validated the method with an external EHR data source and tested its performance in the Medical Information Mart for Intensive Care (MIMIC-III) data set. The MIMIC-III data included 1,018 sentences extracted from EHR notes for 923 patients. The team calculated the micro-F1 score to measure the accuracy of the NLP method. Patients, clinicians, a caregiver, and a patient advocate helped design the study.
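For readers unfamiliar with the metric, the following is a minimal sketch of how a micro-averaged F1 score can be computed for sentence-level adherence labels. It is not the study's evaluation code. The label names ("full", "partial", "non", "none") and the function name micro_f1 are hypothetical, chosen only for illustration. The sketch pools true positives, false positives, and false negatives across the scored classes before computing a single F1 value.

```python
def micro_f1(gold, pred, labels):
    """Micro-averaged F1 over the given labels.

    gold, pred: equal-length sequences with one label per sentence.
    labels: the classes to score; any label left out (for example a
            hypothetical "none" class) is not counted as a positive.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        for label in labels:
            if p == label and g == label:
                tp += 1          # predicted this class, and it was correct
            elif p == label:
                fp += 1          # predicted this class, but gold differs
            elif g == label:
                fn += 1          # gold is this class, but it was missed

    denominator = 2 * tp + fp + fn
    if denominator == 0:
        return 0.0
    # Algebraically equal to 2PR / (P + R) with pooled precision and recall.
    return 2 * tp / denominator


# Hypothetical example: six sentences, three scored adherence classes.
gold = ["full", "partial", "non", "none", "full", "non"]
pred = ["full", "full", "non", "partial", "none", "non"]
score = micro_f1(gold, pred, labels=["full", "partial", "non"])
print(round(score, 2))
```

When every sentence carries exactly one scored label in both the gold and predicted sets, micro-F1 reduces to plain accuracy. The two differ only when some class, such as the hypothetical "none" label above, is excluded from scoring.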

The new NLP method had a high level of accuracy (micro-F1 score=0.82) in identifying and categorizing notes on medication adherence. In the experiments, outputs from the NLP method helped the research team identify text related to medication nonadherence. The NLP method found that only one-third of patients with prescription claims data had documented statements of full adherence in EHR notes. Among patients admitted to the psychiatric center, full adherence decreased as psychiatric readmissions increased. Using the MIMIC-III data, the NLP method accurately identified medication adherence, with a micro-F1 score of 0.86.

NLP method evaluation:

  • EHRs for patients in the primary care and psychiatric inpatient settings at UTHealth Houston Harris County Psychiatric Center
  • Prescription claims from UT Physicians Allscripts data from 2015 to 2017 for 70,138 patients with matching clinical notes
  • 321,160 admission-related clinical notes from UTHealth Houston Harris County Psychiatric Center for 45,932 patients
  • MIMIC-III data set with EHR data for 923 patients

2026-03-23

Notes

  • The public-use data files in this collection are available for access by the general public. Access does not require affiliation with an ICPSR member institution.

  • ICPSR usually offers files in multiple formats so that researchers can access data and documentation in the formats that best suit their needs. If you have questions about the accessibility of materials distributed by ICPSR or require further assistance, please visit ICPSR’s Accessibility Center.