Development of Computational Methods for Evaluating Doctor-Patient Communication [Methods Study], United States, 2016-2021 (ICPSR 39720)
Version Date: Mar 18, 2026
Principal Investigator(s):
Zac E. Imel, University of Utah
https://doi.org/10.3886/ICPSR39720.v1
Version V1
Summary
The way doctors communicate with patients during office visits can affect the quality of care. Studying conversations between doctors and patients can help doctors improve their communication skills.
To study conversations, researchers rely on written records, or transcripts, of office visits. They read the transcripts and give each conversation topic a label. For example, topics may include smoking or pain. But labeling topics in this way may take a lot of time.
In this project, the research team created and tested a new method to make this work easier using natural language processing, or NLP. With NLP, computer programs interpret written language. NLP methods use a process called machine learning, where computer programs use data to learn how to perform different tasks with little or no human input.
Study Purpose
The specific aims were to develop and evaluate natural language processing (NLP) models that predict (1) topics of conversations and (2) emotional valence of patient-provider interactions.
Study Design
To develop NLP machine learning algorithms and models, researchers first trained the NLP algorithms to label topics in patient-clinician conversations. Researchers used 279 transcripts of patient-clinician conversations from two studies that already had 36 manually assigned topic labels, such as physical examination, cigarette use, or pain. The NLP algorithms learned to associate specific words in a conversation with the manually assigned topic labels and then predict the labels for other transcripts based on those associations.
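The idea of learning word-to-label associations can be illustrated with a small sketch. This is not the study's algorithm, which this description does not specify. It is a minimal multinomial Naive Bayes classifier with add-one smoothing, written with the Python standard library only. The function names (train, predict) and the four toy sentences are hypothetical and were invented for illustration; they are not taken from the MHD or ADEPT transcripts.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count how often each word appears under each manually assigned label.

    examples: list of (text, label) pairs. Hypothetical toy data, not study data.
    """
    word_counts = defaultdict(Counter)  # label -> word -> count
    label_counts = Counter()            # label -> number of examples
    vocab = set()
    for text, label in examples:
        words = text.lower().split()
        word_counts[label].update(words)
        label_counts[label] += 1
        vocab.update(words)
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label whose word counts best explain the text."""
    word_counts, label_counts, vocab = model
    total_examples = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        # Log prior: how common the label is among the training examples.
        score = math.log(label_counts[label] / total_examples)
        # Add-one smoothing so an unseen word-label pair never has probability zero.
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented sentences standing in for manually labeled transcript text.
toy_examples = [
    ("do you still smoke cigarettes", "cigarette use"),
    ("how many packs do you smoke a day", "cigarette use"),
    ("where does the pain hurt", "pain"),
    ("is the pain sharp or dull", "pain"),
]
model = train(toy_examples)
label_a = predict(model, "does your back hurt with sharp pain")
label_b = predict(model, "how much do you smoke")
```

In this toy setup, words such as "smoke" and "pain" carry most of the signal, which mirrors the description above: the classifier associates specific words with labels and reuses those associations on new text.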
Next, researchers developed three types of NLP classification models called non-sequential, window-based, and sequential. Each type of model used a different statistical method to label topics in the transcripts.
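This description does not say how the three model types differ internally. One common reading, offered here only as an assumption, is that they differ in how much surrounding conversation the model sees when labeling one unit of a transcript. The sketch below illustrates that reading with hypothetical helper functions and invented talk turns; it shows the input each kind of model might receive, not the study's models.

```python
def non_sequential_input(turns, i):
    """Words of the unit alone, with no surrounding context."""
    return turns[i].lower().split()


def window_input(turns, i, k=1):
    """Words of the unit plus up to k neighboring units on each side."""
    lo = max(0, i - k)
    hi = min(len(turns), i + k + 1)
    return [word for turn in turns[lo:hi] for word in turn.lower().split()]


def sequential_input(turns, i, previous_label):
    """Words of the unit plus the label assigned to the preceding unit.

    Carrying the previous label forward is one simple way to make a
    prediction depend on its position in the sequence (an assumption here).
    """
    return turns[i].lower().split() + [f"PREV={previous_label}"]


# Invented talk turns, not study data.
turns = ["Any pain today", "Yes in my knee", "Do you smoke"]
alone = non_sequential_input(turns, 1)
windowed = window_input(turns, 1, k=1)
chained = sequential_input(turns, 1, previous_label="pain")
```

Here the middle turn, "Yes in my knee", contains no topic word on its own. The windowed and sequential inputs both supply context that points to the pain topic, which suggests why context-aware models can be worth comparing against a context-free one.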
Researchers then evaluated the models' accuracy in labeling topics compared with the manually assigned labels in the same transcripts. They also compared each model's topic labels with a baseline NLP model that labeled the most common topics.
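The comparison described above can be sketched as follows. The exact metrics and baseline used in the study are not given in this description, so this sketch assumes plain accuracy (the share of units whose predicted label matches the manual label) and a baseline that always predicts the single most common label. The function names and all labels below are hypothetical.

```python
from collections import Counter


def accuracy(predicted, manual):
    """Share of units where the predicted label equals the manual label."""
    if len(predicted) != len(manual):
        raise ValueError("predicted and manual must have the same length")
    if not manual:
        raise ValueError("cannot compute accuracy on an empty list")
    matches = sum(p == m for p, m in zip(predicted, manual))
    return matches / len(manual)


def majority_baseline(training_labels, n):
    """Predict the most common training label for every one of n units."""
    most_common_label = Counter(training_labels).most_common(1)[0][0]
    return [most_common_label] * n


# Invented labels, not study data.
training_labels = ["pain", "pain", "cigarette use"]
manual = ["pain", "pain", "cigarette use", "physical examination"]
model_predictions = ["pain", "pain", "cigarette use", "pain"]

model_accuracy = accuracy(model_predictions, manual)
baseline_accuracy = accuracy(majority_baseline(training_labels, len(manual)), manual)
```

A model is only informative if it beats this kind of baseline, since a baseline can score well simply because one topic dominates the transcripts.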
A patient advisory board provided input throughout the study, including how to explain complex research methods in a clear way.
Data Source
Transcripts of audio recordings of patient-provider interactions from the Mental Health Discussion (MHD) study and transcripts of video recordings from the Assessment of Doctor-Elderly Patient Encounters (ADEPT) study
Notes
The public-use data files in this collection are available for access by the general public. Access does not require affiliation with an ICPSR member institution.
ICPSR usually offers files in multiple formats so that researchers can access data and documentation in the formats that best fit their needs. If you have questions about the accessibility of materials distributed by ICPSR or require further assistance, please visit ICPSR’s Accessibility Center.
