Development of Computational Methods for Evaluating Doctor-Patient Communication [Methods Study], United States, 2016-2021 (ICPSR 39720)

Version Date: Mar 18, 2026

Principal Investigator(s):
Zac E. Imel, University of Utah

https://doi.org/10.3886/ICPSR39720.v1

Version V1

The way doctors communicate with patients during office visits can affect the quality of care. Studying conversations between doctors and patients can help doctors improve their communication skills.

To study conversations, researchers rely on written records, or transcripts, of office visits. They read the transcripts and give each conversation topic a label. For example, topics may include smoking or pain. But labeling topics in this way may take a lot of time.

In this project, the research team created and tested a new method to make this work easier using natural language processing, or NLP. With NLP, computer programs interpret written language. NLP methods use a process called machine learning, where computer programs use data to learn how to perform different tasks with little or no human input.

Imel, Zac E. Development of Computational Methods for Evaluating Doctor-Patient Communication [Methods Study], United States, 2016-2021. Inter-university Consortium for Political and Social Research [distributor], 2026-03-18. https://doi.org/10.3886/ICPSR39720.v1

Funding: Patient-Centered Outcomes Research Institute (PCORI) (ME-1602-34167)
Distributor: Inter-university Consortium for Political and Social Research

2016 -- 2021

The specific aims were to develop and evaluate natural language processing (NLP) models that predict (1) topics of conversations and (2) emotional valence of patient-provider interactions.

To develop the models, researchers first trained NLP algorithms to label topics in patient-clinician conversations. They used 279 transcripts of patient-clinician conversations from two studies. The transcripts had already been manually labeled with 36 topic labels, such as physical examination, cigarette use, or pain. The NLP algorithms learned to associate specific words in a conversation with the manually assigned topic labels and then predicted the labels for other transcripts based on those associations.
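The idea of associating words with manually assigned labels can be illustrated with a very small word-count classifier. This is only a sketch under stated assumptions: the record does not say which algorithm the study used, the naive Bayes approach here is a stand-in, and the example sentences in `TOY_TURNS` are invented for illustration and are not taken from the MHD or ADEPT transcripts.

```python
import math
from collections import Counter, defaultdict

# Invented toy examples (NOT study data): (text, manually assigned topic label)
TOY_TURNS = [
    ("how many cigarettes do you smoke each day", "cigarette use"),
    ("have you tried to quit smoking", "cigarette use"),
    ("where does it hurt", "pain"),
    ("is the pain sharp or dull", "pain"),
]

def train(examples):
    """Count how often each word appears under each topic label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def predict(text, word_counts, label_counts):
    """Pick the label whose word counts best match the text
    (multinomial naive Bayes with add-one smoothing)."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)  # how common the label is overall
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

word_counts, label_counts = train(TOY_TURNS)
print(predict("do you still smoke", word_counts, label_counts))
print(predict("the pain is dull", word_counts, label_counts))
```

With only four training sentences the predictions are driven entirely by shared words such as "smoke" or "pain", which is the same kind of association described above, at a much smaller scale.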

Next, researchers developed three types of NLP classification models called non-sequential, window-based, and sequential. Each type of model used a different statistical method to label topics in the transcripts.
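The record does not describe how each model type was built. As a rough illustration of what the three names commonly mean, the sketch below shows how the input given to a classifier might differ: a single talk-turn, a talk-turn plus its neighbors, or the whole ordered conversation. The function names, the `radius` parameter, and the example turns are hypothetical and are not drawn from the study.

```python
def non_sequential_inputs(turns):
    """Each talk-turn is labeled from its own text only."""
    return [[turn] for turn in turns]

def window_inputs(turns, radius=1):
    """Each talk-turn is labeled together with up to `radius`
    neighboring turns on each side."""
    return [turns[max(0, i - radius): i + radius + 1]
            for i in range(len(turns))]

def sequential_input(turns):
    """The whole conversation, in order, is passed to the model at once;
    the model then assigns a label to every turn."""
    return list(turns)

# Invented example conversation (NOT study data)
turns = ["any pain today", "yes in my knee", "do you smoke"]
print(non_sequential_inputs(turns))
print(window_inputs(turns))
print(sequential_input(turns))
```

The practical difference is how much surrounding context the model can use when deciding on a topic label for a given turn.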

Researchers then evaluated how accurately the models labeled topics by comparing their labels with the manually assigned labels in the same transcripts. They also compared each model's topic labels with those of a baseline NLP model that labeled the most common topics.
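The comparison described above can be sketched as a simple accuracy calculation. The sketch assumes the baseline always predicts the single most frequent label, which is one common reading of "labeled the most common topics"; the record does not confirm the exact baseline or the metrics used. All labels below are invented for illustration.

```python
from collections import Counter

def accuracy(predicted, manual):
    """Share of items where the predicted label matches the manual label."""
    if len(predicted) != len(manual) or not manual:
        raise ValueError("label lists must be non-empty and the same length")
    return sum(p == m for p, m in zip(predicted, manual)) / len(manual)

def majority_baseline(reference_labels, n):
    """Assumed baseline: always predict the most frequent label."""
    most_common_label = Counter(reference_labels).most_common(1)[0][0]
    return [most_common_label] * n

# Invented labels (NOT study results)
manual = ["pain", "pain", "cigarette use", "physical examination"]
model_predictions = ["pain", "pain", "cigarette use", "pain"]
baseline_predictions = majority_baseline(manual, len(manual))

print(accuracy(model_predictions, manual))     # 0.75
print(accuracy(baseline_predictions, manual))  # 0.5
```

A model is only useful if it clearly beats this kind of baseline, since always guessing the most common topic can already look fairly accurate when a few topics dominate.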

A patient advisory board provided input throughout the study, including how to explain complex research methods in a clear way.

Transcripts of audio recordings of patient-provider interactions from the Mental Health Discussion (MHD) study and transcripts of video recordings from the Assessment of Doctor-Elderly Patient Encounters (ADEPT) study

2026-03-18

Notes

  • The public-use data files in this collection are available for access by the general public. Access does not require affiliation with an ICPSR member institution.

  • ICPSR usually offers files in multiple formats so that researchers can access data and documentation in the formats that best suit their needs. If you have questions about the accessibility of materials distributed by ICPSR or require further assistance, please visit ICPSR’s Accessibility Center.