Unlocking Clinical Text in Electronic Medical Records (EMR) by Query Refinement Using Both Knowledge Bases and Word Embedding [Methods Study], Ohio, 2006-2022 (ICPSR 39734)
Version Date: Mar 16, 2026
Principal Investigator(s):
Yungui Huang, Nationwide Children's Hospital
https://doi.org/10.3886/ICPSR39734.v1
Version V1
Summary
Electronic health records, or EHRs, have information about a patient's health such as test results, diagnoses, and treatments. EHRs also have clinical notes that doctors and patients can use to track goals and decisions.
Clinical notes may be useful for research or to help improve care. But it's hard to get information from these notes across large groups of patients. The notes may use different ways to describe the same thing. For example, high blood pressure may be called hypertension. Also, the notes may use abbreviations or have spelling mistakes.
In this project, the research team designed and built a search engine to make EHR notes easier to search and use for patient care and research.
Study Purpose
(1) To design and build a search engine to extract relevant clinical text from EHRs efficiently; (2) to assess the performance of the new search engine.
Study Design
The research team first established a new methodological framework to efficiently search for relevant clinical text from EHRs using an original query term. To create the framework, the team used more than 66 million clinical notes documenting patient encounters at Nationwide Children's Hospital in Ohio from 2006 to 2016. The framework included possible refinements for common queries and categorized relationships between original query terms and query refinements.
Next, the research team developed a web-based interactive search engine called Query Refinement by word Embedding and Knowledge base (QREK). Given a user's input query, QREK generates a list of relevant keywords, including word variations such as formal or informal forms, synonyms, abbreviations, and misspellings, and other relevant words like related diagnoses, medications, and procedures.
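The idea of combining a knowledge base with word embeddings to expand a query can be illustrated with a minimal sketch. This is not QREK's implementation: the toy vectors, the `KNOWLEDGE_BASE` dictionary, and the `refine_query` function are all hypothetical names and data invented for illustration. A real system would presumably learn embeddings from the clinical notes and draw synonyms from a curated terminology.

```python
import math

# Hypothetical toy data for illustration only. Real embeddings would be
# learned from clinical notes; real synonyms would come from a knowledge base.
EMBEDDINGS = {
    "hypertension": [0.90, 0.10, 0.00],
    "htn":          [0.88, 0.12, 0.02],  # abbreviation
    "hypertenson":  [0.85, 0.15, 0.05],  # misspelling
    "amlodipine":   [0.60, 0.70, 0.10],  # related medication
    "asthma":       [0.00, 0.20, 0.95],  # unrelated term
}
KNOWLEDGE_BASE = {"hypertension": ["high blood pressure"]}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def refine_query(term, top_k=3):
    """Return knowledge-base synonyms first, then the nearest embedding neighbors."""
    term = term.lower()
    suggestions = list(KNOWLEDGE_BASE.get(term, []))
    if term in EMBEDDINGS:
        neighbors = sorted(
            (w for w in EMBEDDINGS if w != term),
            key=lambda w: cosine(EMBEDDINGS[term], EMBEDDINGS[w]),
            reverse=True,
        )
        for word in neighbors[:top_k]:
            if word not in suggestions:
                suggestions.append(word)
    return suggestions


print(refine_query("hypertension"))
```

In this toy setup, the knowledge base contributes the formal synonym, while the embedding neighbors surface the abbreviation, the misspelling, and a related medication, which mirrors the kinds of suggestions described above.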
The research team then assessed the performance of QREK in two ways. First, the team asked three hospital residents to conduct 11 predefined queries and assess the relevance of terms suggested by QREK. Second, the team assessed QREK's ability to recall known synonyms from 6,682 terms in the Systematized Nomenclature of Medicine (SNOMED). The team calculated the percentage of SNOMED synonyms that QREK suggested among the first 60 search results.
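The synonym-recall measure can be sketched as a recall-at-k calculation. The record does not say whether the percentage was computed per term and averaged or pooled across all terms, so the pooled version below is an assumption, and the function names and sample data are hypothetical.

```python
def recall_at_k(suggested, known_synonyms, k=60):
    """Fraction of known synonyms found among the first k suggestions.

    Returns None when a term has no known synonyms to recall.
    """
    if not known_synonyms:
        return None
    top = {s.lower() for s in suggested[:k]}
    known = {s.lower() for s in known_synonyms}
    return len(known & top) / len(known)


def pooled_recall_percent(results, k=60):
    """Percentage of all known synonyms recalled, pooled across terms.

    `results` maps each query term to a (suggested, known_synonyms) pair.
    Pooling across terms is an assumption, not the study's stated method.
    """
    hits = total = 0
    for suggested, known_synonyms in results.values():
        top = {s.lower() for s in suggested[:k]}
        known = {s.lower() for s in known_synonyms}
        hits += len(known & top)
        total += len(known)
    return 100.0 * hits / total if total else None


# Hypothetical example: two query terms with made-up suggestions.
example = {
    "hypertension": (["htn", "high blood pressure", "asthma"],
                     ["HTN", "hypertensive disease"]),
    "asthma": (["reactive airway disease", "wheezing"],
               ["reactive airway disease"]),
}
print(pooled_recall_percent(example))
```

Setting `k=60` matches the cutoff of the first 60 search results described above; a smaller `k` would make the measure stricter.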
The research team tested and refined the QREK user interface with six clinical residents. They then implemented the final version of QREK in nine use cases at Nationwide Children's Hospital.
Patients, hospital administrators, health insurers, health information technology specialists, researchers, and clinicians provided input during the study.
Data Source
More than 66 million clinical notes documenting patient encounters at Nationwide Children's Hospital from 2006 to 2016
Notes
The public-use data files in this collection are available for access by the general public. Access does not require affiliation with an ICPSR member institution.
ICPSR usually offers files in multiple formats so that researchers can access data and documentation in the formats that best suit their needs. If you have questions about the accessibility of materials distributed by ICPSR or require further assistance, please visit ICPSR’s Accessibility Center.
