Unlocking Clinical Text in Electronic Medical Records (EMR) by Query Refinement Using Both Knowledge Bases and Word Embedding [Methods Study], Ohio, 2006-2022 (ICPSR 39734)

Version Date: Mar 16, 2026

Principal Investigator(s):
Yungui Huang, Nationwide Children's Hospital

https://doi.org/10.3886/ICPSR39734.v1

Version V1

Electronic health records, or EHRs, have information about a patient's health such as test results, diagnoses, and treatments. EHRs also have clinical notes that doctors and patients can use to track goals and decisions.

Clinical notes may be useful for research or to help improve care. But it's hard to get information from these notes across large groups of patients. The notes may use different ways to describe the same thing. For example, high blood pressure may be called hypertension. Also, the notes may use abbreviations or have spelling mistakes.

In this project, the research team designed and built a search engine to make EHR notes easier to search and use for patient care and research.

Huang, Yungui. Unlocking Clinical Text in Electronic Medical Records (EMR) by Query Refinement Using Both Knowledge Bases and Word Embedding [Methods Study], Ohio, 2006-2022. Inter-university Consortium for Political and Social Research [distributor], 2026-03-16. https://doi.org/10.3886/ICPSR39734.v1

Patient-Centered Outcomes Research Institute (PCORI) (ME-2017C1-6413)
Inter-university Consortium for Political and Social Research

2006 -- 2022
2006 -- 2016

(1) To design and build a search engine to extract relevant clinical text from EHRs efficiently; (2) To assess the performance of the new search engine.

The research team first established a new methodological framework to efficiently search for relevant clinical text from EHRs using an original query term. To create the framework, the team used more than 66 million clinical notes documenting patient encounters at Nationwide Children's Hospital in Ohio from 2006 to 2016. The framework included possible refinements for common queries and categorized relationships between original query terms and query refinements.

Next, the research team developed a web-based interactive search engine called Query Refinement by word Embedding and Knowledge base (QREK). Given a user's input query, QREK generates a list of relevant keywords, including word variations such as formal or informal forms, synonyms, abbreviations, and misspellings, and other relevant words like related diagnoses, medications, and procedures.
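The idea of combining a knowledge base with word embeddings can be sketched as follows. This is a minimal illustration only, not QREK's actual implementation: the toy vectors, the `TOY_KNOWLEDGE_BASE` dictionary, and the `refine_query` function are all hypothetical names and data made up for this example. A real system would learn embeddings from the clinical notes and draw synonyms from a curated terminology.

```python
import math

# Hypothetical toy data, invented for illustration only.
# Each word maps to a small made-up embedding vector.
TOY_EMBEDDINGS = {
    "hypertension": [0.90, 0.10, 0.00],
    "htn":          [0.85, 0.15, 0.05],   # abbreviation
    "hypertenson":  [0.88, 0.12, 0.02],   # misspelling
    "lisinopril":   [0.60, 0.70, 0.10],   # related medication
    "asthma":       [0.00, 0.20, 0.95],   # unrelated term
}

# Hypothetical knowledge base: curated synonyms for a query term.
TOY_KNOWLEDGE_BASE = {
    "hypertension": ["high blood pressure"],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def refine_query(term, top_k=3):
    """Suggest related keywords: knowledge-base synonyms first,
    then the top_k nearest neighbors in embedding space."""
    term = term.lower()
    suggestions = list(TOY_KNOWLEDGE_BASE.get(term, []))
    if term in TOY_EMBEDDINGS:
        query_vec = TOY_EMBEDDINGS[term]
        neighbors = sorted(
            ((cosine(query_vec, vec), word)
             for word, vec in TOY_EMBEDDINGS.items() if word != term),
            reverse=True,
        )
        for _, word in neighbors[:top_k]:
            if word not in suggestions:
                suggestions.append(word)
    return suggestions


print(refine_query("hypertension"))
```

In this sketch the knowledge base contributes formal synonyms, while the embedding neighbors surface abbreviations, misspellings, and related terms that appear in similar contexts. How QREK actually ranks and merges the two sources is not described in this record.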

The research team then assessed the performance of QREK in two ways. First, the team asked three hospital residents to conduct 11 predefined queries and assess the relevance of terms suggested by QREK. Second, the team assessed QREK's ability to recall known synonyms from 6,682 terms in the Systematized Nomenclature of Medicine (SNOMED). The team calculated the percentage of SNOMED synonyms that QREK suggested among the first 60 search results.
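The synonym-recall measure described above can be sketched as a recall-at-k calculation. This is an assumed formulation for illustration: the function name `synonym_recall_at_k`, the case-insensitive matching, and the example terms are hypothetical, and the record does not say whether the team averaged per term or pooled synonyms across all 6,682 terms.

```python
def synonym_recall_at_k(suggestions, known_synonyms, k=60):
    """Percentage of known synonyms found among the first k suggestions.

    suggestions: ranked list of suggested terms (best first).
    known_synonyms: reference synonyms for the query term.
    Returns None when there are no known synonyms to recall.
    """
    known = {s.lower() for s in known_synonyms}
    if not known:
        return None
    top_k = {s.lower() for s in suggestions[:k]}
    hits = len(known & top_k)
    return 100.0 * hits / len(known)


# Hypothetical example: one of two known synonyms appears in the list.
ranked = ["htn", "high blood pressure", "lisinopril"]
reference = ["High blood pressure", "elevated blood pressure"]
print(synonym_recall_at_k(ranked, reference, k=60))
```

Cutting the list at k matters because a synonym ranked below the cutoff counts as a miss, so the measure reflects what a user would see in the first 60 results rather than everything the engine could return.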

The research team tested and refined the QREK user interface with six clinical residents. They then implemented the final version of QREK in nine use cases at Nationwide Children's Hospital.

Patients, hospital administrators, health insurers, health information technology specialists, researchers, and clinicians provided input during the study.

More than 66 million clinical notes documenting patient encounters at Nationwide Children's Hospital from 2006 to 2016

2026-03-16

Notes

  • The public-use data files in this collection are available for access by the general public. Access does not require affiliation with an ICPSR member institution.

  • ICPSR usually offers files in multiple formats for researchers to be able to access data and documentation in formats that work well within their needs. If you have questions about the accessibility of materials distributed by ICPSR or require further assistance, please visit ICPSR’s Accessibility Center.