HYPHEN: A flexible, hybrid method to map phenotype concept mentions to terminological resources

Research output: Contribution to journalArticle

Abstract

Narrative clinical records and biomedical articles constitute rich sources of information about phenotypes, i.e., markers distinguishing individuals with specific medical conditions from the general population. Phenotypes help clinicians to provide personalised treatments. However, locating information about them within huge document repositories is difficult, since each phenotypic concept can be mentioned in many ways. Normalisation methods automatically map divergent phrases to unique concepts in domain-specific terminologies, to allow location and linking of all mentions of a concept of interest. We have developed a hybrid normalisation method (HYPHEN) to handle concept mentions with wide ranging characteristics, across different text types. HYPHEN integrates various normalisation techniques that handle surface-level variations (e.g., differences in word order, word forms or acronyms/abbreviations) and lexical-level variations (where terms have similar meanings, but potentially unrelated forms). HYPHEN achieves robust performance for both biomedical academic text and narrative clinical records, and has the ability to significantly outperform related methods.

Bibliographical metadata

Original languageEnglish
Pages (from-to)91-121
JournalTerminology
Volume24
Issue number1
Early online date31 May 2018
DOIs
Publication statusPublished - 2018

Related information