Extracting Gene-Disease Relations from Text to Support Biomarker Discovery

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Authors:
  • Paul Thompson
  • Sophia Ananiadou

Abstract

The biomedical literature constitutes a rich source of evidence to
support the discovery of biomarkers. However, locating evidence
in huge volumes of text can be difficult, as typical keyword
queries cannot account for the meaning and structure of text.
Text mining (TM) methods carry out automated semantic
analysis of documents, to facilitate structured searching that can
more precisely match users’ information needs. We describe our
TM approach to the detection of sentence-level associations
between genes and diseases, as a first step towards developing a
sophisticated search system targeted at locating biomarker
evidence in the literature. We vary the sophistication of our
detection methodology according to sentence complexity, using
either co-occurring mentions of genes and diseases, or linguistic
patterns obtained using evidence from approximately 1 million
biomedical abstracts. We demonstrate that this method can
detect associations more successfully than applying a single
technique, with an accuracy that compares very favourably to
related efforts. We also show that the identified relations can
complement those detected using alternative approaches.
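
The two-tier idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: sentence "complexity" is approximated by the number of entity mentions, gene and disease mentions are assumed to be already recognised, and the two hand-written patterns stand in for the linguistic patterns that the authors derived from approximately 1 million abstracts. The function name `detect_relations` is hypothetical.

```python
import re
from itertools import product

# Hypothetical stand-ins for learned linguistic patterns. GENE and DISEASE
# are placeholders substituted for the candidate pair before matching.
PATTERNS = [
    re.compile(
        r"GENE (?:is|was|are|were) (?:strongly |significantly )?"
        r"(?:associated with|implicated in|linked to) DISEASE"
    ),
    re.compile(r"mutations? (?:in|of) GENE (?:cause|causes|lead to|leads to) DISEASE"),
]


def detect_relations(sentence, genes, diseases):
    """Return (gene, disease) pairs judged to be associated in the sentence.

    Assumption: a sentence with exactly one gene and one disease mention is
    "simple", so co-occurrence alone is accepted. Any other sentence is
    treated as "complex" and each candidate pair must match a pattern.
    """
    if len(genes) == 1 and len(diseases) == 1:
        # Simple sentence: co-occurrence is taken as evidence of association.
        return [(genes[0], diseases[0])]

    relations = []
    for gene, disease in product(genes, diseases):
        # Mask only the candidate pair so the patterns stay entity-independent.
        masked = sentence.replace(gene, "GENE").replace(disease, "DISEASE")
        if any(pattern.search(masked) for pattern in PATTERNS):
            relations.append((gene, disease))
    return relations


simple = "Elevated PSA levels were observed in prostate cancer patients."
complex_ = (
    "BRCA1 is associated with breast cancer, "
    "whereas TP53 was not examined in asthma."
)

simple_result = detect_relations(simple, ["PSA"], ["prostate cancer"])
complex_result = detect_relations(
    complex_, ["BRCA1", "TP53"], ["breast cancer", "asthma"]
)
print(simple_result)
print(complex_result)
```

In the complex sentence, plain co-occurrence would propose all four gene-disease pairs, whereas the pattern tier keeps only the pair that is explicitly linked. This is the motivation for switching technique by sentence complexity, although the actual criteria and patterns used by the authors may differ.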

Bibliographical metadata

Original language: English
Title of host publication: Digital Health 2017: Global Public Health, Personalised Medicine, and Emergency Medicine in the Age of Big Data
Publisher: ACM Digital Library
State: Accepted/In press - 6 Apr 2017