AUTOMATIC IDENTIFICATION OF TEXTUAL UNCERTAINTY

UoM administered thesis: PhD

  • Authors: Chrysoula Zerva

Abstract

The exponential increase in published research makes it progressively harder for researchers to navigate the existing literature and to search for specific information, rendering the incorporation of new knowledge increasingly difficult. Text mining can aid literature exploration by processing vast document collections to extract and organise information of interest. This is of particular importance in the biomedical domain, where text mining methods can extract mentions of biomolecular reactions and automatically incorporate them into pathway and interaction networks, thus contributing to their timely curation and maintenance. However, current methods tend to ignore the context of extracted interaction mentions and treat them all as equally certain, overlooking speculative statements, hypotheses and admissions of ignorance. To address this problem, we investigate the use of textual uncertainty in biomedical literature and propose novel methods to identify the (un)certainty value of extracted statements. We study the extent to which such values, representing the confidence of the author in a statement (and thus the inferred certainty of the statement itself), can be used to provide a more informative weighting of extracted knowledge. Focusing on the biomedical use case, we propose an approach to accurately identify uncertainty values for the interaction mentions identified in different documents. We subsequently use subjective logic theory to combine multiple uncertainty values extracted from different sources for the same interaction and obtain a consolidated confidence score. Throughout this work, we validated the output of our methods against the judgement of researchers in biomedicine. We thus confirmed that our methodology for inferring an overall interaction score approximates well the scores attributed by researchers.
We demonstrate the usability of textual uncertainty in the biomedical context by integrating it as a confidence filter in a pilot interactive interface that provides literature-aided pathway visualisation. We thus illustrate that, along with other literature-based confidence filters, textual uncertainty can help researchers explore and discover interactions of interest.

The aim of the thesis is to investigate the use of uncertainty in written language, with an emphasis on scientific writing. The thesis explores the practical aspects of assessing the uncertainty of extracted statements and ranking information accordingly, as well as the theoretical foundations and linguistic patterns of expressed uncertainty in text. It is shown that automated uncertainty identification can be a valuable tool for extracting and processing vast amounts of information from raw text, by enabling more accurate and targeted acquisition and integration of new knowledge.
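
As an illustration of how uncertainty values from several sources might be consolidated with subjective logic, the following is a minimal sketch rather than the method used in the thesis. The abstract does not say which fusion operator is applied; the sketch assumes the standard cumulative fusion operator for binomial opinions and a base rate shared by all sources. The names `Opinion`, `cumulative_fusion` and `consolidate`, and the numeric values, are hypothetical.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Opinion:
    """Binomial opinion about one interaction (hypothetical container).

    belief + disbelief + uncertainty must sum to 1; base_rate is the prior.
    """
    belief: float
    disbelief: float
    uncertainty: float
    base_rate: float = 0.5

    def expected(self) -> float:
        # Projected probability: belief plus the prior's share of the uncertainty.
        return self.belief + self.base_rate * self.uncertainty


def cumulative_fusion(a: Opinion, b: Opinion) -> Opinion:
    """Fuse two opinions from independent sources (assumed operator).

    Assumes both opinions share the same base rate.
    """
    if a.uncertainty == 0 and b.uncertainty == 0:
        # Both sources are dogmatic: fall back to an equal-weight average.
        return Opinion((a.belief + b.belief) / 2,
                       (a.disbelief + b.disbelief) / 2,
                       0.0, a.base_rate)
    denom = a.uncertainty + b.uncertainty - a.uncertainty * b.uncertainty
    return Opinion(
        (a.belief * b.uncertainty + b.belief * a.uncertainty) / denom,
        (a.disbelief * b.uncertainty + b.disbelief * a.uncertainty) / denom,
        (a.uncertainty * b.uncertainty) / denom,
        a.base_rate,
    )


def consolidate(opinions: list[Opinion]) -> Opinion:
    """Fold cumulative fusion over all mentions of the same interaction."""
    return reduce(cumulative_fusion, opinions)


# Two hypothetical mentions of the same interaction in different documents.
mentions = [Opinion(0.6, 0.1, 0.3), Opinion(0.4, 0.2, 0.4)]
fused = consolidate(mentions)
score = fused.expected()
```

Under this operator, each additional source reduces the remaining uncertainty of the fused opinion, so an interaction mentioned in many documents ends up with a score driven mostly by the evidence and less by the prior.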

Details

Original language: English
Awarding Institution
Supervisors/Advisors
Award date: 1 Aug 2019