Linking Clinical Records to the Biomedical Literature

UoM administered thesis: PhD

  • Author: Noha Alnazzawi


Narrative information in Electronic Health Records (EHRs) contains a wealth of clinical information about treatments, diagnoses, medications and family history. In addition, the scientific literature is a rich source of information that summarises the latest results and research findings relevant to different diseases. These two textual sources often contain different types of valuable phenotypic information that may be complementary to each other. Combining details from each source thus has the potential to uncover new disease-phenotype associations. In turn, these associations can help to identify patients at high risk, and they can be useful in developing solutions to control the causes responsible for the development of different diseases.
However, clinicians at the point of care have limited time to review the large volume of potentially useful information that is locked away in unstructured text. This limits the utility of this "raw" information to clinical practitioners and computerised applications. Accordingly, automated and efficient means to extract, combine and present phenotype information, which may be scattered amongst a large number of different textual sources, in an easily digestible format are a prerequisite to the effective use and comprehensive understanding of the details contained within both the records and the literature. The development of such facilities can in turn help in deriving information about disease correlations and in supporting clinical decisions. This thesis is the first comprehensive study focussing on extracting and integrating phenotypic information from two different biomedical sources using Text Mining (TM) techniques.
In this research, we describe our work on (1) extracting phenotypic information from both EHRs and the biomedical literature; (2) extracting and distilling the relations between items of phenotypic information in EHRs using an event-based approach; and (3) using normalisation methods to link the phenotypic information found in EHRs with associated mentions found in the literature, as a first step towards the automatic integration of information from these heterogeneous sources.


Original language: English
Awarding Institution
Award date: 31 Dec 2016