UoM administered thesis: Master of Philosophy

  • Author: Yuanhan Mo


Identifying relevant studies for inclusion in a systematic review (i.e. screening) is a complex, laborious and expensive task. Recently, a number of studies have shown that using machine learning and text mining methods to automatically identify relevant studies has the potential to drastically reduce the workload involved in the screening phase. The vast majority of available machine learning methods rely on the same underlying principle: a study is modelled as a bag of words (BOW). This thesis explores the use of topic modelling methods to derive a more informative representation of studies. Latent Dirichlet Allocation (LDA), an unsupervised topic-modelling approach, is applied to identify topics in a collection of studies, and each study is then represented as a distribution over LDA topics. Additionally, the topics derived by LDA are enriched with technical multi-word terms identified by an automatic term recognition (ATR) tool. For experimentation, SVM-based classifiers using either the topic-based or the BOW representation are applied to automatically identify relevant studies. The results show that the SVM classifier identifies more relevant studies when using the LDA representation than when using the BOW representation. Moreover, this study demonstrates that the kernel functions used in the SVM achieve superior performance when using LDA feature representations. These observations hold for two systematic reviews in the clinical domain and three reviews in the social science domain.
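The abstract does not say how LDA was fitted or which SVM implementation was used. The following is a minimal sketch, assuming collapsed Gibbs sampling for LDA and naive whitespace tokenisation, of the two representations being compared: a bag-of-words count vector and a per-study distribution over LDA topics. The function names, hyperparameters (alpha, beta, iters) and toy study titles are hypothetical. The ATR multi-word term enrichment and the SVM training are not shown, apart from an RBF kernel of the kind an SVM could apply to topic vectors.

```python
import math
import random
from collections import Counter


def bag_of_words(doc):
    """BOW representation: token -> count (naive whitespace tokenisation, an assumption)."""
    return Counter(doc.lower().split())


def lda_doc_topics(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (hypothetical helper); returns theta_d per document."""
    rng = random.Random(seed)
    tokens = [d.lower().split() for d in docs]
    vocab = sorted({w for doc in tokens for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    # Count tables: document-topic, topic-word, and per-topic totals.
    n_dk = [[0] * num_topics for _ in docs]
    n_kw = [[0] * V for _ in range(num_topics)]
    n_k = [0] * num_topics
    z = []  # current topic assignment of every token
    for d, doc in enumerate(tokens):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)  # random initial assignment
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)
    for _ in range(iters):
        for d, doc in enumerate(tokens):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional p(z_i = t | all other assignments), up to a constant.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1
    # Smoothed estimate of each study's topic distribution; every row sums to 1.
    return [
        [(n_dk[d][k] + alpha) / (len(doc) + num_topics * alpha) for k in range(num_topics)]
        for d, doc in enumerate(tokens)
    ]


def rbf_kernel(u, v, gamma=1.0):
    """RBF kernel exp(-gamma * ||u - v||^2), one kernel an SVM could use."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


# Toy usage with made-up study titles (not data from the thesis).
studies = [
    "randomised controlled trial of drug dosage in patients",
    "drug trial patients outcome randomised",
    "survey of school teachers and classroom policy",
    "classroom teachers policy interview study",
]
bow = bag_of_words(studies[0])
theta = lda_doc_topics(studies, num_topics=2)
similarity = rbf_kernel(theta[0], theta[1])
```

In a screening setting, each row of `theta` would presumably serve as the dense feature vector for one study, labelled relevant or irrelevant, and be passed to an SVM classifier in place of the sparse BOW vector.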


Original language: English
Award date: 1 Aug 2016