Mining Negation and Uncertainty in Social Healthcare Networks

UoM administered thesis: PhD

  • Authors:
  • Rosyzie Anna Awg Hj Md Apong


A growing number of Internet users, particularly patients, seek medical information from others through healthcare social networks (HSNs). The large volume of online comments written by patients, known as patient-generated data (PGD), is a potential source for further medical research. Researchers have reported novel studies and findings from processing the natural language of healthcare-related texts in HSNs. However, one aspect of PGD from social media that has not been widely studied is the negation and uncertainty of medical concepts. Here, we introduce the social healthcare miner (SchMiner), a framework for identifying negation and uncertainty of medical concepts. In this thesis, we analysed and presented a novel conceptual model for negation expressions in PGD. The model introduces and defines complete and incomplete negations of medical concepts. We also studied the types of negation modifiers that negate medical concepts, distinguishing complete from incomplete negation. The negation identifier was trained using a machine-learning algorithm that relies on engineered lexical, syntactic and semantic domain-specific features. We applied a scope-based approach to extract the features at sentence level. The negation classifier trained with the support vector machine algorithm achieved 90% accuracy and an F-measure of 84%. In addition, we analysed uncertainty about medical entities in PGD and mapped an existing state-of-the-art uncertainty model for medical entities. Recent studies on identifying uncertainty in PGD have focused only on classification at sentence and document level. Work on identifying uncertainty at concept level is still very limited. Hence, in this thesis, we implemented a social healthcare uncertainty identifier (SchTnty). The identifier was modelled using a machine-learning algorithm trained on scope-based engineered features. With the support vector machine algorithm, the identifier achieved 83% accuracy and a 65% F-measure, with 68% sensitivity and 88% specificity. The framework provides a state-of-the-art method that can support other research on patient-generated data, specifically from social media.
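
As an illustration of how the reported measures relate to one another, the sketch below computes accuracy, sensitivity, specificity and F-measure from the four cells of a binary confusion matrix. The function name `binary_metrics` and the counts are hypothetical, chosen only for illustration; they are not taken from the thesis and do not reproduce its results.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # also called recall: share of positives found
    specificity = tn / (tn + fp)   # share of negatives correctly rejected
    precision = tp / (tp + fp)     # share of predicted positives that are correct
    # F-measure (F1) is the harmonic mean of precision and sensitivity.
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
    }


# Hypothetical counts for illustration only; NOT data from the thesis.
metrics = binary_metrics(tp=40, fp=10, tn=90, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because accuracy counts true negatives while the F-measure does not, a classifier can score higher on accuracy than on F-measure when the negative class dominates.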


Original language: English
Awarding Institution
Award date: 1 Aug 2018