Prof Sophia Ananiadou

Chair in Computer Science

Research interests

Prof. Ananiadou's main contributions are in the area of text mining and natural language processing. She is director of the National Centre for Text Mining (NaCTeM), the first publicly funded centre of its type in the world. She played a leading role in the creation of the centre, negotiating strategic links with industry (e.g. Pacific Life, Unitis, Pfizer, AZ, Unilever, Atypon, Microsoft, IBM, Elsevier and Nature), which have helped to guide NaCTeM towards its current status as a fully sustainable centre that engages with both academia and industry.

Her research covers a diverse range of areas, including terminology, information extraction, document classification and clustering, sentiment analysis, query expansion and resource building. These have been applied in a number of domains, such as biology, medicine, health and social sciences.

Her research has resulted in over 350 peer-reviewed publications in journals and conferences. Her h-index stands at 51 (Google Scholar) with over 10,000 citations, placing her among the top 30 most cited scholars worldwide in text mining.

Important research outcomes have included the following:

  • Work on computational terminology led to the development of the C-value method for automatic term recognition, which has been adopted internationally as a standard for automatic term extraction. C-value has been packaged into the widely used TerMine service.
  • Work on acronym recognition has led to the development of the popular Acromine service.
  • Work on direct and indirect association mining resulted in the development of the FACTA+ service, for which she was interviewed by New Scientist.
  • A major outcome of the AHRC Big Data project Mining the History of Medicine is the History of Medicine search engine, which provides semantically-enhanced search over the archives of the British Medical Journal (1840-present day) and the London-area Medical Officer of Health reports (1848-1972). This work featured in the Lancet.
  • Collaboration with the National Institute for Health and Care Excellence (NICE) supported the development of a widely used tool, RobotAnalyst, which supports search and screening for systematic reviews using novel machine learning and text mining methods. RobotAnalyst is currently used by NICE, Cochrane teams and over 40 research institutes worldwide.
  • Leading role in the development of interoperable text mining platforms to support research in several areas: the U-Compare and Argo platforms received funding from IBM, EU, BBSRC, AHRC, JISC and DARPA. Leading role in the H2020-funded interoperable platform OpenMinTeD.
  • Role in making the University of Manchester a Hub in the UK of the META-NET Network of Excellence, forging the Multilingual Europe Technology
  • Close involvement with psychology and mental health teams to provide text analytics for mental health onset prediction.
  • Involvement with the Lung Studies team at Salford Royal on chronic obstructive pulmonary disease (COPD), extracting phenotypic information from the literature and integrating knowledge from electronic health records and the literature.

In 2010, she supervised NaCTeM's participation in the protein-protein interaction (PPI) challenge of BioCreAtIvE III (Critical Assessment of Information Extraction in Biology), where the team achieved the best performance in what was considered the most challenging task, the Interaction Method Task (IMT).

In 2013, she supervised NaCTeM in BioCreAtIvE IV. NaCTeM achieved the best results in the tasks related to the automatic recognition of chemicals and genes (Comparative Toxicogenomics Database) and Chemical and Drug Named Entity Recognition.

NaCTeM’s event extraction system, EventMine, outperformed other systems on the complex task of event extraction in the BioNLP 2011 and 2013 tasks (pathway curation, infectious diseases and epigenetics).

In 2015, within the DARPA-funded Big Cancer Mechanism project, her team produced the top-performing ‘federated system’ for reading, delivering the best-performing text mining solutions for extracting entities and events for cancer pathways.

She was a co-instigator of the launch in 2008 of a Special Interest Group within ACL (SIGBioMed), dedicated to language processing in the biological, biomedical and clinical domains. The group brings together researchers in NLP, bioinformatics, medical informatics and computational biology, and provides a venue for the promotion and dissemination of original research in this area. The BioNLP workshops are hubs for the evaluation of text mining technology in biomedicine through the creation of shared tasks, for which she has acted as co-organiser.