Annotating and Detecting Phenotypic Information for Chronic Obstructive Pulmonary Disease

Citation formats

External authors:

  • Meizhi Ju
  • Paul Thompson
  • Nawar Diar Bakerly
  • Georgios Gkoutos
  • Loukia Tsaprouni

Standard

Annotating and Detecting Phenotypic Information for Chronic Obstructive Pulmonary Disease. / Ju, Meizhi; Short, Andrea; Thompson, Paul; Diar Bakerly, Nawar; Gkoutos, Georgios; Tsaprouni, Loukia; Ananiadou, Sophia.

In: Journal of the American Medical Informatics Association, 26.04.2019.

Research output: Contribution to journal › Article

Harvard

Ju, M, Short, A, Thompson, P, Diar Bakerly, N, Gkoutos, G, Tsaprouni, L & Ananiadou, S 2019, 'Annotating and Detecting Phenotypic Information for Chronic Obstructive Pulmonary Disease', Journal of the American Medical Informatics Association. https://doi.org/10.1093/jamiaopen/ooz009

APA

Ju, M., Short, A., Thompson, P., Diar Bakerly, N., Gkoutos, G., Tsaprouni, L., & Ananiadou, S. (2019). Annotating and Detecting Phenotypic Information for Chronic Obstructive Pulmonary Disease. Journal of the American Medical Informatics Association. https://doi.org/10.1093/jamiaopen/ooz009

Vancouver

Ju M, Short A, Thompson P, Diar Bakerly N, Gkoutos G, Tsaprouni L et al. Annotating and Detecting Phenotypic Information for Chronic Obstructive Pulmonary Disease. Journal of the American Medical Informatics Association. 2019 Apr 26. https://doi.org/10.1093/jamiaopen/ooz009

Author

Ju, Meizhi ; Short, Andrea ; Thompson, Paul ; Diar Bakerly, Nawar ; Gkoutos, Georgios ; Tsaprouni, Loukia ; Ananiadou, Sophia. / Annotating and Detecting Phenotypic Information for Chronic Obstructive Pulmonary Disease. In: Journal of the American Medical Informatics Association. 2019.

Bibtex

@article{1aadcf1927f448969c77993a838c254c,
title = "Annotating and Detecting Phenotypic Information for Chronic Obstructive Pulmonary Disease",
abstract = "Objective: Chronic obstructive pulmonary disease (COPD) phenotypes cover a range of lung abnormalities. To allow text mining (TM) methods to identify pertinent and potentially complex information about these phenotypes from textual data, we have developed a novel annotated corpus, which we use to train a neural network based named entity recogniser to detect fine-grained COPD phenotypic information. Materials and Methods: Since COPD phenotype descriptions often mention other concepts within them (proteins, treatments, etc.), our corpus annotations include both outermost phenotype descriptions and concepts nested within them. Our neural layered bidirectional long short-term memory (BiLSTM)-CRF network firstly recognises nested mentions, which are fed into subsequent BiLSTM-CRF layers, to help to recognise enclosing phenotype mentions. Results: Our corpus of 30 full papers (available at http://www.nactem.ac.uk/COPD) is annotated by experts with 27,030 phenotype-related concept mentions, most of which are automatically linked to UMLS Metathesaurus concepts. When trained using the corpus, our BiLSTM-CRF network outperforms other popular approaches in recognising detailed phenotypic information. Discussion: Information extracted by our method can facilitate efficient location and exploration of detailed information about phenotypes, e.g., those specifically concerning reactions to treatments. Conclusion: The importance of our corpus for developing methods to extract fine-grained information about COPD phenotypes is demonstrated through its successful use to train a layered BiLSTM-CRF network to extract phenotypic information at various levels of granularity. The minimal human intervention needed for training should permit ready adaption to extracting phenotypic information about other diseases.",
keywords = "Chronic Obstructive Pulmonary Disease, Text Mining, Natural Language Processing, Phenotype, Information Extraction",
author = "Meizhi Ju and Andrea Short and Paul Thompson and {Diar Bakerly}, Nawar and Georgios Gkoutos and Loukia Tsaprouni and Sophia Ananiadou",
year = "2019",
month = "4",
day = "26",
doi = "10.1093/jamiaopen/ooz009",
language = "English",
journal = "Journal of the American Medical Informatics Association",
issn = "1067-5027",
publisher = "Oxford University Press",

}

RIS

TY - JOUR

T1 - Annotating and Detecting Phenotypic Information for Chronic Obstructive Pulmonary Disease

AU - Ju, Meizhi

AU - Short, Andrea

AU - Thompson, Paul

AU - Diar Bakerly, Nawar

AU - Gkoutos, Georgios

AU - Tsaprouni, Loukia

AU - Ananiadou, Sophia

PY - 2019/4/26

Y1 - 2019/4/26

N2 - Objective: Chronic obstructive pulmonary disease (COPD) phenotypes cover a range of lung abnormalities. To allow text mining (TM) methods to identify pertinent and potentially complex information about these phenotypes from textual data, we have developed a novel annotated corpus, which we use to train a neural network based named entity recogniser to detect fine-grained COPD phenotypic information. Materials and Methods: Since COPD phenotype descriptions often mention other concepts within them (proteins, treatments, etc.), our corpus annotations include both outermost phenotype descriptions and concepts nested within them. Our neural layered bidirectional long short-term memory (BiLSTM)-CRF network firstly recognises nested mentions, which are fed into subsequent BiLSTM-CRF layers, to help to recognise enclosing phenotype mentions. Results: Our corpus of 30 full papers (available at http://www.nactem.ac.uk/COPD) is annotated by experts with 27,030 phenotype-related concept mentions, most of which are automatically linked to UMLS Metathesaurus concepts. When trained using the corpus, our BiLSTM-CRF network outperforms other popular approaches in recognising detailed phenotypic information. Discussion: Information extracted by our method can facilitate efficient location and exploration of detailed information about phenotypes, e.g., those specifically concerning reactions to treatments. Conclusion: The importance of our corpus for developing methods to extract fine-grained information about COPD phenotypes is demonstrated through its successful use to train a layered BiLSTM-CRF network to extract phenotypic information at various levels of granularity. The minimal human intervention needed for training should permit ready adaption to extracting phenotypic information about other diseases.

AB - Objective: Chronic obstructive pulmonary disease (COPD) phenotypes cover a range of lung abnormalities. To allow text mining (TM) methods to identify pertinent and potentially complex information about these phenotypes from textual data, we have developed a novel annotated corpus, which we use to train a neural network based named entity recogniser to detect fine-grained COPD phenotypic information. Materials and Methods: Since COPD phenotype descriptions often mention other concepts within them (proteins, treatments, etc.), our corpus annotations include both outermost phenotype descriptions and concepts nested within them. Our neural layered bidirectional long short-term memory (BiLSTM)-CRF network firstly recognises nested mentions, which are fed into subsequent BiLSTM-CRF layers, to help to recognise enclosing phenotype mentions. Results: Our corpus of 30 full papers (available at http://www.nactem.ac.uk/COPD) is annotated by experts with 27,030 phenotype-related concept mentions, most of which are automatically linked to UMLS Metathesaurus concepts. When trained using the corpus, our BiLSTM-CRF network outperforms other popular approaches in recognising detailed phenotypic information. Discussion: Information extracted by our method can facilitate efficient location and exploration of detailed information about phenotypes, e.g., those specifically concerning reactions to treatments. Conclusion: The importance of our corpus for developing methods to extract fine-grained information about COPD phenotypes is demonstrated through its successful use to train a layered BiLSTM-CRF network to extract phenotypic information at various levels of granularity. The minimal human intervention needed for training should permit ready adaption to extracting phenotypic information about other diseases.

KW - Chronic Obstructive Pulmonary Disease

KW - Text Mining

KW - Natural Language Processing

KW - Phenotype

KW - Information Extraction

U2 - 10.1093/jamiaopen/ooz009

DO - 10.1093/jamiaopen/ooz009

M3 - Article

JO - Journal of the American Medical Informatics Association

JF - Journal of the American Medical Informatics Association

SN - 1067-5027

ER -