An Ensemble of Neural Models for Nested Adverse Drug Events and Medication Extraction with Subwords

Citation formats

Standard

An Ensemble of Neural Models for Nested Adverse Drug Events and Medication Extraction with Subwords. / Ju, Meizhi; Nguyen, Nhung; Miwa, Makoto; Ananiadou, Sophia.

In: Journal of the American Medical Informatics Association, 14.06.2019.

Research output: Contribution to journal (Article)

Author

Ju, Meizhi ; Nguyen, Nhung ; Miwa, Makoto ; Ananiadou, Sophia. / An Ensemble of Neural Models for Nested Adverse Drug Events and Medication Extraction with Subwords. In: Journal of the American Medical Informatics Association. 2019.

Bibtex

@article{313e39054930474ba1c4cf17e707ec51,
title = "An Ensemble of Neural Models for Nested Adverse Drug Events and Medication Extraction with Subwords",
abstract = "Objective: This paper describes an ensembling system to automatically extract adverse drug events (ADEs) and drug related entities from clinical narratives, which was developed for the 2018 n2c2 Shared Task Track 2. Materials and Methods: We designed a neural model to tackle both nested (entities embedded in other entities) and polysemous entities (entities annotated with multiple semantic types) based on MIMIC III discharge summaries. To better represent rare and unknown words in entities, we further tokenised the MIMIC III dataset by splitting the words into finer-grained subwords. We finally combined all the models to boost the performance. Additionally, we implemented a feature-based CRF model and created an ensemble to combine its predictions with those of the neural model. Results: Our method achieved 92.78{\%} lenient micro F1-score with 95.99{\%} lenient precision and 89.79{\%} lenient recall, respectively. Experimental results showed that combining the predictions of either multiple models, or of a single model with different settings can improve performance. Discussion: Analysis of the development set showed that our neural models can detect more informative text regions than feature-based CRF models. Furthermore, most entity types significantly benefit from subword representation, which also allows us to extract sparse entities especially nested entities. Conclusion: The overall results have demonstrated that the ensemble method can accurately recognise entities, including nested and polysemous entities. Additionally, our method can recognise sparse entities, by reconsidering the clinical narratives at a finer-grained subword level, rather than at the word level.",
keywords = "Adverse Drug Event, Nested Named Entity Recognition, Information Extraction, Natural Language Processing, Electronic Health Record",
author = "Meizhi Ju and Nhung Nguyen and Makoto Miwa and Sophia Ananiadou",
year = "2019",
month = "6",
day = "14",
doi = "10.1093/jamia/ocz075",
language = "English",
journal = "Journal of the American Medical Informatics Association",
issn = "1067-5027",
publisher = "Oxford University Press",

}

RIS

TY - JOUR

T1 - An Ensemble of Neural Models for Nested Adverse Drug Events and Medication Extraction with Subwords

AU - Ju, Meizhi

AU - Nguyen, Nhung

AU - Miwa, Makoto

AU - Ananiadou, Sophia

PY - 2019/6/14

Y1 - 2019/6/14

N2 - Objective: This paper describes an ensembling system to automatically extract adverse drug events (ADEs) and drug related entities from clinical narratives, which was developed for the 2018 n2c2 Shared Task Track 2. Materials and Methods: We designed a neural model to tackle both nested (entities embedded in other entities) and polysemous entities (entities annotated with multiple semantic types) based on MIMIC III discharge summaries. To better represent rare and unknown words in entities, we further tokenised the MIMIC III dataset by splitting the words into finer-grained subwords. We finally combined all the models to boost the performance. Additionally, we implemented a feature-based CRF model and created an ensemble to combine its predictions with those of the neural model. Results: Our method achieved 92.78% lenient micro F1-score with 95.99% lenient precision and 89.79% lenient recall, respectively. Experimental results showed that combining the predictions of either multiple models, or of a single model with different settings can improve performance. Discussion: Analysis of the development set showed that our neural models can detect more informative text regions than feature-based CRF models. Furthermore, most entity types significantly benefit from subword representation, which also allows us to extract sparse entities especially nested entities. Conclusion: The overall results have demonstrated that the ensemble method can accurately recognise entities, including nested and polysemous entities. Additionally, our method can recognise sparse entities, by reconsidering the clinical narratives at a finer-grained subword level, rather than at the word level.

AB - Objective: This paper describes an ensembling system to automatically extract adverse drug events (ADEs) and drug related entities from clinical narratives, which was developed for the 2018 n2c2 Shared Task Track 2. Materials and Methods: We designed a neural model to tackle both nested (entities embedded in other entities) and polysemous entities (entities annotated with multiple semantic types) based on MIMIC III discharge summaries. To better represent rare and unknown words in entities, we further tokenised the MIMIC III dataset by splitting the words into finer-grained subwords. We finally combined all the models to boost the performance. Additionally, we implemented a feature-based CRF model and created an ensemble to combine its predictions with those of the neural model. Results: Our method achieved 92.78% lenient micro F1-score with 95.99% lenient precision and 89.79% lenient recall, respectively. Experimental results showed that combining the predictions of either multiple models, or of a single model with different settings can improve performance. Discussion: Analysis of the development set showed that our neural models can detect more informative text regions than feature-based CRF models. Furthermore, most entity types significantly benefit from subword representation, which also allows us to extract sparse entities especially nested entities. Conclusion: The overall results have demonstrated that the ensemble method can accurately recognise entities, including nested and polysemous entities. Additionally, our method can recognise sparse entities, by reconsidering the clinical narratives at a finer-grained subword level, rather than at the word level.

KW - Adverse Drug Event

KW - Nested Named Entity Recognition

KW - Information Extraction

KW - Natural Language Processing

KW - Electronic Health Record

U2 - 10.1093/jamia/ocz075

DO - 10.1093/jamia/ocz075

M3 - Article

JO - Journal of the American Medical Informatics Association

JF - Journal of the American Medical Informatics Association

SN - 1067-5027

ER -