POS Tagging for Arabic Tweets

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

  • Authors:
  • Fahad Albogamy
  • Allan Ramsay
  • Galia Angelova (Editor)
  • Kalina Bontcheva (Editor)
  • Ruslan Mitkov (Editor)

Abstract

Part-of-Speech (POS) tagging is a key step in many NLP algorithms. However, tweets are difficult to POS tag because there are many phenomena that frequently appear in Twitter that are not as common, or are entirely absent, in other domains: tweets are short, are not always written maintaining formal grammar and proper spelling, and abbreviations are often used to overcome their restricted lengths. Arabic tweets also show a further range of linguistic phenomena such as usage of different dialects, romanised Arabic and borrowing foreign words. In this paper, we present an evaluation and a detailed error analysis of state-of-the-art POS taggers for Arabic when applied to Arabic tweets. The accuracy of standard Arabic taggers is typically excellent (96-97%) on Modern Standard Arabic (MSA) text; however, their accuracy declines to 49-65% on Arabic tweets. Further, we present our initial approach to improve the taggers’ performance. By doing some improvements based on observed errors, we are able to reach 79% tagging accuracy.

Bibliographical metadata

Original language: English
Title of host publication: Recent Advances in Natural Language Processing
Editors: Galia Angelova, Kalina Bontcheva, Ruslan Mitkov
Pages: 1-8
Number of pages: 8
Publication status: Published - Sep 2015
Event: Recent Advances in Natural Language Processing - Hissarya, Bulgaria
Event duration: 1 Jan 1824 → …
http://lml.bas.bg/ranlp2015/docs/RANLP_main.pdf

Conference

Conference: Recent Advances in Natural Language Processing
City: Hissarya, Bulgaria
Period: 1/01/24 → …