Arabic Treebank: from Phrase-Structure Trees to Dependency Trees

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

  • Authors:
  • Allan Ramsay
  • Jan Hajic (Editor)
  • Koenraad De Smedt (Editor)
  • Marko Tadic (Editor)
  • Antonio Branco (Editor)

Abstract

The aim here is to create a dependency treebank from a phrase-structure treebank for Arabic. Arabic has a number of characteristics, described below, which make it particularly challenging for any natural language processing (NLP) application. We describe an encouraging semi-automatic technique for converting phrase-structure trees to dependency trees by using a head percolation table. One of the most significant challenges here is determining the head of each subtree. We therefore examined different versions of the head percolation table to find the best priority list for each entry in the table. Given that there is no absolute measure of the ‘correctness’ of a conversion of a phrase-structure tree to dependency form, we tested the various transformations by seeing how well a state-of-the-art dependency parser learnt the generalisations that were embodied by the converted trees.
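
The conversion idea summarised in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Node` class, the `HEAD_TABLE` entries, the category labels, and the three-word transliterated example are all assumptions made up for illustration, and the toy tree does not follow the actual Arabic Treebank annotation. Each table entry gives a search direction and a priority list of child labels; the first child matching the highest-priority label becomes the head, and every other child's lexical head is attached to it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """A phrase-structure node; leaves carry a word and a 1-based token index."""
    label: str
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None
    index: int = -1


# Hypothetical head percolation table: label -> (search direction, priority list).
HeadTable = Dict[str, Tuple[str, List[str]]]
HEAD_TABLE: HeadTable = {
    "S": ("left", ["VP", "NP"]),
    "VP": ("left", ["VBD", "VP", "NP"]),
    "NP": ("left", ["NN", "NP"]),
}


def find_head_child(node: Node, table: HeadTable) -> Node:
    """Pick the head child by trying each priority label in order."""
    direction, priorities = table.get(node.label, ("left", []))
    kids = node.children if direction == "left" else list(reversed(node.children))
    for wanted in priorities:
        for kid in kids:
            if kid.label == wanted:
                return kid
    return kids[0]  # fallback (an assumption): first child in the search direction


def to_dependencies(node: Node, table: HeadTable,
                    arcs: List[Tuple[int, int]]) -> int:
    """Return the token index of the node's lexical head.

    Appends one (dependent_index, head_index) pair to `arcs` for every
    non-head child encountered on the way down.
    """
    if node.word is not None:  # leaf: it is its own head
        return node.index
    head_child = find_head_child(node, table)
    head_index = to_dependencies(head_child, table, arcs)
    for kid in node.children:
        if kid is not head_child:
            arcs.append((to_dependencies(kid, table, arcs), head_index))
    return head_index


# Toy verb-initial sentence (made-up tree, illustrative labels only).
tree = Node("S", [
    Node("VP", [
        Node("VBD", word="kataba", index=1),
        Node("NP", [Node("NN", word="alwaladu", index=2)]),
        Node("NP", [Node("NN", word="risalatan", index=3)]),
    ]),
])

arcs: List[Tuple[int, int]] = []
root = to_dependencies(tree, HEAD_TABLE, arcs)
print("root token:", root)
print("arcs (dependent, head):", arcs)
```

Changing the priority list for a single entry changes the resulting dependency tree, which is why comparing different versions of the table is meaningful. In this sketch, for example, putting `NP` ahead of `VBD` in the `VP` entry would make the first noun the root instead of the verb.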

Bibliographical metadata

Original language: English
Title of host publication: host publication
Editors: Jan Hajic, Koenraad De Smedt, Marko Tadic, Antonio Branco
Place of Publication: META-RESEARCH Workshop on Advanced Treebanking Advanced Treebanking 2012
Pages: 61-68
Number of pages: 8
Publication status: Published - May 2012
Event: META-RESEARCH Workshop on Advanced Treebanking, Language Resources and Evaluation Conference - Istanbul
Event duration: 21 May 2012 to 27 May 2012
http://www.lrec-conf.org/proceedings/lrec2012/index.html

Conference

Conference: META-RESEARCH Workshop on Advanced Treebanking, Language Resources and Evaluation Conference
City: Istanbul
Period: 21/05/12 to 27/05/12