AN INVESTIGATION INTO THE CROSS-LINGUISTIC ROBUSTNESS OF TEXTUAL EQUIVALENCE TECHNIQUES

UoM administered thesis: PhD

Authors
  • Amal Alshahrani

Abstract

This thesis explores a range of techniques that have been applied to the task of Textual Equivalence (TEQV), i.e., identifying whether one text snippet is equivalent to another. This task has been widely explored for English texts. In this study we investigate and analyse the extent to which these techniques generalise to other languages, in particular Arabic. Written Arabic is widely said to be more ambiguous than English, and this ambiguity makes determining the relationships between text snippets particularly challenging. We have tried to apply these techniques in settings that are as similar as possible, so that any differences in the experimental results can be reliably attributed to differences between the two languages rather than to differences in the experimental set-up. In particular, the dynamic time warping (DTW) algorithm has been used to measure the similarity between sentence pairs by calculating the minimum number of editing operations (Insert, Delete, Exchange) required to convert one sentence into another, with WordNet similarity measures used as the cost function for the Exchange operation. This algorithm has been extended with an extra operation, Swap, which allows for local permutations to compensate for the comparatively free word order of Arabic. The outcome is that when we extend the coverage of Arabic WordNet we obtain results for Arabic similar to those obtained using English WordNet for TEQV in English, and that the extended version of DTW provides more benefit for Arabic than for English.
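
The edit-operation idea described in the abstract can be illustrated with a minimal sketch. It is not the implementation used in the thesis: it is a standard weighted edit-distance dynamic programme over word tokens (the optimal-string-alignment variant, which permits swapping two adjacent tokens). The function names, the unit costs, and the toy synonym table standing in for a WordNet similarity measure are all assumptions made for illustration.

```python
from typing import Callable, Optional, Sequence


def exact_match_cost(a: str, b: str) -> float:
    """Default Exchange cost (assumption): free if tokens match, 1 otherwise."""
    return 0.0 if a == b else 1.0


def edit_distance(
    source: Sequence[str],
    target: Sequence[str],
    exchange_cost: Callable[[str, str], float] = exact_match_cost,
    insert_cost: float = 1.0,
    delete_cost: float = 1.0,
    swap_cost: Optional[float] = None,
) -> float:
    """Minimum total cost of converting `source` into `target`.

    Operations: Insert, Delete, Exchange and, when `swap_cost` is given,
    Swap of two adjacent tokens (a local permutation).
    """
    n, m = len(source), len(target)
    # d[i][j] = cheapest way to turn source[:i] into target[:j]
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + delete_cost
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + insert_cost

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = min(
                d[i - 1][j] + delete_cost,
                d[i][j - 1] + insert_cost,
                d[i - 1][j - 1] + exchange_cost(source[i - 1], target[j - 1]),
            )
            # Swap: the last two source tokens appear in reverse order in target.
            if (
                swap_cost is not None
                and i > 1
                and j > 1
                and source[i - 1] == target[j - 2]
                and source[i - 2] == target[j - 1]
            ):
                best = min(best, d[i - 2][j - 2] + swap_cost)
            d[i][j] = best
    return d[n][m]


# Hypothetical stand-in for a WordNet-based cost: related words are cheap to exchange.
TOY_RELATED = {frozenset(("car", "automobile")): 0.2}


def toy_similarity_cost(a: str, b: str) -> float:
    if a == b:
        return 0.0
    return TOY_RELATED.get(frozenset((a, b)), 1.0)
```

With unit costs, `["the", "boy", "read", "book"]` and `["the", "read", "boy", "book"]` are two operations apart without Swap and one operation apart when `swap_cost=1.0`, which is the kind of saving a Swap operation offers for a language with comparatively free word order. In a fuller system the toy table would be replaced by a real lexical similarity measure.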

Details

Original language: English
Awarding Institution
Supervisors/Advisors
  • Allan Ramsay (Supervisor)
Award date: 31 Dec 2018