Evaluating the suitability of using raw machine translation output as input for foreign language written production

UoM administered thesis: Doctoral Thesis

Abstract

This thesis focuses on describing and evaluating the results of using machine translation post-editing in the foreign language class in order to provide guidelines relating to the suitability of using raw machine translation output as input for foreign language written production.

Given that free online raw machine translation output is currently used by language students for their foreign language written assignments, in this thesis we aim to evaluate the difficulty of post-editing into the foreign language compared with translating into the foreign language. The goal is to suggest a productive use of raw MT output in the foreign language class as a complement to written production practice, and to introduce students to machine translation post-editing, an activity in growing demand among language-skilled professionals. For this purpose, a quantitative and qualitative analysis of the frequency and distribution of errors per error domain, per error category, and per text type was carried out. The analysis was based on a previously evaluated raw machine translation output and on a learner corpus, compiled as part of this study, of student-edited (or post-edited) output and student-translated output. The learner corpus contains 56,893 tokens and consists of the English-Spanish machine translation post-editing and translation work done by two groups of 16 advanced students of Spanish as a foreign language on eight general text types.

The computer-aided error analysis (CEA) methodology proposed by Granger (1998) was adopted. It involved the manual correction of the learner corpus, the design of an error typology and tagset, the insertion of error tags into the text files by means of a semi-automated error tagger, and the retrieval of lists of the most frequent error types in machine translation post-editing and translation, together with an illustrative concordance-based linguistic analysis of the major error types.

The results of this descriptive study indicate a marked improvement of the student-edited output over both the machine translation output and the student-translated output across all error domains and text types. It was also found that the percentage of successfully post-edited words was considerably higher than the percentage of unsuccessfully post-edited words per student and per text type. These results suggest that the machine translation post-editing experience was beneficial for the students.

Finally, we report on the perceptions of foreign language students and teachers regarding the potential and limitations of machine translation (especially machine translation post-editing) for foreign language instruction. We also outline some of the teaching and learning implications of this study in traditional and distance teaching/learning environments, and we propose a task-based pedagogical framework with suggestions for the integration of writing, translation, and machine translation post-editing in the FL class.



Details

Original language: English
Supervisors/Advisors:
Award date: 2006