Initial explorations in English to Turkish statistical machine translation

Durgar El-Kahlout, İlknur and Oflazer, Kemal (2006) Initial explorations in English to Turkish statistical machine translation. In: North American ACL - Workshop on Statistical Machine Translation, New York


Abstract

This paper presents some very preliminary results from, and problems encountered in, developing a statistical machine translation system from English to Turkish. Starting with a baseline word model trained on about 20K aligned sentences, we explore various ways of exploiting morphological structure to improve upon the baseline system. As Turkish is a language with complex agglutinative word structures, we experiment with morphologically segmented and disambiguated versions of the parallel texts in order to also uncover relations between morphemes and function words in one language and morphemes and function words in the other, in addition to relations between open-class content words. Morphological segmentation on the Turkish side also conflates the statistics from allomorphs, so that sparseness can be alleviated to a certain extent. We find that this approach, coupled with a simple grouping of the most frequent morphemes and function words on both sides, improves the BLEU score from the baseline of 0.0752 to 0.0913 with the small training data. We close with a discussion of why one should not expect distortion parameters to model word-local morpheme ordering, and argue that a new approach to handling complex morphotactics is needed.
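The effect of conflating allomorphs can be illustrated with a minimal sketch. It is not the paper's pipeline: the hand-segmented words stand in for the output of a morphological analyzer and disambiguator, and the tag names (+PLU, +LOC) and the mapping table are hypothetical, chosen only for illustration. The sketch shows how mapping surface variants of the same suffix to one abstract symbol reduces the number of distinct token types a model has to estimate statistics for.

```python
from collections import Counter

# Hypothetical toy table: surface allomorphs -> one abstract morpheme tag.
# The tag names are illustrative assumptions, not the paper's notation.
ALLOMORPH_TO_TAG = {
    "lar": "+PLU", "ler": "+PLU",                             # plural
    "da": "+LOC", "de": "+LOC", "ta": "+LOC", "te": "+LOC",   # locative
}


def conflate(segmented_word):
    """Turn 'root+suffix+suffix' into [root, tag, tag, ...].

    Suffixes missing from the table are kept as '+<surface form>'.
    """
    root, *suffixes = segmented_word.split("+")
    return [root] + [ALLOMORPH_TO_TAG.get(s, "+" + s) for s in suffixes]


# Hand-segmented examples (assumed analyzer output):
# evlerde "in the houses", kitaplarda "in the books", okullarda "in the schools"
words = ["ev+ler+de", "kitap+lar+da", "okul+lar+da"]

# Suffix counts over surface forms versus conflated tags.
surface = Counter(s for w in words for s in w.split("+")[1:])
conflated = Counter(t for w in words for t in conflate(w)[1:])

print("surface suffix types:  ", len(surface), dict(surface))
print("conflated suffix types:", len(conflated), dict(conflated))
```

In this toy corpus the four surface suffix forms collapse into two tags, each seen three times, which is the sense in which segmentation can ease sparseness on a small training set.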

Item Type: Papers in Conference Proceedings
Subjects: Q Science > QA Mathematics
          P Language and Literature > P Philology. Linguistics
ID Code: 1212
Deposited By: İlknur Durgar El-Kahlout
Deposited On: 19 Dec 2006 02:00
Last Modified: 25 Oct 2007 20:22
