Durgar El-Kahlout, İlknur and Oflazer, Kemal (2006) Initial explorations in English to Turkish statistical machine translation. In: North American ACL - Workshop on Statistical Machine Translation, New York
Abstract
This paper presents some very preliminary results for and problems in developing a statistical machine translation system from English to Turkish. Starting with a baseline word model trained from about 20K aligned sentences, we explore various ways of exploiting morphological structure to improve upon the baseline system. As Turkish is a language with complex agglutinative word structures, we experiment with morphologically segmented and disambiguated versions of the parallel texts in order to also uncover relations between morphemes and function words in one language with morphemes and function words in the other, in addition to relations between open class content words. Morphological segmentation on the Turkish side also conflates the statistics from allomorphs so that sparseness can be alleviated to a certain extent. We find that this approach, coupled with a simple grouping of most frequent morphemes and function words on both sides, improves the BLEU score from the baseline of 0.0752 to 0.0913 with the small training data. We close with a discussion on why one should not expect distortion parameters to model word-local morpheme ordering and that a new approach to handling complex morphotactics is needed.
Item Type: | Papers in Conference Proceedings |
---|---|
Subjects: | Q Science > QA Mathematics; P Language and Literature > P Philology. Linguistics |
Divisions: | Faculty of Engineering and Natural Sciences |
Depositing User: | İlknur Durgar El-Kahlout |
Date Deposited: | 19 Dec 2006 02:00 |
Last Modified: | 26 Apr 2022 08:33 |
URI: | https://research.sabanciuniv.edu/id/eprint/1212 |