Joint morphological-lexical language modeling for processing morphologically rich languages with application to dialectal Arabic

Sarıkaya, Ruhi and Afify, Mohamed and Deng, Yonggang and Erdoğan, Hakan and Gao, Yuqing (2008) Joint morphological-lexical language modeling for processing morphologically rich languages with application to dialectal Arabic. IEEE Transactions on Audio, Speech, and Language Processing, 16 (7). pp. 1330-1340. ISSN 1558-7916

PDF: sarikaya08jml.pdf (117 kB)

Abstract

Language modeling for an inflected language such as Arabic poses new challenges for speech recognition and machine translation due to its rich morphology. Rich morphology results in large increases in out-of-vocabulary (OOV) rate and poor language model parameter estimation in the absence of large quantities of data. In this study, we present a joint morphological-lexical language model (JMLLM) that takes advantage of Arabic morphology. JMLLM combines morphological segments with the underlying lexical items and additional available information sources with regard to morphological segments and lexical items in a single joint model. Joint representation and modeling of morphological and lexical items reduces the OOV rate and provides smooth probability estimates while keeping the predictive power of whole words. Speech recognition and machine translation experiments in dialectal Arabic show improvements over word- and morpheme-based trigram language models. We also show that as the tightness of integration between different information sources increases, both speech recognition and machine translation performances improve.
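The abstract's claim that morphological units lower the OOV rate can be illustrated with a toy sketch. Everything in it is a hypothetical assumption: the affix lists, the greedy one-prefix/one-suffix segmenter `segment`, and the tiny word lists written in Buckwalter-style transliteration. It is not the authors' segmenter and does not implement JMLLM. It only compares OOV rates for a word vocabulary against a morpheme vocabulary built from the same training text.

```python
# Toy illustration of why morpheme units lower the OOV rate.
# The affix lists, the greedy segmenter and the word lists are hypothetical
# examples in Buckwalter-style transliteration; this is NOT the paper's
# segmenter and NOT an implementation of JMLLM.

PREFIXES = ("wAl", "w", "Al", "b", "l")  # and+the, and, the, with, for (hypothetical subset)
SUFFIXES = ("hm", "hA", "h", "k")        # their, her, his, your (hypothetical subset)


def segment(word: str) -> list[str]:
    """Strip at most one prefix and one suffix, marking affixes with '+'."""
    morphs = []
    for p in PREFIXES:  # longer prefixes are listed before their own sub-prefixes
        if word.startswith(p) and len(word) - len(p) >= 2:
            morphs.append(p + "+")
            word = word[len(p):]
            break
    suffix = None
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= 2:
            suffix = "+" + s
            word = word[: -len(s)]
            break
    morphs.append(word)  # the remaining stem
    if suffix is not None:
        morphs.append(suffix)
    return morphs


def to_morphemes(words: list[str]) -> list[str]:
    """Flatten a word sequence into its morpheme sequence."""
    return [m for w in words for m in segment(w)]


def oov_rate(train_tokens: list[str], test_tokens: list[str]) -> float:
    """Fraction of test tokens that never occur in the training tokens."""
    vocab = set(train_tokens)
    return sum(t not in vocab for t in test_tokens) / len(test_tokens)


TRAIN_WORDS = ["ktAb", "wqlm", "ktAbh", "Alqlm"]  # book, and+pen, book+his, the+pen
TEST_WORDS = ["wktAb", "qlmh"]                    # and+book, pen+his: unseen as whole words

print(f"word-level OOV:     {oov_rate(TRAIN_WORDS, TEST_WORDS):.2f}")
print(f"morpheme-level OOV: {oov_rate(to_morphemes(TRAIN_WORDS), to_morphemes(TEST_WORDS)):.2f}")
```

With these toy lists every test word is unseen at the word level (OOV 1.00), while all of its morphemes occur in training (OOV 0.00). The trade-off is that a morpheme-only model spans less context per n-gram, which is why the abstract stresses keeping the predictive power of whole words alongside the morphological segments.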
Item Type: Article
Subjects: T Technology > TK Electrical engineering. Electronics. Nuclear engineering
Divisions: Faculty of Engineering and Natural Sciences > Academic programs > Computer Science & Eng.
Faculty of Engineering and Natural Sciences > Academic programs > Electronics
Faculty of Engineering and Natural Sciences
Depositing User: Hakan Erdoğan
Date Deposited: 08 Nov 2008 11:00
Last Modified: 26 Apr 2022 08:24
URI: https://research.sabanciuniv.edu/id/eprint/10259
