Sarıkaya, Ruhi and Afify, Mohamed and Deng, Yonggang and Erdoğan, Hakan and Gao, Yuqing (2008) Joint morphological-lexical language modeling for processing morphologically rich languages with application to dialectal Arabic. IEEE Transactions on Audio, Speech, and Language Processing, 16 (7). pp. 1330-1340. ISSN 1558-7916
PDF
sarikaya08jml.pdf
Download (117kB)
Official URL: http://dx.doi.org/10.1109/TASL.2008.924591
Abstract
Language modeling for an inflected language
such as Arabic poses new challenges for speech recognition and
machine translation due to its rich morphology. Rich morphology
results in large increases in out-of-vocabulary (OOV) rate and
poor language model parameter estimation in the absence of large
quantities of data. In this study, we present a joint
morphological-lexical language model (JMLLM) that takes
advantage of Arabic morphology. JMLLM combines
morphological segments with the underlying lexical items and
additional available information sources regarding
morphological segments and lexical items in a single joint model.
Joint representation and modeling of morphological and lexical
items reduces the OOV rate and provides smooth probability
estimates while keeping the predictive power of whole words.
Speech recognition and machine translation experiments in
dialectal Arabic show improvements over word- and morpheme-based
trigram language models. We also show that as the
tightness of integration between different information sources
increases, both speech recognition and machine translation
performances improve.
Item Type: | Article |
---|---|
Subjects: | T Technology > TK Electrical engineering. Electronics. Nuclear engineering |
Divisions: | Faculty of Engineering and Natural Sciences > Academic programs > Computer Science & Eng.; Faculty of Engineering and Natural Sciences > Academic programs > Electronics; Faculty of Engineering and Natural Sciences |
Depositing User: | Hakan Erdoğan |
Date Deposited: | 08 Nov 2008 11:00 |
Last Modified: | 26 Apr 2022 08:24 |
URI: | https://research.sabanciuniv.edu/id/eprint/10259 |