Arslan, Burak Bülent (2009) An approach to the morphological disambiguation problem using conditional random fields. [Thesis]
PDF
BulentBurakArslantez.pdf
Download (324kB)
Official URL: http://192.168.1.20/record=b1293816 (Table of Contents)
Abstract
Morphology is the subfield of linguistics that studies the internal structure of words. Morphological analysis is the first step in revealing this structure: it enumerates the possible combinations of underlying morphological units that describe the surface form of a given word. A surface form is said to be morphologically ambiguous when more than one analysis corresponds to it. While words in every natural language may manifest morphological ambiguity, solving the problem of morphological disambiguation presents different challenges for different languages. In this work, we present an approach to this problem using Conditional Random Fields (CRFs), a statistical framework that elegantly avoids the data sparseness problems arising from the large vocabulary and tag set sizes characteristic of the Turkish language. CRFs are used to build statistical models that rely on simple functions of easily testable properties of the training data at hand. Thanks to the higher expressiveness gained by using tests on individual morphological markers, our results are in line with the state of the art, using only a simple one-dimensional bigram chain model.
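To make the idea of "tests on individual morphological markers" in a bigram chain concrete, the following is a minimal sketch of decoding, not the thesis's actual model. It assumes each candidate analysis is written as a root followed by `+`-separated markers, and it uses a hand-picked feature list with made-up weights (`FEATURES`, `markers`, `score`, and `disambiguate` are all hypothetical names); in a trained CRF the weights would be learned from annotated data. Only unnormalized scores are computed, which is sufficient for picking the best sequence.

```python
from typing import Dict, List, Optional, Tuple

def markers(analysis: str) -> set:
    """Return the morphological markers of an analysis, excluding the root."""
    return set(analysis.split("+")[1:])

# Hypothetical binary feature tests over (previous analysis, current analysis),
# each paired with an illustrative weight that was NOT learned from data.
FEATURES = [
    # Genitive noun followed by a third-person possessive noun.
    (lambda prev, cur: prev is not None
        and "Gen" in markers(prev) and "P3sg" in markers(cur), 2.0),
    # Mild penalty on second-person possessive readings.
    (lambda prev, cur: "P2sg" in markers(cur), -0.5),
]

def score(prev: Optional[str], cur: str) -> float:
    """Sum the weights of all feature tests that fire on this bigram."""
    return sum(w for test, w in FEATURES if test(prev, cur))

def disambiguate(candidates: List[List[str]]) -> List[str]:
    """Viterbi search for the highest-scoring analysis sequence."""
    if not candidates:
        return []
    # best[t][a] = (best score of a path ending in analysis a, backpointer)
    best: List[Dict[str, Tuple[float, Optional[str]]]] = [
        {a: (score(None, a), None) for a in candidates[0]}
    ]
    for t in range(1, len(candidates)):
        column = {}
        for cur in candidates[t]:
            prev = max(candidates[t - 1],
                       key=lambda p: best[t - 1][p][0] + score(p, cur))
            column[cur] = (best[t - 1][prev][0] + score(prev, cur), prev)
        best.append(column)
    # Follow backpointers from the best final analysis.
    last = max(best[-1], key=lambda a: best[-1][a][0])
    path = [last]
    for t in range(len(candidates) - 1, 0, -1):
        last = best[t][last][1]
        path.append(last)
    return path[::-1]

# Illustrative input: "evin kapısı" ("the door of the house").
sentence = [
    ["ev+Noun+A3sg+P2sg+Nom", "ev+Noun+A3sg+Pnon+Gen"],  # "evin" is ambiguous
    ["kapı+Noun+A3sg+P3sg+Nom"],                         # "kapısı"
]
result = disambiguate(sentence)
```

Because the features test individual markers such as `Gen` or `P3sg` rather than whole tags, a single weight generalizes across every analysis that contains that marker, which is one way to read the abstract's remark about avoiding data sparseness with a large tag set.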
Item Type: | Thesis |
---|---|
Uncontrolled Keywords: | Natural language processing. -- Computational linguistics. -- Morphological disambiguation. -- Doğal dil işleme. -- Hesaplamalı dilbilim. -- Biçimbirimsel denkleştirme. |
Subjects: | Q Science > QA Mathematics > QA076 Computer software |
Divisions: | Faculty of Engineering and Natural Sciences > Academic programs > Computer Science & Eng.; Faculty of Engineering and Natural Sciences |
Depositing User: | IC-Cataloging |
Date Deposited: | 10 Feb 2011 15:00 |
Last Modified: | 26 Apr 2022 09:53 |
URI: | https://research.sabanciuniv.edu/id/eprint/16358 |