Oflazer, Kemal and Nirenburg, Sergei and McShane, Marjorie (2001) Bootstrapping morphological analyzers by combining human elicitation and machine learning. Computational Linguistics, 27 (1). pp. 59-85. ISSN 0891-2017
Full text not available from this repository.
Official URL: http://dx.doi.org/10.1162/089120101300346804
Abstract
This paper presents a semiautomatic technique for developing broad-coverage finite-state morphological analyzers for use in natural language processing applications. It consists of three components: elicitation of linguistic information from humans, a machine learning bootstrapping scheme, and a testing environment. The three components are applied iteratively until a threshold of output quality is attained. The initial application of this technique is for the morphology of low-density languages in the context of the Expedition project at NMSU Computing Research Laboratory. This elicit-build-test technique compiles lexical and inflectional information elicited from a human into a finite-state transducer lexicon and combines this with a sequence of morphographemic rewrite rules that is induced using transformation-based learning from the elicited examples. The resulting morphological analyzer is then tested against a test set, and any corrections are fed back into the learning procedure, which then builds an improved analyzer.
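The iterative control flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a plain lookup table stands in for the finite-state transducer lexicon and the rewrite rules induced by transformation-based learning, and every name here (`build_analyzer`, `elicit_build_test`, the `cat+N+Pl` analysis strings) is hypothetical. It shows only the loop itself: build an analyzer, test it, feed the corrections back, and stop once a quality threshold is reached.

```python
from typing import Callable, List, Optional, Tuple

# (surface form, expected analysis); the analysis notation is an assumption.
Example = Tuple[str, str]
Analyzer = Callable[[str], Optional[str]]


def build_analyzer(examples: List[Example]) -> Analyzer:
    """Stand-in for the build step: memorize surface -> analysis pairs.

    The paper compiles a transducer lexicon and learns rewrite rules;
    this toy version only looks forms up in a dictionary.
    """
    table = dict(examples)
    return lambda surface: table.get(surface)


def elicit_build_test(
    seed: List[Example],
    test_set: List[Example],
    threshold: float,
    max_rounds: int = 10,
) -> Tuple[Analyzer, float, int]:
    """Repeat build and test until accuracy reaches the threshold."""
    if not test_set:
        raise ValueError("test_set must not be empty")
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")

    examples = list(seed)
    for round_no in range(1, max_rounds + 1):
        analyze = build_analyzer(examples)
        # Test step: collect the items the current analyzer gets wrong.
        errors = [(s, a) for s, a in test_set if analyze(s) != a]
        accuracy = 1.0 - len(errors) / len(test_set)
        if accuracy >= threshold:
            break
        # Feedback step: corrections become new training examples.
        examples.extend(errors)
    return analyze, accuracy, round_no


seed = [("cats", "cat+N+Pl")]
test_set = [("cats", "cat+N+Pl"), ("dogs", "dog+N+Pl")]
analyze, accuracy, rounds = elicit_build_test(seed, test_set, threshold=1.0)
```

In this toy run the first round misses `dogs`, the correction is fed back, and the second round reaches the threshold. A real system would generalize from the corrections rather than memorize them.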
Item Type: | Article |
---|---|
Subjects: | Q Science > QA Mathematics > QA075 Electronic computers. Computer science |
Divisions: | Faculty of Engineering and Natural Sciences > Academic programs > Computer Science & Eng.; Faculty of Engineering and Natural Sciences |
Depositing User: | Kemal Oflazer |
Date Deposited: | 07 Jun 2010 22:48 |
Last Modified: | 25 Jul 2019 10:00 |
URI: | https://research.sabanciuniv.edu/id/eprint/14018 |