Şen, Mehmet Umut and Yanıkoğlu, Berrin (2018) Document classification of SuDer Turkish news corpora [SuDer Türkçe haber derlemlerinin doküman sınıflandırması]. In: 26th Signal Processing and Communications Applications Conference (SIU), Izmir, Turkey
PDF
siu-2018-son.pdf
Download (191kB)
Official URL: http://dx.doi.org/10.1109/SIU.2018.8404790
Abstract
Word embeddings are successfully employed in various Natural Language Processing tasks, but training them requires a large amount of text, which is scarce for Turkish. In this work, we collected a large number of articles from two news websites and used the tags within the web pages as labels. The resulting corpora were tested with various document classification models. Embedding-based models performed better than models using traditional TF-IDF features. A neural network that simultaneously learns the word embeddings and the document classification performed best.
Item Type: | Papers in Conference Proceedings |
---|---|
Divisions: | Faculty of Engineering and Natural Sciences > Academic programs > Computer Science & Eng.; Faculty of Engineering and Natural Sciences |
Depositing User: | Berrin Yanıkoğlu |
Date Deposited: | 06 Sep 2018 16:04 |
Last Modified: | 03 Jun 2023 20:12 |
URI: | https://research.sabanciuniv.edu/id/eprint/36553 |