Document classification of SuDer Turkish news corpora [SuDer Türkçe haber derlemlerinin doküman sınıflandırması]

Şen, Mehmet Umut and Yanıkoğlu, Berrin (2018) Document classification of SuDer Turkish news corpora [SuDer Türkçe haber derlemlerinin doküman sınıflandırması]. In: 26th Signal Processing and Communications Applications Conference (SIU), Izmir, Turkey

Abstract

Word embeddings are successfully employed in various Natural Language Processing tasks, but training them requires a large amount of text, which is scarce for Turkish. In this work, we collected a large number of articles from two news websites and used the tags within the web pages as labels. The resulting corpora were tested with various document classification models. Embedding-based models performed better than models using traditional TF-IDF features. A neural network that simultaneously learns the word embeddings and the document classification task performed best.
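
As a rough illustration of the TF-IDF baseline mentioned in the abstract, the following minimal sketch builds TF-IDF vectors and classifies a document by cosine similarity to per-class centroids. It is not the authors' implementation: the smoothed IDF formula, the nearest-centroid classifier, the function names, and the toy English sentences are all assumptions made for this example, since the abstract does not specify which weighting variant or classifiers were used.

```python
import math
from collections import Counter, defaultdict


def weigh(doc, idf):
    """TF-IDF weights for one tokenized document; terms missing from idf are ignored."""
    counts = Counter(t for t in doc if t in idf)
    total = sum(counts.values()) or 1
    return {t: (c / total) * idf[t] for t, c in counts.items()}


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document, plus the IDF table."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumed variant): terms found in every document keep weight 1.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [weigh(doc, idf) for doc in docs], idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroids(vectors, labels):
    """Average the TF-IDF vectors of each class."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter(labels)
    for vec, lab in zip(vectors, labels):
        for t, w in vec.items():
            sums[lab][t] += w
    return {lab: {t: w / counts[lab] for t, w in s.items()} for lab, s in sums.items()}


def classify(doc, idf, cents):
    """Assign the label whose centroid is most similar to the document."""
    vec = weigh(doc, idf)
    return max(cents, key=lambda lab: cosine(vec, cents[lab]))


# Invented toy data; in the paper's setting the labels would come from page tags.
train = [
    ("the team won the match with a late goal".split(), "sports"),
    ("the coach praised the team after the match".split(), "sports"),
    ("the central bank announced its interest rate decision".split(), "economy"),
    ("the dollar and interest rates moved the markets".split(), "economy"),
]
docs = [d for d, _ in train]
labels = [lab for _, lab in train]
vectors, idf = tfidf_vectors(docs)
cents = centroids(vectors, labels)
print(classify("interest rate decision moved markets".split(), idf, cents))
```

An embedding-based model would replace the sparse term weights with dense word vectors, which is the comparison the abstract reports; that part is not sketched here.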
Item Type: Papers in Conference Proceedings
Divisions: Faculty of Engineering and Natural Sciences > Academic programs > Computer Science & Eng.
Faculty of Engineering and Natural Sciences
Depositing User: Berrin Yanıkoğlu
Date Deposited: 06 Sep 2018 16:04
Last Modified: 03 Jun 2023 20:12
URI: https://research.sabanciuniv.edu/id/eprint/36553
