A Turkish hate speech dataset and detection system

Beyhan, Fatih and Çarık, Buse and Arın, İnanç and Terzioğlu, Ayşecan and Yanıkoğlu, Berrin and Yeniterzi, Reyyan (2022) A Turkish hate speech dataset and detection system. In: 13th Conference on Language Resources and Evaluation (LREC 2022), Marseille, France

Abstract

Social media posts containing hate speech are reproduced and redistributed rapidly, reaching ever larger audiences. We present a machine learning system for the automatic detection of hate speech in Turkish, along with a hate speech dataset consisting of tweets collected in two separate domains. We first adopted a definition of hate speech that is in line with our goals and amenable to easy annotation, then designed the annotation schema for the collected tweets. The Istanbul Convention dataset consists of tweets posted following the withdrawal of Turkey from the Istanbul Convention. The Refugees dataset was created by collecting tweets about immigrants, filtered by commonly used keywords related to immigrants. Finally, we developed a hate speech detection system based on the transformer architecture (BERTurk), to be used as a baseline for the collected dataset. Evaluated with 5-fold cross-validation, the binary classification accuracy is 77% on the Istanbul Convention dataset and 71% on the Refugees dataset. We also tested a regression model, which obtained RMSE values of 0.66 and 0.83 on a scale of [0-4] for the Istanbul Convention and Refugees datasets, respectively.
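
The evaluation protocol named in the abstract (5-fold cross-validation, reporting accuracy for the binary task and RMSE for the regression task) can be illustrated with a minimal sketch. This is not the authors' code: the helper names `k_fold_indices`, `cross_validate`, and the `fit_predict` callback are hypothetical, and the fold assignment (a seeded shuffle with strided slicing) is an assumption made only for illustration.

```python
import math
import random

def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error between gold and predicted scores."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def cross_validate(labels, fit_predict, metric, k=5):
    """Average `metric` over k folds.

    `fit_predict(train_idx, test_idx)` is a hypothetical callback that trains
    on the training indices and returns predictions for the test indices.
    """
    folds = k_fold_indices(len(labels), k)
    scores = []
    for i, test_idx in enumerate(folds):
        # Every fold except the i-th one is used for training.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        preds = fit_predict(train_idx, test_idx)
        scores.append(metric([labels[j] for j in test_idx], preds))
    return sum(scores) / k
```

In this sketch, a real model (for example a fine-tuned transformer classifier) would be wrapped in `fit_predict`; passing `accuracy` corresponds to the binary setting and passing `rmse` to the regression setting on the [0-4] scale.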
Item Type: Papers in Conference Proceedings
Uncontrolled Keywords: Hate speech detection, Deep learning, Turkish
Subjects: Q Science > QA Mathematics > QA076 Computer software
Divisions: Center of Excellence in Data Analytics; Faculty of Arts and Social Sciences; Faculty of Engineering and Natural Sciences
Depositing User: Berrin Yanıkoğlu
Date Deposited: 13 Sep 2022 16:12
Last Modified: 09 Apr 2023 22:13
URI: https://research.sabanciuniv.edu/id/eprint/44383
