Multi-domain hate speech detection using dual contrastive learning and paralinguistic features

Dehghan, Somaiyeh and Yanıkoğlu, Berrin (2024) Multi-domain hate speech detection using dual contrastive learning and paralinguistic features. In: Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024, Hybrid, Torino

Full text not available from this repository.

Abstract

Social networks have become venues where people can share and spread hate speech, especially when the platforms allow users to remain anonymous. Hate speech can have significant social and cultural effects, especially when it targets specific groups of people on the basis of religion, race, ethnicity, or culture, or groups in a specific social situation, such as immigrants and refugees. In this study, we propose a hate speech detection model, BERTurk-DualCL, trained with a mixed objective that combines a contrastive learning loss with the traditional cross-entropy loss used for classification. In addition, we study the effects of paralinguistic features, namely emojis and hashtags, on the performance of our model. We trained and evaluated our model on tweets from two separate datasets covering four heatedly discussed topics, ranging from migrants to the Israel-Palestine conflict. Our multi-domain model outperforms comparable results in the literature and the average results of four domain-specific models, achieving macro-F1 scores of 81.04% and 58.89% on the two- and five-class tasks, respectively.
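As a rough illustration of what a mixed objective of this kind can look like, the sketch below adds a generic supervised contrastive term to the usual cross-entropy term. It is not the BERTurk-DualCL implementation: the abstract does not give the exact loss, so the function names, the weight `lam`, the `temperature` value, and the choice of a plain supervised contrastive loss over sentence embeddings are all assumptions made for this example.

```python
import math


def cross_entropy(logits, label):
    """Cross-entropy for one example, computed from unnormalized logits."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def _normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def supervised_contrastive(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss over a batch.

    Each example is pulled toward same-label examples and pushed away from
    the rest. Anchors with no same-label partner in the batch are skipped.
    """
    z = [_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled cosine similarity to every other example.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def mixed_loss(logits, embeddings, labels, lam=0.5, temperature=0.1):
    """Mean cross-entropy plus a weighted contrastive term.

    `lam` is a hypothetical weighting hyperparameter, not a value from the paper.
    """
    ce = sum(cross_entropy(l, y) for l, y in zip(logits, labels)) / len(labels)
    cl = supervised_contrastive(embeddings, labels, temperature)
    return ce + lam * cl
```

With `lam=0.0` the objective reduces to ordinary cross-entropy, so the contrastive term can be switched off to compare against a plain classification baseline.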
Item Type: Papers in Conference Proceedings
Uncontrolled Keywords: Contrastive Learning; Hate Speech Detection; Turkish Language
Divisions: Center of Excellence in Data Analytics; Faculty of Engineering and Natural Sciences
Depositing User: Berrin Yanıkoğlu
Date Deposited: 30 Jul 2024 11:10
Last Modified: 30 Jul 2024 11:10
URI: https://research.sabanciuniv.edu/id/eprint/49543
