Developing Turkish Language Models on Social Media

Najafi, Ali (2024) Developing Turkish Language Models on Social Media. [Thesis]

Abstract

Turkish is one of the most widely spoken languages in the world, yet it remains a low-resource language. Its wide use on social media platforms such as Twitter, Instagram, and TikTok, together with Turkey's strategic position in world politics, makes it appealing to social network researchers and industry. To address this need, we introduce TurkishBERTweet, the first large-scale pre-trained language model for Turkish social media, built using over 894 million Turkish tweets. The model shares the architecture of the RoBERTa-base model but uses a smaller input length, which makes TurkishBERTweet lighter than BERTurk, the most widely used model, and can give it significantly lower inference time. We trained the model following the RoBERTa training approach and evaluated it on two tasks: sentiment classification and hate speech detection. We demonstrate that TurkishBERTweet outperforms the available alternatives in generalizability, and that its lower inference time gives it a significant advantage when processing large-scale datasets. We also show that custom preprocessors for social media can extract information from platform-specific entities. Finally, we compare TurkishBERTweet with commercial solutions such as OpenAI and Gemini, and with other available Turkish LLMs, in terms of cost and performance, to demonstrate that TurkishBERTweet is scalable and cost-effective.
Item Type: Thesis
Uncontrolled Keywords: TurkishBERTweet, Sentiment Analysis, Hate Speech Detection, ChatGPT, Special Tokenizer
Subjects: T Technology > TK Electrical engineering. Electronics. Nuclear engineering > TK7800-8360 Electronics > TK7885-7895 Computer engineering. Computer hardware
Divisions: Faculty of Engineering and Natural Sciences > Academic programs > Computer Science & Eng.
Faculty of Engineering and Natural Sciences
Depositing User: Dila Günay
Date Deposited: 18 Feb 2025 13:34
Last Modified: 18 Feb 2025 14:09
URI: https://research.sabanciuniv.edu/id/eprint/51396
