Music emotion recognition: a multimodal machine learning approach

Gökalp, Cemre (2019) Music emotion recognition: a multimodal machine learning approach. [Thesis]

PDF: 10234904_CemreGokalp.pdf (Download, 1MB)

Abstract

Music emotion recognition (MER) is an emerging domain within the Music Information Retrieval (MIR) scientific community, and searching for music by emotion is one of the selection criteria most preferred by web users. As the world goes digital, the musical content in online databases such as Last.fm has expanded exponentially, requiring substantial manual effort to manage and keep up to date. The demand for innovative, adaptable search mechanisms that can be personalized to a user's emotional state has therefore gained increasing attention in recent years. This thesis addresses the music emotion recognition problem by presenting several classification models fed by textual features as well as audio attributes extracted from the music. We build both supervised and semi-supervised classification designs across four research experiments that address the emotional role of audio features, such as tempo, acousticness, and energy, as well as the impact of textual features extracted by two different approaches, TF-IDF and Word2Vec. Furthermore, we propose a multimodal approach using a combined feature set consisting of features from the audio content as well as from context-aware data. For this purpose, we generated a ground-truth dataset containing over 1,500 labeled song lyrics, together with an unlabeled corpus of more than 2.5 million Turkish documents, in order to build an accurate automatic emotion classification system. The analytical models were built by applying several algorithms to cross-validated data using Python. The best performance attained with audio features alone was 44.2% accuracy, whereas textual features yielded better results, with accuracy scores of 46.3% and 51.3% under the supervised and semi-supervised learning paradigms, respectively. Finally, although we created a comprehensive feature set combining audio and textual features, this combination did not yield any significant improvement in classification performance.
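
To illustrate the supervised text-based experiment described above, the sketch below pairs TF-IDF lyric features with a linear classifier evaluated by cross-validation in Python. This is a minimal sketch assuming a scikit-learn pipeline; the sample lyrics, emotion labels, and choice of LinearSVC are illustrative assumptions, not the thesis's actual dataset, labels, or models.

# A minimal sketch of a TF-IDF + supervised classifier setup,
# assuming scikit-learn; data below is a hypothetical placeholder,
# not the ground-truth dataset from the thesis.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Hypothetical labeled examples: song lyrics paired with emotion labels.
lyrics = [
    "tears fall down as the night grows cold",
    "dancing all night, the lights are bright",
    "alone again, the silence is heavy",
    "we run free under the summer sun",
] * 5  # repeated so each cross-validation fold has samples of both classes

labels = ["sad", "happy", "sad", "happy"] * 5

# TF-IDF features feeding a linear classifier, scored by cross-validation,
# mirroring the "textual features + supervised learning" experiment.
model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LinearSVC()),
])

scores = cross_val_score(model, lyrics, labels, cv=5)
print(f"mean cross-validated accuracy: {scores.mean():.3f}")

The same pipeline shape extends to the multimodal experiment by concatenating the TF-IDF matrix with audio attributes (tempo, acousticness, energy) before classification, for example via scikit-learn's FeatureUnion or ColumnTransformer.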
Item Type: Thesis
Uncontrolled Keywords: Music emotion recognition. -- Music information retrieval. -- Machine learning. -- Feature selection. -- Multi-modal analysis.
Subjects: H Social Sciences > HD Industries. Land use. Labor
Divisions: Sabancı Business School
Depositing User: IC-Cataloging
Date Deposited: 06 Nov 2019 10:28
Last Modified: 26 Apr 2022 10:32
URI: https://research.sabanciuniv.edu/id/eprint/39455
