Comparison of active learning based hierarchical classification approaches on Twitter
Zaman, Rashid (2015) Comparison of active learning based hierarchical classification approaches on Twitter. [Thesis]
Real-world data is mostly multi-labeled, i.e., it belongs to multiple classes simultaneously, as opposed to single-labeled data, which belongs to a single class. At times these multiple labels fit into a logical hierarchy in which parent labels higher in the hierarchy are generic and the related child labels lower in the hierarchy are more specific. Most machine learning classifiers either serve single-label classification tasks or have been adapted to perform flat multi-label classification. At present, dedicated classifiers for hierarchical classification do not exist. Instead, strategies that rely on single-label classifiers are designed to perform hierarchical classification. Four such strategies are well known in the literature. Hierarchical classification has been studied in many domains, such as text categorization, web page classification and medical diagnosis, and has been found very useful. So far, Twitter has been neglected by researchers from the hierarchical classification perspective. Developing supervised models requires labeled data, and the labeling task consumes human effort, money and time, which limits the amount of data that can be labeled. Active learning, a type of supervised learning, achieves acceptable performance with a minimal amount of labeled data compared to conventional supervised learning models. In active learning, the learner selects the instances to be labeled, which makes it possible to achieve model performance comparable to that of supervised learning with less labeling effort and fewer resources. Active learning is well suited to situations where unlabeled data is abundantly available. Hierarchical classification of tweets, complemented by active learning as a viable labeling mechanism, presents an interesting research problem. We implemented the four prevailing hierarchical classification approaches with active learning for the Twitter domain. Based on the results, we can safely say that active learning is equally beneficial in the Twitter domain.
Comparing the results of the four approaches, hierarchical prediction through flat classification, combined with active learning, outperforms the other three.
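
The two ideas combined in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the label hierarchy, the tweet identifiers, the probabilities and the function names `expand_to_path` and `least_confident` are all hypothetical, and least-confidence sampling is only an assumed query strategy, since the abstract does not say which one was used. The sketch reads "hierarchical prediction through flat classification" as predicting the most specific label with a flat classifier and then deriving its ancestors from the hierarchy.

```python
# Hypothetical label hierarchy: child -> parent (None marks a top-level label).
PARENT = {
    "sports": None,
    "football": "sports",
    "tennis": "sports",
    "politics": None,
    "elections": "politics",
}


def expand_to_path(leaf):
    """Return the labels from the top of the hierarchy down to `leaf`."""
    path = []
    node = leaf
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path[::-1]


def least_confident(pool_probs, k):
    """Pick the k unlabeled items whose highest class probability is lowest.

    `pool_probs` maps an item id to the flat classifier's probabilities
    over the leaf labels (assumed to be available from the current model).
    """
    ranked = sorted(pool_probs, key=lambda item: max(pool_probs[item].values()))
    return ranked[:k]


# Made-up probabilities for three unlabeled tweets.
pool = {
    "t1": {"football": 0.90, "tennis": 0.05, "elections": 0.05},
    "t2": {"football": 0.40, "tennis": 0.35, "elections": 0.25},
    "t3": {"football": 0.20, "tennis": 0.10, "elections": 0.70},
}

to_label = least_confident(pool, 1)        # tweet the model is least sure about
full_labels = expand_to_path("football")   # leaf prediction plus its ancestors
```

In an active learning loop, the items returned by `least_confident` would be sent to a human annotator, added to the training set, and the flat classifier retrained before the next round of selection.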