Thammasiri, Dech and Delen, Dursun and Meesad, Phayung and Kasap, Nihat (2014) A critical assessment of imbalanced class distribution problem: the case of predicting freshmen student attrition. Expert Systems with Applications, 41 (2). pp. 321-330. ISSN 0957-4174
This is the latest version of this item.
PDF (This is a RoMEO green journal: the author can archive the pre-print (i.e., pre-refereeing) and the post-print (i.e., final draft post-refereeing))
Kasap_ESWA_2014.pdf
Download (1MB)
Official URL: http://dx.doi.org/10.1016/j.eswa.2013.07.046
Abstract
Predicting student attrition is an intriguing yet challenging problem for any academic institution. Class-imbalanced data is common in the field of student retention, mainly because many students register but fewer students drop out. Classification techniques applied to imbalanced datasets can yield deceptively high prediction accuracy, where the overall predictive accuracy is usually driven by the majority class at the expense of very poor performance on the crucial minority class. In this study, we compared different data-balancing techniques to improve the predictive accuracy for the minority class while maintaining satisfactory overall classification performance. Specifically, we tested three balancing techniques (oversampling, under-sampling, and synthetic minority over-sampling, or SMOTE) along with four popular classification methods (logistic regression, decision trees, neural networks, and support vector machines). We used a large, feature-rich institutional student dataset (between the years 2005 and 2011) to assess the efficacy of both the balancing techniques and the prediction methods. The results indicated that the support vector machine combined with the SMOTE data-balancing technique achieved the best classification performance, with a 90.24% overall accuracy on the 10-fold holdout sample. All three data-balancing techniques improved the prediction accuracy for the minority class. Applying sensitivity analyses to the developed models, we also identified the most important variables for accurate prediction of student attrition. Application of these models has the potential to accurately predict at-risk students and help reduce student dropout rates.
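As a rough illustration of the ideas in the abstract, the following sketch shows why overall accuracy can mislead on imbalanced data and what the three balancing strategies do to class counts. It is not the authors' implementation: the function names, the toy two-feature data, and the 90/10 class split are all assumptions made for this example, and `smote_like` is a simplified interpolation in the spirit of SMOTE rather than a reference implementation.

```python
import random


def accuracy_and_minority_recall(y_true, y_pred, minority_label=1):
    """Return (overall accuracy, recall on the minority class)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    minority_total = sum(t == minority_label for t in y_true)
    minority_hits = sum(t == p == minority_label for t, p in zip(y_true, y_pred))
    return correct / len(y_true), minority_hits / minority_total


def random_oversample(majority, minority, rng):
    """Duplicate randomly chosen minority rows until both classes are the same size."""
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority, minority + extra


def random_undersample(majority, minority, rng):
    """Keep a random subset of majority rows equal in size to the minority class."""
    return rng.sample(majority, len(minority)), minority


def smote_like(majority, minority, rng, k=3):
    """Add synthetic minority rows by interpolating between a minority row
    and one of its k nearest original minority neighbours (simplified sketch)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    while len(minority) + len(synthetic) < len(majority):
        base = rng.choice(minority)
        neighbours = sorted(
            (m for m in minority if m is not base),
            key=lambda m: dist2(base, m),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return majority, minority + synthetic


# Hypothetical toy data: 90 "retained" rows and 10 "dropout" rows, two features each.
rng = random.Random(42)
retained = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(90)]
dropout = [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(10)]

# A model that always predicts the majority class looks accurate
# but never identifies a single dropout.
y_true = [0] * len(retained) + [1] * len(dropout)
y_pred = [0] * len(y_true)
print(accuracy_and_minority_recall(y_true, y_pred))

# Class counts after each balancing strategy.
for balance in (random_oversample, random_undersample, smote_like):
    maj, mino = balance(retained, dropout, rng)
    print(balance.__name__, len(maj), len(mino))
```

In practice, balancing of this kind would normally be applied only to the training folds, so that the evaluation data keeps the original class distribution.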
Item Type: | Article |
---|---|
Uncontrolled Keywords: | Student retention, Attrition, Prediction, Imbalanced class distribution, SMOTE, Sampling, Sensitivity analysis |
Divisions: | Sabancı Business School; Sabancı Business School > Operations Management and Information Systems |
Depositing User: | Nihat Kasap |
Date Deposited: | 12 Nov 2013 17:59 |
Last Modified: | 26 Apr 2022 09:06 |
URI: | https://research.sabanciuniv.edu/id/eprint/22035 |
Available Versions of this Item
- A critical assessment of imbalanced class distribution problem: the case of predicting freshmen student attrition. (deposited 22 Sep 2013 22:53)
- A critical assessment of imbalanced class distribution problem: the case of predicting freshmen student attrition. (deposited 12 Nov 2013 17:59) [Currently Displayed]