Effect of dataset reduction techniques on computational complexity and predictive performance of classification problem

Akkaş, Suat (2025) Effect of dataset reduction techniques on computational complexity and predictive performance of classification problem. [Thesis]

Abstract

The use of big data in industry is increasing day by day, and the financial industry is no exception. In the financial sector, big data has led to substantial improvements in problems such as credit scoring. However, big data also greatly increases computation time and the consumption of available resources, which makes its use inefficient in some applications and situations.

To address this inefficiency, this study focuses on sampling methods. By applying row-wise sampling algorithms and dimensionality reduction to the data, we aim to reduce the computation time needed to solve credit scoring problems. Our aim is not only to reduce computation time but also to maintain the performance of credit scoring models when big data is used. We also apply feature selection and transformation algorithms to observe how they affect predictive power at different sample sizes. Moreover, to validate whether a sample represents the main dataset, we use a set of similarity metrics suited to the different data types present in the dataset.

Using this methodology, we observe the relationship between computation time, predictive power, and data representativeness for different sample sizes. According to our findings, it is possible to preserve the predictive power of the models down to a certain sample size while significantly reducing the amount of computation. By demonstrating the relationship between computation time and predictive power for different sample sizes and feature reduction methods, we aim to propose a sample size and a choice of feature reduction method suited to one's main concerns.
Item Type: Thesis
Uncontrolled Keywords: Sampling, Dimensionality Reduction, Similarity, Classification, Computational Performance.
Divisions: Faculty of Engineering and Natural Sciences
Depositing User: Dila Günay
Date Deposited: 22 Apr 2025 12:22
Last Modified: 22 Apr 2025 12:22
URI: https://research.sabanciuniv.edu/id/eprint/51787
