Mixed-integer exponential cone programming in action: sparse logistic regression and optimal histogram construction

Asgharieh Ahari, Sahand (2020) Mixed-integer exponential cone programming in action: sparse logistic regression and optimal histogram construction. [Thesis]

PDF (785Kb)

Official URL: https://risc01.sabanciuniv.edu/record=b2486368 (Table of contents)

Abstract

In this study, two problems, namely Feature Subset Selection in Logistic Regression and Optimal Histogram Construction, are formulated and solved using the solver MOSEK. The common characteristic of both problems is that their objective functions are exponential cone representable. In the first problem, a prediction model is derived to predict a dichotomous dependent variable from labeled datasets, a task known as classification in the context of machine learning. Different versions of the model are derived by means of regularization and goodness-of-fit measures, including the Akaike Information Criterion, the Bayesian Information Criterion, and Adjusted McFadden. Furthermore, the performance of these versions is evaluated over a set of toy examples and benchmark datasets. The second model is developed to find the optimal bin width of histograms with the aim of minimizing the Kullback–Leibler divergence, which is called information gain in machine learning. The success of the proposed model is demonstrated on randomly generated instances from different probability distributions, including the Normal, Gamma, and Poisson distributions.
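As a rough illustration of the first model (a minimal sketch, not the thesis code: the CVXPY modeling layer, the big-M bound M, the cardinality limit k, and the synthetic data below are all assumptions), feature subset selection in logistic regression can be written as a mixed-integer program whose logistic loss is exponential cone representable, so a conic solver such as MOSEK can handle it:

    # Sketch: sparse logistic regression as a mixed-integer exponential cone program.
    # Binary indicators z with a big-M constraint enforce that at most k features
    # have nonzero coefficients. Requires MOSEK (with a license) installed for CVXPY.
    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(0)
    m, n, k = 100, 10, 3                  # samples, features, max selected features
    X = rng.standard_normal((m, n))
    true_beta = np.zeros(n)
    true_beta[:k] = [1.5, -2.0, 1.0]      # ground truth uses only k features
    y = (rng.random(m) < 1 / (1 + np.exp(-X @ true_beta))).astype(float)

    beta = cp.Variable(n)                 # regression coefficients
    z = cp.Variable(n, boolean=True)      # z[j] = 1 iff feature j is selected
    M = 10.0                              # assumed big-M bound on |beta[j]|

    # Negative log-likelihood of logistic regression; cp.logistic(t) = log(1 + exp(t)),
    # which CVXPY represents with exponential cones.
    loss = cp.sum(cp.logistic(X @ beta)) - y @ (X @ beta)

    constraints = [beta <= M * z, beta >= -M * z, cp.sum(z) <= k]
    prob = cp.Problem(cp.Minimize(loss), constraints)
    prob.solve(solver=cp.MOSEK)           # mixed-integer exponential cone solve
    print("selected features:", np.flatnonzero(z.value > 0.5))

The cardinality constraint sum(z) <= k plays the role of the sparsity-inducing regularization described in the abstract; swapping the hard limit for a penalty on sum(z) would give the AIC/BIC-style variants.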
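For the second model, the following sketch only brute-forces equal-width bin counts and scores each candidate by the Kullback–Leibler divergence between the histogram estimate and the known generating density; the thesis instead selects bins through an exponential cone optimization model, which is not reproduced here. The Normal target, the sample size, and the candidate bin counts are assumptions:

    # Sketch: score histogram bin counts by KL(true || histogram), assuming the
    # generating density (standard Normal) is known.
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(1)
    samples = rng.normal(loc=0.0, scale=1.0, size=2000)

    def kl_to_true(samples, bins):
        """Approximate KL divergence from the histogram to the true Normal density."""
        density, edges = np.histogram(samples, bins=bins, density=True)
        widths = np.diff(edges)
        mids = 0.5 * (edges[:-1] + edges[1:])
        p = norm.pdf(mids) * widths       # true probability mass per bin
        q = density * widths              # histogram probability mass per bin
        mask = (p > 0) & (q > 0)          # skip empty bins to keep log finite
        return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

    scores = {b: kl_to_true(samples, b) for b in (5, 10, 20, 40, 80)}
    best = min(scores, key=scores.get)
    print("KL by bin count:", scores, "-> best:", best)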

Item Type:Thesis
Uncontrolled Keywords:Mixed-integer programming; exponential cone programming; sparse logistic regression; feature subset selection; optimal histogram construction; Kullback–Leibler divergence; MOSEK
Subjects:T Technology > T Technology (General) > T055.4-60.8 Industrial engineering. Management engineering
ID Code:41190
Deposited By:IC-Cataloging
Deposited On:25 Oct 2020 11:30
Last Modified:25 Oct 2020 11:30
