Koçhan, Didem (2018) Scalable Monte Carlo inference in regression models with missing data. [Thesis]
PDF
10207838_DidemKochan.pdf
Download (1MB)
Abstract
Markov chain Monte Carlo (MCMC) and Stochastic Gradient Langevin Dynamics (SGLD) algorithms form the basis of this thesis. These methods are studied in detail and combined to handle large, incomplete datasets. Two algorithms, based on Metropolis-Hastings (MH) and SGLD, are proposed to improve the performance of regression with missing data. We introduce an SGLD algorithm for large datasets with missing portions. The algorithm approximates the gradient of the log-likelihood of a subset of the data with respect to the unknown parameter, using samples of the missing components obtained with MH moves. We implemented these methods for a logistic regression model to obtain parameter estimates. We worked with two different datasets with missing features and compared their performance. The first dataset is artificially generated from a logistic regression model in which the features are normally distributed, whereas the second is a real categorical dataset.
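The SGLD-with-MH idea in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation. It assumes independent standard-normal features, a zero-mean Gaussian prior on the parameter, random-walk MH proposals for the missing entries, and labels in {0, 1}. The function names (`mh_impute`, `sgld_missing`) and all tuning values are hypothetical.

```python
import numpy as np

def sigmoid(z):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def mh_impute(X, y, miss, theta, rng, n_steps=5, step=0.5):
    """Random-walk MH moves on the missing entries of each row.

    Assumed target per row: p(x_mis | x_obs, y, theta), proportional to
    N(x_mis; 0, I) * p(y | x, theta). Rows are independent, so every
    row gets its own accept/reject decision.
    """
    X = X.copy()

    def log_target(Z):
        z = Z @ theta
        loglik = y * z - np.logaddexp(0.0, z)           # Bernoulli log-likelihood
        logprior = -0.5 * np.sum(np.where(miss, Z**2, 0.0), axis=1)
        return loglik + logprior

    cur = log_target(X)
    for _ in range(n_steps):
        # Perturb only the missing coordinates; observed ones stay fixed.
        prop = X + step * rng.standard_normal(X.shape) * miss
        new = log_target(prop)
        accept = np.log(rng.uniform(size=len(y))) < new - cur
        X[accept] = prop[accept]
        cur = np.where(accept, new, cur)
    return X

def sgld_missing(X, y, miss, n_iter=2000, batch=50, eps=1e-3,
                 prior_var=10.0, seed=0):
    """SGLD on theta, with missing features refreshed by MH in each minibatch."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    X_fill = np.where(miss, 0.0, X)      # start missing entries at the prior mean
    theta = np.zeros(d)
    samples = np.empty((n_iter, d))
    for t in range(n_iter):
        idx = rng.choice(N, size=batch, replace=False)
        # Refresh the imputed values for this minibatch given the current theta.
        X_fill[idx] = mh_impute(X_fill[idx], y[idx], miss[idx], theta, rng)
        xb, yb = X_fill[idx], y[idx]
        # Complete-data log-likelihood gradient, evaluated at the imputed features.
        grad_lik = xb.T @ (yb - sigmoid(xb @ theta))
        grad = -theta / prior_var + (N / batch) * grad_lik
        # Langevin step: half step size times gradient, plus N(0, eps) noise.
        theta = theta + 0.5 * eps * grad + np.sqrt(eps) * rng.standard_normal(d)
        samples[t] = theta
    return samples
```

Here `X` is an N-by-d feature array and `miss` is a boolean mask of the same shape marking missing entries (their values in `X` are ignored). The returned array holds one parameter draw per iteration. In practice an initial portion would usually be discarded as burn-in before averaging.
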
Item Type: | Thesis |
---|---|
Uncontrolled Keywords: | Industrial and Industrial Engineering. -- Endüstri ve Endüstri Mühendisliği. |
Subjects: | T Technology > T Technology (General) > T055.4-60.8 Industrial engineering. Management engineering |
Divisions: | Faculty of Engineering and Natural Sciences > Academic programs > Industrial Engineering; Faculty of Engineering and Natural Sciences |
Depositing User: | IC-Cataloging |
Date Deposited: | 05 Oct 2018 11:33 |
Last Modified: | 26 Apr 2022 10:26 |
URI: | https://research.sabanciuniv.edu/id/eprint/36605 |