Feature selection using genetic algorithms

Barlak, Eda Sevim (2007) Feature selection using genetic algorithms. [Thesis]

Abstract

Microarray data are very important for the identification of complex diseases and the development of diagnostic kits, and they are of considerable help to cancer research in particular. A large number of biological and medical researchers therefore have to deal with datasets obtained from microarray experiments. Using these huge datasets in full is not efficient in terms of time and cost, so many researchers work on tumor classification through more effective use of microarray technologies. Obtaining the most relevant subset of signature genes, that is, genes that take part in the pathway of a given disease and are therefore capable of classifying the entire data, is crucial for correct disease diagnosis. There are several approaches to this classification problem in the literature. In this thesis, we present an approach that uses Genetic Algorithms for this feature subset selection problem. The Genetic Algorithm is combined with Support Vector Machines, which are used to calculate the classification accuracy of each gene. These classification accuracies serve as the survival probabilities of the genes in our algorithm: genes with higher classification accuracy have a higher probability of surviving. Three different real-life cancer datasets are used for the tests. Our algorithm converged to better results than all other approaches in the literature. On the colon tumor dataset, we were able to classify the entire data with 100% accuracy using only 4 features (genes). On the prostate cancer dataset, we classified the data with 100% accuracy using 3 features. Finally, we tested our Genetic Algorithm on an ovarian cancer dataset and found only 3 significant features out of 15154 genes, again with 100% accuracy.
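The selection loop described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not give the encoding, operators, or parameters, so everything here is an assumption. Each individual is assumed to be a small subset of feature indices, survival is fitness-proportional (roulette wheel), and a leave-one-out nearest-centroid classifier stands in for the Support Vector Machine so that the example runs with the standard library alone. The names `loo_accuracy` and `select_features`, the parameter values, and the toy data are all hypothetical.

```python
import random

def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier restricted to
    `features`. Stand-in for the SVM accuracy named in the abstract."""
    correct = 0
    for i in range(len(X)):
        # Per-class sums and counts over every sample except the held-out one.
        sums, counts = {}, {}
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue
            s = sums.setdefault(label, [0.0] * len(features))
            for k, f in enumerate(features):
                s[k] += row[f]
            counts[label] = counts.get(label, 0) + 1

        def dist(label):
            # Squared distance from the held-out sample to a class centroid.
            return sum((X[i][f] - sums[label][k] / counts[label]) ** 2
                       for k, f in enumerate(features))

        correct += min(sums, key=dist) == y[i]
    return correct / len(X)

def select_features(X, y, subset_size=3, pop_size=20, generations=30,
                    mutation_rate=0.2, seed=0):
    """Assumed GA: individuals are feature subsets, fitness is accuracy."""
    rng = random.Random(seed)
    n_features = len(X[0])
    population = [rng.sample(range(n_features), subset_size)
                  for _ in range(pop_size)]
    best, best_fit = None, -1.0
    for _ in range(generations):
        fitness = [loo_accuracy(X, y, ind) for ind in population]
        for ind, fit in zip(population, fitness):
            if fit > best_fit:
                best, best_fit = sorted(ind), fit
        # Roulette-wheel selection: survival probability grows with accuracy.
        weights = [f + 1e-9 for f in fitness]
        parents = rng.choices(population, weights=weights, k=pop_size)
        children = []
        for a, b in zip(parents[::2], parents[1::2]):
            pool = list(set(a) | set(b))  # crossover draws from both parents
            for _ in range(2):
                child = rng.sample(pool, subset_size)
                if rng.random() < mutation_rate:
                    # Mutation: swap one feature for one not yet in the child.
                    unused = [f for f in range(n_features) if f not in child]
                    child[rng.randrange(subset_size)] = rng.choice(unused)
                children.append(child)
        population = children
    return best, best_fit

# Toy data (made up): 12 samples, 10 features, only feature 7 is informative.
data_rng = random.Random(1)
y = [0] * 6 + [1] * 6
X = [[data_rng.random() for _ in range(10)] for _ in y]
for row, label in zip(X, y):
    row[7] = 10.0 * label

best, acc = select_features(X, y)
print(best, acc)
```

In practice the fitness function would be replaced by the cross-validated accuracy of an actual SVM on the microarray data; the stand-in only keeps the sketch dependency-free.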
Item Type: Thesis
Uncontrolled Keywords: Genetic algorithms. -- Feature selection. -- Colon cancer. -- Prostate cancer. -- Ovarian cancer. -- Genetik algoritmalar. -- Özellik altkümesi seçimi. -- Kolon kanseri. -- Prostat kanseri. -- Ovaryan kanseri
Subjects: T Technology > TA Engineering (General). Civil engineering (General)
Divisions: Faculty of Engineering and Natural Sciences > Academic programs > Biological Sciences & Bio Eng.
Faculty of Engineering and Natural Sciences
Depositing User: IC-Cataloging
Date Deposited: 20 May 2008 15:26
Last Modified: 26 Apr 2022 09:49
URI: https://research.sabanciuniv.edu/id/eprint/8518
