Feature selection using genetic algorithms

Barlak, Eda Sevim (2007) Feature selection using genetic algorithms. [Thesis]


Official URL: http://risc01.sabanciuniv.edu/record=b1221504 (Table of Contents)

Abstract

Microarray data are important for identifying complex diseases and developing diagnostic kits, and they are of particular help to cancer research. A considerable number of biological and medical researchers therefore have to deal with the datasets obtained from microarray experiments. Using these huge datasets in full is inefficient in terms of time and cost, so many researchers work on tumor classification through more effective use of microarray technologies. For accurate disease diagnosis it is crucial to obtain the most relevant subset of signature genes: genes that are included in the pathway of a given disease and are therefore capable of classifying the entire data. There are several approaches in the literature for this classification purpose. In this thesis, we present an approach that uses Genetic Algorithms for this feature subset selection problem. The Genetic Algorithm is combined with Support Vector Machines, which calculate the classification accuracy of each gene. These classification accuracies serve as the survival probabilities of the genes in our algorithm: genes with higher classification accuracy have a higher probability of surviving. Three real-life cancer datasets are used for the tests. Our algorithm converged to better results than all other approaches in the literature. On the colon tumor dataset, which is one of our test datasets, we classified the entire data with 100% accuracy using only 4 features (genes). On the prostate cancer dataset, we classified the data with 100% accuracy using 3 features. Finally, on an ovarian cancer dataset, we found only 3 significant features out of 15154 genes, again with 100% accuracy.
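The selection loop described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: it assumes each individual in the population is a small subset of feature indices, that fitness is the classification accuracy obtained with that subset, and that parents are drawn with probability proportional to fitness. To stay dependency-free, a leave-one-out nearest-centroid classifier stands in for the Support Vector Machine. All names (loo_centroid_accuracy, select_features, make_toy_data) and all parameter values are hypothetical.

```python
import random


def loo_centroid_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier restricted to
    the given feature columns. Assumption: stand-in for the SVM accuracy."""
    if not features:
        return 0.0
    correct = 0
    for i in range(len(X)):
        sums, counts = {}, {}
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue  # hold out sample i
            acc = sums.setdefault(label, [0.0] * len(features))
            for k, f in enumerate(features):
                acc[k] += row[f]
            counts[label] = counts.get(label, 0) + 1

        def dist(label):
            # squared distance from the held-out sample to a class centroid
            return sum((X[i][f] - sums[label][k] / counts[label]) ** 2
                       for k, f in enumerate(features))

        if min(sums, key=dist) == y[i]:
            correct += 1
    return correct / len(X)


def select_features(X, y, subset_size=2, pop_size=20, generations=30,
                    mutation_rate=0.2, seed=0):
    """Search for a feature subset of fixed size with high accuracy.
    Requires subset_size <= number of features."""
    rng = random.Random(seed)
    n_features = len(X[0])
    population = [rng.sample(range(n_features), subset_size)
                  for _ in range(pop_size)]
    best, best_fit = None, -1.0
    for _ in range(generations):
        fits = [loo_centroid_accuracy(X, y, ind) for ind in population]
        for ind, fit in zip(population, fits):
            if fit > best_fit:
                best, best_fit = sorted(ind), fit
        # Fitness-proportional selection: accuracy acts as survival weight.
        # The small constant keeps the weights valid if every fitness is 0.
        weights = [f + 1e-9 for f in fits]
        next_pop = []
        while len(next_pop) < pop_size:
            p1, p2 = rng.choices(population, weights=weights, k=2)
            # Crossover: the child draws its features from both parents.
            pool = sorted(set(p1) | set(p2))
            child = rng.sample(pool, subset_size)
            # Mutation: swap one feature for one not already in the child.
            if rng.random() < mutation_rate:
                candidates = [f for f in range(n_features) if f not in child]
                if candidates:
                    child[rng.randrange(subset_size)] = rng.choice(candidates)
            next_pop.append(child)
        population = next_pop
    return best, best_fit


def make_toy_data(n_samples=12, n_features=6, informative=2):
    """Synthetic data (not a real microarray set): one column separates the
    two classes, the remaining columns are label-independent noise."""
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [((i * 7 + f * 3) % 5) / 5 for f in range(n_features)]
        row[informative] = 10.0 * label
        X.append(row)
        y.append(label)
    return X, y
```

Calling select_features(*make_toy_data()) returns a sorted list of feature indices and the accuracy reached with them. To move closer to the setup named in the abstract, the fitness function could be replaced by the cross-validated accuracy of an SVM trained on the selected columns.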

Item Type: Thesis
Uncontrolled Keywords: Genetic algorithms. -- Feature selection. -- Colon cancer. -- Prostate cancer. -- Ovarian cancer. -- Genetik algoritmalar. -- Özellik altkümesi seçimi. -- Kolon kanseri. -- Prostat kanseri. -- Ovaryan kanseri
Subjects: T Technology > TA Engineering (General). Civil engineering (General)
ID Code: 8518
Deposited By: IC-Cataloging
Deposited On: 20 May 2008 15:26
Last Modified: 25 Dec 2008 13:20
