Classification of proteins using sequential and structural features

Albayrak, Aydın (2011) Classification of proteins using sequential and structural features. [Thesis]


Abstract

Classification of proteins is an important process in many areas of bioinformatics research. In this thesis, we devised three strategies for classifying proteins with high accuracy, which may have implications for function and attribute annotation. First, protein families were classified into functional subtypes with a classification-via-clustering approach that uses a relative complexity measure with reduced amino acid alphabets (RAAAs). The devised procedure does not require multiple sequence alignment and produces high classification accuracies. Second, combinations of fixed-length motifs and RAAAs were used as features to represent proteins from different thermostability classes. A t-test-based dimensionality reduction scheme was applied to reduce the number of features, and the retained features were used to develop support vector machine classifiers. The devised procedure produced better results with fewer features than using the native protein alphabet alone. Third, a non-homologous protein structure dataset containing hyperthermophilic, thermophilic, and mesophilic proteins was assembled de novo. Comprehensive statistical analyses of the dataset were carried out to highlight novel features correlated with increased thermostability, and machine learning approaches were used to discriminate the proteins. For the first time, our results strongly indicate that combined sequential and structural features are better predictors of protein thermostability than purely sequential or purely structural features. Furthermore, the discrimination capability of machine learning models strongly depends on the RAAAs used.
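
As a rough illustration of the second strategy, the sketch below maps a protein sequence onto a reduced amino acid alphabet and computes normalized fixed-length motif frequencies that could serve as classifier features. The six-group alphabet, the group labels, and the function names reduce_sequence and motif_features are assumptions made for this example; the abstract does not specify the alphabets or motif lengths used in the thesis.

```python
from collections import Counter
from itertools import product

# Hypothetical six-group reduction by rough physicochemical class.
# This grouping is an assumption for illustration, not the thesis's alphabet.
REDUCED_GROUPS = {
    "A": "AVLIMC",  # aliphatic / hydrophobic
    "F": "FWYH",    # aromatic
    "S": "STNQ",    # polar, uncharged
    "K": "KR",      # positively charged
    "D": "DE",      # negatively charged
    "G": "GP",      # glycine and proline
}
RESIDUE_TO_GROUP = {
    residue: label
    for label, members in REDUCED_GROUPS.items()
    for residue in members
}


def reduce_sequence(seq):
    """Rewrite a sequence in the reduced alphabet; unknown residues become 'X'."""
    return "".join(RESIDUE_TO_GROUP.get(r, "X") for r in seq.upper())


def motif_features(seq, k=2):
    """Return normalized frequencies of all length-k motifs over the reduced alphabet."""
    reduced = reduce_sequence(seq)
    windows = (reduced[i:i + k] for i in range(len(reduced) - k + 1))
    # Ignore windows that contain an unknown residue.
    counts = Counter(w for w in windows if "X" not in w)
    total = sum(counts.values())
    alphabet = sorted(REDUCED_GROUPS)
    return {
        "".join(m): (counts["".join(m)] / total if total else 0.0)
        for m in product(alphabet, repeat=k)
    }


features = motif_features("MKVLAAGIVDE", k=2)
```

With a six-letter alphabet and k=2 this yields 36 features instead of the 400 dipeptide features of the native 20-letter alphabet, which is the kind of reduction in feature count the abstract refers to. A feature selection step and a support vector machine would then operate on these vectors.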
Item Type: Thesis
Uncontrolled Keywords: Protein classification. -- Support vector machines. -- Classification-via-clustering. -- Thermostability. -- Protein families. -- Sequential and structural features. -- Machine learning. -- Relative complexity measure. -- Reduced amino acid alphabets. -- LibSVM.
Subjects: T Technology > TA Engineering (General). Civil engineering (General) > TA164 Bioengineering
Divisions: Faculty of Engineering and Natural Sciences > Academic programs > Biological Sciences & Bio Eng.
Faculty of Engineering and Natural Sciences
Depositing User: IC-Cataloging
Date Deposited: 17 Mar 2015 16:27
Last Modified: 26 Apr 2022 10:04
URI: https://research.sabanciuniv.edu/id/eprint/26780
