Discovery of amino acid compositions and motifs responsible for topological transitions in protein complexes

Ekmen, Erhan (2021) Discovery of amino acid compositions and motifs responsible for topological transitions in protein complexes. [Thesis]

Structural classes of proteins have been predicted from a range of features, including amino acid composition (AAC), sequence information, structural motifs and amino acid coordinates. Some studies have shown that AAC alone is enough to predict, with high accuracy, structural classes such as α, β, α+β and α/β, as well as whether a protein is a monomer or a dimer. These studies imply that evolution has shaped AAC according to the secondary and quaternary structure preferences of proteins. In this study, we use AACs to predict the topological preferences of protein complexes by applying several machine learning (ML) models. We used k-Nearest Neighbor (kNN) and Support Vector Machine (SVM) algorithms with AAC as the only feature to predict the secondary and quaternary structural classes of proteins. With a multiclass model, we predicted the five secondary structural classes (α, β, α+β, α/β, s) with an average F1-score of 0.65. For quaternary structural classes of complexes with four subunits, distinctive complexes with higher symmetry were predicted more robustly, up to an F1-score of 0.86, and proteins in two virus capsid structure classes with different symmetry were predicted up to an F1-score of 0.89, showing how effective a simple protein feature is for predicting the quaternary structure of protein complexes. To gain a physics-based understanding of these findings, we modeled the chains with the two-letter H/P (Hydrophobic/Polar) alphabet and detected unique sequences of 10-16 letters belonging to different quaternary topologies. We applied coarse-grained Dissipative Particle Dynamics (DPD) simulations to complexes containing repetitions of these sequences and found associations unique to the sequences. Thus, although AACs are effective in the formation of quaternary structures, sequences that create special hydrophobic patches at the interface determine the topological details.
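As a rough illustration of the two sequence representations named in the abstract, the sketch below computes a 20-dimensional AAC vector, reduces a sequence to the H/P alphabet, and lists H/P k-mers found in only one class. It is not the thesis implementation: the function names (`aac`, `to_hp`, `unique_kmers`) are hypothetical, and the hydrophobic residue set is an assumed, commonly used choice that may differ from the assignment used in the thesis.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes

# Assumption: one common hydrophobic set; the thesis may assign H/P differently.
HYDROPHOBIC = set("ACFILMVW")


def aac(sequence):
    """Return the amino acid composition as 20 fractions, ordered as AMINO_ACIDS."""
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return [counts[a] / total for a in AMINO_ACIDS]


def to_hp(sequence):
    """Map each standard residue to 'H' or 'P'; non-standard symbols are skipped."""
    return "".join(
        "H" if r in HYDROPHOBIC else "P"
        for r in sequence.upper()
        if r in AMINO_ACIDS
    )


def unique_kmers(hp_by_class, k):
    """Return, per class label, the H/P k-mers that occur in no other class."""
    kmers = {
        label: {s[i:i + k] for s in seqs for i in range(len(s) - k + 1)}
        for label, seqs in hp_by_class.items()
    }
    result = {}
    for label, own in kmers.items():
        others = set().union(*(v for other, v in kmers.items() if other != label))
        result[label] = own - others
    return result
```

The AAC vectors from `aac` could serve as the only input features to a kNN or SVM classifier, and `unique_kmers` could be called with `k` between 10 and 16 to mirror the motif lengths mentioned above. Both uses are assumptions about how such a pipeline might be arranged.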
Item Type: Thesis
Uncontrolled Keywords: amino acid composition. -- secondary/quaternary structure. -- k-nearest neighbor. -- support vector machine. -- H/P model. -- dissipative particle dynamics.
Subjects: T Technology > TA Engineering (General). Civil engineering (General) > TA164 Bioengineering
Divisions: Faculty of Engineering and Natural Sciences > Academic programs > Biological Sciences & Bio Eng.
Faculty of Engineering and Natural Sciences
Depositing User: IC-Cataloging
Date Deposited: 18 Nov 2021 14:45
Last Modified: 26 Apr 2022 10:40
