Discovery of amino acid compositions and motifs responsible for topological transitions in protein complexes
Ekmen, Erhan (2021) Discovery of amino acid compositions and motifs responsible for topological transitions in protein complexes. [Thesis]
Prediction of the structural classes of proteins has been pursued using various protein features such as amino acid composition (AAC), sequence information, structural motifs and amino acid coordinates. Some studies have shown that AAC alone is enough to predict, with high accuracy, structural classes such as α, β, α+β and α/β, as well as whether a protein is a monomer or a dimer. These studies imply that evolution has shaped AAC according to the secondary and quaternary structure preferences of proteins. In this study, we use AACs to predict the topological preferences of protein complexes by applying several machine learning (ML) models. We used k-Nearest Neighbor (kNN) and Support Vector Machine (SVM) algorithms with AAC as the only feature to predict the secondary and quaternary structural classes of proteins. We predicted the five secondary structural classes (α, β, α+β, α/β, s) of proteins with an average F1-score of 0.65 using a multiclass model. For quaternary structural classes of complexes with four subunits, distinctive complexes with higher symmetry were predicted more robustly, up to an F1-score of 0.86, and proteins in two virus capsid structure classes with different symmetry were predicted up to an F1-score of 0.89, showing that a simple protein feature is informative about the quaternary structure of protein complexes. To gain a physics-based understanding of these findings, we modeled the chains with a two-letter H/P (Hydrophobic/Polar) alphabet and detected unique sequences, 10 to 16 letters long, belonging to different quaternary topologies. We applied coarse-grained Dissipative Particle Dynamics (DPD) simulations to complexes containing repetitions of these sequences and found associations unique to the sequences. Thus, although AACs are effective in the formation of quaternary structures, sequences that create special hydrophobic patches at the interface determine the topological details.
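As an illustration of the two representations named in the abstract, the following minimal Python sketch computes an AAC feature vector and reduces a sequence to the H/P alphabet. It is not the thesis code. The function names (`aac`, `to_hp`, `hp_windows`) are hypothetical, and the set of residues treated as hydrophobic is an assumption, since the abstract does not state which H/P assignment was used.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# ASSUMPTION: one common choice of hydrophobic residues; the assignment
# actually used in the thesis is not given in the abstract.
HYDROPHOBIC = set("AVLIMFWC")


def aac(sequence):
    """Return the 20-dimensional amino acid composition (fractions summing to 1)."""
    counts = Counter(r for r in sequence.upper() if r in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return [counts[a] / total for a in AMINO_ACIDS]


def to_hp(sequence):
    """Map a sequence to the two-letter H/P alphabet, skipping non-standard residues."""
    return "".join(
        "H" if r in HYDROPHOBIC else "P"
        for r in sequence.upper()
        if r in AMINO_ACIDS
    )


def hp_windows(hp, min_len=10, max_len=16):
    """Count every H/P substring whose length lies between min_len and max_len."""
    counts = Counter()
    for k in range(min_len, max_len + 1):
        for i in range(len(hp) - k + 1):
            counts[hp[i:i + k]] += 1
    return counts
```

A vector from `aac` could serve as the sole input feature to a kNN or SVM classifier, and comparing `hp_windows` counts between topology classes is one possible way to look for H/P sequences that occur in only one class.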