Prediction of permissive insertion sites in proteins

Tayeh, Husamaldin H. A. (2013) Prediction of permissive insertion sites in proteins. [Thesis]

Abstract

Domain insertion has proven very effective for creating modified proteins for a range of protein engineering applications. It alters the function of a protein by inserting one or more genes into certain domains. Proteins usually tolerate insertions only at specific sites, so identifying these permissive insertion sites is crucial for any successful insertion attempt. Permissive insertion sites are normally determined experimentally, using a genetic approach. However, an educated guess can help predict potential permissive insertion sites. In this work, we introduce a method for predicting permissive insertion sites using machine learning and data mining techniques. Our educated-guess approach extracts distinctive features from the amino acids surrounding the insertion site, as captured by an amino acid window. The window size is adjustable and can capture any odd number of amino acids. Using a number of amino acid features obtained from this window, we trained an SVM model on 135 permissive and non-permissive sites obtained from 10 different proteins. The trained model was then used to predict permissive insertion sites in the outer membrane usher protein FasD, the lactose operon repressor LacI, the type II secretion system protein XpsD, and the maltose periplasmic protein MalE, achieving accuracies of 70.59%, 61.11%, 61.90%, and 90.00%, respectively.
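The windowing step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The abstract does not list the features used, so the sketch assumes dipeptide composition (named in the keywords) as one plausible feature. The function names, the 0-based site index centred on a residue, and the `X` padding at the sequence ends are all hypothetical choices made for illustration. Training the SVM on the resulting vectors is left out.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs
PAD = "X"  # hypothetical padding symbol for positions beyond the sequence ends


def extract_window(sequence, site, window_size):
    """Return window_size residues centred on the 0-based index `site`.

    Assumption: the site is identified by the residue at its centre.
    """
    if window_size < 1 or window_size % 2 == 0:
        raise ValueError("window_size must be a positive odd number")
    if not 0 <= site < len(sequence):
        raise ValueError("site is outside the sequence")
    half = window_size // 2
    padded = PAD * half + sequence + PAD * half
    # After padding, original index `site` sits at `site + half`,
    # so the window starts at index `site` of the padded string.
    return padded[site:site + window_size]


def dipeptide_composition(window):
    """Return a 400-element vector of dipeptide frequencies in the window.

    Pairs containing a non-standard symbol (such as the padding) are skipped.
    """
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(window) - 1):
        pair = window[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


if __name__ == "__main__":
    # Made-up sequence used only to exercise the functions.
    window = extract_window("MKTAYIAKQR", site=4, window_size=5)
    features = dipeptide_composition(window)
    print(window, len(features))
```

Each labelled site would yield one such feature vector, and the set of vectors with their permissive or non-permissive labels could then be passed to any SVM implementation.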
Item Type: Thesis
Additional Information: Yükseköğretim Kurulu (Council of Higher Education) Thesis Center, Thesis No: 348687.
Uncontrolled Keywords: Permissive insertion sites. -- Dipeptide composition. -- SVM. -- Feature selection.
Subjects: Q Science > QA Mathematics > QA076 Computer software
Divisions: Faculty of Engineering and Natural Sciences > Academic programs > Computer Science & Eng.
Faculty of Engineering and Natural Sciences
Depositing User: IC-Cataloging
Date Deposited: 03 Apr 2018 11:36
Last Modified: 26 Apr 2022 10:14
URI: https://research.sabanciuniv.edu/id/eprint/34367
