Discovering discriminative and class-specific sequence and structural motifs in proteins

Meydan, Cem (2013) Discovering discriminative and class-specific sequence and structural motifs in proteins. [Thesis]

Official URL: http://risc01.sabanciuniv.edu/record=b1534407 (Table of Contents)


Finding recurring motifs is an important problem in bioinformatics. Such motifs can be used for a range of problems, including sequence classification, label prediction, knowledge discovery and biological engineering of proteins fit for a specific purpose. Our motivation is to create a better foundation for the research and development of novel motif mining and machine learning methods that can extract class-specific and discriminative motifs using both sequence and structural features. We propose the building blocks of a general machine learning framework to act on a biological input. This thesis presents a combination of elements that are intended to be applicable to a variety of biological problems. Ideally, the learner should require as input only a number of biological data instances that are classified into a number of different classes as defined by the researchers. The output should be the factors and motifs that discriminate between those classes (for reasonable, non-random class definitions). This ideal workflow requires two main steps. The first step is the representation of the biological input with features that contain the significant information the researcher is looking for. Due to the complexity of the macromolecules, abstract representations are required to convert the real-world representation into quantifiable descriptors that are suitable for motif mining and machine learning. The second step of the proposed workflow is the motif mining and knowledge discovery step. Using these informative representations, an algorithm should be able to find discriminative, class-specific motifs that are over-represented in one class and under-represented in the other. This thesis presents novel procedures for representation of the proteins to be used in a variety of machine learning algorithms, and two separate motif mining algorithms, one based on temporal motif mining and the other on deep learning, that can work with the given biological data.
The descriptors and the learners are applied to a wide range of computational problems encountered in life sciences.
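
As a rough illustration of what "over-represented in one class and under-represented in the other" can mean in practice, the sketch below scores fixed-length sequence k-mers by a smoothed log-odds ratio of how often they occur in two classes. It is not the thesis's method (the abstract names temporal motif mining and deep learning, neither of which is shown here); the function names, the choice of k-mers as the representation, the pseudocount, and the toy sequences are all assumptions made for this example.

```python
from collections import Counter
from math import log2


def kmer_presence(sequences, k):
    """For each k-mer, count how many sequences contain it at least once."""
    counts = Counter()
    for seq in sequences:
        # A set ensures each k-mer is counted once per sequence.
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts


def discriminative_kmers(positives, negatives, k=3, pseudocount=1.0):
    """Rank k-mers by smoothed log2 odds of presence in positives vs. negatives.

    Hypothetical helper: a positive score means the k-mer is over-represented
    in `positives`, a negative score means it is over-represented in `negatives`.
    """
    pos = kmer_presence(positives, k)
    neg = kmer_presence(negatives, k)
    scores = {}
    for kmer in set(pos) | set(neg):
        # The pseudocount avoids division by zero for k-mers seen in one class only.
        p = (pos[kmer] + pseudocount) / (len(positives) + 2 * pseudocount)
        q = (neg[kmer] + pseudocount) / (len(negatives) + 2 * pseudocount)
        scores[kmer] = log2(p / q)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy data (made up): every "positive" sequence contains the tripeptide RGD.
    positives = ["MKRGDA", "AARGDK", "RGDLLM"]
    negatives = ["MKAAAL", "LLKKMA", "AKLMMA"]
    for kmer, score in discriminative_kmers(positives, negatives)[:3]:
        print(kmer, round(score, 2))
```

A presence-based score like this ignores position, gaps, and structural context, which is part of why richer representations and learners such as those described in the abstract are needed.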

Item Type: Thesis
Uncontrolled Keywords: Bioinformatics. -- Machine learning. -- Deep learning. -- Biyoenformatik. -- Makine öğrenimi. -- Derin öğrenim.
Subjects: T Technology > TA Engineering (General). Civil engineering (General) > TA164 Bioengineering
ID Code: 31346
Deposited By: IC-Cataloging
Deposited On: 15 May 2017 16:06
Last Modified: 15 May 2017 16:08