Aydın, Zafer and Altunbaşak, Yücel and Pakatcı, Kemal İsa and Erdoğan, Hakan (2007) Training set reduction methods for protein secondary structure prediction in single-sequence condition. In: 29th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS 2007), Lyon, France
Official URL: http://dx.doi.org/10.1109/IEMBS.2007.4353469
Abstract
Orphan proteins are characterized by the lack of
significant sequence similarity to database proteins. To infer the
functional properties of the orphans, more elaborate techniques
that utilize structural information are required. In this regard,
protein structure prediction gains considerable importance.
Secondary structure prediction algorithms designed for orphan
proteins (also known as single-sequence algorithms) cannot
utilize multiple alignments or alignment profiles, which are
derived from similar proteins. This is a limiting factor for the
prediction accuracy. One way to improve the performance of
a single-sequence algorithm is to perform re-training. In this
approach, first, the models used by the algorithm are trained
by a representative set of proteins and a secondary structure
prediction is computed. Then, using a distance measure, the
original training set is refined by removing proteins that are
dissimilar to the given protein. This step is followed by the
re-estimation of the model parameters and the prediction of
the secondary structure. In this paper, we compare training set
reduction methods that are used to re-train the hidden semi-Markov
models employed by the IPSSP algorithm [1]. We found
that the composition-based reduction method has the highest
performance compared to the alignment-based and the
Chou-Fasman-based reduction methods. In addition, threshold-based
reduction performed better than the reduction technique that
selects the first 80% of the dataset proteins.
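
The reduction step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the distance measure, so the sketch assumes a Euclidean distance between composition vectors, computed here over a three-state secondary structure alphabet (`H`, `E`, `L`). The function names, the alphabet, the threshold value, and the reading of "first 80%" as the 80% of proteins closest to the target are all assumptions made for illustration. Training and prediction with the hidden semi-Markov models are not shown.

```python
from collections import Counter
from math import sqrt


def composition(seq, alphabet="HEL"):
    """Return the fraction of each alphabet symbol in seq (assumed alphabet)."""
    counts = Counter(seq)
    total = sum(counts[a] for a in alphabet)
    if total == 0:
        return [0.0] * len(alphabet)
    return [counts[a] / total for a in alphabet]


def distance(p, q):
    """Euclidean distance between two composition vectors (an assumption)."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def reduce_by_threshold(target, training, threshold, alphabet="HEL"):
    """Keep training items whose composition is within threshold of the target."""
    t = composition(target, alphabet)
    return [s for s in training
            if distance(t, composition(s, alphabet)) <= threshold]


def reduce_by_fraction(target, training, fraction=0.8, alphabet="HEL"):
    """Keep the given fraction of training items closest to the target."""
    t = composition(target, alphabet)
    ranked = sorted(training,
                    key=lambda s: distance(t, composition(s, alphabet)))
    keep = max(1, int(round(fraction * len(ranked))))
    return ranked[:keep]


# Hypothetical data: an initial predicted structure for the target protein
# and the known structures of five training proteins.
target = "HHHHLLLL"
training = ["HHHLLLLH", "EEEEEEEE", "HHLLEELL", "HHHHHHLL", "LLLLLLLL"]

by_threshold = reduce_by_threshold(target, training, threshold=0.5)
by_fraction = reduce_by_fraction(target, training, fraction=0.8)
```

In a full re-training loop, the reduced set would be used to re-estimate the model parameters before the final prediction is computed. The two functions differ in that the threshold variant adapts the size of the retained set to the target protein, while the fraction variant always keeps the same number of proteins.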
| Item Type: | Papers in Conference Proceedings |
|---|---|
| Subjects: | T Technology > TK Electrical engineering. Electronics Nuclear engineering |
| Divisions: | Faculty of Engineering and Natural Sciences |
| Depositing User: | Hakan Erdoğan |
| Date Deposited: | 31 Oct 2007 20:01 |
| Last Modified: | 26 Apr 2022 08:43 |
| URI: | https://research.sabanciuniv.edu/id/eprint/6906 |