Discovering coding lncRNAs using deep learning training dynamics

Nabi, Afshan (2021) Discovering coding lncRNAs using deep learning training dynamics. [Thesis]

Abstract

Long non-coding RNAs (lncRNAs) are the largest class of non-coding RNAs (ncRNAs). However, recent experimental evidence has shown that some lncRNAs contain small open reading frames (sORFs) that are translated into functional micropeptides. Current methods to detect misannotated lncRNAs rely on ribosome profiling (Ribo-seq) experiments, which are expensive and cell-type dependent. In addition, while very accurate machine learning models have been trained to distinguish between coding and non-coding sequences, little attention has been paid to the growing evidence that some lncRNAs in the underlying training datasets carry incorrect ground-truth labels. We present a framework that leverages deep learning models’ training dynamics to determine whether a given lncRNA transcript is misannotated. Our models achieve AUC above 91% and AUPR above 93% in classifying non-coding vs. coding sequences while allowing us to identify possibly misannotated lncRNAs present in the dataset. Our results overlap significantly with a set of experimentally validated misannotated lncRNAs as well as with coding sORFs within lncRNAs identified in a Ribo-seq dataset. The general framework applied here offers promising potential for curating datasets used to train coding potential predictors and for assisting experimental efforts to characterize the hidden proteome encoded by misannotated lncRNAs.
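
The abstract does not describe how the training dynamics are computed, so the following is only an illustrative sketch of the general idea and not the thesis's method. It assumes a simple stand-in: a logistic regression trained by gradient descent on synthetic two-dimensional data, with the model's probability for each sample's assigned label recorded at every epoch. Samples whose average confidence stays low are flagged as possibly mislabeled. All names (`track_training_dynamics`, `flag_suspect_labels`), the synthetic data, and the 0.5 threshold are hypothetical.

```python
import numpy as np

def track_training_dynamics(X, y, epochs=50, lr=0.5):
    """Train logistic regression by gradient descent and record, per epoch,
    the probability the model assigns to each sample's *given* label."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    history = np.empty((epochs, n))
    for epoch in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))      # P(label = 1)
        history[epoch] = np.where(y == 1, p, 1.0 - p)
        grad = p - y                                 # gradient of log loss w.r.t. logits
        w -= lr * (X.T @ grad) / n
        b -= lr * grad.mean()
    return history

def flag_suspect_labels(history, threshold=0.5):
    """Flag samples whose mean confidence in their given label is below threshold."""
    confidence = history.mean(axis=0)
    return np.flatnonzero(confidence < threshold), confidence

# Synthetic stand-in data (assumption): two well-separated classes,
# with five class-0 samples deliberately given the wrong label.
rng = np.random.default_rng(0)
n = 200
y_true = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, 2)) + np.where(y_true[:, None] == 1, 3.0, -3.0)
flipped = np.array([3, 17, 42, 58, 77])
y_noisy = y_true.copy()
y_noisy[flipped] = 1

history = track_training_dynamics(X, y_noisy)
suspects, confidence = flag_suspect_labels(history)
print("flagged as possibly mislabeled:", suspects)
```

In this toy setting the deliberately flipped samples end up among the flagged indices because the model never becomes confident in their assigned label. A real coding-potential predictor would replace the logistic regression with a deep sequence model and the synthetic points with transcript features.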
Item Type: Thesis
Uncontrolled Keywords: Non-coding RNA, coding lncRNAs. -- Deep learning. -- Training dynamics.
Subjects: T Technology > TK Electrical engineering. Electronics. Nuclear engineering > TK7800-8360 Electronics > TK7885-7895 Computer engineering. Computer hardware
Divisions: Faculty of Engineering and Natural Sciences > Academic programs > Computer Science & Eng.
Faculty of Engineering and Natural Sciences
Depositing User: IC-Cataloging
Date Deposited: 20 Oct 2021 15:10
Last Modified: 26 Apr 2022 10:39
URI: https://research.sabanciuniv.edu/id/eprint/42505
