Zero/Few-Shot Dark Kinase-Phosphosite Association Prediction With Biologically Grounded Data Augmentation

Pekey, Mert (2025) Zero/Few-Shot Dark Kinase-Phosphosite Association Prediction With Biologically Grounded Data Augmentation. [Thesis]

Abstract

Protein phosphorylation, a fundamental cellular process mediated by kinases, is crucial for signaling, and its dysregulation is implicated in numerous human diseases. A significant challenge persists in identifying substrate phosphosites for the vast number of understudied 'dark' kinases, for which conventional supervised machine learning methods are ineffective due to data scarcity. To address this gap, this thesis develops a zero- and few-shot learning framework and introduces biologically grounded data augmentation strategies, all evaluated on the DARKIN benchmark. We introduce two novel deep learning architectures: DARKIN-FT, a compatibility-based model that enhances performance through end-to-end fine-tuning of the phosphosite encoder, and DARKIN-Interact, a binary classification model that directly captures kinase–substrate interactions via joint attention over sequence pairs. The central contribution is a systematic investigation into biologically grounded data augmentation, evaluating three distinct strategies: (i) kinase-conditional phosphosite generation via a fine-tuned ProGen2 model, (ii) weak supervision using predictions from the Kinase Substrate Specificity Atlas (KSSA), and (iii) augmentation with homologous sequences. Our results demonstrate that DARKIN-FT and DARKIN-Interact significantly outperform existing baselines on the DARKIN benchmark. The investigation into data augmentation yielded mixed results: while kinase-conditional generation with ProGen2 and weak labeling with KSSA degraded performance, augmentation with homologous sequences improved the Macro Average Precision of the DARKIN-Interact model. While the results are promising, challenges persist in disambiguating kinases with high sequence similarity. Overall, this thesis establishes a framework for kinase–phosphosite interaction prediction in low-data regimes and provides valuable insights into the strengths and limitations of data augmentation in the dark kinase-phosphosite association task.
Item Type: Thesis
Uncontrolled Keywords: Protein Sequence Classification, Zero/Few-Shot Learning, Phosphorylation, Dark Kinases, Post-translational Modifications, Conditional Generative Models, Data Augmentation.
Subjects: T Technology > TK Electrical engineering. Electronics. Nuclear engineering > TK7800-8360 Electronics > TK7885-7895 Computer engineering. Computer hardware
Divisions: Faculty of Engineering and Natural Sciences > Academic programs > Computer Science & Eng.
Faculty of Engineering and Natural Sciences
Depositing User: Dila Günay
Date Deposited: 15 Jan 2026 16:26
Last Modified: 15 Jan 2026 16:26
URI: https://research.sabanciuniv.edu/id/eprint/53625
