Işık, Zeynep (2025) Zero- And Few-Shot Dark Kinase–Phosphosite Prediction Via Task-Aware Protein Embeddings. [Thesis]
Abstract
Accurately mapping kinases to their substrate phosphosites is fundamental for decoding cellular signaling and understanding disease mechanisms. While high-throughput techniques can identify the phosphosites, finding the kinase that catalyzes the phosphorylation is challenging. Thus, over 95% of experimentally detected human phosphosites lack kinase annotations. It is possible to formulate the kinase-phosphosite association problem as a supervised multi-class classification task; however, a large portion of the human kinases are under-studied (dark kinases) and have few or no phosphosites associated with them, thus dark kinases fall outside the reach of conventional supervised learning methods. In this thesis, we formulate kinase–phosphosite association as zero-shot and few-shot learning tasks: in the zero-shot setting, the model must predict associations for kinases never seen during training; in the few-shot setting, it may leverage only a handful of labeled examples.

We employ transformer-based protein language models (pLMs) to embed both kinase domains and phosphosite peptides, and we systematically explore domain adaptation strategies, ranging from full fine-tuning and partial layer re-initialization to task-specific pre-training, under severe data constraints. Surprisingly, a de novo–trained ESM-1b model outperforms its fully fine-tuned pretrained counterpart, suggesting that general-purpose pLM embeddings may lack task-specific biochemical context. Our best results are obtained by combining kinase- and phosphosite-aware pLMs with partial re-initialization of upper transformer layers. On the DARKIN benchmark, this approach delivers state-of-the-art performance in both zero-shot and few-shot kinase prediction, offering a promising direction for illuminating the dark phosphoproteome.
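As an illustration of the zero-shot setting described in the abstract, the sketch below ranks candidate kinases for one phosphosite by comparing embeddings, so a kinase with no training examples can still be scored as long as it has an embedding. The abstract does not state the scoring function the thesis uses. Cosine similarity, the function names `cosine` and `rank_kinases`, the kinase labels, and the three-dimensional toy vectors are all assumptions made for this example. In practice the embeddings would come from a pLM and would likely need a learned projection into a shared space.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_kinases(site_embedding, kinase_embeddings):
    """Return (kinase, score) pairs sorted from most to least similar.

    kinase_embeddings may contain kinases that were never seen during
    training; they are scored in exactly the same way as the others.
    """
    scores = {
        name: cosine(site_embedding, emb)
        for name, emb in kinase_embeddings.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy embeddings (assumption: already in a shared space).
kinase_embeddings = {
    "kinase_A": [1.0, 0.0, 0.0],
    "kinase_B": [0.0, 1.0, 0.0],
    "dark_kinase_C": [0.6, 0.0, 0.8],  # stands in for an unseen (dark) kinase
}
site_embedding = [0.5, 0.1, 0.9]

ranking = rank_kinases(site_embedding, kinase_embeddings)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With these toy vectors the unseen kinase ranks first, which is the behavior a zero-shot model is meant to make possible. Nothing here reflects the actual architecture or results of the thesis.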
| Item Type: | Thesis |
|---|---|
| Uncontrolled Keywords: | Task Adaptation, Zero-shot Learning, Few-shot Learning, Transformers, Protein Language Models, Kinases, Phosphorylation. -- Göreve Uyarlama, Sıfır-Örnekli Öğrenme, Az-Örnekli Öğrenme, Dönüştürücüler, Protein Dil Modelleri, Kinazlar, Fosforilasyon. |
| Subjects: | T Technology > TK Electrical engineering. Electronics Nuclear engineering > TK7800-8360 Electronics > TK7885-7895 Computer engineering. Computer hardware |
| Divisions: | Faculty of Engineering and Natural Sciences > Academic programs > Computer Science & Eng.; Faculty of Engineering and Natural Sciences |
| Depositing User: | Dila Günay |
| Date Deposited: | 15 Jan 2026 16:34 |
| Last Modified: | 15 Jan 2026 16:34 |
| URI: | https://research.sabanciuniv.edu/id/eprint/53626 |

