Active learning for drug blood-brain barrier permeability prediction

Salem, Ahmed Mohamed Mahmoud Elmoselhy (2024) Active learning for drug blood-brain barrier permeability prediction. [Thesis]

Abstract

The blood-brain barrier (BBB) is a highly selective, semipermeable border that regulates the transfer of chemicals between the circulatory and central nervous systems (CNS). Assessing whether a compound can permeate the BBB is critical in drug development for treating CNS disorders, as it determines the compound’s ability to reach targets within the brain. The chemical space is vast, and traditional methods for measuring a chemical compound’s BBB permeability are time-consuming and costly. However, with the availability of open datasets for compounds with experimentally verified permeability assessments, several machine learning (ML) models have been proposed to accelerate BBB permeability prediction. A large pool of labeled examples is necessary for a machine learning model to learn BBB permeability status in a supervised setting. Yet, the size of labeled datasets remains far from comprehensive when compared to the immense chemical space, limiting the effectiveness of traditional supervised passive learning procedures. The active learning (AL) framework offers an alternative. Active learners iteratively achieve high-accuracy classifiers with fewer label requests compared to passive learning by strategically selecting which examples to label in each iteration. In this thesis, we explored various AL strategies for predicting the BBB permeability of chemical compounds and compared their effects on the performance of machine learning models. Specifically, we examined the following sampling strategies: random sampling, uncertainty-based sampling, and dissimilarity-based sampling. Additionally, we proposed and implemented two novel AL methods: explore-intensify and round-robin cycle switching. We also performed a comparative analysis of all the AL methods against passive learning in two separate setups: one based on a label-stratified splitting technique and another based on splitting the data by the molecular scaffolds of the chemical compounds, which is a more challenging evaluation setup.
Our results show that the scaffold-splitting setup resulted in lower performance compared to the label-stratified setup across both passive and active learning paradigms. Furthermore, our experiments revealed that the active learning approaches we implemented matched the performance of passive learning in nearly every performance metric we tested, typically after labeling only 10-65% of the data, depending on the specific metric. Moreover, the results of our proposed active learning methods demonstrated that the round-robin cycle switching strategy outperformed other active learning strategies in the stratified-split setup. This highlights the potential of dynamic AL methods to efficiently reduce the need for large labeled datasets while maintaining high performance in predicting BBB permeability.
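
As a rough illustration of the pool-based active learning loop the abstract describes, the following minimal sketch cycles through random, uncertainty-based, and dissimilarity-based selection in turn. It is not the thesis implementation: the abstract does not specify how round-robin cycle switching works, so the simple per-step rotation here is only one possible reading. All function names are hypothetical, the logistic regression is a stand-in classifier, and Euclidean distance on generic feature vectors stands in for whatever molecular representation and dissimilarity measure the thesis uses.

```python
import numpy as np

def fit_logreg(X, y, lr=0.5, steps=300):
    """Fit logistic regression by gradient descent (stand-in classifier)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict_proba(w, X):
    """Predicted probability of the positive class."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

def pick_random(w, X, labeled, pool, rng):
    # Random sampling: any unlabeled example.
    return int(rng.choice(pool))

def pick_uncertain(w, X, labeled, pool, rng):
    # Uncertainty sampling: predicted probability closest to 0.5.
    p = predict_proba(w, X[pool])
    return int(pool[np.argmin(np.abs(p - 0.5))])

def pick_dissimilar(w, X, labeled, pool, rng):
    # Dissimilarity sampling (assumed form): the pool example whose
    # nearest labeled neighbour is farthest away.
    d = np.linalg.norm(X[pool][:, None, :] - X[labeled][None, :, :], axis=2)
    return int(pool[np.argmax(d.min(axis=1))])

def active_learning(X, y, n_init=10, budget=40, seed=0):
    """Label n_init random examples, then request `budget` more labels,
    rotating through the three strategies one step at a time."""
    rng = np.random.default_rng(seed)
    strategies = [pick_random, pick_uncertain, pick_dissimilar]
    labeled = [int(i) for i in rng.choice(len(X), size=n_init, replace=False)]
    for step in range(budget):
        pool = np.setdiff1d(np.arange(len(X)), labeled)  # still unlabeled
        w = fit_logreg(X[labeled], y[labeled])           # retrain on current labels
        choose = strategies[step % len(strategies)]      # round-robin rotation
        labeled.append(choose(w, X, np.array(labeled), pool, rng))
    return fit_logreg(X[labeled], y[labeled]), labeled
```

In a real BBB setting, `X` would hold molecular descriptors or fingerprints and `y` the permeability labels, and the passive-learning baseline would be the same classifier trained on the full labeled set.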

Item Type:	Thesis
Uncontrolled Keywords:	Active Learning, Dynamic Sampling, Scaffold Splitting, Molecular Scaffolds, Blood-Brain Barrier, QSAR. -- Aktif Öğrenme, Dinamik Örnekleme, İskele Ayrımı, Moleküler İskeleler, Kan-Beyin Bariyeri.
Subjects:	T Technology > TK Electrical engineering. Electronics Nuclear engineering > TK7800-8360 Electronics > TK7885-7895 Computer engineering. Computer hardware
Divisions:	Faculty of Engineering and Natural Sciences > Academic programs > Computer Science & Eng. Faculty of Engineering and Natural Sciences
Depositing User:	Dila Günay
Date Deposited:	22 Apr 2025 10:04
Last Modified:	02 Jan 2026 00:01
URI:	https://research.sabanciuniv.edu/id/eprint/51774