Assessing predictions on fitness effects of missense variants in HMBS in CAGI6

Zhang, Jing and Kinch, Lisa and Katsonis, Panagiotis and Lichtarge, Olivier and Jagota, Milind and Song, Yun S. and Sun, Yuanfei and Shen, Yang and Kuru, Nurdan and Dereli, Onur and Adebali, Ogün and Alladin, Muttaqi Ahmad and Pal, Debnath and Capriotti, Emidio and Turina, Maria Paola and Savojardo, Castrense and Martelli, Pier Luigi and Babbi, Giulia and Casadio, Rita and Pucci, Fabrizio and Rooman, Marianne and Cia, Gabriel and Tsishyn, Matsvei and Strokach, Alexey and Hu, Zhiqiang and van Loggerenberg, Warren and Roth, Frederick P. and Radivojac, Predrag and Brenner, Steven E. and Cong, Qian and Grishin, Nick V. (2024) Assessing predictions on fitness effects of missense variants in HMBS in CAGI6. Human Genetics. ISSN 0340-6717 (Print) 1432-1203 (Online). Published Online First: https://dx.doi.org/10.1007/s00439-024-02680-3

Full text not available from this repository.

Abstract

This paper presents an evaluation of predictions submitted for the "HMBS" challenge, a component of the sixth round of the Critical Assessment of Genome Interpretation (CAGI6) held in 2021. The challenge required participants to predict the effects of missense variants of the human HMBS gene on yeast growth. The HMBS enzyme, critical for the biosynthesis of heme in eukaryotic cells, is highly conserved among eukaryotes. Despite the application of a variety of algorithms and methods, the performance of predictors was relatively similar, with Kendall’s tau correlation coefficients between predictions and experimental scores around 0.3 for a majority of submissions. Notably, the predictors correlated more strongly with one another (median correlation ≥ 0.34), especially the top predictions from different groups, than with the actual experimental results. Most predictors were moderately successful in distinguishing between deleterious and benign variants, as evidenced by an area under the receiver operating characteristic (ROC) curve (AUC) of approximately 0.7. Compared with the two most recent rounds of CAGI competitions, more predictors outperformed the baseline predictor, which is based solely on amino acid frequencies. Nevertheless, the overall accuracy of predictions still falls far short of the positive control, which is derived from experimental scores, indicating the necessity for considerable improvements in the field. The most inaccurately predicted variants in this round were associated with the insertion loop, which is absent in many orthologs, suggesting that the predictors still rely heavily on information from multiple sequence alignments.
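The two metrics named in the abstract, Kendall's tau and ROC AUC, can be illustrated with a minimal sketch. It is not the assessors' actual evaluation pipeline, which this record does not describe. The function names, the toy scores, and the 0.5 cutoff separating benign from deleterious variants are all hypothetical, chosen only for illustration.

```python
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation for two equal-length sequences."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: excluded from both denominator terms
        elif dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied_pairs = concordant + discordant
    denom = ((untied_pairs + tied_x_only) * (untied_pairs + tied_y_only)) ** 0.5
    return (concordant - discordant) / denom if denom else float("nan")


def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for lab, s in zip(labels, scores) if lab]
    neg = [s for lab, s in zip(labels, scores) if not lab]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: experimental fitness scores and one predictor's output.
experimental = [0.1, 0.9, 0.4, 1.0, 0.2, 0.7]
predicted = [0.3, 0.8, 0.2, 0.6, 0.5, 0.9]

# Hypothetical cutoff: variants with experimental fitness >= 0.5 are labeled benign.
benign = [score >= 0.5 for score in experimental]

tau = kendall_tau_b(predicted, experimental)
auc = roc_auc(benign, predicted)
print(f"Kendall's tau-b: {tau:.3f}")
print(f"ROC AUC (benign vs. deleterious): {auc:.3f}")
```

Kendall's tau depends only on the ordering of the variants, so it does not require predictors to match the scale of the experimental scores. The AUC additionally requires a binary labeling of the experimental data, which is why a cutoff has to be assumed here.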
Item Type: Article
Divisions: Faculty of Engineering and Natural Sciences
Depositing User: Onur Dereli
Date Deposited: 20 Aug 2024 15:46
Last Modified: 20 Aug 2024 15:46
URI: https://research.sabanciuniv.edu/id/eprint/49816
