Volume 35, No 10, Oct 2025
ISSN: 1001-0602
EISSN: 1748-7838
2018 impact factor 17.848*
(Clarivate Analytics, 2019)
Volume 35 Issue 10, October 2025: 750-761
AlphaCD: a machine learning model capable of highly accurate characterization for 21,335 cytidine deaminases
Kui Xu1,†, Guoying Hua1,†, Mingdi Wu1,†, Haihang Zhang1,†, Jingda Liu1, Hu Feng1, Erwei Zuo1,*
1State Key Laboratory of Genome and Multi-omics Technologies, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Gene Editing Technologies (Hainan), Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China

The vast scope of sequence databases, combined with limited supporting evidence, hinders the identification of proteins with specific functionality. Here, we experimentally characterized the catalytic efficiency, target site window, motif preference, and off-target activity of 1,100 apolipoprotein B mRNA-editing enzyme, catalytic polypeptide (APOBEC)-like family cytidine deaminases (CDs) fused with nCas9 in HEK293T cells, thereby generating the largest dataset of experimentally validated functions for a single protein family to date. These data, together with amino acid sequence, three-dimensional structure, and eight additional features, were used to construct a machine learning (ML) model, AlphaCD, which showed high accuracy in predicting catalytic efficiency (0.92) and off-target activity (0.84), as well as target windows (0.73) and catalytic motifs (0.78). We applied the trained model to predict these catalytic features for 21,335 CDs in UniProt, and experimental testing of a subsample of 28 CDs further validated its prediction accuracy (0.84, 0.87, 0.75, and 0.73, respectively). Alanine scanning-based mutagenesis was then employed to reduce off-target activity in one example CD, which produced a remarkably high-fidelity, high-efficiency cytosine base editor. This demonstrates the application of AlphaCD in high-accuracy, high-throughput protein functional characterization and provides a strategy for accelerated characterization of other proteins.
https://doi.org/10.1038/s41422-025-01164-x