Volume 34, No 12, Dec 2024

ISSN: 1001-0602
EISSN: 1748-7838
2018 impact factor 17.848* (Clarivate Analytics, 2019)

Volume 34 Issue 12, December 2024: 830-845   |  Open Access

ORIGINAL ARTICLES

GeneCompass: deciphering universal gene regulatory mechanisms with a knowledge-informed cross-species foundation model

Xiaodong Yang1,2,3,†, Guole Liu4,5,†, Guihai Feng1,6,7,†, Dechao Bu2,3,8,†, Pengfei Wang3,9,†, Jie Jiang10,†, Shubai Chen2,3,†, Qinmeng Yang9,†, Hefan Miao1,3, Yiyang Zhang3,11, Zhenpeng Man3,11, Zhongming Liang3,11, Zichen Wang4,5, Yaning Li2,3, Zheng Li9, Yana Liu1, Yao Tian1,3, Wenhao Liu1, Cong Li1,3, Ao Li4,5, Jingxi Dong1, Zhilong Hu3,9, Chen Fang1,3, Lina Cui1,3, Zixu Deng2,3, Haiping Jiang1,3, Wentao Cui3,9, Jiahao Zhang3,11, Zhaohui Yang2,3,8, Handong Li5,10, Xingjian He10, Liqun Zhong4,5, Jiaheng Zhou4,5, Zijian Wang9, Qingqing Long9, Ping Xu3,9, The X-Compass Consortium13, Hongmei Wang1,6,7, Zhen Meng3,9, Xuezhi Wang3,9, Yangang Wang3,9, Yong Wang3,11, Shihua Zhang3,11, Jingtao Guo1,3,6,7, Yi Zhao2,3,8,*, Yuanchun Zhou3,9,*, Fei Li3,9,*, Jing Liu5,10,*, Yiqiang Chen2,3, Ge Yang4,5, Xin Li1,3,6,7,*

1State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
2Beijing Key Laboratory of Mobile Computing and Pervasive Device, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
3University of Chinese Academy of Sciences, Beijing, China
4State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
5School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
6Institute for Stem Cell and Regenerative Medicine, Chinese Academy of Sciences, Beijing, China
7Beijing Institute for Stem Cell and Regenerative Medicine, Beijing, China
8Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
9Computer Network Information Center, Chinese Academy of Sciences, Beijing, China
10Institute of Automation, Chinese Academy of Sciences, Beijing, China
11CEMS, NCMIS, HCMS, MDIS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
†These authors contributed equally: Xiaodong Yang, Guole Liu, Guihai Feng, Dechao Bu, Pengfei Wang, Jie Jiang, Shubai Chen, Qinmeng Yang
13A list of authors and their affiliations appears at the end of the paper
*Correspondence: Yi Zhao (biozy@ict.ac.cn); Yuanchun Zhou (zyc@cnic.cn); Fei Li (lifei@cnic.cn); Jing Liu (jliu@nlpr.ia.ac.cn); Xin Li (xinli@ioz.ac.cn)

Deciphering universal gene regulatory mechanisms in diverse organisms holds great potential for advancing our knowledge of fundamental life processes and facilitating clinical applications. However, the traditional research paradigm primarily focuses on individual model organisms and does not integrate various cell types across species. Recent breakthroughs in single-cell sequencing and deep learning techniques present an unprecedented opportunity to address this challenge. In this study, we built an extensive dataset of over 120 million human and mouse single-cell transcriptomes. After data preprocessing, we obtained 101,768,420 single-cell transcriptomes and developed a knowledge-informed cross-species foundation model, named GeneCompass. During pre-training, GeneCompass effectively integrated four types of prior biological knowledge to enhance our understanding of gene regulatory mechanisms in a self-supervised manner. By fine-tuning for multiple downstream tasks, GeneCompass outperformed state-of-the-art models in diverse applications for a single species and unlocked new realms of cross-species biological investigations. We also employed GeneCompass to search for key factors associated with cell fate transition and showed that the predicted candidate genes could successfully induce the differentiation of human embryonic stem cells into the gonadal fate. Overall, GeneCompass demonstrates the advantages of using artificial intelligence technology to decipher universal gene regulatory mechanisms and shows tremendous potential for accelerating the discovery of critical cell fate regulators and candidate drug targets.


https://doi.org/10.1038/s41422-024-01034-y
