Volume 33, No 10, Oct 2023
ISSN: 1001-0602
EISSN: 1748-7838
2018 impact factor: 17.848* (Clarivate Analytics, 2019)
Volume 33 Issue 10, October 2023: 745-761
The complete and fully-phased diploid genome of a male Han Chinese
Chentao Yang1,2,3,†, Yang Zhou3,4,†, Yanni Song5,†, Dongya Wu1,2,6,7,†, Yan Zeng3,†, Lei Nie3, Panhong Liu3, Shilong Zhang8, Guangji Chen3,9, Jinjin Xu3, Hongling Zhou5, Long Zhou2,6,10, Xiaobo Qian3,9, Chenlu Liu11, Shangjin Tan3, Chengran Zhou3, Wei Dai3, Mengyang Xu3,12, Yanwei Qi12, Xiaobo Wang5, Lidong Guo9,12, Guangyi Fan12, Aijun Wang12, Yuan Deng3, Yong Zhang3, Jiazheng Jin3, Yunqiu He1,2, Chunxue Guo3,13, Guoji Guo14, Qing Zhou6,11, Xun Xu3, Huanming Yang3, Jian Wang3, Shuhua Xu15,16,17,18,19, Yafei Mao8, Xin Jin3, Jue Ruan5,*, Guojie Zhang1,2,6,10,20,*
1Center for Genomic Research, International Institutes of Medicine, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, Zhejiang, China

Since the release of the complete human genome, the priority of human genomic studies has been shifting towards closing gaps in ethnic diversity. Here, we present a fully phased and well-annotated diploid human genome from a Han Chinese male individual (CN1), in which the assemblies of both haplotypes reach the telomere-to-telomere (T2T) level. Comparison of this diploid genome with the haploid CHM13 T2T genome revealed significant variation in the centromeres. Outside the centromeres, we discovered 11,413 structural variants, including numerous novel ones. We also detected thousands of CN1 alleles that have accumulated high substitution rates, and a few that have been under positive selection in the East Asian population. Further, we found that CN1 outperforms CHM13 as a reference genome for read mapping and variant calling in the East Asian population, owing to the distinct structural variants of the two references. Comparing SNP calls for a large cohort of 8869 Chinese genomes made with CN1 and CHM13 as the reference, respectively, showed that reference bias profoundly affects rare SNP calling, with nearly 2 million rare SNPs miscalled depending on which reference genome was used. Finally, using CN1 as the reference, we discovered 5.80 Mb and 4.21 Mb of putative introgressed sequences from Neanderthals and Denisovans, respectively, including many East Asian-specific ones that went undetected with CHM13 as the reference. Our analyses demonstrate the advantages of using CN1 as a reference for population genomic and paleogenomic studies. This complete genome will serve as an alternative reference for future genomic studies of the East Asian population.
https://doi.org/10.1038/s41422-023-00849-5