Finally, new technology has allowed researchers to complete the human genome sequence – except for the pesky Y chromosome.
The first draft of the human genome sequence was published in 2001, revolutionizing genomics. Since then, technological advances have improved its resolution. However, the most recent version, GRCh38.p13 – used as a reference since 2013 and patched in 2019 – was still missing 8% of the sequence, owing to difficulties in sequencing heterochromatin and other complex regions.
Now, a new (as-yet un-peer-reviewed) preprint from the Telomere-to-Telomere (T2T) Consortium has announced the addition of 200 million DNA base pairs and 115 protein-coding genes to the human genome sequence, bringing the totals up to 3.05 billion base pairs (a 4.5% increase) and 19,969 protein-coding genes (a 0.4% increase), and correcting some errors.
The T2T Consortium was set up by corresponding authors Karen Miga (University of California Santa Cruz, CA, USA), Adam Phillippy (National Human Genome Research Institute, MD, USA) and Evan E Eichler (University of Washington School of Medicine, WA, USA), prompted by their interest in the “unmappable” centromere regions.
The new sequence – named T2T-CHM13 – was made possible by combining the advantages of new, competing long-read technologies from Pacific Biosciences (CA, USA) and Oxford Nanopore (UK), which increased the length of DNA that can be read accurately in one pass from a few hundred base pairs to 20,000. The longer pieces are – like a child’s jigsaw with larger pieces – much easier to put together correctly.
The team used a cell line derived from a hydatidiform mole – the result of a sperm fertilizing an egg with no nucleus – meaning they could avoid having to distinguish between chromosomes inherited from two different people. However, the sperm cell involved carried an X chromosome, so the new sequence does not cover the Y chromosome.
The team also estimates that, owing to challenges such as those intrinsic to passaged cell lines and some problematic areas of the genome where quality checks were difficult, approximately 0.3% of the T2T-CHM13 sequence could contain errors.
The team is now working on the Y chromosome and also plans to sequence a genome containing chromosomes from two parents. The T2T Consortium has also teamed up with the Human Pangenome Reference Consortium in a bid to sequence over 300 genomes from across the globe over the next 3 years.