A new study suggests that up to 20% of the human genes currently classified as protein-coding may have been misinterpreted and could actually be non-coding.
Recent research led by Michael Tress of the Spanish National Cancer Research Centre (Madrid, Spain) has found that up to 4234 genes previously characterized as protein-coding could actually be non-coding. The research, which attempts to iron out the discrepancies among the three main reference human proteomes, has underlined the uncertainty over the actual number of genes that generate the human proteome and has important implications for biomedicine.
Comparing the reference proteomes from GENCODE/Ensembl, UniProt and RefSeq, the team compiled a list of genes that did not appear in all three references. This yielded 2764 genes which, on examination, displayed characteristics typical of non-coding genes or pseudogenes.
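The comparison described above amounts to a set operation: take every gene annotated as coding in at least one reference and remove those annotated as coding in all three. The following is a minimal Python sketch of that idea, not the authors' actual pipeline. The function name and the gene identifiers are hypothetical placeholders, and real annotation sets would first need their identifiers mapped onto a common scheme.

```python
def split_by_agreement(annotations):
    """Split genes into those some references omit and those all references share.

    `annotations` maps a reference name to the set of gene identifiers it
    annotates as coding. Identifiers are assumed to be already harmonised.
    """
    gene_sets = [set(genes) for genes in annotations.values()]
    in_any = set().union(*gene_sets)          # coding in at least one reference
    in_all = set.intersection(*gene_sets)     # coding in every reference
    return in_any - in_all, in_all


# Toy data with made-up identifiers, for illustration only.
annotations = {
    "GENCODE/Ensembl": {"GENE_A", "GENE_B", "GENE_C", "GENE_D"},
    "UniProt": {"GENE_A", "GENE_B", "GENE_C"},
    "RefSeq": {"GENE_A", "GENE_B", "GENE_E"},
}

disputed, agreed = split_by_agreement(annotations)
print(sorted(disputed))  # ['GENE_C', 'GENE_D', 'GENE_E']
print(sorted(agreed))    # ['GENE_A', 'GENE_B']
```

In this toy example the three disputed genes play the role of the 2764 genes the team singled out for closer examination.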
Further investigation uncovered a collection of 1470 genes that were annotated as coding in all three references but could also be non-coding. So far, the study's predictions are proving accurate. “We have been able to analyze many of these genes in detail and more than 300 genes have already been reclassified as non-coding,” stated Tress.
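The figures quoted in this article fit together as simple arithmetic, which the short Python sketch below makes explicit. It uses only numbers reported above. The variable names are illustrative, and because the article says "more than 300" genes have been reclassified, the share computed here is a lower bound.

```python
# Figures as reported in this article.
DISPUTED_BETWEEN_REFERENCES = 2764   # genes not present in all three references
DOUBTFUL_IN_ALL_REFERENCES = 1470    # in all three, but possibly non-coding
RECLASSIFIED_SO_FAR = 300            # "more than 300", so a lower bound

# Total number of genes whose coding status is in question.
total_flagged = DISPUTED_BETWEEN_REFERENCES + DOUBTFUL_IN_ALL_REFERENCES

# Minimum share of flagged genes already reclassified as non-coding.
reclassified_share = RECLASSIFIED_SO_FAR / total_flagged

print(total_flagged)                 # 4234
print(f"{reclassified_share:.1%}")   # 7.1%
```

The sum reproduces the headline figure of up to 4234 genes, of which at least about 7% have already been reclassified.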
Fifteen years after the sequencing of the human genome, this research has made it clear that we are still a long way from finalizing a definitive human proteome. “Our evidence suggests that humans may only have 19,000 coding genes, but we still do not know which 19,000 genes [they] are,” remarked first author Federico Abascal of the Wellcome Trust Sanger Institute (Cambridge, UK).
Without a conclusive proteome, potential advances in biomedicine may be limited, as the number, function and identity of protein-producing genes are vital to the study of disease.
Furthermore, the study could help improve research in many other areas of the biosciences by preventing researchers from relying on inaccurate information that misidentifies non-coding DNA as a functional, expressed gene, as has occurred in the past.
"Surprisingly, some of these unusual genes have been well studied and have more than 100 scientific publications based on the assumption that the gene produces a protein," observed David Juan from Pompeu Fabra University (Barcelona, Spain), highlighting the importance of the findings.
While a lot of uncertainty remains, the study suggests that the final figure could lie within 2000 genes on either side of the currently accepted figure. This study provides another step towards a definitive human proteome and the establishment of concrete information from which future research can benefit.
Written By Tristan Free
Updated 29 April, 2019
Source Abascal F, Juan D, Tress ML et al. Loose ends: almost one in five human genes still have unresolved coding status. Nucleic Acids Res. doi:10.1093/nar/gky587 (2018) https://academic.oup.com/nar/article/46/14/7070/5047265 https://www.cnio.es/ing/publicaciones/an-international-team-led-by-the-cnio-reveals-that-human-genome-could-contain-up-to-20-fewer-genes