Have you ever tried to piece together a jigsaw puzzle without knowing what the end result should look like?
Some genomic researchers face the same problem when they try to map sequence scaffolds from next-generation DNA sequencing data onto a chromosome. Chromosome assemblies reveal genome organization and structural variation, which in turn offer insight into evolutionary history. To piece together the chromosomes, scientists rely on physical or genetic maps, but for many species no such guides exist.
“We designed a method to improve the newly sequenced genome using comparative genomics to further assemble it in the context of evolution,” explained Jian Ma, assistant professor of bioengineering at the University of Illinois at Urbana-Champaign and author of a new study about the algorithm in the Proceedings of the National Academy of Sciences (1). “You can predict the adjacency configuration of scaffolds from the new genome based on what’s going on in the rest of the genomes that are closely related,” Ma said.
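To make the idea of predicting scaffold adjacencies from related genomes concrete, here is a toy Python sketch. It is an illustration only, not the published algorithm: the function names, the `min_support` threshold, and the example scaffold orders are all hypothetical. It simply counts how many related genomes place two scaffolds side by side, then greedily joins the best-supported pairs into linear chains.

```python
from collections import Counter


def adjacency_support(reference_orders):
    """Count how many reference genomes place each scaffold pair side by side."""
    support = Counter()
    for order in reference_orders:
        for left, right in zip(order, order[1:]):
            if left != right:
                support[frozenset((left, right))] += 1
    return support


def chain_scaffolds(scaffolds, support, min_support=2):
    """Greedily join scaffolds into linear chains, best-supported pairs first."""
    parent = {s: s for s in scaffolds}      # union-find, used to avoid cycles
    neighbors = {s: [] for s in scaffolds}  # each scaffold gets at most 2

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    ranked = sorted(support.items(), key=lambda kv: (-kv[1], sorted(kv[0])))
    for pair, count in ranked:
        if count < min_support:
            break
        a, b = sorted(pair)
        if a not in parent or b not in parent:
            continue  # pair mentions a scaffold we were not asked to place
        if len(neighbors[a]) >= 2 or len(neighbors[b]) >= 2:
            continue  # a scaffold in a linear chain has at most two neighbors
        if find(a) == find(b):
            continue  # joining would close a loop
        parent[find(a)] = find(b)
        neighbors[a].append(b)
        neighbors[b].append(a)

    # Walk each chain from one of its ends.
    chains, seen = [], set()
    for start in scaffolds:
        if start in seen or len(neighbors[start]) == 2:
            continue
        chain, prev, cur = [start], None, start
        seen.add(start)
        while True:
            nxt = [n for n in neighbors[cur] if n != prev]
            if not nxt:
                break
            prev, cur = cur, nxt[0]
            chain.append(cur)
            seen.add(cur)
        chains.append(chain)
    return chains


# Hypothetical scaffold orders as they appear in three related genomes.
related_genomes = [
    ["s1", "s2", "s3", "s4", "s5"],
    ["s1", "s2", "s3", "s5", "s4"],
    ["s2", "s1", "s3", "s4", "s5"],
]
scaffolds = ["s1", "s2", "s3", "s4", "s5"]
predicted = chain_scaffolds(scaffolds, adjacency_support(related_genomes))
```

Raising `min_support` in this sketch demands agreement from more related genomes before two scaffolds are joined, which yields shorter but more conservative chains.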
With the algorithm, called reference-assisted chromosome assembly (RACA), and validation assistance from scientists at the Beijing Genomics Institute (BGI), Ma’s team predicted the appropriate chromosome fragment assemblies for the Tibetan antelope. To do so, the group reconstructed 60 of the antelope’s chromosome fragments out of 1,434 sequence scaffolds produced by the BGI’s SOAPdenovo assembly program. Sixteen of the chromosome fragments were similar to chromosome fragments in cattle.
“After running the program, the genome quality was significantly improved,” explained Ma. “We had much fewer chromosome fragments, long continuity, and could compare with other species to do the analysis. We could also correct potential assembly errors during that process.”
One of the team’s main challenges was finding ways to thoroughly evaluate their results and benchmark their tools. To address it, the group compared the RACA results with both simulated genome assemblies and real genome assemblies from the 2012 Genome Assembly Gold-standard Evaluations (GAGE) study performed at Johns Hopkins University.
“We essentially used the data in that [GAGE] study because they have the read data and the true answers, so that you could actually benchmark your tools,” said Ma. “We looked at the results from various assemblers they used in their study and we showed that we could improve those results.”
For now, Ma says the technology could immediately help scientists in genome research efforts such as Genome 10K, a project in which scientists hope to sequence the genomes of 10,000 vertebrate species within the next several years. “Most [genome researchers] are using NGS technology, and so we think that this method can be used to systematically improve the genomes of these new species as part of the genome campaign,” said Ma.
1. Kim, J., D. M. Larkin, Q. Cai, Asan, Y. Zhang, R.-L. Ge, L. Auvil, B. Capitanu, G. Zhang, H. A. Lewin, and J. Ma. 2013. Reference-assisted chromosome assembly. Proceedings of the National Academy of Sciences (January).