Genome Alignment Gets Competitive

01/06/2012
Sarah C.P. Williams

After the success of last year’s Assemblathon, scientists have announced the Alignathon competition to compare genome alignment programs.  

There’s nothing like a little competition to get computational biologists crunching numbers. A team of scientists at the University of California, Santa Cruz, has begun accepting submissions for its Alignathon contest, which pits genome alignment methods and labs against each other. The goal is a better understanding of the strengths and weaknesses of each alignment program, of what makes a program successful, and of which program, if any, comes out ahead. The results should help researchers decide how best to handle the flood of data expected from the Genome 10K Project, the international effort under way to sequence the genomes of 10,000 vertebrates.

One of the data sets that Alignathon competitors must align consists of 12 fly genomes, shown here on a phylogenetic tree. Source: University of California, Santa Cruz

“Imagine that we can reach our goal of sequencing 10,000 vertebrate genomes,” said Benedict Paten, an organizer of the Alignathon and a postdoctoral researcher in the Santa Cruz lab of computational biologist David Haussler. “Well, then the problem becomes working out how those 10,000 genomes are related. And the first step in understanding that is to line up all the genomes and work out which sections have similarities.”

But there’s no single computational method accepted as the best way to find similarities between the genomes of different species. That’s where the Alignathon comes in. Each lab that enters the competition will use its preferred algorithm to align three sets of genomes provided by the organizers. Two sets will consist of simulated data created for the competition: one of four primate-like genomes and one of five mammal-like genomes. The third set will consist of real data from 12 fly genomes.

In December 2010, Haussler’s lab launched the Assemblathon competition, designed to compare methods of assembling complete genomes from the short DNA fragments, or reads, produced by sequencing technologies. The results of that first competition were published in the journal Genome Research in September 2011 (1). Seventeen teams from seven countries participated, submitting a total of 62 different assemblies.

“What we found in the Assemblathon is that there was a huge amount of variety between different assembly programs and different groups,” said Paten. “Two groups could essentially run the same program and get different results.”

No clear winner emerged from the Assemblathon. Some programs did better at getting the large-scale order of a genome right but made mistakes at individual base pairs; others made few base-pair errors but more errors in the genome’s larger-scale organization. But picking a winner is not the goal of either the Assemblathon or the Alignathon, said Paten. The goal is a benchmark against which current and future methods can be compared. Paten said he expects the Alignathon to reveal similar variety.
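For the two simulated data sets, the true alignment is known, so a submission can in principle be scored against it. One common approach, assumed here rather than taken from the Alignathon rules, is to treat an alignment as the set of residue pairs it places in the same column. You then compute precision (the fraction of predicted pairs that are true) and recall (the fraction of true pairs that were found). The minimal pairwise sketch below illustrates the idea. The function names `aligned_pairs` and `precision_recall` are hypothetical.

```python
def aligned_pairs(row_a: str, row_b: str) -> set[tuple[int, int]]:
    """Return (i, j) index pairs of residues placed in the same column.

    row_a and row_b are the two gapped rows of a pairwise alignment,
    with '-' marking a gap. Indices count ungapped residues only.
    """
    pairs = set()
    i = j = 0
    for a, b in zip(row_a, row_b):
        if a != "-" and b != "-":
            pairs.add((i, j))  # residue i of A is aligned to residue j of B
        if a != "-":
            i += 1
        if b != "-":
            j += 1
    return pairs


def precision_recall(predicted: set, truth: set) -> tuple[float, float]:
    """Score predicted aligned pairs against the known (simulated) truth."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Same two sequences (ACGT and AGT), aligned two different ways.
truth = aligned_pairs("ACGT", "A-GT")      # hypothetical "true" alignment
predicted = aligned_pairs("ACGT", "AG-T")  # hypothetical submission
print(precision_recall(predicted, truth))
```

In this toy case the submission gets two of its three pairs right, so precision and recall are both 2/3. Real whole-genome benchmarks must also handle multiple genomes, duplications, and rearrangements, which this sketch ignores.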

“Just as the assembly problems are, in a mathematical sense, hard, the alignment problems are also very hard,” he said. “Genomes are subject to changes at all kinds of levels, from single nucleotide changes to small insertions and deletions to copy number changes or large rearrangements. An alignment program has to be able to take into account all of these different possible mechanisms for change.”
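Paten’s point about single nucleotide changes and small insertions and deletions can be made concrete with the classic Needleman–Wunsch dynamic-programming algorithm for globally aligning two sequences. This is a textbook illustration, not a method attributed to any Alignathon entrant. The scores (match +1, mismatch -1, gap -2) are arbitrary assumptions, and `needleman_wunsch` is a name chosen here for illustration. A simple pairwise aligner like this models substitutions and indels only. It cannot represent the copy number changes or large rearrangements Paten mentions, which is part of what makes whole-genome alignment hard.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Globally align a and b with a linear gap penalty.

    Returns (row_a, row_b, score), where the rows contain '-' for gaps.
    Mismatches model single nucleotide changes; gaps model small indels.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # match/mismatch
                score[i - 1][j] + gap,  # gap in b (deletion from a)
                score[i][j - 1] + gap,  # gap in a (insertion in b)
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            row_a.append(a[i - 1]); row_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1]); row_b.append("-"); i -= 1
        else:
            row_a.append("-"); row_b.append(b[j - 1]); j -= 1
    return "".join(reversed(row_a)), "".join(reversed(row_b)), score[n][m]


print(needleman_wunsch("ACGT", "ACT"))   # one deleted base becomes a gap
print(needleman_wunsch("ACGT", "AGGT"))  # one substitution becomes a mismatch
```

The quadratic table is fine for short sequences but far too large for whole genomes. This is one reason genome-scale aligners rely on heuristics rather than exhaustive dynamic programming.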

Because no single lab has the resources to compare every alignment method on its own, Haussler’s lab hopes the competition will speed up the comparison process while encouraging a spirit of collaboration. The organizers expect around 10 labs to participate in the initial Alignathon, and future competitions could focus on other questions and include more labs. In addition, a second Assemblathon is in the works. While the first Assemblathon relied on simulated data, the second competition will test methods on real data.

“And there are other future competitions you can imagine,” said Paten. “Competitions to assess not just alignment techniques but ways of reconstructing an evolutionary history from those alignments.” For those participating in the Genome 10K Project, the competitions not only provide ways to analyze data but also help keep methods fresh, spirits high, and collaboration alive.

References

  1. Earl, D., K. Bradnam, J. St John, A. Darling, D. Lin, J. Fass, H.O. Yu, V. Buffalo, D.R. Zerbino, et al. 2011. Assemblathon 1: A competitive assessment of de novo short read assembly methods. Genome Res. 21:2224-41.