Putting Genome Interpretation to the Test

Ashley Yeager

How well do methods for interpreting genome variation work? Ashley Yeager takes a look at a community experiment that is trying to assess just how useful genome interpretation tools are in real-world situations.

At the American Society of Human Genetics (ASHG) conference in November 2012 in San Francisco, CA, Steven Brenner, a computational geneticist from the University of California, Berkeley, stood up in front of an audience and argued that it was unlikely that a single genome interpretation tool could identify variants for an array of illnesses or phenotypic traits (1). Instead, interpretation methods would likely need to be gene-specific or tailored for precise applications.

The predictors, assessors, and observers who participated in CAGI 2011, which was held in San Francisco, CA. Source: CAGI

This figure shows the ROC curves for the prediction of patients with Crohn's disease against the result of 1,000 random predictions, which are shown in gray. Source: CAGI

Steven Brenner helped develop CAGI to determine how well genome interpretation tools could translate to the clinic. Source: UC Berkeley

John Moult, one of the organizers of CAGI, says the challenges are giving scientists a better sense of the genome interpretation tools that currently exist. Source: University of Maryland

Brenner came to that conclusion after looking over the results of the Critical Assessment of Genome Interpretation (CAGI), a community experiment that challenges researchers to computationally predict the phenotypes of genetic variants. The teams then compare their results with unpublished experimental data, showing researchers and clinicians which tools can most accurately interpret large amounts of genomic sequence variation data and which ones might be reliable enough to use in the clinic. The results from the first two rounds of challenges have been clear for Brenner: most genomic interpretation tools are not reliable enough for the clinic yet.

After his talk at ASHG, several clinicians came up to him and expressed their concerns. Many had been applying general-purpose genome interpretation tools across a wide range of problems, which may have made their conclusions less reliable. “General methods are limited in how well they will perform, which is not what people assumed before,” he said. “What that reaction showed me was that CAGI has a broad set of people that derive value from the experiment’s findings.”

Increasing Confidence

Brenner and John Moult, a computational biologist at the University of Maryland in Rockville, MD, organized the first CAGI experiment in 2010. It was a pilot project to get a better sense of the tools researchers in the community were using to study human genome variation and the phenotypic predictions coming from them. “Coming into CAGI, we had no understanding of how well methods for interpreting genome variation worked,” Brenner said. “Now, we’re starting to get a hint of what the big picture is.”

The goal was to give scientists and clinicians a better sense of how much confidence to place in the methods currently available for predicting the phenotype of sequence variants. “There’s a lot of uncertainty about how these methods work on real problems and so the challenges address the question of how can we test them in real-world situations,” Moult said.

In the beginning, Brenner and Moult had little idea of what to expect. The first year of the experiment was supposed to be very small, a pilot to see who would participate and what tools actually existed. In the end, the 2010 challenges drew more than 100 prediction submissions from eight countries, exceeding the organizers’ expectations.

Forty of the participants traveled to Berkeley in December 2010 to review the results. The top prize was awarded to Yana Bromberg, a bioinformatician at Rutgers University in New Jersey, for her work on interpretation software called screening for non-acceptable polymorphisms, or SNAP for short, which evaluates the effects of single amino acid substitutions on protein function (2). It was the first time Moult and Brenner had heard of SNAP.

In 2011, teams worked on 11 challenges, resulting in 117 predictions from 21 groups representing 18 countries. The challenges expanded to include exercises on exome variation and breast cancer gene variation. Again, SNAP was often one of the best interpretation tools, ranking high on several of the challenges.

One of the challenges in the second year of the experiment asked variation predictors to analyze exome sequence data from 42 Crohn's disease patients and 6 healthy individuals. Researchers didn’t know how many of the exomes had variations associated with the disease, but many of the tools predicted the disease in patients significantly better than random. The best performing teams used an unexpected approach, looking at rare variants on a large panel of genes (1).
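The comparison against random predictions can be illustrated with a short Python sketch. It is not CAGI's scoring code: the scores are made up, and only the cohort sizes (42 patients, 6 healthy individuals) and the 1,000 random predictions come from the article. The function names `roc_auc`, `random_baseline`, and `empirical_p_value` are hypothetical.

```python
import random

def roc_auc(scores, labels):
    """Area under the ROC curve: the chance a patient outscores a healthy person."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

def random_baseline(labels, n_random=1000, seed=0):
    """AUCs of n_random predictors that assign uniformly random scores."""
    rng = random.Random(seed)
    return [roc_auc([rng.random() for _ in labels], labels)
            for _ in range(n_random)]

def empirical_p_value(observed_auc, baseline_aucs):
    """Share of random predictors that do at least as well as the observed one."""
    hits = sum(1 for a in baseline_aucs if a >= observed_auc)
    return (hits + 1) / (len(baseline_aucs) + 1)

# Assumed cohort: 42 patients (label 1) and 6 healthy individuals (label 0).
labels = [1] * 42 + [0] * 6

# Made-up disease scores standing in for one team's submission.
rng = random.Random(1)
scores = [rng.gauss(1.0, 1.0) if y == 1 else rng.gauss(0.0, 1.0) for y in labels]

observed = roc_auc(scores, labels)
baseline = random_baseline(labels)
p_value = empirical_p_value(observed, baseline)
print(f"observed AUC = {observed:.3f}, empirical p = {p_value:.4f}")
```

With only six healthy exomes, random predictors can reach a high AUC by chance, which is one reason to compare a submission against many random predictions rather than against the nominal value of 0.5 alone.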

“The Crohn’s results were so great, we wonder if they were an artifact,” Brenner says, explaining that the CAGI organizers have included the challenge again in this year’s experiment to verify the results. If the results hold, “it could be a huge breakthrough there in interpreting genetic variation under certain circumstances,” he said.

The first-year results were statistically significant, but the second year, Brenner said, “really gave us a baseline for better understanding personal genome variations and also started to show which types of interpretation methods might be best for specific applications.”

Nowhere Near

In 2013, the experiment had 10 challenges, which included a test that focused on genetic and phenotypic variation in breast cancer as well as the tried-and-true test to predict individuals' phenotypic traits based on their genomes. The information for the personal genome analysis comes from the Personal Genome Project (PGP). “It acts as a valuable resource for diagnostics evaluations and standardization testing like CAGI,” Harvard molecular geneticist George Church said in an email, adding that the PGP has been providing data to CAGI since its first year.

But this year, there’s a change to the personal genome challenge. For the past two years, participants used the data to predict individual phenotypic traits based on a genome. But phenotypic profiles of all PGP participants are now public. “The availability of the complete profiles makes it impossible to have a valid assessment of individual trait predictions,” Brenner explains.

So instead of predicting the phenotype based on a single genome, in this year’s challenge, the participants will develop tools that play a “matching game.” The goal will be to match 77 genomes with their corresponding phenotypic profiles, each of which includes 239 traits such as high cholesterol, diabetes, and astigmatism. And to spice things up, the organizers have included 214 phenotypic profiles that do not match any of the 77 genomes.
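One way to picture the matching game is the toy Python sketch below. It assumes each tool first predicts a yes/no value for every trait from a genome, then pairs genomes with profiles by trait agreement, leaving decoy profiles unmatched. The scoring rule, the greedy strategy, and the names `agreement` and `greedy_match` are illustrative assumptions, not the approach of any CAGI participant.

```python
def agreement(predicted, profile):
    """Fraction of traits on which a predicted trait vector agrees with a profile."""
    matches = sum(1 for p, q in zip(predicted, profile) if p == q)
    return matches / len(profile)

def greedy_match(predictions, profiles):
    """Pair each genome with a distinct profile, taking best-scoring pairs first."""
    pairs = sorted(
        ((agreement(pred, prof), genome, name)
         for genome, pred in predictions.items()
         for name, prof in profiles.items()),
        key=lambda t: -t[0],
    )
    assignment, used = {}, set()
    for score, genome, name in pairs:
        if genome not in assignment and name not in used:
            assignment[genome] = name
            used.add(name)
    return assignment

# Toy data: 3 genomes with 4 predicted traits each (the real challenge
# has 77 genomes and 239 traits per profile).
predictions = {
    "g1": [1, 0, 1, 0],
    "g2": [0, 0, 1, 1],
    "g3": [1, 1, 0, 0],
}

# Five profiles: three true matches plus two decoys (pD, pE), mirroring
# the unmatched profiles the organizers added.
profiles = {
    "pA": [1, 0, 1, 0],
    "pB": [0, 0, 1, 1],
    "pC": [1, 1, 0, 0],
    "pD": [0, 1, 0, 1],
    "pE": [1, 1, 1, 1],
}

assignment = greedy_match(predictions, profiles)
print(assignment)
```

In this toy case the predictions are perfect, so every genome finds its profile and both decoys stay unused. Real predictions are noisy, and a greedy pass can lock in a wrong pair early, so an optimal assignment method may be a better choice in practice.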

Ultimately, the CAGI predictors will release the PGP challenge results to those who volunteered their genomes so the individuals can learn more about their genetic susceptibilities for disease. But the reliability of the results is not necessarily high yet, Brenner cautions, so it’s important that individuals, scientists, and clinicians take that into account if someone shows a predicted high risk for cancer or other serious illnesses.

“We are nowhere near having a method for genome interpretation where a doctor could use it and then go and give surgery based on what we are saying,” Moult says. He and Brenner hope CAGI is a first step toward getting there one day.


  1. CAGI: The Critical Assessment of Genome Interpretation, a community experiment to evaluate phenotype prediction. 2012. American Society of Human Genetics Conference: Poster.
  2. Bromberg, Y. and B. Rost. 2007. SNAP: predict effect of non-synonymous polymorphisms on function. Nucleic Acids Research 35: 3823-3835.