When Peter Robinson analyzed his first DNA sequence 15 years ago, the entire process took about a week, and he performed the last step of the analysis using pencil and paper. Now, at the Institute for Medical Genetics and Human Genetics in Berlin, Germany, he has access to not one but two next-generation sequencers, which allow him to analyze millions of sequences in just a few days. “We’ve seen that in the approximately two years that we’ve been using this technology, it’s gone from being cutting-edge, Star Trek-type of technology to being something that we think we command enough to start to offer in clinical services,” he says.
Exome sequencing represents an efficient strategy for searching for alleles underlying rare Mendelian disorders because most of these alleles disrupt protein-coding sequences, and a large fraction of rare, protein-altering variants are predicted to be harmful or otherwise impact protein function. “The collection of all of these exons that are coding for proteins are, as far as we know now, the most important bits in our genome for genetic disease,” says Robinson.
Although exome sequencing can also be applied to complex common diseases, a larger sample size would likely be necessary because of the greater genetic heterogeneity seen in these diseases. Furthermore, this technique would only identify variability that affects disease risk, and the analysis required for understanding the cumulative risk imparted by these alleles would be complicated. “One variant could be modified by 199 other variants in the genome, and we don’t really know how to add these up,” says Robinson. “There is essentially a lot of research that’s going to have to go on in order to get something that’s clinically usable.”
Using exome sequencing, scientists have identified novel mutations involved in a range of inherited conditions, from Lou Gehrig’s disease to intellectual disability, as well as somatic mutations in diseases such as cancer. The first analysis of an individual’s exome, spearheaded by the J. Craig Venter Institute, was published in PLoS Genetics in 2008 (1). In another study published in Nature the next year, a team of scientists demonstrated that it was possible to identify candidate genes for a rare, dominantly inherited disorder known as Freeman-Sheldon syndrome by sequencing the exomes of only four unrelated individuals (2).
Last year, Nicholas Hayward, an oncogenomicist at the Queensland Institute of Medical Research in Australia, reported in Nature Genetics the discovery of frequent somatic mutations in two genes involved in metastatic melanoma (3). After sequencing eight melanoma exomes, his team found that nearly a quarter of melanoma cell lines have mutations in the protein-coding regions of either MAP3K5 or MAP3K9. “These kinases were not known to be mutated at a high frequency in melanoma before,” says Hayward. “This really informed us about a pathway that was previously underappreciated in melanoma.”
Studies such as Hayward’s illustrate the clinical relevance of exome sequencing, says Anne Bowcock, a geneticist at Washington University School of Medicine who was not involved in the study. “If we can find the genetic cause, it would give us an indication of the altered pathway, and there may already be drugs one could use to target that disease,” she says. “It’s going to revolutionize medicine.”
In addition, exome sequencing could be used as a clinical diagnostic tool. However, one major hurdle is that many health care facilities lack the resources necessary to invest in expensive equipment and hire relevant experts, including bioinformaticians. “Bioinformatics has not been prominent in medicine until very recently, and it’s just starting to be recognized how important it is going to be to train bioinformaticians to actually help interpret medical data,” says Robinson.
In particular, there remains a strong need for improved databases that contain more medically relevant data. Databases of human genetic variation are used to filter out common variants that are unlikely to be the cause of rare Mendelian disorders. Clinical evaluations would be more efficient if there were a system that could reliably and quickly integrate patient data with information from various databases. “We can’t take two days to look through the genome. It has to be quick, because there’s not going to be more than two hours per patient,” says Robinson. “We’re really far away from that right now.”
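The filtering step described above can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Variant` record, the placeholder gene names, the example frequencies, and the 1% cutoff are hypothetical and are not drawn from any real database or from the laboratories quoted here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for one protein-altering variant in a patient."""
    gene: str
    change: str
    # Frequency in a reference population database; None means the
    # variant is absent from the database (an assumption of this sketch).
    population_frequency: Optional[float]


def filter_rare_variants(variants: List[Variant],
                         max_frequency: float = 0.01) -> List[Variant]:
    """Keep variants that are unseen in the database or rarer than the cutoff."""
    return [
        v for v in variants
        if v.population_frequency is None
        or v.population_frequency < max_frequency
    ]


# Made-up example data, not real patient variants.
patient_variants = [
    Variant("GENE_A", "p.Arg100Trp", 0.25),    # common, filtered out
    Variant("GENE_B", "p.Gly12Asp", 0.0004),   # rare, kept
    Variant("GENE_C", "p.Leu55Pro", None),     # not in database, kept
]

candidates = filter_rare_variants(patient_variants)
for v in candidates:
    print(v.gene, v.change)
```

In practice the cutoff and the choice of database would depend on the disorder and its expected inheritance pattern; the sketch only shows why a more complete database leaves fewer candidates to review by hand.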
Several companies, such as Agilent and Illumina, offer kits for the selective enrichment of sequences from the exome. In principle, these kits use a method that resembles the direct genomic selection approach described by Bowcock and her colleague Michael Lovett, also a geneticist at Washington University, in Nature Methods in 2005 (4).
Despite having helped to develop direct genomic selection for quickly and selectively enriching specific sequences across large genomic regions, Bowcock acknowledges that exome sequencing has its limitations. For example, this method does not optimally detect copy number variants, chromosomal rearrangements, or repeat mutations. Moreover, coverage may not be uniform across the exome, and about 10% of the exome can be insufficiently covered or missed altogether. When the method was developed, its developers “knew it was a stopgap measure before sequencing genomes was cheap,” Bowcock explains. “Once genomes get to maybe under $3000–5000, I think we’ll start switching over [to whole-genome sequencing].”
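As a rough illustration of what “insufficiently covered” means, the following Python sketch computes the fraction of targeted positions whose read depth falls below a threshold. The depth values and the 20-read threshold are hypothetical assumptions chosen for the example; they are not figures from the studies discussed here.

```python
from typing import Sequence


def fraction_poorly_covered(depths: Sequence[int], min_depth: int = 20) -> float:
    """Return the share of positions with fewer than min_depth reads."""
    if not depths:
        raise ValueError("depths must contain at least one position")
    low = sum(1 for d in depths if d < min_depth)
    return low / len(depths)


# Made-up read depths for ten targeted positions.
example_depths = [0, 5, 35, 40, 52, 61, 48, 30, 27, 90]

fraction = fraction_poorly_covered(example_depths)
print(f"{fraction:.0%} of positions fall below the depth threshold")
```

A position with zero reads is missed altogether, while one with only a handful of reads may be too thinly covered to call a variant with confidence; both count toward the fraction in this sketch.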
Technological advances may soon bring the cost of sequencing an individual’s whole genome down to $1000, which would enable whole-genome sequencing of very large numbers of patients. Sequencing the entire genome would allow scientists to examine the relationship between variation in non-coding regions and disease, which may help to overcome potential pitfalls associated with exome sequencing. “I’m optimistic that the majority of research labs will convert to whole-genome analysis probably within as little as two years from now, and we might find that the clinical applications would be only a couple of years after that,” says Hayward.
But whole-genome sequencing will be accompanied by another slew of problems, including expensive data storage and a vast amount of data to analyze. Currently, few clinical sequencing centers are equipped to store and analyze the large amounts of data from whole-genome sequencing. In the end, this increased data load translates to longer waits for patients. Right now, several months could go by before a diagnosis can be made. “For cancer patients, currently that would be too long. We need to bring this whole timeline back to a matter of a couple of weeks,” says Hayward.
But Hayward sees a bright future ahead. “I think that it is feasible with the new technologies because they’re becoming faster and faster, and there is already discussion that within a year from now it might be possible to sequence an entire genome in a day,” he says.
1. Ng, P. C., S. Levy, J. Huang, T. B. Stockwell, B. P. Walenz, K. Li, N. Axelrod, D. A. Busam, R. L. Strausberg, and J. C. Venter. 2008. Genetic variation in an individual human exome. PLoS Genet 4: e1000160.
2. Ng, S. B., E. H. Turner, P. D. Robertson, S. D. Flygare, A. W. Bigham, C. Lee, T. Shaffer, M. Wong, A. Bhattacharjee, E. E. Eichler, M. Bamshad, D. A. Nickerson, and J. Shendure. 2009. Targeted capture and massively parallel sequencing of 12 human exomes. Nature 461: 272-276.
3. Stark, M. S., S. L. Woods, M. G. Gartside, V. F. Bonazzi, K. Dutton-Regester, L. G. Aoude, D. Chow, C. Sereduk, N. M. Niemi, N. Tang, J. J. Ellis, J. Reid, V. Zismann, S. Tyagi, D. Muzny, I. Newsham, Y. Wu, J. M. Palmer, T. Pollak, et al. 2011. Frequent somatic mutations in MAP3K5 and MAP3K9 in metastatic melanoma identified by exome sequencing. Nature Genetics 44: 165-169.
4. Bashiardes, S., R. Veile, C. Helms, E. R. Mardis, A. M. Bowcock, and M. Lovett. 2005. Direct genomic selection. Nature Methods 2: 63-69.