The Rare Variant Dilemma

Janelle Weaver, Ph.D.

Errors introduced by next-generation sequencing make it difficult to detect rare variants that play an important role in diseases such as cancer. Janelle Weaver reports on how scientists are making progress in characterizing these errors and developing strategies to achieve increased accuracy.

After sequencing and analyzing the exomes of 2400 individuals, the researchers behind the National Heart, Lung, and Blood Institute’s Exome Sequencing Project concluded that most single nucleotide variants were rare, occurring in less than 0.5% of the sample population (1). This revelation explains why genome-wide association studies (GWAS), which attempt to associate common genetic variants with specific disease phenotypes, have been generally unsuccessful for complex diseases.

Members of the Loeb lab who worked on Duplex Sequencing: Lawrence Loeb, Scott Kennedy, Michael Schmitt, and Jesse Salk. Source: University of Washington

Jan Vijg. Source: Albert Einstein College of Medicine

Overview of Duplex Sequencing. Source: PNAS

“It’s really the rare mutations that are the most important,” says Jan Vijg, who studies the relationship between genome damage and aging at the Albert Einstein College of Medicine. “We don’t tend to find risk variants for particular diseases among common variants, so it’s pretty clear that all of the rare variants together are the ones that really are responsible for many disease phenotypes, and therefore we need to look at them.”

While next-generation DNA sequencing holds great promise for the detection of disease-associated mutations and the development of personalized medicine, the ability to detect rare variants is limited by errors introduced during the sample preparation, sequencing, and analysis steps. As a result, approximately 1% of bases are incorrectly identified.

While this error rate is acceptable for some applications, it has been a major hurdle for cancer researchers (2). New techniques are therefore being developed to identify rare variants accurately within the genomic haystack.

Lowering Error Frequency

At the University of Washington, Larry Loeb is one of those cancer researchers trying to get a better grasp on rare genetic variants. But because of the inaccuracy of previous sequencing methods, his lab could only look at mutations with a frequency of more than 10%. “We wanted a method that would be much more accurate so that we could look at mutations that might not be present in all of the cells within the tumor—they might be subclonal or random,” he says.

Loeb and his team described such a method, called Duplex Sequencing, in the Proceedings of the National Academy of Sciences (3). By independently tagging and sequencing both strands of a DNA duplex, the method achieved a theoretical background error rate of less than one artifactual mutation per billion nucleotides sequenced. This makes the approach well suited to high-sensitivity detection of rare DNA variants, as well as to single-molecule counting for precisely determining absolute DNA or RNA copy numbers.

In Duplex Sequencing, both strands of a duplex DNA fragment are tagged with a random, yet complementary double-stranded nucleotide sequence. Double-stranded tag sequences are incorporated into standard Illumina sequencing adapters by first introducing a single-stranded randomized nucleotide sequence into one adapter strand and then extending the opposite strand with a DNA polymerase to yield a complementary double-stranded tag. Following ligation of the tagged adapters to sheared DNA, the individually labeled strands then undergo PCR amplification and paired-end sequencing.

By comparing the sequence obtained from each of the two strands in a duplex, Loeb and colleagues could distinguish sequencing errors from true mutations. Because the two strands of a DNA duplex are complementary, true mutations are found at the same position in both strands. In contrast, PCR or sequencing errors result in mutations in only one strand and can thus be discounted as technical error.
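The strand comparison described above can be sketched in a few lines of Python. This is a simplified illustration, not the published pipeline: it assumes reads have already been grouped by tag and strand, that all reads in a group have the same length, and that bottom-strand reads are given in their own 5'-to-3' orientation. The function names and the 0.7 agreement threshold are illustrative choices.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def single_strand_consensus(reads, min_agreement=0.7):
    """Collapse reads that share a tag and strand into one consensus.

    A position is called only if the most common base reaches
    min_agreement (an assumed threshold); otherwise it becomes 'N'.
    """
    consensus = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(column) >= min_agreement else "N")
    return "".join(consensus)


def duplex_consensus(top_reads, bottom_reads):
    """Keep a base only when both strands of the duplex support it."""
    top = single_strand_consensus(top_reads)
    # Bring the bottom strand into top-strand orientation before comparing.
    bottom = reverse_complement(single_strand_consensus(bottom_reads))
    return "".join(t if t == b else "N" for t, b in zip(top, bottom))


# The top-strand family carries an early PCR error at the last position
# (G instead of C); the bottom-strand family does not, so that position
# is masked while the bases supported by both strands are kept.
print(duplex_consensus(["ACTTAG"] * 3, ["GTAAGT"] * 3))  # ACTTAN
```

Masking a disputed position with "N" rather than picking one strand is the conservative choice here: a base that only one strand supports is treated as unknown instead of as a mutation.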

“The best other method makes about one mistake in every thousand nucleotides,” says Loeb. “That’s good enough if you want to sequence genetic abnormalities that are present in all the cells in the body. But if you want to sequence rare variants, or if you want a distribution of mutations within a tumor, or if you want to sequence viral populations, the error rate is too high.”
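A rough back-of-the-envelope calculation shows why an error rate of one in a thousand swamps rare variants, and why requiring agreement between two strands helps so much. The sketch below rests on simplifying assumptions that are not taken from the paper: errors are independent between strands, and a miscalled base is equally likely to be any of the three wrong bases. The function names and input values are illustrative only.

```python
import math


def expected_reads(depth, variant_fraction, error_rate):
    """Expected true-variant reads and erroneous reads at one position."""
    true_reads = depth * variant_fraction
    error_reads = depth * error_rate
    return true_reads, error_reads


def duplex_error_rate(single_strand_error):
    """Chance that both strands err at the same position in a matching way.

    Assumes independent errors on each strand and that only one of the
    three possible wrong bases on the second strand is complementary to
    the error on the first.
    """
    return single_strand_error ** 2 / 3


# A variant present in 1 of 10,000 molecules, sequenced to a depth of
# 10,000 with a per-base error rate of 1 in 1,000.
true_reads, error_reads = expected_reads(10_000, 1e-4, 1e-3)
print(f"true variant reads: {true_reads:.1f}, erroneous reads: {error_reads:.1f}")

# Requiring both strands to agree reduces the error rate roughly
# quadratically under these assumptions.
print(f"two-strand error rate: {duplex_error_rate(1e-3):.2e}")
print(f"orders of magnitude gained: {math.log10(1e-3 / duplex_error_rate(1e-3)):.1f}")
```

Under these assumptions, erroneous reads outnumber true variant reads by about ten to one in conventional sequencing, which is why such a variant cannot be called with confidence from raw reads alone.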

Achieving Accuracy

While scientists are all too aware of the errors introduced during PCR and next-generation sequencing, errors generated during DNA extraction and sample preparation have received less attention. Addressing this issue in a study published in Nucleic Acids Research (4), a team led by Gad Getz of the Broad Institute of the Massachusetts Institute of Technology and Harvard University reported the discovery of a novel source of artifactual mutations occurring during the sample preparation process.

According to this study, the C>A/G>T transversion artifacts found at low allelic fractions in ultra-deep-coverage targeted capture sequencing data resulted from oxidation of DNA during acoustic shearing in samples containing reactive contaminants from the extraction process. The addition of metal chelators to the shearing buffer reduced these oxidation artifacts, and a post-processing filtering method was capable of screening out oxidation-induced artifacts in the sequencing data. These findings suggest that changes in laboratory procedures and the use of informatics tools can help researchers curb the impact of artifacts.
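To give a sense of what a post-processing filter of this kind might look like, here is a minimal Python sketch. It is not the filter from the study, whose details are not described here. It simply flags calls that match the two features mentioned above: a C>A or G>T substitution and a low allelic fraction. The `VariantCall` structure, the function names, and the 5% cutoff are all hypothetical.

```python
from dataclasses import dataclass

# Substitution types associated with the oxidation artifacts.
OXIDATION_SIGNATURE = {("C", "A"), ("G", "T")}


@dataclass
class VariantCall:
    """Hypothetical minimal record for one variant call."""
    ref: str
    alt: str
    alt_reads: int
    total_reads: int

    @property
    def allele_fraction(self):
        return self.alt_reads / self.total_reads if self.total_reads else 0.0


def is_suspected_oxidation_artifact(call, max_fraction=0.05):
    """Flag C>A or G>T calls whose allele fraction is below an assumed cutoff."""
    matches_signature = (call.ref, call.alt) in OXIDATION_SIGNATURE
    return matches_signature and call.allele_fraction < max_fraction


def filter_calls(calls, max_fraction=0.05):
    """Return only the calls that are not flagged as suspected artifacts."""
    return [c for c in calls if not is_suspected_oxidation_artifact(c, max_fraction)]


calls = [
    VariantCall("C", "A", alt_reads=3, total_reads=1000),    # flagged: signature, low fraction
    VariantCall("C", "A", alt_reads=400, total_reads=1000),  # kept: high fraction
    VariantCall("A", "G", alt_reads=3, total_reads=1000),    # kept: different substitution
]
for call in filter_calls(calls):
    print(call.ref, call.alt, call.allele_fraction)
```

A rule this blunt would also discard genuine low-frequency C>A and G>T mutations, so a practical filter needs additional evidence to separate real variants from artifacts.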

“People are aware that there are mistakes with any sequencing technology, so I don’t think it’s surprising,” says Nadav Ahituv, who studies the role of gene regulatory sequences in human biology and disease at the University of California, San Francisco and was not involved in the study. “What’s nice is that they are homing in on the cause for it and they’re suggesting nice computational tools to reduce the problem.”

Beyond the experimental and informatics approaches described in these recent papers, there are other potential ways to improve sequencing accuracy, according to Jussi Taipale, an expert in cancer systems biology at the Karolinska Institute who was not involved in these studies. “The accuracy is always going to be limited by the error rates of the polymerase, because you essentially have to use it,” he says. “If we can develop polymerases that have a lower error rate, and if we can deal with all of the chemical sources of mutation, that would of course help even more.”

Clinical Impact

Researchers such as Vijg who study rare variants may not have to wait long to implement some of these approaches. For instance, Duplex Sequencing can be adapted for various sequencing platforms, and the adapters containing double-stranded tag sequences can replace standard sequencing adapters without significantly changing the normal workflow of sample preparation for Illumina sequencing. “We would definitely be able to implement it quickly in the current workflow,” says Vijg.

Perhaps most importantly, Duplex Sequencing could reveal rare variants that confer drug resistance. “If we already know that somewhere in the tumor, there are cells with a particular variant that gives them the opportunity to escape a particular drug, then we could try another drug instead. It might have a direct impact on treatment,” says Vijg. Moreover, future advances in single-cell sequencing could further help researchers identify these types of mutations (5).

But whether Duplex Sequencing could be applied to many types of mutations remains to be determined. “They really applied this mostly to small mutations—point mutations. To do this also on large changes, like big deletions, translocations, or copy number variations, it’s not immediately clear to me if this can be done.”

In the end, Duplex Sequencing probably won’t replace standard methods. For one, the approach would be too costly for whole-exome sequencing, says Loeb. “What it’s really good for is sequencing heterogeneous mixtures of cells or asking biological questions where you need super accuracy. So, in a sense, it will probably be limited to the study of cancer, viral populations, ancient DNA, forensics, things like this,” he says.

Ahituv agrees that various sequencing approaches will be used in parallel. “The most important implication from both papers is that if we want to look for rare variants using next-generation sequencing technologies, we should be very careful,” he says.


1. Tennessen, J.A., A.W. Bigham, T.D. O'Connor, W. Fu, E.E. Kenny, S. Gravel, S. McGee, R. Do, X. Liu, G. Jun, H.M. Kang, D. Jordan, S.M. Leal, S. Gabriel, M.J. Rieder, G. Abecasis, D. Altshuler, D.A. Nickerson, E. Boerwinkle, S. Sunyaev, C.D. Bustamante, M.J. Bamshad, J.M. Akey, Broad GO, Seattle GO, and NHLBI Exome Sequencing Project. 2012. Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science 337(6090):64-69.

2. Salk, J.J., E.J. Fox, and L.A. Loeb. 2010. Mutational heterogeneity in human cancers: origin and consequences. Annu Rev Pathol 5:51-75. doi: 10.1146/annurev-pathol-121808-102113.

3. Schmitt, M.W., S.R. Kennedy, J.J. Salk, E.J. Fox, J.B. Hiatt, and L.A. Loeb. 2012. Detection of ultra-rare mutations by next-generation sequencing. Proc Natl Acad Sci U S A 109(36):14508-13. doi: 10.1073/pnas.1208715109.

4. Costello, M., T.J. Pugh, T.J. Fennell, C. Stewart, L. Lichtenstein, J.C. Meldrim, J.L. Fostel, D.C. Friedrich, D. Perrin, D. Dionne, S. Kim, S.B. Gabriel, E.S. Lander, S. Fisher, and G. Getz. 2013. Discovery and characterization of artifactual mutations in deep coverage targeted capture sequencing data due to oxidative DNA damage during sample preparation. Nucleic Acids Res. doi: 10.1093/nar/gks1443.

5. Gundry, M., and J. Vijg. 2012. Direct mutation analysis by high-throughput sequencing: from germline to low-abundant, somatic variants. Mutat Res 729(1-2):1-15. doi: 10.1016/j.mrfmmm.2011.10.001.