Ultra-deep sequencing (UDS) of amplicons is a major application for next-generation sequencing technologies, particularly for the 454 Genome Sequencer FLX. For this application especially, errors that might be introduced during any of the sample processing or data analysis steps should be avoided or at least recognized, as they might lead to aberrant sequence variant calling. Since 454 pyrosequencing relies on PCR-driven target amplification, it is key to differentiate errors introduced during the amplification step from genuine minority variants. To this end, optimal primer design is imperative because primer selection, primer dimer formation, and nonspecific binding may all affect the quality and outcome of amplicon-based deep sequencing. Other intrinsic PCR characteristics, including amplification drift and the formation of secondary structures, may also influence sequencing data quality. We illustrate these phenomena using real-life case studies and propose evidence-based experimental and analytical solutions for effective practice. Furthermore, because the accuracy of the DNA polymerase is vital for reliable UDS results, a comparative analysis of error profiles from seven different DNA polymerases was performed and experimentally assessed in parallel by 454 sequencing. Finally, evaluation of the intra- and interrun variability of the 454 sequencing protocol revealed highly reproducible results in amplicon-based UDS.
An important factor in any minor variant detection approach is the sensitivity for detecting DNA sequence variants or mutations in an excess of nonmutated genomes. This challenge is encountered across many applications, including the detection of neoplastic cells in a majority of normal cells, the detection of somatic mutations in tumor biopsies, and the detection of a minor viral variant in a background of a large viral population. Sanger sequencing (1) has long been regarded as the gold standard for mutation detection, because prior knowledge of mutations is not required and assay development is limited only by sequencing primer design and read length. This approach, however, has several limitations: it is unreliable for detecting variants that constitute <20% of the total population of genomes in a sample (2,3); it generates only an average sequence of the PCR product; it does not allow linkage of mutations to be determined; and the analysis of samples with heterogeneous insertion-deletion mutations remains challenging. Minor variant detection methods that rely on subcloning of PCR products, in conjunction with conventional sequencing, are expensive and time-consuming, and they suffer from the drawback that PCR-based errors are propagated into the cloned DNA and cannot be discriminated from bona fide mutations. On the other hand, nonsequencing-based assays for minor variant detection [e.g., allele-specific PCR or probe-based methods (4)] generally offer high sensitivity, but prior knowledge of the mutation of interest is required, no information on the sequence context can be generated, and linkage of the identified mutations cannot be established.
The power of the new sequencing technologies and their utility for variant detection derive from the ability to sequence single molecules in massive amounts. In this process, each of the single molecules of an amplicon is clonally amplified and is sequenced individually, allowing for the identification of rare variants and haplotype information over the whole read length. At present, the 454 pyrosequencing technology achieves the longest read lengths (400–500 bp reads), using the Titanium chemistry on the Genome Sequencer FLX (GS FLX; 454 Life Sciences, Roche Applied Science, Branford, CT, USA). This technology enables the clonal sequencing of hundreds of thousands of molecules, which allows the ultra-deep sequencing (UDS) of amplicons at high coverage (5). The combination of high coverage and long read length has made the GS FLX a promising tool for sensitive and quantitative detection of variants, with potential applications in various clinically relevant research areas, including virology, oncology, and human genetics.
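To make the link between coverage and sensitivity concrete, the following minimal sketch estimates the probability of observing a variant in at least a given number of reads. It is an illustration only, not a method from this study: it assumes that reads sample the template population independently (a simple binomial model) and it ignores PCR and sequencing errors entirely. The function name `prob_at_least_k` and the parameter `min_reads` are hypothetical and were introduced for this example.

```python
from math import comb


def prob_at_least_k(coverage: int, freq: float, min_reads: int) -> float:
    """Probability that a variant at true frequency `freq` is seen in at
    least `min_reads` of `coverage` reads.

    Assumption (not from the source): reads are independent draws, so the
    number of variant reads follows Binomial(coverage, freq). Amplification
    bias and sequencing errors are not modeled.
    """
    # Sum the probabilities of seeing fewer than `min_reads` variant reads.
    # This stays cheap and numerically safe for small `min_reads`.
    below = sum(
        comb(coverage, i) * freq**i * (1.0 - freq) ** (coverage - i)
        for i in range(min_reads)
    )
    return 1.0 - below


if __name__ == "__main__":
    # Compare a 1% variant at several coverages, requiring 5 supporting reads.
    for depth in (100, 500, 1000, 5000):
        p = prob_at_least_k(depth, 0.01, 5)
        print(f"coverage={depth:5d}  P(>=5 variant reads)={p:.4f}")
```

Under these simplifying assumptions, the detection probability for a fixed variant frequency rises with coverage. In practice the usable threshold also depends on the background error rate, which this sketch deliberately leaves out.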
For example, monitoring HIV-1 drug resistance has become increasingly important for guiding treatment, especially for patients failing antiretroviral therapy (6,7). The conventional method uses bulk population genotyping of the viral quasispecies in an infected patient to predict HIV-1 drug resistance profiles. Some studies suggest that low-frequency (<20%) resistance mutations may have an impact on therapy outcome as a result of transmitted drug resistance or remnants of earlier drug selection in patients previously exposed to antiretroviral therapy, while other studies did not observe such a correlation (8-15). With UDS becoming more widely available, investigating the relevance of minor mutations in the context of different antiretroviral therapy regimens might help define the clinical benefit of low-frequency resistance testing. In oncology, UDS has been applied to identify rare somatic mutations in complex tumor samples, which might impact diagnostics and therapeutics. For example, Thomas et al. (16) reported the presence of low-abundance oncogene mutations in complex samples with low tumor content for which conventional Sanger sequencing was not informative. Another study screened 623 cancer-related genes in 188 human lung adenocarcinomas, revealing more than 1000 somatic mutations across the samples (17). UDS has also been applied to search for rare mutations in samples from patients suffering from tuberous sclerosis complex, an autosomal dominant neurocutaneous syndrome (18), and in samples from B cell chronic lymphocytic leukemia patients (19).
Over the past years, we have built experience in the design and optimization of UDS assays on amplicons using 454 massively parallel pyrosequencing technology, primarily with applications in virology and oncology. Here, we discuss different sources of errors that may occur during UDS and illustrate them by means of case studies from our laboratory. Emphasis is on the experimentally controllable variables affecting the fidelity, quality, and outcome of amplicon-based deep sequencing.