High-throughput sequencing technologies frequently necessitate the use of PCR for sequencing library amplification. PCR is a sometimes enigmatic process and is known to introduce biases. Here we perform a simple amplification-sequencing assay using 10 commercially available polymerase-buffer systems to amplify libraries prepared from both modern and ancient DNA. We compare the performance of the polymerases with respect to a previously uncharacterized template length bias, as well as GC-content bias, and find that simply avoiding certain polymerases can dramatically decrease the occurrence of both. For amplification of ancient DNA, we found that some commonly used polymerases strongly bias against amplification of endogenous DNA in favor of GC-rich microbial contamination, in our case reducing the fraction of endogenous sequences to almost half.
PCR is a ubiquitous and indeed fundamental tool in genetic experimentation. For decades it has allowed rapid in vitro enrichment of targeted loci, increasing the throughput and resolution of many laboratory applications as well as opening new possibilities for genetic analysis (1). PCR is, however, known to be a highly problematic process for the amplification of multiple templates in parallel, where factors such as base composition can have critical effects on experimental results (2). For example, differential template composition can cause allelic dropout in PCR-mediated genotyping (3), or it can cause misrepresentation of pathology-associated gene dosage determined by multiplex PCR assays (4,5).
PCR biases can be especially damaging to high-throughput DNA sequencing applications. The most widespread of these technologies require the conversion of DNA or RNA samples into sequencing libraries by attaching two different adaptor sequences to the ends of template molecules. Libraries are only rarely used directly for sequencing (6), but instead are often amplified immediately (7) or after some downstream processes such as target enrichment by hybridization capture (8). Amplification of sequencing libraries, i.e., amplifying a mixture of templates indiscriminately in parallel, is an enormous challenge, and the choice of PCR parameters may strongly determine how well library complexity is maintained and how representative the final product is. In fact, it has been shown that library amplification can severely bias the GC-content of sequences (9,10). However, despite extensive use of PCR for the amplification of sequencing libraries, we are aware of only a single study in this direction, which aimed to reduce GC biases in this process (11). In this study, Aird et al. describe an optimized PCR protocol that mitigates a GC-bias introduced during the PCR stage of library preparation. They show that switching polymerases in conjunction with a longer denaturation step and a lower annealing and extension temperature retains the GC-content profile seen before PCR. While satisfactorily overcoming the GC-bias, their investigation was limited to the GC-bias and to a comparison of two polymerases (Phusion HF and AccuPrime Taq HiFi). Another possible bias, size bias, has yet to be investigated, despite being relevant to applications using samples of limited quantity and/or sequencing technologies supporting longer read lengths, such as Roche/454's Genome Sequencer. Furthermore, to date there has not been a thorough comparison of the capabilities of different polymerases.
In this study we amplify Illumina sequencing libraries prepared from human genomic DNA using 10 different commercially available polymerase/buffer systems and analyze sequence data to determine the effect of these polymerases on fragment length and GC distributions. We also investigate the effect of amplification into the plateau phase of PCR. We then further characterize a subset of the polymerase/buffer systems with respect to their performance on an ancient DNA library, an extreme application where amplification biases can have severe and detrimental effects. Ancient DNA libraries are characterized by a distinctive fragmentation pattern, with lengths typically less than 200 bp (12), and low initial quantities made up of a minimal percentage of endogenous DNA often scattered among a high background of microbial contamination (13,14). Since these libraries are created from valuable and limited sources, it is often desired to sequence them to exhaustion. Unbiased amplification is particularly important, as an unequal accumulation of a specific template characteristic, such as short fragments or high GC-percentage, can drive up sequencing costs.
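The two quantities compared across polymerases, fragment length and GC content, can be tabulated from sequenced fragments along the following lines. This is a minimal sketch for illustration only, not the analysis pipeline used in this study: the function names, the choice of 10 GC bins, and the handling of uncalled bases (N is ignored when computing GC content) are all assumptions.

```python
from collections import Counter


def gc_counts(seq):
    """Return (number of G/C bases, number of called A/C/G/T bases)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc, gc + at


def length_and_gc_histograms(fragments, n_gc_bins=10):
    """Tabulate fragment lengths and binned GC content.

    fragments: iterable of fragment sequences (strings).
    n_gc_bins: hypothetical number of equal-width GC bins over 0-100%.
    Returns two Counters: length -> count, GC bin index -> count.
    """
    length_hist = Counter()
    gc_hist = Counter()
    for seq in fragments:
        length_hist[len(seq)] += 1
        gc, called = gc_counts(seq)
        if called == 0:
            continue  # no called bases, GC content undefined
        # Integer arithmetic avoids floating-point bin-edge effects;
        # 100% GC is placed in the last bin.
        gc_hist[min(gc * n_gc_bins // called, n_gc_bins - 1)] += 1
    return length_hist, gc_hist


if __name__ == "__main__":
    toy = ["ACGT", "GGCC", "AATT", "ACGTN"]  # made-up sequences
    lengths, gc_bins = length_and_gc_histograms(toy)
    print(sorted(lengths.items()))
    print(sorted(gc_bins.items()))
```

Comparing such histograms between an amplified library and its unamplified counterpart would show whether short or GC-rich fragments have been enriched during PCR.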
Materials and methods
Two µg of human genomic DNA derived from B-lymphocytes was sheared on a Covaris S2 (Covaris Inc, Woburn, MA) with settings of 10% duty cycle, intensity 4 and 200 cycles per burst. The fragment library was then visualized with a High Sensitivity DNA chip on a BioAnalyzer (Agilent, Santa Clara, CA) to ensure an appropriate peak and length distribution. Ancient DNA was extracted from 42 mg of a Neandertal toe phalanx from Denisova Cave, Russia (15) using a silica-based extraction technique (16).
Illumina Library Preparation
Illumina libraries were prepared from both the human and Neandertal samples in accordance with a previously published protocol (17), beginning at the blunt-end repair step, and using the following modifications: (i) For the human sample, full-length adaptors (P5 and indexed P7) were ligated directly onto all fragments in the adaptor ligation step, thus eliminating the need for amplification with 5′-tailed primers prior to sequencing. (ii) For the Neandertal sample, uracils were excised from the template strands as described elsewhere (18). (iii) For both samples, all purification steps were done with the MinElute PCR purification kit (Qiagen, Hilden, Germany). Contamination from modern human DNA in the Neandertal extract was determined to be ∼2.4% based on a mitochondrial capture assay described elsewhere (19).
Both libraries were quantified by qPCR as described elsewhere (17,20) and diluted to working concentrations of ∼2.0 × 10⁶ molecules/µl for the human DNA library, and ∼2.0 × 10⁵ for the Neandertal DNA library.
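As an illustration of the dilution step, the volumes needed to reach a working concentration follow from C1·V1 = C2·V2. The sketch below assumes a hypothetical qPCR-derived stock concentration and final volume; neither value is reported in the text, and the function name is illustrative.

```python
def dilution_volumes(stock_conc, target_conc, final_volume_ul):
    """Return (stock volume, diluent volume) in µl for a simple dilution.

    stock_conc, target_conc: concentrations in molecules/µl.
    final_volume_ul: desired volume of the diluted library in µl.
    Uses C1 * V1 = C2 * V2.
    """
    if stock_conc <= 0 or target_conc <= 0 or final_volume_ul <= 0:
        raise ValueError("concentrations and volume must be positive")
    if target_conc > stock_conc:
        raise ValueError("target concentration exceeds stock concentration")
    stock_ul = final_volume_ul * target_conc / stock_conc
    return stock_ul, final_volume_ul - stock_ul


if __name__ == "__main__":
    # Hypothetical stock of 1.0e8 molecules/µl diluted to the human
    # library working concentration of 2.0e6 molecules/µl in 100 µl.
    stock_ul, diluent_ul = dilution_volumes(1.0e8, 2.0e6, 100.0)
    print(f"{stock_ul:.2f} µl library + {diluent_ul:.2f} µl diluent")
```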