After oligo(dC) tail addition, a chimeric oligonucleotide with a defined sequence X at its 5′ end and 4–7 complementary deoxyguanosines (7 is optimal [Supplementary Figure S2]) at its 3′ end is annealed to the homopolymer tail and joined to the 5′ end of the opposing strand using T4 DNA ligase. Due to the stable nature of the seven dC:dG base pairs, this ligation event is extremely efficient. The DNA is next amplified by PCR using the same oligonucleotide used for ligation as the forward primer, and a chimeric reverse primer composed of Y′ at its 5′ end and 16 complementary deoxyguanosines at its 3′ end that are used to prime DNA synthesis from the oligo(dC) tail.
The reverse primer can anneal to and prime from anywhere along the homopolymer tail. In the absence of a chain terminator, the tail length generated can exceed hundreds of nucleotides (Supplementary Figure S1). By using ddCTP in the tailing reaction, the contribution of poly(dC) to the final product is effectively limited. Although either titration of TdT or reduction of reaction time could also be used to limit tail length, we found that the use of chain terminators in the context of excess enzymatic activity yielded the most precise and reproducible results (Supplementary Figure S1 and data not shown). Here, HTML-PCR was used exclusively to generate Illumina sequencing libraries. The complete details of each enzymatic reaction, together with oligonucleotide and sequencing primer sequences, are provided in the Supplementary Materials and Methods section. Some bioinformatic methods of analysis were previously described (6) and additional detail is provided in the Supplementary Materials and Methods section.
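The effect of tail length on the homopolymer content of the product can be illustrated with a minimal sketch. It assumes, as a simplification not stated in the text, that the reverse primer primes only when all 16 of its dG residues are paired with the tail, and that every annealing register along the tail is possible. The function name and its parameters are hypothetical and serve only for illustration.

```python
def homopolymer_run_lengths(tail_len: int, primer_g: int = 16) -> range:
    """Possible lengths of the dC:dG run carried into the PCR product.

    Simplifying assumption: the reverse primer needs all `primer_g` dG
    residues paired with the oligo(dC) tail. If it anneals next to the
    insert, only `primer_g` homopolymer positions end up in the product;
    if it anneals at the distal end of the tail, the whole tail does.
    """
    if tail_len < primer_g:
        # Tail too short for full-length annealing under this assumption.
        return range(0)
    return range(primer_g, tail_len + 1)


# A tail of about 20 nt leaves only a handful of registers, whereas an
# unterminated tail of 300 nt allows runs anywhere from 16 to 300 nt.
short_tail = homopolymer_run_lengths(20)
long_tail = homopolymer_run_lengths(300)
print(len(short_tail), max(short_tail))
print(len(long_tail), max(long_tail))
```

Under these assumptions, capping the tail length directly caps the longest homopolymer run that can appear in the amplified product, which is the motivation for the chain terminator.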
Results and discussion
We first tested the utility of HTML-PCR by using the method, in conjunction with MPS, to determine the sequence of a previously unsequenced bacterial strain, V. cholerae E7946. After fragmenting the genomic DNA by high intensity sonication, DNA amounts spanning four orders of magnitude (100–0.01 ng) were individually blunted, 5′ end phosphorylated and treated with TdT in the presence of a 19:1 ratio of dCTP:ddCTP to generate 3′ oligo(dC) tails averaging 20 nucleotides in length. The tailed substrate was then ligated to the chimeric oligonucleotide olj623, which has seven dG nucleotides at its 3′ end and a sequence required for Illumina sequencing at its 5′ end. Finally, the products of this reaction were amplified by PCR using primer olj623 and a barcode-containing primer that contains sixteen dG nucleotides at its 3′ end and a second sequence required for Illumina sequencing at its 5′ end. The reactions with 1–100 ng of input template yielded a range of products from approximately 150–1000 bp (Figure 2, lanes 1, 6–8 and 11–13). Twelve cycles of PCR were sufficient only for the highest amount (100 ng) of input genomic DNA (Figure 2, lane 1), while 24 cycles were sufficient for input amounts down to 1 ng (Figure 2, lanes 6–8). For the lowest input amounts (0.1 and 0.01 ng), visible products were only observed in the 36-cycle samples (Figure 2, lanes 14 and 15). However, the size range of the resulting products was distinctly lower than in the other lanes. MPS revealed that the products in lanes 14 and 15 were a mixture of bona fide V. cholerae sequences and unintended sequences derived from primers and contaminating human (possibly investigator) DNA (see below).
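The relationship between the 19:1 dCTP:ddCTP ratio and the average tail length of 20 nucleotides can be reproduced with a simple probabilistic sketch. It assumes, beyond what the text states, that TdT incorporates dCTP and ddCTP with equal efficiency and that every tail is extended until a ddC is added (consistent with excess enzymatic activity). Under those assumptions the tail length, counted with the terminating ddC included, follows a geometric distribution. The function names are hypothetical.

```python
import random


def expected_tail_length(dctp: float, ddctp: float) -> float:
    """Mean tail length (terminating ddC included) for a dCTP:ddCTP ratio.

    Assumes each addition is a terminator with probability
    ddctp / (dctp + ddctp), so the mean of the geometric distribution
    is (dctp + ddctp) / ddctp.
    """
    return (dctp + ddctp) / ddctp


def simulate_tail_lengths(dctp: float, ddctp: float, n: int, seed: int = 0) -> list:
    """Draw n tail lengths by adding nucleotides until a ddC is incorporated."""
    rng = random.Random(seed)
    p_stop = ddctp / (dctp + ddctp)
    lengths = []
    for _ in range(n):
        length = 1  # count the current addition
        while rng.random() >= p_stop:  # a dC was added, so extension continues
            length += 1
        lengths.append(length)
    return lengths


mean_expected = expected_tail_length(19, 1)
tails = simulate_tail_lengths(19, 1, n=20000)
print(mean_expected, sum(tails) / len(tails))
```

In this model a 19:1 ratio gives an expected length of 20 nucleotides, in line with the average reported above. The same model predicts a broad spread of individual tail lengths around that mean, so it should be read as a rough guide rather than a description of the measured distribution.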
Samples from Figure 2, lanes 11–15 were subjected to Illumina sequencing and the resulting sequences were aligned to the complete genome sequence of a closely related V. cholerae reference strain, N16961 (7). We also used conventional adapter ligation-mediated Illumina library preparation to sequence E7946. We found that when compared with the published sequence of the N16961 reference strain, the E7946 sequence contained 92 single nucleotide polymorphisms (SNPs) and 100 deletion/insertion polymorphisms (DIPs) (Supplementary Tables S1 and S2). For the samples from Figure 2, lanes 11–13, 96.8%, 94.6% and 68.1% of the raw unfiltered sequencing reads, respectively, could be mapped to the N16961 reference genome. After filtering for quality, 99.7%, 99.1% and 89.5% of the respective reads were mapped to the reference sequence. Importantly, all of the SNPs and DIPs observed with the conventional Illumina library preparation were observed with the samples from Figure 2, lanes 11–13. In other words, the traditional method and HTML-PCR yielded identical results; however, while 5 µg of genomic DNA were used to prepare the traditional library, 5,000-fold less DNA (1 ng; Figure 2, lane 13) was needed for preparation by HTML-PCR. For the samples from Figure 2, lanes 14 and 15, even after the reads were filtered for quality, only 56.9% and 11.0%, respectively, were mapped to the N16961 genome. Still, there were sufficient data from each sample to cover >99% of the E7946 genome and >90% of the SNPs were detected. Therefore, HTML-PCR was at least partially successful down to 0.01 ng of input DNA, which is 100,000–500,000-fold less than that recommended by Illumina for their adapter-based method. Although smaller amounts of starting material can be used in different adapter-based methodologies, including the standard Illumina approach, HTML-PCR is clearly effective at input amounts well below those employed with these methods.
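The fold reductions quoted above follow from simple unit conversion, sketched here for transparency. The 5 µg and 1 ng values come from the text. The 1–5 µg range for the recommended adapter-based input is an assumption inferred from the stated 100,000–500,000-fold range and is not given explicitly in this section. The helper name is hypothetical.

```python
NG_PER_UG = 1000.0  # 1 microgram = 1000 nanograms


def fold_reduction(reference_ng: float, test_ng: float) -> float:
    """How many times less input DNA the test method uses than the reference."""
    return reference_ng / test_ng


# Traditional library (5 ug) versus HTML-PCR at 1 ng (Figure 2, lane 13).
lane13_fold = fold_reduction(5 * NG_PER_UG, 1.0)

# Assumed recommended range of 1-5 ug versus the lowest HTML-PCR input (0.01 ng).
low_fold = fold_reduction(1 * NG_PER_UG, 0.01)
high_fold = fold_reduction(5 * NG_PER_UG, 0.01)

print(f"{lane13_fold:,.0f}-fold")
print(f"{low_fold:,.0f} to {high_fold:,.0f}-fold")
```

With these inputs the calculation gives a 5,000-fold reduction for the 1 ng sample and a range of roughly 100,000 to 500,000-fold for the 0.01 ng sample, matching the figures in the text.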