2 Victorian Life Sciences Computation Initiative, Melbourne, Australia
3 Department of Computing and Information Systems, The University of Melbourne, Melbourne, Australia
4 Cancer Epidemiology Centre, The Cancer Council Victoria, 615 St. Kilda Road, Melbourne, Australia
5 Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, Parkville, Australia
Previously, we reported Hi-Plex, an amplicon-based method for targeted massively parallel sequencing capable of generating 60 amplicons simultaneously. In further experiments, however, we found our approach did not scale to higher amplicon numbers. Here, we report a modification to the original Hi-Plex protocol that includes the use of abridged adapter oligonucleotides as universal primers (bridge primers) in the initial PCR mixture. Full-length adapter primers (indexing primers) are included only during the later stages of thermal cycling, with concomitant application of elevated annealing temperatures. Using this approach, we demonstrate the application of Hi-Plex across a broad range of amplicon numbers (16-plex, 62-plex, 250-plex, and 1003-plex) while preserving the low amount (25 ng) of input DNA required.
Hi-Plex is an amplicon-based approach for targeted massively parallel sequencing that can be used with different sequencing platforms (1, 2). The system is suitable for high-throughput processing of specimens with a high level of accuracy (3) and can take advantage of read-pair overlap considerate variant-calling software, ROVER (4). The ability to use complete read-pair comparisons allows filtering of sequencing chemistry errors with a 1000-fold higher stringency than alternative approaches. Additionally, Hi-Plex shows significantly lower library preparation error rates than alternative amplicon-based target enrichment systems (Hsu et al., unpublished data). These characteristics are of particular value for applications such as low-level genetic variant detection among subpopulations of samples (e.g., tumor subclones).
Our earlier work used a 60-amplicon format to demonstrate proof-of-concept, aid development of accompanying software, and test the application of Hi-Plex in a range of settings. However, when we attempted to increase amplicon number, we were unable to resolve clear library bands following agarose gel electrophoresis unless indexing primers were added at relatively late stages of thermocycling. This is suboptimal because it largely removes the leveling of amplicon amplification efficiencies that is an important part of the Hi-Plex mechanism.
This approach is an advance over the previously published Hi-Plex method, allowing a greater range of amplicon numbers to be targeted (16–1003 are demonstrated). The method uses abridged adapter primers to reduce the off-target effects observed when including full-length adapter primers early in the PCR.
We hypothesized that early priming using bridge primers followed by late priming with indexing primers (Figure 1) could result in less susceptibility to off-target effects. If this could be achieved, the result would be a more specific Hi-Plex system capable of enriching a greater range of amplicons. The 5′ tails of the Hi-Plex gene-specific primers had previously served as the annealing sites for indexing primers added after the sixth amplification cycle (1). Here, we designed a series of forward and reverse bridge primers that represent incrementally 3′-abridged tail sequences, to be included with tailed gene-specific primers from the outset of PCR. Our rationale for this change was that a suitable combination of shorter primers would provide fewer opportunities for off-target priming events. Various combinations of these primers were tested by including them with 745 tailed gene-specific primer pairs in the initial PCR mix prior to thermocycling. A variety of bridge primer combinations were found to be appropriate for Hi-Plex, as judged by banding patterns following agarose gel electrophoresis (data not shown). For subsequent Hi-Plex experiments, the F8 and R5 bridge primers (F8_bridge and R5_bridge) were included, and a thermal step-up was applied following addition of indexing primers, as part of the protocol detailed below. When indexing primers abridged to the F8_bridge and R5_bridge 3′ termini were added directly to the Hi-Plex reaction mix in the absence of bridge primers, a distinct library band could not be resolved when targeting the same 745 amplicons.
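The idea of "incrementally 3′-abridged" tail sequences can be illustrated with a minimal sketch. It assumes sequences are written 5′ to 3′, so abridging the 3′ end means dropping bases from the right-hand side. The function name `abridge_3prime` and the example tail are hypothetical; the tail shown is a placeholder and not a Hi-Plex adapter sequence.

```python
def abridge_3prime(tail: str, max_removed: int) -> list[str]:
    """Return copies of `tail` (written 5'->3') with 1..max_removed
    bases removed from the 3' end, shortest truncation first."""
    if not 0 <= max_removed < len(tail):
        raise ValueError("max_removed must leave at least one base")
    return [tail[: len(tail) - n] for n in range(1, max_removed + 1)]


# Hypothetical 20-base placeholder tail, NOT a real Hi-Plex sequence.
EXAMPLE_TAIL = "ACGTTGCAACGTTGCAACGT"

# Candidate bridge primers lacking 1, 2, or 3 bases at the 3' end.
candidates = abridge_3prime(EXAMPLE_TAIL, 3)
for seq in candidates:
    print(len(seq), seq)
```

Every candidate shares its 5′ end with the full-length tail and differs only in how much of the 3′ end has been removed.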
We input the exon coordinates for 26 genes of relevance to breast cancer (PRKCA, ESR1, CDKN2A, PAX6, MLH1, BMP4, BRCA1, BRCA2, ATM, CHEK2, TP53, BRIP1, PTEN, BARD1, RAD51, RAD50, RAD51C, PPM1D, FANCC, NEIL1, CDH1, BLM, RAD51D, PALB2, XRCC2, RINT1) into the online Hiplex-primer tool (https://github.com/bjpop/hiplex-primer). For 20 genes (839 amplicons), we used the settings previously described (1). For 6 genes (164 amplicons), the input arguments were modified to yield insert sequence lengths between tailed gene-specific primer regions ranging from 98 to 109 bases. The gene-specific portions of tailed gene-specific primers ranged in length from 20 to 26 bases. At least 15 additional bases on either side of each input pair of coordinates were included in the targeting. All primers were synthesized by Integrated DNA Technologies (Coralville, IA) and purified to standard desalting grade. Tailed gene-specific primer sequences are available upon request. Indexing primers (full-length adapter primers) were described in a previous publication (3). The F8_bridge and R5_bridge primers have the 5′-to-3′ DNA sequences CTCTCTATGGGCAGTC and CTGCGTGTCTCCGAC, respectively.
Template material consisted of genomic DNA derived from an Epstein-Barr virus-transformed lymphoblastoid cell line (LCL) generated as part of the Australian Breast Cancer Family Study (ABCFS) (5). DNA was extracted using the QIAamp DNA Blood Kit (Qiagen, Dusseldorf, Germany) and quantitated using the Qubit dsDNA Assay system (Life Technologies, Grand Island, NY).
The initial Hi-Plex PCR mixture comprised 25 μL of 1× Phusion HF PCR buffer (ThermoScientific, Waltham, MA), 400 μM dNTPs (Bioline, London, UK), tailed gene-specific primers at aggregate concentrations as indicated in Table 1, 1 μM F8_bridge primer, 1 μM R5_bridge primer, and 2.5 mM MgCl2 with the inclusion of 1 U Phusion Hot Start II High-Fidelity DNA Polymerase, and 25 ng input genomic DNA. The following steps were conducted for the PCR: 98°C for 1 min, x cycles (x is indicated in Table 1 for different numbers of amplicons) of 98°C for 30 s, 55°C for 1 min, 60°C for 1 min, 65°C for 1 min, 70°C for 1 min, followed by addition of indexing primers to 1 μM, then 4 cycles of 98°C for 30 s, 68°C for 1 min, 70°C for 1 min, followed by incubation at 68°C for 20 min. As described previously (1), the series of different annealing/extension temperatures applied during the x cycles is intended to reduce amplification bias across diverse targets varying in G/C-content and secondary structure. In a 1003-plex experiment, tailed gene-specific primer concentrations were adjusted according to the algorithm presented in (2), except a coarser threshold for overachievement of tailed gene-specific primer pairs was set at 5-fold deeper coverage than the median, instead of 1.5-fold.
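The thermocycling steps above can be written down as a small data structure, which makes it easy to check a programmed protocol against the text. This is only a sketch: the names `Step`, `hiplex_program`, and `programmed_hold_seconds` are hypothetical, the value of x must come from Table 1, and the total excludes ramp times and the manual pause for adding indexing primers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: int   # block temperature in degrees Celsius
    seconds: int  # hold time in seconds


def hiplex_program(x_cycles: int) -> list[tuple[str, int, list[Step]]]:
    """Return (stage name, repeat count, steps) for the protocol in the text."""
    return [
        ("initial denaturation", 1, [Step(98, 60)]),
        ("bridge-primed cycles", x_cycles,
         [Step(98, 30), Step(55, 60), Step(60, 60), Step(65, 60), Step(70, 60)]),
        # Indexing primers are added to 1 uM at this point (manual, untimed).
        ("indexing cycles", 4, [Step(98, 30), Step(68, 60), Step(70, 60)]),
        ("final incubation", 1, [Step(68, 20 * 60)]),
    ]


def programmed_hold_seconds(x_cycles: int) -> int:
    """Sum of all hold times; ramping and the primer-addition pause are ignored."""
    return sum(repeats * sum(step.seconds for step in steps)
               for _, repeats, steps in hiplex_program(x_cycles))


# x = 10 is a placeholder value for illustration, not a value from Table 1.
print(programmed_hold_seconds(10) / 60, "min of programmed holds")
```

Each of the x cycles contributes 270 s of hold time, so the choice of x dominates the run length once x exceeds about seven cycles.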
Gel excision and purification of the ∼275 bp library bands were performed as described previously (2, 3). MPS was performed on a MiSeq instrument (Illumina, San Diego, CA) using version 2 chemistry in accordance with the manufacturer's instructions, except that 3.4 μL of 100 μM TSIT_Read1, TSIT_Read2, and TSIT_i7_read primers were spiked into the appropriate primer reservoirs in the reagent cartridge prior to sequencing, as described by Nguyen-Dumont et al. (2).
Sequence reads were mapped to the human genome (hg19 assembly downloaded from the UCSC database) using Bowtie 2 version 2.1.0 (6) with default parameters except for --trim5 20 and --trim3 20. The ROVER variant caller (4) was applied using a variant proportion threshold of 0.15 and a minimum required variant depth of 2 read-pairs.
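As a rough illustration of these settings, the sketch below builds a Bowtie 2 argument list with the two trimming options and applies the two stated thresholds to a candidate variant. Several things are assumptions: paired-end input via `-1`/`-2`, the file names and index prefix, the helper names, and the use of inclusive comparisons. The filter is not ROVER's implementation, which also compares the overlapping reads of each pair.

```python
def bowtie2_command(index_prefix: str, read1: str, read2: str, out_sam: str) -> list[str]:
    """Argument list for Bowtie 2 with defaults except 20-base trimming at each read end."""
    return [
        "bowtie2",
        "--trim5", "20",   # trim 20 bases from the 5' end of each read
        "--trim3", "20",   # trim 20 bases from the 3' end of each read
        "-x", index_prefix,
        "-1", read1,
        "-2", read2,
        "-S", out_sam,
    ]


def passes_thresholds(variant_pairs: int, total_pairs: int,
                      min_proportion: float = 0.15, min_pairs: int = 2) -> bool:
    """True if a variant meets both the depth and proportion thresholds.

    Inclusive (>=) comparisons are an assumption of this sketch.
    """
    if total_pairs <= 0:
        return False
    return (variant_pairs >= min_pairs
            and variant_pairs / total_pairs >= min_proportion)


# Hypothetical paths; pass the list to subprocess.run(cmd, check=True) to execute.
cmd = bowtie2_command("hg19", "sample_R1.fastq", "sample_R2.fastq", "sample.sam")
print(" ".join(cmd))
print(passes_thresholds(variant_pairs=30, total_pairs=100))
```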
Using this approach, we performed Hi-Plex using 1003 tailed gene-specific primer pairs targeting the coding and flanking intronic regions of a panel of 26 breast cancer relevant genes (Figure 2). We observed that 94.12% of amplicons were represented at a coverage depth within 25-fold of the target-set median, and 89.93% of amplicons were represented within 10-fold of the median. As observed previously in the 60-amplicon Hi-Plex system (2), the uniformity of target coverage can be improved by reducing the concentrations of over-achieving primer pairs. The same algorithm for guiding the adjustment of primer concentrations was used except that we applied a much coarser threshold for overachievement: 5-fold higher than the median (coarse) rather than 1.5-fold (fine) applied previously. For this experiment, we observed that 96.21% and 91.23% of mapped amplicons were covered to a depth within 25-fold or 10-fold of the all-amplicons medians, respectively. It would be expected that finer adjustment and/or redesign and replacement of underperforming primer pairs would improve the target coverage uniformity further. Given the naïve nature of our primer design algorithm (1), we expect future iterations of Hiplex-primer or other independent primer design tools will confer incremental improvements to the Hi-Plex system by implementing more sophisticated primer design considerations such as nearest-neighbor context (7), secondary structures, primer-dimers, sequence complexity, G/C-content, and predicted off-target priming.
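The uniformity figures above can be computed from per-amplicon depths along the following lines. This is a sketch under stated assumptions: "within N-fold of the median" is read as two-sided (between median/N and median×N), the function names and the example depths are hypothetical, and `overachievers` only flags primer pairs above a threshold rather than reproducing the concentration-adjustment algorithm of (2).

```python
from statistics import median


def fraction_within_fold(depths: list[float], fold: float) -> float:
    """Fraction of amplicons whose depth lies in [median/fold, median*fold]."""
    med = median(depths)
    low, high = med / fold, med * fold
    return sum(low <= d <= high for d in depths) / len(depths)


def overachievers(depth_by_amplicon: dict[str, float],
                  threshold_fold: float = 5.0) -> list[str]:
    """Names of amplicons covered more than threshold_fold times the median depth."""
    med = median(depth_by_amplicon.values())
    return sorted(name for name, depth in depth_by_amplicon.items()
                  if depth > threshold_fold * med)


# Made-up depths for illustration only; the median here is 100.
depth_by_amplicon = {"amp1": 100, "amp2": 160, "amp3": 90, "amp4": 700, "amp5": 5}
depths = list(depth_by_amplicon.values())

print(fraction_within_fold(depths, 25))        # all five amplicons qualify
print(fraction_within_fold(depths, 10))        # amp5 falls below median/10
print(overachievers(depth_by_amplicon, 5.0))   # coarse threshold
print(overachievers(depth_by_amplicon, 1.5))   # fine threshold
```

With these example values the coarse 5-fold threshold flags only the most extreme amplicon, whereas the 1.5-fold threshold would also flag a moderately over-represented one, which is the practical difference between the two settings.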
To show that the new Hi-Plex protocol works with different numbers of primer pairs, we selected a range of amplicon numbers to test: 16, 62, and 250 amplicons. We created tailed gene-specific primer subset pools from the 1003-amplicon experiment, without adjusting individual primers relative to the pools. The subset primers were selected randomly, without regard to their performance in the 1003-amplicon setting or presence in other primer pools. For each number of amplicons, the system worked across a range of tailed gene-specific primer pool concentrations and total numbers of PCR cycles. The number of amplicons appeared to correlate positively with the tailed gene-specific primer pool concentration (aggregate) and negatively with the total cycle number. Table 1 serves as a guide for initial conditions to try in a given Hi-Plex experiment, with the caveat that we used 25 ng of human genomic DNA as input for our experiments. Figure 2 shows agarose gel electrophoresis profiles across the different numbers of amplicons. Of the amplicons, 100%, 96.8%, and 95.0% (means derived from two specimens) were covered to a depth within 25-fold of the median of each set of amplicons for 16, 62, and 250 amplicons, respectively. The percentages of on-target mapped reads for 16, 62, 250, and 1003 amplicons were 99.9%, 99.3%, 23.7%, and 67.4%, respectively. The on-target rate is influenced by the presence of sequence homologs, including pseudogenes, and the specificity of primers in recognizing these homologous regions. Overall, our results indicate that the modified Hi-Plex protocol enables excellent assay performance across a wide range of amplicon numbers. It is likely that Hi-Plex will be applicable to a greater range of amplicon numbers, with higher on-target rates, in the future, particularly with more sophisticated primer design algorithms.
T.N.-D. and F.H. contributed to experimental design, conduct of experiments, and data analysis. M.M. conducted experiments. B.J.P. conducted data analysis. G.G.G., J.L.H., and M.C.S. contributed to experimental design. D.J.P. conceived the study and contributed to experimental design and analysis. All authors participated in the drafting of the manuscript.
This work was supported by the Australian National Health and Medical Research Council (NHMRC) (APP1025879 and APP1029974), The Cancer Council Victoria (APP1066612), a Victorian Life Sciences Computation Initiative (VLSCI) grant (VR0182) on its Peak Computing Facility at The University of Melbourne, the National Institutes of Health (USA) (RO1CA155767) and the Victorian Breast Cancer Research Consortium (VBCRC). TN-D is a Susan G. Komen for the Cure Postdoctoral Fellow. MCS is a Senior Research Fellow of the NHMRC. We thank the Australian Breast Cancer Family Study (ABCFS, Principal Investigator John Hopper) for providing the cell-line derived DNA. This paper is subject to the NIH Public Access Policy.
The authors declare no competing interests.
Address correspondence to Daniel J Park, Genetic Epidemiology Laboratory, Department of Pathology, The University of Melbourne, Melbourne, Australia. E-mail: [email protected]
1. Nguyen-Dumont, T., B.J. Pope, F. Hammet, M.C. Southey, and D.J. Park. 2013. A high-plex PCR approach for massively parallel sequencing. Biotechniques 55:69-74.
2. Nguyen-Dumont, T., B.J. Pope, F. Hammet, M. Mahmoodi, H. Tsimiklis, M.C. Southey, and D.J. Park. 2013. Cross-platform compatibility of Hi-Plex, a streamlined approach for targeted massively parallel sequencing. Anal. Biochem. 442:127-129.
3. Nguyen-Dumont, T., Z.L. Teo, B.J. Pope, F. Hammet, M. Mahmoodi, H. Tsimiklis, N. Sabbaghian, M. Tischkowitz, et al. 2013. Hi-Plex for high-throughput mutation screening: application to the breast cancer susceptibility gene PALB2. BMC Med. Genomics 6:48.
4. Pope, B.J., T. Nguyen-Dumont, F. Hammet, and D.J. Park. 2014. ROVER variant caller: read-pair overlap considerate variant-calling software applied to PCR-based massively parallel sequencing datasets. Source Code Biol. Med. 9:3.
5. John, E.M., J.L. Hopper, J.C. Beck, J.A. Knight, S.L. Neuhausen, R.T. Senie, A. Ziogas, I.L. Andrulis, et al. 2004. The Breast Cancer Family Registry: an infrastructure for cooperative multinational, interdisciplinary and translational studies of the genetic epidemiology of breast cancer. Breast Cancer Res. 6:R375-R389.
6. Langmead, B., and S.L. Salzberg. 2012. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9:357-359.
7. Kibbe, W.A. 2007. OligoCalc: an online oligonucleotide properties calculator. Nucleic Acids Res. 35:W43-W46.