2Department of Biology, McMaster University, Hamilton, Canada
3Chemical Engineering Department, University of Michigan, Ann Arbor, MI
4MYcroarray, Ann Arbor, MI
Targeted DNA enrichment through hybridization capture (EHC) is rapidly replacing PCR as the method of choice for enrichment prior to genomic resequencing. This is especially true in the case of ancient DNA (aDNA) from long-dead organisms, where targets tend to be highly fragmented and outnumbered by contaminant DNA. However, the behavior of EHC using aDNA has been quite variable, making success difficult to predict and preventing efficient sample equilibration during multiplexed sequencing runs. Here, we evaluate whether quantitative PCR (qPCR) measurements of aDNA samples correlate with on-target read counts before and after EHC. Our data indicate that not only do simple target qPCRs correlate strongly with high-throughput sequencing (HTS) data but that certain sample characteristics, such as overall target abundance as well as experimental parameters (e.g., bait concentration and secondary structure propensity), consistently influenced enrichment of our diverse set of aDNA samples. Taken together, our results should help guide experimental design, screening strategies, and multiplexed sample equilibration, increasing yield and reducing the expected and actual cost of aDNA EHC high-throughput sequencing projects in the future.
Recently, high-throughput sequencing (HTS) has enabled reconstruction of large genomic regions from heavily degraded paleontological and archival specimens (1-3). These results promise to expand ancient DNA (aDNA)-based genomic projects to greater breadth and rigor. However, aDNA extracts are typically dominated by DNA from microorganisms, with genomic regions of interest often representing only a tiny fraction of the total genomic content of the target organism. Therefore, even with the significant amount of raw sequence data generated by HTS, obtaining multi-fold coverage of a target genomic region can be prohibitively expensive. As such, most aDNA extracts require targeted enrichment prior to HTS, especially when the goal is to pool and sequence multiple individuals to sufficient coverage depth.
Target aDNAs are typically well under the practical length for resequencing with overlapping PCR amplicons. Thus, enrichment through hybridization capture (EHC) coupled with HTS is rapidly replacing targeted PCR as the method of choice for reconstructing several and/ or long aDNA genomic targets (4-18). However, the reported on-target read proportions obtained using EHC with aDNA vary considerably (Table 1). These results hinder efficient sample equilibration in multiplexed sequencing runs, and are a primary barrier to EHC's routine use by many groups working with aDNA.
We quantified a 49 bp locus of the mammoth 12S rRNA mitochondrial gene on 11 different mammoth DNA extracts indexed libraries, and whole mitochondrial DNA-enriched libraries. The same libraries were high-throughput sequenced, and reads were aligned to the target. qPCR values, bait concentration, and secondary structure propensity were compared against on-target read counts, enrichment rates, and mitogenome coverage patterns.
EHC variation is unsurprising since the process is driven by mostly unevaluated interactions between experimental parameters, target choice, and DNA sample characteristics (19, 20). Of these, the absolute and/or relative abundance of targets are likely particularly relevant, especially given the highly complex non-target DNA content typical of aDNA extracts. Fortunately, quantitative PCR (qPCR) can cheaply, albeit indirectly, estimate target abundance and has been used extensively to evaluate aDNA sample quality (2, 21-25). Recently, Wales and colleagues (25) demonstrated that qPCR measures of target and non-target molecules in aDNA samples correspond ordinally to HTS data from non-enriched sequencing libraries. What remains unanswered, however, is whether qPCR-based metrics from aDNA samples correspond to reads-on-target after enrichment. Here we examine this question experimentally to determine if qPCR-based predictions could make EHC-HTS projects more predictable and efficient. In addition, we discuss experimental parameters that impact enrichment results and may thereby inform future experimental design strategies. Materials and methods
As depicted in Figure 1, we selected 11 DNA extracts from various Pleistocene-age Mammuthus specimens of a wide range of depositional contexts and preservation levels, as well as 2 associated non-mammoth controls. We then used a highly sensitive proboscidean-specific qPCR assay to measure the amplifiable molecules at a 49 bp locus (including primers) of the mammoth mitochondrial 12S rRNA gene using 2 template concentrations (1× and 0.1×). After converting the extracts to Illumina sequencing libraries (26, 27), we quantified the 12S locus again, along with the total number of adapted molecules using a global qPCR assay (26). We then enriched the libraries in duplicate using an in-solution EHC technique with two different concentrations of biotinylated RNA baits matching several known mammoth mitochondrial genome haplogroups (28-30). Following enrichment, we again quantified both the 12S locus and total adapted molecules. Finally, we sequenced the non-enriched and enriched libraries on an Illumina HiSeq platform and measured on-target read proportions, coverage patterns, and enrichment rates, evaluating how these corresponded to qPCR metrics, bait concentration, and bait properties.
Quantitative PCR assays
Quantitative PCRs were performed using a CFX96 real-time PCR instrument (BioRad, Hercules, CA). For the mammoth 49 bp 12S assay, we paired 2 previously-published (31) primers (B1_Forward: CCCTAAACTTTGATAGCTACC, A1_Reverse: GTAGTTCTCTGGCGGATAGC). The target has GC content of 42.9%, compared with the mammoth mitogenome average of roughly 38%. As a standard, we used a previously generated amplicon encompassing the mammoth 12S rRNA gene quantified with a GeneQuant Pro UV spectrophotometer (Amersham, now GE Healthcare; Freiburg, Germany). An optimal primer annealing temperature (54°C) was determined by a temperature gradient qPCR experiment. The assay exhibits near single-copy sensitivity and is accurate to at least 10 copies. Test amplifications from human, xenarthran, coprolite, soil, and chicken DNAgenerated sporadic and off-size products. For library target and total quantifications, the standards were amplicons of the target that were converted to a non-indexed Illumina sequencing library (26). Total library quantifications used primers matching the universal internal adapter sequence (IS7_short_amp. P5: ACACTCTTTCCCTACACGAC; IS8_short_amp.P7: GTGACTGGAGTTCAGACGTGT) (26).
Each 12S qPCR reaction used 3 μL (extracts) or 1 μL (libraries) template and followed the recipes and thermal programs in Supplementary Table S6, with plate reads after annealing steps. In 12S assays, dissociation ramps from at least 65°C to at least 89°C were used to confirm the positive product (74.5 ± 0.5°C). At least three different concentrations of standard were included in duplicate in each assay and values were estimated using the automatic threshold calculation by the CFX Manager software version 3.0. Extracts were screened using both 1× and 0.1× versions diluted in EBT (QIAGEN Buffer EB brought to 0.05% Tween-20) (QIAGEN, Dusseldorf, Germany). For indexed libraries, 12S quantifications used 0.1× versions and total quantifications used either 0.001× or 0.0001× versions. Duplicate quantifications were averaged, with values projected to be in the undiluted sample reported even when less than 10 copies (Supplementary Table S1). DNA extraction and library preparation
We extracted DNA from roughly 100 mg subsamples of 11 Mammuthus specimens and a Pleistocene Mylodon darwinii specimen (Supplementary Table S1) in aDNA-dedicated facilities using an organic protocol with filter concentration (2). Libraries used 50 μL of each extract diluted to various concentrations; the 0.1× concentration was used when inhibition was estimated to be >90% (Supplementary Table 1). Library preparation followed Meyer and Kircher (26), with each purification step performed with the MinElute PCR purification kit (QIAGEN) eluting in 20 μL EBT.
Libraries were diluted to between 1× and 0.1× concentration in EBT based on available volume (Supplementary Table S1) and then index-amplified with unique P5 and P7 indexing primers (27) using 5 μL of template according to the recipe and thermal program in Supplementary Table S6. Reactions were removed from the cycler and placed on ice upon reaching similar amplification points (11–23 cycles), estimated to be near PCR plateau. After review of the trace data without baseline correction, it was apparent that Mammoth 7 reached plateau by cycle 9. Reactions were purified with the MinElute kit to 20 μL EBT. Bait design
For bait references, we used the light strand of several published Mammuthus mitochondrial genomes (30, 32) from four different haplogroups (NCBI accession no.s EU153453, JF912199, EU153456, EU153447) and a complete mitogenome sequence from Mammut americanum (American Mastodon, NCBI accession no. EF632344) (33). We also included a 734 bp region of the cytochrome b, tRNA-Pro, tRNA-Thr from a fifth mammoth haplogroup (NCBI accession no. FJ015496) (29). The VNTRs of the control regions were removed prior to bait design. We also selected a number of nuclear loci, which we extracted from the Loxodonta africana genome sequence (NCBI RefSeq Assembly ID: GCF_000001905.1) and a number of entries in the NCBI nucleotide database and did not repeat-mask (Supplementary Datafile 1). With these, we designed 100 bp baits tiled every 6 bp along the targets and then collapsed identical baits, providing 8530 unique mitochondrial and 7996 unique nuclear baits, which MYcroarray manufactured in 2 separate batches as part of a MYbaits enrichment kit. Bait dimer and hairpin scores were calculated with the Primer3 plug-in (34) in Geneious Pro version 5.6.5 (Biomatters, Auckland, New Zealand) using a salt concentration setting matching capture conditions (0.82 M). Enrichment and re-amplification
We used the MYbaits targeted enrichment kit protocol with some adjustments: