Figure 2 plots qPCR measures against mitogenome reads both before and after enrichment. All 12S qPCR quantifications show significant positive power-law correlations with raw on-target reads both before and after enrichment. Quantifications of undiluted extract and of indexed library correlate most strongly, both before (R² = 0.69, P = 0.002 and R² = 0.66, P = 0.003, respectively) and after enrichment (R² = 0.70, P = 0.001 and R² = 0.80, P < 0.001, respectively). Notably, the ratios between target and total qPCR values (hereafter 12S:Total) are weaker predictors of HTS read counts, likely due at least in part to the well-documented length bias of the polymerase used for our qPCR assays (AmpliTaq Gold; Invitrogen) (40).
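The power-law ("power") correlations above can be reproduced with a short Python sketch, under the assumption that they are ordinary least-squares fits of log-transformed values (the text does not state the fitting software). The function name `fit_power_law` and the example numbers are hypothetical. R² is computed on the log scale; a P value would additionally require the t distribution with n − 2 degrees of freedom, for example via `scipy.stats.linregress` applied to the logged data.

```python
import math

def fit_power_law(x, y):
    """Fit y = a * x**b by least squares on log10-transformed values.

    Returns (a, b, r_squared); r_squared is computed on the log-log scale.
    """
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need at least three paired observations")
    if any(v <= 0 for v in list(x) + list(y)):
        raise ValueError("power-law fits require strictly positive values")
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    syy = sum((v - my) ** 2 for v in ly)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx                  # exponent: slope of the log-log line
    a = 10 ** (my - b * mx)        # prefactor: back-transformed intercept
    r_squared = sxy ** 2 / (sxx * syy)
    return a, b, r_squared

# Illustrative values only (not the study's data): qPCR copies vs. on-target reads.
copies = [1.2e3, 5.0e3, 2.1e4, 8.8e4, 3.0e5]
reads = [410, 1350, 2900, 9100, 17000]
a, b, r2 = fit_power_law(copies, reads)
print(f"reads ~ {a:.3g} * copies^{b:.2f}, R^2 = {r2:.2f}")
```

Fitting on the log scale keeps a few very high-copy libraries from dominating the fit when values span several orders of magnitude.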
This strong correspondence between single-locus target qPCR measurements and raw mitogenome reads after enrichment is encouraging. However, raw on-target read counts are irrelevant in many sequencing projects, where the reliability of consensus calls is more appropriately gauged by unique read coverage. Before enrichment, qPCR values correlate comparably well with unique and with raw on-target reads, but after enrichment the correlation with unique reads is not significant (best correlation R² = 0.36, P = 0.0527). This is expected, since the enriched libraries span a range of complexities (Figure 3): raw on-target read counts can be high in a library that contains relatively few distinct target molecules. Unsurprisingly, the number of indexing cycles correlates strongly with measures of post-enrichment complexity, such as overall duplication rate (positive power-law correlation, R² = 0.83, P = 0.0001) and the relative increase in unique on-target reads between 100,000 and 3 million reads of sequencing depth (negative power-law correlation, R² = 0.90, P < 0.0001). Indexing cycles also predict raw on-target reads after enrichment about as strongly as undiluted extract qPCRs do (R² = 0.74, P < 0.0007). Accordingly, scaling the original 0.1× extract target qPCR values by indexing cycles greatly improves their correlation with post-enrichment unique read counts, especially when the values are also scaled by the dilution factor of the library used in the indexing amplification reaction (Figure 2, “0.1× extract scaled”).
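The complexity measures discussed above can be made concrete with a minimal sketch. It assumes, hypothetically, that each aligned read is reduced to a duplicate key such as its (start, end, strand) coordinates, that off-target reads are represented by `None`, and that the "relative increase in unique on-target reads" is the fold change between two subsampled depths; the source does not specify these details, and all function names are illustrative.

```python
import random

def unique_on_target(reads):
    """Count distinct on-target molecules: reads sharing a duplicate key are
    treated as copies of one molecule; off-target reads (None) are ignored."""
    return len({key for key in reads if key is not None})

def on_target_duplication_rate(reads):
    """Fraction of on-target reads that are duplicates: 1 - unique / raw."""
    raw = sum(1 for key in reads if key is not None)
    return 1 - unique_on_target(reads) / raw if raw else 0.0

def unique_on_target_at_depth(reads, depth, seed=0):
    """Unique on-target reads after randomly subsampling `depth` total reads."""
    if depth >= len(reads):
        return unique_on_target(reads)
    return unique_on_target(random.Random(seed).sample(reads, depth))

def relative_unique_increase(reads, shallow, deep, seed=0):
    """Fold increase in unique on-target reads from `shallow` to `deep` depth."""
    low = unique_on_target_at_depth(reads, shallow, seed)
    high = unique_on_target_at_depth(reads, deep, seed)
    return high / low if low else float("inf")

# Illustrative library: one molecule read 4 times, one read twice, 4 off-target reads.
library = [(1, 100, "+")] * 4 + [(5, 80, "-")] * 2 + [None] * 4
print(unique_on_target(library), round(on_target_duplication_rate(library), 3))
# prints: 2 0.667
```

In a low-complexity library the unique count saturates quickly, so the fold increase between a shallow and a deep subsample stays small even while raw on-target reads keep rising; this is the behaviour that decouples raw from unique read counts.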
These observations demonstrate that simple qPCR metrics can predict on-target read counts after enrichment and could therefore form the basis of equilibration schemes for efficient sequencing of enriched aDNA libraries. However, the relationship between qPCR values and on-target reads is nonlinear, so for any given combination of samples, targets, enrichment techniques, and data analysis strategies, pilot screening of a subset of enriched samples would help achieve accurate equilibration.
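As a minimal sketch of how such an equilibration scheme might work, assume a calibration of the form on-target fraction = a·q^b has been fitted to pilot data, where q is a library's qPCR value. The parameters `a` and `b`, the function names, and the numbers below are hypothetical. Each library is allotted sequencing reads in inverse proportion to its predicted on-target fraction, so that all libraries are expected to yield the same number of on-target reads.

```python
def predicted_on_target_fraction(qpcr_value, a, b):
    """Hypothetical power-law calibration from pilot data, capped at 1."""
    if qpcr_value <= 0 or a <= 0:
        raise ValueError("qPCR values and the prefactor must be positive")
    return min(1.0, a * qpcr_value ** b)

def equilibration_plan(qpcr_values, a, b, target_on_target_reads):
    """Return (reads to sequence per library, fraction of the pool per library)."""
    fractions = [predicted_on_target_fraction(q, a, b) for q in qpcr_values]
    reads_needed = [target_on_target_reads / p for p in fractions]
    total = sum(reads_needed)
    return reads_needed, [r / total for r in reads_needed]

reads, pool = equilibration_plan([100, 400], a=0.01, b=0.5, target_on_target_reads=10_000)
print([round(r) for r in reads], [round(f, 3) for f in pool])
# prints: [100000, 50000] [0.667, 0.333]
```

Because the relationship is nonlinear and specific to each sample set and protocol, the calibration parameters are exactly what pilot screening would supply.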
In addition to correlations between qPCR values and read counts, we examined variables that potentially influence enrichment rate, defined as the ratio of on-target read proportions after and before enrichment (Table 3). We also examined correlations between bait properties and both intra-target variation in enrichment rate and pre- and post-enrichment coverage patterns (Supplementary Figure S2). Correlations between these properties and per-base mitogenome coverage are given in Supplementary Tables S3 and S4.
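The enrichment rate defined above is a simple ratio, and the same arithmetic applies whether the on-target proportion comes from HTS reads or from the qPCR-derived 12S:Total ratio. The sketch below uses illustrative numbers, not values from Table 3, and hypothetical function names.

```python
def on_target_proportion(on_target, total):
    """Proportion of reads (or qPCR copies) that derive from the target."""
    if total <= 0:
        raise ValueError("total must be positive")
    return on_target / total

def enrichment_rate(proportion_before, proportion_after):
    """Fold change in on-target proportion: after divided by before."""
    return proportion_after / proportion_before

# HTS example (illustrative): 50 of 100,000 reads on target before, 45,000 of 100,000 after.
hts_rate = enrichment_rate(on_target_proportion(50, 100_000),
                           on_target_proportion(45_000, 100_000))
# qPCR example (illustrative): 12S:Total from 12S copies and total copies.
qpcr_rate = enrichment_rate(on_target_proportion(2e3, 1e7),
                            on_target_proportion(4e5, 1e7))
print(round(hts_rate), round(qpcr_rate))
# prints: 900 200
```

Since an on-target proportion cannot exceed 1, the largest achievable enrichment rate for a library is 1 / proportion_before, so libraries starting with more endogenous target necessarily have less room to enrich.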
Whether measured by increases in the qPCR-derived 12S:Total ratio or in on-target HTS reads, enrichment rates correlate negatively with starting abundances, a relationship also observed in other aDNA enrichment data sets (4, 7). This suggests a pattern of diminishing returns, in which the benefit of additional rounds of enrichment might decrease as the endogenous portion of a library approaches some threshold. Besides starting abundance, the number of indexing cycles (which is related to the amount of starting input DNA) is again a strong predictor, correlating negatively with unique enrichment rate (R² = 0.84, P < 0.0001).
Bait concentration was also influential. As measured by increases in the 12S:Total ratio, enrichment rates with 2.5 ng of baits ranged from 22- to 2,217-fold (mean = 221), with the two lowest-copy libraries (Mammoths 7 and 11) failing to enrich in one and in both replicates, respectively. With 25 ng of baits, enrichment rates improved 15- to 1,374-fold over their 2.5 ng counterparts, and Mammoths 7 and 11 enriched successfully in both replicates. However, inter-replicate consistency did not improve significantly with the higher bait concentration (Student's t-test on inter-replicate coefficients of variation, P = 0.29).
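The consistency comparison above can be sketched as follows: compute a coefficient of variation (sample standard deviation over mean) for each library's replicate enrichment rates at each bait amount, then compare the two sets of coefficients with a two-sample Student's t-test. The source does not say whether the test was paired or unpaired, so the unpaired, pooled-variance form here is an assumption, as are the example values and function names; libraries with a failed replicate have no defined rate and would have to be handled separately. A P value follows from the t statistic and degrees of freedom, for example with `scipy.stats.ttest_ind` (Student's form is its default) or `scipy.stats.ttest_rel` for a paired design.

```python
import math
import statistics

def coefficient_of_variation(replicates):
    """Sample standard deviation divided by the mean of replicate measurements."""
    return statistics.stdev(replicates) / statistics.mean(replicates)

def students_t(group_a, group_b):
    """Unpaired two-sample Student's t statistic (pooled variance); returns (t, df)."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * statistics.variance(group_a)
                  + (nb - 1) * statistics.variance(group_b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / se, na + nb - 2

# Illustrative replicate enrichment rates per library (not the study's data).
low_bait = {"lib1": [40, 60], "lib2": [200, 300], "lib3": [90, 110]}
high_bait = {"lib1": [900, 1100], "lib2": [3000, 3400], "lib3": [1500, 2500]}
cv_low = [coefficient_of_variation(v) for v in low_bait.values()]
cv_high = [coefficient_of_variation(v) for v in high_bait.values()]
t, df = students_t(cv_low, cv_high)
print(round(t, 2), df)
```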
Bait coverage, which we designed to be deeper (and more diverse) in mitogenomic regions known to be polymorphic in extinct proboscideans, may also have affected enrichment. For all mammoth samples, whether analyzed at 3 million reads or at total available depth, the positive correlation between bait coverage depth and aligned read coverage depth became stronger after enrichment in almost every case. Bait coverage also correlates positively with raw enrichment rates. Although this association is not as strong as those seen in other enrichment studies (e.g., Reference 41), it suggests that even relatively minor variation in bait coverage (mean = 51×, standard deviation = 12.6×) affects post-enrichment coverage.
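Per-base bait coverage of the kind summarized above (mean and standard deviation across the mitogenome) can be computed from bait coordinates with a difference array and then correlated with per-base read coverage before and after enrichment. This sketch assumes half-open, 0-based bait intervals on a linearized reference (baits spanning the origin of the circular mitogenome would need to be split in two); the function names and toy coordinates are hypothetical.

```python
import math
import statistics

def per_base_depth(intervals, genome_length):
    """Depth at each position from half-open [start, end) intervals."""
    diff = [0] * (genome_length + 1)
    for start, end in intervals:
        if not 0 <= start < end <= genome_length:
            raise ValueError(f"interval ({start}, {end}) outside reference")
        diff[start] += 1   # interval opens here
        diff[end] -= 1     # and closes here
    depth, running = [], 0
    for pos in range(genome_length):
        running += diff[pos]
        depth.append(running)
    return depth

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Toy bait set on an 8-bp reference; real inputs would be bait coordinates on the mitogenome.
bait_depth = per_base_depth([(0, 4), (2, 6), (3, 8)], genome_length=8)
print(bait_depth, statistics.fmean(bait_depth), round(statistics.pstdev(bait_depth), 2))
```

Calling `pearson(bait_depth, read_depth)` separately on pre- and post-enrichment read depth for each sample gives the before/after comparison described in the text.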
The propensity of individual baits to form secondary structures also appeared to affect enrichment. As with bait coverage, correlations of coverage and enrichment rates with per-base average bait hairpin and dimer scores became universally stronger following enrichment (in this case, negative). However, inter-locus amplification bias is difficult to rule out as the origin of this pattern: only two samples (Mammoths 9 and 10) show low correlations between regional duplication rate and unique coverage depth, and for these samples unique enrichment rates do not correspond to bait properties. That amplification biases appear to dictate coverage patterns in this data set argues for amplification-minimal techniques (15, 42, 43) combined with less biased DNA polymerases (40, 44) or emulsion PCR (45).
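Per-base average bait hairpin or dimer scores can be derived in the same way as bait depth: sum the scores of all baits overlapping each position and divide by the number of such baits. This sketch assumes each bait has already been assigned a single secondary-structure score by some external tool (the source does not specify which) and uses hypothetical names and values; positions covered by no bait are reported as `None` and would be dropped before correlating with coverage.

```python
def per_base_mean_score(baits, genome_length):
    """Mean score of the baits overlapping each position of the reference.

    `baits` holds (start, end, score) tuples with half-open [start, end)
    coordinates; positions without any overlapping bait map to None.
    """
    score_diff = [0.0] * (genome_length + 1)
    count_diff = [0] * (genome_length + 1)
    for start, end, score in baits:
        score_diff[start] += score   # bait's score enters at its first base
        score_diff[end] -= score     # and leaves after its last base
        count_diff[start] += 1
        count_diff[end] -= 1
    means, running_score, running_count = [], 0.0, 0
    for pos in range(genome_length):
        running_score += score_diff[pos]
        running_count += count_diff[pos]
        means.append(running_score / running_count if running_count else None)
    return means

# Hypothetical hairpin scores for three baits on a 9-bp toy reference.
print(per_base_mean_score([(0, 4, 2.0), (2, 6, 4.0), (5, 8, 1.0)], genome_length=9))
# prints: [2.0, 2.0, 3.0, 3.0, 4.0, 2.5, 1.0, 1.0, None]
```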
Both qPCR measures and experimental design features correlate with post-enrichment read counts, enrichment rates, and coverage metrics. More sophisticated qPCR strategies, such as multi-locus assays and/or techniques that simultaneously estimate the target insert length distribution (22, 23, 46), might improve predictive power. Predicting unique target reads from qPCR metrics is more complicated and requires a complexity-based adjustment to be accurate. Normalizing complexity between samples as much as possible is therefore recommended, for example by equalizing both starting library molarity prior to indexing and the number of indexing amplification cycles. Since higher bait concentration and bait coverage corresponded to higher enrichment, maintaining even target coverage depth in bait sets and/or modifying bait sequences to reduce self-dimer and hairpin propensity might also improve enrichment consistency. However, it is unclear how these correspondences depend on hybridization and washing conditions, especially temperature and salt concentration, among other factors. Given the demonstrated power of EHC for aDNA, such knowledge gaps encourage continued systematic evaluation of how enrichment is affected by experimental parameters and sample characteristics, beginning with variables that are cheaply assessed for large sample sets or easily controlled during experimental design.
Author contributions
All authors designed the experiments. JE executed the experiments, performed analyses, and prepared the manuscript. All authors edited the manuscript.
We thank the entire McMaster Ancient DNA Centre for guidance and discussion during the experiments and manuscript preparation. S. Horn, J. Krause, and S. Sawyer graciously provided accessory data to their published works referenced here. M. Clementz, D. Fisher, C.R. Harington, P. Matheus, and G. Zazula and their respective institutions provided mammoth samples. We also thank the McMaster Farncombe Family Digestive Health Research Institute, especially C.E. King, for performing the sequencing experiments and data pre-processing. Comments from two anonymous reviewers greatly improved the manuscript. This research was funded by NSERC and CRC grants to HP and contributions from MYcroarray.
JMR has financial interest in MYcroarray, the company providing the enrichment kit for this study.
Address correspondence to Jacob Enk, McMaster Ancient DNA Centre, McMaster University, Hamilton, Ontario, Canada.
1. Meyer, M., M. Kircher, M.T. Gansauge, H. Li, F. Racimo, S. Mallick, J.G. Schraiber, F. Jay. 2012. A high-coverage genome sequence from an archaic Denisovan individual. Science 338:222-226.
2. Poinar, H.N., C. Schwarz, J. Qi, B. Shapiro, R.D. MacPhee, B. Buigues, A. Tikhonov, D.H. Huson. 2006. Metagenomics to paleogenomics: large-scale sequencing of mammoth DNA. Science 311:392-394.
3. Green, R.E., J. Krause, A.W. Briggs, T. Maricic, U. Stenzel, M. Kircher, N. Patterson, H. Li. 2010. A Draft Sequence of the Neandertal Genome. Science 328:710-722.
4. Avila-Arcos, M.C., E. Cappellini, J.A. Romero-Navarro, N. Wales, J.V. Moreno-Mayar, M. Rasmussen, S.L. Fordyce, R. Montiel. 2011. Application and comparison of large-scale solution-based DNA capture-enrichment methods on ancient DNA. Sci. Rep. 1:74.
5. Bos, K.I., V.J. Schuenemann, G.B. Golding, H.A. Burbano, N. Waglechner, B.K. Coombes, J.B. McPhee, S.N. DeWitte. 2011. A draft genome of Yersinia pestis from victims of the Black Death. Nature 478:506-510.
6. Briggs, A.W., J.M. Good, R.E. Green, J. Krause, T. Maricic, U. Stenzel, C. Lalueza-Fox, P. Rudan. 2009. Targeted Retrieval and Analysis of Five Neandertal mtDNA Genomes. Science 325:318-321.
7. Burbano, H.A., E. Hodges, R.E. Green, A.W. Briggs, J. Krause, M. Meyer, J.M. Good, T. Maricic. 2010. Targeted investigation of the Neandertal genome by array-based sequence capture. Science 328:723-725.
8. Fu, Q., M. Meyer, X. Gao, U. Stenzel, H.A. Burbano, J. Kelso, and S. Paabo. 2013. DNA analysis of an early modern human from Tianyuan Cave, China. Proc. Natl. Acad. Sci. USA 110:2223-2227.
9. Horn, S. 2012. Case study: enrichment of ancient mitochondrial DNA by hybridization capture. Methods Mol. Biol. 840:189-195.
10. Krause, J., A.W. Briggs, M. Kircher, T. Maricic, N. Zwyns, A. Derevianko, and S. Paabo. 2010. A complete mtDNA genome of an early modern human from Kostenki, Russia. Curr. Biol. 20:231-236.
11. Krause, J., Q. Fu, J.M. Good, B. Viola, M.V. Shunkov, A.P. Derevianko, and S. Paabo. 2010. The complete mitochondrial DNA genome of an unknown hominin from southern Siberia. Nature 464:894-897.
12. Mason, V.C., G. Li, K.M. Helgen, and W.J. Murphy. 2011. Efficient cross-species capture hybridization and next-generation sequencing of mitochondrial genomes from noninvasively sampled museum specimens. Genome Res. 21:1695-1704.
13. Reich, D., R.E. Green, M. Kircher, J. Krause, N. Patterson, E.Y. Durand, B. Viola, A.W. Briggs. 2010. Genetic history of an archaic hominin group from Denisova Cave in Siberia. Nature 468:1053-1060.
14. Schuenemann, V.J., K. Bos, S. DeWitte, S. Schmedes, J. Jamieson, A. Mittnik, S. Forrest, B.K. Coombes. 2011. Targeted enrichment of ancient pathogens yielding the pPCP1 plasmid of Yersinia pestis from victims of the Black Death. Proc. Natl. Acad. Sci. USA 108:E746-E752.
15. Sawyer, S., J. Krause, K. Guschanski, V. Savolainen, and S. Paabo. 2012. Temporal patterns of nucleotide misincorporations and DNA fragmentation in ancient DNA. PLoS ONE 7:e34131.
16. Bouwman, A.S., S.L. Kennedy, R. Muller, R.H. Stephens, M. Holst, A.C. Caffell, C.A. Roberts, and T.A. Brown. 2012. Genotype of a historic strain of Mycobacterium tuberculosis. Proc. Natl. Acad. Sci. USA 109:18511-18516.
17. Vilstrup, J.T., A. Seguin-Orlando, M. Stiller, A. Ginolhac, M. Raghavan, S.C. Nielsen, J. Weinstock, D. Froese. 2013. Mitochondrial phylogenomics of modern and ancient equids. PLoS ONE 8:e55950.
18. Fu, Q., A. Mittnik, P.L. Johnson, K. Bos, M. Lari, R. Bollongino, C. Sun, L. Giemsch. 2013. A revised timescale for human evolution based on ancient mitochondrial genomes. Curr. Biol. 23:553-559.
19. Farrell, R.E. 2010. Chapter 13: Practical Nucleic Acid Hybridization, p. 283-299. In RNA Methodologies, 4th Edition. Academic Press, San Diego.
20. Harrison, A., H. Binder, A. Buhot, C.J. Burden, E. Carlon, C. Gibas, L.J. Gamble, A. Halperin. 2013. Physico-chemical foundations underpinning microarray and next-generation sequencing experiments. Nucleic Acids Res. 41:2779-2796.
21. Allentoft, M.E., M. Collins, D. Harker, J. Haile, C.L. Oskam, M.L. Hale, P.F. Campos, J.A. Samaniego. 2012. The half-life of DNA in bone: measuring decay kinetics in 158 dated fossils. Proc. Biol. Sci. 279:4724-4733.
22. Deagle, B.E., J.P. Eveson, and S.N. Jarman. 2006. Quantification of damage in DNA recovered from highly degraded samples: a case study on DNA in faeces. Front. Zool. 3:11.
23. Schwarz, C., R. Debruyne, M. Kuch, E. McNally, H. Schwarcz, A.D. Aubrey, J. Bada, and H. Poinar. 2009. New insights from old bones: DNA preservation and degradation in permafrost preserved mammoth remains. Nucleic Acids Res. 37:3215-3229.
24. King, C., R. Debruyne, M. Kuch, C. Schwarz, and H. Poinar. 2009. A quantitative approach to detect and overcome PCR inhibition in ancient DNA extracts. BioTechniques 47:941-949.
25. Wales, N., J.A. Romero-Navarro, E. Cappellini, and M.T.P. Gilbert. 2012. Choosing the Best Plant for the Job: A Cost-Effective Assay to Prescreen Ancient Plant Remains Destined for Shotgun Sequencing. PLoS ONE 7:e45644.
26. Meyer, M., and M. Kircher. 2010. Illumina sequencing library preparation for highly multiplexed target capture and sequencing. Cold Spring Harb. Protoc.
27. Kircher, M., S. Sawyer, and M. Meyer. 2012. Double indexing overcomes inaccuracies in multiplex sequencing on the Illumina platform. Nucleic Acids Res. 40:e3.
28. Barnes, I., B. Shapiro, A. Lister, T. Kuznetsova, A. Sher, D. Guthrie, and M.G. Thomas. 2007. Genetic structure and extinction of the woolly mammoth, Mammuthus primigenius. Curr. Biol. 17:1072-1075.
29. Debruyne, R., G. Chu, C.E. King, K. Bos, M. Kuch, C. Schwarz, P. Szpak, D.R. Grocke. 2008. Out of America: Ancient DNA evidence for a new world origin of late quaternary woolly mammoths. Curr. Biol. 18:1320-1326.
30. Gilbert, M.T.P., D.I. Drautz, A.M. Lesk, S.Y.W. Ho, J. Qi, A. Ratan, C.H. Hsu, A. Sher. 2008. Intraspecific phylogenetic analysis of Siberian woolly mammoths using complete mitochondrial genomes. Proc. Natl. Acad. Sci. USA 105:8327-8332.
31. Krause, J., P.H. Dear, J.L. Pollack, M. Slatkin, H. Spriggs, I. Barnes, A.M. Lister, I. Ebersberger. 2006. Multiplex amplification of the mammoth mitochondrial genome and the evolution of Elephantidae. Nature 439:724-727.
32. Enk, J., A. Devault, R. Debruyne, C.E. King, T. Treangen, D. O'Rourke, S.L. Salzberg, D. Fisher. 2011. Complete Columbian mammoth mitogenome suggests interbreeding with woolly mammoths. Genome Biol. 12:R51.
33. Rohland, N., A.S. Malaspinas, J.L. Pollack, M. Slatkin, P. Matheus, and M. Hofreiter. 2007. Proboscidean mitogenomics: Chronology and mode of elephant evolution using mastodon as outgroup. PLoS Biol. 5:e207.
34. Untergasser, A., I. Cutcutache, T. Koressaar, J. Ye, B.C. Faircloth, M. Remm, and S.G. Rozen. 2012. Primer3: new capabilities and interfaces. Nucleic Acids Res. 40:e115.
35. Martin, M. 2011. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal 17:10-12.
36. Magoč, T., and S.L. Salzberg. 2011. FLASH: fast length adjustment of short reads to improve genome assemblies. Bioinformatics 27:2957-2963.
37. Li, H., and R. Durbin. 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25:1754-1760.
38. Neph, S., M.S. Kuehn, A.P. Reynolds, E. Haugen, R.E. Thurman, A.K. Johnson, E. Rynes, M.T. Maurano. 2012. BEDOPS: high-performance genomic feature operations. Bioinformatics 28:1919-1920.
39. Ginolhac, A., M. Rasmussen, M.T. Gilbert, E. Willerslev, and L. Orlando. 2011. mapDamage: testing for damage patterns in ancient DNA sequences. Bioinformatics 27:2153-2155.
40. Dabney, J., and M. Meyer. 2012. Length and GC-biases during sequencing library amplification: a comparison of various polymerase-buffer systems with ancient and modern DNA sequencing libraries. BioTechniques 52:87-94.
41. Mokry, M., H. Feitsma, I.J. Nijman, E. de Bruijn, P.J. van der Zaag, V. Guryev, and E. Cuppen. 2010. Accurate SNP and mutation detection by targeted custom microarray-based genomic enrichment of short-fragment sequencing libraries. Nucleic Acids Res. 38:e116.
42. Kozarewa, I., Z. Ning, M.A. Quail, M.J. Sanders, M. Berriman, and D.J. Turner. 2009. Amplification-free Illumina sequencing-library preparation facilitates improved mapping and assembly of (G+C)-biased genomes. Nat. Methods 6:291-295.
43. Kozarewa, I., and D.J. Turner. 2011. Amplification-free library preparation for paired-end Illumina sequencing. Methods Mol. Biol. 733:257-266.
44. Aird, D., M.G. Ross, W.S. Chen, M. Danielsson, T. Fennell, C. Russ, D.B. Jaffe, C. Nusbaum, and A. Gnirke. 2011. Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries. Genome Biol. 12:R18.
45. Kihana, M., F. Mizuno, R. Sawafuji, L. Wang, and S. Ueda. 2013. Emulsion PCR-coupled target enrichment: An effective fishing method for high-throughput sequencing of poorly preserved ancient DNA. Gene 528:347-351.
46. Colotte, M., V. Couallier, S. Tuffet, and J. Bonnet. 2009. Simultaneous assessment of average fragment size and amount in minute samples of degraded DNA. Anal. Biochem. 388:345-347.
47. Briggs, A.W., J.M. Good, R.E. Green, J. Krause, T. Maricic, U. Stenzel, and S. Paabo. 2009. Primer extension capture: targeted sequence retrieval from heavily degraded DNA sources. J. Vis. Exp. e1573.
48. Maricic, T., M. Whitten, and S. Paabo. 2010. Multiplexed DNA sequence capture of mitochondrial genomes using PCR products. PLoS ONE 5:e14004.