Enrichments used 3.4 μL of indexed library (an estimated 1.5–18.8 ng of template based on total qPCR quantification) rather than the recommended 100–500 ng, as we have successfully enriched concentrations this low previously (data not shown).
Given the short length of the genomic regions targeted, we scaled down the input bait concentration from the typical 500 ng, which is appropriate for up to 2 Mb targets. All 13 libraries as well as 3.4 μL of EBT were enriched in replicate using 2.5 ng total input of each bait set, and then 9 libraries (Mammoths 4, 6–11 and associated blanks) and EBT were enriched in replicate using 25 ng.
We used blocking oligonucleotides matching both strands rather than one strand of double-indexed library (26). These blocks were also used as bait diluents, maintaining 500 ng oligonucleotides per reaction.
The second 2.5 ng bait experiment and both 25 ng bait experiments were erroneously performed using 95% and 105% of the recommended amounts of Block 1 and Block 2, respectively.
Reactions hybridized for between 37 and 39 h at 45°C with 45°C wash steps, rather than 65°C. Previous experiments indicated this lower temperature would improve the capture of short, rare inserts typical to our samples (data not shown).
We used 20 μL of hydrophobic Dynabeads MyOne Streptavidin T1 beads (Invitrogen, Life Technologies, Carlsbad, CA) for enrichment cleanup rather than hydrophilic C1 beads. A previous version of the kit called for hydrophobic M280 beads.
Each enrichment elution was purified to 30 μL EBT with MinElute. An 8 μL aliquot was then amplified using primers that match the 5′ flanks of each index (IS5_reamp. P5: AATGATACGGCGACCACCGA; IS6_reamp.P7: CAAGCAGAAGACGGCATACGA) (26) following the recipe and program in Supplementary Table S6. Reactions were put on ice and purified to 21 μL EBT with MinElute. Sequencing, demultiplexing
Non-enriched and enriched libraries were sequenced on two separate lanes of a HiSeq 1000 Illumina flowcell. cBot cluster generation and sequencing employed the version 3 chemistry and a 2 × 101 bp dual 8 bp indexing protocol, using the alternative primer mixes from the TruSeq Dual Index Sequencing Primer Kit (paired-end). Raw data were processed with HCS version 220.127.116.11 and RTA version 18.104.22.168. File conversion and demultiplexing using each 7 bp reverse index (requiring a 100% match) were performed using CASAVA version 1.8.2. Curation
Reads were trimmed of the 3′ universal adapter sequence (AGATCGGAAGAGC) using cutadapt version 1.2 (35), requiring a 1 bp overlap (-O 1) and tolerating 16% sequence divergence (-e 0.16). Paired reads were then merged with FLASH version 1.0.3 (36), requiring an 11 bp overlap (-m 11) and tolerating 15% overlap divergence (-x 0.15). These merged reads were then combined with the non-mergeable Read 1s to generate a final data set for downstream analysis. We used random 3 million read subsets from each library to analyze on-target raw and unique read proportions, while coverage patterns were analyzed using these and full read sets. Alignment
Using Burrows-Wheeler Aligner (BWA) version 0.6.1-r104 (37), we aligned reads 24 bp and longer to the light strand of a mammoth mitochondrial genome with the VNTR masked with 10 Ns (GenBank Accession No: JF912200.1, a haplogroup C/Clade I mammoth from Alaska). We used default aln parameters except disabled seeding (-l 9999) and allowed 2% of assemblies to be missed assuming a 2% error rate (-n 0.02). We also mapped reads to the nuclear targets (42,127 bp), but given the dominance of repeat-region reads in these alignments, we did not analyze this data further, though alignment rates to hard-masked nuclear targets are shown in Supplementary Table S5. Resulting SAM files were converted to BED with BEDOPS version 2.2.0 (38), and using these we generated complexity curves by randomly sampling raw mapped reads in increments corresponding to 100,000 total library reads up to the available total reads per sample. Unique reads were identified by collapsing reads with unique 5′ and 3′ coordinates and direction. Fragment misincorporation plots, which exhibit typical aDNA deamination patterns, were generated with mapDamage version 2.0 (39) (Supplementary Figures S1a–S1k). Data availability
HTS data for this project can be found on the NCBI SRA at Study Accession #SRP026317. Results and discussion
The qPCR results from each mammoth sample, as measured using (i) original DNA extracts, (ii) non-enriched sequencing libraries, and (iii) enriched sequencing libraries, are listed in Table 2 and Supplementary Table S1 along with the HTS read counts and proportions of reads that map to the mitochondrial genome out of random 3 million read subsets. The nature of qPCR assays requires their values be carefully compared, since they estimate two different parameters. Our mammoth 12S assay measures locus frequency among all fragments ~49 bp and greater, and therefore correlation of this result to on-target HTS reads will vary depending on the endogenous insert length distribution as well as any intra-target abundance biases. The total assay estimates copy numbers of a heterogeneous length distribution, so this accuracy is also dependent on the overall library length distribution and amplification biases.