Capturing protein-coding genes across highly divergent species
Chenhong Li1,3, Michael Hofreiter2, Nicolas Straube3, Shannon Corrigan3, and Gavin J.P. Naylor3

Performing two rounds of gene capture increased the number of targets captured by 68%, on average, relative to the initial round of capture. Once again, the improvement was most conspicuous where the bait and target species were highly divergent (see Table 1). With the optimized protocol, cross-species gene capture recovered from a minimum of 225 (for the amphibians) to a maximum of 1159 (between chicken and zebra finch) of a possible 1449 target CDS. The lowest number (225) resulted when baits designed from the X. tropicalis genome were used to capture corresponding genes in the axolotl, A. mexicanum. We hypothesize that this result may be due to compromising effects associated with repeats in the unusually large (~32 × 10^9 bp) (24) genome of the axolotl. Our target enrichment protocol used human cot-1 DNA to block repetitive DNA and thereby mitigate its adverse effect during hybridization. We suspect that tailored cot-1 DNA derived from A. mexicanum may be necessary to effectively block the repeat elements that impede capture in the axolotl.

We investigated whether factors other than bait-target divergence affected the efficacy of gene capture. We compared GC content, target sequence length, and chromosomal position between the targets that were successfully captured and those that were not. This was carried out for the six positive controls (H. sapiens, G. gallus, X. tropicalis, A. carolinensis, D. rerio, and C. milii), where the bait and target sequences are identical, allowing us to rule out any confounding effects of sequence divergence. On average, the few targets that were not captured had higher GC content (56%) and shorter lengths (235 bp) than those that were successfully captured (GC = 49%, length = 303 bp). However, the captured and non-captured targets overlapped in both GC content and length, suggesting that other influences may be involved when targets are not captured. Chromosomal position did not seem to have any effect on capture success.
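The comparison described above can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the function names (`gc_content`, `summarize_capture`), the input layout (a mapping of target ID to sequence plus a set of captured IDs), and the toy sequences are all assumptions made for illustration.

```python
from statistics import mean


def gc_content(seq: str) -> float:
    """Return the fraction of G and C among unambiguous bases (A, C, G, T)."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        return 0.0
    return sum(base in "GC" for base in counted) / len(counted)


def summarize_capture(targets: dict, captured_ids: set) -> dict:
    """Compare mean GC content and mean length of captured vs. non-captured targets.

    `targets` maps a hypothetical target ID to its bait/target sequence;
    `captured_ids` holds the IDs that were successfully captured.
    """
    groups = {"captured": [], "not_captured": []}
    for target_id, seq in targets.items():
        key = "captured" if target_id in captured_ids else "not_captured"
        groups[key].append(seq)

    summary = {}
    for name, seqs in groups.items():
        if seqs:  # skip empty groups to avoid averaging nothing
            summary[name] = {
                "n": len(seqs),
                "mean_gc": mean(gc_content(s) for s in seqs),
                "mean_length": mean(len(s) for s in seqs),
            }
    return summary


# Toy data, invented for illustration only.
example_targets = {"t1": "ATGC", "t2": "GGCC", "t3": "ATAT"}
example_summary = summarize_capture(example_targets, {"t1", "t3"})
```

Because the two groups overlap in both properties, group means alone would not predict whether a given target is captured; comparing the full distributions would be the natural next step.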

In summary, the efficacy of gene capture was improved by incorporating both touchdown gene capture and conducting a second round of capture, but there was considerable variation in efficacy across the five classes of gnathostome vertebrates tested. We hypothesize that this was due to differences in rates of molecular evolution among the pairs of vertebrates, the presence of genomic anomalies such as repeats that are known to interfere with gene capture (25), or secondary structural features that inhibited hybridization to the baits (26).

Comparison within a class of vertebrates: chondrichthyan fishes

A total of 13 hybridization reactions were carried out [elephant shark (C. milii), five skates and rays (Aetobatus narinari, Leucoraja erinacea, Neotrygon kuhlii, Rhinobatos schlegelii, Torpedo formosa), and seven sharks (Carcharhinus amblyrhynchos, Chlamydoselachus anguineus, Etmopterus joungi, Heterodontus portusjacksoni, Isurus oxyrinchus, Orectolobus halei, Squatina nebulosa)]. Touchdown gene capture and the double capture protocol were deployed as in the first set of experiments across classes of gnathostome vertebrates. For the cross-species captures, we were able to obtain full sequences for 1004 of the 1449 target sequences in the worst case, and 1351 of 1449 target sequences in the best case (Table 2). We obtained 1449 of 1449 sequences for the positive control (C. milii baits tested against a C. milii library). These results show greater homogeneity in the efficacy of capture than was evident in the survey across vertebrate classes, confirming the expectation that the methods would be more consistent when applied to a denser taxonomic sample of more closely related lineages. A small subset of genes that either were not captured for the majority of taxa or showed some evidence of potential paralogy due to local gene duplication was excluded from our final data set.

Overall, more than 90% of the captured sequences were assigned as orthologs by HaMSTR (22). The average identity between baits and successfully captured target sequences ranged from 61% to 98% (Figure 1). In total, we obtained 338,822 orthologous base pairs (of 418,475 initially targeted) across the 13 chondrichthyan orders (i.e., 81% of the original target set) with a maximum sequence divergence between bait and target of 39%. The final chondrichthyan data set included 1242 successfully captured putatively orthologous CDS that ranged in length from 112 to 5091 bp, with 51 CDS being more than 600 bp in length (Figure 2).
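The summary figures in this paragraph follow from simple ratios, sketched below in Python as a worked check. The helper name `percent` and the variable names are illustrative assumptions; the numbers are those reported above, and divergence is taken here as 100 minus percent identity.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


# Values reported in the text.
targeted_bp = 418_475    # base pairs initially targeted
recovered_bp = 338_822   # orthologous base pairs obtained
targeted_cds = 1449      # CDS in the original target set
retained_cds = 1242      # putatively orthologous CDS in the final data set
min_identity_pct = 61    # lowest average bait-target identity (percent)

# Share of the targeted base pairs recovered (rounds to the reported 81%).
recovered_pct = percent(recovered_bp, targeted_bp)

# Share of the targeted CDS retained in the final data set.
retained_cds_pct = percent(retained_cds, targeted_cds)

# Maximum divergence, assuming divergence = 100 - percent identity.
max_divergence_pct = 100 - min_identity_pct

print(f"{recovered_pct:.1f}% of targeted bp recovered")
print(f"{retained_cds_pct:.1f}% of targeted CDS retained")
print(f"{max_divergence_pct}% maximum bait-target divergence")
```

The base-pair share and the CDS share differ because the retained CDS vary widely in length (112 to 5091 bp), so the two percentages measure different things.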
