2Department of Molecular & Experimental Medicine, The Scripps Research Institute, La Jolla, CA, USA
In this study, we tested the NuGEN Ovation RNA-Seq System for library preparation followed by next-generation sequencing on an Illumina GAIIx. The cDNA product of the NuGEN kit may have significant amounts of ssDNA with hairpin structures that are generated during the amplification process. These structures interfere with efficient downstream end repair, A-tailing, and adapter ligation, all standard steps in post-amplification sequencing library construction. We were able to increase sequencing library yields 4- to 6-fold or greater by treatment of NuGEN-amplified cDNA product with the single-strand endonuclease S1. These results suggest that this treatment effectively cleaves hairpin structures generated during amplification that are resistant to the standard enzyme cocktails used for the end-repair step.
Next-generation sequencing (NGS) technology has significantly altered the economics of genomics and opened up new avenues of research such as whole-genome and exome sequencing (1), RNA-seq (2,3), ChIP-seq, and small RNA-seq (4) that were previously cost-prohibitive. New methods for more efficient and lower cost preparation of sequencing libraries have been developed (5) and new commercial kits provide reagents and consumables for specific NGS applications.
NuGEN (San Carlos, CA, USA) recently introduced a new protocol for RNA-seq to simplify library preparation starting from small amounts of total RNA without preselection of mRNA or other steps to reduce rRNA contamination. The NuGEN kit uses a single primer isothermal amplification (SPIA) method to amplify RNA target into cDNA. This cDNA can then be brought into standard Illumina (San Diego, CA, USA) library preparation protocols that involve end repair, A-tailing, and ligation of selected sequencing adapters.
Initial testing of NuGEN-amplified cDNA for library preparation yielded lower-than-expected amounts of properly ligated product when compared with DNA libraries prepared with our standard protocols, where we routinely begin with 100 ng dsDNA. We hypothesized that the end-repair steps critical to the Illumina protocol may not be as efficient for the NuGEN-amplified cDNA due to hairpin structures formed at the cDNA ends. Since the end-repair enzymes include single-strand exonuclease and polymerase functions, hairpin structures would be resistant to blunt-end conversion by these enzymes. To test this hypothesis, we treated NuGEN-amplified cDNA with the single-strand–specific endonuclease S1 (Promega, Madison, WI, USA; 50 U/μL for 30 min at room temperature). This treatment should cleave any hairpins created during the NuGEN protocol allowing them to convert to blunt-end cDNA using the standard end-repair cocktail in the Illumina protocol.
Initially, a series of six total RNA samples were extracted from human CD4+ T cells before and after activation with anti-CD3/anti-CD28 beads (Invitrogen, Carlsbad, CA, USA), a well-established model for the immune response (6). Total RNA was prepared using standard TRIzol (Invitrogen) methods for four of the samples and AllPrep (Qiagen, Valencia, CA, USA) RNA purification for two of the samples. After confirming RNA quantity and integrity [Nanodrop (Wilmington, DE, USA) and Agilent Bioanalyzer (Santa Clara, CA, USA) analysis, respectively; see Bioanalyzer traces in the Supplementary Materials], 100 ng of each sample was converted to cDNA using the NuGEN protocol. Then 100 ng of NuGEN-amplified cDNA from each sample was taken into a standard Illumina sequencing library preparation directly or after treatment with S1 endonuclease. We tested library products at two size ranges of ~300 bp and 750 bp in length, corresponding to typical protocols for short- and long-read sequencing, respectively. The two products were isolated from agarose gels and PCR-amplified for 15 cycles using Phusion polymerase (Finnzymes, Vantaa, Finland). PCR products were then analyzed on a 2% agarose gel (Figure 1A). The longer library products are close to the limit of what can efficiently form clusters for sequencing on Illumina flowcells but would be suitable for longer read technologies.
It is clear from the gel results (Figure 1A) that S1 endonuclease treatment of the cDNA before proceeding with library preparation significantly enhanced the yield for both products. In fact, the larger 750-bp products are not detectable at all in the untreated samples but are clearly visible in the S1-treated samples. Quantitative analysis of the gel image confirmed these conclusions (Figure 1B).
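The kind of gel quantification summarized in Figure 1B can be illustrated with a minimal sketch. It assumes that band intensities have already been measured with densitometry software and that a single background value applies to the lane; the function name, the background handling, and the example numbers are hypothetical and are not taken from the study.

```python
def fold_improvement(treated, untreated, background=0.0):
    """Return the ratio of background-subtracted band intensities.

    Returns None when the untreated band is at or below background,
    because a fold change against an undetectable band is undefined
    (the situation described for the 750-bp untreated products).
    """
    treated_signal = treated - background
    untreated_signal = untreated - background
    if untreated_signal <= 0:
        return None
    return treated_signal / untreated_signal


# Hypothetical intensities in arbitrary densitometry units.
print(fold_improvement(550.0, 150.0, background=50.0))
print(fold_improvement(300.0, 50.0, background=50.0))
```

Returning None rather than a very large number keeps undetectable bands from being reported as an arbitrary fold change.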
In a second experimental series, we tested whether simply increasing the input amount of cDNA could improve the yields. A single sample was processed using two different input amounts of cDNA in the library preparation. Aliquots of 100 ng and 300 ng were either S1-treated or left untreated before continuing the protocol to end-repair, A-tailing, and adapter ligation. These four libraries were run on 2% agarose gels and bands were cut out at 300, 400, 500, 600, and 750 bp followed by 15 cycles of PCR. After PCR, the size-selected library products were run on a second 2% agarose gel (Figure 2A) and quantified (Figure 2B). Again, results show the improved yield after S1 treatment for both input amounts. In this case we found somewhat better yields with 100 ng relative to 300 ng input cDNA. It is probable that these better yields were the result of reaction conditions more optimal for starting with a smaller amount of cDNA. We performed an additional experiment using 1 μg cDNA and library preparation reagent concentrations optimized for this scale of reaction, and observed increased yields for S1-treated cDNA (4–20-fold) while reducing the total number of PCR cycles (6–8) (see Supplementary Figures S6 and S7). Thus, the S1 treatment is beneficial for improving library yields under a variety of optimized library preparation conditions.
All the S1-treated 300-bp libraries shown in Figure 1A were then sequenced on an Illumina GAIIx to generate paired-end reads of 60 bp each. Resulting data were aligned using Pipeline1.5_CASAVA1.0 (Illumina; Table 1). It is interesting to note that RNA for samples 5 and 6, representing resting and activated CD4+ T cells from Donor 3, was extracted with the AllPrep RNA purification kit (Qiagen), and these samples showed significantly lower yields of unique matches for mRNA and correspondingly higher ribosomal RNA (rRNA) matches relative to RNA extracted using the TRIzol method. These results suggest that RNA extraction methods need careful evaluation for their impact on the RNA-seq data. The TRIzol method is performed primarily in solution phase using organic extraction, ethanol precipitation, and washes, whereas the AllPrep method purifies the RNA through a series of column separation steps. It is not clear how these differences would affect the final distribution of RNA-seq reads, and due to the small sample size we hesitate to draw conclusions beyond our observations. All RNA samples were of similar high quality, with RIN scores of >9 (see Supplementary Material). The alignment statistics from libraries prepared using TRIzol-extracted RNA are ~40%.
Finally, in order to compare RNA-seq data from S1-treated and untreated libraries, we tested RNA from Donor 1 CD4+ memory and naive T cells (extracted using the AllPrep method). cDNA was prepared and subsequently either S1-treated or left untreated before continuing the protocol to end-repair, A-tailing, and adapter ligation. Each library was loaded into a separate lane of a single-read flow cell and sequenced for 40 bases. For all samples, the proportions of mRNA and mitochondrial reads were similar to those of Donor 3, also extracted using the AllPrep protocol. However, the percent of rRNA reads was substantially lower in the memory and naive cell libraries and was not influenced by S1 treatment (Supplementary Table S1). The reason for this difference in rRNA reads is not clear but may reflect an improvement in kit reagents by NuGEN during the time period between the processing of the first six samples and the processing of the memory and naive samples ~14 months later.
Correlations between gene expression values as measured by reads per kilobase per million reads (RPKM) show consistent relative gene expression for both memory and naive cells in S1-treated and untreated samples (Spearman's rank correlation coefficient, >0.98; Supplementary Figures S3 and S4). Analysis of differential expression between memory and naive cells in the S1-treated and untreated sets also shows very high correlation (Pearson's correlation coefficient ~0.82 across >9000 genes with RPKM >1 for all four samples; Supplementary Figure S5). Finally, the percent of duplicate reads was analyzed for the four samples with values of 49% and 57% for the S1-treated and untreated memory cells, respectively, and 42% and 50% for the S1-treated and untreated naive cells, respectively.
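As an illustration of the two quantities used in this comparison, the sketch below computes RPKM from a raw read count and a Spearman rank correlation between two expression vectors. It is a minimal, dependency-free example and not the analysis code used in the study; it assumes that the normalizing total is the number of mapped reads in the library, and all function names are hypothetical.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene length per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def average_ranks(values):
    """Assign 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because Spearman's coefficient depends only on rank order, it is insensitive to the overall scale of the RPKM values, which makes it a reasonable choice for comparing relative expression between two library preparations.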
Overall, these results suggest some improvement in library quality resulting from the S1 treatment.
In conclusion, we have successfully performed RNA-seq using the NuGEN Ovation RNA-Seq System on multiple samples of total RNA prepared from human CD4+ T cells. The results demonstrate that the addition of the pretreatment step with S1 endonuclease improves the yields of sequencing libraries by 4–6-fold or greater without reducing the quality of the library. Indeed, the advantages of improving yield include a reduction in the number of PCR cycles required for the protocol, thus potentially reducing PCR bias and duplicate reads, and thereby improving the complexity in the sequencing library. We believe that the addition of the S1 treatment is particularly relevant for preparing sequencing libraries from longer cDNA products where lower yields are more problematic. Thus the addition of the S1 endonuclease step into the current NuGEN protocol is a technically simple and low-cost way to maximize yields of high-quality cDNA libraries produced for RNA-seq.
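The link between higher pre-PCR yield and fewer PCR cycles can be made concrete with a back-of-the-envelope sketch. It assumes an idealized model in which each cycle multiplies the product by (1 + efficiency), with perfect doubling at an efficiency of 1.0; real reactions plateau and vary in efficiency, so the result is a rough estimate only. The function name and the model are illustrative assumptions, not part of the protocol.

```python
import math


def cycles_saved(yield_fold, efficiency=1.0):
    """Estimate how many PCR cycles a given fold increase in input replaces.

    Assumes idealized exponential amplification, where each cycle
    multiplies the amount of product by (1 + efficiency).
    """
    if yield_fold <= 0:
        raise ValueError("yield_fold must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return math.log(yield_fold) / math.log(1.0 + efficiency)


# Under perfect doubling, a 4-fold gain corresponds to about 2 cycles
# and a 6-fold gain to roughly 2.6 cycles.
for fold in (4, 6):
    print(fold, round(cycles_saved(fold), 2))
```

Under this model, a 4- to 6-fold yield gain alone accounts for only two to three cycles, so larger reductions in cycle number would also reflect other changes such as input amount and reagent optimization.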
This research was supported by funds from the National Institutes of Health [NIH; grant nos. U19 A1063603-06 (to D.R.S., S.R.H., J.S., and L.S.) and T32DK007022-30 (to H.K.K.)], the Molly Baber Research Fund, and the Verna Harrah Research Fund supporting the Salomon laboratory. The NuGEN Ovation kits and additional support for deep sequencing were provided in the form of reagents as an unrestricted research grant from NuGEN. This paper is subject to the NIH Public Access Policy.
The authors declare no competing interests. There were no consulting fees, existing contracts, stocks or options, intellectual property arrangements, or any other form of conflict of interest issues between the Scripps researchers and NuGEN. No NuGEN employee had any role in the experimental designs or the data interpretation and manuscript preparation.
Address correspondence to Steven R. Head, Next Generation Sequencing Core, The Scripps Research Institute, 10550 N. Torrey Pines Road, La Jolla, CA, 92040, USA. e-mail: [email protected]
1. Archibald, A.L., L. Bolund, C. Churcher, M. Fredholm, M.A. Groenen, B. Harlizius, K.T. Lee, D. Milan. 2010. Pig genome sequence - analysis and publication strategy. BMC Genomics 11:438.
2. Hosseini, P., A. Tremblay, B.F. Matthews, and N.W. Alkharouf. 2010. An efficient annotation and gene-expression derivation tool for Illumina Solexa datasets. BMC Res. Notes 3:183.
3. Martí, E., L. Pantano, M. Bañez-Coronel, F. Llorens, E. Miñones-Moyano, S. Porta, L. Sumoy, I. Ferrer. 2010. A myriad of miRNA variants in control and Huntington's disease brain regions detected by massively parallel sequencing. Nucleic Acids Res. 38:7219-7235.
4. Hawkins, R.D., G.C. Hon, L.K. Lee, Q. Ngo, R. Lister, M. Pelizzola, L.E. Edsall, S. Kuan. 2010. Distinct epigenomic landscapes of pluripotent and lineage-committed human cells. Cell Stem Cell 6:479-491.
5. Quail, M.A., I. Kozarewa, F. Smith, A. Scally, P.J. Stephens, R. Durbin, H. Swerdlow, and D.J. Turner. 2008. A large genome center's improvements to the Illumina sequencing system. Nat. Methods 5:1005-1010.
6. Trickett, A., and Y.L. Kwan. 2003. T cell stimulation and expansion using anti-CD3/CD28 beads. J. Immunol. Methods 275:251-255.