To assess the cDNA populations enriched following HAC, DSN, or Ribo-Zero treatment, mapped reads were binned according to where they aligned to the E. coli K-12 or hg19 genome. For E. coli, mapped reads were binned into the following categories: rRNA, genes, intergenic, and tRNA (Figure 2A); for human RNA, the categories were rRNA, miRNA, exon, intron, lincRNA, intergenic, mitochondrial RNA, and tRNA (Figure 2B). In the untreated E. coli cDNA controls, 86% of reads mapped to rRNA transcripts, compared with 14.4% of reads in the HAC ss-cDNA fraction and 22.3% in the Ribo-Zero RNA-seq libraries. Effective removal of rRNA sequences in the HAC-normalized and Ribo-Zero cDNA libraries corresponded to a 9.7-fold and an 8.7-fold increase, respectively, in the total number of reads mapping to non-rRNA E. coli K-12 genes. In the human PBMC cDNA libraries, the HAC fractions contained a mean of 28% rRNA sequences and the DSN libraries contained 39%, whereas the untreated controls contained 84%. Removal of rRNA sequences by HAC normalization resulted in a 4.5-fold increase in the overall mapped non-rRNA reads, compared with 3.8-fold for DSN-normalized cDNA libraries. Like the rRNA fraction, the tRNA gene fraction was also decreased by HAC and DSN normalization, but all other RNA categories were increased (Figure 2B, Supplementary Table 3). Further analysis of highly abundant small RNAs (<200 bp) showed that both HAC and DSN normalization biased against these RNA populations (miRNAs and snoRNAs), so only RNAs >200 bp were used in downstream analyses (Supplementary Figure 3). Thus, HAC, DSN, and Ribo-Zero treatment each led to a greater proportion of SGS reads mapping to non-rRNA species in the cDNA libraries, as well as more in-depth coverage over a broader array of E. coli and human PBMC RNA transcripts.
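The read-binning step described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the interval coordinates, the `ANNOTATION` table, and the function names `categorize` and `category_fractions` are hypothetical, and each read is assigned by its alignment start position only.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical annotation for illustration only: (start, end, category),
# half-open intervals, sorted by start and non-overlapping.
ANNOTATION = [
    (0, 1500, "rRNA"),
    (2000, 2080, "tRNA"),
    (3000, 4200, "genes"),
]
STARTS = [start for start, _, _ in ANNOTATION]


def categorize(position):
    """Return the category of the interval containing position, else 'intergenic'."""
    i = bisect_right(STARTS, position) - 1  # last interval starting at or before position
    if i >= 0:
        start, end, category = ANNOTATION[i]
        if start <= position < end:
            return category
    return "intergenic"


def category_fractions(read_positions):
    """Fraction of mapped reads falling into each category."""
    counts = Counter(categorize(p) for p in read_positions)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


# Toy alignment start positions (hypothetical).
reads = [100, 200, 3100, 5000]
print(category_fractions(reads))
```

Comparing the `rRNA` fraction returned for a treated library with that of its untreated control gives the kind of percentage comparison reported in Figure 2. A production analysis would more likely intersect alignments with an annotation file using a dedicated interval tool.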
Enrichment of protein-coding sequences in E. coli K-12 and human PBMC cDNA libraries
When depleting highly abundant rRNA sequences from a total RNA population prior to SGS, it is critical to maintain the overall relative abundances of intracellular RNAs by minimizing biases toward underrepresented non-rRNA populations. In this study, we used the E. coli K-12 genome to obtain deep coverage across the entire transcriptome, assessed any potential biases introduced by our HAC normalization method, and then compared these results with the commercially available Ribo-Zero kit. First, we analyzed total E. coli K-12 transcriptome coverage profiles generated by the HAC and Ribo-Zero RNA-seq libraries (Figure 3). In both cases, E. coli K-12 transcriptome profiles were comparable across technical replicate series and to the untreated RNA-seq control libraries. Second, we compared the enrichment of gene-coding RNA transcripts in HAC and Ribo-Zero treated RNA-seq libraries as a function of intracellular RNA transcript abundance and size (Figure 4, Supplementary Figure 4). Enrichment of E. coli K-12 gene-coding transcripts, normalized against the untreated controls, showed an overall 6-fold enrichment across the entire E. coli K-12 transcriptome regardless of transcript abundance or size. Human PBMC total RNA-seq libraries treated by either DSN or micro-column HAC normalization showed a 3- to 4-fold increase in the number of hits on exon sequences, but with more variability across transcript abundance and size ranges (Figure 4, Supplementary Figure 3). The observed variability in the PBMC RNA-seq libraries is most likely due to the limited depth of transcriptome coverage across the hg19 genome. The evenness of 5′ and 3′ end coverage of gene-coding RNA transcripts was assessed for both E. coli K-12 and human PBMC RNA-seq libraries (Supplementary Figure 5), and no difference in gene coverage at the terminal ends between the untreated and treated RNA-seq libraries was found. The increased number of SGS hits to gene-coding RNA transcripts in both the E. coli K-12 and human PBMC HAC-normalized RNA-seq libraries resulted in broader and deeper coverage of those sequences without an appreciable bias on the overall transcriptional profile.
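As an illustration of the enrichment comparison described above, the following Python sketch computes per-gene fold enrichment against an untreated control and summarizes it by abundance bin. It assumes reads-per-million scaling and order-of-magnitude abundance bins, neither of which is specified in the text; the function names and the toy counts are hypothetical and are not data from this study.

```python
import math
from collections import defaultdict
from statistics import median


def reads_per_million(counts):
    """Scale raw per-feature read counts to reads per million mapped reads."""
    total = sum(counts.values())
    return {feature: 1e6 * n / total for feature, n in counts.items()}


def fold_enrichment(treated_counts, control_counts, exclude=("rRNA",)):
    """Per-gene ratio of treated to control RPM; genes absent from the control are skipped."""
    treated = reads_per_million(treated_counts)
    control = reads_per_million(control_counts)
    return {
        gene: treated.get(gene, 0.0) / rpm
        for gene, rpm in control.items()
        if gene not in exclude and rpm > 0
    }


def median_fold_by_abundance(treated_counts, control_counts):
    """Group genes by order of magnitude of control RPM and report the median fold per bin."""
    control = reads_per_million(control_counts)
    folds = fold_enrichment(treated_counts, control_counts)
    bins = defaultdict(list)
    for gene, fold in folds.items():
        bins[math.floor(math.log10(control[gene]))].append(fold)
    return {b: median(values) for b, values in sorted(bins.items())}


# Toy counts (hypothetical): rRNA is depleted, all genes rise proportionally.
control = {"rRNA": 850, "geneA": 120, "geneB": 25, "geneC": 5}
treated = {"rRNA": 100, "geneA": 720, "geneB": 150, "geneC": 30}
print(median_fold_by_abundance(treated, control))
```

Under this sketch, a normalization method that introduces no abundance-dependent bias yields a similar median fold in every bin, whereas a median that drifts with the bin index would indicate bias toward high- or low-abundance transcripts.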
Hydroxyapatite chromatography is a well-established method for separating different nucleic acid species (RNA, ssDNA, dsDNA) from complex sample types. However, the methodology has not been widely applied in SGS RNA-seq library preparation workflows, possibly because of perceived disadvantages (34), including labor intensiveness, unacceptably high starting material requirements, and poor reproducibility (17). The micro-column HAC normalization method described here eliminates such issues and provides a viable alternative to commercially available rRNA depletion kits for RNA-seq applications. This micro-column HAC system reproducibly separated ss-cDNA from ds-cDNA fractions in both simple (E. coli K-12) and complex (human PBMC) nucleic acid populations and reduced the number of highly abundant intracellular rRNA sequences in the RNA-seq libraries prepared from the single-stranded fraction to levels comparable to those obtained with the commercially available Ribo-Zero and DSN normalization kits. HAC-based cDNA normalization methods offer several advantages over current rRNA depletion protocols, including greater flexibility in total RNA sample input amount, assay conditions (temperature, reagents), and sample types (prokaryotic and eukaryotic), at a fraction of the cost per sample (well under $1 per sample in reagent costs). HAC-based cDNA normalization can also be easily integrated into any RNA-seq library preparation protocol that generates ds-cDNA as a product prior to SGS. Moreover, HAC normalization preserves the rRNA-enriched ds-cDNA fraction for further analysis, if desired, which is especially useful when sequencing highly complex environmental samples where comprehensive 16S rRNA profiling is often required for characterizing diverse microbial communities (35).
Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000. The authors are grateful to Ron Renzi for his support of the technology components utilized in this work.
The authors declare no competing interests.
Address correspondence to Todd W. Lane, Sandia National Laboratories, MS 9292, Livermore, CA, USA.
1.) Ozsolak, F., and P.M. Milos. 2011. RNA sequencing: advances, challenges and opportunities. Nat. Rev. Genet. 12:87-98.
2.) Cosart, T., A. Beja-Pereira, S. Chen, S.B. Ng, J. Shendure, and G. Luikart. 2011. Exome-wide DNA capture and next generation sequencing in domestic and wild species. BMC Genomics 12:347.
3.) Gordo, S.M., D.G. Pinheiro, E.C. Moreira, S.M. Rodrigues, M.C. Poltronieri, O.F. De Lemos, I.T. da Silva, R.T. Ramos. 2012. High-throughput sequencing of black pepper root transcriptome. BMC Plant Biol. 12:168.
4.) Park, K.D., J. Park, J. Ko, B.C. Kim, H.S. Kim, K. Ahn, K.T. Do, H. Choi. 2012. Whole transcriptome analyses of six thoroughbred horses before and after exercise using RNA-Seq. BMC Genomics 13:473.
5.) Balakrishnan, C.N., Y.C. Lin, S.E. London, and D.F. Clayton. 2012. RNA-seq transcriptome analysis of male and female zebra finch cell lines. Genomics.
6.) Pinto, A.C., H.P. Melo-Barbosa, A. Miyoshi, A. Silva, and V. Azevedo. 2011. Application of RNA-seq to reveal the transcript profile in bacteria. Genet. Mol. Res. 10:1707-1718.
7.) Antoniou, E., and R. Taft. 2012. Gene expression in mouse oocytes by RNA-Seq. Methods Mol. Biol. 825:237-251.
8.) Roberts, A., H. Pimentel, C. Trapnell, and L. Pachter. 2011. Identification of novel transcripts in annotated genomes using RNA-Seq. Bioinformatics 27:2325-2329.
9.) Costa, V., C. Angelini, I. De Feis, and A. Ciccodicola. 2010. Uncovering the complexity of transcriptomes with RNA-Seq. J. Biomed. Biotechnol. 2010:853916.
10.) Chen, G., K. Yin, L. Shi, Y. Fang, Y. Qi, P. Li, J. Luo, B. He. 2011. Comparative Analysis of Human Protein-Coding and Noncoding RNAs between Brain and 10 Mixed Cell Lines by RNA-Seq. PLoS ONE 6:e28318.
11.) He, S., O. Wurtzel, K. Singh, J.L. Froula, S. Yilmaz, S.G. Tringe, Z. Wang, F. Chen. 2010. Validation of two ribosomal RNA removal methods for microbial metatranscriptomics. Nat. Methods 7:807-812.
12.) Xu, A.G., L. He, Z. Li, Y. Xu, M. Li, X. Fu, Z. Yan, Y. Yuan. 2010. Intergenic and repeat transcription in human, chimpanzee and macaque brains measured by RNA-Seq. PLoS Comput. Biol. 6:e1000843.
13.) Costa, V., C. Angelini, L. D'Apice, M. Mutarelli, A. Casamassimi, L. Sommese, M.A. Gallo, M. Aprile. 2011. Massive-scale RNA-Seq analysis of non ribosomal transcriptome in human trisomy 21. PLoS ONE 6:e18493.
14.) Giannoukos, G., D.M. Ciulla, K. Huang, B.J. Haas, J. Izard, J.Z. Levin, J. Livny, A.M. Earl. 2012. Efficient and robust RNA-seq process for cultured bacteria and complex community transcriptomes. Genome Biol. 13:R23.
15.) Peterson, D.G., S.R. Schulze, E.B. Sciara, S.A. Lee, J.E. Bowers, A. Nagel, N. Jiang, D.C. Tibbitts. 2002. Integration of Cot analysis, DNA cloning, and high-throughput sequencing facilitates genome characterization and gene discovery. Genome Res. 12:795-807.
16.) Whitelaw, C.A., W.B. Barbazuk, G. Pertea, A.P. Chan, F. Cheung, Y. Lee, L. Zheng, S. van Heeringen. 2003. Enrichment of gene-coding sequences in maize by genome filtration. Science 302:2118-2120.
17.) Gijavanekar, C., U. Strych, Y. Fofanov, G.E. Fox, and R.C. Willson. 2012. Rare target enrichment for ultrasensitive PCR detection using cot–rehybridization and duplex-specific nuclease. Anal. Biochem. 421:81-85.
18.) Zhulidov, P.A., E.A. Bogdanova, A.S. Shcheglov, L.L. Vagner, G.L. Khaspekov, V.B. Kozhemyako, M.V. Matz, E. Meleshkevitch. 2004. Simple cDNA normalization using kamchatka crab duplex-specific nuclease. Nucleic Acids Res. 32:e37.
19.) Anisimova, V.E., D.V. Rebrikov, D.A. Shagin, V.B. Kozhemyako, N.I. Menzorova, D.B. Staroverov, R. Ziganshin, L.L. Vagner. 2008. Isolation, characterization and molecular cloning of duplex-specific nuclease from the hepatopancreas of the Kamchatka crab. BMC Biochem. 9:14.
20.) Ko, M.S. 1990. An ‘equalized cDNA library’ by the reassociation of short double-stranded cDNAs. Nucleic Acids Res. 18:5705-5711.
21.) Soares, M.B., M.F. Bonaldo, P. Jelene, L. Su, L. Lawton, and A. Efstratiadis. 1994. Construction and characterization of a normalized cDNA library. Proc. Natl. Acad. Sci. USA 91:9228-9232.
22.) Puzyrev, A.T., K. Chroniary, and N.K. Moschonas. 1995. A normalized cDNA library from human erythroleukemia cells. Mol. Biol. (Mosk.) 29:97-103.
23.) Andrews-Pfannkoch, C., D.W. Fadrosh, J. Thorpe, and S.J. Williamson. 2010. Hydroxyapatite-mediated separation of double-stranded DNA, single-stranded DNA, and RNA genomes from natural viral assemblages. Appl. Environ. Microbiol. 76:5039-5045.
24.) Fadrosh, D.W., C. Andrews-Pfannkoch, and S.J. Williamson. 2011. Separation of single-stranded DNA, double-stranded DNA and RNA from an environmental viral community using hydroxyapatite chromatography. J. Vis. Exp.
25.) Chirica, G., J. Lachmann, and J. Chan. 2006. Size exclusion chromatography of microliter volumes for on-line use in low-pressure microfluidic systems. Anal. Chem. 78:5362-5368.
26.) Gil, G.C., J. Brennan, D.J. Throckmorton, S.S. Branda, and G.S. Chirica. 2011. Automated analysis of mouse serum peptidome using restricted access media and nanoliquid chromatography-tandem mass spectrometry. J. Chromatogr. B Analyt. Technol. Biomed. Life Sci. 879:1112-1120.
27.) Stachowiak, J.C., E.E. Shugard, B.P. Mosier, R.F. Renzi, P.F. Caton, S.M. Ferko, J.L. Van de Vreugde, D.D. Yee. 2007. Autonomous microfluidic sample preparation system for protein profile-based detection of aerosolized bacterial cells and spores. Anal. Chem. 79:5763-5770.
28.) Levin, J.Z., M. Yassour, X. Adiconis, C. Nusbaum, D.A. Thompson, N. Friedman, A. Gnirke, and A. Regev. 2010. Comprehensive comparative analysis of strand-specific RNA sequencing methods. Nat. Methods 7:709-715.
29.) Lefrancois, P., W. Zheng, and M. Snyder. 2010. ChIP-Seq: using high-throughput DNA sequencing for genome-wide identification of transcription factor binding sites. In J. Weissman (Eds.) Methods in Enzymology, 2nd Edition. Guide to Yeast Genetics: Functional Genomics, Proteomics, and Other Systems Analysis:77-104.
30.) Langmead, B., and S.L. Salzberg. 2012. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9:357-359.
31.) Quinlan, A.R., and I.M. Hall. 2010. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26:841-842.
32.) Trapnell, C., L. Pachter, and S.L. Salzberg. 2009. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25:1105-1111.
33.) Patanjali, S.R., S. Parimoo, and S.M. Weissman. 1991. Construction of a uniform-abundance (normalized) cDNA library. Proc. Natl. Acad. Sci. USA 88:1943-1947.
34.) Shcheglov, A.S., P.A. Zhulidov, E.A. Bogdanova, and D.A. Shagin. 2007. Normalization of cDNA libraries. In A.L. S. Buzdin (Ed.) Nucleic Acids Hybridization. Springer:97-124.
35.) Bartram, A.K., M.D. Lynch, J.C. Stearns, G. Moreno-Hagelsieb, and J.D. Neufeld. 2011. Generation of multimillion-sequence 16S rRNA gene libraries from complex microbial communities by assembling paired-end illumina reads. Appl. Environ. Microbiol. 77:3846-3852.