2 Systems Biology Department, Sandia National Laboratories, Livermore, CA, USA
3 Advanced Systems Engineering & Deployment Department, Sandia National Laboratories, Livermore, CA, USA
Second-generation sequencing (SGS) has become the preferred method for RNA transcriptome profiling of organisms and single cells. However, SGS analysis of transcriptome diversity (including protein-coding transcripts and regulatory non-coding RNAs) is inefficient unless the sample of interest is first depleted of nucleic acids derived from ribosomal RNA (rRNA), which typically account for up to 95% of total intracellular RNA content. Here we describe a novel microscale hydroxyapatite chromatography (HAC) normalization method to remove highly abundant eukaryotic and prokaryotic rRNA species, thereby increasing sequence coverage depth and transcript diversity across non-rRNA populations. RNA-seq analysis of Escherichia coli K-12 and human intracellular total RNA showed that HAC-based normalization enriched for all non-ribosomal RNA species regardless of RNA transcript abundance or length when compared with untreated controls. Microcolumn HAC normalization generated rRNA-depleted cDNA libraries comparable to those produced by the well-established duplex-specific nuclease (DSN) normalization and Ribo-Zero rRNA-depletion methods, thus establishing microscale HAC as an effective, cost-saving, and non-destructive alternative normalization technique.
Second-generation sequencing (SGS) has revolutionized whole genome sequencing and transcriptome analysis (1-5). In particular, sequencing of cDNA synthesized from intracellular total RNA (RNA-seq) enables RNA expression profiling with high dynamic range and genome coverage. RNA-seq has led to discoveries of novel alternative RNA splicing in various eukaryotic cell types and expanded our knowledge of regulatory non-coding RNA transcripts (6-8). The primary component of both eukaryotic and prokaryotic total RNA is ribosomal RNA (rRNA), with all other coding, non-coding, and small RNAs representing less than 15% of the total RNA population (9). The abundance of rRNA-derived sequences in cDNA libraries diminishes the utility of RNA-seq for functional genomics studies because only a small fraction of reads come from sequences of interest. In this context, RNA-seq library preparation techniques that efficiently remove highly abundant rRNA-derived sequence populations and enrich for non-ribosomal RNAs prior to SGS are highly desirable.
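To make the cost of rRNA carry-over concrete, the following minimal Python sketch computes how many reads remain informative for a given rRNA fraction and how much is gained when that fraction is reduced. The function names, the read count, and the rRNA fractions are hypothetical values chosen for illustration only; they are not measurements from this study.

```python
def informative_reads(total_reads, rrna_fraction):
    """Number of reads expected to come from non-rRNA sequences."""
    if not 0.0 <= rrna_fraction <= 1.0:
        raise ValueError("rrna_fraction must be between 0 and 1")
    return total_reads * (1.0 - rrna_fraction)


def fold_enrichment(rrna_before, rrna_after):
    """Fold increase in the non-rRNA share of reads after depletion."""
    if rrna_before >= 1.0:
        raise ValueError("rrna_before must be below 1")
    return (1.0 - rrna_after) / (1.0 - rrna_before)


# Hypothetical run: 10 million reads, rRNA share reduced from 90% to 50%.
before = informative_reads(10_000_000, 0.90)
after = informative_reads(10_000_000, 0.50)
gain = fold_enrichment(0.90, 0.50)
print(f"informative reads: {before:.0f} -> {after:.0f} ({gain:.1f}-fold)")
```

Under these assumed numbers, the same sequencing run yields roughly five times as many reads from sequences of interest after depletion.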
A common method for excluding rRNA is to capture RNA species that contain polyadenylated tails. This approach is highly effective in removing rRNA but also depletes all non-polyadenylated host transcripts, including non-coding RNAs that regulate eukaryotic cellular function, as well as both viral and prokaryotic microbial sequences present in many complex sample types (10). Another common method for excluding rRNA is to selectively remove the ribosomal RNA prior to generating a cDNA library for SGS. These rRNA depletion protocols utilize antisense rRNA probes specifically designed to capture human/mouse/rat or Gram-positive/Gram-negative bacterial rRNA transcripts from high-quality total RNA samples. This technique is a multi-step procedure that requires large amounts of starting material (250 ng to 10 µg of total RNA) and has been shown to be less effective on degraded RNA samples. Commercially available rRNA depletion kits such as RiboMinus and Ribo-Zero are effective in removing highly abundant rRNA species from eukaryotic and prokaryotic total RNA, but they are costly and their rRNA capture probes are species-specific (11-14).
An alternative to depleting rRNA sequences prior to cDNA library synthesis is to apply cDNA normalization (also called Cot filtration) approaches that remove highly abundant sequences from cDNA libraries (15, 16). In normalization, double-stranded DNA (dsDNA) populations are first denatured and then allowed to re-anneal at an elevated temperature. Highly abundant sequences hybridize at higher rates (proportional to the square of their concentration) and, if the re-annealing reaction is stopped at a suitable time point (e.g., 4–24 h), these will comprise the majority of double-stranded species (17). If double-stranded and single-stranded cDNA can then be separated, representation of the highest abundance species in the resulting single-stranded fraction can be significantly reduced. Two common approaches for separating ss-cDNA and ds-cDNA populations are enzymatic digestion of ds-cDNA using a duplex-specific nuclease (DSN) (18, 19) and physical separation of ds-cDNA from ss-cDNA through methods such as hydroxyapatite chromatography (HAC) (20-24).
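The normalizing effect of re-annealing can be illustrated with the ideal second-order reassociation model, in which the single-stranded concentration C of a species obeys dC/dt = -kC^2 and therefore C(t) = C0 / (1 + k·C0·t). The Python sketch below is an illustration under that idealized model only; the rate constant, the starting concentrations, and the time points are arbitrary, hypothetical values, and real cDNA mixtures deviate from this behavior (fragment length, sequence complexity, and cross-hybridization are ignored).

```python
def remaining_single_stranded(initial, t, k):
    """Single-stranded concentration left for each species after time t.

    initial: dict mapping species name -> starting concentration C0
    t:       re-annealing time (arbitrary units, assumption)
    k:       second-order rate constant (arbitrary units, assumption)
    Uses the ideal model C(t) = C0 / (1 + k * C0 * t).
    """
    return {name: c0 / (1.0 + k * c0 * t) for name, c0 in initial.items()}


def composition(concentrations):
    """Fraction of the total contributed by each species."""
    total = sum(concentrations.values())
    return {name: c / total for name, c in concentrations.items()}


# Hypothetical library: one abundant rRNA-derived species and one rare transcript.
start = {"rRNA": 0.9, "rare_mRNA": 0.001}

for t in (0.0, 10.0, 1000.0):
    ss = remaining_single_stranded(start, t=t, k=1.0)
    shares = composition(ss)
    print(t, {name: round(share, 3) for name, share in shares.items()})
```

In this model the abundant species leaves the single-stranded pool much faster than the rare one, so its share of the single-stranded fraction falls as the re-annealing time increases. That single-stranded fraction is what the DSN and HAC separation steps are intended to recover.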
Here we describe a microcolumn-based HAC approach for normalization that uses convenient re-packable cartridges and is rapid, reproducible, and amenable to future automated sample preparation platforms (25-27). We compare our microcolumn HAC-based method with a commercial rRNA-depletion kit (Ribo-Zero) for SGS libraries prepared from Escherichia coli K-12 total RNA and with a DSN normalization kit for SGS libraries prepared from human peripheral blood mononuclear cell (PBMC) total RNA. Sequencing of RNA-seq cDNA libraries followed by alignment to either the E. coli K-12 or human (hg19) genome was used to measure rRNA abundance, non-rRNA transcript enrichment, and, in the case of E. coli K-12, coverage across the entire bacterial transcriptome. Microcolumn HAC-based normalization proved to be an effective, cost-saving alternative to commercial Ribo-Zero and DSN normalization kits, and is the first step toward a fully automated system incorporating HAC normalization into RNA-seq cDNA library preparation workflows.
Materials and methods
E. coli K-12 and human PBMC cDNA library preparation
E. coli strain K-12 was obtained in lyophilized form from ATCC (Manassas, VA, USA). Bacteria were rehydrated by adding 300 µL of LB broth (BD, Franklin Lakes, NJ, USA), plated on LB agar (BD), and incubated at 37°C overnight. Individual colonies were used to inoculate LB broth, incubated at 37°C with shaking for 3 h, and centrifuged at 5000 × g at 4°C for 5 min. RNA was extracted from the cells using the RNeasy Protect Bacteria Mini Kit with on-column DNase treatment (Qiagen, Valencia, CA, USA). A 1 µg aliquot of E. coli total RNA was subjected to Ribo-Zero treatment to deplete rRNA sequences using the Gram-Negative Bacteria kit following the manufacturer's instructions (Epicentre, Madison, WI, USA). Total intracellular RNA was extracted from human PBMCs (Astarte Biologics, Redmond, WA, USA) using RNAzol (MRC, Cincinnati, OH, USA) followed by the Direct-zol column cleanup kit (Zymo, Irvine, CA, USA) and eluted into nuclease-free water. Total RNA extracts were visualized and RNA integrity number (RIN) values were determined using a 2100 Bioanalyzer RNA Nano chip (Agilent, Santa Clara, CA, USA). Triplicate samples containing approximately 300 ng of E. coli K-12 RNA (RIN >9.0), untreated and Ribo-Zero treated, and human PBMC RNA (RIN >8.5) were fragmented by incubating with a magnesium chloride buffer (New England Biolabs, Ipswich, MA, USA) for 5 min at 95°C to obtain a fragment size range of 25–300 bp. The RNA fragments were purified using an RNA Clean & Concentrator kit (Zymo) and eluted in 6 µL of nuclease-free water.
For first-strand cDNA synthesis, 200 ng of fragmented E. coli K-12 (untreated and Ribo-Zero treated) or fragmented human PBMC RNA was reverse transcribed, and adaptors were incorporated at the 5′ and 3′ ends of each fragment using the SMART protocol as previously described by Levin et al., with a few modifications (28). First-strand synthesis products were purified using 18 µL of AMPure XP DNA beads (Beckman-Coulter, Indianapolis, IN, USA) following the manufacturer's instructions and eluted in 25 µL of nuclease-free water. Second-strand synthesis was performed with FailSafe PCR polymerase in Buffer E (Epicentre) using the following thermocycling parameters: an initial 94°C step for 1 min followed by 9–15 cycles of 94°C for 30 s, 55°C for 30 s, 68°C for 3 min, and a final extension of 68°C for 7 min for 10–15 cycles. All ds-cDNA libraries were purified using IC DNA columns (Zymo), eluted in nuclease-free water, and quantified using a NanoDrop 1000 (NanoDrop, Wilmington, DE, USA) and a 2100 Bioanalyzer with a High Sensitivity DNA chip (Agilent) following the manufacturer's instructions.
Microcolumn hydroxyapatite chromatography
A simple fluidic system was assembled from commercially available components to speed up and simplify HAC (Figure 1). Syringe pumps (Cole-Parmer, Vernon Hills, IL, USA) were used to control and dispense three wash/elution buffers and to introduce the sample into the cartridge column. A 7-port, 6-way selection valve (Scivex, Oak Harbor, WA, USA) allowed switching between the sample and the various buffers flowing into the packed cartridge. Eluted ss-cDNA and ds-cDNA fractions were collected with the aid of a UV detector (Linear UVIS 200, Reno, NV, USA) that monitored A260 just before the end of the outlet tubing. The entire system could be placed in a small incubator/oven (40–60°C) to minimize non-specific secondary structure formation.