Small noncoding RNA plays a key role in regulating a variety of biological processes, including developmental timing, cellular differentiation, tumor progression, neurogenesis, transposon silencing, and viral defense. Small RNAs function in gene regulation by binding to mRNA targets, and modulate gene expression using mechanisms such as heterochromatin modification, translational inhibition, mRNA decay, and nascent peptide turnover. Because many small RNAs were discovered relatively recently, the field is young and rapidly changing. Novel forms remain to be discovered and the sequence modifications and activity of mature miRNAs are not well understood (1).
Whole-genome discovery of small RNAs
The current tools for studying small RNA are inadequate for whole-genome discovery and characterization of novel small RNA. High-throughput platforms based on probe hybridization, such as microarrays, require prior knowledge of the miRNA sequences, and have limited dynamic range and poor sensitivity. Moreover, current sequencing-based methods for small RNA library preparation are time-consuming, prone to variation due to complex protocols, and require a significant amount of input RNA. TaqMan® MicroRNA Assays provide a robust, highly sensitive turn-key method for profiling miRNA but are limited to analysis of known miRNA species.
Here we describe a new, robust method for hypothesis-neutral, whole-genome analysis of small noncoding RNA in general and miRNA in particular, using a simplified, single-day procedure for preparing small RNA libraries with the SOLiD™ Small RNA Expression Kit. The SOLiD System's ultra-high throughput (greater than 200 million mappable sequence reads, or tags, per run), wide dynamic range, and high sensitivity make it particularly appropriate for analyzing low RNA expression levels and measuring accurate fold changes at these levels.
Methods
Total RNA was isolated from samples of human placenta (300 ng RNA) and lung (500 ng RNA). The small RNA (10–40 nucleotides) was purified using the Ambion® flashPAGE™ Fractionator System and converted to amplified small RNA libraries using the SOLiD Small RNA Expression Kit (Figure 1). The resulting cDNA libraries, containing the adaptor sequences necessary for SOLiD System sequencing, were clonally amplified onto beads with emulsion PCR using the SOLiD ePCR Kit. Following the standard protocol (SOLiD System 2.0 User Guide http://www3.appliedbiosystems.com/AB_Home/applicationstechnologies/SOLiDSystemSequencing/index.htm), the four libraries were deposited onto separate segments of a single, four quadrant slide and sequenced on the SOLiD Analyzer.
A combined total of 173 million sequence reads were generated. All reads were mapped sequentially against (i) miRBase sequence database-build 10, (ii) ribosomal RNA, tRNA and low-complexity regions of the human genome, (iii) human RefSeq sequences, and (iv) human genome NCBI-build 36.
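The sequential mapping order above can be illustrated with a minimal sketch in which each read is assigned to the first reference set that contains it. This is not the mapping software used in the study: exact set membership stands in for a real aligner, the function name `assign_reads` and all sequences are invented for illustration, and SOLiD color-space decoding is ignored.

```python
from collections import Counter

def assign_reads(reads, references):
    """Assign each read to the first reference set that contains it.

    `references` is an ordered list of (name, set_of_sequences) pairs.
    Exact membership is an assumption that stands in for a real aligner.
    """
    tally = Counter()
    for read in reads:
        for name, seqs in references:
            if read in seqs:
                tally[name] += 1
                break  # stop at the first reference that matches
        else:
            tally["unmapped"] += 1
    return tally

# Toy references in the order described in the text (sequences are invented).
references = [
    ("miRNA", {"TGAGGTAGTAGGTTGTATAGTT"}),
    ("rRNA_tRNA_repeats", {"GCATTGGTGGTTCAGTGGTAGA"}),
    ("RefSeq", {"ATGGCCCTGTGGATGCGCCTCC"}),
    ("genome", {"ATGGCCCTGTGGATGCGCCTCC", "CCCTAACCCTAACCCTAACCCT"}),
]

reads = [
    "TGAGGTAGTAGGTTGTATAGTT",
    "TGAGGTAGTAGGTTGTATAGTT",
    "GCATTGGTGGTTCAGTGGTAGA",
    "ATGGCCCTGTGGATGCGCCTCC",  # present in RefSeq and genome; RefSeq wins
    "CCCTAACCCTAACCCTAACCCT",
    "NNNNNNNNNNNNNNNNNNNNNN",
]

tally = assign_reads(reads, references)
```

Because a read stops at the first reference it matches, a sequence that is present in both RefSeq and the genome is counted only once, under RefSeq.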
The miRNA reference was based on miRBase sequence data and constructed to enable detection of alternative miRNA forms with variable 5′ and 3′ ends. The reads mapped to this miRNA reference identified 34.4 million reads from the three placenta library quads and 13.3 million reads from the lung library quad, allowing 0-1 mismatches (data not shown). A high percentage (92–95%) of these reads matched the miRNA reference sequence in a unique location. After filtering for rRNA, tRNA, repeated regions of the human genome, or human RefSeq sequences, 29.7 million reads from the three placenta quads and 12.1 million reads from the lung quad remained and were used for the subsequent analysis/profiling. The number of reads per miRNA was determined and used as the expression level for that particular miRNA.
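As a rough illustration of using read counts as expression levels, the sketch below counts reads per miRNA while allowing 0-1 mismatches. It rests on several assumptions that are not taken from the study: reads and references are compared at equal length (the actual reference also accommodated variable 5′ and 3′ ends), only reads that match a single miRNA are counted, and the miRNA names and sequences are hypothetical.

```python
from collections import Counter

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def count_reads(reads, mirnas, max_mismatches=1):
    """Count reads per miRNA, keeping only reads that match exactly one miRNA."""
    counts = Counter()
    for read in reads:
        hits = [
            name
            for name, seq in mirnas.items()
            if len(seq) == len(read) and hamming(read, seq) <= max_mismatches
        ]
        if len(hits) == 1:  # assumption: ambiguous reads are discarded
            counts[hits[0]] += 1
    return counts

# Hypothetical reference sequences and reads, for illustration only.
mirnas = {
    "toy_mir_A": "TGAGGTAGTAGGTTGTATAGTT",
    "toy_mir_B": "TAGCTTATCAGACTGATGTTGA",
}
reads = [
    "TGAGGTAGTAGGTTGTATAGTT",  # exact match to toy_mir_A
    "TGAGGTAGTAGGTTGTATAGTA",  # one mismatch to toy_mir_A
    "TAGCTTATCAGACTGATGTTGA",  # exact match to toy_mir_B
    "TGAGGTAGTAGGTTGTATACAA",  # three mismatches, rejected
]

counts = count_reads(reads, mirnas)
```

The resulting counter maps each miRNA name to its read count, which can then be compared across samples.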
Results
Detection of putative novel small RNA
Distribution of the 94 million tags is described in Figure 2. Approximately 51% of the tags mapped to known miRNAs. Greater than 40% of the tags mapped to genomic regions but not to any known RNA species. The small percentage (9%) of the tags that mapped to known rRNA, tRNA, repeat regions, and RefSeq indicates that the library preparation method successfully enriched for the small RNA fraction. A subset of the tags that mapped to the human genome but not to miRNA could represent previously uncharacterized miRNAs. These candidates are currently being validated as novel small RNAs with TaqMan® Assays.
Reproducibility and dynamic range
To assess the reproducibility of the instrument system for miRNA detection, unique sequence tags mapping to the miRBase database and isolated from two independent runs of the placenta library (two quad segments) were compared (Figure 3). A linear regression between the two sets of counts produced a coefficient of determination (R²) greater than 0.9996, illustrating an excellent correlation between the results of the two quadrants over the entire six logs (base 10) of dynamic range. This high degree of reproducibility is essential for accurate detection of subtle changes in expression of small RNAs between two samples run on the same slide.
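For readers who want to run a similar reproducibility check on their own count tables, the following is a minimal sketch of computing the coefficient of determination for a simple linear regression between two replicate count vectors. The counts are invented, and log10-transforming them before the regression is an assumption made here so that highly expressed miRNAs do not dominate; the text does not state whether the reported value was computed on raw or log-scaled counts.

```python
import math

def r_squared(x, y):
    """Coefficient of determination for a simple linear regression of y on x.

    For a single predictor this equals the squared Pearson correlation.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)

# Invented tag counts for five miRNAs in two replicate quadrants.
quad1 = [12, 150, 1800, 25000, 310000]
quad2 = [10, 160, 1750, 26000, 300000]

# Assumption: compare counts on a log10 scale.
log_q1 = [math.log10(c) for c in quad1]
log_q2 = [math.log10(c) for c in quad2]

r2 = r_squared(log_q1, log_q2)
```

Zero counts would need a pseudocount or filtering before the log transform; the toy data avoid that case.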
miRNA expression analysis—correlation to TaqMan results
A major limitation of microarrays is the observed “ratio compression” of expression levels. The wide dynamic range observed on the SOLiD Analyzer suggested that expression levels derived from the SOLiD System and TaqMan platforms would show strong correlation. Of the 423 miRNAs identified in placenta tissue, TaqMan MicroRNA Assays for 244 targets were readily available. Comparison of expression levels between the two platforms for these miRNAs demonstrated a high correlation coefficient of 0.87 (Figure 4A). This value is considerably greater than the typical correlation observed between microarrays and TaqMan results. Restricting the analysis to miRNAs that show significant differential expression increased the correlation coefficient to 0.90 (Figure 4B). These data demonstrate that the SOLiD System generates miRNA expression profiles that correlate well with TaqMan data (R = 0.90), suggesting that the SOLiD System is a valid profiling tool for gene expression analysis.
Variability in miRNA start points
Analysis of previously characterized miRNA revealed a significant number of molecules whose 5′ start points differ from those described in the Sanger database (Figure 5). This variability may be due to alternative or permissive processing and may provide new insight into the regulation of miRNA. Further studies are required to understand the biological significance of these miRNA variants, known as isomiRs.
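One simple way to tabulate 5′ start-point variability is to record where each read begins relative to the annotated start of the mature miRNA within its precursor. The sketch below is illustrative only: the precursor sequence, the annotated start position, and the reads are invented, and `str.find` locates only the first exact occurrence of a read, so mismatches and repeated matches are not handled.

```python
from collections import Counter

def five_prime_offsets(reads, precursor, annotated_start):
    """Tally read start positions relative to the annotated mature 5' end.

    Offset 0 means the read starts at the annotated position; negative
    offsets start upstream and positive offsets start downstream.
    """
    offsets = Counter()
    for read in reads:
        pos = precursor.find(read)  # first exact occurrence, or -1
        if pos != -1:
            offsets[pos - annotated_start] += 1
    return offsets

# Hypothetical precursor; the annotated mature sequence starts at index 3 (0-based).
precursor = "GGCTGAGGTAGTAGGTTGTATAGTTTTAGG"
annotated_start = 3

reads = [
    "TGAGGTAGTAGGTTGTATAGTT",  # starts at the annotated position
    "TGAGGTAGTAGGTTGTATAGTT",
    "CTGAGGTAGTAGGTTGTATAGT",  # starts one base upstream
    "GAGGTAGTAGGTTGTATAGTTT",  # starts one base downstream
    "ACGTACGT",                # not found in the precursor, ignored
]

offsets = five_prime_offsets(reads, precursor, annotated_start)
```

A distribution concentrated at offset 0 would agree with the database annotation, while substantial counts at other offsets would point to alternative 5′ ends.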
Conclusion
A simple and robust workflow for small RNA analysis has been developed for the SOLiD System. This workflow demonstrates significant advantages in sensitivity and dynamic range over traditional approaches to studying whole genome RNA expression profiles. The SOLiD System generates >240 million sequence tags per run. The single-day procedure to prepare small RNA libraries using the SOLiD Small RNA Expression Kit represents a significant improvement over the 4 days required by other published methods, saving researchers time and labor. The flexible slide format used in the SOLiD System allows for the deposition of 1–16 samples, enabling the analysis of multiple samples and matched controls in a single run. The expression levels of miRNA molecules were readily determined after sequencing unique reads from miRNAs. This approach was shown to be highly reproducible and demonstrated a dynamic range that is orders of magnitude greater than microarrays. Levels of miRNA analyzed with the SOLiD Small RNA Expression System were confirmed by experiments using TaqMan® MicroRNA Assays. The SOLiD System therefore provides a highly sensitive, hypothesis-neutral method for the detection of novel small transcripts on a genome-wide scale.




