Analysis of Affymetrix GeneChip® data using amplified RNA
 
Leslie Cope1, Scott M. Hartman2, Hinrich W.H. Göhlmann3, Jay P. Tiesman2, Rafael A. Irizarry4
1, The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD
2, The Procter & Gamble Company, Cincinnati, OH, USA
3, Johnson & Johnson Pharmaceutical Research & Development, Beerse, Belgium
4, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA
BioTechniques, Vol. 40, No. 2, February 2006, pp. 165–170

Introduction

The standard method of target synthesis for hybridization to Affymetrix GeneChip® expression microarrays requires a relatively large amount of input total RNA (1–15 µg). When small biological samples are collected by microdissection or other methods, amplification techniques are required to provide sufficient target for hybridization to expression arrays. One such technique is to perform two successive rounds of T7-based in vitro transcription. However, the use of random primers required to regenerate cDNA from the first round of transcription results in shortened copies of cDNA from which the 5′ end is missing.

Several recent studies have investigated the reliability of gene expression measures obtained using twice-amplified RNA in both cDNA arrays and GeneChips (1–9). In most cases, the investigators conclude that amplified RNA produces quality microarray data. They find that the expression levels of amplified samples are highly correlated with one another and have reduced, but significant, correlation with nonamplified samples. Some of these studies have compared methods for sample preparation and amplification in order to develop optimal laboratory protocols (8,9). To date, however, little effort has been made to optimize data processing procedures for microarray studies using the alternative labeling strategies required for small amounts of RNA.

For many microarray platforms, there is probably little that can be done beyond identifying and marking bad players among the probes. Affymetrix GeneChips, where multiple probes are used to measure expression for each transcript, are a possible exception. Given the loss of the 5′ end of cDNA transcripts, the position of each probe within an amplified transcript should influence the hybridization of the probe, and probes closer to the 3′ terminus are expected to be more reliable than probes taken from the 5′ end of a transcript.

Our goals in this study are to investigate the effect of probe position on absolute and relative expression measurements and to evaluate the performance of a new probe set summary, small-sample RMA (sRMA), designed to minimize the random primer effect in two-round labeling protocols.

Materials and Methods

Human total RNA (Clontech, Mountain View, CA, USA) was obtained from both male (testes) and female tissues (breast and cervix) and mixed to make two separate samples (90% breast/10% testis and 90% cervix/10% testis) that should exhibit differential gene regulation for many probe sets. The 10% testis RNA provides a background set of 45 Y-linked genes that should not be differentially regulated between the two samples.

Six technical replicates (10 µg each) of each RNA mixture were labeled according to the Affymetrix One-Cycle Eukaryotic Target Labeling method, and six (50 ng each) were labeled according to the Affymetrix Two-Cycle Eukaryotic Target Labeling procedure (www.affymetrix.com/support/technical/expression_manual.affx; Affymetrix, Santa Clara, CA, USA). Briefly, the one-cycle method consists of an oligo dT/T7 promoter-mediated reverse transcription of total RNA, followed by a T7-based in vitro transcription reaction incorporating biotin rNTPs. The two-cycle labeling protocol consists of two successive rounds of T7-based in vitro transcription incorporating biotin rNTPs in the second round reaction. The conversion of first round cRNA back to cDNA for the second round, via a random primer-mediated reverse transcription reaction, is believed to be the major source of increased 3′ bias in targets generated using this labeling protocol. The resultant targets from both labeling methods were hybridized to 24 Affymetrix HGU133A GeneChips according to the manufacturer's instructions.

In the GeneChip platform, substantial data processing is required after image analysis to obtain expression level measurements. In this study, we compared four different processing algorithms: Affymetrix MAS 5.0, Li and Wong's Model Based Expression Index (10), RMA (11), and a new protocol, called sRMA (available at www.biostat.jhsph.edu/~ririzarr/Software/srma.R), introduced for the first time here.

The small-sample version of RMA uses the same background adjustment and cross-chip normalization procedure as the standard RMA algorithm and, in keeping with the RMA philosophy, uses a robust linear model to summarize probe level expression values. Specifically, we model background adjusted and normalized probe intensities as $\log_2(Y_{ijk}) = \theta_{ik} + \phi_{ij} + \varepsilon_{ijk}$, i = 1,...,I, j = 1,...,J, k = 1,...,K. Here, k represents array, i represents probeset, and j represents probe; θ represents a quantity proportional (in the log scale) to the amount of RNA, φ represents a probe-specific effect, and ε, representing measurement error, is assumed to have a probe-specific variance $\sigma^2_{ij}$. The standard RMA algorithm uses median polish (12), an ad hoc robust procedure, to estimate θ, but code has recently been made available to fit the model above using formal, robust statistical procedures (13). The new implementation accommodates user-defined weights for each probe, and sRMA takes advantage of this by weighting the contribution of each probe according to its relative 5′/3′ position in the transcript using the inverse of the position-specific coefficient of variation.
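The weighting idea described above can be illustrated with a minimal Python sketch for a single probe set. This is not the sRMA implementation (which is distributed as R code and uses a robust fit); it substitutes ordinary weighted least squares, which has a closed form when every probe is observed on every array. The function names, the grouping of probes into position bins, and the example coefficient-of-variation values are all assumptions made for illustration.

```python
import numpy as np

def position_weights(cv_by_bin, probe_bin):
    """Per-probe weights proportional to 1 / CV of the probe's position bin.

    cv_by_bin : hypothetical coefficient of variation for each 5'/3' position bin
    probe_bin : bin index of each probe in the probe set
    """
    cv = np.asarray(cv_by_bin, dtype=float)
    raw = 1.0 / cv[np.asarray(probe_bin)]
    return raw / raw.sum()  # normalize so the weights sum to 1

def weighted_probeset_summary(log2_y, weights):
    """Weighted least-squares fit of log2(Y_jk) = theta_k + phi_j + error.

    log2_y  : array of shape (J probes, K arrays) for one probe set
    weights : one weight per probe (length J)
    Uses the identifiability constraint sum_j w_j * phi_j = 0.
    NOTE: this is a non-robust stand-in for the robust fit used by sRMA.
    """
    y = np.asarray(log2_y, dtype=float)
    w = np.asarray(weights, dtype=float)
    # Expression estimate per array: weighted mean over probes
    theta = (w[:, None] * y).sum(axis=0) / w.sum()
    # Probe effect: average deviation of each probe from the array estimates
    phi = (y - theta[None, :]).mean(axis=1)
    return theta, phi

# Example: four probes, the last two falling in the noisiest (5'-most) bin
w = position_weights([0.1, 0.2, 0.4], [0, 1, 2, 2])
y = np.array([[8.5, 9.5, 10.5],
              [7.5, 8.5, 9.5],
              [8.2, 9.2, 10.2],
              [7.8, 8.8, 9.8]])
theta, phi = weighted_probeset_summary(y, w)
```

Under this simplification, the expression estimate for each array is a weighted mean of its probes, so probes in high-variance positions contribute less to the summary. A robust fit would additionally limit the influence of outlying probe intensities.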
