High-throughput verification of transcriptional starting sites by Deep-RACE
 
Signe Olivarius*, Charles Plessy, and Piero Carninci
Functional Genomics Technology Team, Omics Science Center, RIKEN Yokohama Institute


*S.O.'s present address is Molecular Evolution Group, Department of Biology, University of Copenhagen, Copenhagen, Denmark.
BioTechniques, Vol. 46, No. 2, February 2009, pp. 130–132

Evidence that the genome is pervasively transcribed into hundreds of thousands of different RNAs (1,2) necessitates methods for independent, high-throughput verification of transcriptional starting sites (TSSs) determined by genome-wide approaches. 5′-RACE PCR is a well-established and widely used method to specifically amplify the 5′ end of a transcript, which facilitates mapping of the TSS and the approximate location of promoter elements (3). Conventionally, this mapping is done by cloning the 5′-RACE PCR product into a bacterial vector and sequencing a few clones by classic electrophoresis-based Sanger sequencing. A great challenge for transcriptional analysis in general is the recently recognized prevalence of mechanisms that expand the potential transcript repertoire, such as alternative splicing, alternative promoter usage, and multiple TSSs (1,4). For a 5′-RACE PCR assay to reflect such diversity, a fairly large number of clones must be sequenced, and when more than one transcript is to be analyzed, this procedure becomes laborious, time-consuming, and costly. To address this drawback, we developed a simple assay in which 5′-RACE PCR products are subjected directly to high-throughput sequencing by the newly developed flow cell sequencing technologies (5), which are currently revolutionizing DNA sequencing by enabling the sequencing of hundreds of millions of bases in a single run. Because they are based on clonal amplification of millions of single, surface-immobilized DNA fragments (6,7), flow cell sequencing technologies have been recognized as well suited to ultra-high-throughput applications such as whole-genome sequencing. However, because high-throughput sequencing eliminates the need for cloning steps and facilitates quantitative assessment of transcription, it is also a very appealing option for single-gene analyses such as 5′-RACE PCR.

To explore the possibility of combining 5′-RACE PCR with high-throughput sequencing, we selected 17 genes or genomic loci expressed in Hep G2 cells (Catalog no. HB-8065, ATCC, Manassas, VA, USA) according to the Encyclopedia of DNA Elements (ENCODE) Project (2) and cap-analysis gene expression (CAGE) data (1). The RNA ligase-mediated (RLM) RACE PCR approach (3,8), which specifically amplifies the 5′ end of cDNA from full-length, capped transcripts, was adapted to perform high-throughput 5′-end sequencing on total RNA extracted from Hep G2 cells (Figure 1). Normally, flow cell sequencing requires a sample preparation step in which sequencing adapters are ligated to both ends of the DNA fragments. To omit this step, we designed the inner PCR primers with ∼20-bp-long derivatives of specific adapter sequences of the Illumina Genome Analyzer (Tokyo, Japan) attached to their 5′ ends (Figure 1). Following nested PCR, the inner 5′-RACE PCR reactions for all 17 genes were pooled and purified using a standard PCR purification kit (QIAGEN, Tokyo, Japan), and the resulting mixture of PCR products was loaded into a single flow cell channel of the Genome Analyzer high-throughput sequencer. A total of 2,145,126 sequence reads were obtained, and 1,280,189 of them could be mapped to 88,554 distinct TSSs in the human genome using the "nexalign" program as described by T. Lassmann et al. (manuscript in preparation). We used the Galaxy web service (galaxy.psu.edu; reference 9) to identify the genomic intervals where most reads align. TSSs with a read count higher than 500 were selected and clustered, and the clusters were then extended by 100 bp in the 5′ and 3′ directions. This yielded 26 genomic intervals, which we used to filter our original data and rescue TSSs with read counts lower than 500. Eighteen of the intervals corresponded to our loci of interest (one locus has two promoters), and the remaining eight corresponded to repeated regions of the genome.
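The interval-building step above (select TSSs with more than 500 reads, cluster them, extend each cluster by 100 bp on both sides, then use the intervals to rescue low-count TSSs) can be sketched in plain Python. This is only an illustration of the logic, not the Galaxy workflow that was actually used. The input format (a dictionary keyed by chromosome, strand, and 0-based position), the function names, and the `max_gap` clustering distance are all hypothetical, because the original clustering criterion is not specified.

```python
from collections import defaultdict

def build_intervals(tss_counts, min_count=500, max_gap=20, flank=100):
    """Cluster TSSs with more than `min_count` reads and extend each cluster by `flank` bp.

    tss_counts: hypothetical dict mapping (chrom, strand, pos) -> read count, 0-based pos.
    max_gap: hypothetical largest distance between neighbouring TSSs in one cluster.
    """
    positions = defaultdict(list)
    for (chrom, strand, pos), n in tss_counts.items():
        if n > min_count:  # keep only TSSs with a read count higher than the threshold
            positions[(chrom, strand)].append(pos)

    intervals = []
    for (chrom, strand), plist in sorted(positions.items()):
        plist.sort()
        start = end = plist[0]
        for p in plist[1:]:
            if p - end <= max_gap:  # close enough: extend the current cluster
                end = p
            else:  # gap too large: close this cluster (with flanks) and start a new one
                intervals.append((chrom, strand, max(0, start - flank), end + flank))
                start = end = p
        intervals.append((chrom, strand, max(0, start - flank), end + flank))
    return intervals

def rescue(tss_counts, intervals):
    """Keep every TSS, whatever its read count, that lies inside one of the intervals."""
    by_strand = defaultdict(list)
    for chrom, strand, s, e in intervals:
        by_strand[(chrom, strand)].append((s, e))
    return {
        key: n
        for key, n in tss_counts.items()
        if any(s <= key[2] <= e for s, e in by_strand.get(key[:2], []))
    }

# Toy counts, invented for illustration only.
counts = {
    ("chr1", "+", 1000): 800,
    ("chr1", "+", 1010): 650,
    ("chr1", "+", 1050): 12,  # low-count TSS next to a strong cluster
    ("chr1", "+", 5000): 30,
    ("chr2", "-", 300): 900,
    ("chr2", "-", 420): 5,
}
intervals = build_intervals(counts)
kept = rescue(counts, intervals)
print(intervals)  # [('chr1', '+', 900, 1110), ('chr2', '-', 200, 400)]
print(sorted(kept))
```

Because each cluster is extended by the same 100 bp on both sides, the 5′/3′ orientation does not change the interval coordinates; strand is kept only so that rescued TSSs must lie on the same strand as the cluster.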
Each gene is covered by an average of more than 74,000 sequences. Even the gene with the lowest coverage has 3,195 tags, suggesting that up to a few hundred different, non-overlapping transcriptional starting sites or genes could be investigated in a single flow cell channel with a good chance of success.
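One way to read the "few hundred" estimate is as simple read-budget arithmetic using the figures reported above. The sketch below assumes, for illustration, that every additional target would need at least as many reads as the least-covered gene (3,195) and that all 1,280,189 mapped reads form the available budget; these assumptions are ours, not a stated calculation from the study.

```python
total_reads = 2_145_126   # sequence reads obtained from one flow cell channel
mapped_reads = 1_280_189  # reads mapped to the human genome
n_genes = 17              # genes or loci pooled in the run
min_gene_tags = 3_195     # tags on the least-covered gene

mapping_rate = mapped_reads / total_reads
mean_per_gene = mapped_reads / n_genes           # upper bound: some reads fall outside target loci
max_targets = mapped_reads // min_gene_tags      # assumed per-target requirement = worst observed coverage

print(f"{mapping_rate:.1%}")   # 59.7%
print(round(mean_per_gene))    # 75305
print(max_targets)             # 400
```

The mean of about 75,300 mapped reads per gene is slightly above the reported average of more than 74,000, as expected if some mapped reads fall outside the target loci (for example, in the repeat-region intervals).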


Figure 1. Schematic representation of the 5′-RACE PCR procedure optimized for the high-throughput Illumina Genome Analyzer. Total RNA from Hep G2 hepatocellular carcinoma cells was treated with calf intestinal phosphatase (CIP) to remove free 5′ phosphates and with tobacco acid pyrophosphatase (TAP) to remove the cap from full-length transcripts, following the manufacturer's instructions for the FirstChoice RLM-RACE Kit (Applied Biosystems, Tokyo, Japan). The 5′ RACE adapter included in the kit was ligated to the decapped molecules, and reverse transcription was performed with random decamers. Outer PCR reactions were carried out using a common adapter-specific forward primer included in the kit and custom gene-specific reverse primers. For the inner nested PCR reactions, which used the outer PCR reactions as templates, a common adapter-specific forward primer with the sequence AATGATACGGCGACCACCGAACACTGCGTTTGCTGGCTTTGATG was constructed, in which the 5′ portion is an Illumina-specific adapter sequence. The gene-specific inner reverse primers were designed with an Illumina-specific adapter sequence, CAAGCAGAAGACGGCATACGA, attached to their 5′ ends.
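The inner-primer design in Figure 1 amounts to prepending an Illumina adapter sequence to the 5′ end of each gene-specific primer. The minimal sketch below shows that construction for a reverse primer. The helper names are hypothetical, and the gene-specific sequence is an invented placeholder, not one of the primers used in the study; only the adapter tail is taken from the caption.

```python
VALID_BASES = set("ACGT")

def tail_primer(adapter: str, gene_specific: str) -> str:
    """Return a primer with `adapter` as a 5' tail and `gene_specific` at the 3' end (both 5'->3')."""
    for name, seq in (("adapter", adapter), ("gene_specific", gene_specific)):
        if not seq or not set(seq.upper()) <= VALID_BASES:
            raise ValueError(f"{name} must be a non-empty A/C/G/T sequence")
    return adapter.upper() + gene_specific.upper()

def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in `seq`."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

# Illumina-specific tail of the gene-specific inner reverse primers (Figure 1 caption).
REVERSE_TAIL = "CAAGCAGAAGACGGCATACGA"
# Hypothetical gene-specific annealing sequence, for illustration only.
gene_specific = "GCTGAGGTCTCCAGGATGAC"

primer = tail_primer(REVERSE_TAIL, gene_specific)
print(primer, len(primer))
print(round(gc_fraction(gene_specific), 2))  # 0.6
```

The GC check is applied to the gene-specific part alone because, in the first inner-PCR cycles, only that 3′ portion is complementary to the outer-PCR product; the tail becomes part of the template only after it has been incorporated.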
