Although numerous genomes have been sequenced, the constant discovery of new genes missed by prediction programs indicates that gene identification and characterization are still in their infancy. Transcription start site (TSS) identification is the first step in defining promoters and studying regulatory mechanisms. Traditional methods for TSS identification are tedious and expensive. To circumvent these problems, we developed a simple and efficient method that is accessible to most molecular biology laboratories. Our method will be a potent tool for identifying new genes, characterizing genome structures, and studying transcription regulatory mechanisms.
Current methods for identifying transcription start sites (TSSs) of specific genes in bacteria usually require adaptors or radioactive labeling. These approaches can be technically demanding and environmentally unfriendly. Here we present a method for identifying TSSs, called ARF-TSS, which is based on cDNA generation, circularization, PCR amplification, and DNA sequencing to determine the 5′-end of transcripts, thus circumventing the need for adaptors and radioactive labeling. We validated the method using the gene lasI from the bacterial pathogen Pseudomonas aeruginosa. Our results show that ARF-TSS could be a good alternative to traditional methods for bacterial TSS analysis.
Identification of mRNA transcription start sites (TSSs) is critical for characterization of promoter regions, which is essential for understanding gene expression and regulation patterns in both eukaryotes and prokaryotes. However, accurate identification of TSSs is tedious and technically demanding (1-3). In bacteria, the TSS is traditionally determined through one of two experimental strategies: primer extension analysis or S1 nuclease protection mapping (4, 5). Both approaches generally require radioactive labeling and manual DNA sequencing, which are labor-intensive, time-consuming, costly, and environmentally unfriendly. Here, we report a new strategy for efficient and reliable identification of bacterial TSSs. This strategy is based on cDNA generation, circularization, PCR amplification, and DNA sequencing to determine the 5′-end of transcripts, thus circumventing the need for adaptor ligation and radioactive labeling. We refer to this strategy as adaptor- and radioactivity-free identification of TSSs, or ARF-TSS.
As presented in Figure 1, ARF-TSS involves self-ligation of first-strand cDNA, PCR amplification, and sequence analysis. Briefly, bacterial mRNAs are reverse transcribed using a gene-specific primer with a phosphorylated 5′-end (Primer-R1) to generate the corresponding cDNA fragment with the TSS at its 3′-end. Upon treatment with T4 RNA ligase, an enzyme capable of ligating a 5′ phosphoryl-terminated single-stranded DNA to a 3′ hydroxyl-terminated single-stranded DNA, the 3′-end (TSS) of each cDNA and the 5′-end of its Primer-R1 sequence are covalently joined to form circularized or concatenated products. (For simplicity, only circularization will be discussed.) Circularized cDNAs are then amplified by PCR using primers Primer-R2 and Primer-F3, which are specific to the gene of interest. Given that a gene may be transcribed from different start sites, the PCR products are cloned and sequenced after electrophoresis on an agarose gel. This allows identification of the TSS, since the TSS corresponds to the nucleotide joined directly to the 5′-end of the Primer-R1 sequence. In this way, ARF-TSS abrogates the requirement for radioactive labeling and manual DNA sequencing, and circumvents the end-tailing and adaptor-anchoring steps required for other known methods.
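The junction-reading step described above can be illustrated with a minimal Python sketch. It is not part of the published protocol. It assumes that a clone read spans the ligation junction, that a short stretch of sequence next to the junction maps uniquely to the genomic reference, and that positions are numbered relative to the first nucleotide of the start codon (+1, with -1 immediately upstream and no zero). The function names, the anchor length, and all sequences in the demo are hypothetical.

```python
"""Toy sketch: locate the Primer-R1 junction in a clone read and map the TSS."""

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def tss_from_read(read, primer_r1, reference, start_codon_index, anchor_len=12):
    """Return the TSS position relative to the start codon, or None.

    `reference` is the sense-strand genomic sequence and `start_codon_index`
    the 0-based index of the first start-codon nucleotide within it.
    Numbering convention (an assumption): +1 is that nucleotide, -1 the one
    immediately upstream.
    """
    reference = reference.upper()
    # In sense orientation, the Primer-R1 annealing site is the reverse
    # complement of the primer; the TSS is the next nucleotide after it.
    primer_site = reverse_complement(primer_r1)
    for candidate in (read.upper(), reverse_complement(read)):
        junction = candidate.find(primer_site)
        if junction == -1:
            continue
        start = junction + len(primer_site)
        anchor = candidate[start:start + anchor_len]
        if len(anchor) < anchor_len:
            continue  # read too short to map the transcript 5'-end
        pos = reference.find(anchor)
        if pos == -1:
            continue
        offset = pos - start_codon_index
        return offset + 1 if offset >= 0 else offset
    return None


if __name__ == "__main__":
    # Hypothetical toy sequences, not the lasI locus.
    promoter = "TTGACAGCTAGCTCAGTCCTAGGTATAATGCTAGC"
    leader = "AGGAAACCGTTCTTCAGGAGGTACAT"  # transcript starts at its first base
    orf = "ATGATCGTACAATTCGGTCGTCGCGATGAACTGCGTAAAGCCTGGACCAAA"
    reference = promoter + leader + orf

    primer_r1 = reverse_complement(orf[20:40])  # anneals inside the ORF
    transcript = leader + orf[:40]              # mRNA from TSS to primer site
    cdna = reverse_complement(transcript)       # first-strand cDNA
    read = cdna[-30:] + cdna[:30]               # read across the ligation junction

    print(tss_from_read(read, primer_r1, reference, len(promoter) + len(leader)))
```

The sketch tries both orientations of the read because a cloned PCR product may be sequenced from either strand. Reads carrying extra nucleotides at the junction would simply fail to map here and would need separate handling.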
We selected the lasI gene to validate the ARF-TSS approach. In the bacterial pathogen Pseudomonas aeruginosa, LasI is the synthase of a quorum sensing (QS) signal. It is regulated by the transcription factor LasR, and its TSSs have been analyzed using a conventional S1 nuclease protection mapping assay. With active LasR, lasI was expressed abundantly, allowing the nucleotide A(-25) to be identified as the primary TSS. Without LasR, lasI was expressed at low levels, and the primary TSS was shifted to G(-13) (Figure 2A) (6, 7). To evaluate ARF-TSS, we reanalyzed the TSSs of lasI in the wild-type strain PAO1 and its lasR-knockout mutant. Both bacterial strains were grown aerobically in Luria-Bertani broth. At OD600 of 1.2, total RNA was isolated using the RNeasy Mini kit (Qiagen, Valencia, CA, USA), treated with DNase I (Roche, Indianapolis, IN, USA), qualified by electrophoresis, and quantified by Nanodrop 1000 (NanoDrop Technologies, Wilmington, DE, USA). Next, 1 µg RNA was reverse transcribed with the 5′-end phosphorylated primer lasI-R1 (5′-phosph-CATCGATTTCCATCTCGTCG-3′) using 20 U SuperScript III Reverse Transcriptase from Invitrogen (Carlsbad, CA, USA). After removing the RNAs by 1 N NaOH treatment (65°C for 30 min), cDNAs were purified using the QIAquick PCR Purification kit from Qiagen and treated with T4 RNA ligase 1 from Fermentas (10 U, 37°C for 30 min), and the ligation products were used for PCR amplification with primers lasI-R2 (5′-GCAACTTGTGCATCTCGC-3′) and lasI-F3 (5′-GCAAAGGCTGGGACGTTAGTG-3′). Using a Taq DNA Polymerase kit (Qiagen), PCR was performed as follows: initial activation at 95°C for 5 min; 25 cycles of denaturation at 95°C for 20 s, annealing at 50°C for 10 s, and extension at 72°C for 75 s; and a final extension at 72°C for 5 min.
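As a small worked example, the cycling conditions given above can be written down as data and summed to estimate the programmed hold time of the PCR. This is only an illustrative sketch: the `Step` class and the function name are hypothetical, and ramp times between temperatures are ignored, so the actual run on a thermocycler would take longer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: int
    seconds: int


# Values taken from the PCR conditions described in the text.
INITIAL = Step("initial activation", 95, 5 * 60)
CYCLE = (
    Step("denaturation", 95, 20),
    Step("annealing", 50, 10),
    Step("extension", 72, 75),
)
FINAL = Step("final extension", 72, 5 * 60)
N_CYCLES = 25


def hold_time_seconds(initial, cycle, final, n_cycles):
    """Sum the programmed hold times; temperature ramping is not modeled."""
    per_cycle = sum(step.seconds for step in cycle)
    return initial.seconds + n_cycles * per_cycle + final.seconds


total = hold_time_seconds(INITIAL, CYCLE, FINAL, N_CYCLES)
print(f"{total} s = {total / 60:.2f} min")  # 3225 s = 53.75 min
```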
PCR products were confirmed by electrophoresis on a 1% agarose gel, cloned into the pGEM T-easy vector (Promega, Madison, WI, USA), and then sequenced on an Applied Biosystems 3730xl instrument with BigDye terminator chemistry (Applied Biosystems, Foster City, CA, USA). The sequence junction between the 5′-end of lasI-R1 and the nucleotide upstream of the translational initiation site was identified as the TSS of lasI transcripts. For the wild-type strain, we sequenced eight clones; all sequencing results identified A(-25) as the TSS of lasI, which is consistent with the previously published results (7). For the lasR-knockout mutant, we sequenced 22 clones and identified G(-13) as the major TSS of lasI (Figure 2B), which also agrees with the previously published results (7). In addition, ARF-TSS identified other nucleotides, including G(-151), G(+3), A(+13), G(+21), and G(+34), as TSSs of lasI in the lasR-knockout mutant (Figure 2C). These TSS variants appeared reproducibly across independent repetitions of the experiments, indicating that the variants may not be the result of premature termination of reverse transcription or mRNA degradation. Instead, the TSS variants suggest that LasR not only regulates the expression of lasI, but also controls the specificity of transcription initiation in P. aeruginosa, although defining the detailed mechanism will require additional studies.
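Once each sequenced clone has been assigned a TSS position, the calls can be tallied to see which site dominates. The following is a minimal sketch with a hypothetical function name. The only data used are the eight wild-type clones reported above, all of which mapped to A(-25); the per-site clone counts for the lasR-knockout mutant are not given in the text and are therefore not reproduced.

```python
from collections import Counter


def summarize_tss_calls(calls):
    """Return (position, clone count, fraction) tuples, most frequent first."""
    counts = Counter(calls)
    total = sum(counts.values())
    return [(pos, n, n / total) for pos, n in counts.most_common()]


# Wild-type strain: eight clones, all identifying A(-25) as the TSS.
wild_type_calls = [-25] * 8

for pos, n, frac in summarize_tss_calls(wild_type_calls):
    print(f"{pos:+d}: {n} clones ({frac:.0%})")  # -25: 8 clones (100%)
```

A tally like this reflects cloned PCR products, so the fractions are only a rough indication of relative transcript abundance.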
To further examine the reliability of ARF-TSS, we reanalyzed another gene, hcnA. In P. aeruginosa, hcnA encodes the synthase of hydrogen cyanide. The C(-59) site has been determined to be the QS-dependent TSS through primer extension experimentation (8). As shown in Supplementary Figure S1, ARF-TSS identified the G(-58) site as the TSS of hcnA transcripts, one nucleotide downstream of the C(-59) site previously described. Because additional phosphates are added to the 5′-end of cDNAs during radioactive labeling, the cDNA products in primer extension experiments have been noted to move more slowly than the corresponding ladder in the PAGE gels, thus limiting the resolution of primer extension to within one or two nucleotides (9). Taking this into consideration, we concluded that the ARF-TSS result agreed with the previously published results and confirmed the reliability of ARF-TSS as an alternative method for identifying bacterial TSSs.
Compared with traditional primer extension and S1 nuclease protection, ARF-TSS has a number of advantages. It is a sensitive and efficient approach for unveiling additional TSSs that are beyond the detection limits of traditional methods, and it avoids tiresome radioactive labeling and manual Sanger sequencing. Compared with recently developed approaches such as rapid amplification of cDNA ends (RACE) (10, 11), ARF-TSS also avoids the TdT tailing and adaptor anchoring steps, which are known to be technically demanding (11, 12). In addition, reverse transcriptases used for reverse transcription may have terminal transferase activity and thus add extra nucleotides at the 3′-end of cDNAs. Use of such enzymes could cause artifacts in primer extension and S1 protection experiments, because both methods determine TSSs by measuring the length of cDNAs and cannot effectively distinguish the extra nucleotides added by the reverse transcriptases. ARF-TSS determines TSSs by sequencing the cDNAs, which not only improves the resolution to a single nucleotide, but also allows these extra nucleotides to be identified easily by comparing the cDNA sequence with the genomic sequence. Besides mapping bacterial TSSs, ARF-TSS may also serve as an alternative for quantification of transcript variants that are transcribed from different start sites of the same gene. In this application, however, specific attention is needed for genes where the mRNA has a strong tendency to form secondary structures, since these structures may affect the activity of T4 RNA ligase 1. In addition, it is worth mentioning that PCR specificity is critical for ARF-TSS and can be tested at an early stage of the experiments. After reverse transcription, the linear single-stranded cDNA can be used as a negative-control template for PCR amplification. Absence of amplification suggests that the primers are specific and can be used in the subsequent PCR with the circularized single-stranded cDNAs as templates. Otherwise, the PCR primers may need to be redesigned for the successful application of ARF-TSS.
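The comparison of cDNA sequence with genomic sequence mentioned above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' analysis procedure: the read is assumed to be in sense orientation, any non-templated nucleotides are assumed to lie directly between the Primer-R1 site and the genome-matching transcript start, and the function name, anchor length, and limit on extra nucleotides are hypothetical. An added nucleotide that happens to match the genomic base just upstream of the true TSS cannot be distinguished by this comparison.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def junction_with_extras(sense_read, primer_r1, reference,
                         anchor_len=12, max_extra=4):
    """Return (extra_nucleotides, reference_index_of_TSS) or None.

    Skips 0..max_extra nucleotides after the Primer-R1 site until the
    following `anchor_len` bases match the genomic reference; the skipped
    bases are reported as candidate non-templated additions.
    """
    sense_read = sense_read.upper()
    reference = reference.upper()
    primer_site = reverse_complement(primer_r1)
    junction = sense_read.find(primer_site)
    if junction == -1:
        return None
    tail = sense_read[junction + len(primer_site):]
    for n_extra in range(max_extra + 1):
        anchor = tail[n_extra:n_extra + anchor_len]
        if len(anchor) < anchor_len:
            break  # not enough sequence left to map
        pos = reference.find(anchor)
        if pos != -1:
            return tail[:n_extra], pos
    return None


if __name__ == "__main__":
    # Hypothetical toy sequences, not a real locus.
    promoter = "TTGACAGCTAGCTCAGTCCTAGGTATAATGCTAGC"
    leader = "AGGAAACCGTTCTTCAGGAGGTACAT"
    orf = "ATGATCGTACAATTCGGTCGTCGCGATGAACTGCGTAAAGCCTGGACCAAA"
    reference = promoter + leader + orf
    primer_r1 = reverse_complement(orf[20:40])

    # Sense-orientation read with two hypothetical extra bases at the junction.
    read = orf[10:40] + "GG" + leader[:20]
    print(junction_with_extras(read, primer_r1, reference))
```

In the toy input, the two bases between the primer site and the transcript start do not match the genome and are reported separately, while the transcript start is still mapped to the first base of the leader.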
This work was supported by the Biomedical Research Council of A*STAR, Singapore.
The authors declare no competing interests.
Address correspondence to Lian-Hui Zhang, Institute of Molecular and Cell Biology, 61 Biopolis Drive, Proteos Building, Singapore 138673. e-mail: [email protected]
1. Tjaden, B., R.M. Saxena, S. Stolyar, D.R. Haynor, E. Kolker, and C. Rosenow. 2002. Transcriptome analysis of Escherichia coli using high-density oligonucleotide probe arrays. Nucleic Acids Res. 30:3732-3738.
2. Sharma, C.M., S. Hoffmann, F. Darfeuille, J. Reignier, S. Findeiss, A. Sittka, S. Chabas, K. Reiche. 2010. The primary transcriptome of the major human pathogen Helicobacter pylori. Nature 464:250-255.
3. Ausubel, F.M., R. Brent, R.E. Kingston, D.D. Moore, J.G. Seidman, J.A. Smith, and K. Struhl. 1994. Current Protocols in Molecular Biology. John Wiley & Sons, New York.
4. Berk, A.J., and P.A. Sharp. 1977. Sizing and mapping of early adenovirus mRNAs by gel electrophoresis of S1 endonuclease-digested hybrids. Cell 12:721-732.
5. Thompson, J.A., M.F. Radonovich, and N.P. Salzman. 1979. Characterization of the 5′-terminal structure of simian virus 40 early mRNA's. J. Virol. 31:437-446.
6. Pearson, J.P., K.M. Gray, L. Passador, K.D. Tucker, A. Eberhard, B.H. Iglewski, and E.P. Greenberg. 1994. Structure of the autoinducer required for expression of Pseudomonas aeruginosa virulence genes. Proc. Natl. Acad. Sci. USA 91:197-201.
7. Seed, P.C., L. Passador, and B.H. Iglewski. 1995. Activation of the Pseudomonas aeruginosa lasI gene by LasR and the Pseudomonas autoinducer PAI: an autoinduction regulatory hierarchy. J. Bacteriol. 177:654-659.
8. Pessi, G., and D. Haas. 2000. Transcriptional control of the hydrogen cyanide biosynthetic genes hcnABC by the anaerobic regulator ANR and the quorum-sensing regulators LasR and RhlR in Pseudomonas aeruginosa. J. Bacteriol. 182:6940-6949.
9. Komura, J., and A.D. Riggs. 1998. Terminal transferase-dependent PCR: a versatile and sensitive method for in vivo footprinting and detection of DNA adducts. Nucleic Acids Res. 26:1807-1811.
10. Scotto-Lavino, E., G. Du, and M.A. Frohman. 2006. Amplification of 5′ end cDNA with ‘new RACE’. Nat. Protocols 1:3056-3061.
11. Mendoza-Vargas, A., L. Olvera, M. Olvera, R. Grande, L. Vega-Alvarado, B. Taboada, V. Jimenez-Jacinto, H. Salgado. 2009. Genome-wide identification of transcription start sites, promoters and transcription factor binding sites in E. coli. PLoS One 4:e7526.
12. Bower, N.I., and I.A. Johnston. 2010. Targeted rapid amplification of cDNA ends (T-RACE)—an improved RACE reaction through degradation of non-target sequences. Nucleic Acids Res. 38:e194.