Lost in transcription

06/20/2011
Lisa Grauer

Thousands of DNA sites are not faithfully transcribed into their cognate RNA sequences, a finding that could overthrow the 50-year-old central dogma of molecular biology.

After sequencing and comparing mRNAs with their corresponding DNA sites from the B cells of 27 individuals, researchers from the University of Pennsylvania (UPenn) School of Medicine found over 28,000 instances in which RNA did not match the corresponding DNA. These differences were located at more than 10,000 exonic sites within 4741 genes.

“We certainly weren’t expecting to see these differences and could have easily attributed them to technical issues,” said Vivian Cheung, UPenn professor of pediatrics who led the study. “But there was this nagging feeling that perhaps the differences were something more than just random errors.”

One potential source of the RNA discrepancies could have been genomic variation at these exonic sites, but the UPenn researchers specifically selected sites where no variation has been reported in the Single Nucleotide Polymorphism Database (dbSNP).

Frequency of the 12 types of RDDs identified in B cells of 27 normal individuals. Source: Science

Furthermore, to rule out previously unidentified single nucleotide polymorphisms (SNPs) at these sites, Cheung’s group used 27 individuals whose genomes had already been sequenced and analyzed for SNPs in the 1000 Genomes Project and the International HapMap Project. As a further precaution, the team sequenced the DNA from the B cells of these 27 individuals and found no unexpected DNA variants at these locations when compared to the human genome.
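The screening logic described above can be illustrated with a short Python sketch. It is only a toy illustration, not the UPenn team's pipeline: the site tuples, the positions, the function names, and the comparison against a single reference base are all assumptions made for this example. It skips known polymorphic sites, skips sites where an individual's DNA differs from the reference, and then tallies the remaining RNA-DNA mismatches by type. With four bases and three possible mismatches for each, there are 12 directed difference types.

```python
from collections import Counter
from itertools import permutations

BASES = "ACGT"  # RNA reads written in the DNA alphabet (T instead of U), as in the article

# 4 bases x 3 possible mismatched bases = 12 directed difference types
DIFFERENCE_TYPES = [f"{dna}-to-{rna}" for dna, rna in permutations(BASES, 2)]


def find_rna_dna_differences(sites, known_snp_positions):
    """Return (position, type) pairs where the RNA base differs from the DNA base.

    sites: iterable of (position, reference_base, dna_base, rna_base) tuples.
           This tuple layout is a hypothetical format chosen for the example.
    known_snp_positions: set of positions already listed as polymorphic.
    """
    differences = []
    for position, reference_base, dna_base, rna_base in sites:
        if position in known_snp_positions:
            continue  # known polymorphic site: excluded up front
        if dna_base != reference_base:
            continue  # the individual's DNA varies here, so a mismatch is not informative
        if rna_base != dna_base:
            differences.append((position, f"{dna_base}-to-{rna_base}"))
    return differences


def tally_difference_types(differences):
    """Count how often each of the 12 difference types occurs."""
    counts = Counter(kind for _, kind in differences)
    return {kind: counts.get(kind, 0) for kind in DIFFERENCE_TYPES}


# Invented toy data: (position, reference base, DNA base, RNA base)
toy_sites = [
    (101, "A", "A", "G"),  # RNA differs from DNA: A-to-G
    (102, "T", "T", "T"),  # RNA matches DNA
    (103, "C", "C", "T"),  # mismatch, but at a known SNP position
    (104, "G", "A", "A"),  # DNA itself differs from the reference
    (105, "T", "T", "C"),  # RNA differs from DNA: T-to-C
    (106, "A", "A", "G"),  # RNA differs from DNA: A-to-G
]
toy_known_snps = {103}

toy_differences = find_rna_dna_differences(toy_sites, toy_known_snps)
toy_counts = tally_difference_types(toy_differences)
```

In this toy run only positions 101, 105, and 106 survive the filters, so the tally contains two A-to-G differences and one T-to-C difference.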

After discovering the RNA-DNA differences (RDDs), the team validated these discrepant RNA sequences by analyzing the proteomes of these B cells using liquid chromatography–tandem mass spectrometry (LC-MS/MS). The peptide sequences matched the corresponding RNA sequences but not the corresponding DNA sequences, confirming that the variant RNAs were not sequencing errors and were indeed being translated into proteins.

In addition, the UPenn researchers noticed that the RDDs did not appear to be random. After further analysis, they discovered a consistent pattern across various cell types and data set types, with A-to-G and T-to-C differences occurring most frequently. These findings suggest that the discrepant RNAs are biologically significant and may serve to diversify an organism’s proteome.

“For example, if the DNA sequence was AA, and the corresponding RNA sequence was AC in one sample, in other samples, we would see the same A-to-C difference, but not other types of differences,” said Cheung.

As a result, RNA variation may be involved in disease susceptibility. Cheung believes that future gene-mapping studies should also explore RNA variants, which could uncover potential biomarkers for disease.

“The most important next step will be to find a mechanism or an enzyme that causes these differences,” Cheung said. “There’s been a lot of discussion that this could simply be a result of RNA editing, which would indicate that the correct base was put in during transcription, but we’re not sure yet if that’s actually the case.”

The paper, “Widespread RNA and DNA sequence differences in the human transcriptome,” was published online on 19 May 2011 in Science.