After sequencing mRNAs from the B cells of 27 individuals and comparing them with the corresponding DNA sequences, researchers from the University of Pennsylvania (UPenn) School of Medicine found more than 28,000 instances in which the RNA did not match the DNA. These differences were located at more than 10,000 exonic sites within 4,741 genes.
“We certainly weren’t expecting to see these differences and could have easily attributed them to technical issues,” said Vivian Cheung, a UPenn professor of pediatrics who led the study. “But there was this nagging feeling that perhaps the differences were something more than just random errors.”
One potential source of these discrepancies could have been genomic variation at the exonic sites themselves, but the UPenn researchers specifically selected sites with no known variation in the Single Nucleotide Polymorphism Database (dbSNP).
After discovering the RNA-DNA differences (RDDs), the team validated these discrepant RNA sequences by analyzing the proteomes of these B cells using liquid chromatography–tandem mass spectrometry (LC-MS/MS). The peptide sequences matched the corresponding RNA sequences but not the corresponding DNA sequences, confirming that the variant RNAs were not sequencing errors and were indeed being translated into proteins.
The UPenn researchers also noticed that the RDDs did not appear to be random. Further analysis revealed a consistent pattern across various cell types and types of data sets, with A-to-G and T-to-C differences occurring most frequently. These findings suggest that the discrepant RNAs are biologically meaningful and may serve to diversify an organism’s proteome.
“For example, if the DNA sequence was AA, and the corresponding RNA sequence was AC in one sample, in other samples, we would see the same A-to-C difference, but not other types of differences,” said Cheung.
If these differences are biologically meaningful, RNA variation may also play a role in disease susceptibility. Cheung believes that future gene-mapping studies should explore RNA variants as well as DNA variants, since they could uncover potential biomarkers for disease.
“The most important next step will be to find a mechanism or an enzyme that causes these differences,” Cheung said. “There’s been a lot of discussion that this could simply be a result of RNA editing, which would indicate that the correct base was put in during transcription, but we’re not sure yet if that’s actually the case.”
The paper, “Widespread RNA and DNA sequence differences in the human transcriptome,” was published online on 19 May 2011 in Science.