A new method for phylogenic analysis, based on RNA-Seq, may overcome the current difficulties associated with creating phylogenies, according to a new paper published in the Proceedings of the National Academy of Sciences. Researchers from the University of Colorado Denver Health Sciences Center, Washington University School of Medicine, and Vanderbilt University have demonstrated that transcriptome sequencing via RNA-Seq successfully determined genetic relatedness among 10 mosquito species.
Phylogenic analysis focuses on orthologous genes across different species, which may indicate evolution from a common ancestor. By mapping which genes are similar in which species, phylogenetics can be used to determine relationships among species and make evolutionary comparisons. Currently, phylogenies are constructed from the information contained in sequenced genomes; however, complete genome sequencing is costly and time-consuming, which has limited the number of genomes available for analysis.
“It’s a practical consideration,” Antonis Rokas, professor in the department of biological sciences at Vanderbilt University, told BioTechniques. “It simply costs too much to construct phylogenies with genomes when you don’t need the whole genomes.” According to Rokas, researchers currently have complete genomes for roughly 1,000 species at their disposal, yet there are around two million described species. “That gives you an idea of how much we are missing,” said Rokas.
RNA-Seq uses next-generation sequencing to study the transcriptome (that is, the set of all RNA molecules) at the nucleotide level. RNA-Seq evaluates cDNA, which is synthesized from an mRNA template. The researchers chose to test the viability of transcriptomic phylogenic analysis for several reasons. Transcriptomes are easier to sequence than genomes because the section of DNA that codes for protein is relatively small. The transcripts necessary for phylogenic analysis are highly expressed, which means there are more copies of them in the sample. Also, protein-coding DNA is more valuable when constructing phylogenies than non-coding DNA, which is only useful for determining connections among closely related species.
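To make the effect of skewed expression concrete, here is a rough Python sketch. It is not the researchers' analysis. It assumes a simplified sampling model in which reads are drawn from each transcript in proportion to its copy number times its length. The transcript names, lengths, and copy numbers are invented for illustration; the read count is the average the researchers report obtaining.

```python
# Toy model of read sampling in RNA-Seq (an illustrative assumption,
# not the study's method): reads are drawn from each transcript in
# proportion to (relative copy number x transcript length).

READS = 13_000_000  # average number of reads reported in the article

# Hypothetical transcripts: name -> (length in bp, relative copy number)
TRANSCRIPTS = {
    "highly_expressed": (1500, 1000.0),
    "moderately_expressed": (1500, 10.0),
    "rarely_expressed": (1500, 0.1),
}


def expected_reads(transcripts, reads):
    """Return the expected number of reads per transcript under the toy model."""
    total_mass = sum(length * copies for length, copies in transcripts.values())
    return {
        name: reads * (length * copies) / total_mass
        for name, (length, copies) in transcripts.items()
    }


expected = expected_reads(TRANSCRIPTS, READS)
for name, count in expected.items():
    print(f"{name}: ~{count:,.0f} expected reads")
```

With only three transcripts the absolute numbers mean little; the point is the ratio. Under this model, a transcript with 100 times the copy number of another of equal length receives 100 times the reads, which is why highly expressed genes are the easiest to recover from short-read data.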
The researchers sequenced the transcriptomes of 10 mosquito species, including Aedes aegypti and Anopheles species such as A. gambiae, A. arabiensis, and A. quadriannulatus. As a model, mosquito species were a practical choice because of their value beyond phylogenic analysis. “This group of organisms is attractive to biomedical research,” said Rokas. “Our data sets can be used by people involved in phylogenic analysis and also for genomics. We’ve already been contacted by researchers interested in mosquito biology, and we’ve been sharing our data.”
The transcriptomes were sequenced using Illumina’s Solexa next-generation sequencing platform. “We encountered a lot of skepticism about whether we could assemble large fragments from the short sequences we were working with,” said Rokas. “Illumina’s platform generated a lot of data with short reads, so we chose Illumina for the challenge. For us, it was also less expensive.” The researchers obtained an average of 13 million sequences, averaging 36 bp in length, from non-normalized cDNA libraries. Phylogenomic data matrices were constructed by mapping single contigs from each species. This single-contig strategy was efficient at identifying large amounts of orthologous DNA.
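As a rough illustration of what a phylogenomic data matrix looks like, the Python sketch below concatenates aligned orthologous sequences gene by gene into one row per species, filling in a placeholder where a species has no contig for a gene. It is a minimal sketch under stated assumptions, not the researchers' pipeline: the gene names, the toy sequences, the `build_supermatrix` helper, and the use of `?` for missing data are all hypothetical.

```python
# Sketch of a concatenated phylogenomic data matrix ("supermatrix").
# Assumption: each gene has already been aligned, so all sequences for
# a given gene have the same length. All data below are invented.

# Hypothetical aligned ortholog contigs: gene -> {species: aligned sequence}
alignments = {
    "geneA": {
        "Aedes_aegypti": "ATGGCA",
        "Anopheles_gambiae": "ATGGCT",
        "Anopheles_arabiensis": "ATGGCT",
    },
    "geneB": {
        "Aedes_aegypti": "TTGACC",
        "Anopheles_gambiae": "TTGACA",
        # No contig recovered for Anopheles_arabiensis in this toy example.
    },
}


def build_supermatrix(alignments, missing="?"):
    """Concatenate per-gene alignments into one sequence per species."""
    species = sorted({sp for seqs in alignments.values() for sp in seqs})
    matrix = {sp: "" for sp in species}
    for gene in sorted(alignments):
        seqs = alignments[gene]
        lengths = {len(seq) for seq in seqs.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to one length")
        width = lengths.pop()
        for sp in species:
            # Pad with the missing-data symbol when a species lacks this gene.
            matrix[sp] += seqs.get(sp, missing * width)
    return matrix


matrix = build_supermatrix(alignments)
for sp, row in matrix.items():
    print(f"{sp:22s} {row}")
```

Every row ends up the same length, so each column can be compared across species. The padded cells are the kind of missing data that a sparser set of contigs introduces into such a matrix.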
The complete genomes of the mosquito species being studied were previously assembled. These sequences were obtained from the Short Read Archive of the National Center for Biotechnology Information (NCBI), which provided the researchers with reference data. “We didn’t want to apply a new technique to an unsolved problem,” said Rokas. “Otherwise, how would we know we got it right?” Because Anopheles species occasionally hybridize (which can make phylogenic analysis more complex, because the evolutionary history of the sample is not always representative of the species as a whole), the genus served to test the sensitivity of the new transcriptome technique. “We wanted to test to see if our technology was sensitive enough to see the differences,” said Rokas. Indeed, the reference data for Anopheles confirmed the presence of hybridization as identified by the transcriptome analysis.
According to Rokas, transcriptome analysis is not a replacement for whole-genome sequencing. “The two approaches are complementary,” said Rokas. “Our approach cuts to the chase. If your sole intention is to create a phylogeny, then our approach will get you there faster with high efficiency. We’ve done a little bit of the puzzle, but you can always go and build a bigger construct to learn more.”
The next step for the researchers will be to apply their method to different groups of species. “We’re adopting this method and going after a large number of phylogenies,” said Rokas. The researchers will also improve the data matrices to reduce instances of missing data. “The advantage of our technique is that you end up with your phylogeny but you also have the data from the RNA-Seq for other analysis,” said Rokas. “There is other data to delve into.”
The paper, “Leveraging skewed transcript abundance by RNA-Seq to increase the genomic depth of the tree of life,” was published online ahead of print Jan. 4 in the Proceedings of the National Academy of Sciences.