Marine Metagenomics
 
Diana Gitig, Ph.D.
BioTechniques, Vol. 48, No. 5, May 2010, pp. 361–365

With oceans covering 70% of our planet and one milliliter of seawater containing 10,000–1,000,000 microbes (1), there is clear impetus for researchers to identify and access this plethora of genomic information, which could offer a better understanding of these microbes' role in regulating and responding to global climate changes and other critical ecosystem processes.

But as much as researchers have their eye on the still-untapped genetic sequences of marine microbial species, these millions of unseen ocean inhabitants are poorly suited to conventional DNA sequencing, which requires a culture of identical cells. Fortunately for those interested in deciphering the ocean's metagenome (that is, the collection of microbial genes in a mixed environmental sample), recent advances in sequencing technology and computational power are starting to let researchers look below the surface at the importance of ocean microbes.

Trends in Expression

“We need a whole earth catalog of genes and genomes to help us describe processes driven by microorganisms that control the flux of energy and matter on earth,” says Edward DeLong, a professor of biological engineering and civil and environmental engineering at the Massachusetts Institute of Technology (MIT). DeLong was one of the first recipients of a grant from the Gordon and Betty Moore Foundation, whose 2004 Marine Microbiology Initiative began tackling the ambitious goal of identifying marine microorganisms. “Since such a small fraction of microbial species have been cultured and studied, we don't even understand what sorts of metabolic processes and organisms exist and how they relate to the carbon, nitrogen, and sulfur cycles in the environment,” he says. Attaining such a metagenomic catalog could well depend on the further advancement of next-generation sequencing methods.

Traditional Sanger sequencing relies on the culturing and cloning of an organism. As such, it is often unsuitable for metagenomic studies. Cloning, for one thing, has known biases, such as those introduced when an insert proves toxic to the vector. If certain species in a heterogeneous sample are cloned less efficiently than others, these biases can skew the apparent genomic composition of the community. Additionally, most microorganisms cannot yet be cultured in a lab, for a variety of reasons (e.g., because they depend on another species for viability in the wild). Yet another pressing concern for those studying large metagenomic communities is one of volume: the amount of sequence data necessary to understand a community is simply too great to be handled by Sanger methodology.



Up to now, most large-scale metagenomic studies have relied on pyrosequencing approaches, such as Roche's 454 platform. Pyrosequencing can deliver high throughput (several hundred thousand reactions in a single instrument run) along with the longest reads of all currently available next-generation sequencing platforms (around 450 bp). This longer fragment length is essential when attempting to classify unknown sequences from many different organisms in a mixed sample, since shorter fragments may lack enough sequence-specific information to be attributed to one particular organism.

Since pyrosequencing approaches do not require cloned DNA libraries, they are much cheaper and faster than the Sanger method and avoid cloning biases. Other next-generation sequencing-by-synthesis technologies, including Illumina's Genome Analyzer and Applied Biosystems' SOLiD platform, have also emerged in recent years and are being applied to metagenomics studies. Illumina's sequencing approach uses fluorescently labeled nucleotides that are detected by laser excitation and can generate reads of about 100 bp from a paired-end sequencing run. SOLiD sequencing relies on fluorescently labeled dinucleotide probes that compete for ligation to a sequencing primer hybridized to an adapter sequence on the template. Every base on the template is assayed by two different primers, which enhances the system's accuracy. The appeal of these newer systems is their higher throughput: they can generate about 50–60 billion bases per run, as opposed to the 454 platform's 300 million bases per instrument run. Even so, many metagenomics researchers still prefer pyrosequencing because of its significantly longer reads.
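To put those throughput figures side by side, here is a minimal back-of-the-envelope sketch in Python. It uses only the per-run numbers quoted above; the function and constant names are illustrative, and the read counts are rough estimates that assume every read reaches the quoted length, which real runs do not guarantee.

```python
# Per-run figures as quoted in the article (2010-era platforms).
BASES_PER_RUN_454 = 300_000_000             # about 300 million bases
BASES_PER_RUN_NEWER_LOW = 50_000_000_000    # about 50 billion bases
BASES_PER_RUN_NEWER_HIGH = 60_000_000_000   # about 60 billion bases
READ_LENGTH_454 = 450                       # bp, approximate
READ_LENGTH_ILLUMINA = 100                  # bp, approximate


def fold_increase(new_bases, old_bases):
    """How many times more bases one run yields compared with another."""
    return new_bases / old_bases


def approx_reads_per_run(total_bases, read_length):
    """Rough read count, assuming every read has the quoted length."""
    return total_bases // read_length


if __name__ == "__main__":
    low = fold_increase(BASES_PER_RUN_NEWER_LOW, BASES_PER_RUN_454)
    high = fold_increase(BASES_PER_RUN_NEWER_HIGH, BASES_PER_RUN_454)
    print(f"Throughput gain over 454: about {low:.0f}x to {high:.0f}x")

    reads_454 = approx_reads_per_run(BASES_PER_RUN_454, READ_LENGTH_454)
    print(f"Approximate 454 reads per run: {reads_454:,}")
```

Under these assumptions the newer platforms yield roughly 170 to 200 times more bases per run, while the 454 estimate of several hundred thousand reads per run matches the figure given earlier for pyrosequencing. The trade-off the article describes is that each of those 454 reads is about four to five times longer than a 100 bp read.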

“Microorganisms are exquisite biosensors. They are out there and deployed. We just have to read their output,” notes DeLong. “Next-generation sequencing technologies mean that sequencing is no longer just analyzing data; it can be used to do experiments. We can look at gene expression to determine how a community of microbes responds to perturbations in its environment.” DeLong and his colleagues have measured bacterial and archaeal gene expression within such metagenomic communities; they developed a method to amplify total microbial RNA from a sample, then pyrosequence the resulting cDNA to study the gene expression in what is being called metatranscriptomics (2). In the coming years, single-molecule sequencing technologies could move researchers even further toward understanding how specific genes and their expression patterns shape the composition of microbial communities.
