Determining the primary sequences of informational macromolecules is no longer a limiting factor for our ability to completely understand the biological functioning of cells and organisms. Similarly, our understanding of transcriptional regulation (transcriptomics) has been greatly enhanced by the availability of microarrays. Our next hurdle is to learn the biochemical functions of all the gene products (proteomics) and the totality of all the interactions among them (interactomics). Using traditional biochemical methods, this will take a very long time. More efficient methods are needed to address these questions, or at least to suggest possible candidates for further testing. High-resolution imaging using molecule-specific tags will reveal details of cellular architecture that are expected to provide additional insights and clues about the interactions and functions of many gene products. Computer modeling of macromolecular structures and functional systems will be of key importance. We present here a brief historical and futuristic perspective of genomics and some of its other ‘omics offshoots in the post-genomic era.
The importance of determining the entire genome sequence of humans was recognized more than two decades ago and was an important first step in ushering in the field of genomics. It was also apparent at that time that the goal of deciphering the complete human genome was achievable with existing sequencing techniques (1,2,3) and by developing large-scale cloning and mapping strategies. Therefore, the human genome sequencing enterprise began by first characterizing markers to assemble a linkage map and developing a physical map of the genome. The assembly of a genetic map was greatly expedited by the availability of well-characterized markers uniformly distributed along the entire genome. The use of rare-cutting restriction enzymes and the analysis of large DNA fragments by pulsed-field gel electrophoresis facilitated the construction of a physical map. The development of vectors capable of accommodating large genomic fragments allowed the assembly of the human genome as a tiling array of large-insert clones (4). These large-insert clones were further supplemented with cosmid and phage lambda clones of genomic DNA. Using Sanger sequencing with fluorescently tagged dideoxy chain terminators, the sequencing enterprise began with an international consortium funded by the Department of Energy, and was continued by the Human Genome Project at the National Institutes of Health (NIH) in the United States. Simultaneously, a competing effort was led by J. Craig Venter of The Institute for Genomic Research (TIGR) and Celera Genomics. While the NIH-led consortium took advantage of the tiling array of ordered clones to sequence the genome, the Venter group employed a random shotgun sequencing method, which relied on computer analysis to assemble the overlapping sequences. With the two groups adopting different approaches, the draft human genome sequence was completed ahead of schedule in 2001 (5,6).
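The idea of assembling overlapping shotgun reads by computer can be illustrated with a toy example. The sketch below is not the assembler used by either sequencing group; it is a minimal greedy overlap merger, and it assumes error-free reads, a single strand, and no repeats. The function names `overlap` and `greedy_assemble` and the `min_len` parameter are hypothetical, chosen only for this illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that equals a prefix
    of `b`, or 0 if no such overlap of at least `min_len` bases exists."""
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap.

    Toy assumptions (not true of real data): no sequencing errors,
    no reverse-complement reads, and no repeated regions.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # nothing left to merge; remaining contigs stay separate
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


if __name__ == "__main__":
    reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
    print(greedy_assemble(reads))
```

Production assemblers must additionally cope with sequencing errors, both strands, and repetitive sequence, which is why ordered clones or paired reads are needed to resolve ambiguous merges in a genome the size of the human one.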
The next step in this direction has been to study human genetic diversity by sequencing individual representative genomes and identifying the different haplotypes present in the population (7). In the course of this overall undertaking, new DNA sequencing technologies have repeatedly been developed, such as pyrosequencing and single-molecule sequencing-by-synthesis. These methods have made sequence analysis faster and cheaper, so that the goal of sequencing an entire human genome for $1000 will be achieved in the near future, and a single prokaryotic genome sequence will cost about $1. In the post-genomic era, the advent of methods for simultaneous evaluation of a large number of transcripts (transcriptomics), RNAi/miRNAs (interferomics/microRNomics), proteins (proteomics), interacting proteins (interactomics), DNA and chromatin modifications (epigenomics), and metabolites (metabolomics) has made the description of a comprehensive model of life achievable.
Transcriptomics
Following the sequencing of the human genome, the task at hand was to determine the genes that the sequence represents. The earliest estimates were that the human genome codes for over 100,000 transcripts, collectively known as the transcriptome. However, these estimates underwent considerable revisions until bioinformatic approaches predicted the number of transcripts encoded by the genome to be 20,000–25,000 (8), which is only about four times that for the bacterium Pseudomonas (9,10). It warrants mention that not all of these genes are expressed in every cell. The next wave of investigations therefore focused on which genes are expressed in a particular cell or tissue type, or on a comparative analysis of transcripts between two conditions or cell/tissue types. These analyses took advantage of the human genome sequence and the knowledge of expressed sequences either from open reading frame (ORF) and intron/exon analysis, or from sequencing cDNAs cloned from various tissues. Oligonucleotides corresponding to every ORF were arrayed onto a high-density slide or chip, and these microarrays were then hybridized with target cDNA (or cRNA) generated from RNA isolated from a cell or tissue type (11,12). These studies have given deeper insights into changes in gene expression as a function of disease and/or developmental stage.
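A comparative analysis of transcripts between two conditions is commonly summarized as a log2 ratio of signal intensities per gene. The following is a minimal sketch under stated assumptions: the intensities are already background-corrected and normalized, the gene names and values are invented for illustration, and the `pseudocount` and `threshold` parameters are hypothetical choices rather than values taken from the studies cited above.

```python
import math


def log2_ratios(cond_a: dict[str, float], cond_b: dict[str, float],
                pseudocount: float = 1.0) -> dict[str, float]:
    """Log2 ratio (condition B over condition A) for genes measured in both.

    A pseudocount is added to both intensities so that a zero signal
    does not cause a division by zero or an infinite ratio.
    """
    shared = cond_a.keys() & cond_b.keys()
    return {
        gene: math.log2((cond_b[gene] + pseudocount) / (cond_a[gene] + pseudocount))
        for gene in shared
    }


def changed_genes(ratios: dict[str, float], threshold: float = 1.0) -> list[str]:
    """Genes whose expression changed at least 2**threshold-fold either way."""
    return sorted(gene for gene, r in ratios.items() if abs(r) >= threshold)


if __name__ == "__main__":
    # Invented intensities for three hypothetical genes.
    healthy = {"geneA": 99.0, "geneB": 49.0, "geneC": 199.0}
    disease = {"geneA": 399.0, "geneB": 49.0, "geneC": 49.0}
    ratios = log2_ratios(healthy, disease)
    print(ratios)
    print(changed_genes(ratios))
```

A fold-change cutoff alone ignores measurement noise; in practice it would be combined with replicate arrays and a statistical test before a gene is called differentially expressed.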
MicroRNomics
It appears that approximately one-third of the transcripts in an animal cell may be controlled by microRNAs (miRNAs). The profiling of miRNAs (13) has helped explain the significance of animal cell transcriptomes. The new paradigm of RNA interference (RNAi) proposed in 1998 added a further layer to the regulation of gene expression in an animal cell (14). The RNAi phenomenon clarified the regulatory role of small non-coding RNAs such as miRNAs, and these small RNAs have come to occupy an important place in the hierarchy of gene regulation (15,16). Although array-based platforms exist for global profiling of miRNAs (microRNomics) in a cell or tissue, the significance of a specific profile is difficult to explain, because there are hundreds of target sequences in the human genome that may potentially be regulated by a specific miRNA. Newer experimental or bioinformatic methodologies will evolve in the near future to allow high-throughput identification of the most likely target of a specific miRNA in a particular cell or tissue at a given developmental stage or under a given nutritional, environmental, or disease status.
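One reason a single miRNA can have so many candidate targets is that many prediction approaches begin from a short match: a site in the transcript complementary to the miRNA "seed" (nucleotides 2 to 7 from the 5' end). The sketch below is an illustration of that heuristic only, not a method described in the works cited here. It assumes RNA sequences written 5' to 3', perfect Watson-Crick pairing, and an invented 3' UTR; the names `seed_site` and `find_seed_matches` are hypothetical.

```python
# Watson-Crick complement table for RNA bases.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna: str, start: int = 2, end: int = 7) -> str:
    """Return the target-site sequence (5'->3') that pairs with the seed.

    The seed is taken as 1-based positions `start`..`end` of the miRNA;
    the pairing site on the transcript is its reverse complement.
    """
    seed = mirna[start - 1:end]
    return seed.translate(COMPLEMENT)[::-1]


def find_seed_matches(mirna: str, utr: str) -> list[int]:
    """0-based start positions of every seed-complementary site in `utr`."""
    site = seed_site(mirna)
    positions = []
    i = utr.find(site)
    while i != -1:
        positions.append(i)
        i = utr.find(site, i + 1)  # step by one so overlapping sites are found
    return positions


if __name__ == "__main__":
    mirna = "UGAGGUAGUAGGUUGUAUAGUU"  # illustrative 22-nt sequence
    utr = "AAUACCUCGGGUACCUCAA"       # invented 3' UTR fragment
    print(seed_site(mirna))
    print(find_seed_matches(mirna, utr))
```

Because a six-nucleotide site is expected by chance roughly once every few thousand bases, seed matching alone over-predicts targets; narrowing the list to the most likely target in a given cell or tissue requires additional evidence such as site conservation or expression data.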