In many ways, cancer is simply a devastating natural mutagenesis experiment. Alterations to genes and their products, along with further downstream modifications, have dangerous and often deadly consequences. Recent studies have identified a few key cancer drivers, genes such as p53 and Ras that play central roles in the genetic pathways behind the disease. Interestingly, though, these driver mutations do not appear in every cancer; in fact, they represent only a small portion of the information that researchers and clinicians need to understand tumor biology and to diagnose and treat the disease.
The result: cancer researchers are now focusing on the “long tail”, collecting and cataloging rare mutations that occur in 1% or fewer of cancer patients. These rarer mutations may underlie critical functional changes within cells that characterize and define this collection of diseases. But rarity is a double-edged sword for researchers: because each of these mutations appears in so few patients, it is much harder to distinguish them from random mutations that have no effect on the disease.
Building network context
So how do researchers go about locating these important rare variants? The functional consequences of mutations in the genome can often be seen in the molecules, such as proteins, that the mutated genes encode. Over time, bioinformatics researchers have learned how these biomolecules interact with each other in the cell, curating protein and metabolite connections into wiring diagrams in which nodes represent proteins or other molecules and edges indicate interactions between them. According to Trey Ideker, a bioinformatician at the University of California, San Diego, this has led to a landscape of bioinformatics methods for understanding the misfires that cause cancer and control the disease process. Researchers gather systematic information on genetic interactions from genome sequencing and combine it with public data on protein-protein interactions, producing more comprehensive databases featuring millions of associations between molecules. The question then, Ideker says, is how to take this grab bag of interactions and add context, building pathway models that researchers can use to understand and diagnose disease.
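In computational terms, a wiring diagram of this kind is a graph. The sketch below is illustrative only: it assumes a plain Python adjacency map, and the gene names and interaction pairs are hypothetical placeholders rather than entries from any real database. It merges two interaction lists, much as sequencing-derived genetic interactions might be combined with public protein-protein interaction data, and then looks up a molecule's direct neighbors.

```python
from collections import defaultdict


def build_network(interactions):
    """Build an undirected interaction network as an adjacency map.

    Each node is a molecule (e.g. a protein); each edge records that two
    molecules were observed to interact. A pair reported twice, or in
    either order, collapses into a single edge.
    """
    network = defaultdict(set)
    for a, b in interactions:
        if a == b:
            continue  # ignore self-interactions in this simple sketch
        network[a].add(b)
        network[b].add(a)
    return dict(network)


def neighbors(network, node):
    """Return the molecules directly wired to `node` (empty if unknown)."""
    return network.get(node, set())


# Hypothetical interaction pairs from two imaginary sources.
from_sequencing = [("GENE_A", "GENE_B"), ("GENE_B", "GENE_C")]
from_public_ppi = [("GENE_B", "GENE_A"), ("GENE_C", "GENE_D")]

net = build_network(from_sequencing + from_public_ppi)
print(sorted(neighbors(net, "GENE_B")))  # ['GENE_A', 'GENE_C']
```

Because each neighbor list is a Python `set`, an interaction reported by both sources is stored only once. Real integration pipelines would also track the evidence, source, and confidence behind each edge, which this sketch omits.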
Even with millions of associations catalogued, these databases are far from complete. In many ways, our current understanding of interaction networks is like navigating a major city with only a general map, explains Andrea Califano of Columbia University. “It's like having a map of a city with Main St and Broadway and not actually knowing whether the city is New York or Boston.”
Although understanding how proteins function together is critical for basic research, the real benefit could come from the diagnosis and treatment of disease. Here, cancer is the “killer app”—an important problem that motivates the need for networks.
“It's not just a good idea for solving these diseases, but required for solving these diseases,” says Ideker. So, as large data sets emerge from cancer genome sequencing projects such as the NIH/NHGRI's Cancer Genome Atlas (TCGA) project, bioinformatic analysis that places this information in the context of protein networks is essential to help researchers make sense of the deluge of cancer data.
Using networks for analysis
A primary motivation for the TCGA project was to understand the similarities and differences among various types and subtypes of human cancers. Researchers involved in the TCGA are currently looking at hundreds of samples from each of more than 20 tumor types to identify rare mutations involved in cancer.
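A back-of-the-envelope calculation shows why rarity makes the statistics so hard. The sketch below assumes a hypothetical cohort of 300 tumors and a hypothetical background probability of 0.5% that any one tumor carries a random mutation in a given gene; real background rates vary by gene and tumor type and are estimated with far more elaborate models. It computes a simple binomial tail probability: the chance of seeing at least a given number of mutated tumors if mutations hit the gene purely at random.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance of seeing at least k
    mutated tumors out of n if mutations struck this gene only at random."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


N_TUMORS = 300        # hypothetical cohort size ("hundreds of samples")
BACKGROUND_P = 0.005  # hypothetical per-tumor chance of a random hit in one gene

for mutated in (3, 15):  # 1% vs 5% of the cohort
    p_value = binomial_tail(mutated, N_TUMORS, BACKGROUND_P)
    print(f"{mutated} of {N_TUMORS} tumors mutated: P(chance) = {p_value:.3g}")
```

Under these assumptions, a gene mutated in 3 of 300 tumors (1%) would turn up by chance roughly one time in five, so on its own it cannot be told apart from background noise, while 15 mutated tumors would be very hard to explain by chance. Testing thousands of genes at once raises the bar further.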
One bioinformatics approach to understanding the possible effects of specific rare mutations is to create a “heat map”, a representation of mutations in the context of their neighbors in a protein network. HotNet, an algorithm developed by Ben Raphael and his colleagues at Brown University, takes this approach. The idea is straightforward: each mutated gene confers a certain amount of “heat” to a pathway. If no nearby genes are mutated, the signal is limited to that single mutation. However, if four or five genes that are each mutated only occasionally are closely linked in the network, heat propagates among them, creating a “hot zone” that implicates that area of the network. “The idea behind all these approaches is to implicate your neighborhood,” explains Ideker.
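The “implicate your neighborhood” idea can be illustrated with a toy heat-propagation sketch. This is not HotNet's implementation: it swaps in a simple random-walk-with-restart diffusion (a related technique), and the seven-gene network, the gene names, the mutation frequencies used as initial heat, the restart probability of 0.4, and the hot-zone threshold of 0.01 are all hypothetical choices made for illustration.

```python
# Toy heat propagation on an interaction network (illustration only).
# Gene names, edges, heat values, and parameters below are hypothetical.

NETWORK = {
    # Four tightly linked genes (a candidate module).
    "GENE_A": {"GENE_B", "GENE_C", "GENE_D"},
    "GENE_B": {"GENE_A", "GENE_C", "GENE_D"},
    "GENE_C": {"GENE_A", "GENE_B", "GENE_D"},
    "GENE_D": {"GENE_A", "GENE_B", "GENE_C"},
    # One mutated gene whose neighbors are not mutated.
    "GENE_E": {"GENE_F", "GENE_G"},
    "GENE_F": {"GENE_E", "GENE_G"},
    "GENE_G": {"GENE_E", "GENE_F"},
}

# Initial heat = hypothetical fraction of tumors with a mutation in each gene.
MUTATION_HEAT = {"GENE_A": 0.02, "GENE_B": 0.02, "GENE_C": 0.02,
                 "GENE_D": 0.02, "GENE_E": 0.03}


def diffuse_heat(network, heat, restart=0.4, iterations=100):
    """Random walk with restart: each step, a gene keeps `restart` times its
    initial heat and receives (1 - restart) times the heat its neighbors pass
    along, each neighbor splitting its current heat evenly over its edges."""
    genes = sorted(network)
    current = {g: heat.get(g, 0.0) for g in genes}
    for _ in range(iterations):
        incoming = {g: 0.0 for g in genes}
        for g in genes:
            if not network[g]:  # isolated gene: its heat has nowhere to go
                incoming[g] += current[g]
                continue
            share = current[g] / len(network[g])
            for nb in network[g]:
                incoming[nb] += share
        current = {g: restart * heat.get(g, 0.0) + (1 - restart) * incoming[g]
                   for g in genes}
    return current


def hot_subnetworks(network, heat, threshold, min_size=2):
    """Connected groups of at least `min_size` genes whose heat >= threshold."""
    hot = {g for g, h in heat.items() if h >= threshold}
    seen, modules = set(), []
    for start in sorted(hot):
        if start in seen:
            continue
        seen.add(start)
        stack, module = [start], set()
        while stack:  # depth-first search restricted to hot genes
            g = stack.pop()
            module.add(g)
            for nb in network[g]:
                if nb in hot and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        if len(module) >= min_size:  # a lone hot gene does not make a zone
            modules.append(module)
    return modules


final = diffuse_heat(NETWORK, MUTATION_HEAT)
zones = hot_subnetworks(NETWORK, final, threshold=0.01)
print([sorted(z) for z in zones])  # [['GENE_A', 'GENE_B', 'GENE_C', 'GENE_D']]
```

Under these assumptions, GENE_E starts with the most heat but leaks much of it to its unmutated neighbors, ending up as an isolated hot gene, while the four linked, occasionally mutated genes keep their heat and emerge as the only hot zone. The restart probability and threshold are arbitrary here and are not HotNet's actual parameters.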