Unraveling cancer through network models
Sarah Webb, Ph.D.

When Raphael and his colleagues used HotNet for their TCGA analysis of the ovarian cancer genome (1), they observed well-known signaling pathways such as p53 and Ras. But they also pulled out the Notch signaling pathway based on a combination of individual, infrequently mutated genes. Although other experimental evidence had already suggested that Notch might be involved in various cancers, Raphael emphasizes that the finding is a nice example of how computational tools can help point researchers toward a biologically relevant hypothesis.

Other findings have implicated genes that were not so well known to researchers. TCGA analysis of ovarian and kidney cancer samples (2) identified hotspot genes that don't line up with any current experimental hypotheses for cancer. “Are they real? Are they not? It requires some additional experimental work,” Raphael says. Even though the algorithm generates results that are consistent with experimental data, which lends credibility, “ultimately what we're doing is generating hypotheses.”

These algorithms based on molecular networks could allow researchers to classify tumors into subtypes based on overlapping hot zones. According to Ideker, patients might not have mutations in the same genes, but if their mutations fall in genes that are closely connected within a network, that information might help researchers understand subtypes of the disease even when patients lack similar mutation profiles.
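The idea that two patients can look alike at the network level without sharing a single mutated gene can be illustrated with a toy calculation. The sketch below is not HotNet or any published method; the six-gene network, the gene names A through F, and the helper functions are all hypothetical. It simply expands each patient's mutated genes to include their direct neighbors and then compares the expanded sets.

```python
# Toy interaction network (hypothetical genes A-F), stored as an adjacency map.
# Two separate chains: A - B - C and D - E - F.
NETWORK = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B"},
    "D": {"E"},
    "E": {"D", "F"},
    "F": {"E"},
}


def neighborhood(genes, network):
    """Return the given genes plus their direct network neighbors."""
    expanded = set(genes)
    for gene in genes:
        expanded |= network.get(gene, set())
    return expanded


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def network_similarity(genes_a, genes_b, network=NETWORK):
    """Compare two mutation profiles after expanding each to its neighborhood."""
    return jaccard(neighborhood(genes_a, network), neighborhood(genes_b, network))


# Hypothetical patients, each with a single mutated gene.
patient_1 = {"A"}
patient_2 = {"C"}  # different gene, same chain as patient_1
patient_3 = {"F"}  # different gene, different chain

print(jaccard(patient_1, patient_2))             # raw gene overlap
print(network_similarity(patient_1, patient_2))  # overlap after expansion
print(network_similarity(patient_1, patient_3))
```

In this toy setup, patients 1 and 2 share no mutated gene, yet their expanded profiles overlap through the shared neighbor B, while patient 3 remains unrelated to both. Real methods use far larger networks and more careful statistics.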

A core issue for this type of analysis is how best to simplify available biochemical data into a form that takes into account biological function, allowing researchers to model the effects. The sheer amount of information researchers have collected in curated networks (e.g., REACTOME, BioCarta, WikiPathways, KEGG, and NCI-PID) can be overwhelming, with data on gene expression, copy number, epigenetic state, neighbors in a pathway, transcription factors, and more, notes Josh Stuart of the University of California, Santa Cruz. Collaborating with David Haussler, also at UC Santa Cruz, Stuart has developed an algorithm called PARADIGM, which takes all the available data on a gene and transforms it into a single number that indicates whether the gene is active in the cell (3).

Computers can then use those single values in place of the original data to come up with predictions of how genes work within a cell. For a cancer data set, this means researchers can predict whether tumors with a particular genetic profile are likely to have better or worse outcomes, or predict drug targets based on data from cell lines.
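To make the notion of collapsing several measurements into one value concrete, here is a deliberately simplified sketch. It is not PARADIGM and does not model pathway structure at all. The function name, the equal weighting of the two inputs, and the threshold of 1.0 are all assumptions chosen for illustration. It averages an expression z-score and a copy-number log ratio for a gene and reports a single call: +1 (active), -1 (inactive), or 0 (no clear change).

```python
def activity_call(expression_z, copy_number_log2, threshold=1.0):
    """Collapse two measurements for one gene into a single call.

    Hypothetical toy rule: average the two inputs with equal weight,
    then compare the average against a symmetric threshold.
    Returns +1 (active), -1 (inactive), or 0 (no clear change).
    """
    score = (expression_z + copy_number_log2) / 2.0
    if score >= threshold:
        return 1
    if score <= -threshold:
        return -1
    return 0


# Hypothetical measurements for three genes in one tumor sample:
# (expression z-score, copy-number log2 ratio).
sample = {
    "GENE_X": (2.5, 1.0),    # overexpressed and amplified
    "GENE_Y": (-2.0, -1.5),  # underexpressed and deleted
    "GENE_Z": (0.5, -0.2),   # close to normal
}

calls = {gene: activity_call(z, cn) for gene, (z, cn) in sample.items()}
print(calls)
```

Downstream analyses could then work with the per-gene calls instead of the raw measurements, which is the general simplification the text describes.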

Stuart's algorithm is now part of the automated analysis of TCGA data funneled through Firehose, the computational pipeline used at the Broad Institute.

Biochemical identification of protein-protein interactions

Researchers use a variety of assays to establish whether two proteins interact. Two high-throughput approaches to identifying interactions are yeast two-hybrid screens and affinity purification-mass spectrometry. Researchers can also predict an interaction between two proteins if both contain domains known to interact with one another. They might also look at whether genes are co-expressed in a cell and whether their products are localized to the same cellular compartment. While those last two aren't definitive evidence for an interaction, Stein says, they can provide support for an interaction predicted using another method.

Each method comes with advantages and limitations. For example, though widely used, yeast two-hybrid screens have several downsides. These screens probe interactions in an in vitro milieu of reagents and antibodies and may miss transient interactions or ones that involve membrane proteins. And because this is an in vitro assay, the screen might pick up an interaction that wouldn't normally occur in the cell. Researchers are therefore most confident in defining a protein-protein interaction when multiple lines of evidence from different experimental approaches support it. -SW

Expanding networks

Although network analysis is improving, it is still hampered by the many protein-protein interactions within the cell that remain unmapped (see “Biochemical identification of protein-protein interactions”). Here, Califano says, the major challenge for the field is getting detailed reference maps for cells, particularly those of different lineages where different regulatory processes occur.

For the PARADIGM algorithm, Stuart and his colleagues are slowly adding new interactions to the networks they use. So far, a quarter of human proteins have been noted to regulate another gene or gene product in the curated networks. Taking advantage of available high-throughput data, Stuart estimates that approximately 50% of proteins are included in the PARADIGM analysis networks.
