Profiling Human Cytomegalovirus

Ashley Yeager

Can a reading frame code for more than one protein? Are short proteins more common than previously thought? The results from a new viral genomics study suggest so.

Similar to the way detectives profile a criminal, scientists have sketched the complete set of proteins coded by the human cytomegalovirus (HCMV) genome, finding templates for hundreds of previously unidentified proteins.

Using ribosome profiling and mass spectrometry, scientists profiled HCMV’s proteome when it infects a human fibroblast cell like the one pictured here. Credit: Glyn Nelson

"A starting point for understanding and studying any virus is to identify the full set of viral gene products," said study author Noam Stern-Ginossar, a postdoctoral researcher at the University of California, San Francisco. "Our studies establish a paradigm for mapping and unbiasedly deciphering complex genomes."

That paradigm is the technique that she and an international team of scientists used to experimentally decode the proteome of HCMV. In the experiment, the researchers infected human foreskin fibroblast cells with the virus, then mapped the positions of ribosomes (the cellular structures where proteins are synthesized) on RNA fragments recovered from the cells. The scientists also performed mass spectrometry to confirm some of the newly discovered proteins. The method, described in the November 23 issue of Science (1), could allow scientists to profile other complex viruses and show exactly how each one hijacks its host cells.

"The novelty of our approach is that it is experimentally based and does not rely on any assumptions or predictions," said Stern-Ginossar.

Although scientists decoded the 240,000 base-pair genome of HCMV more than 20 years ago, information about the virus’s protein-coding potential came mainly from sequence-based informatics and computer modeling. While HCMV infects most humans and is usually harmless, it can cause disease in newborns and in adults with weakened immune systems.

To understand how the virus hijacks healthy cells, Stern-Ginossar and colleagues used deep sequencing of ribosome-protected mRNA fragments. The process identifies the precise locations of the ribosomes on each mRNA, and the team used it to determine, experimentally and systematically for the first time, all protein-coding regions of the HCMV genome.

The technique revealed templates, or open reading frames, for hundreds of previously unidentified proteins. The scientists were surprised to find that the open reading frames could encode more than one protein and that they generated very short proteins, a few containing fewer than 100 amino acids. The results suggest that short proteins may be more common than previously thought. These details all factor into HCMV’s infection profile and give scientists clues to its behavior in the body.
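To make the idea of one reading frame yielding more than one protein concrete, here is a minimal Python sketch. It is not the study's method: the researchers located coding regions experimentally from ribosome positions, whereas this toy simply scans a made-up DNA string for start and stop codons. The sequence, the `find_orfs` helper, and the `min_codons` parameter are all hypothetical and exist only for illustration.

```python
# Toy illustration only: the sequence and helper are hypothetical,
# and sequence scanning is NOT how the study identified coding regions.
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=2):
    """Return (start, end, frame) for each ATG...stop stretch on the forward strand.

    Every ATG is treated as a possible start, so an ATG inside a longer
    open reading frame yields a second, shorter one ending at the same stop.
    """
    seq = seq.upper()
    orfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon from the start until the first in-frame stop.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                n_codons = (pos - start) // 3  # codons before the stop
                if n_codons >= min_codons:
                    orfs.append((start, pos + 3, start % 3))
                break
    return orfs

toy = "CCATGGCTATGAAATGCTGACC"
for start, end, frame in find_orfs(toy):
    print(frame, toy[start:end])
```

In this toy sequence, the scan reports two stretches in the same frame that share a stop codon: a longer one and a shorter one beginning at an internal ATG. That is one simple way a single reading frame could, in principle, give rise to two proteins of different lengths, one of them very short.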

The biggest challenge was to "establish the accuracy and robustness of the new approach," said Stern-Ginossar. The team demonstrated that the coding regions identified had characteristic features for protein production. "More importantly, we used high-resolution mass spectrometric measurements on the virally infected cells to independently confirm the accumulation of a significant fraction of the novel proteins we have identified," she said.

With the combined information, the group is beginning to understand how HCMV infects and manipulates its host cells. The team may also use the data to help develop an effective immune response against HCMV, and may apply the approach to profile other viruses.


1. Stern-Ginossar, N. et al. 2012. "Decoding human cytomegalovirus." Science 338: 1088-1093.

Keywords:  genomics