When Kelleher talks about proteomics data, he distinguishes between protein identification and characterization. “If you completely characterize the protein, that means you know all its splice variants…as well as post-translational modifications, as well as any polymorphisms or mutations that occur in the protein.”
To obtain that kind of information, top-down proteomicists need two pieces of data: the mass of the intact protein and the masses of its fragmentation products, from which its sequence can be read. The latter come from tandem mass spectrometry (MS/MS), in which the intact protein is fragmented into smaller pieces.
At present, there exists an alphabet soup of fragmentation strategies for mass spectrometers, including collision-induced dissociation (CID) and higher-energy C-trap dissociation (HCD). One popular strategy for top-down work is electron transfer dissociation (ETD), which is basically a chemical reaction inside the fragmentation chamber that causes peptide backbones to break while leaving post-translational modifications intact.
But ETD, says Fenselau, is very hit-or-miss, working “really well for some proteins but not well at all for others.” Instead, Fenselau says she is “very keen to get [her] hands on” a new approach, from Jennifer Brodbelt, a chemist at the University of Texas, Austin.
Brodbelt uses laser power to shatter proteins. In earlier work, Brodbelt introduced an infrared laser into an ion trap mass spectrometer, an approach called “infrared multiphoton dissociation” (3). Now she is using an ultraviolet laser and an Orbitrap mass spectrometer to do the same thing with greater efficiency and at higher resolution.
High-energy UV photons, Brodbelt explains, can fragment peptides in ways that other technologies cannot, perhaps shedding light on tightly folded regions that are refractory to these other approaches. “Pieces of the protein that might have not cleaved well, now might cleave better using this photon-based method,” she says.
The Killer App
Fenselau uses top-down proteomics to study the histones carried by extracellular vesicles called exosomes. She also uses it to identify and speciate unsequenced bacteria. To do that, her team applies a combination of MALDI-TOF and liquid-phase Orbitrap mass spectra to first identify, and then sequence, protein biomarkers in crude bacterial lysates.
“We put the whole bacterial lysate into this column, and as the proteins elute we analyze them, weigh them, record the masses of their fragment ions, and then use the bioinformatics programs that are becoming available to identify the bacteria,” she says.
Kelleher calls the approach a “killer app” for the top-down workflow. “It's an awesome use of the scanning power top-down provides across the whole protein.” But here's the catch: ProSight PC, the bioinformatics software used by Fenselau and her colleague Nathan Edwards, a bioinformaticist at Georgetown University Medical Center, relies on matching detected fragment ion masses against a virtual database of known proteins, splice variants, and modifications.
“If your software identifies proteins from a database, then you'll miss the correct identification if the right answer isn't in the database,” says Edwards. That's especially true with unsequenced organisms or novel post-translational modifications.
To circumvent that problem, Edwards and Fenselau configured their searches to match well-conserved proteins, such as ribosomal proteins, based not on the mass of the intact molecule itself but on the masses of its b- and y-ion fragments.
“There are cases where the ribosomal proteins in a related organism are different in only one or two residues, which means that many of its b- and y-ion fragments have the same mass as the b- and y-ion fragments from the true protein.”
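The reasoning in that quote can be illustrated with a small calculation. The following is a minimal sketch, not the ProSight PC algorithm or the team's actual search configuration: it computes singly charged b- and y-ion masses for two short sequences that differ in a single residue and counts how many fragments still agree. The two sequences, the function names, and the 0.01 Da tolerance are hypothetical choices made for illustration; the residue masses are standard monoisotopic values.

```python
# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # mass of H2O
PROTON = 1.007276   # mass of the charge-carrying proton


def intact_mass(seq):
    """Neutral monoisotopic mass of the unmodified, intact sequence."""
    return sum(RESIDUE_MASS[r] for r in seq) + WATER


def b_ions(seq):
    """Singly charged b-ions: N-terminal prefixes of length 1..n-1."""
    masses, running = [], 0.0
    for residue in seq[:-1]:
        running += RESIDUE_MASS[residue]
        masses.append(running + PROTON)
    return masses


def y_ions(seq):
    """Singly charged y-ions: C-terminal suffixes of length 1..n-1."""
    masses, running = [], 0.0
    for residue in reversed(seq[1:]):
        running += RESIDUE_MASS[residue]
        masses.append(running + WATER + PROTON)
    return masses


def shared_fragments(seq_a, seq_b, tol=0.01):
    """Count b/y ions of equal index whose masses agree within tol (Da)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch only compares equal-length sequences")
    pairs = list(zip(b_ions(seq_a), b_ions(seq_b)))
    pairs += list(zip(y_ions(seq_a), y_ions(seq_b)))
    shared = sum(1 for a, b in pairs if abs(a - b) <= tol)
    return shared, len(pairs)


# Hypothetical sequences: the "true" protein and a database relative
# that differs by one substitution (A -> T at the fifth residue).
true_seq = "MKVRASVKKL"
related_seq = "MKVRTSVKKL"

shared, total = shared_fragments(true_seq, related_seq)
delta = intact_mass(related_seq) - intact_mass(true_seq)
print(f"intact mass difference: {delta:.4f} Da")
print(f"fragments with matching mass: {shared} of {total}")
```

In this toy case the intact masses differ by about 30 Da, so a search keyed on intact mass would miss the relative, yet half of the fragment ions (9 of 18) still match: every b-ion that ends before the substitution and every y-ion that starts after it.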
From that, the team was able to establish sufficient sequence data to compare their protein identifications against known organisms—information that they could use to place the organism in a phylogenetic tree, without first sequencing its genome at the DNA level (4).
But Pavel Pevzner, professor of computer science and director of the NIH Technology Center for Computational Mass Spectrometry at the University of California, San Diego, thinks ProSight PC (which was developed in Kelleher's lab and commercialized by Thermo Fisher Scientific) has a significant flaw.
For Pevzner, ProSight's “Achilles heel” is the use of virtual databases, an approach no standard genomic tool uses, as it is essentially impossible to scale (to wit: Paša-Tolić has calculated there could be 40 trillion theoretical variants of histone H3.1 alone, far too many to populate a virtual database). “As a computer scientist, I cannot agree with the algorithmic design of this tool,” says Pevzner.
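The scaling problem is combinatorial, and a back-of-the-envelope sketch shows why. The numbers below are hypothetical and are not the inputs behind the histone H3.1 figure: the sketch simply assumes that each modifiable site can independently take one of several states, so the number of distinct variants is the product of the per-site state counts.

```python
import math


def count_variants(states_per_site):
    """Number of distinct variants if every site varies independently.

    states_per_site: one integer per modifiable site, counting the
    unmodified state as one of the options.
    """
    return math.prod(states_per_site)


# Hypothetical protein: 20 modifiable sites, each with 4 possible states
# (for example unmodified plus three alternative modifications).
sites = [4] * 20
print(f"{count_variants(sites):,} theoretical variants")
```

Each additional site multiplies the total, so even modest assumptions reach the trillions, and a database that enumerates every variant explicitly grows just as fast.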