Spectral networks: a new approach to de novo discovery of protein sequences and posttranslational modifications
 
Nuno Bandeira
University of California, San Diego, CA, USA
BioTechniques, Vol. 42, No. 6, June 2007, pp. 687–695

Introduction

Tandem mass spectrometry (MS/MS) is now the technology of choice for the identification of proteins and posttranslational modifications (1). Fast-paced technological developments have delivered high-throughput analysis of thousands of proteins in a couple of hours at unprecedented levels of mass resolution and accuracy (2). However, the major computational approaches to the automated identification of the millions of tandem mass spectra generated daily still interpret every tandem mass spectrum in isolation, like the original techniques for de novo sequencing introduced by Klaus Biemann's group in the 1960s (3) and database searching, first proposed in the early 1990s (4,5). In database searching, each tandem mass spectrum is compared against a given database of known peptides, and significant matches are selected for protein identification. Elaborate scoring functions have been derived to provide statistical significance to observed identifications and help make this the approach of choice for the analysis of model organisms. However, database search is only applicable when the proteins' sequences are obtained in advance through other experimental procedures, such as DNA sequencing or Edman degradation. Conversely, de novo sequencing becomes the mass spectrometric approach of choice for studies of unknown proteins. Nevertheless, fully automated de novo analysis has remained an elusive goal because of limited sequencing accuracy: the best algorithms for individual ion trap tandem mass spectra still predict one incorrect amino acid out of every four predictions (6). We propose to approach the MS/MS identification problem from a different perspective: first combining uninterpreted tandem mass spectra from overlapping peptides, and only then determining consensus identifications (of sequences and modifications) for sets of aligned tandem mass spectra. Using this approach, it is possible to improve de novo sequencing accuracy to the level of one mistake out of every ten predicted amino acids and further discover many known and some putative novel modifications (7).

Experimental Setup

Most experimental protocols use enzymatic digestion to generate smaller peptides, which are then analyzed by mass spectrometry to identify proteins in the sample. Trypsin digestion is often employed because its strong cleavage specificity tends to be reproducible and facilitates the analysis of complex samples by generating only a few different peptides per protein. Alternatively, less specific enzymes or combinations of enzymes may be used to generate extensive protein coverage (8,9). As illustrated in Figure 1, these procedures tend to generate many overlapping peptides covering the same protein regions. While the specificity of trypsin digestion leads to many spectra covering the same protein regions, nonspecific digestion tends to generate spectra covering large portions of the protein sequences.
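To make the notion of overlapping peptides concrete, the following is a minimal sketch of an in silico tryptic digest. It assumes the commonly used rule that trypsin cleaves on the C-terminal side of lysine (K) or arginine (R) unless the next residue is proline (P); the function name `tryptic_digest`, the `missed_cleavages` parameter, and the example sequence are hypothetical and are not taken from the original article.

```python
def tryptic_digest(protein, missed_cleavages=0):
    """Return peptides from a simplified tryptic digest.

    Assumed rule (not stated in the article): cleave after K or R,
    except when the following residue is P.
    """
    # Collect cleavage positions, including both protein termini.
    sites = [0]
    for i, aa in enumerate(protein[:-1]):
        if aa in "KR" and protein[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(protein))

    # Join up to `missed_cleavages` adjacent fragments to model
    # incomplete digestion, which yields overlapping peptides.
    peptides = []
    for i in range(len(sites) - 1):
        last = min(i + 2 + missed_cleavages, len(sites))
        for j in range(i + 1, last):
            peptides.append(protein[sites[i]:sites[j]])
    return peptides


# Hypothetical example sequence; "RP" is not cleaved under the assumed rule.
print(tryptic_digest("AAKGGRPCCKDD"))
# ['AAK', 'GGRPCCK', 'DD']
print(tryptic_digest("AAKGGRPCCKDD", missed_cleavages=1))
# ['AAK', 'AAKGGRPCCK', 'GGRPCCK', 'GGRPCCKDD', 'DD']
```

With one missed cleavage allowed, peptides such as AAKGGRPCCK and GGRPCCKDD share the region GGRPCCK, which is the kind of overlap the approach described here relies on.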



After enzymatic digestion, the sample consists of a collection of peptides, usually containing sizeable amounts of most peptides. This material is then processed through a series of steps such that, in principle, each cycle of MS/MS focuses exclusively on multiple instances of the same peptide. The cycle is then repeated thousands of times, using a variety of procedures to maximize the number of spectra from different peptides (1). After isolating many copies of a particular peptide, a tandem mass spectrum is obtained by inducing breaks at the amide bonds, thus generating peptide fragments whose masses and relative abundances are then measured by a mass analyzer (10). Most often, the resulting peptide fragments correspond to b ions (prefix) or y ions (suffix), although other types of ions may also be generated (see Figure 1C for an illustration). Since most amino acids have measurably different masses, the ion masses observed in a tandem mass spectrum typically correlate well with the theoretical masses calculated from the peptide sequence. In addition, MS/MS can be used to identify posttranslational modifications by detecting the characteristic changes in residue mass due to the addition or loss of particular compounds (11). In particular, for a modification of mass m, all b ions and y ions containing the modified residue will have their mass offset by the same mass m.
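The relationship between a peptide sequence, its b and y ions, and a modification of mass m can be illustrated with a short calculation. The sketch below assumes singly charged fragments and standard monoisotopic residue masses (rounded to five decimals); the function name `fragment_ions`, its parameters, and the example peptide are hypothetical choices for illustration, not part of the original article.

```python
# Standard monoisotopic residue masses in Da (rounded).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of the charge-carrying proton
WATER = 18.010565   # H2O retained by C-terminal (y) fragments


def fragment_ions(peptide, mod_pos=None, mod_mass=0.0):
    """Return (b, y) lists of singly charged fragment m/z values.

    b[i-1] is the prefix ion with i residues; y[i-1] is the suffix ion
    with i residues. `mod_pos` is a 0-based residue index (assumption).
    """
    masses = [MONO[aa] for aa in peptide]
    if mod_pos is not None:
        masses[mod_pos] += mod_mass  # shift only the modified residue

    b, running = [], 0.0
    for m in masses[:-1]:            # prefixes of length 1..n-1
        running += m
        b.append(running + PROTON)

    y, running = [], 0.0
    for m in reversed(masses[1:]):   # suffixes of length 1..n-1
        running += m
        y.append(running + WATER + PROTON)
    return b, y


# Example: phosphorylation (+79.96633 Da) on the threonine of PEPTIDE.
plain_b, plain_y = fragment_ions("PEPTIDE")
mod_b, mod_y = fragment_ions("PEPTIDE", mod_pos=3, mod_mass=79.96633)
for i, (p, q) in enumerate(zip(plain_b, mod_b), start=1):
    print(f"b{i}: shift = {q - p:.5f}")
```

In this example only the fragments that contain the modified threonine (b4 to b6 and y4 to y6) are shifted by m, while the shorter prefixes and suffixes are unchanged. This pattern is what allows the modified residue to be localized from the spectrum.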
