Search for the Undiscovered Peptide
 
Using de novo sequencing and sequence tag homology search to improve protein characterization
Bin Ma and Iain Rogers
Bioinformatics Solutions Inc., Waterloo, ON, Canada
BioTechniques, Vol. 42, No. 5, May 2007, p. 645

Because protein sequence databases will never be complete, cannot account for mutations between individuals, and contain gene prediction errors, a large number of peptides cannot be identified using conventional MS/MS data search techniques. This is a hidden problem in protein identification and characterization research.

A new tool, SPIDER (1),* is used to discover hidden peptides. Using a de novo sequence and a homologous sequence from the database, SPIDER reconstructs the real peptide, highlighting mutations and allowing for de novo sequencing errors.
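
The text above does not describe SPIDER's algorithm, so the following is only a minimal sketch of the underlying idea, not the published method. It assumes the de novo sequence and the database peptide have the same length and compares them position by position: residues whose monoisotopic masses agree within a tolerance (for example I and L) are flagged as likely de novo ambiguities, and all other differences as candidate mutations. The function name, the 0.05 Da tolerance, and the example strings are hypothetical and are not data from this study.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def compare_peptides(de_novo, database, tol=0.05):
    """Return (position, database_residue, de_novo_residue, kind) per mismatch.

    Simplifying assumption (not from the source): both sequences have the
    same length, so no gaps or segment-level errors are modelled.
    """
    if len(de_novo) != len(database):
        raise ValueError("this sketch only handles equal-length sequences")
    report = []
    for pos, (observed, expected) in enumerate(zip(de_novo, database), start=1):
        if observed == expected:
            continue
        # Residues of (nearly) equal mass cannot be told apart from the
        # spectrum alone, so treat them as sequencing ambiguity.
        if abs(MONO_MASS[observed] - MONO_MASS[expected]) <= tol:
            kind = "mass-ambiguous"
        else:
            kind = "candidate mutation"
        report.append((pos, expected, observed, kind))
    return report


# Illustrative input strings only.
for row in compare_peptides("IVNELTEFAK", "LVNEVTEFAK"):
    print(row)
```

A real tool would also need to handle de novo errors that span several residues and sequences of unequal length, which this position-wise comparison deliberately leaves out.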

Proof of Concept

Bovine serum albumin was analyzed on a Waters Q-TOF mass spectrometer. PEAKS* software was used to generate de novo sequences for 28 spectra (2). These sequences were the input for a SPIDER search against a human database. SPIDER returned reasonable homologous peptides from human albumin for all spectra. The algorithm's proposed ‘reconstructed’ sequence was correct (i.e., matched bovine albumin exactly) in 24 cases.



How Much Are We Missing?

A tryptic digest of six known proteins was analyzed on an LTQ Orbitrap mass spectrometer. Database searching with PEAKS Protein ID* and Mascot together identified the proteins and explained (i.e., matched to the known proteins) 220 of the 692 spectra. Searching the identified proteins again, with a few variable modifications (acetylation, deamidation, oxidation) turned on, explained a further 53 spectra.
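
To put these counts in proportion, the short calculation below tallies the cumulative share of spectra explained after each of the two database search steps. The counts (692, 220, 53) come from the text; the function and variable names are hypothetical.

```python
TOTAL_SPECTRA = 692

# (search step, newly explained spectra), as reported in the text.
STAGES = [
    ("database search (PEAKS Protein ID + Mascot)", 220),
    ("re-search with variable modifications", 53),
]


def cumulative_coverage(total, stages):
    """Return (step, gained, running_total, percent_of_total) for each step."""
    rows = []
    running = 0
    for name, gained in stages:
        running += gained
        rows.append((name, gained, running, 100.0 * running / total))
    return rows


rows = cumulative_coverage(TOTAL_SPECTRA, STAGES)
for name, gained, running, percent in rows:
    print(f"{name}: +{gained} -> {running}/{TOTAL_SPECTRA} ({percent:.1f}%)")

# Spectra still unexplained after both database search steps.
print("unexplained:", TOTAL_SPECTRA - rows[-1][2])
```

After both steps, 273 of the 692 spectra (about 39.5%) are explained and 419 remain unexplained.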

De novo sequences, derived by PEAKS auto de novo* from the spectra that could not be explained otherwise, were submitted to SPIDER for sequence tag homology searching. To avoid false positives, only hits scoring higher than the best random hit (i.e., the best hit to the reversed sequences) were kept. With this filter, SPIDER explained another 120 spectra (Figure 2).
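
The filtering rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the software's implementation: it assumes each hit carries a numeric score and a flag saying whether it matched a reversed (decoy) sequence. The `Hit` type, the function name, and all scores are hypothetical.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    spectrum_id: str
    score: float
    is_decoy: bool  # True if the hit matched a reversed sequence


def keep_above_best_decoy(hits: List[Hit]) -> List[Hit]:
    """Keep target hits that score strictly higher than the best decoy hit."""
    decoy_scores = [h.score for h in hits if h.is_decoy]
    # With no decoy hits at all, every target hit passes.
    threshold = max(decoy_scores, default=float("-inf"))
    return [h for h in hits if not h.is_decoy and h.score > threshold]


# Hypothetical scores, for illustration only.
example_hits = [
    Hit("s1", 42.0, False),
    Hit("s2", 18.5, False),
    Hit("s3", 25.0, True),   # best random (decoy) hit sets the threshold
    Hit("s4", 25.0, False),  # ties with the threshold, so it is dropped
    Hit("s5", 30.1, False),
]

for hit in keep_above_best_decoy(example_hits):
    print(hit.spectrum_id, hit.score)
```

Using the single best decoy score as the cutoff is deliberately conservative: a target hit is kept only if no random match in the search scored as well.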



Conclusion

SPIDER finds homologous peptides in the database and reconstructs the real peptide sequence from them. Because peptide sequence variations are common, this technique provides a significantly better understanding of our samples.

*SPIDER, PEAKS auto de novo, and PEAKS Protein ID are available as part of the PEAKS Studio software from www.bioinfor.com.

Mascot™ is available from Matrix Science Ltd., UK.

References
1. Han, Y., B. Ma, and K. Zhang. 2005. Journal of Bioinformatics and Computational Biology 3:697-716.

2. Ma, B., K. Zhang, C. Hendrie, C. Liang, M. Li, A. Doherty-Kirby, and G. Lajoie. 2003. Rapid Communications in Mass Spectrometry 17:2337-2342.