Large-scale identification of proteins and genes

The ultimate goal of proteomic analysis is to explore the cell's response to environmental changes by identifying all the proteins expressed in a cell type under given physiological conditions. Recent progress in mass spectrometry makes such large-scale analyses possible. When coupled with nano LC (liquid chromatography), QTOF MS/MS can generate very large volumes of spectral data (up to 1500 peptides per day). Consequently, the interpretation of the spectra, i.e. the determination of the amino acid sequence of each peptide, can no longer be carried out manually.

In this context, LCP/CEA and Helix are developing efficient algorithms: 1) for generating Peptide Sequence Tags (PSTs) through partial interpretation of the MS/MS spectra; 2) for scanning protein databanks in search of sequences that match these PSTs, or for mapping the PSTs onto a complete translated genome. The second algorithm thus answers the initial problem of protein identification and also provides an additional method of genome annotation.
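
To make the second step more concrete, the following is a minimal Python sketch of how a PST could be matched against protein sequences. It is not the algorithm used by the project, whose details are not given here. It assumes that a PST is represented as a triple (N-terminal flanking mass, short sequence tag, C-terminal flanking mass), that flanking masses are plain sums of monoisotopic residue masses (terminal groups and modifications are ignored), and that I and L are treated as indistinguishable because they have the same mass. The names match_pst, scan_databank and extend, as well as the 0.5 Da tolerance, are hypothetical choices made for this illustration.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def extend(protein, start, step, target, tol):
    """Walk from index `start` in direction `step` (+1 or -1), summing
    residue masses. Return the number of residues consumed once the sum
    equals `target` within `tol`, or None if no such extension exists."""
    total, count, pos = 0.0, 0, start
    while True:
        if abs(total - target) <= tol:
            return count
        if total > target + tol or not (0 <= pos < len(protein)):
            return None
        mass = RESIDUE_MASS.get(protein[pos])
        if mass is None:  # non-standard residue symbol: give up
            return None
        total += mass
        count += 1
        pos += step


def match_pst(protein, n_mass, tag, c_mass, tol=0.5):
    """Return every peptide of `protein` compatible with the PST
    (n_mass, tag, c_mass). Hypothetical helper, for illustration only."""
    # I and L are isobaric, so compare on a normalised copy.
    norm = protein.replace("I", "L")
    norm_tag = tag.replace("I", "L")
    hits = []
    i = norm.find(norm_tag)
    while i != -1:
        left = extend(norm, i - 1, -1, n_mass, tol)
        right = extend(norm, i + len(tag), +1, c_mass, tol)
        if left is not None and right is not None:
            hits.append(protein[i - left: i + len(tag) + right])
        i = norm.find(norm_tag, i + 1)
    return hits


def scan_databank(proteins, n_mass, tag, c_mass, tol=0.5):
    """Scan a dict {protein_id: sequence} and keep proteins with a hit."""
    results = {}
    for pid, seq in proteins.items():
        hits = match_pst(seq, n_mass, tag, c_mass, tol)
        if hits:
            results[pid] = hits
    return results


if __name__ == "__main__":
    databank = {"P1": "MKTAYIAKQRQISFVK", "P2": "MSSGYLAWWK"}
    # PST for a peptide starting with T+A, then the tag YLA, then K.
    print(scan_databank(databank, 172.08479, "YLA", 128.09496))
```

Under the same assumptions, the scan could in principle be run on the six-frame translation of a genome instead of a protein databank, which corresponds to the mapping use case mentioned above. Checking the flanking masses in addition to the tag is what keeps a short tag selective: in the toy databank, both sequences contain the tag, but only one has flanking residues whose masses agree with the PST.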

These algorithms have been implemented as two software modules, Taggor and PepMap, which are combined in the PepLine software pipeline.

