Background
The Proteomics Standards Initiative (PSI) Extended File Format (PEFF) allows for the specification of known mutations, post-translational modifications (PTMs), and other processing events of a given proteome in a unified, consistent format.
The Trans-Proteomic Pipeline (TPP) is a widely used and well-validated open-source suite of software tools that facilitates and standardizes proteomics analysis. We describe recent enhancements and additions to TPP that enable a complete analysis, from raw files through the export and visualization of validated results, that takes advantage of the rich information contained in PEFF.
Methods
TPP includes the latest version of the Comet search engine, which supports simple amino acid variants and mass modifications specified in PEFF files. The pepXML format was extended to represent search results that incorporate non-canonical sequence variants and known mass modifications. A new peptide-sequence-to-protein mapping mechanism was incorporated to exhaustively map detected peptide sequences to all possible protein variations. The results viewers, sequence viewers, and related interfaces have been updated to display and explore these results, with links to the relevant knowledge sources so that users can verify them further.
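The exhaustive peptide-to-protein mapping described above can be illustrated with a minimal sketch. It is not the TPP implementation: the function name `map_peptide`, the in-memory `variants` dictionary (1-based protein position mapped to the set of alternative residues), and the toy sequences are all assumptions made for illustration, and the sketch handles only single-residue substitutions, not mass modifications or more complex variants.

```python
from typing import Dict, List, Set, Tuple

# (1-based position, canonical residue, observed residue)
Substitution = Tuple[int, str, str]


def map_peptide(
    peptide: str,
    protein: str,
    variants: Dict[int, Set[str]],
) -> List[Tuple[int, List[Substitution]]]:
    """Return every placement of `peptide` on `protein`.

    A placement is accepted when each peptide residue either matches the
    canonical residue or is a known alternative at that position.
    `variants` is a hypothetical structure, e.g. built from PEFF annotations.
    """
    hits = []
    for start in range(len(protein) - len(peptide) + 1):
        substitutions = []
        for offset, residue in enumerate(peptide):
            position = start + offset + 1  # 1-based protein coordinate
            canonical = protein[start + offset]
            if residue == canonical:
                continue
            if residue in variants.get(position, set()):
                substitutions.append((position, canonical, residue))
            else:
                break  # mismatch not explained by a known variant
        else:
            # Loop finished without a break: the placement is valid.
            hits.append((start + 1, substitutions))
    return hits


# Toy data (invented for this example): Y at position 5 may be replaced by F.
protein = "MKTAYIAKQR"
variants = {5: {"F"}}

canonical_hits = map_peptide("TAYIAK", protein, variants)
variant_hits = map_peptide("TAFIAK", protein, variants)
unexplained_hits = map_peptide("TAWIAK", protein, variants)
```

In this sketch a peptide that spans several annotated positions may carry more than one substitution at once; whether such combinations should be accepted is a design choice that a real mapper would need to make explicitly.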
Results
Naturally occurring mutations that result in sequence differences are present in most organisms, yet most protein identifications are made against a single canonical sequence database that contains little information about such variants. A further complication arises when post-translational modifications that are not specified in the search parameters are present in the sample, leading to incorrect search results for the affected spectra.
By automatically incorporating validated sequence variants and PTMs, a larger share of high-quality spectra is confidently identified, increasing sensitivity while decreasing the false discovery rate. For instance, one of the highest-scoring spectra that had been assigned to a decoy sequence in PeptideAtlas was found to map to a well-characterized protein via a single amino acid variant (SAAV) that has been observed in multiple experiments.