The number of public mass spectrometry datasets in proteomics has been growing exponentially, and repositories such as jPOST (https://jpostdb.org) make them available as reusable resources. Public datasets generally accumulate search results from multiple projects and institutions, acquired under a wide variety of experimental conditions. When the registered results are simply combined, the number of false positive hits increases linearly. Novel repository-scale computational workflows are therefore required to control the quality of re-analyzed datasets. Here we propose UniScore, a universal measure for annotated peptide MSMS spectra stored in public repositories. Using this score, results from multiple search engines and multiple peak picking algorithms can be combined while minimizing false positive identifications.
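The accumulation effect can be illustrated with a toy model. The sketch below is not part of the described workflow. It assumes that every dataset reports the same number of identifications at the same nominal FDR, that correct identifications are drawn from a shared, finite pool of detectable peptides, and that false identifications never repeat across datasets. The function name `combined_fdr` and all default values are hypothetical.

```python
def combined_fdr(n_datasets: int,
                 ids_per_dataset: int = 10_000,
                 fdr: float = 0.01,
                 pool_size: int = 20_000) -> float:
    """Expected FDR of the union of n independently filtered result sets.

    Toy model only; every parameter value is an illustrative assumption.
    """
    false_per_dataset = ids_per_dataset * fdr
    true_per_dataset = ids_per_dataset - false_per_dataset

    # Assumption: false hits are random and never coincide across datasets,
    # so their number in the union grows linearly with n_datasets.
    false_total = false_per_dataset * n_datasets

    # Assumption: correct hits are sampled uniformly from a shared pool,
    # so the expected number of distinct correct peptides saturates.
    p_never_seen = (1 - true_per_dataset / pool_size) ** n_datasets
    true_total = pool_size * (1 - p_never_seen)

    return false_total / (false_total + true_total)


if __name__ == "__main__":
    for n in (1, 5, 20, 100):
        print(n, round(combined_fdr(n), 4))
```

Under these assumptions the correct identifications saturate while the false ones keep accumulating, so the FDR of the merged list drifts above the per-dataset value as more datasets are added.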
MS raw data from a global proteome analysis of tryptic peptides from HeLa cells, acquired on a Q-Exactive with an 8 h gradient, were obtained from the jPOST repository (JPST000200). A second dataset consisted of phosphopeptides from mouse Hepa cells, acquired on a Q-Exactive (PXD001792). Peak lists were created with MaxQuant, and protein identification was performed with Mascot, X!Tandem, Comet and MaxQuant.
UniScore was calculated from the peak annotation of MSMS spectra, independently of the search engine and peak picking algorithm employed. Moreover, by optimizing the UniScore parameters, the false discovery rate can be reduced by minimizing the number of decoy hits. One of the major parameters of UniScore is based on the concept of peptide sequence tags (PSTs), for which we adopted the number of amino acids constituting the PSTs. So far, UniScore-based identification has yielded a 10-20% increase in peptide identifications for relatively large-scale datasets.
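The exact UniScore formula is not given here, so the following is only a minimal sketch of two generic building blocks that the description suggests. The first derives a sequence tag length from consecutively annotated fragment ions of one ion series. The second finds a score cutoff from decoy hits using the common decoys/targets FDR estimate. The function names, the input layout and the FDR estimator are assumptions made for illustration, not the authors' implementation.

```python
from typing import Iterable, List, Tuple


def longest_tag_length(matched_ion_indices: Iterable[int]) -> int:
    """Residues covered by the longest run of consecutive matched ions.

    Two adjacent matched ions of one series (e.g. b2 and b3) define one
    residue, so a run of k consecutive ions gives a tag of k - 1 residues.
    """
    indices = sorted(set(matched_ion_indices))
    best = run = 0
    for prev, cur in zip(indices, indices[1:]):
        run = run + 1 if cur == prev + 1 else 0
        best = max(best, run)
    return best


def score_threshold_at_fdr(psms: List[Tuple[float, bool]],
                           max_fdr: float = 0.01) -> float:
    """Lowest score cutoff whose estimated FDR stays within max_fdr.

    psms holds (score, is_decoy) pairs. FDR is estimated as decoys/targets
    among all PSMs scoring at or above the cutoff (an assumed estimator).
    Returns infinity if no cutoff satisfies the limit.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    threshold = float("inf")
    for i, (score, is_decoy) in enumerate(ranked):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Only evaluate a cutoff once every PSM tied at this score is counted.
        end_of_tie = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if end_of_tie and targets > 0 and decoys / targets <= max_fdr:
            threshold = score
    return threshold
```

In a parameter optimization of the kind described, a candidate score could be recomputed for each parameter set and the set that retains the most target hits at the chosen FDR kept. That loop is omitted because its details are not specified.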
We successfully developed a universal measure for peptide MSMS spectra. It is applicable to the re-analysis of accumulated datasets in public repositories as well as to spectral libraries.