The field of metaproteomics, the study of the collective proteome of whole (microbial) ecosystems, has grown substantially over the past few years. This growth stems from an increased awareness that metagenomics and metatranscriptomics can be powerfully supplemented by analysis of the proteins found in the community, as clearly illustrated by, for example, the Integrative Human Microbiome Project. Despite its high relevance, the field still suffers from low identification rates compared to single-species proteomics. The underlying challenge is a lack of sequence resolution and statistical validation in current identification algorithms, which are typically designed for single-species proteomics (Colaert et al., 2011; Muth et al., 2015).
To address this issue, we applied the recently developed, machine learning-based ReScore algorithm to several multi-species metaproteomics datasets (Silva et al., 2019). ReScore is a post-processing tool that re-evaluates peptide-to-spectrum matches (PSMs) based on predicted fragment ion peak intensities. To achieve this, ReScore combines two well-established machine learning-based algorithms: Percolator, which re-scores PSMs based on the search engine output (Käll et al., 2007), and MS2PIP, which predicts fragment ion peak intensities given a peptide’s sequence, charge and modifications (Degroeve et al., 2013). In the ReScore algorithm, the search engine-dependent features of Percolator are replaced with the intensity features of MS2PIP. When ReScore is applied to metaproteomics datasets, it performs similarly to Percolator. However, when the feature sets from Percolator and MS2PIP are combined, a significant improvement is achieved.
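As an illustration of the feature combination described above, the following minimal Python sketch derives a few summary features from observed and predicted fragment ion intensities and appends them to a PSM's search engine features. The function names, the choice of features (Pearson correlation, mean and maximum absolute difference) and the `search_engine_features` dictionary are assumptions made for illustration only; they are not the actual ReScore feature set, and the intensity values would in practice come from the spectrum and from an intensity predictor such as MS2PIP.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # A constant vector has no defined correlation; fall back to 0.0.
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def intensity_features(observed, predicted):
    """Summarise agreement between observed and predicted peak intensities.

    Hypothetical feature choice, for illustration only.
    """
    abs_diff = [abs(o - p) for o, p in zip(observed, predicted)]
    return {
        "pearson": pearson(observed, predicted),
        "mean_abs_diff": sum(abs_diff) / len(abs_diff),
        "max_abs_diff": max(abs_diff),
    }


def combine_features(search_engine_features, observed, predicted):
    """Return one feature dictionary holding both feature sets for a PSM."""
    combined = dict(search_engine_features)  # copy, leave the input untouched
    combined.update(intensity_features(observed, predicted))
    return combined
```

A combined dictionary of this kind, one per PSM, is the sort of input a semi-supervised rescoring step would then be trained on.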
When the updated ReScore algorithm, which uses the combined feature sets, is applied to metaproteomics datasets, our results show an increased identification rate at every level, from the number of PSMs to the taxonomic level, while the false discovery rate (FDR) remains under control, as validated in an entrapment experiment with Pyrococcus furiosus (Vaudel et al., 2012).
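The FDR control mentioned above can be made concrete with a generic sketch of target-decoy q-value estimation, together with a simple entrapment check. This is a standard textbook formulation under stated assumptions, not necessarily the exact procedure used in this study: PSMs are assumed to be given as hypothetical `(score, is_decoy)` pairs in which higher scores are better, and the entrapment check simply reports the fraction of accepted target PSMs that map to the entrapment proteome, without any correction for database size.

```python
def q_values(psms):
    """Estimate q-values from (score, is_decoy) pairs via target-decoy counting.

    Returns (score, is_decoy, q_value) tuples sorted by descending score.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # FDR estimate at each score threshold: decoys / targets accepted so far.
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the lowest FDR at which this PSM would still be accepted,
    # i.e. the running minimum taken from the worst score upwards.
    qs = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qs.append(running_min)
    qs.reverse()

    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, qs)]


def entrapment_rate(is_entrapment_hit):
    """Fraction of accepted target PSMs that match the entrapment proteome.

    `is_entrapment_hit` is a list of booleans, one per accepted target PSM.
    """
    if not is_entrapment_hit:
        return 0.0
    return sum(is_entrapment_hit) / len(is_entrapment_hit)
```

Comparing the entrapment rate among PSMs accepted at a given q-value threshold with that threshold gives a rough indication of whether the estimated FDR is trustworthy.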