Poster Presentation HUPO 2019 - 18th Human Proteome Organization World Congress

Calculating well-adjusted spectrum e-values using cloud approaches (#892)

Julian Uszkoreit 1, Dirk Winkelhardt 1, Katrin Marcus 1, Martin Eisenacher 1
  1. Ruhr University Bochum, Bochum, NRW, Germany

Database search engines are currently the most widely used approach for matching MS/MS spectra to peptides. These algorithms take as input a set of MS/MS spectra and a protein database, which is usually specific to a single species or a small group of taxa. After the search engine's peptide identification, the false discovery rate (FDR) is often estimated with a strategy based on the target-decoy approach. While this strategy has worked well for many years, it runs into problems with new high-resolution mass spectrometers, whose precursor and fragment mass errors lie in the low ppm and mmu range, respectively. Firstly, the essential decoys are no longer identified, because their theoretical mass spectra do not fit the measured data, so the traditional FDR estimation is no longer possible. Secondly, almost all search engines perform well at deciding which of the given peptides matches a spectrum best, but their scores often do not allow deciding whether the match of one spectrum is better than the match of another spectrum. Many search engines, for example, tend to score heavier, longer peptides higher than lighter, shorter sequences.

To overcome both problems, we modified a compute-intensive strategy introduced almost five years ago, which now becomes feasible using cloud technology approaches. Instead of matching only the relatively few peptides within the precursor tolerance to each respective spectrum, we additionally match 1,000 to 10,000 decoy peptides per spectrum, which are generated to fall within the spectrum's tolerance. This number of peptide-spectrum matches per spectrum allows us to calculate well-calibrated e-values per spectrum, which are comparable between spectra and hopefully require no additional FDR estimation. As a side effect, our strategy allows searches against very large databases, up to the complete UniProtKB, without the FDR problems that lead to lower sensitivity.
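
The abstract does not state how the e-values are derived from the per-spectrum decoy matches. As a rough illustration only, the following Python sketch assumes a simple empirical estimate: the p-value of a target match is the fraction of that spectrum's decoy scores that are at least as high as the target score (with a +1 correction so it is never zero), and the e-value is this p-value multiplied by the number of target candidates within the precursor tolerance. The function names, the estimator, and the assumption that higher scores are better are all hypothetical and not taken from the poster.

```python
from typing import Sequence


def empirical_p_value(target_score: float, decoy_scores: Sequence[float]) -> float:
    """Estimate a per-spectrum p-value from that spectrum's own decoy scores.

    Assumption (not from the abstract): higher scores mean better matches, and
    the p-value is the +1 corrected fraction of decoys scoring at least as
    high as the target match.
    """
    if not decoy_scores:
        raise ValueError("at least one decoy score is required")
    n_at_least_as_good = sum(1 for d in decoy_scores if d >= target_score)
    return (n_at_least_as_good + 1) / (len(decoy_scores) + 1)


def spectrum_e_value(
    target_score: float,
    decoy_scores: Sequence[float],
    n_target_candidates: int,
) -> float:
    """Scale the p-value by the number of target candidates for the spectrum.

    Assumption: the e-value is the expected number of random matches scoring
    this well among the target peptides inside the precursor tolerance.
    """
    if n_target_candidates < 1:
        raise ValueError("n_target_candidates must be at least 1")
    return empirical_p_value(target_score, decoy_scores) * n_target_candidates


if __name__ == "__main__":
    # Toy data: 999 decoy scores for one spectrum, none reaching the target score.
    decoys = [float(i) for i in range(999)]
    print(spectrum_e_value(target_score=1500.0, decoy_scores=decoys, n_target_candidates=5))
```

With only a few thousand decoys per spectrum, such an empirical estimate cannot resolve p-values much below one over the number of decoys, so a practical implementation might instead fit a distribution to the decoy scores; which of these the authors use is not stated here.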