Liquid chromatography (LC)–tandem mass spectrometry (MS/MS) has been the most widely used tool for proteomics studies over the past decades. Developing more accurate MS/MS spectrum and retention time (RT) prediction methods is one of the most effective ways to improve the confidence of peptide identification.
In this work, we propose a deep learning-based method for accurate MS/MS spectrum and RT prediction using a hybrid model that combines a convolutional neural network and a bidirectional long short-term memory (LSTM) network. The model takes a peptide sequence as input and outputs the relative intensities of b/y product ions at each possible fragmentation site, including ions with neutral loss of ammonia or water, as well as the normalized RT (iRT [1]) of the peptide.
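To make the shape of the fragmentation output concrete, the following is a minimal sketch of how the predicted intensities for one peptide could be indexed. The function name `fragment_labels`, the label strings, and the ordering are illustrative assumptions, not the layout used by the model; fragment charge states are ignored here.

```python
# Hypothetical indexing of the per-peptide prediction targets (illustration only).
ION_TYPES = ("b", "y")
LOSSES = ("", "-NH3", "-H2O")  # no loss, ammonia loss, water loss


def fragment_labels(peptide):
    """Return one label per (fragmentation site, ion type, neutral loss)."""
    n = len(peptide)
    labels = []
    for site in range(1, n):  # a peptide of n residues has n - 1 backbone sites
        for ion in ION_TYPES:
            # b ions are numbered from the N-terminus, y ions from the C-terminus
            index = site if ion == "b" else n - site
            for loss in LOSSES:
                labels.append(f"{ion}{index}{loss}")
    return labels


if __name__ == "__main__":
    labels = fragment_labels("PEPTIDE")
    print(len(labels), labels[:6])
```

Under these assumptions, a peptide of n residues yields 6(n - 1) relative intensities, plus a single iRT value.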
We trained and validated the model with three LC-MS/MS data sets from two organisms acquired in two laboratories, i.e. two data sets of HeLa cells and one data set of mouse cerebellum [2,3]. The median dot products between the predicted and experimental b/y peak intensities (0.87–0.94) and the Pearson correlation coefficients between predicted and experimental iRT (> 0.98) were comparable to those of experimental repeats, both within each data set and in cross-organism and cross-lab validation. For benchmarking purposes, we further compared the performance of our model on peptide fragmentation and RT prediction with that of existing tools. Our model outperformed most of the existing methods and can be adapted to both collision-induced dissociation and higher-energy collisional dissociation fragmentation.
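The two evaluation metrics named above can be sketched as follows. This is an illustration under stated assumptions: the dot product is taken to be the normalized (cosine) form, which the abstract does not specify, and the function names are hypothetical.

```python
import math
from statistics import median


def normalized_dot_product(pred, obs):
    """Cosine similarity between two intensity vectors (assumed normalization)."""
    num = sum(p * o for p, o in zip(pred, obs))
    den = math.sqrt(sum(p * p for p in pred)) * math.sqrt(sum(o * o for o in obs))
    return num / den if den > 0 else 0.0


def pearson(x, y):
    """Pearson correlation coefficient; assumes neither input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def median_dot_product(pairs):
    """Median similarity over (predicted, experimental) spectrum pairs."""
    return median(normalized_dot_product(p, o) for p, o in pairs)
```

Each spectrum is scored on its b/y intensity vector and the median is taken across peptides, whereas the Pearson coefficient is computed once over all predicted and experimental iRT values.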
In contrast to traditional machine learning methods, which require manual design of appropriate features, deep neural networks can learn representations of their inputs automatically and are thus better able to handle the complexity of peptide fragmentation and retention. Our method will benefit assay development for selected reaction monitoring or parallel reaction monitoring, as well as data-independent acquisition proteomics [4].