Poster Presentation HUPO 2019 - 18th Human Proteome Organization World Congress

Identification of isotope clusters from mass spectra using neural network model (#637)

Daewook Kwon 1 , Seunghyuk Choi 1 , Eunok Paek 1
  1. Hanyang University, Seoul, South Korea

Mass spectrometry-based proteomics plays an important role in identifying peptides. Peptide identification depends strongly on the precursor mass estimated from the preceding precursor scan. However, estimated precursor masses often contain isotopic errors. Determining isotope clusters is the first step toward determining correct precursor masses. Existing tools such as RAPID (1) and MS-Deconv (2) use heuristic functions to recognize correct isotope clusters. These heuristic functions were developed from the characteristics of theoretical isotope clusters, but isotope clusters in an experimental scan contain noise and may overlap with other isotope clusters. Here, we propose a machine learning approach to identifying correct isotope clusters, which has the benefit of accommodating the characteristics of experimental isotope clusters.

We designed an artificial neural network model to learn the characteristics of isotope clusters. The model takes as input the monoisotopic mass and the intensities of the first through twelfth peaks in a cluster, and predicts whether the given cluster is an isotope cluster.
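As an illustration of the input and output described above, the following minimal sketch builds a 13-value feature vector (monoisotopic mass plus twelve peak intensities) and passes it through a small feed-forward network. The abstract does not specify the architecture, scaling, or weights, so the single hidden layer, the `mass_scale` constant, the max-intensity normalization, the zero padding, and all function names here are assumptions for illustration only.

```python
import math
import random

N_PEAKS = 12  # the model uses the first through twelfth peak intensities


def build_features(mono_mass, intensities, mass_scale=5000.0):
    """Return [scaled mass, 12 normalized intensities].

    Assumptions (not from the abstract): clusters with fewer than 12 peaks
    are zero-padded, intensities are divided by the most intense peak, and
    the mass is divided by a hypothetical constant `mass_scale`.
    """
    peaks = [float(v) for v in intensities[:N_PEAKS]]
    peaks += [0.0] * (N_PEAKS - len(peaks))
    top = max(peaks)
    if top <= 0.0:
        top = 1.0  # avoid division by zero for an empty cluster
    return [mono_mass / mass_scale] + [p / top for p in peaks]


def init_layer(n_in, n_out, rng):
    """Randomly initialized dense layer: (weights, biases)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(x, layer, activation):
    """Apply one dense layer followed by an elementwise activation."""
    weights, biases = layer
    return [activation(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def predict_probability(features, hidden, output):
    """Probability-like score that the cluster is an isotope cluster."""
    h = dense(features, hidden, relu)
    return dense(h, output, sigmoid)[0]


# Usage with untrained (random) weights, so the score is not meaningful yet.
rng = random.Random(0)
hidden_layer = init_layer(1 + N_PEAKS, 8, rng)
output_layer = init_layer(8, 1, rng)
features = build_features(1500.0, [100.0, 80.0, 30.0])
score = predict_probability(features, hidden_layer, output_layer)
```

A trained model would threshold this score (for example at 0.5) to decide whether the cluster is accepted.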

To train the model, we collected ~4.2M peptide-spectrum matches (PSMs) from a previous study (3). Detected isotope clusters (DICs) corresponding to the precursor of each PSM were extracted using both RAPID and MS-Deconv. We filtered out ~2.95M DICs whose spectral contrast angle (4) against the theoretical isotope cluster (5) was below 0.80. From the ~1.25M selected DICs, we generated ~1.25M negative isotope clusters (NICs), each consisting of a subset of the peaks of a DIC.
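The filtering and negative-example steps above can be sketched as follows. This assumes the normalized form of the spectral contrast angle, 1 - 2·arccos(cos θ)/π, where cos θ is the cosine similarity between observed and theoretical intensities; the abstract does not state which form was used. The `make_negative` rule (dropping leading peaks) is a hypothetical way to build a cluster from partial peaks, not necessarily the procedure used in this work.

```python
import math


def normalized_contrast_angle(observed, theoretical):
    """Return 1 - 2*acos(cosine similarity)/pi for two intensity lists.

    For non-negative intensities the result lies in [0, 1], with 1 meaning
    identical relative intensities.
    """
    if len(observed) != len(theoretical):
        raise ValueError("intensity lists must have the same length")
    dot = sum(o * t for o, t in zip(observed, theoretical))
    norm_o = math.sqrt(sum(o * o for o in observed))
    norm_t = math.sqrt(sum(t * t for t in theoretical))
    if norm_o == 0.0 or norm_t == 0.0:
        return 0.0  # an all-zero cluster cannot match anything
    # Clamp to guard against floating-point values slightly outside [-1, 1].
    cos_sim = max(-1.0, min(1.0, dot / (norm_o * norm_t)))
    return 1.0 - 2.0 * math.acos(cos_sim) / math.pi


def keep_cluster(observed, theoretical, threshold=0.80):
    """Keep a detected cluster only if its score reaches the threshold."""
    return normalized_contrast_angle(observed, theoretical) >= threshold


def make_negative(cluster, n_dropped=1):
    """Hypothetical negative cluster: the same peaks minus the first few.

    Dropping leading peaks mimics a cluster whose monoisotopic peak was
    missed. This rule is an assumption for illustration.
    """
    return list(cluster[n_dropped:])
```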

We applied 5-fold cross-validation to guard against overfitting; the average accuracy was 96.68%. To test the model, we used DICs and NICs derived from different experimental methods (6,7). The sensitivity and specificity were 97.35% and 85.85%, respectively.
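For readers unfamiliar with the evaluation terms, the sketch below shows one way to produce 5-fold train/test splits and to compute sensitivity (fraction of true isotope clusters accepted) and specificity (fraction of negative clusters rejected). The function names are illustrative assumptions, and the contiguous splitting assumes the samples were shuffled beforehand.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) for k contiguous folds.

    Assumes the samples have already been shuffled; fold sizes differ by
    at most one when n_samples is not divisible by k.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = isotope cluster)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity
```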

  1. K. S. Park, J. Y. Yoon, S. H. Lee, E. Paek, H. J. Park, H. J. Jung, S. W. Lee, “Isotopic peak intensity ratio based algorithm for determination of isotopic clusters and monoisotopic masses of polypeptides from high-resolution mass spectrometric data” Anal. Chem., vol.80, 7294-7303, 2008.
  2. Liu X, Inbar Y, Dorrestein PC, et al. “Deconvolution and database search of complex tandem mass spectra of intact proteins: a combinatorial approach”, Mol. Cell. Proteomics., vol.9(12), 2772-2782, 2010.
  3. Zolg DP, Wilhelm M, Schnatbaum K, Zerweck J, Knaute T, Delanghe B, Kuster B, “Building ProteomeTools based on a complete synthetic human proteome”, Nat. Methods., vol.14(3), 259-262, 2017.
  4. Umut H. Toprak, Ludovic C. Gillet, Alessio Maiolica, Pedro Navarro, Alexander Leitner, Ruedi Aebersold, “Conserved Peptide Fragmentation as a Benchmarking Tool for Mass Spectrometers and a Discriminating Feature for Targeted Proteomics”, Mol Cell Proteomics., vol.13(8), 2056-2071, 2014.
  5. Martin Loos, Christian Gerber, Francesco Corona, Juliane Hollender, and Heinz Singer, “Accelerated Isotope Fine Structure Calculation Using Pruned Transition Trees”, Anal. Chem., vol.87(11), 5738-5744, 2015.
  6. Philip Thomas, Trevor G. Smart, “HEK293 cell line: A vehicle for the expression of recombinant proteins”, Journal of Pharmacological and Toxicological Methods, vol.51(3), 187-200, 2005.
  7. Geiger T, Wehner A, Schaab C, Cox J, Mann M, “Comparative proteomic analysis of eleven common cell lines reveals ubiquitous but varying expression of most proteins”, Mol Cell Proteomics., 11:M111-014050, 2012.