Poster Presentation HUPO 2019 - 18th Human Proteome Organization World Congress

Identifying neoantigens with LC-MS by personalized de novo peptide sequencing (#834)

Hieu Tran 1 , Rui Qiao 1 , Lei Xin 2 , Xin Chen 2 , Baozhen (Paul) Shan 2 , Ming Li 1
  1. Department of Computer Science, University of Waterloo, Waterloo, Ontario, Canada
  2. Bioinformatics Solutions, Waterloo, Ontario, Canada
  1. Introduction

Currently, candidate neoantigens can be identified with a proteogenomics method that involves two major steps: whole-exome sequencing and RNA sequencing to find somatic mutations, followed by mass spectrometry to find neoantigens by database search. In this study, we propose a workflow that identifies neoantigens solely from mass spectrometry data by integrating de novo sequencing and database search. In addition, the deep learning model for de novo sequencing is tailored to each individual patient based on their own MS data (personalized proteome). Such a personalized approach enables faster and more accurate identification of neoantigens for personalized vaccines.

  2. Methods

Our workflow for finding neoantigens by personalized de novo sequencing involves five steps:

(1). LC-MS/MS data from a patient were searched against a canonical database. The identified normal HLA peptides represent the patient’s immunopeptidome. Spectra generated by mutated peptides remain unmatched.

(2). DeepNovo [1], a neural network model, was used for de novo peptide sequencing. The model was trained on the identified spectra from the first step. This patient-specific training allows DeepNovo to adapt to the specific immunopeptidome of an individual patient.

(3). De novo sequencing was performed on the unidentified spectra with the trained model, and only high-confidence de novo peptides, with an expected mass tag accuracy of 95%, were kept.

(4). Quality control of de novo peptides.

(5). Select candidate neoantigens.
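
The first three steps can be sketched in code. The example below is only an illustration under stated assumptions, not the implementation used in this study: the names `Spectrum`, `DeNovoResult`, `split_by_identification`, `calibrate_threshold` and `keep_high_confidence` are hypothetical, the confidence score is a generic "higher is better" value, and the 95% target is approximated by a per-peptide correctness flag measured on held-out identified spectra rather than by the mass tag accuracy used in the workflow.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Spectrum:
    scan_id: str
    # Peptide assigned by database search; None if the spectrum was unmatched.
    db_peptide: Optional[str] = None


@dataclass
class DeNovoResult:
    scan_id: str
    peptide: str
    score: float  # hypothetical confidence score, higher is better


def split_by_identification(
    spectra: List[Spectrum],
) -> Tuple[List[Spectrum], List[Spectrum]]:
    """Step 1: separate database-identified spectra from unmatched ones."""
    identified = [s for s in spectra if s.db_peptide is not None]
    unidentified = [s for s in spectra if s.db_peptide is None]
    return identified, unidentified


def calibrate_threshold(
    validation: List[Tuple[float, bool]],
    target_accuracy: float = 0.95,
) -> Optional[float]:
    """Return the lowest score cutoff whose retained results meet the target.

    `validation` holds (score, is_correct) pairs obtained by running the
    trained model on identified spectra whose peptides are known (an
    assumption about how the cutoff could be chosen). Returns None if no
    cutoff reaches the target accuracy.
    """
    ranked = sorted(validation, key=lambda pair: pair[0], reverse=True)
    best = None
    correct = 0
    for i, (score, is_correct) in enumerate(ranked, start=1):
        correct += int(is_correct)
        # Evaluate only at the end of a run of tied scores.
        if i < len(ranked) and ranked[i][0] == score:
            continue
        if correct / i >= target_accuracy:
            best = score
    return best


def keep_high_confidence(
    results: List[DeNovoResult], cutoff: float
) -> List[DeNovoResult]:
    """Step 3: keep de novo peptides scoring at or above the cutoff."""
    return [r for r in results if r.score >= cutoff]
```

In this sketch, the identified spectra returned by `split_by_identification` would serve as training and validation data for the patient-specific model (step 2), and the cutoff from `calibrate_threshold` would then be applied to the de novo results on the unidentified spectra.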

 

  3. Results

We tested our workflow on a publicly available dataset from a melanoma patient [2]. MS/MS data were searched against the UniProt database with PEAKS X at a 1% false discovery rate (FDR). After the five steps described above, our workflow reported 158 HLA class I candidate neoantigens, including 4 of the 5 neoantigens identified by the current proteogenomics method at 1% FDR. More importantly, our workflow identified an additional neoantigen that matched a nucleotide mutation reported by RNA-seq.

  1. Tran, N.H., Zhang, X., Xin, L., Shan, B. & Li, M. De novo peptide sequencing by deep learning. Proc. Natl. Acad. Sci. U. S. A. 114, 8247-8252 (2017).
  2. Van Allen, E.M. et al. Genomic correlates of response to CTLA-4 blockade in metastatic melanoma. Science 350, 207-211 (2015).