Currently, candidate neoantigens can be identified with a proteogenomics method that involves two major steps: whole-exome sequencing and RNA sequencing to find somatic mutations, and mass spectrometry to find neoantigens by database search. In this study, we propose a workflow that identifies neoantigens solely from mass spectrometry data by integrating de novo sequencing and database search. In addition, the deep learning model for de novo sequencing is tailored to each individual patient based on their own MS data (personalized proteome). Such a personalized approach enables faster and more accurate identification of neoantigens for personalized vaccines.
Our personalized, de novo sequencing-based workflow for finding neoantigens involves five steps:
(1). LC-MS/MS data from a patient were searched against a canonical protein database. The identified normal HLA peptides represent the patient’s immunopeptidome. Spectra generated by mutated peptides remain unmatched.
(2). DeepNovo [1], a neural network model for de novo peptide sequencing, was trained on the spectra identified in the first step. This training allows DeepNovo to adapt to the specific immunopeptidome of an individual patient.
(3). De novo sequencing was performed on the unidentified spectra with the trained model, and only high-confidence de novo peptides, with an expected mass tag accuracy of 95%, were kept.
(4). Quality control was applied to the de novo peptides.
(5). Candidate neoantigens were selected.
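The data flow of the five steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the database search and the DeepNovo training and inference are not reproduced, so each spectrum is assumed to already carry its search result and de novo result. The `Spectrum` class, the function names, the toy sequences, the per-peptide confidence score, and the 8 to 12 residue length filter used as a stand-in for quality control are all hypothetical. Candidate selection is simplified to "the de novo peptide does not occur in any canonical protein", with isoleucine and leucine treated as equivalent because they have the same mass.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Spectrum:
    """Hypothetical record for one MS/MS spectrum and its annotations."""
    scan_id: str
    db_peptide: Optional[str] = None      # step 1 result; None if unmatched
    denovo_peptide: Optional[str] = None  # step 3 result; None if not sequenced
    denovo_confidence: float = 0.0        # assumed confidence score in [0, 1]


def split_by_database_match(
    spectra: List[Spectrum],
) -> Tuple[List[Spectrum], List[Spectrum]]:
    """Step 1: separate spectra matched by database search from the rest.

    The identified set would serve as training data in step 2 (not shown).
    """
    identified = [s for s in spectra if s.db_peptide is not None]
    unidentified = [s for s in spectra if s.db_peptide is None]
    return identified, unidentified


def keep_high_confidence(
    spectra: List[Spectrum], threshold: float = 0.95
) -> List[Spectrum]:
    """Step 3: keep de novo peptides at or above an assumed score threshold."""
    return [
        s for s in spectra
        if s.denovo_peptide is not None and s.denovo_confidence >= threshold
    ]


def normalize(sequence: str) -> str:
    """Treat I and L as the same residue, since they are isobaric."""
    return sequence.replace("I", "L")


def select_candidates(
    spectra: List[Spectrum],
    proteins: List[str],
    min_len: int = 8,
    max_len: int = 12,
) -> List[str]:
    """Steps 4 and 5: length filter (assumed QC), then drop canonical peptides."""
    normalized_proteins = [normalize(p) for p in proteins]
    candidates: List[str] = []
    for s in spectra:
        peptide = s.denovo_peptide
        if not (min_len <= len(peptide) <= max_len):
            continue  # outside the assumed HLA class I length range
        if any(normalize(peptide) in p for p in normalized_proteins):
            continue  # explained by the canonical proteome
        if peptide not in candidates:
            candidates.append(peptide)
    return candidates


def run_workflow(spectra: List[Spectrum], proteins: List[str]) -> List[str]:
    """Chain the steps; model training and inference are out of scope here."""
    _identified, unidentified = split_by_database_match(spectra)
    confident = keep_high_confidence(unidentified)
    return select_candidates(confident, proteins)


# Toy data, invented for illustration only.
toy_proteins = ["MKTAYIAKQRQISFVKSHFSRQ"]
toy_spectra = [
    Spectrum("s1", db_peptide="AYIAKQRQI"),
    Spectrum("s2", denovo_peptide="AYLAKQRQL", denovo_confidence=0.99),
    Spectrum("s3", denovo_peptide="AYIAKQGQI", denovo_confidence=0.97),
    Spectrum("s4", denovo_peptide="SFVKSHWSR", denovo_confidence=0.60),
    Spectrum("s5", denovo_peptide="KQGQI", denovo_confidence=0.99),
]
candidates = run_workflow(toy_spectra, toy_proteins)
print(candidates)  # ['AYIAKQGQI']
```

In this toy run, `s2` is discarded because it matches the canonical protein once I and L are treated as equivalent, `s4` falls below the confidence threshold, and `s5` is too short, so only the single-substitution peptide from `s3` remains as a candidate.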
We tested our workflow on a publicly available dataset from a melanoma patient [2]. MS/MS data were searched against the UniProt database with PEAKS X at a 1% false discovery rate (FDR). After the five steps described above, our workflow reported 158 HLA class I candidate neoantigens, including 4 of the 5 neoantigens identified by the current proteogenomics method at 1% FDR. More importantly, our workflow identified an additional neoantigen that matched a nucleotide mutation reported by RNA-seq.