Accurate detection of modifications and hypermodified peptides requires careful exclusion of alternative explanations for the same spectra, as well as aggregation of multiple lines of evidence corroborating what would otherwise be very surprising identifications. Automated interpretation of the mass offsets detected by open modification search is also required, since distinguishing real modifications from search artifacts would otherwise depend on labor-intensive manual curation.
Maestro stratified search begins by identifying spectra in the smallest search space with the most information (spectral library search), continues with a typical database search of the unidentified spectra (considering only common modifications), and only then attempts to explain the spectra that remain unidentified as possibly containing unexpected modifications. In addition, spectral networks algorithms are used to detect correlated peptide fragmentation patterns, which help confirm (or challenge) surprising identifications by correlation to less surprising identifications of related peptides. Finally, ModDecode builds on these to annotate detected mass offsets with known modifications (or combinations thereof) and thus identify many novel and rare post-translational modifications.
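As an illustration only, the idea of annotating a mass offset with one known modification or a combination of them can be sketched as below. This is not ModDecode's implementation: the function name `annotate_offset`, the 0.01 Da tolerance, the limit of two modifications per offset, and the five-entry modification table are all assumptions made for the example. The masses are standard monoisotopic mass shifts.

```python
from itertools import combinations_with_replacement

# Small illustrative table of monoisotopic mass shifts in Da.
# A real annotation step would draw on a far larger catalogue (assumption).
KNOWN_MODS = {
    "Oxidation": 15.994915,
    "Phospho": 79.966331,
    "Acetyl": 42.010565,
    "Methyl": 14.015650,
    "Deamidated": 0.984016,
}


def annotate_offset(offset, mods=KNOWN_MODS, max_mods=2, tol=0.01):
    """Return combinations of known modifications whose summed mass
    matches `offset` within `tol` Da.

    Each hit is (names, error), where error = summed mass - offset.
    Hits are ordered by fewest modifications, then smallest absolute error.
    """
    hits = []
    for k in range(1, max_mods + 1):
        # Repeats are allowed, so two methylations can explain one offset.
        for combo in combinations_with_replacement(sorted(mods), k):
            total = sum(mods[name] for name in combo)
            if abs(total - offset) <= tol:
                hits.append((combo, round(total - offset, 6)))
    hits.sort(key=lambda hit: (len(hit[0]), abs(hit[1])))
    return hits


if __name__ == "__main__":
    for observed in (15.9949, 95.9612, 28.0313, 123.456):
        print(observed, annotate_offset(observed))
```

Preferring explanations with fewer modifications is a design choice of this sketch. An offset that matches nothing in the table returns an empty list and stays unexplained, which is the case that would otherwise need manual curation.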
Reanalysis of a cell line dataset (PXD004452) with over 12 million spectra revealed 510,899 modified peptide variants out of a total of 826,539 unique peptide variants. Surprisingly, the detected modifications span over 200 different known modifications, all strongly supported by multiple lines of evidence. The reanalysis also revealed hypermodified proteins with over 4,200 modified variants (84% of all variants mapped to the same protein), as well as 1,582 proteins with >75% of identifications coming from modified peptides. Interestingly, the distribution of modifications is highly non-uniform along the protein backbone: modifications tend to cluster around hypermodified protein regions, with up to 1,236 variants detected on a single 41 amino acid stretch of Alpha-2-HS-glycoprotein (P02765), and to concentrate on specific peptides, such as a single EIF1A peptide that was detected in over 140 distinct modified variants.