Blood, plasma, and serum are central to biomarker discovery because they are the predominant sample types used for diagnostic analyses in clinical practice and are available in biobanks from thousands of clinical studies. Nevertheless, MS/MS analysis of these samples typically identifies fewer than one third of all acquired spectra. Little is known about the diversity and quantitative variation of post-translational modifications across samples from different individuals, or over time series from the same individuals. BloodKB aggregates identifications from deep reanalysis of >35 million spectra in >1 TB of public mass spectrometry data deposited in MassIVE. The data comprise >1,300 samples from dozens of individuals and cover a broad range of variation in age and gender, health and disease, and time series in response to health interventions.
Altogether, our reanalysis doubled the identification rate for spectra in these datasets. Maestro open search for unexpected post-translational modifications revealed an unprecedented level of diversity: hundreds of known modification types were detected at a false localization rate threshold of 1% and supported by highly significant correlations between the spectra of modified and unmodified peptides (i.e., p-values <1e-10). A further 120+ putative novel modifications were detected under the same stringent conditions, even after considering possible combinations of all known modifications.
The reanalysis further revealed >800 hypermodified peptides with 10+ distinct combinations of modifications, including more than 200 unique modification variants for a single peptide sequence. Peptide identifications were also uniquely mapped to ~20,000 exons (out of a total of ~29,000 exons), with ~7,000 distinct peptides mapped to exon splice junctions and hundreds more mapped to functional regions associated with disease or regulatory interactions.
As an open community resource, BloodKB will continue to grow in both data volume and accumulated knowledge as more data become publicly available and as additional analyses or metadata are contributed over time.