Bacterial identification is essential to many applications in biology, health, and environmental sciences. Fast and low-cost, MALDI-TOF technology has become the approach of choice for this purpose, but it has several drawbacks: it requires a lengthy bacterial culture step before analysis (>24 h), it has low specificity, and it is not quantitative.
We developed a new strategy that uses machine learning algorithms to mine LC-MS/MS data acquired in DIA (data-independent acquisition) mode for specific bacterial signals, without requiring bacterial culture or peptide/protein identification.
As a proof of concept, we focused on the 15 bacterial species most commonly found in urinary tract infections (UTI). We analyzed 200 urine specimens inoculated with these bacteria on an Orbitrap Fusion instrument in DIA mode. Raw data were converted into LC-MS maps, one per precursor isolation window. These maps were then systematically binned in both the m/z and retention-time dimensions, and this binning approach was compared with a peptide-feature detection strategy. The data tables produced by both methods were tested with various machine learning classifiers combined with dimensionality-reduction techniques to determine the best conditions for species discrimination. In addition, mass recalibration and retention-time alignment tools were used to improve prediction accuracy and to make the method transferable to other laboratories. We also compared a standard LC-MS gradient (90 min) with a short gradient (15 min), which is more suitable for routine analyses. With this strategy, we obtained 90 to 95% accuracy in bacterial prediction at bacterial concentrations below 1×10⁵ CFU/mL.
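The binning idea can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each LC-MS map (one per DIA precursor window) is available as a list of (m/z, retention time, intensity) peaks. The grid limits, bin widths, toy peaks, and the function names `bin_lcms_map`, `sample_features`, `fit_centroids`, and `predict` are all hypothetical. A simple nearest-centroid classifier is swapped in here as a stand-in for the classifiers and dimensionality-reduction techniques that were actually evaluated.

```python
import math
from collections import defaultdict


def bin_lcms_map(peaks, mz_range, rt_range, mz_width, rt_width):
    """Sum (m/z, RT, intensity) peaks into a fixed m/z x RT grid, flattened.

    Maps binned with the same grid always give vectors of the same length,
    so samples can be compared bin by bin without identifying any peptide.
    """
    mz_lo, mz_hi = mz_range
    rt_lo, rt_hi = rt_range
    n_mz = math.ceil((mz_hi - mz_lo) / mz_width)
    n_rt = math.ceil((rt_hi - rt_lo) / rt_width)
    vec = [0.0] * (n_mz * n_rt)
    for mz, rt, intensity in peaks:
        if not (mz_lo <= mz < mz_hi and rt_lo <= rt < rt_hi):
            continue  # peak falls outside the grid
        # Clamp indices to guard against floating-point edge effects.
        i = min(int((mz - mz_lo) // mz_width), n_mz - 1)
        j = min(int((rt - rt_lo) // rt_width), n_rt - 1)
        vec[i * n_rt + j] += intensity
    return vec


def sample_features(maps_by_window, **grid):
    """Concatenate the binned map of every DIA precursor window in a fixed
    window order, then scale the result to unit total intensity."""
    feats = []
    for window in sorted(maps_by_window):
        feats.extend(bin_lcms_map(maps_by_window[window], **grid))
    total = sum(feats)
    return [x / total for x in feats] if total > 0 else feats


def fit_centroids(X, y):
    """Mean feature vector per species (nearest-centroid classifier)."""
    sums, counts = {}, defaultdict(int)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for k, v in enumerate(x):
            acc[k] += v
        counts[label] += 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def predict(centroids, x):
    """Return the species whose centroid is closest in Euclidean distance."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(c, x))
    return min(centroids, key=lambda lab: dist2(centroids[lab]))


# Toy example with invented peaks: one precursor window (keyed by its lower
# m/z bound), a 2 x 2 grid, and two hypothetical species.
GRID = dict(mz_range=(400.0, 402.0), rt_range=(0.0, 10.0),
            mz_width=1.0, rt_width=5.0)
train = [
    ({400.0: [(400.5, 2.0, 100.0)]}, "species_A"),
    ({400.0: [(400.4, 3.0, 80.0), (401.2, 7.0, 20.0)]}, "species_A"),
    ({400.0: [(401.5, 6.0, 50.0)]}, "species_B"),
    ({400.0: [(401.1, 8.0, 40.0), (400.2, 1.0, 10.0)]}, "species_B"),
]
X = [sample_features(m, **GRID) for m, _ in train]
y = [label for _, label in train]
centroids = fit_centroids(X, y)
query = sample_features({400.0: [(400.7, 4.0, 60.0)]}, **GRID)
print(predict(centroids, query))  # species_A
```

Mass recalibration and retention-time alignment would be applied to the peak lists before binning. Systematic m/z or RT shifts move signal across bin edges, which breaks the bin-by-bin comparison between samples. A fuller pipeline would also place a dimensionality-reduction step, such as PCA, ahead of the classifier.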
Our approach, which combines state-of-the-art proteomics and computational biology, can identify the bacteria responsible for 85% of UTIs within a few hours, without bacterial culture or peptide/protein identification. This work paves the way for the development of new-generation diagnostic methods and could be extended in the future to other biological specimens and to bacteria carrying specific resistances.