Poster Presentation HUPO 2019 - 18th Human Proteome Organization World Congress

A multi-instrument, longitudinal assessment of high-throughput proteomics using 1,560 SWATH-MS profiles from standardised cancer tissue samples (#788)

Rebecca C Poulos 1, Peter G Hains 1, Rohan Shah 1, Natasha Lucas 1, Dylan Xavier 1, Sadia Mahboob 1, Max Wittman 1, Jennifer MS Koh 1, Steven G Williams 1, Srikanth Manda 1, Michael Hecker 1, Asim Anees 1, Michael Dausmann 1, Rosemary Balleine 1, Jean Yang 2, Terence P Speed 3,4, Brett Tully 1, Yansheng Liu 5,6, Roger Reddel 1, Phillip J Robinson 1, Qing Zhong 1
  1. ProCan®, Children’s Medical Research Institute, The University of Sydney, Westmead, NSW, Australia
  2. School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, Australia
  3. Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
  4. Department of Mathematics and Statistics, University of Melbourne, Melbourne, VIC, Australia
  5. Department of Pharmacology, Yale University School of Medicine, New Haven, CT, USA
  6. Yale Cancer Biology Institute, Yale University, West Haven, CT, USA

SWATH mass spectrometry (MS)-based proteomics is a valuable tool for biomedical research. Several landmark studies have examined high-throughput SWATH-MS. However, the reproducibility of industrial-scale proteomic measurements across multiple instruments and over time in a single laboratory has not been investigated. To this end, using a careful study design, we acquired 1,560 SWATH-MS runs on six SCIEX 6600 TripleTOFs. These instruments operated concurrently at ProCan over a four-month period, collecting approximately 5,500 additional MS runs in the interim to reflect a real-world scenario. Our experimental samples were a dilution series containing prostate cancer tissue in a fixed proportion (50%), with a variable fraction of ovarian cancer tissue (3.125% to 50%) offset by yeast. We identified 6,865 proteins in a combined spectral library generated from pooled DIA runs searched with Mascot, X!Tandem and MSGF+. Our proteomic data were processed using OpenSWATH, with PyProphet for FDR control. The median unnormalised coefficient of variation measured within a single instrument during the first experimental week was approximately 10%. We applied RUV-III (Remove Unwanted Variation) for experiment-wide normalisation to correct for machine, temporal and unknown batch effects. We then evaluated and demonstrated the benefits of imputing missing values with non-missing measurements from technical replicates spanning multiple instruments. After normalisation, we observed a strong linear relationship between the ovarian tissue proportion in a sample and the intensity of ovarian-specific peptides. Moreover, we assessed the sample sizes required to overcome technical variation in order to identify significantly different peptide intensities across dilutions. Finally, we applied machine learning to predict ovarian tissue concentration in each sample with high accuracy. We have produced large-scale SWATH-MS data, obtained over time across multiple instruments with varying maintenance schedules.
We establish, for the first time, that such measurements can be effectively integrated and analysed using appropriate statistical methods to enable the reproducible, high-throughput proteomics required for precision medicine.
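
Two of the quantities named in the abstract, the median coefficient of variation across replicate runs and the linear relationship between ovarian tissue proportion and peptide intensity, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, peptide labels and intensity values below are hypothetical and made up for illustration, and the intermediate two-fold dilution steps are an assumption (the abstract states only the 3.125% to 50% range).

```python
from statistics import mean, median, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, as a percentage."""
    return 100.0 * stdev(values) / mean(values)


def median_cv(replicates_by_peptide):
    """Median CV across peptides; each dict value is a list of replicate intensities."""
    return median(coefficient_of_variation(v) for v in replicates_by_peptide.values())


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept, plus R^2."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical replicate intensities for three peptides on one instrument.
replicates = {
    "PEPTIDE_A": [100.0, 110.0, 90.0],
    "PEPTIDE_B": [200.0, 210.0, 190.0],
    "PEPTIDE_C": [50.0, 55.0, 45.0],
}
cv = median_cv(replicates)

# Ovarian tissue percentages: endpoints from the abstract, two-fold steps assumed.
ovarian_percent = [3.125, 6.25, 12.5, 25.0, 50.0]
# Made-up intensities of a single ovarian-specific peptide at each dilution.
intensity = [31.0, 64.0, 123.0, 252.0, 498.0]
slope, intercept, r_squared = fit_line(ovarian_percent, intensity)

print(f"median CV: {cv:.1f}%")
print(f"slope: {slope:.2f}, intercept: {intercept:.2f}, R^2: {r_squared:.4f}")
```

An R^2 close to 1 in such a fit is what a "strong linear relationship" would look like for one peptide; in practice the comparison would be made across many peptides and after normalisation.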