Analysing the proteomes of large numbers of solid-tissue cancer samples presents challenges including small sample sizes, reversal of tissue fixation, data management, reproducible performance across multiple instruments over time, and large-scale data normalisation and analysis (Tully et al 2019). Using SWATH-MS, acquisition of proteome-wide data can be relatively reproducible across different laboratories, with high qualitative and quantitative consistency. Is this achievable on a large scale and in a single facility?
The ACRF Centre for the Proteome of Human Cancer (ProCan®) was established to survey the proteomic landscape of all cancers. A suite of Barocyclers and six SCIEX TripleTOF 6600 mass spectrometers was set up in a single facility to operate 24/7 as a single unit. Around 10,000 proteomes have been collected to date.
In processing such large numbers of samples, we encountered many operational bottlenecks, such as tedious tissue washing, preparation and digestion (Tully et al 2019). We have developed entirely new workflows that reduce the end-to-end sample processing time from 2-6 hours to under one hour, both for removal of OCT from fresh frozen samples and for dewaxing and fixation reversal of FFPE samples. We achieved identical peptide and protein numbers from each. By running technical replicates across different MS instruments rather than on the same instrument, we identified and removed technical missingness.
To demonstrate reproducibility, we performed a multi-instrument, longitudinal SWATH-MS assessment. Using six instruments, we acquired 1,560 technical replicates of a single set of experimental samples spanning a dilution series comprising a mixture of three biologically distinct tissues. We acquired data bi-daily for a week, weekly for a month, and monthly for four months, in a facility with varying maintenance schedules and minimal instrument down-time, to reflect a real-world use case. We developed a new statistical method that normalises these data with a significant improvement over existing methods. The data reveal strong linearity across the tissue dilution series and outstanding reproducibility across time and instruments, and show that machine learning can accurately predict the concentration of one tissue mixed within another.
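As a rough illustration of the two summary measures implied above (linearity across a dilution series and reproducibility across instruments), the following minimal Python sketch fits an ordinary least-squares line to intensity versus dilution fraction and computes a between-instrument coefficient of variation. The instrument names and intensity values are invented toy numbers, not data from this study, and the sketch is not the normalisation method described here.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the coefficient of variation in percent (sample SD / mean * 100)."""
    return 100.0 * stdev(values) / mean(values)


def linear_fit(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Returns (slope, intercept, r_squared).
    """
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical toy data: fraction of one tissue in the mixture, and the
# intensity of one peptide measured on two made-up instruments.
dilution_fraction = [0.0, 0.25, 0.5, 0.75, 1.0]
intensity_by_instrument = {
    "instrument_A": [10.0, 35.0, 60.0, 85.0, 110.0],
    "instrument_B": [12.0, 36.0, 62.0, 88.0, 112.0],
}

# Linearity: one fit per instrument across the dilution series.
fits = {
    name: linear_fit(dilution_fraction, intensities)
    for name, intensities in intensity_by_instrument.items()
}

# Reproducibility: CV across instruments at each dilution point.
cv_per_dilution = [
    coefficient_of_variation(list(point))
    for point in zip(*intensity_by_instrument.values())
]

if __name__ == "__main__":
    for name, (slope, intercept, r2) in fits.items():
        print(f"{name}: slope={slope:.2f} intercept={intercept:.2f} R^2={r2:.4f}")
    for frac, cv in zip(dilution_fraction, cv_per_dilution):
        print(f"dilution {frac:.2f}: between-instrument CV = {cv:.1f}%")
```

In practice such measures would be computed per peptide or protein across all replicates, time points and instruments; the toy example only shows the shape of the calculation.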
Together, these results establish how a single facility can function effectively in true high-throughput mode while integrating and analysing large proteomic data sets across multiple instruments. This enables the reproducible, high-throughput proteomics required to realise the vision of precision medicine.