Poster Presentation HUPO 2019 - 18th Human Proteome Organization World Congress

Batch normalisation and mixed effects models in TMT or SWATH – two sides of the same coin (#773)

Dana Pascovici 1 , Jemma Wu 1 , Karthik Kamath 1 , Yunqi Wu 1 , Thiri Zaw 1 , Mehdi Mirzaei 1
  1. Australian Proteome Analysis Facility, Macquarie University, NSW, Australia

As discovery proteomics tackles ever larger datasets, batch effects become inevitable, whether in labelled multiplexed formats such as TMT, which comes in batches of 6 or 10, or in larger label-free experiments using SWATH/DIA. These effects can be accounted for either by normalisation, which removes the variation between batches, or by statistical models such as linear mixed effects models, which allow for random batch effects. Within the normalisation approach, global methods that act at the sample level and normalise for the total or median sample amount often fail to remove the batch variability completely, whereas IRS or ComBat normalisation can remove differences between batches more completely. The IRS method as published can only be applied to batches of similar size and is therefore well suited to TMT cross-run normalisation; however, we introduce a small variation of it that can also be applied when combining SWATH batches of uneven size. We present the effects of these normalisation approaches in a spike-in experiment containing three 10-plex TMT replicate runs with 2%, 5% and 10% of yeast peptides spiked into mouse cell lysate, which provides a known quantitation scenario. We also show that, for the purpose of determining differentially expressed proteins, similar results can be obtained by disregarding normalisation altogether and applying mixed effects models with random batch effects; the two approaches can thus be used to provide additional computational checks and balances. However, normalisation remains crucial if the goal is to provide a complete dataset for data mining or machine learning approaches.
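
To make the idea of per-protein batch scaling concrete, the following is a minimal Python sketch of an IRS-style normalisation that tolerates batches of uneven size. It is an illustration under stated assumptions, not the variation presented on the poster: the function name `irs_like_normalise` and the toy data are hypothetical, and the per-batch reference is taken to be the mean of all samples in the batch, standing in for the pooled reference channels that IRS normally relies on in TMT. Using the mean rather than the sum keeps the reference comparable when batches differ in size.

```python
import numpy as np

def irs_like_normalise(intensities, batches):
    """Scale each protein so that its per-batch mean agrees across batches.

    intensities: (n_proteins, n_samples) array of positive intensities.
    batches: length n_samples sequence of batch labels.
    Assumption: the batch mean acts as a stand-in reference for each protein.
    """
    x = np.asarray(intensities, dtype=float)
    batches = np.asarray(batches)
    labels = np.unique(batches)
    # Per-protein reference for each batch: mean over that batch's samples.
    refs = np.column_stack([x[:, batches == b].mean(axis=1) for b in labels])
    # Common per-protein target: geometric mean of the batch references.
    target = np.exp(np.log(refs).mean(axis=1))
    out = np.empty_like(x)
    for j, b in enumerate(labels):
        cols = batches == b
        # One scaling factor per protein and batch, applied to every sample in it.
        out[:, cols] = x[:, cols] * (target / refs[:, j])[:, None]
    return out

# Toy data: two proteins, one batch of 2 samples and one batch of 3 samples.
x = np.array([[10.0, 12.0, 20.0, 24.0, 22.0],
              [5.0, 5.0, 1.0, 1.0, 1.0]])
batches = ["A", "A", "B", "B", "B"]
norm = irs_like_normalise(x, batches)
```

After scaling, each protein has the same mean in every batch, while the ratios between samples within a batch are left unchanged. The sketch assumes strictly positive intensities with no missing values and a comparable mix of conditions in every batch; real data would need missing-value handling first.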