Poster Presentation HUPO 2019 - 18th Human Proteome Organization World Congress

An integrated atlas of protein expression in human cancer based on public proteomics data (#901)

Andrew F. Jarnuczak 1, Hanna Najgebauer 1, Mitra Barzine 1, Deepti J. Kundu 1, Fatemeh Ghavidel 2, Yasset Perez-Riverol 1, Irene Papatheodorou 1, Alvis Brazma 1, Juan Antonio Vizcaino 1
  1. EMBL-European Bioinformatics Institute, Hinxton, Cambridgeshire, United Kingdom
  2. University of Bergen, Bergen, Norway

DNA- and RNA-based omics technologies have been successfully applied to profile primary tumours and their corresponding cell line models. While these studies often include hundreds of samples, proteomics studies are usually much smaller in scale. However, given the number of mass spectrometry (MS)-based datasets now in the public domain, in-silico analyses can be used to integrate and reuse this valuable data. Here, we describe the generation of an integrated atlas of protein expression in human cancer using public data.

We collected and manually curated 7,171 raw files from 11 large-scale quantitative cancer proteomics datasets, most of them publicly available in the PRIDE database. The raw data was reprocessed in two combined analyses using MaxQuant. Quantification values were normalised through a multi-step procedure to obtain a complete integrated matrix of protein expression across all samples. Proteomics data was also integrated with public RNA expression information. All results are available through the EMBL-EBI resources Expression Atlas and PRIDE.

Protein expression values were obtained for 191 cancer cell lines, 246 clinical tumour samples and 35 non-malignant tissues. These covered 13 different cancer lineages, including breast, colorectal, ovarian and prostate.

By exploring this integrated resource, we found that baseline protein expression in cell lines was generally representative of clinical tumour samples. Notably, however, deviations from this overall trend were detected between cancer subtypes, as exemplified in the breast lineage. Furthermore, integration of RNA-seq data suggested that the level of transcriptional control in cell lines differed significantly between lineages. Additionally, in agreement with previous studies, we found that variation in mRNA levels was generally a poor predictor of changes in protein abundance.
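
One common way to assess how well mRNA variation predicts protein variation is a per-gene rank correlation across matched samples. The sketch below illustrates that idea only; it is not the analysis used in this work, the choice of Spearman correlation is an assumption, and the function names, gene labels, sample labels and values are all hypothetical toy inputs.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns None if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def per_gene_correlation(mrna, protein, min_samples=3):
    """Correlate mRNA and protein levels per gene across shared samples.

    Both inputs map gene -> {sample -> abundance} (a hypothetical layout).
    """
    result = {}
    for gene in mrna.keys() & protein.keys():
        samples = sorted(mrna[gene].keys() & protein[gene].keys())
        if len(samples) < min_samples:
            continue  # too few matched samples for a meaningful estimate
        rho = spearman([mrna[gene][s] for s in samples],
                       [protein[gene][s] for s in samples])
        if rho is not None:
            result[gene] = rho
    return result


# Toy values for illustration only; not taken from the atlas.
toy_mrna = {
    "GENE_A": {"s1": 1.0, "s2": 2.0, "s3": 3.0, "s4": 4.0},
    "GENE_B": {"s1": 1.0, "s2": 2.0, "s3": 3.0, "s4": 4.0},
}
toy_protein = {
    "GENE_A": {"s1": 10.0, "s2": 20.0, "s3": 30.0, "s4": 45.0},
    "GENE_B": {"s1": 4.0, "s2": 3.0, "s3": 2.0, "s4": 1.0},
}
correlations = per_gene_correlation(toy_mrna, toy_protein)
```

In a sketch like this, a distribution of per-gene coefficients centred near zero would correspond to mRNA variation being a poor predictor of protein variation, while values near 1 would indicate tight coupling.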

This work constitutes the first comprehensive meta-analysis of cancer cell line and tumour proteomes, and provides a publicly available resource of integrated protein expression data.