Poster Presentation HUPO 2019 - 18th Human Proteome Organization World Congress

The proteome landscape of the kingdoms of life (#737)

Johannes B. Mueller 1, Philipp E. Geyer 1,2, Ana R. Colaço 2, Peter V. Treit 1, Sophia Doll 1, Sebastian V Winter 1, Jakob Bader 1, Niklas Köhler 3, Fabian Theis 3, Alberto Santos 2, Matthias Mann 1,2
  1. Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
  2. Faculty of Health Sciences, University of Copenhagen, NNF Center for Protein Research, Copenhagen, Denmark
  3. Institute of Computational Biology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany

Improvements in sequencing technologies have produced vast amounts of genomic data, including complete genome sequences for a large number of organisms. This opens up the possibility of exploring their proteomes in parallel; however, no large-scale effort in this direction has been made so far. Moreover, quantitative proteomics data exist almost exclusively for the most common model organisms. Here, we set out on a proteomic exploration of 100 sequenced organisms selected from across the entire tree of life, analyzing their proteomes with state-of-the-art label-free quantitation methods. To make our proteome data extensible by other research laboratories, we took advantage of a novel reversed-phase chromatographic column, the µPAC (PharmaFluidics). Its lithographically etched pillar structure does not suffer from the typical drawbacks of bead-based columns and yields highly reproducible retention times across measurements and columns. With about 340,000 proteins identified from more than two million sequence-unique tryptic peptides, our proteomic map of the tree of life represents by far the largest proteome dataset to date. It confirms the existence of a very large number of predicted proteins, increasing the total number known to the research community by about 50%. This extensive dataset is well suited for deep learning algorithms that predict technical as well as biological parameters, which we demonstrate for peptide retention times in LC-MS measurements. Quantitative comparison of protein expression and functional abundances across species in a graph database allows exploration of proteins, pathways and organelles across phyla. In particular, we use quantitative protein levels to visualize the conservation of protein groups with distinct cellular functions, which allows us to rank those functions within and between organisms.
To make these data easily accessible, our graph database will be available via an interactive website, which enables browsing of structural information, functional annotations and protein homologies.
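The abstract reports its dataset size in sequence-unique tryptic peptides. The authors' processing pipeline is not described here, so the following is only an illustrative Python sketch (Python 3.7+ is needed for splitting on a zero-width regex match). It assumes the simplified trypsin rule (cleave after K or R unless the next residue is P), no missed cleavages, and a hypothetical peptide length window of 7 to 30 residues. The function names `tryptic_digest` and `count_unique_peptides` are invented for this example.

```python
import re
from collections import Counter

# Simplified trypsin rule (an assumption for illustration): cleave on the
# C-terminal side of K or R, unless the following residue is P.
CLEAVAGE_SITE = re.compile(r"(?<=[KR])(?!P)")


def tryptic_digest(sequence, min_len=7, max_len=30):
    """Return fully tryptic peptides (no missed cleavages) within a length window."""
    peptides = CLEAVAGE_SITE.split(sequence)
    # The length filter also drops the empty string left by a C-terminal K/R.
    return [p for p in peptides if min_len <= len(p) <= max_len]


def count_unique_peptides(proteome):
    """proteome maps a protein ID to its amino acid sequence.

    Returns the number of distinct peptide sequences and the set of
    peptides that occur in exactly one protein.
    """
    occurrences = Counter()
    for seq in proteome.values():
        # set(): count each peptide once per protein, even if it repeats within it.
        occurrences.update(set(tryptic_digest(seq)))
    single_protein = {p for p, n in occurrences.items() if n == 1}
    return len(occurrences), single_protein


if __name__ == "__main__":
    toy = {"P1": "PEPTIDEKSAMPLERPEPTIDER", "P2": "PEPTIDEKLLLLLLLR"}
    n_distinct, single = count_unique_peptides(toy)
    print(n_distinct, sorted(single))
```

In this toy example, "SAMPLERPEPTIDER" stays intact because the R is followed by P, and "PEPTIDEK" is shared by both proteins. Real search engines use configurable enzyme rules and allow missed cleavages, which this sketch deliberately omits.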