The objective of the Proteomic Data Commons (PDC)[1] is to make cancer-related proteomic datasets accessible to the public. As a domain-specific repository within the Cancer Research Data Commons (CRDC), the PDC aims to give researchers the ability to find and analyze proteomic data across a variety of tumor types. The PDC currently houses data, supported by a large collection of metadata attributes, for ~20 datasets produced by CPTAC and other large-scale cancer research programs, each with a cohort size >100. Most of the datasets in the PDC also have corresponding genomic data and images available in the Genomic Data Commons and The Cancer Imaging Archive.
The PDC continues the trend of bringing software and tools to the data in the cloud for analysis, rather than downloading multiple local copies of the data. Users may bring their own tools to co-analyze genomic and proteomic data derived from the same sample. They can also define cohorts of interest and perform their own analyses of any of the publicly available data, or of a combination of public and private data. Private data may be stored in a PDC workspace.
One specific type of analysis facilitated through the PDC is proteogenomic analysis. The PDC provides quick access to mappings of peptide identities and quantities onto the human genome, as well as to patient- or tumor-specific protein databases that incorporate genomic events such as single nucleotide variants and alternative splicing. It also enables fast, accurate, and convenient proteomic validation of novel genomic alterations through PepQuery.[2] These tasks are enabled through close integration with the Genomic Data Commons and NCI’s Cloud Resources.
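As a rough illustration of what a patient-specific protein database contributes, the following minimal Python sketch applies a single amino-acid substitution to a reference protein and lists the tryptic peptides that exist only in the variant form. It is not the PDC's or PepQuery's implementation: the function names, the toy sequence, and the simplified digestion rule (cleave after K or R unless followed by P, no missed cleavages) are all assumptions made for this example.

```python
def tryptic_peptides(protein):
    """Digest a protein in silico with a simplified trypsin rule:
    cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        # Keep the C-terminal remainder that does not end in K or R.
        peptides.append(protein[start:])
    return peptides


def apply_substitution(protein, position, ref, alt):
    """Apply a single amino-acid substitution at a 1-based position."""
    if protein[position - 1] != ref:
        raise ValueError(f"expected {ref} at position {position}")
    return protein[:position - 1] + alt + protein[position:]


def variant_specific_peptides(protein, position, ref, alt):
    """Return tryptic peptides of the variant protein that are absent
    from the digest of the reference protein."""
    reference = set(tryptic_peptides(protein))
    variant = tryptic_peptides(apply_substitution(protein, position, ref, alt))
    return [p for p in variant if p not in reference]


# Hypothetical toy sequence and variant (Y5C), chosen only for illustration.
toy_protein = "MKTAYIAKQRQISFVK"
novel_peptides = variant_specific_peptides(toy_protein, 5, "Y", "C")
```

Peptides returned this way are the ones a reference-only database cannot match, which is why a sample-specific database is needed to detect them in mass spectrometry data.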
The presentation will provide an overview of the PDC with specific examples of proteogenomic analyses.