The identification and analysis of proteomics data rely inherently on databases, which must therefore be of consistently high quality. UniProt is a comprehensive, expert-led, publicly available database of protein sequence, function and variation information. UniProt currently holds over 13,000 reference proteomes, which are continually updated and reviewed through collaborations with sources such as Ensembl, RefSeq and ENA, and with proteome-centric repositories such as ProteomicsDB, PeptideAtlas and MaxQB.
To facilitate searching of proteomics data, reference proteomes can be downloaded in FASTA format or queried programmatically using the UniProt API. This gives researchers direct access to data mapped to UniProt from cross-referenced databases, including results from large-scale studies, variation annotation and proteomics data. Data are available for download and querying in a range of formats, including XML, FASTA and the recently published HUPO-PSI PEFF.
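As a minimal sketch of programmatic access, the Python snippet below builds a download URL for one reference proteome and parses the returned FASTA text. The endpoint `https://rest.uniprot.org/uniprotkb/stream`, the query syntax `proteome:<ID>` and the example identifier UP000005640 (human) are assumptions taken from the current UniProt website REST interface, not details given in this abstract; the function names are purely illustrative.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: UniProt website REST endpoint; not named in the abstract.
UNIPROT_STREAM_URL = "https://rest.uniprot.org/uniprotkb/stream"


def build_proteome_url(proteome_id, fmt="fasta"):
    """Build a download URL for all UniProtKB entries in one proteome."""
    params = {"query": f"proteome:{proteome_id}", "format": fmt}
    return f"{UNIPROT_STREAM_URL}?{urlencode(params)}"


def parse_fasta(text):
    """Return a dict mapping each FASTA header (without '>') to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; sequence lines that follow belong to it.
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def fetch_proteome(proteome_id):
    """Download a proteome as FASTA and parse it (requires network access)."""
    with urlopen(build_proteome_url(proteome_id)) as response:
        return parse_fasta(response.read().decode("utf-8"))
```

Calling `fetch_proteome("UP000005640")` would download the full set of entries for that proteome, which can be large; for a search database it may be more practical to save the response to disk than to hold it in memory.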
To facilitate further investigation of target proteins, mapped proteomics data for unique peptides are displayed graphically alongside variant, domain and post-translational modification sites for each canonical protein sequence. This allows researchers to place their data in the context of a protein's sequence and to see the functional and biological information currently available for their proteins of interest.
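The idea of viewing a peptide in the context of a protein's sequence can be sketched in a few lines of Python: find where the peptide falls in the canonical sequence and list the annotated sites it covers. The sequence, peptide and feature positions below are hypothetical toy data rather than a real UniProt entry, and the function names are illustrative only; this is not the method UniProt itself uses for peptide mapping.

```python
def locate_peptide(protein_seq, peptide):
    """Return 1-based inclusive (start, end) of every peptide occurrence."""
    spans = []
    start = protein_seq.find(peptide)
    while start != -1:
        spans.append((start + 1, start + len(peptide)))
        # Continue one residue further on so overlapping matches are found.
        start = protein_seq.find(peptide, start + 1)
    return spans


def features_in_span(features, span):
    """Return the features whose 1-based position falls inside the span."""
    start, end = span
    return [f for f in features if start <= f["position"] <= end]


# Hypothetical toy data, not a real UniProt entry.
protein = "MKTAYIAKQRQISFVKSHFSRQ"
features = [
    {"type": "variant", "position": 9},
    {"type": "modified residue", "position": 14},
    {"type": "variant", "position": 20},
]

for span in locate_peptide(protein, "QRQISFVK"):
    covered = [f["type"] for f in features_in_span(features, span)]
    print(span, covered)
```

Here a peptide is only checked against a single sequence; establishing that a peptide is unique in the proteomics sense would require checking it against every protein in the proteome.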
All data are freely accessible from www.uniprot.org.