neXtProt (www.nextprot.org) provides a large, coherent, up-to-date human protein dataset, unmatched in terms of scope and quality [1]. This dataset combines manually curated data from the literature with high-throughput genomic, transcriptomic and proteomic data from selected resources, using a single interoperable format and community-approved standards. Full traceability is ensured by extensive use of metadata. With a SPARQL-based advanced query tool, supported by extensive documentation and examples, neXtProt users can easily carry out complex searches on this data corpus and perform federated queries encompassing data found elsewhere (doi.org/10.7490/f1000research.1116829.1).
neXtProt has served as the reference knowledgebase for the HUPO Human Proteome Project (HPP) since 2013 and provides specific tools to design and analyze proteomics experiments. It integrates mass spectrometry data from PeptideAtlas and displays them in the Proteomics and Peptide views. Based on this and other information, neXtProt establishes the yearly reference set of “missing proteins”, which are predicted from genome analysis but have never been experimentally confirmed [2]. Identifying these missing proteins is one of the main goals of the chromosome-centric HPP (C-HPP); it is challenging and often requires targeted experiments. The Peptide Uniqueness Checker [3] and the new Protein Digestion tool can be combined to plan such experiments by determining which unique peptides can theoretically be obtained by digesting a target protein with a given protease. Finally, neXtProt provides sequences, PTM and variation data in the PEFF format [4], so that a wealth of proteoforms can be taken into account when analyzing MS spectra.
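The idea of combining digestion with a uniqueness check can be illustrated with a minimal Python sketch. It is not the neXtProt implementation: the function names `digest` and `unique_peptides`, the toy sequences, and the simplified trypsin rule (cleave after K or R unless the next residue is P) are assumptions made for illustration only. The sketch also ignores isoforms, sequence variants and isobaric residues, all of which matter in practice.

```python
def digest(sequence, missed_cleavages=0, min_length=7):
    """In silico digestion with a simplified trypsin rule (assumption):
    cleave C-terminal to K or R, except when the next residue is P."""
    # Collect cut positions, including both ends of the sequence.
    cut_sites = [0]
    for i, residue in enumerate(sequence[:-1]):
        if residue in "KR" and sequence[i + 1] != "P":
            cut_sites.append(i + 1)
    cut_sites.append(len(sequence))

    peptides = []
    last = len(cut_sites) - 1
    for start in range(last):
        # Join up to `missed_cleavages` extra fragments to the right.
        for end in range(start + 1, min(start + 1 + missed_cleavages, last) + 1):
            peptide = sequence[cut_sites[start]:cut_sites[end]]
            if len(peptide) >= min_length:
                peptides.append(peptide)
    return peptides


def unique_peptides(target_id, proteome, **digest_options):
    """Return peptides of the target that occur in no other protein.

    `proteome` is a hypothetical dict mapping protein IDs to sequences.
    """
    candidates = set(digest(proteome[target_id], **digest_options))
    others = [seq for pid, seq in proteome.items() if pid != target_id]
    return sorted(p for p in candidates if not any(p in seq for seq in others))


# Toy example with made-up sequences (not real neXtProt entries).
toy_proteome = {
    "P1": "MKTAYIAKQRPLVESR",
    "P2": "MKTAYIAKGGG",
}
print(unique_peptides("P1", toy_proteome, min_length=1))
```

In a real experiment design, the candidate list would additionally be filtered by peptide length and checked against all known isoforms and variants before choosing targets.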
neXtProt also provides the reference set of “proteins with unknown function”, either predicted or experimental [5]. Their characterization is the goal of a new initiative from the C-HPP [6]. Information about protein-protein interactions, subcellular location or expression captured in neXtProt can help build functional hypotheses that can then be validated experimentally [5].
neXtProt is constantly optimizing its tools to serve the proteomics community. Feedback is welcome.