Poster Presentation HUPO 2019 - 18th Human Proteome Organization World Congress

Kojak 2.0: New features for the analysis of cross-linked proteins (#572)

Michael R Hoopmann¹, Alex Zelter², Michael Riffle², Jimmy K Eng², Trisha N Davis², Robert L Moritz¹
  1. Institute for Systems Biology, Seattle, WA, United States
  2. University of Washington, Seattle, WA, United States

Shotgun MS analysis of cross-linked proteins is a versatile tool in proteomics, but the data analysis poses unique challenges that require specialized algorithms. Kojak, first released in 2015, was designed to perform database searches of MS/MS spectra from cross-linked peptides. Kojak is computationally efficient and highly customizable, supporting analyses with many different cross-linkers on both small and large datasets. Its simple interface, combined with adherence to open data standards, has enabled Kojak’s use under diverse experimental conditions and its integration into analytical pipelines. Development of the algorithm continues to build on these core features. Here we present Kojak version 2.0, a major update to the original algorithm.

Algorithm improvements include an optimized two-stage search strategy that prioritizes identification of the larger peptide of the cross-link in the first pass. In the second pass, only peptides that can link to the best first-pass candidates are searched, which substantially reduces computation time as databases grow larger and as more modifications are included in the search parameters. The scoring functions were updated to calculate E-values, including a separate E-value for each peptide in the cross-link. This allows each cross-linked peptide-spectrum match (PSM) to be assessed by the E-value of its lower-scoring peptide, an invaluable parameter for downstream validation algorithms such as PeptideProphet and Percolator. Kojak can now also use 15N-labeled proteins mixed with their natural-abundance counterparts to accurately distinguish inter-protein from intra-protein cross-links in homomultimers. Pipeline improvements include support for additional input formats (the open standards mzML, mzXML, and MGF, as well as Thermo RAW) and open output formats (pepXML and mzIdentML), allowing integration into any workflow that uses these widely adopted formats. Kojak 2.0 remains open-source and multi-platform.
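The two-pass idea and the per-peptide E-value rule can be illustrated with a toy Python sketch. This is not Kojak's implementation: the `Peptide` record, the scores, E-values and masses, and the `top_n` and `tol` parameters are all hypothetical, and actual spectrum scoring is omitted entirely. The sketch assumes the precursor mass equals the two peptide masses plus the cross-linker mass, so once a larger-peptide candidate is fixed, the partner mass is determined and only peptides near that mass need to be considered. Each pair is then reported by the larger (worse) of its two E-values, mirroring the idea of judging a cross-link by its lower-scoring peptide.

```python
from dataclasses import dataclass


@dataclass
class Peptide:
    name: str
    mass: float    # neutral monoisotopic mass in Da (toy values below)
    score: float   # hypothetical first-pass score, higher is better
    evalue: float  # hypothetical per-peptide E-value, lower is better


def two_stage_search(precursor_mass, linker_mass, peptides, top_n=2, tol=0.01):
    """Toy two-pass cross-link search; returns (larger, smaller, evalue) tuples, best first."""
    pair_mass = precursor_mass - linker_mass  # combined mass of the two peptides

    # Pass 1: keep only peptides heavy enough to be the larger partner,
    # then retain the top_n best-scoring candidates.
    larger = [p for p in peptides if p.mass >= pair_mass / 2]
    candidates = sorted(larger, key=lambda p: p.score, reverse=True)[:top_n]

    results = []
    for big in candidates:
        # Pass 2: the partner's mass is fixed by the precursor, so only
        # peptides within tol of that mass are considered.
        needed = pair_mass - big.mass
        for small in peptides:
            if abs(small.mass - needed) <= tol:
                # Judge the cross-link by its weaker peptide (larger E-value).
                combined = max(big.evalue, small.evalue)
                results.append((big.name, small.name, combined))

    return sorted(results, key=lambda r: r[2])


peptides = [
    Peptide("pepA", 1500.0, 50.0, 1e-6),
    Peptide("pepB", 1400.0, 40.0, 1e-3),
    Peptide("pepC", 800.0, 10.0, 1e-4),
    Peptide("pepD", 900.0, 5.0, 0.5),
]
linker = 138.068  # example only: approximate mass added by a DSS/BS3 cross-link
print(two_stage_search(1500.0 + 800.0 + linker, linker, peptides))
# [('pepA', 'pepC', 0.0001), ('pepB', 'pepD', 0.5)]
```

Because the second pass only looks inside a narrow mass window per first-pass candidate, the work grows with the number of candidates kept rather than with every possible peptide pair in the database.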
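The reasoning behind the 15N feature can also be sketched in Python, under stated assumptions: each protein copy is either fully 14N (light) or fully 15N (heavy), every nitrogen atom adds the 15N minus 14N mass difference (about 0.997 Da), and nitrogen in modifications or the cross-linker is ignored. The function names, the tolerance, and the 1000.0 Da light mass are hypothetical and do not reflect Kojak's parameters. Because each subunit of a homomultimer carries a single label, a link between a light and a heavy peptide can only have formed between two different subunits; a same-label link could be intra-protein or inter-protein, so it is left unresolved here.

```python
# Monoisotopic masses of the nitrogen isotopes, in Da
N14_MASS = 14.0030740
N15_MASS = 15.0001089
N15_SHIFT = N15_MASS - N14_MASS  # mass gained per nitrogen atom on full 15N labeling

# Nitrogen atoms per residue for the 20 standard amino acids
N_PER_RESIDUE = {
    "G": 1, "A": 1, "S": 1, "P": 1, "V": 1, "T": 1, "C": 1, "L": 1,
    "I": 1, "N": 2, "D": 1, "Q": 2, "K": 2, "E": 1, "M": 1, "H": 3,
    "F": 1, "R": 4, "Y": 1, "W": 2,
}


def nitrogen_count(sequence: str) -> int:
    """Count nitrogen atoms in an unmodified peptide sequence."""
    return sum(N_PER_RESIDUE[residue] for residue in sequence)


def heavy_shift(sequence: str) -> float:
    """Mass difference between the fully 15N-labeled and 14N forms of a peptide."""
    return nitrogen_count(sequence) * N15_SHIFT


def assign_label(observed_mass: float, light_mass: float, sequence: str, tol: float = 0.01):
    """Return 'light', 'heavy', or None if the observed mass matches neither form."""
    if abs(observed_mass - light_mass) <= tol:
        return "light"
    if abs(observed_mass - (light_mass + heavy_shift(sequence))) <= tol:
        return "heavy"
    return None


def classify_link(label_a: str, label_b: str) -> str:
    """Classify a cross-link from the isotope labels of its two peptides."""
    if label_a != label_b:
        # A light and a heavy peptide cannot come from the same subunit.
        return "inter-protein"
    # Same label: within one subunit, or between two like-labeled subunits.
    return "unresolved"


# Toy example: 1000.0 Da is an arbitrary light mass, not the real mass of PEPTIDEK
light = 1000.0
heavy = light + heavy_shift("PEPTIDEK")  # PEPTIDEK contains 9 nitrogen atoms
print(classify_link(assign_label(light, light, "PEPTIDEK"),
                    assign_label(heavy, light, "PEPTIDEK")))  # inter-protein
```

In a 1:1 light/heavy mixture, a residue pair that is only ever observed as same-label links is consistent with an intra-protein cross-link, whereas any mixed-label observation of that pair points to an inter-protein one.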