Background: Human leukocyte antigen (HLA) molecules are cell-surface glycoproteins that present peptide antigens for surveillance by T lymphocytes seeking signs of disease. Mass spectrometric analysis allows us to identify large numbers of these peptides (the immunopeptidome) following affinity purification from cell lysate. However, in recent years there has been a growing awareness of the ‘dark side’ of the immunopeptidome: unconventional peptide epitopes that elude detection by conventional search methods because their sequences are not present in reference protein databases.
Methodologies: Here we establish a bioinformatic workflow to aid identification of peptides generated by non-canonical translation of mRNA. The workflow incorporates both standard transcriptomics software and novel computer programs to produce cell line-specific protein databases based on 3-frame translation of the transcriptome. Optionally, known mutations can be included to produce an ‘alternate’ transcriptome and corresponding imputed protein database. We then search our experimental data against both transcriptome-based and standard databases using PEAKS Studio. Finally, further novel software helps to compare the various result sets arising for each sample, and pinpoint putative genomic origins for the identified unconventional sequences.
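To make the idea of a 3-frame translated database concrete, the following is a minimal sketch only, not the workflow's actual software. It assumes each transcript is translated in its three forward frames and split at stop codons, and that short segments are discarded. The function names, the entry-naming scheme and the `min_len` cut-off are hypothetical choices made for illustration.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq, frame):
    """Translate one forward frame (0, 1 or 2); unknown codons become 'X'."""
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def three_frame_entries(transcript_id, seq, min_len=8):
    """Return (name, protein segment) pairs for the three forward frames.

    Assumption: segments are delimited by stop codons, and segments shorter
    than min_len (a hypothetical threshold) are dropped to limit database size.
    """
    entries = []
    for frame in range(3):
        protein = translate(seq, frame)
        for idx, segment in enumerate(protein.split("*")):
            if len(segment) >= min_len:
                name = f"{transcript_id}_f{frame + 1}_s{idx}"
                entries.append((name, segment))
    return entries


if __name__ == "__main__":
    # Toy transcript; real input would come from an assembled transcriptome.
    for name, segment in three_frame_entries("tx1", "ATGGCCTAAGGG", min_len=2):
        print(f">{name}\n{segment}")
```

Because only forward frames are translated, a database built this way is roughly half the size of a 6-frame translation of the same sequences, which is one reason a transcriptome-based search space can stay tractable.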
Results: We have trialled the workflow to study the immunopeptidome of the acute myeloid leukaemia cell line THP-1, starting with THP-1 RNA-Seq data downloaded from the Sequence Read Archive. Searching against Swiss-Prot, we confidently identified over 8000 peptides from three replicates of purified THP-1 HLA peptides. Using the transcriptome-based databases, we recapitulated >70% of these, and also identified over 250 unconventional peptides, many of which might be generated by translating UTRs or the ‘wrong’ frame.
Conclusions: Our workflow, which we term ‘immunopeptidogenomics’, can provide databases that include pertinent unconventional sequences, and that can be tailored for neoepitope discovery in cancer, without becoming unsearchably large. Immunopeptidogenomics is a step towards the unbiased search approaches needed to illuminate the dark side of the immunopeptidome.