Background
Since 2010, the Human Proteome Project (HPP) has, through neXtProt, progressively assigned protein existence level 1 (PE1) status to human proteins according to community-agreed, high-stringency mass spectrometry (MS) metrics. However, identifying human membrane proteins at this level of stringency remains problematic. The scarcity of arginine (R) and lysine (K) residues in transmembrane domains (TMDs), together with trypsin's restricted activity in this hydrophobic environment, leads to the underrepresentation of TMD tryptic peptides in bottom-up MS experiments. These observations led us to predict the experimental tryptic peptide yield of TMD-containing membrane proteins (TMD-MPs) if trypsin were precluded from cleaving at R/K residues within TMDs. In silico analysis of the tryptic peptides derived from the complete sequence and from the non-TMD regions (N- and C-terminal strands plus ecto- and endo-loop domains) of all TMD-MPs shows that a number of these proteins cannot generate tryptic peptides that meet high-stringency HPP guidelines.
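The cleavage argument above can be made concrete with a minimal sketch. It assumes the common simplified trypsin rule (cleave C-terminal to K or R unless the next residue is P, with no missed cleavages). The function name `trypsin_digest`, its `blocked` parameter (0-based positions where cleavage is suppressed, standing in for R/K residues inside a TMD) and the toy sequence are hypothetical illustrations, not the pipeline used in this study.

```python
from collections.abc import Container


def trypsin_digest(sequence: str, blocked: Container[int] = frozenset()) -> list[tuple[int, str]]:
    """Digest `sequence` in silico with a simplified trypsin rule.

    Cleaves C-terminal to K or R unless the next residue is P (no missed
    cleavages). Cleavage is also skipped at any 0-based position in
    `blocked`, a hypothetical way to model trypsin being unable to act on
    R/K residues buried in a TMD. Returns (0-based start, peptide) pairs.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        followed_by_pro = i + 1 < len(sequence) and sequence[i + 1] == "P"
        if residue in "KR" and not followed_by_pro and i not in blocked:
            peptides.append((start, sequence[start:i + 1]))
            start = i + 1
    if start < len(sequence):  # C-terminal fragment not ending in K/R
        peptides.append((start, sequence[start:]))
    return peptides


# Hypothetical toy protein: short soluble N-terminus, then a hydrophobic
# stretch treated as a TMD (0-based positions 5-29) with a single R at 13.
TOY = "MSDEKLIVAVLFGRLAWIIVSLLGAFVIYGNEQGKPAEDR"
TOY_TMD = set(range(5, 30))

if __name__ == "__main__":
    print([p for _, p in trypsin_digest(TOY)])
    # ['MSDEK', 'LIVAVLFGR', 'LAWIIVSLLGAFVIYGNEQGKPAEDR']
    print([p for _, p in trypsin_digest(TOY, blocked=TOY_TMD)])
    # ['MSDEK', 'LIVAVLFGRLAWIIVSLLGAFVIYGNEQGKPAEDR']
```

With the R inside the hypothetical TMD blocked, the hydrophobic region survives only as part of a single 35-residue peptide, which illustrates why such regions contribute few readily detectable peptides.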
Method
The UniProt-defined ecto- and endo-domains of TMD-MPs, including their N- and C-terminal strands (i.e., the sequence with only the TMDs removed), were digested with trypsin in silico. We then determined which proteins could potentially meet the HPP MS-based PE1 assignment criteria from these “soluble” hydrophilic domains alone, i.e., generate at least two MS-detectable, uniquely mapping, non-nested tryptic peptides of 9 or more amino acids (AAs). Further analyses were performed at lower stringency to assess how stringency affects the number of qualifying proteins.
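The filtering step described above can be sketched as follows. This is a minimal illustration under several assumptions: TMD coordinates are UniProt-style 1-based inclusive ranges; each non-TMD segment is digested separately with the simplified trypsin rule; "MS-detectable" is approximated by a hypothetical 9 to 40 AA length window (the upper bound is an assumption); "uniquely mapping" means the peptide occurs as a substring of exactly one sequence in the proteome, with I and L treated as equivalent; and "non-nested" means neither peptide sequence contains the other. All function names and toy sequences are hypothetical. Lowering `min_len` and `min_peptides` mimics the lower-stringency analyses.

```python
from itertools import combinations


def trypsin_digest(sequence: str) -> list[str]:
    """Simplified trypsin rule: cleave after K/R unless followed by P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        followed_by_pro = i + 1 < len(sequence) and sequence[i + 1] == "P"
        if residue in "KR" and not followed_by_pro:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def non_tmd_segments(sequence: str, tmds: list[tuple[int, int]]) -> list[str]:
    """Return the regions outside the TMDs (N-/C-terminal strands and loops).

    `tmds` holds 1-based, inclusive (start, end) pairs (assumed convention).
    """
    segments, cursor = [], 0  # cursor: 0-based start of the next non-TMD stretch
    for start, end in sorted(tmds):
        if start - 1 > cursor:
            segments.append(sequence[cursor:start - 1])
        cursor = max(cursor, end)
    if cursor < len(sequence):
        segments.append(sequence[cursor:])
    return segments


def maps_uniquely(peptide: str, protein_id: str, proteome: dict[str, str]) -> bool:
    """True if the peptide occurs in exactly this one protein (I and L treated as equal)."""
    target = peptide.replace("I", "L")
    hits = [pid for pid, seq in proteome.items() if target in seq.replace("I", "L")]
    return hits == [protein_id]


def has_non_nested_set(peptides: set[str], k: int) -> bool:
    """True if k peptides exist where no sequence contains another (brute force)."""
    for combo in combinations(sorted(peptides), k):
        if all(a not in b and b not in a for a, b in combinations(combo, 2)):
            return True
    return False


def meets_criteria(protein_id, proteome, tmds, min_len=9, max_len=40, min_peptides=2) -> bool:
    """Check whether the non-TMD regions alone could satisfy the peptide criteria."""
    candidates = set()
    for segment in non_tmd_segments(proteome[protein_id], tmds):
        for peptide in trypsin_digest(segment):
            if min_len <= len(peptide) <= max_len and maps_uniquely(peptide, protein_id, proteome):
                candidates.add(peptide)
    return has_non_nested_set(candidates, min_peptides)


# Hypothetical toy proteome; TMD coordinates are 1-based and inclusive.
TOY_PROTEOME = {
    "P1": "MASTEDGLPQK" "ENWTSYGFDHR" "LLVAVFGLLAWIIVSLLGAF" "VYGK",  # 1 TMD
    "P2": "MDSK" "LLVAVFGLLAWIIVSLLGAF" "GNDSR" "IIAGLVFSWLLAVGFLIVAL" "EEK",  # 2 TMDs
    "P3": "GGENWTSYGFDHRAAK",  # soluble; shares ENWTSYGFDHR with P1
    "P4": "MPEWNDTAYSKQFHGDNTLPVER",  # soluble control
}
TOY_TMDS = {"P1": [(23, 42)], "P2": [(5, 24), (30, 49)], "P3": [], "P4": []}

if __name__ == "__main__":
    for pid, tmds in TOY_TMDS.items():
        strict = meets_criteria(pid, TOY_PROTEOME, tmds)
        relaxed = meets_criteria(pid, TOY_PROTEOME, tmds, min_len=7, min_peptides=1)
        print(pid, "strict:", strict, "relaxed:", relaxed)
```

In this toy set, P1 fails the strict check because one of its two soluble peptides is shared with P3, but passes the relaxed check; P2, whose loops and termini are all short, fails both.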
Result
In total, 204 of 3,878 TMD-MPs (~5%) in the human proteome could not generate peptides meeting current HPP MS guidelines; this number gradually decreased to 103 (~2.7%) as the stringency criteria were relaxed to a single uniquely mapping peptide of 7+ AAs. Olfactory receptors were by far the largest protein family contributing to these totals. We also observed that 54 (~1.4%) of all TMD-MPs failed to generate qualifying tryptic peptides even when their complete sequence was digested.
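As a quick arithmetic check, each percentage is the reported count divided by the 3,878 TMD-MPs. The sketch below only reproduces that division (variable names and labels are illustrative paraphrases) and prints each share to one decimal place, giving roughly 5.3%, 2.7% and 1.4%.

```python
# Counts as reported above; labels paraphrase the criteria used.
TOTAL_TMD_MPS = 3878
REPORTED = {
    "non-TMD regions, HPP criteria (2+ peptides, 9+ AAs)": 204,
    "non-TMD regions, relaxed (1 peptide, 7+ AAs)": 103,
    "complete sequence": 54,
}


def percent(count: int, total: int = TOTAL_TMD_MPS) -> float:
    """Share of all TMD-MPs, in percent."""
    return 100 * count / total


if __name__ == "__main__":
    for label, count in REPORTED.items():
        print(f"{label}: {count}/{TOTAL_TMD_MPS} = {percent(count):.1f}%")
```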