Proteins are biological macromolecules critical for the structure, function and regulation of human cells and tissues. The human genome draft has been available since 2003, yet not all coding genes have known protein products. The Human Proteome Project (HPP) was launched in 2010 with the aim of mapping the entire human proteome. The HPP community has identified protein products for 88.62% of the coding genes; the proteins corresponding to the remaining 11.38% are missing. Because most of the missing proteins are membrane proteins, which might have clinical implications, it is important to identify them in order to exploit their therapeutic potential. Several technical challenges make the identification of missing proteins (MPs) by mass spectrometry (MS) a difficult task.
The largest family among the missing proteins is the olfactory receptors (ORs), which belong to the G-protein-coupled receptor (GPCR) superfamily. No convincing MS evidence has been found for even a single OR. Four ORs have been given protein status on the basis of orthogonal evidence. We therefore collated the available orthogonal evidence for ORs from the published literature. In particular, the available ligand evidence can be used for novel agonist prediction.
We applied several classical machine learning (ML) and deep learning methods to OR1G1, an ectopic OR with a broad ligand spectrum. OR1G1 (UniProt ID: P47890) is a member of family 1 of the OR superfamily, and its gene is located on chromosome 17. OR1G1 is ectopically expressed in gut enterochromaffin cells (normal and tumour), where it is known to be responsible for serotonin release. On the basis of classifier performance, we applied the naive Bayes classifier to a large test dataset, which yielded high-probability predictions. Such an approach will assist in collecting experimental evidence for the missing olfactory proteome.
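To illustrate the kind of naive Bayes agonist prediction described above, the following is a minimal sketch and not the pipeline used in this work. It assumes that each compound is encoded as a binary fingerprint and labelled as agonist (1) or non-agonist (0); the feature encoding, the Bernoulli variant of naive Bayes, the toy data, and all function and variable names are hypothetical choices made only for this example.

```python
import math


def train_bernoulli_nb(X, y, alpha=1.0):
    """Fit a Bernoulli naive Bayes model on binary feature vectors.

    X: list of equal-length lists of 0/1 bits (hypothetical fingerprints).
    y: list of 0/1 labels (1 = agonist, 0 = non-agonist).
    Assumes both classes occur at least once in y.
    """
    n_features = len(X[0])
    model = {}
    for cls in (0, 1):
        rows = [x for x, label in zip(X, y) if label == cls]
        # Laplace-smoothed probability that each bit is set within this class.
        bit_prob = [
            (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(n_features)
        ]
        log_prior = math.log(len(rows) / len(X))
        model[cls] = (log_prior, bit_prob)
    return model


def predict_proba(model, x):
    """Return the posterior probability that x belongs to class 1."""
    log_post = {}
    for cls, (log_prior, bit_prob) in model.items():
        total = log_prior
        for bit, p in zip(x, bit_prob):
            total += math.log(p if bit else 1.0 - p)
        log_post[cls] = total
    # Normalise in a numerically stable way.
    top = max(log_post.values())
    weights = {c: math.exp(v - top) for c, v in log_post.items()}
    return weights[1] / (weights[0] + weights[1])


# Toy training set: invented 4-bit fingerprints, not real ligand data.
X_train = [
    [1, 1, 0, 0], [1, 0, 1, 0], [1, 1, 1, 0],  # labelled agonists
    [0, 0, 0, 1], [0, 1, 0, 1], [0, 0, 1, 1],  # labelled non-agonists
]
y_train = [1, 1, 1, 0, 0, 0]

model = train_bernoulli_nb(X_train, y_train)

# Hypothetical unlabelled candidates, ranked by predicted agonist probability.
candidates = {"cand_A": [1, 1, 0, 0], "cand_B": [0, 0, 1, 1]}
scores = {name: predict_proba(model, fp) for name, fp in candidates.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
```

In such a setup, candidates at the top of `ranked` would be the ones prioritised for experimental testing; the probability cut-off for a "high-probability" prediction is a separate choice that the sketch does not make.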