Poster Presentation HUPO 2019 - 18th Human Proteome Organization World Congress

SAAVpedia: identification, functional annotation, and retrieval of single amino acid variants for proteogenomic interpretation (#585)

Heeyoun Hwang 1, Soo Youn Lee 1, Young-Mook Kang 1, Hyejin Kim 1,2, Dong Geun Kim 1,2, Ji Eun Jeong 1,2, Jin Young Kim 1, Jong Shin Yoo 1,2
  1. Korea Basic Science Institute, Cheongju, CHUNGBUK, Rep. of Korea
  2. Graduate School of Analytical Science and Technology, Chungnam National University, Daejeon, Rep. of Korea

Next-generation genome sequencing has enabled the discovery of numerous disease- and drug-associated non-synonymous single nucleotide variants (nsSNVs) that alter the amino acid sequence of a protein. Although several studies have attempted to characterize pathogenic nsSNVs, few of these variants have been confirmed as single amino acid variants (SAAVs) at the protein level. Here, we developed the SAAVpedia platform to identify, annotate, and retrieve pathogenic SAAV candidates from proteomic and genomic data. The platform consists of four modules: SAAVidentifier, SAAVannotator, SNV/SAAVretriever, and SAAVvisualizer. SAAVidentifier provides a reference database containing 18,206,090 SAAVs and performs identification and quality assessment of SAAVs. SAAVannotator provides functional annotation with biological, clinical, and pharmacological information for the interpretation of condition-specific SAAVs. SNV/SAAVretriever enables bidirectional navigation between relevant SAAVs and nsSNVs across diverse genomic and proteomic data. SAAVvisualizer provides various statistical plots based on the functional annotations of detected SAAVs. To demonstrate the utility of SAAVpedia, we applied a proteogenomic pipeline with protein-protein interaction network analysis to proteomic data from breast cancer and glioblastoma patients. We will extend the SAAV validation database with a variety of proteomic data to further biomedical research. SAAVpedia will play a key role in pathogenic biomarker discovery based on the interpretation of massive proteogenomic data. The SAAVpedia is available at