The molecular consequences of gene mutations remain unclear, although many large-scale cancer genomics datasets have been produced and analyzed in recent years. Proteogenomics is needed to elucidate how such mutations affect the resulting protein expression. It is therefore important to establish a method for analyzing the changes in protein expression caused by gene mutations.
We obtained protein-protein interaction data from STRING (https://string-db.org/) and loaded them into the Neo4j (https://neo4j.com/) graph database. We then calculated all shortest paths between every pair of protein nodes. We also obtained gene mutation information for lung adenocarcinoma cell lines from DBKERO (http://kero.hgc.jp/) and measured the proteome expression of these cell lines by LC/MS/MS. Combining the two datasets, we mapped up- and down-regulated proteins onto the shortest paths and extracted the linearly up- or down-regulated paths in which all nodes are simultaneously co-regulated. Finally, we merged the extracted paths into a whole co-regulated network and traversed this network starting from each mutated gene. In this way, continuously co-regulated pathways originating from a mutated gene could be obtained.
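The path-extraction steps can be illustrated with a minimal sketch. It is not the implementation used in this work: it replaces Neo4j with a plain breadth-first search in Python, runs on a hypothetical five-protein toy network (A to E), and assumes that "co-regulated" means every node on a path shares the same direction of regulation. All function names and the input data are illustrative only.

```python
from collections import deque
from itertools import combinations


def all_shortest_paths(adj, source, target):
    """Enumerate every shortest path between two nodes by BFS."""
    dist, parents, queue = {source: 0}, {source: []}, deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for nbr in adj.get(node, ()):
            if nbr not in dist:                    # first visit
                dist[nbr] = dist[node] + 1
                parents[nbr] = [node]
                queue.append(nbr)
            elif dist[nbr] == dist[node] + 1:      # another equally short route
                parents[nbr].append(node)
    if target not in dist:
        return []
    # Walk back from the target through the recorded parents.
    paths, stack = [], [[target]]
    while stack:
        partial = stack.pop()
        if partial[-1] == source:
            paths.append(partial[::-1])
        else:
            stack.extend(partial + [p] for p in parents[partial[-1]])
    return paths


def coregulated_paths(adj, regulation):
    """Keep shortest paths whose nodes are all 'up' or all 'down' (assumed criterion)."""
    kept = []
    for a, b in combinations(sorted(adj), 2):
        for path in all_shortest_paths(adj, a, b):
            states = {regulation.get(n) for n in path}
            if states == {"up"} or states == {"down"}:
                kept.append(path)
    return kept


def merge_paths(paths):
    """Combine the extracted paths into one undirected co-regulated network."""
    net = {}
    for path in paths:
        for u, v in zip(path, path[1:]):
            net.setdefault(u, set()).add(v)
            net.setdefault(v, set()).add(u)
    return net


def reachable_from(net, start):
    """Traverse the co-regulated network starting from a mutated gene."""
    seen, queue = {start}, deque([start])
    while queue:
        for nbr in net.get(queue.popleft(), ()):
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return seen


# Hypothetical toy interaction network and regulation calls (not real data).
adj = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A", "D"},
       "D": {"B", "C", "E"}, "E": {"D"}}
regulation = {"A": "up", "B": "up", "C": "down", "D": "up", "E": "up"}

net = merge_paths(coregulated_paths(adj, regulation))
print(sorted(reachable_from(net, "A")))  # ['A', 'B', 'D', 'E']
```

In this toy case the down-regulated protein C breaks every path that passes through it, so the traversal from the mutated gene A recovers only the continuously up-regulated route A, B, D, E.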
We successfully obtained all continuously co-regulated protein pathways linked to mutated genes for each cell line sample. The characteristics of these pathways differed among gene mutation subtypes.
Our novel network analysis method aids the interpretation of proteogenomic data and makes it possible to characterize cancer subtypes. It can also uncover novel causative routes arising from gene mutations that are difficult to find in existing knowledge-based pathway databases.