Poster Presentation HUPO 2019 - 18th Human Proteome Organization World Congress

Genome Annotation of a Model Diatom Phaeodactylum tricornutum Using an Integrated Proteogenomic Pipeline (#535)

Feng Ge 1
  1. Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, Hubei, China

Diatoms comprise a diverse and ecologically important group of eukaryotic phytoplankton that significantly
contributes to marine primary production and global carbon cycling. Phaeodactylum tricornutum
is commonly used as a model organism for studying diatom biology. Although its genome was sequenced
in 2008, a high-quality genome annotation is still not available for this diatom. Here we report the development
of an integrated proteogenomic pipeline and its application to improving the annotation of the P. tricornutum
genome using mass spectrometry (MS)-based proteomics data. Our proteogenomic analysis unambiguously
identified approximately 8300 genes and revealed 606 novel proteins, 506 revised genes, 94 splice
variants, 58 single amino acid variants, and a holistic view of post-translational modifications in P. tricornutum.
We experimentally confirmed a subset of novel events and obtained MS evidence for more than
200 micropeptides in P. tricornutum. These findings expand the genomic landscape of P. tricornutum
and provide a rich resource for the study of diatom biology. The proteogenomic pipeline we developed
in this study is applicable to any sequenced eukaryote and thus represents a significant contribution to
the toolset for eukaryotic proteogenomic analysis. The pipeline and its source code are freely available
at https://sourceforge.net/projects/gapeproteogenomic.