The characterization of complex mass spectrometry data obtained from metaproteomics or clinical studies presents unique challenges, but also offers potential insights into the pathogenesis of human disease. Previous approaches rely on prior expectation or knowledge of the likely sample composition to construct focused search libraries, which is potentially limiting in many cases. Here we present MetaNovo, a novel software pipeline that directly estimates the proteins and species present in complex mass spectrometry samples at the level of expressed proteomes, using de novo sequence tag matching and probabilistic optimization of very large sequence databases prior to target-decoy search. We validated the pipeline against results obtained from the recently published MetaPro-IQ pipeline (Zhang et al., 2016) on 8 human mucosal-luminal interface samples, and found comparable numbers of peptide and protein identifications. We then showed that, using an unbiased search of the entire release of UniProt (ca. 90 million protein sequences), MetaNovo identified a bacterial taxonomic distribution similar to that found using a small, focused, matched metagenome database, while simultaneously identifying proteins in the samples derived from other organisms that are missed by 16S or shotgun sequencing and by previous metaproteomic methods. Using MetaNovo to analyze a set of single-organism human neuroblastoma cell-line samples (SH-SY5Y) against UniProt, we achieved an MS/MS identification rate during target-decoy search comparable to that obtained with the UniProt human Reference proteome, with 22583 (85.99 %) of the total set of identified peptides shared between the two searches. Taxonomic analysis of 612 peptides not found in the canonical set of human proteins yielded 158 peptides unique to the phylum Chordata, which represent potential human variant identifications.
Of these, 40 had previously been predicted and 9 identified using whole genome sequencing in a proteogenomic study of the same cell line.