SLU publication database (SLUpub)

Research article, 2022, Peer reviewed, Open access

mOTUpan: a robust Bayesian approach to leverage metagenome-assembled genomes for core-genome estimation

Buck, Moritz; Mehrshad, Maliheh; Bertilsson, Stefan

Abstract

Recent advances in sequencing and bioinformatics have expanded the tree of life by providing genomes for uncultured environmentally relevant clades, either through metagenome-assembled genomes or through single-cell genomes. While this expanded diversity can provide novel insights into microbial population structure, most tools available for core-genome estimation are sensitive to genome completeness. Consequently, a major portion of the huge phylogenetic diversity uncovered by environmental genomic approaches remains excluded from such analyses. We present mOTUpan, a novel iterative Bayesian method for computing the core genome for sets of genomes spanning a wide range of completeness. The likelihood for each gene cluster to belong to the core or accessory genome is estimated by computing the probability of its presence/absence pattern in the target genome set. The core-genome prediction is computationally efficient and can be scaled up to thousands of genomes. It yields estimates comparable to those of the state-of-the-art tools Roary and PPanGGOLiN for high-quality genomes and is capable of using genomes at lower completeness thresholds. mOTUpan wraps a bootstrapping procedure to estimate the quality of a specific core-genome prediction, as the accuracy of each run will depend on the specific completeness distribution and the number of genomes in the dataset under scrutiny. mOTUpan is implemented in the mOTUlizer software package and is available at github.com/moritzbuck/mOTUlizer under the GPL 3.0 license.
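
To make the idea of classifying gene clusters from their presence/absence pattern more concrete, the following is a minimal illustrative sketch in Python. It is not the mOTUlizer implementation, and the abstract does not specify the exact likelihood models. The function name `estimate_core`, the accessory model (the cluster's completeness-corrected frequency multiplied by each genome's completeness), the clamping bounds, and the completeness update rule are all assumptions chosen for illustration.

```python
import math


def estimate_core(presence, completeness, max_iter=20):
    """Toy iterative core/accessory classifier (illustrative assumption only).

    presence:     dict mapping genome name -> set of gene clusters found in it
    completeness: dict mapping genome name -> prior completeness in (0, 1)
    Returns (core_clusters, final_completeness).
    """
    def clamp(x):
        # Keep probabilities away from 0 and 1 so logarithms stay finite.
        return min(max(x, 0.01), 0.99)

    genomes = list(presence)
    clusters = set().union(*presence.values())
    comp = {g: clamp(completeness[g]) for g in genomes}
    core = set()

    for _ in range(max_iter):
        total_comp = sum(comp.values())
        new_core = set()
        for c in clusters:
            count = sum(c in presence[g] for g in genomes)
            # Assumed accessory model: completeness-corrected frequency, capped.
            freq = min(count / total_comp, 0.99)
            ll_core = 0.0
            ll_acc = 0.0
            for g in genomes:
                p_core = comp[g]        # a core gene is seen if it was sampled
                p_acc = freq * comp[g]  # an accessory gene must also be carried
                if c in presence[g]:
                    ll_core += math.log(p_core)
                    ll_acc += math.log(p_acc)
                else:
                    ll_core += math.log(1.0 - p_core)
                    ll_acc += math.log(1.0 - p_acc)
            if ll_core > ll_acc:
                new_core.add(c)
        if new_core == core:
            break  # classification is stable, stop iterating
        core = new_core
        if core:
            # Re-estimate completeness as the fraction of core clusters observed.
            comp = {g: clamp(len(core & presence[g]) / len(core)) for g in genomes}
    return core, comp


# Hypothetical toy data: four genomes, three gene clusters.
presence = {
    "g1": {"A", "B", "C"},
    "g2": {"A", "B"},
    "g3": {"A", "C"},
    "g4": {"A", "B"},
}
completeness = {"g1": 0.9, "g2": 0.8, "g3": 0.5, "g4": 0.95}
core, comp = estimate_core(presence, completeness)
print(sorted(core))
```

In this toy input, cluster B is missing only from the genome with the lowest completeness, so its absence there is weak evidence against core membership, whereas cluster C is missing from two nearly complete genomes. A bootstrapping step of the kind the abstract mentions could be layered on top by resampling genomes and repeating the call, but that is not shown here.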

Published in

NAR genomics and bioinformatics
2022, Volume: 4, number: 3, article number: lqac060