Research article - Peer-reviewed, 2021

Metagenomics workflow for hybrid assembly, differential coverage binning, metatranscriptomics and pathway analysis (MUFFIN)

Van Damme, Renaud; Hoelzer, Martin; Viehweger, Adrian; Mueller, Bettina; Bongcam-Rudloff, Erik; Brandt, Christian

Abstract

Metagenomics has redefined many areas of microbiology. However, metagenome-assembled genomes (MAGs) are often fragmented, particularly when sequencing was performed with short reads. Recent long-read sequencing technologies promise to improve genome reconstruction, but integrating two different sequencing modalities makes downstream analyses complex. We therefore developed MUFFIN, a complete metagenomic workflow that uses short and long reads to produce high-quality bins and their annotations. The workflow is written in Nextflow, a workflow orchestration software, to achieve high reproducibility and fast, straightforward use. It also produces the taxonomic classification and KEGG pathways of the bins, and can optionally be used for quantification and annotation when RNA-Seq data are provided. We tested the workflow using twenty biogas reactor samples and assessed the capacity of MUFFIN to process and output the files needed to analyze the microbial community and its function. MUFFIN produces functional pathway predictions and, if RNA-Seq data are provided, de novo metatranscript annotations across the metagenomic sample and for each bin. MUFFIN is available on GitHub under the GNUv3 licence.

Author summary

Determining the entire DNA of environmental samples (sequencing) is a fundamental approach to gain deep insights into complex bacterial communities and their functions. However, this approach produces enormous amounts of data, which makes analysis time-intensive and complicated. We developed the software "MUFFIN", which untangles the complex sequencing data to reconstruct individual bacterial species and determine their functions.
Our software performs multiple complicated steps in parallel and automatically, allowing anyone with only basic informatics skills to analyze complex microbial communities. For this, we combine two sequencing technologies: "long-sequences" (nanopore, better reconstruction) and "short-sequences" (Illumina, higher accuracy). After the reconstruction, we group the fragments that belong together ("binning") via multiple approaches and refinement steps while also utilizing the information from other bacterial communities ("differential binning"). This process creates hundreds of "bins", each of which represents a different bacterial species with a unique function. We automatically determine their species, assess each genome's completeness, and attribute their biological functions and activity ("transcriptomics and pathways"). Our software is entirely freely available to everyone and runs on a good computer, a compute cluster, or in the cloud.

Published in

PLoS Computational Biology
2021, volume: 17, number: 2, article number: e1008716
Publisher: Public Library of Science

Authors' information

Swedish University of Agricultural Sciences, Department of Molecular Sciences
Swedish University of Agricultural Sciences, Department of Animal Breeding and Genetics
Hoelzer, Martin
Friedrich Schiller University Jena
Viehweger, Adrian
Leipzig University
Viehweger, Adrian
Friedrich Schiller University Jena
Swedish University of Agricultural Sciences, Department of Molecular Sciences
Bongcam-Rudloff, Erik (Bongcam Rudloff, Erik)
Swedish University of Agricultural Sciences, Department of Animal Breeding and Genetics
Brandt, Christian
Swedish University of Agricultural Sciences, Department of Animal Breeding and Genetics
Brandt, Christian
Friedrich Schiller University Jena

UKÄ Subject classification

Bioinformatics (Computational Biology)

Publication Identifiers

DOI: https://doi.org/10.1371/journal.pcbi.1008716

URI (permanent link to this page)

https://res.slu.se/id/publ/111287