
Your domain: Marine metagenomics

Introduction

The marine metagenomics domain is characterized by large datasets that require substantial storage and High-Performance Computing (HPC) resources to run complex, memory-intensive analysis pipelines. These datasets are therefore difficult for typical end-users to handle and exceed the resources of many service providers. Sharing metagenomics datasets in compliance with the FAIR principles, so that they can be reused, hinges on recording rich metadata about every step from sampling to data analysis.

Managing marine metagenomic metadata

Description

Metagenomics is a highly complex process that encompasses several steps: sampling, DNA isolation, generation of sequencing libraries, sequencing, pre-processing of raw data, read-based taxonomic and functional profiling, assembly, binning, bin refinement, generation of metagenome-assembled genomes (MAGs), taxonomic classification of MAGs, and archiving of raw or processed data. To comply with the FAIR principles, you need to collect metadata about all of these steps.
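One way to keep track of whether metadata has been captured for every step is to store it per step and check for gaps. The sketch below is illustrative only: the step identifiers are hypothetical labels derived from the list above (not a standard vocabulary), and the metadata values are placeholders.

```python
# Hypothetical step identifiers mirroring the workflow described above;
# they are illustrative labels, not terms from any metadata standard.
WORKFLOW_STEPS = [
    "sampling",
    "dna_isolation",
    "library_preparation",
    "sequencing",
    "raw_data_preprocessing",
    "read_based_profiling",
    "assembly",
    "binning",
    "bin_refinement",
    "mag_generation",
    "mag_taxonomic_classification",
    "archiving",
]


def missing_steps(provenance: dict[str, dict]) -> list[str]:
    """Return, in workflow order, the steps with no metadata recorded.

    A step counts as missing if it is absent or mapped to an empty dict.
    """
    return [step for step in WORKFLOW_STEPS if not provenance.get(step)]


# Placeholder provenance record: each step maps to the metadata captured for it.
provenance = {
    "sampling": {"date": "2021-06-14", "device": "example-sampler"},
    "dna_isolation": {"kit": "example-kit", "protocol_version": "1.2"},
    "sequencing": {"platform": "example-platform", "read_length": 150},
}

if __name__ == "__main__":
    gaps = missing_steps(provenance)
    print(f"{len(gaps)} of {len(WORKFLOW_STEPS)} steps lack metadata: {gaps}")
```

In practice the per-step fields would come from a community checklist such as those listed in the tool table, rather than the ad hoc keys used here.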

Moreover, in marine metagenomics you also need to characterize the marine environment of each sample, including its geolocation and the physico-chemical properties of the water.
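The environmental description of a sample can be checked for completeness and basic validity before submission. The following is a minimal sketch, assuming MIxS-style term names (`lat_lon`, `geo_loc_name`, `collection_date`, `depth`, `temp`, `salinity`). Verify the exact names, units and required status against the current MIxS checklist and water package. The sample values are hypothetical.

```python
from datetime import datetime

# Assumed MIxS-style term names; check them against the current MIxS water package.
REQUIRED_TERMS = ("lat_lon", "geo_loc_name", "collection_date", "depth", "temp", "salinity")


def parse_lat_lon(value: str) -> tuple[float, float]:
    """Parse 'LAT LON' in decimal degrees and check that both are in range."""
    lat_str, lon_str = value.split()  # raises ValueError unless exactly two parts
    lat, lon = float(lat_str), float(lon_str)
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError(f"coordinates out of range: {value!r}")
    return lat, lon


def validate_sample(sample: dict) -> list[str]:
    """Return a list of problems found in a sample's environmental metadata."""
    problems = [f"missing term: {t}" for t in REQUIRED_TERMS if t not in sample]
    if "lat_lon" in sample:
        try:
            parse_lat_lon(sample["lat_lon"])
        except ValueError as err:
            problems.append(f"invalid lat_lon: {err}")
    if "collection_date" in sample:
        try:
            # Accepts ISO 8601 dates (YYYY-MM-DD) and date-times.
            datetime.fromisoformat(sample["collection_date"])
        except ValueError:
            problems.append("collection_date is not an ISO 8601 date or date-time")
    return problems


# Hypothetical sample record; salinity is deliberately left out.
sample = {
    "lat_lon": "43.68 7.32",
    "geo_loc_name": "France: Ligurian Sea",
    "collection_date": "2021-06-14",
    "depth": "5 m",
    "temp": "12.5 degree Celsius",
}

if __name__ == "__main__":
    for problem in validate_sample(sample):
        print(problem)
```

This only checks presence and format. Validating units and controlled vocabularies (for example, for the environment terms) would need the full checklist definitions.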

Considerations

Tools and resources for analyzing metagenomics datasets

Description

The field of marine metagenomics has been expanding rapidly, and many statistical and computational tools and databases have been developed to explore the huge influx of data. You need to be able to choose among the many bioinformatics techniques, tools, and methodologies available for each step of a typical metagenomics analysis, while ensuring that your choices conform to the best practices of the domain. You also need access to HPC facilities with enough capacity to run the analyses and store the resulting data, so you should know which computing infrastructures are available to you (and at what cost).

Considerations

  • Are there particular characteristics of your dataset that would restrict the choice of applicable tools?
  • Are the recommended tools freely available?
    • If not, can you afford the software licensing cost?
    • If not, are there freely available alternatives?
  • Does your institution have its own HPC facilities, and what are the access conditions?
  • Does your country have a research HPC infrastructure, and what are the access conditions?

Solutions


| Tool or resource | Description | Related pages | Registry |
| --- | --- | --- | --- |
| Genomic Standards Consortium (GSC) | The Genomic Standards Consortium (GSC) is an open-membership working body enabling genomic data integration, discovery and comparison through international community-driven standards. | | Standards/Databases |
| MIGS/MIMS | Minimum Information about a (Meta)Genome Sequence | | Standards/Databases |
| MIxS | Minimum Information about any (x) Sequence | | Standards/Databases, Training |