Your domain: Bioimaging data

Introduction

Bioimaging specialists are acquiring an ever-growing amount of data: images, associated metadata, etc. However, image data management often does not receive the attention it requires, or is avoided altogether, because it is considered a burdensome task. At the same time, storing images on personal computers or USB keys is no longer an option, assuming it ever was! Data volume is increasing exponentially: not only do the acquired images need storing, but processed images will often be generated and will need to be kept alongside the originals. It is critical to proactively identify where the data will be stored, for how long, who will cover the cost of the hardware, and who will cover the cost of managing the infrastructure. All stakeholders (biologists, facility managers, data analysts, IT support, etc.) need to be involved in the preliminary discussions to ensure that the requirements are understood and met.

What constitutes bioimage data

An image is much more than a collection of zeros and ones. The file contains the binary data representing the pixels on screen, but it is usually also packed with useful metadata. Alongside the obvious keys indicating how to interpret the zeros and ones, you can find a lot of acquisition metadata, e.g. the hardware/instrument used, the settings used, etc.

The number of proprietary image formats is very large and keeps increasing. It is challenging to support so many proprietary file formats, i.e. to read them and extract their metadata. The Bio-Formats library currently supports over 150 different file formats. The Dataset Structure Table shows the extension of the files to read and indicates the structure of the image itself, e.g. single file, multiple files, one image file and a companion file, etc.

Data management challenges

The number of files and their size could be extremely large. Deleting/misplacing a file could invalidate the study itself, preventing its reuse.

Managing images quickly becomes a larger problem: not only do the binary files need to be handled, but also the associated metadata. Several efforts have been made, and are still ongoing, to capture those metadata. Understanding and capturing the metadata is critical for many reasons, for example for analysis and for detecting possible faults in acquisition systems. It is important to decide how much detail will be recorded, since this could dramatically increase the metadata volume and therefore the effort required to capture it.

The collection of images could be:

  • data acquired within a facility;
  • data acquired in another facility (commissioned work or external guest user) and “transported” by the users to their facility;
  • slides scanned.

After acquisition, data are usually moved to more permanent storage with different levels of permissions. This depends on the facility policies and could prevent collaborative work. Users will also adopt their own “organisation” conventions. This can make it very difficult to find or understand the data when, for example, the data are migrated to a new location or the researcher who acquired them leaves the lab.
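One way to notice a deleted, misplaced or altered file after a migration is to keep a checksum manifest of the dataset. The following is a minimal sketch using only the Python standard library; the function names and the example folder layout are illustrative assumptions, not part of any tool mentioned on this page.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, block_size=1024 * 1024):
    """Hash a file in blocks so that large images are never loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_manifests(old, new):
    """Report files that disappeared, appeared, or changed content."""
    shared = set(old) & set(new)
    return {
        "missing": sorted(set(old) - set(new)),
        "added": sorted(set(new) - set(old)),
        "changed": sorted(name for name in shared if old[name] != new[name]),
    }


# Demonstration on a throwaway directory (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    dataset = Path(tmp)
    (dataset / "plate1").mkdir()
    (dataset / "plate1" / "well_A1.tiff").write_bytes(b"pixels-A1")
    (dataset / "plate1" / "well_A2.tiff").write_bytes(b"pixels-A2")
    before = build_manifest(dataset)

    # Simulate a problem during migration: one file lost, one altered.
    (dataset / "plate1" / "well_A1.tiff").unlink()
    (dataset / "plate1" / "well_A2.tiff").write_bytes(b"corrupted")
    after = build_manifest(dataset)

report = compare_manifests(before, after)
print(report)
```

In practice the manifest would be written next to the dataset (for example as a text file) before the move and rebuilt afterwards for comparison.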

Standard (meta)data formats

Description

Unlike other domains, the bioimaging community has not yet agreed on a single standard data format which is generated by all acquisition systems. Instead, the images described above are most frequently collected in proprietary file formats (PFFs) defined by hardware vendors. Currently, there are several hundred such formats that the researchers may encounter. These formats combine critical acquisition metadata with the multidimensional binary data but are often optimized for quickly writing the data to disk. Tools and strategies are outlined below to ease working with this data.

Considerations

  • When purchasing a microscope, consider carefully how the resulting files will be processed. If open source tools will be used, proprietary file formats may require a time-consuming conversion. Discuss with your vendor if an open format is available.
  • If data from multiple vendors are to be combined, a similar conversion may be necessary to make the data comparable.
  • Imaging data brings special considerations due to the large, often continuous nature of the data. Single terabyte-scale files are not uncommon. Sharing these can require special infrastructure, like a data management server (described below) or a cloud-native format (described below). One goal of such infrastructure is to enable the selective (i.e. interactive) zooming of your image data without the need to download the entire volume, thereby reducing your internet bandwidth and costs.
  • Importantly, most acquisition systems produce proprietary file formats. Understanding how well they are supported by the imaging community could be a key factor in a successful study. Will it be possible to analyse or view the images using open-source software? Will it be possible to deposit the images in public repositories when published? The choice of a proprietary file format could prevent the use of any tools that are not tied to the acquisition system.
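To illustrate why chunked, cloud-native layouts allow zooming without downloading the whole volume, the sketch below works out which chunks overlap a viewport. The image size, chunk size and viewport are made-up numbers, and the helper names are hypothetical; real formats add resolution pyramids and compression on top of this idea.

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def chunks_for_region(region, chunk_shape):
    """Return (row, col) indices of the chunks that overlap a 2D region.

    region is ((y0, y1), (x0, x1)) in pixels, with y1 and x1 exclusive;
    chunk_shape is (height, width) in pixels.
    """
    (y0, y1), (x0, x1) = region
    chunk_h, chunk_w = chunk_shape
    rows = range(y0 // chunk_h, ceil_div(y1, chunk_h))
    cols = range(x0 // chunk_w, ceil_div(x1, chunk_w))
    return [(r, c) for r in rows for c in cols]


# Hypothetical whole-slide image stored as 1024 x 1024 pixel chunks.
image_shape = (100_000, 100_000)
chunk_shape = (1024, 1024)
viewport = ((20_000, 21_080), (50_000, 51_920))  # roughly one screen of pixels

needed = chunks_for_region(viewport, chunk_shape)
total = ceil_div(image_shape[0], chunk_shape[0]) * ceil_div(image_shape[1], chunk_shape[1])
print(f"{len(needed)} of {total} chunks need to be fetched")
```

Only the chunks returned by `chunks_for_region` have to travel over the network, which is what keeps bandwidth and cost low for interactive viewing.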

Solutions

Vendor libraries: Some vendors provide open source libraries for parsing their proprietary file formats. See libCZI from Zeiss.

Open source translators: Members of the community have developed multi-format translators that can be used to access your data on-the-fly, i.e. the original format is preserved and no file is written to disk. This implies that you will need to perform this translation each time you access your data and, depending on the size of the image(s), you could run out of memory. Translation libraries include:

  • Bio-Formats (Java) - supports over 150 file formats
  • OpenSlide (C) - primarily for whole-slide imaging (WSI) formats
  • AICSImageIO (Python) - wraps vendor libraries and Bio-Formats to support a wide range of formats in Python
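Before translating an image on-the-fly, it can help to estimate how much memory the uncompressed pixels would need. The sketch below is a back-of-the-envelope calculation, not part of any of the libraries above; the dimensions, the pixel-type table and the safety factor are assumptions chosen for illustration.

```python
from math import prod

# Bytes per pixel for a few common pixel types.
BYTES_PER_PIXEL = {"uint8": 1, "uint16": 2, "float32": 4}


def image_nbytes(shape, pixel_type):
    """Uncompressed size in bytes of an image with the given shape."""
    return prod(shape) * BYTES_PER_PIXEL[pixel_type]


def fits_in_memory(shape, pixel_type, available_bytes, safety_factor=2.0):
    """True if the image, times a factor for working copies, fits in memory."""
    return image_nbytes(shape, pixel_type) * safety_factor <= available_bytes


# Hypothetical 5D acquisition ordered as (T, C, Z, Y, X).
shape = (50, 3, 40, 2048, 2048)
full = image_nbytes(shape, "uint16")
plane = image_nbytes(shape[-2:], "uint16")
ram = 16 * 1024**3  # assume 16 GiB of RAM

print(f"full image: {full / 1024**3:.1f} GiB, single plane: {plane / 1024**2:.1f} MiB")
print("safe to load everything at once:", fits_in_memory(shape, "uint16", ram))
```

When the full image does not fit, reading one plane or one tile at a time keeps the memory footprint bounded.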

Permanent conversion: An alternative is to permanently convert your data to

  • OME-Files - The Open Microscopy Environment (OME) has developed an open format, “OME-TIFF”, to which you can convert your data. The Bio-Formats library (above) comes with a command-line tool, bfconvert, that can be used to convert files to OME-TIFF.
  • The bioformats2raw and raw2ometiff toolchain provided by Glencoe Software allows the more performant conversion of your data, but requires an extra intermediate copy of the data. If you have available space, the toolchain could also be an option to consider.
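The two conversion routes can be scripted. The sketch below only builds the command lines (input followed by output, with no optional flags) and prints them; it assumes the tools are installed and on the PATH if you later run them, and the file names and the ".zarr" name of the intermediate directory are illustrative choices.

```python
import shlex
from pathlib import PurePosixPath


def bfconvert_command(source, target_dir):
    """Single-step conversion to OME-TIFF with bfconvert."""
    src = PurePosixPath(source)
    target = PurePosixPath(target_dir) / (src.stem + ".ome.tiff")
    return ["bfconvert", str(src), str(target)]


def two_step_commands(source, work_dir, target_dir):
    """bioformats2raw writes an intermediate copy; raw2ometiff turns it into OME-TIFF."""
    src = PurePosixPath(source)
    intermediate = PurePosixPath(work_dir) / (src.stem + ".zarr")
    target = PurePosixPath(target_dir) / (src.stem + ".ome.tiff")
    return [
        ["bioformats2raw", str(src), str(intermediate)],
        ["raw2ometiff", str(intermediate), str(target)],
    ]


# Hypothetical input file and folders.
single = bfconvert_command("raw/cells.czi", "converted")
pipeline = two_step_commands("raw/cells.czi", "scratch", "converted")

print(shlex.join(single))
for command in pipeline:
    print(shlex.join(command))
```

Each list can be passed to `subprocess.run(command, check=True)` to execute the conversion; remember that the intermediate directory needs free disk space of its own.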

Cloud (or “object”) storage: If you are storing your data in the cloud, you will likely need a different file format since most current image file formats are not suitable for cloud storage. OME is currently developing a next-generation file format (NGFF) that you can use.

Metadata: If metadata are stored separately from the image data, the format of the metadata should follow the subject-specific standards regarding the schema, vocabulary or ontologies and storage format used.
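OME-XML, the XML representation of the OME data model, is one schema used for such metadata. As a rough illustration, the sketch below reads the image dimensions and channel names from a small, hand-written OME-XML fragment using the Python standard library. The fragment is a simplified example written for this page, not the output of a real instrument, and the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the 2016-06 OME schema.
OME_NS = {"ome": "http://www.openmicroscopy.org/Schemas/OME/2016-06"}

# Simplified, hand-written example document.
EXAMPLE_OME_XML = """
<OME xmlns="http://www.openmicroscopy.org/Schemas/OME/2016-06">
  <Image ID="Image:0" Name="example">
    <Pixels ID="Pixels:0" DimensionOrder="XYZCT" Type="uint16"
            SizeX="512" SizeY="512" SizeZ="10" SizeC="2" SizeT="1">
      <Channel ID="Channel:0:0" Name="DAPI" SamplesPerPixel="1"/>
      <Channel ID="Channel:0:1" Name="GFP" SamplesPerPixel="1"/>
    </Pixels>
  </Image>
</OME>
"""


def summarise_pixels(ome_xml):
    """Collect name, pixel type, dimensions and channel names per image."""
    root = ET.fromstring(ome_xml.strip())
    summaries = []
    for image in root.findall("ome:Image", OME_NS):
        pixels = image.find("ome:Pixels", OME_NS)
        summaries.append({
            "name": image.get("Name"),
            "type": pixels.get("Type"),
            "order": pixels.get("DimensionOrder"),
            "size": {axis: int(pixels.get("Size" + axis)) for axis in "XYZCT"},
            "channels": [c.get("Name") for c in pixels.findall("ome:Channel", OME_NS)],
        })
    return summaries


summary = summarise_pixels(EXAMPLE_OME_XML)
print(summary)
```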

(Meta)Data collection

Description

The acquisition of bioimaging data takes place in various environments. The microscope (usually a light or electron microscope) may be in a core facility, in a research lab or even at a different institution and operated remotely. Regardless of where the instrument is located, the acquired imaging data are likely to be stored, at least temporarily, on a local, vendor-specific PC next to the acquisition system because of their complexity and size. This is often unavoidable, since the data must be written securely as fast as they are acquired.

Due to the scale of data, keeping track of the image data and the associated data and metadata is essential, particularly in life sciences and medical fields. Organising, storing, sharing, publishing image data and metadata can be very challenging.

Considerations

  • Consider using an image management software platform. Image management software platforms offer a way to centralize, organize, view, distribute and track all of your digital images. They allow you to take control over how your images are managed, used and shared within research groups.
  • When evaluating an image management software platform, check if it allows you to:
    • Control the access you wish to give to your data and how you wish to work, e.g. only the PI can view and annotate the data, or you can choose to work on a project with some collaborators.
    • Access data from anywhere via either Web or Desktop clients and API.
    • Store the metadata with your images. For example, analytical results can be linked to your imaging data and can be easily findable.
    • Add value to your imaging data by for example linking them to external resources like ontologies.
    • Make your data publicly available and move gradually towards FAIRness.
  • Try to avoid storing bioimaging data on the local PC of the acquisition system.
  • If possible, make a transfer to central storage mandatory. If not possible, enable automation of data backup to central storage.
  • Consider support for minimal standards (metadata schemas, file formats, etc.) in your domain.
  • Consider reusing existing data.
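Automated backup from the acquisition PC to central storage can be sketched with the Python standard library as shown below. This is a minimal illustration under stated assumptions: both locations are reachable as ordinary folders, the folder and function names are hypothetical, and a real deployment would add scheduling, logging and error handling.

```python
import filecmp
import shutil
import tempfile
from pathlib import Path


def backup_new_files(acquisition_dir, central_dir):
    """Copy files that are missing or different on central storage.

    Every copy is verified byte by byte. Returns the relative paths copied.
    Note that a differing file on central storage is overwritten.
    """
    acquisition_dir, central_dir = Path(acquisition_dir), Path(central_dir)
    copied = []
    for source in sorted(acquisition_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(acquisition_dir)
        target = central_dir / relative
        if target.exists() and filecmp.cmp(source, target, shallow=False):
            continue  # already backed up and identical
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)
        if not filecmp.cmp(source, target, shallow=False):
            raise IOError(f"verification failed for {relative}")
        copied.append(relative.as_posix())
    return copied


# Demonstration with throwaway folders standing in for the two systems.
with tempfile.TemporaryDirectory() as tmp:
    acquisition = Path(tmp) / "microscope_pc"
    central = Path(tmp) / "central_storage"
    (acquisition / "session1").mkdir(parents=True)
    (acquisition / "session1" / "img_001.tiff").write_bytes(b"a" * 1000)
    (acquisition / "session1" / "img_002.tiff").write_bytes(b"b" * 1000)

    first_run = backup_new_files(acquisition, central)
    second_run = backup_new_files(acquisition, central)

print("first run copied:", first_run)
print("second run copied:", second_run)
```

Running the function repeatedly (for example from a scheduled task) only transfers what is new, so it can be run after every acquisition session.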

Solutions

  • Agnostic platforms that can be used to bridge between domain data include:
  • Image-specific data management platforms include:
    • OMERO - broad support for a large number of imaging formats.
    • Cytomine-IMS - image specific.
    • XNAT - medical imaging platform, DICOM-based.
    • MyTARDIS - largely file-system based platform handling the transfer of data.
    • BisQue - resource for management and analysis of 5D biological images.
  • Platforms like OMERO and b2share also allow you to publish the data associated with a given project.
  • Metadata standards can be found at the Metadata Standards Directory Working Group.
  • Ontology resources are available at:
  • Existing data can be found by using the following resources:
  • Find software tools, image databases for benchmarking, and training materials for bioimage analysis in the BIII registry

Data publication and archiving

Description

Public data archives are an essential component of biological research. However, publishing image data and metadata can be very challenging for multiple reasons, just to mention a few: limited infrastructure for some domains, data support, sparse data.

Bioimaging tools and resources lag behind what is available for sequencing, for example, mainly due to limited infrastructure capable of hosting the data. There are a few ongoing efforts to bridge that gap.

Two distinct types of resources should be considered:

  • Data archives (“storage”): long-lasting storage for data and metadata that makes those data easily accessible to the community.
  • Added-value archives: store enhanced, curated data, typically aimed at a scientific community.

Considerations

  • If you only need to make your data available online and have limited metadata associated, consider publishing in a Data archive.
  • If your data should be considered as a reference dataset, consider an Added-value archive.
  • Choose the repositories based on the following characteristics:
    • Storage vs Added-value resources.
    • Image format support.
    • Supported licenses e.g. CC0 or CC-BY license. For example the Image Data Resource (IDR) uses Creative Commons Licenses for submitted datasets and encourages submitting authors to choose.
    • Which types of access are required for the users, e.g. download only; browse, search and view data and metadata; API access.
      • Does an entry have an accession identifier, e.g. idr-xxx, EMPIAR-#####?
      • Does an entry have a DOI (Digital Object Identifier)?

Solutions

Comparative table of some repositories that can be used to deposit imaging data:

| Repository | Type | Data Restrictions | Data Upload Restrictions | DOI | Cost |
|---|---|---|---|---|---|
| BioImageArchive | Archive | No PHI data | None | --- | Free |
| Dryad | Archive | No PHI data | 300GB | Yes | over 50GB (*) |
| EMPIAR | Added-value | Electron microscopy imaging data | None | Yes | Free |
| Image Data Resource (IDR) | Added-value | Cell/Tissue imaging data, no PHI data | None | Yes | Free |
| SSBD:database | Added-value | Biological dynamics imaging data | None | --- | Free |
| SSBD:repository | Archive | Biological dynamics imaging data | None | --- | Free |
| Zenodo | Archive | None | 50GB per dataset | Yes | Free |

  • PHI: Protected health information.
  • (*) unless submitter is based at member institution.
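As a rough aid for shortlisting, the comparison above can be encoded as data and filtered by repository type, dataset size and DOI requirement. The sketch below copies only the type, upload limit and DOI columns; entries marked "---" in the DOI column are treated as "not stated", the function name is hypothetical, and domain restrictions (e.g. electron microscopy only) still have to be checked by hand.

```python
# (name, type, upload limit in GB or None if no limit is listed,
#  DOI: True if listed as "Yes", None if not stated)
REPOSITORIES = [
    ("BioImageArchive", "Archive", None, None),
    ("Dryad", "Archive", 300, True),
    ("EMPIAR", "Added-value", None, True),
    ("Image Data Resource (IDR)", "Added-value", None, True),
    ("SSBD:database", "Added-value", None, None),
    ("SSBD:repository", "Archive", None, None),
    ("Zenodo", "Archive", 50, True),
]


def shortlist(repo_type=None, dataset_gb=0, need_doi=False):
    """Names of repositories matching the requested type, size and DOI need."""
    names = []
    for name, kind, limit_gb, doi in REPOSITORIES:
        if repo_type is not None and kind != repo_type:
            continue
        if limit_gb is not None and dataset_gb > limit_gb:
            continue
        if need_doi and not doi:
            continue
        names.append(name)
    return names


archives_100gb = shortlist(repo_type="Archive", dataset_gb=100, need_doi=True)
added_value_doi = shortlist(repo_type="Added-value", need_doi=True)
print(archives_100gb)
print(added_value_doi)
```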

More information

FAIR Cookbook is an online, open and live resource for the Life Sciences with recipes that help you to make and keep data Findable, Accessible, Interoperable and Reusable; in one word FAIR.

| Tool or resource | Description | Related pages | Registry |
|---|---|---|---|
| 4DN-BINA-OME-QUAREP (NBO-Q) | Rigorous record-keeping and quality control are required to ensure the quality, reproducibility and value of imaging data. The 4DN Initiative and BINA have published light Microscopy Metadata Specifications that extend the OME Data Model, scale with experimental intent and complexity, and make it possible for scientists to create comprehensive records of imaging experiments. The Microscopy Metadata Specifications have been adopted by QUAREP-LiMi and are being revised in QUAREP-LiMi in collaboration with instrument manufacturers. | OMERO | Standards/Databases |
| AICSImageIO | Image Reading, Metadata Conversion, and Image Writing for Microscopy Images in Pure Python | | |
| b2share | Store and publish your research data. Can be used to bridge between domains | | Standards/Databases |
| bfconvert | The bfconvert command line tool can be used to convert files between supported formats. | | |
| BIII | The BioImage Informatics Index is a registry of software tools, image databases for benchmarking, and training materials for bioimage analysis | | Tool info |
| Bio-Formats | Bio-Formats is a software tool for reading and writing image data using standardized, open formats | | Tool info |
| bioformats2raw | Java application to convert image file formats, including .mrxs, to an intermediate Zarr structure compatible with the OME-NGFF specification. | | |
| BioImageArchive | The BioImage Archive stores and distributes biological images that are useful to life-science researchers. | Data publication | Standards/Databases |
| BioPortal | A comprehensive repository of biomedical ontologies | Health data, Documentation and meta... | Tool info, Standards/Databases, Training |
| BisQue | Resource for management and analysis of 5D biological images | | Tool info |
| Cytomine-IMS | Image Data management | | |
| Dryad | Open-source, community-led data curation, publishing, and preservation platform for CC0 publicly available research data | Biomolecular simulatio..., Data publication | Standards/Databases |
| EMPIAR | Electron Microscopy Public Image Archive is a public resource for raw, 2D electron microscopy images. You can browse, upload and download the raw images used to build a 3D structure | OMERO, Structural Bioinformatics, Data publication | Tool info, Standards/Databases, Training |
| Image Data Resource (IDR) | A repository of image datasets from scientific publications | OMERO, Microbial biotechnology | Tool info, Standards/Databases |
| iRODS | Integrated Rule-Oriented Data System (iRODS) is open source data management software for a cancer genome analysis workflow. | TransMed, Data storage | Tool info |
| MyTARDIS | A file-system based platform handling the transfer of data | | |
| OMERO | OMERO is an open-source client-server platform for managing, visualizing and analyzing microscopy images and associated metadata | Galaxy, OMERO | Tool info, Training |
| Ontology Lookup Service | EMBL-EBI's web portal for finding ontologies | FAIRtracks, Health data, Documentation and meta... | Tool info, Standards/Databases, Training |
| OpenSlide | C library that provides a simple interface to read whole-slide images (also known as virtual slides) | | |
| raw2ometiff | Java application to convert a directory of tiles to an OME-TIFF pyramid. This is the second half of iSyntax/.mrxs => OME-TIFF conversion. | | |
| SSBD:database | Added-value database for biological dynamics images | | Standards/Databases |
| SSBD:repository | An open data archive that stores and publishes bioimaging and biological quantitative datasets | | |
| XNAT | Open source imaging informatics platform. It facilitates common management, productivity, and quality assurance tasks for imaging and associated data. | TransMed, XNAT-PIC | |
| Zenodo | Generalist research data repository built and developed by OpenAIRE and CERN | FAIRtracks, Plant Phenomics, Biomolecular simulatio..., Plant sciences, Single-cell sequencing, Data publication, Identifiers | Standards/Databases, Training |
| Zooma | Find possible ontology mappings for free text terms in the ZOOMA repository. | | Tool info, Training |
Tools and resources tailored to users in different countries.

| Tool or resource | Description | Related pages | Registry |
|---|---|---|---|
| Technology Hotels | More than 130 Technology Hotels offer access to high-end technology and expertise in the field of bioimaging, bioinformatics, genomics, medical imaging, metabolomics, phenotyping, proteomics, structural biology, and/or systems biology. | Human data, Proteomics, Researcher, Compliance monitoring, ... | |