What is the TransMed data and computing tool assembly?
The TransMed data and computing tool assembly is an infrastructure provided by ELIXIR Luxembourg for clinical and translational projects. The TransMed assembly provides tools for managing ongoing projects, which often require the management of cohort recruitment and the processing of samples, data and metadata. This entails GDPR-compliant and secure collection, storage, curation, standardisation, integration and analysis of clinical data and associated molecular, imaging and sensor/mobile data and metadata.
The TransMed tool assembly is also a blueprint showing how a collection of tools can be combined to support data lifecycle management in clinical and translational projects.
Who can use the TransMed data and computing tool assembly?
All researchers can use the tools in the TransMed assembly individually or in combination, depending on their project needs. Most of the tools in the TransMed assembly are open source and can be re-used. ELIXIR Luxembourg provides know-how transfer and training on the tool assembly upon request from researchers and data steward organisations. To make a request, please contact info@elixir-luxembourg.org.
Additionally, ELIXIR Luxembourg provides hosting of the TransMed assembly. Hosting of tools and data is free of charge for national users. For international users, hosting of data (up to 10 TB) is free on the basis that the data is shared with the wider research community under an appropriate access model, such as controlled access. For international users, charges for the hosting of tools and of large datasets are evaluated on a case-by-case basis; please contact info@elixir-luxembourg.org for details.
For what purpose can the TransMed assembly be used?
Data management planning
Translational biomedicine projects often deal with sensitive data from human subjects. Therefore, data management planning for such projects needs to take data protection and GDPR compliance into account.
Typically, a TransMed project involves multiple (clinical) study sites and can contain several cohorts. During the planning phase, the dataflow for the project and the data/metadata collected prospectively or retrospectively need to be documented. Projects can use the Data Information Sheet (DISH) to map the project dataflow and collect the metadata necessary for GDPR-compliant processing. In addition, a data protection impact assessment needs to be performed, taking into account partner roles, responsibilities and the data information collected via the DISH. For this purpose, the TransMed assembly uses the Data Information System (DAISY), which indexes all information collected by DISH and provides a repository to accumulate GDPR-required project documentation, such as ethics approvals, consent templates, subject information sheets and, ultimately, the project data management plan. The TransMed assembly also includes the risk management tool MONARC, which can be used to perform Data Protection Impact Assessments (DPIAs); DPIAs are a GDPR requirement for projects dealing with sensitive human data.
Data collection, transfer and storage
For projects involving patient recruitment, the TransMed assembly provides the Smart Scheduling System (SMASCH), which tracks the availability of resources in clinics and manages patient visits. Pseudonymised clinical data and patient surveys are then collected with the state-of-the-art electronic data capture (EDC) system REDCap through a battery of electronic case report forms (eCRFs). Imaging data from the clinics are deposited into a dedicated imaging platform, XNAT. Omics data, both in raw and derived form, can be deposited into the data provenance system iRODS. The transfer of data files can be done via various encrypted communication options, as outlined in the Data transfer section of the RDMkit; the TransMed assembly most typically utilises (S)FTP, Aspera FASP and ownCloud. Data is also encrypted at rest, both with hardware encryption and with file-level encryption using either open-source utilities such as gpg or commercial options such as Aspera FASP.
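As a concrete illustration, the sketch below shows how a pseudonymised data export could be file-level encrypted with gpg and then pushed to a receiving server over SFTP. This is a minimal sketch under stated assumptions, not the prescribed TransMed procedure: the recipient key, file names, host and remote path are hypothetical placeholders, and it assumes the gpg and OpenSSH sftp command-line clients are available locally.

```python
"""Minimal sketch: file-level encryption with gpg followed by an SFTP upload.
All names (recipient key, file names, host, remote path) are placeholders."""
import subprocess
from pathlib import Path

RECIPIENT = "data-manager@example.org"   # hypothetical GPG key of the receiving site
SOURCE = Path("cohort_a_visit1.csv")     # hypothetical pseudonymised data export
ENCRYPTED = SOURCE.with_name(SOURCE.name + ".gpg")

# Encrypt the file for the recipient's public key (file-level encryption).
subprocess.run(
    ["gpg", "--batch", "--yes", "--output", str(ENCRYPTED),
     "--encrypt", "--recipient", RECIPIENT, str(SOURCE)],
    check=True,
)

# Upload the encrypted file over SFTP, reading the batch command from stdin.
subprocess.run(
    ["sftp", "-b", "-", "transfer@sftp.example.org"],
    input=f"put {ENCRYPTED} /incoming/{ENCRYPTED.name}\n".encode(),
    check=True,
)
```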
Data curation and harmonisation
To facilitate cross-cohort/cross-study interoperability, data needs to be curated and harmonised upon collection. For this purpose the TransMed assembly uses a variety of open standards and tools. For data quality and cleansing, the assembly uses OpenRefine, which provides an intuitive interface to generate facets of the data that help researchers identify quality issues and outliers; it also enables traceable yet easy data correction. For data Extraction, Transformation and Loading (ETL), the assembly uses Talend Open Studio (for complex and reusable ETLs) as well as R and Python (for ad hoc and simple transformations). To evaluate and improve the FAIRness of datasets, the assembly follows the recipes in the FAIR Cookbook developed by the FAIRplus consortium. Regarding standard data models and ontologies, the assembly follows the recommendations in the FAIR Cookbook recipe for selecting terminologies and ontologies.
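For the ad hoc transformations mentioned above, a short Python (pandas) sketch is shown below. The input file, column names and code mappings are hypothetical examples rather than a TransMed standard; in a real project such mappings would be aligned with the agreed data model and ontologies.

```python
"""Minimal sketch of a simple, ad hoc harmonisation step in Python/pandas.
File names, column names and value codes are hypothetical."""
import pandas as pd

raw = pd.read_csv("site_b_export.csv")

# Harmonise column names to the project's agreed variable names.
df = raw.rename(columns={"pat_sex": "sex", "dob": "birth_date", "ht_cm": "height_cm"})

# Map site-specific codes to a shared vocabulary and normalise data types.
df["sex"] = df["sex"].map({1: "male", 2: "female"}).fillna("unknown")
df["birth_date"] = pd.to_datetime(df["birth_date"], errors="coerce")

# Flag implausible values for curation (e.g. in OpenRefine) instead of silently dropping them.
df["height_outlier"] = ~df["height_cm"].between(50, 250)

df.to_csv("site_b_harmonised.csv", index=False)
```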
Data integration and analysis
TransMed projects usually require different data types from different cohorts to be integrated into one data platform for exploration, sub-setting and integrated analysis for hypothesis generation. The TransMed assembly includes several such tools. Ada Discovery Analytics (Ada) is a web-based tool that provides a performant and highly configurable system for the secure integration, visualisation and collaborative analysis of heterogeneous data sets, primarily targeting clinical and experimental sources. The assembly also includes tools for specific data types, such as Atlas, which integrates features from various OHDSI applications for Electronic Health Record data in OMOP-CDM format into a single cohesive experience, and tranSMART, which provides easy integration of phenotypic/clinical data with molecular data and a drag-and-drop data exploration interface.
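As an illustration of exploring data already harmonised to OMOP-CDM (for example, alongside Atlas), the sketch below runs a simple cross-cohort count with SQL from Python. This is a minimal sketch: the connection string is a placeholder, the concept ID is only an example, and the table and column names follow OMOP CDM v5 conventions.

```python
"""Minimal sketch: a simple exploratory query against an OMOP-CDM database.
The connection string and concept ID are placeholders/examples."""
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql+psycopg2://user:password@cdm-host/cdm_db")  # placeholder

# Count distinct persons per gender concept with at least one occurrence of a given condition.
query = """
SELECT p.gender_concept_id, COUNT(DISTINCT p.person_id) AS n_persons
FROM person p
JOIN condition_occurrence co ON co.person_id = p.person_id
WHERE co.condition_concept_id = 316866  -- example standard concept ID
GROUP BY p.gender_concept_id
"""
counts = pd.read_sql(query, engine)
print(counts)
```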
Data stewardship
To facilitate the findability of data, the TransMed assembly provides a Data Catalog tool that supports the indexing, search and discovery of studies, data sets and samples accumulated in the context of projects from different sites and cohorts. The catalog implements a controlled-access model through integration with REMS. Audit trailing of data access is achieved by integrating DAISY into the access process. The catalog tool can be integrated with various identity management systems, such as Keycloak, Life Science Login (LS Login) or FreeIPA.
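The identity integration typically relies on standard OpenID Connect flows. As a hedged illustration, the sketch below obtains an access token from a Keycloak realm's token endpoint, as a service sitting in front of the Data Catalog might do; the host, realm, client ID and secret are placeholders (older Keycloak versions prefix the path with /auth).

```python
"""Minimal sketch: client-credentials token request against a Keycloak realm.
Host, realm, client ID and secret are placeholders."""
import requests

TOKEN_URL = "https://auth.example.org/realms/transmed/protocol/openid-connect/token"

response = requests.post(
    TOKEN_URL,
    data={
        "grant_type": "client_credentials",   # machine-to-machine flow
        "client_id": "data-catalog",          # hypothetical client registered in Keycloak
        "client_secret": "REPLACE_ME",
    },
    timeout=30,
)
response.raise_for_status()
access_token = response.json()["access_token"]

# Use the token as a Bearer header when calling protected catalog APIs.
headers = {"Authorization": f"Bearer {access_token}"}
```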
Related pages
More information
Tools and resources on this page
Tool or resource | Description | Related pages | Registry |
---|---|---|---|
Ada Discovery Analytics (Ada) | Ada is a performant and highly configurable system for secured integration, visualization, and collaborative analysis of heterogeneous data sets, primarily targeting clinical and experimental sources. | | |
Atlas | Free, publicly available, web-based, open-source software application developed by the OHDSI community to support the design and execution of observational analyses to generate real-world evidence from patient-level observational data. | | Tool info Training |
DAISY | Data Information System to keep a sensitive data inventory and meet the GDPR accountability requirement. | Human data GDPR compliance | Tool info Training |
Data Catalog | Unique collection of project-level metadata from large research initiatives in a diverse range of fields, including clinical, molecular and observational studies. Its aim is to improve the findability of these projects following FAIR data principles. | | Training |
FAIR Cookbook | FAIR Cookbook is an online resource for the Life Sciences with recipes that help you to make and keep data Findable, Accessible, Interoperable and Reusable (FAIR). | Health data Compliance monitoring ... | Tool info Training |
FreeIPA | FreeIPA is an integrated identity and authentication solution for Linux/UNIX networked environments. | | |
iRODS | The Integrated Rule-Oriented Data System (iRODS) is open-source data management software. | Bioimaging data Data storage | Tool info |
Keycloak | Keycloak is an open-source identity and access management solution. | | Training |
Life Science Login (LS Login) | An authentication service from EOSC-Life. | IFB NeLS TSD | |
MONARC | A risk assessment tool that can be used to do Data Protection Impact Assessments. | Human data | |
OHDSI | Multi-stakeholder, interdisciplinary collaborative to bring out the value of health data through large-scale analytics. All its solutions are open source. | Toxicology data Data quality | Tool info |
OMOP-CDM | OMOP is a common data model for the harmonisation of observational health data. | Data quality | |
OpenRefine | Data curation tool for working with messy data. | Data quality Machine actionability | Training |
REDCap | REDCap is a secure web application for building and managing online surveys and databases. While REDCap can be used to collect virtually any type of data in any environment, it is specifically geared to support online and offline data capture for research studies and operations. | Health data Data quality Identifiers | Tool info Training |
REMS | REMS (Resource Entitlement Management System), developed by CSC, is a tool that can be used to manage researchers' access rights to datasets. | | Tool info Training |
SMASCH | SMASCH (Smart Scheduling) is a web-based tool designed for longitudinal clinical studies requiring recurrent follow-up visits of the participants. SMASCH controls and simplifies the scheduling for large databases of patients. It is also used to organise the daily planning (delegation of tasks) for the different medical professionals, such as doctors, nurses and neuropsychologists. | Health data | |
Talend | Talend is an open-source data integration platform. | | |
tranSMART | Knowledge management and high-content analysis platform enabling analysis of integrated data for the purposes of hypothesis generation, hypothesis validation, and cohort discovery in translational research. | Data storage | Tool info |
XNAT | Open-source imaging informatics platform. It facilitates common management, productivity, and quality assurance tasks for imaging and associated data. | XNAT-PIC Bioimaging data | |