Your tasks: Data sensitivity

Is your data sensitive?

Description

In general, data can be categorised into two types: sensitive data and non-sensitive data. Non-sensitive data can be shared openly without risk of harm. Data is considered sensitive when making it publicly available could put people, organisations, countries, and/or ecosystems at risk. Examples include personal or commercial information, as well as information about the habitat, geographical location, and breeding grounds of endangered or vulnerable species. Sensitive data must be protected against unauthorized access, so you should be cautious when dealing with sensitive or potentially sensitive information. It is important to identify, at an early stage of the data management process, at which point data becomes sensitive and which parts of (existing or newly generated) data are sensitive. What is considered sensitive information is usually regulated by national laws and may differ between countries, so it is important to take into consideration both global and local regulations and policies.

Considerations

If you deal with any information about individuals from the EU, you are bound by the EU General Data Protection Regulation (GDPR). In the GDPR, such data is called “personal data”.
In the context of the GDPR, “special category data” is a subclass of “personal data” whose misuse is potentially even more harmful, and the GDPR prescribes very strict rules for dealing with it. Article 9 of the GDPR defines the special categories as personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, as well as genetic data, biometric data used to uniquely identify a person, data concerning health, and data concerning a natural person’s sex life or sexual orientation. Confusingly, these special categories are sometimes colloquially called “sensitive data”. Note that this page is concerned with the broader definition of “sensitive data”.
Information in Life Science projects is for the most part categorised as health and genetic data, which are considered special category data under the GDPR.
You need to assess whether or not your dataset contains attributes that can lead to the identification of a person. Note that attributes that are not identifying on their own can be identifying in combination. See the definitions in the section “How can you de-identify your data?”.
You need to know the de-identification status of your data. Life Science research data rarely contains directly identifying attributes. Research data would typically be pseudonymised or anonymised. If you work with personal data, you must understand the difference between these two (see under de-identification below).
For some studies there is a cohort owner, often a clinical party or a trusted third party that can map study participant keys back to names and surnames. Such data is considered pseudonymous.
If there are no means to map the data back to individuals, then the data is considered anonymous and is out of the scope of the GDPR.
You should keep in mind that anonymising data is a notoriously difficult task. Does your dataset contain a wide array of attributes, or exhibit unique traits/patterns such that one can reasonably expect that no more than a dozen people in the world share them? In that case, you cannot assume that it is anonymous. Such data run the risk of being linked back to individuals through various technical means. You also need to take into account that the technical means to identify people may be more powerful in the future than they are right now: data that is anonymous today may not be anonymous forever.
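The point above about combinations of attributes can be checked mechanically. The following is a minimal sketch in Python, assuming a small in-memory table; the column names (`birth_year`, `postcode`, `diagnosis`), the function name `rare_combinations`, and the threshold are hypothetical choices made for illustration, not part of any regulation or tool. It counts how many records share each combination of values and reports the combinations held by fewer records than the threshold.

```python
from collections import Counter

def rare_combinations(records, attributes, min_group_size=2):
    """Return the value combinations of `attributes` that are shared by
    fewer than `min_group_size` records, mapped to their record counts."""
    counts = Counter(tuple(r[a] for a in attributes) for r in records)
    return {combo: n for combo, n in counts.items() if n < min_group_size}

# Hypothetical toy data: no single attribute value is unique on its own.
records = [
    {"birth_year": 1980, "postcode": "0150", "diagnosis": "asthma"},
    {"birth_year": 1980, "postcode": "0250", "diagnosis": "diabetes"},
    {"birth_year": 1975, "postcode": "0150", "diagnosis": "diabetes"},
    {"birth_year": 1975, "postcode": "0250", "diagnosis": "asthma"},
]

# Each attribute alone: every value occurs twice, so nothing is flagged.
single = {a: rare_combinations(records, [a]) for a in ("birth_year", "postcode")}

# Combined: every (birth_year, postcode) pair occurs only once and is flagged.
combined = rare_combinations(records, ["birth_year", "postcode"])
```

A check like this only covers the attributes you pass in and the records you hold. It says nothing about what an outside party could link from other datasets, so an empty result is not evidence of anonymity.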

Solutions

Identify the legislation and regulations that you are expected to follow. Your institution’s website may give you hints on where to look for information about data sensitivity.
If you cannot determine if your data is sensitive, contact someone with expert knowledge in that area.

How can you de-identify your data?

Description

Data anonymization is the process of irreversibly modifying personal data in such a way that subjects cannot be identified directly or indirectly by anyone, including the study team. If data are anonymized, no one can link data back to the subject.

Pseudonymization is a process where identifying fields within data records are replaced by artificial identifiers called pseudonyms or pseudonymized IDs. Pseudonymization ensures that no one can link data back to the subject, apart from nominated members of the study team, who are able to link pseudonyms to identifying records such as name and address.

Data anonymization involves modifying a dataset so that it is impossible to identify a subject from their data. Pseudonymization involves replacing identifying data with artificial IDs, for example, replacing a healthcare record ID with an internal participant ID only known to a named clinician working in the study.
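The pseudonymization process described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a vetted implementation: the function name `pseudonymise`, the field names (`name`, `address`, `hba1c`, `participant_id`) and the example records are all hypothetical. Identifying fields are moved into a separate key table, and each shareable row keeps only a random pseudonym.

```python
import secrets

def pseudonymise(records, identifying_fields, id_field="participant_id"):
    """Split records into shareable rows and a separate key table.

    The key table maps each pseudonym to the identifying values and is
    meant to be stored separately and securely from the shareable rows.
    """
    shareable, key_table = [], {}
    for record in records:
        # Random pseudonym that carries no information about the person.
        pseudonym = secrets.token_hex(8)
        while pseudonym in key_table:  # extremely unlikely, but avoid reuse
            pseudonym = secrets.token_hex(8)
        key_table[pseudonym] = {f: record[f] for f in identifying_fields}
        row = {k: v for k, v in record.items() if k not in identifying_fields}
        row[id_field] = pseudonym
        shareable.append(row)
    return shareable, key_table

# Hypothetical example records.
records = [
    {"name": "Alice Example", "address": "1 Sample Street", "hba1c": 6.1},
    {"name": "Bob Example", "address": "2 Sample Street", "hba1c": 7.4},
]
shareable, key_table = pseudonymise(records, ["name", "address"])
```

Only `shareable` would be passed on. As long as `key_table` exists anywhere, the data remains pseudonymous rather than anonymous.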

Considerations

Both anonymization and pseudonymization are approaches that comply with the GDPR.
Simply removing identifiers cannot guarantee data anonymity. A dataset may contain unique traits/patterns that could identify individuals. An example would be recording two potentially unrelated attributes, such as the occurrence of a rare disease and the country of residence, where there is only a single case of this disease in that country.
Data that is anonymous currently may not be anonymous in the future. Future datasets on the same individual may disclose their identity.
Anonymization techniques can sometimes damage the statistical properties of the data, for example when a participant’s exact age is translated into an age range.
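To make the trade-off between generalisation and statistical precision concrete, here is a small Python sketch. The band width of 10 years, the helper names `age_to_range` and `band_midpoint`, and the ages are assumptions chosen for illustration. It replaces exact ages with age ranges and then compares the true mean age with the best estimate that can be recovered from the ranges.

```python
def age_to_range(age, width=10):
    """Map an exact age to a band label such as '40-49'."""
    if age < 0:
        raise ValueError("age must be non-negative")
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"

def band_midpoint(band):
    """Midpoint of a band label produced by age_to_range."""
    lower, upper = (int(part) for part in band.split("-"))
    return (lower + upper) / 2

ages = [41, 47, 49, 52]                      # hypothetical exact ages
bands = [age_to_range(a) for a in ages]

exact_mean = sum(ages) / len(ages)
estimated_mean = sum(band_midpoint(b) for b in bands) / len(bands)
```

With these four ages the exact mean and the midpoint-based estimate differ, which is the kind of distortion the last consideration refers to. Wider bands hide individuals better but distort the statistics more.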

Solutions

An example of pseudonymization is where participants in a study are assigned a non-identifying ID and all identifying data (such as name and address) are removed from the metadata to be shared. The mapping of this ID to personal data is held separately and securely by a named researcher who will not share it.
There are well-established data anonymization approaches, such as k-anonymity, l-diversity, and differential privacy.
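Two of the approaches named above can be measured on a table directly. The sketch below, in Python, groups records by their quasi-identifiers and reports k (the size of the smallest group) and l (the smallest number of distinct sensitive values in any group, the simplest "distinct" variant of l-diversity). The function name `k_and_l`, the column names, and the data are hypothetical. Differential privacy works differently, by adding calibrated noise to query results, and is not covered by this sketch.

```python
from collections import defaultdict

def k_and_l(records, quasi_identifiers, sensitive_attribute):
    """Return (k, l) for a non-empty list of records.

    k: size of the smallest group sharing the same quasi-identifier values.
    l: smallest number of distinct sensitive values within any such group.
    """
    groups = defaultdict(list)
    for r in records:
        key = tuple(r[q] for q in quasi_identifiers)
        groups[key].append(r[sensitive_attribute])
    k = min(len(values) for values in groups.values())
    l = min(len(set(values)) for values in groups.values())
    return k, l

# Hypothetical, already generalised records.
records = [
    {"age_range": "40-49", "country": "NO", "diagnosis": "asthma"},
    {"age_range": "40-49", "country": "NO", "diagnosis": "diabetes"},
    {"age_range": "50-59", "country": "NO", "diagnosis": "asthma"},
    {"age_range": "50-59", "country": "NO", "diagnosis": "asthma"},
]
k, l = k_and_l(records, ["age_range", "country"], "diagnosis")
```

Here every group has two members, but one group contains a single diagnosis, so anyone known to be in that group has their diagnosis revealed. This is why k-anonymity alone is often considered insufficient.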

Tool assembly

TSD

The Sensitive Data Service (TSD) provides a platform to store, compute on and analyse sensitive research data in compliance with Norwegian regulations regarding individuals’ privacy.

Tool assembly

COVID-19 Data Portal

The COVID-19 Data Portal brings together relevant datasets for sharing and analysis to accelerate coronavirus research.

Tool assembly

TransMed

TransMed tool assembly from ELIXIR Luxembourg supports projects in clinical and translational biomedicine.

More information

Links to FAIR Cookbook

FAIR Cookbook is an online, open and live resource for the Life Sciences with recipes that help you to make and keep data Findable, Accessible, Interoperable and Reusable; in one word FAIR.

Declaring data permitted uses

Links to DSW

With Data Stewardship Wizard (DSW), you can create, plan, collaborate, and bring your data management plans to life with a tool trusted by thousands of people worldwide, from data management pioneers to international research institutes.

Are there privacy reasons why your data can not be open?

Will you collect any data connected to a person, "personal data"?

How is pseudonymization handled?

Could the coupling of data create a danger of re-identification of pseudo- or anonymized personal data?

Does this dataset contain personal data?

Does this dataset contain sensitive information?

Are personal data sufficiently protected?

Training

Training in TeSS

Tools and resources on this page

Tool or resource	Description	Related pages	Registry
EU General Data Protection Regulation	Regulation (EU) 2016/679 of the European Parliament and of the Council on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation).	TSD Human data

National resources

Tools and resources tailored to users in different countries.

Tool or resource	Description	Related pages
BioMedIT	A secure IT network for the responsible processing of health-related data.	Human data Data analysis
Federated EGA Finland	FEGA allows you to store and share sensitive data in Finland in a way that fulfils all the requirements of the General Data Protection Regulation (GDPR). The European Genome-phenome Archive (EGA)	CSC Researcher Data Steward Data publication Existing data Human data
Findata	The Health and Social Data Permit Authority. Findata offers services and enables secure and efficient utilisation of data materials containing health and social data.	CSC Researcher Data Steward Existing data Human data
Fingenious	Finnish Biobank Cooperative (FINBB) connects researchers to Finnish biomedical research. Via Fingenious® services the researcher can connect to all Finnish public biobanks.	CSC Researcher Data Steward Human data
Sensitive Data Services for Research	CSC Sensitive Data Services for Research are designed to support secure sensitive data management through web-user interfaces accessible from the user’s own computer.	CSC Researcher Data Steward Data analysis Data storage Data publication Human data
Luxembourg Covid-19 data portal	The Luxembourgish COVID-19 Data Portal acts as a collection of links and provides information to support researchers to utilise Luxembourgish and European infrastructures for data sharing.
Educloud Research	Educloud Research is a platform provided by the Centre for Information Technology (USIT) at the University of Oslo (UiO). This platform provides access to a work environment accessible to collaborators from other institutions or countries. This service provides a storage solution and a low-threshold HPC system that offers batch job submission (SLURM) and interactive nodes. Data up to the red classification level can be stored/analysed.	Data analysis Data storage
Federated EGA Norway node	Federated instance collects metadata of -omics data collections stored in national or regional archives and makes them available for search through the main EGA portal. With this solution, sensitive data will not physically leave the country, but will reside on TSD. The European Genome-phenome Archive (EGA)	Human data Existing data Data publication TSD
HUNTCloud	The HUNT Cloud, established in 2013, aims to improve and develop the collection, accessibility and exploration of large-scale information. HUNT Cloud offers cloud services and lab management. It is a key service that has established a framework for data protection, data security, and data management. HUNT Cloud is owned by NTNU and operated by HUNT Research Centre at the Department of Public Health and Nursing at the Faculty of Medicine and Health Sciences.	Human data Data analysis Data storage
Nettskjema	Nettskjema is a solution for designing and managing data collections using online forms and surveys. It can be used for collecting sensitive data and offers a high degree of security and privacy.	TSD
Norwegian COVID-19 Data Portal	The Norwegian COVID-19 Data Portal aims to bundle the Norwegian research efforts and offers guidelines, tools, databases and services to support Norwegian COVID-19 researchers.	Human data Existing data Data publication
RETTE	System for Risk and compliance. Processing of personal data in research and student projects at UiB.	Human data Data security GDPR compliance Policy maker Data Steward
SAFE	SAFE (secure access to research data and e-infrastructure) is the solution for the secure processing of sensitive personal data in research at the University of Bergen. SAFE is based on the “Norwegian Code of conduct for information security in the health and care sector” (Normen) and ensures confidentiality, integrity, and availability are preserved when processing sensitive personal data. Through SAFE, the IT department offers a service where employees, students and external partners get access to dedicated resources for processing of sensitive personal data.	Human data Data analysis Data storage
TSD	The TSD – Service for Sensitive Data, is a platform for collecting, storing, analysing and sharing sensitive data in compliance with the Norwegian privacy regulation. TSD is developed and operated by UiO.	Human data Data analysis Data storage TSD
usegalaxy.no	Galaxy is an open-source, web-based platform for data-intensive biomedical research. This instance of Galaxy is coupled with NeLS for easy data transfer. Galaxy	Data analysis Existing data Data publication NeLS
Federated EGA Sweden node	Secure archiving and sharing of genetic and phenotypic data resulting from Swedish biomedical research projects. The European Genome-phenome Archive (EGA)	Human data Existing data Data publication
Human Data Guidelines	Guidelines as well as further information on legal considerations when working with human biomedical data.	Human data
NBIS Data Management Consultation	Free consultation service regarding data management questions in life science research.	Data management plan Data publication
Swedish Pathogens Portal	The Swedish Pathogens Portal provides information, guidelines, tools and services to support researchers to utilise Swedish and European infrastructures for data sharing.	COVID-19 Data Portal Human data Existing data Data publication