Introduction
The intrinsically disordered proteins (IDP) domain brings together the databases and tools needed to organize IDP data and knowledge in a Findable, Accessible, Interoperable and Reusable (FAIR) manner. Experimental data created by users must be complemented by metadata before it can be deposited in an IDP resource. This document describes which community standards to follow and where to find the information needed to complete the metadata of an IDP experiment or study.
Annotating or curating data from an IDP-related experiment or study
Description
As a researcher in the field of Intrinsically Disordered Proteins (IDPs), you want to know how to process an experimental result in a FAIR way. Your final aim is to deposit the data in a community database or registry for wider adoption.
Considerations
You can break the process down into several questions:
- How should you properly describe an IDP experiment? Are there any community standards that you should follow?
- How do you add metadata in order to make IDP data more machine-readable?
- How should you publish IDP data to a wider audience?
Solutions
- The IDP community developed the Minimum Information About Disorder Experiments (MIADE) standard under the PSI-ID workgroup. The standard specifies the minimum information required to comprehend the result of a disorder experiment. It is available in XML and TAB format. You can check the example annotations in XML and TAB format and adapt them to your data.
- The IDP community developed the Intrinsically Disordered Proteins Ontology (IDPO). The ontology is a community consensus on the terms used in the field, organised in a structured way. Annotating your data with IDPO terms makes it more machine-readable.
- You should deposit primary data in the relevant community databases (BMRB, PCDDB, SASBDB). You should deposit literature data in the manually curated database DisProt. DisProt is built on the MIADE standard and IDPO, so it requires curators to annotate all new data according to community standards. IDP data from primary databases, together with curated experimental annotations and software predictions, is integrated in the comprehensive MobiDB database. DisProt and MobiDB add Bioschemas markup to all data records and expose it, which increases data findability and interoperability.
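As an illustration of what Bioschemas markup on a data record can look like, the sketch below builds a small JSON-LD description of a protein record and wraps it in the script tag that web pages use to embed such markup. It is a minimal sketch, not the exact markup that DisProt or MobiDB emit: the helper names `protein_jsonld` and `as_script_tag`, the choice of properties, and the `example.org` record URL are all assumptions made for this example.

```python
import json


def protein_jsonld(accession, name, record_url):
    """Build a minimal JSON-LD description of one protein record.

    The property selection is an assumption for illustration; consult the
    Bioschemas Protein profile for the required and recommended properties.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Protein",
        "@id": record_url,
        "identifier": accession,
        "name": name,
        # Link the record back to the reference protein database.
        "sameAs": f"https://www.uniprot.org/uniprotkb/{accession}",
    }


def as_script_tag(markup):
    """Serialize the markup the way it is usually embedded in an HTML page."""
    body = json.dumps(markup, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Hypothetical record URL; the accession and name refer to human p53.
record = protein_jsonld(
    "P04637",
    "Cellular tumor antigen p53",
    "https://example.org/records/P04637",
)
print(as_script_tag(record))
```

Embedding a block like this in each record page lets search engines and harvesters read the identifier and cross-reference without parsing the page layout.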
Issues annotating or describing an IDP-related term or study
Description
The IDP field is actively evolving. Newly published experimental evidence of protein disorder is continually integrated into IDP databases and translated into a machine-readable form. This mapping process relies on accurate knowledge of protein identifiers, the protein regions under study and the functional annotation of disordered regions.
Considerations
The most common issues that you, as a researcher, can encounter during the mapping process are:
- How do you properly and uniquely identify the protein (or fragment) under study?
- How do you deal with missing terms in IDPO?
Solutions
- In order to uniquely identify the protein under study, look it up in the UniProt reference protein database. The protein identifier must be complemented with an isoform identifier (if needed) in order to completely match the experimental protein sequence. Use the SIFTS database to precisely map the experimental protein fragment (deposited at PDB) to a reference protein database (UniProt) at the amino acid level.
- Experimental evidence from the literature must be mapped to relevant IDPO terms. If you cannot find a suitable term in IDPO, try the following resources:
  - Evidence & Conclusion Ontology (ECO) for experimental methods
  - Molecular Interactions Controlled Vocabulary for molecular interactions
  - Gene Ontology for functional terms

  If there is no appropriate term in these ontologies or vocabularies, you can submit a new proposal for community review at DisProt feedback.
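The two solutions above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not an official client for any of the resources. The function names, the simplified segment dictionaries (a stand-in for the residue-level mappings that SIFTS provides) and the tiny vocabulary excerpts are hypothetical. The segment values are made up for illustration, and the term identifiers should be verified against the current ontology releases before use.

```python
import re

# UniProt accession pattern, extended with an optional "-N" isoform suffix.
ACCESSION = re.compile(
    r"^(?P<acc>[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})"
    r"(?:-(?P<iso>[0-9]+))?$"
)


def split_accession(identifier):
    """Split 'P04637-2' into ('P04637', 2); isoform is None when absent."""
    match = ACCESSION.match(identifier)
    if match is None:
        raise ValueError(f"not a UniProt accession: {identifier!r}")
    iso = match.group("iso")
    return match.group("acc"), int(iso) if iso else None


def pdb_to_uniprot(residue, segments):
    """Map a PDB residue number to a UniProt position.

    Assumes each segment is gap-free, so positions shift by a fixed offset.
    Returns None when the residue lies outside every mapped segment.
    """
    for seg in segments:
        if seg["pdb_start"] <= residue <= seg["pdb_end"]:
            return seg["unp_start"] + (residue - seg["pdb_start"])
    return None


# Vocabularies in the order of preference described above: IDPO first.
# Tiny illustrative excerpts; verify every identifier before relying on it.
VOCABULARIES = [
    ("IDPO", {"disorder": "IDPO:00076"}),
    ("ECO", {"experimental evidence used in manual assertion": "ECO:0000269"}),
    ("MI", {"two hybrid": "MI:0018"}),
    ("GO", {"protein binding": "GO:0005515"}),
]


def find_term(label, vocabularies=VOCABULARIES):
    """Return (source, term_id) from the first vocabulary that has the label."""
    for source, terms in vocabularies:
        term_id = terms.get(label.lower())
        if term_id is not None:
            return source, term_id
    return None  # no match: candidate for a new term proposal


# Illustrative segment, not taken from a real PDB entry.
segments = [{"pdb_start": 1, "pdb_end": 100, "unp_start": 94}]

print(split_accession("P04637-2"))     # ('P04637', 2)
print(pdb_to_uniprot(10, segments))    # 103
print(find_term("Protein binding"))    # ('GO', 'GO:0005515')
```

A `None` result from `find_term` corresponds to the last step described above: no suitable term exists yet, so a new one can be proposed for community review.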
Tools and resources on this page
Tool or resource | Description | Related pages | Registry |
---|---|---|---|
Bioschemas | Bioschemas aims to improve the Findability on the Web of life sciences resources such as datasets, software, and training materials | Machine actionability | Standards/Databases, Training |
BMRB | Biological Magnetic Resonance Data Bank | Structural Bioinformatics | Tool info |
DisProt | A database of intrinsically disordered proteins | | Tool info, Standards/Databases, Training |
Intrinsically disordered proteins ontology (IDPO) | Intrinsically disordered proteins ontology | | Tool info |
MIADE | Minimum Information About Disorder Experiments (MIADE) standard | | Training |
MobiDB | A database of protein disorder and mobility annotations | | Tool info, Standards/Databases, Training |
PCDDB | The Protein Circular Dichroism Data Bank | | Tool info |
PDB | The Protein Data Bank (PDB) | Galaxy, Structural Bioinformatics, Data publication | Tool info, Training |
SASBDB | Small Angle Scattering Biological Data Bank | | |
SIFTS | Structure integration with function, taxonomy and sequence | | |
UniProt | Comprehensive resource for protein sequence and annotation data | Galaxy, Proteomics, Single-cell sequencing, Structural Bioinformatics, Machine actionability | Tool info, Standards/Databases, Training |