Project ANR-22-PESN-0007


Sharing reliable protocols to transform datasets into gold standards: Application to Neuro-Vascular Pathologies

  • FAIR
  • workflows
  • standards
  • data and process provenance
  • sharing and reusability protocols
  • automatic annotation of datasets

Short description

Access to a wide variety of complementary, multi-scale and massive data collections offers unprecedented opportunities for healthcare research. A large number of analyses can be performed on these datasets, allowing scientific advances and discoveries to emerge. The national ‘Digital Health’ Acceleration Strategy aims to boost digital health innovation, which includes designing innovative approaches to health data analysis.

Importantly, such data analyses are complex: they rely on various computational tools that have to be parametrized and chained together. There is now compelling evidence that many scientific discoveries will not stand the test of time, so increasing the reproducibility of computed results is of paramount importance, especially in the healthcare domain.

Sharing of health data is often hampered by personal data protection requirements and comes up against technical constraints (security, volume). These constraints can, however, be mitigated when the protocols and the workflows implementing analyses are reusable enough to reproduce the analyses in situ.

Additionally, when designed to be reusable, protocols and their implementations (workflows) provide the provenance traces of the analyzed data, describing how results have been obtained and thus increasing scientists’ confidence in those results.

This calls for innovative solutions for the annotation of biomedical and clinical datasets and extraction of provenance. Protocols and their implementation as workflows using and generating datasets should be elevated to first-class objects and the inherent dual relationship between datasets and protocols/workflows should be better exploited.

Challenges thus include standardization and annotation of datasets and protocols, extracting protocols and workflows from text and other datasets, and synthesizing them into interoperable and shareable protocols.

The originality of ShareFAIR lies both in tackling the reliability of datasets and analysis protocols and in harnessing the dual relationship between datasets and protocols. Specifically, ShareFAIR will provide:

(i) standards to uniformly represent datasets, ontologies/common vocabularies to annotate datasets and protocols/workflows, and provenance to trace the origin of datasets,

(ii) an interoperable framework for the design, annotation and reuse of reliable and shareable protocols,

(iii) approaches to extract protocols from textual data to enrich the set of protocols and workflows and better document the provenance of datasets, and approaches to learn protocols from biomedical and clinical datasets.

Complete proposal here


All publications are on:


Project coordinator: Sarah Cohen-Boulakia