Equipe BD
Laboratoire d'InfoRmatique en Images et Systèmes d'information
UMR 5205 CNRS/INSA de Lyon/Université Claude Bernard Lyon 1/Université Lumière Lyon 2/Ecole Centrale de Lyon

Towards Efficient Execution of Data Science Pipelines

Who: Javier A. ESPINOSA OVIEDO
When: Thursday, February 25, 2021 - 12:45
Where: videoconference

The democratization of powerful computing architectures (cloud computing, GPUs, TPUs), together with advances in machine learning and deep learning methods, is seen as a promise for gaining insight from big datasets and powering data-centric AI solutions. In this context, data scientists are responsible for defining complex and repetitive operations, called “data science pipelines”, intended to extract value (or produce a model) from these datasets. To date, data science pipelines are defined in an artisanal, ad hoc manner by combining platforms, tools and cloud services that were not designed to work together. Moreover, these pipelines can involve the full data management life cycle of a data science experiment (e.g., data collection, storage, cleansing and analysis). In this talk, I will describe some research results and perspectives that show how data science pipelines can be seen as coordinations of services implementing a set of data processing operations that can be optimized during deployment and execution.