Equipe BD
Laboratoire d'InfoRmatique en Images et Systèmes d'information
UMR 5205 CNRS/INSA de Lyon/Université Claude Bernard Lyon 1/Université Lumière Lyon 2/Ecole Centrale de Lyon

Benchmarking SQL-On-MapReduce systems using big astronomy databases

Who:
Amin Mesmoudi
When:
Tuesday, May 6, 2014 - 13:00 to 14:00
Where:
Nautibus, Room C5 (ground floor)

As the volume of data produced in many application domains grows, managing and querying the resulting large data repositories becomes increasingly difficult. Within the PetaSky project (http://com.isima.fr/Petasky), we focus on managing scientific data in the field of cosmology. The data we consider come from the LSST project (http://www.lsst.org/), whose database is expected to exceed 60 PB overall. To evaluate the performance of existing SQL-On-MapReduce data management systems, we defined a new benchmark using queries from the areas of corpuscular physics and cosmology. The goal of this work is to report on the ability of these systems to support large-scale declarative queries. We mainly investigate the impact of data partitioning, indexing, and compression on query execution performance.