Equipe BD
Laboratoire d'InfoRmatique en Images et Systèmes d'information
UMR 5205 CNRS/INSA de Lyon/Université Claude Bernard Lyon 1/Université Lumière Lyon 2/Ecole Centrale de Lyon

SQL query rewriting for data exploration

Sabina SURDU
Tuesday, March 22, 2016 - 13:00 to 14:00
INSA de Lyon, Blaise Pascal building, LIRIS room 501.301

In a large number of domains, ranging from astrophysics to earth observation, data analysts are facing a data deluge. In this Big Data era, exploring the data is essential to unearthing new knowledge. As user profiles grow more diverse and data ever more complex, this task has become increasingly hard. Analysts can access very large scientific datasets through SQL, while also using data mining tools to examine their data. We propose a query rewriting technique that helps data analysts formulate their queries, so that they can explore their Big Data rapidly and intuitively. We describe different ways to distinguish the positive examples of a query from the negative ones, i.e., the results that analysts want and those they do not want, respectively. We construct a learning dataset from the positive and negative example sets. Using machine learning techniques, we then reformulate the initial query and obtain a new query that is more "efficient" and diverse. Finally, we propose metrics to evaluate the quality of the rewriting.
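
The pipeline described above (label examples, build a learning set, learn a predicate, emit a new query) can be illustrated with a minimal Python sketch. The abstract does not say which learning algorithm is used, so the sketch substitutes a one-level decision tree (a "stump") purely for illustration. The table name `stars`, the attributes `mag` and `dist`, and all function names are hypothetical and are not taken from the talk.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, float]
Stump = Tuple[str, str, float, float]  # (attribute, operator, threshold, accuracy)


def build_learning_set(positives: List[Row], negatives: List[Row]) -> List[Tuple[Row, int]]:
    """Label wanted tuples 1 and unwanted tuples 0."""
    return [(r, 1) for r in positives] + [(r, 0) for r in negatives]


def best_stump(data: List[Tuple[Row, int]]) -> Optional[Stump]:
    """Find the single-attribute test that best separates the two classes.

    Returns None when no attribute has two distinct values to split on.
    """
    best: Optional[Stump] = None
    for attr in sorted(data[0][0].keys()):
        values = sorted({row[attr] for row, _ in data})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for t in [(a + b) / 2 for a, b in zip(values, values[1:])]:
            for op in ("<=", ">"):
                correct = sum(
                    1
                    for row, label in data
                    if ((row[attr] <= t) if op == "<=" else (row[attr] > t)) == bool(label)
                )
                acc = correct / len(data)
                if best is None or acc > best[3]:
                    best = (attr, op, t, acc)
    return best


def rewrite_query(table: str, stump: Stump) -> str:
    """Turn the learned test into a SQL selection (illustrative only)."""
    attr, op, t, _ = stump
    return f"SELECT * FROM {table} WHERE {attr} {op} {t}"


# Hypothetical examples: the analyst wants bright objects (low magnitude).
positives = [{"mag": 2.0, "dist": 10.0}, {"mag": 3.0, "dist": 50.0}]
negatives = [{"mag": 8.0, "dist": 20.0}, {"mag": 9.0, "dist": 40.0}]

learning_set = build_learning_set(positives, negatives)
stump = best_stump(learning_set)
new_query = rewrite_query("stars", stump)
print(new_query)  # SELECT * FROM stars WHERE mag <= 5.5
```

The accuracy stored in the stump is one simple, assumed stand-in for a quality measure; the metrics actually proposed in the talk are not given in the abstract.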