ANR Project DAG

This project explores the cross-fertilization between artificial intelligence, combinatorial algorithmics and databases to bring original solutions to fundamental data mining problems. We focus on an important enumeration problem in data mining, the enumeration of interesting patterns in large volumes of data, referred to as iPeP in the sequel. For instance, enumerating inclusion dependencies in relational databases or frequent subtrees in semi-structured XML data both fit in this category. These examples show that a pattern can be complex, and that data may be huge and highly heterogeneous. In this setting, we aim at defining high-level declarative languages (logical or algebraic) for expressing interesting pattern enumeration problems. We then want to characterize tractable subclasses of these problems for which algorithms can be devised that lead to efficient implementations in practice. To do so, our ambition is to bridge the gap between constraint programming, constraint-based data mining and algorithmics on discrete structures, and to foster cross-fertilization between these sub-branches of computer science. From a theoretical point of view, the properties that patterns should satisfy in a given language have to be identified and characterized (e.g. monotonicity, maximality). We will have to define new problem classes for which generic solutions may exist. The key points to study are enumeration algorithms on discrete structures together with their complexity, the expressive power of the languages used for defining patterns, the predicates defining interestingness criteria in data, and problem transformations. As an example, we plan to explore transformations from a set of patterns to a Boolean lattice in order to exploit the good algorithmic properties of this kind of structure. Open-access data sources will be used to assess the feasibility of the proposals made in this project.
As far as we know, only one recent project at the European level has the same scientific positioning [75]. Our project is therefore an opportunity to play a major role at the international level in this new research trend, which applies techniques from constraint programming to constraint-based data mining.
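
To make the role of monotonicity and of the Boolean lattice more concrete, here is a minimal sketch of a levelwise enumeration of frequent itemsets, a textbook instance of interesting pattern enumeration. It is an illustration only, not an algorithm produced by the project: the function name `frequent_itemsets`, the `min_support` threshold and the toy data are assumptions, and frequency is just one possible interestingness predicate. The sketch relies on the fact that frequency is anti-monotone in the lattice of item subsets: every subset of a frequent itemset is itself frequent, so a candidate with an infrequent subset can be discarded without scanning the data.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Enumerate every itemset contained in at least `min_support` transactions.

    The Boolean lattice of item subsets is explored level by level
    (itemsets of size 1, then 2, and so on).
    """
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Number of transactions that contain the itemset.
        return sum(1 for t in transactions if itemset <= t)

    items = sorted({item for t in transactions for item in t})
    level = [frozenset([i]) for i in items
             if support(frozenset([i])) >= min_support]

    result = {}
    while level:
        for itemset in level:
            result[itemset] = support(itemset)
        current = set(level)
        # Candidates of size k+1: unions of two frequent itemsets of size k.
        candidates = {a | b for a in level for b in level
                      if len(a | b) == len(a) + 1}
        # Anti-monotone pruning: keep a candidate only if all of its
        # size-k subsets are frequent, then check its actual support.
        level = [c for c in candidates
                 if all(frozenset(s) in current
                        for s in combinations(c, len(c) - 1))
                 and support(c) >= min_support]
    return result


# Hypothetical toy data: five transactions over the items a, b and c.
toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
patterns = frequent_itemsets(toy, min_support=3)
```

On this toy input, the three single items and the three pairs are frequent, while the full set of three items appears in only two transactions and is therefore not reported.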

Job announcements

Funded by the National Research Agency (ANR), DEFIS 2009 program, ANR-09-EMER-003-01