Equipe BD
Equipe BD
Laboratoire d'InfoRmatique en Images et Systèmes d'information
UMR 5205 CNRS/INSA de Lyon/Université Claude Bernard Lyon 1/Université Lumière Lyon 2/Ecole Centrale de Lyon

You are here

Pattern Matching in Massive Graphs with Precision and Recall Guarantees

Qui: 
Matei RIPEANU
Quand: 
Friday, January 24, 2020 - 12:30 to 13:30
Où: 
Université Lyon1, Dép. Informatique, Bât. Nautibus, salle C2

Pattern matching in graphs, that is, finding subgraphs that match a smaller template graph within a large background graph, is fundamental to graph analysis and serves a rich set of applications. Unfortunately, existing solutions have limited scalability, are difficult to parallelize, and/or support only a limited set of search patterns.

We explore avenues towards a scalable solution for pattern matching. We target practical pattern matching scenarios in large-scale property graphs, able to operate on massive distributed memory machines.

This talk will present a novel algorithmic pipeline that bases pattern matching on constraint checking. The key intuition is that each vertex or edge participating in a match has to meet a set of constraints specified by the search template. The pipeline iterates over these constraints to eliminate all the vertices and edges that do not participate in any match, and reduces the background graph to the complete set of vertices and edges participating in any match. Additional analysis can be performed on this much reduced graph. Furthermore, a vertex-centric formulation for this constraint checking algorithm exists, and this makes it possible to harness existing high-performance, vertex-centric graph processing frameworks.

The constraint checking approach supports both exact and a class of edit-distance based approximate matching with 100% precision (i.e., no false positives in the output) and 100% recall (i.e., all vertices and edges participating in matches are included) for arbitrary search templates. We have scaled our implementation to massive-scale real-world (up to 257 billion edges) and synthetic (up to 4.4 trillion edges) graphs, processed on a cluster with up to (1,024 compute nodes), orders of magnitude larger than used in the past for similar problems.

Project link: http://netsyslab.ece.ubc.ca/wiki/index.php/PatternMatching

Bio: Matei Ripeanu is a Full Professor at the University of British Columbia where he works with a fantastic group of students. Read more about his current projects at http://netsyslab.ece.ubc.ca/wiki/index.php/Projects