Equipe BD
Laboratoire d'InfoRmatique en Images et Systèmes d'information
UMR 5205 CNRS/INSA de Lyon/Université Claude Bernard Lyon 1/Université Lumière Lyon 2/Ecole Centrale de Lyon


Algebraic Amplification for Semi-Supervised Learning from Sparse Data

Who: 
Wolfgang GATTERBAUER
When: 
Thursday, March 11, 2021 - 11:00 to 12:00
Where: 
Videoconference

Node classification is an important problem in graph data management. It is commonly solved by label propagation methods that work iteratively, starting from a few labeled seed nodes. For graphs with arbitrary compatibilities between classes, these methods crucially depend on knowing the compatibility matrix, which must be provided by either domain experts or heuristics. Can we instead directly estimate the correct compatibilities from a sparsely labeled graph in a principled and scalable way? We answer this question affirmatively and suggest a method called distant compatibility estimation that works even on extremely sparsely labeled graphs (e.g., 1 in 10,000 nodes is labeled) in a fraction of the time it later takes to label the remaining nodes. Our approach first creates multiple factorized graph representations (with size independent of the graph) and then performs estimation on these smaller graph sketches. We refer to algebraic amplification as the more general idea of leveraging algebraic properties of an algorithm’s update equations to amplify sparse signals. We show that our estimator is orders of magnitude faster than an alternative approach and that the end-to-end classification accuracy is comparable to that obtained with gold-standard compatibilities. This makes it a cheap preprocessing step for any existing label propagation method and removes the current dependence on heuristics.
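
To make the role of the compatibility matrix concrete, here is a minimal sketch of label propagation on a toy graph. It is an illustration only and not the speaker's implementation: the update rule F <- X + eps * W F H, the function names, the damping factor eps, and the 4-node path graph are all assumptions chosen for this example. The matrix H is supplied by hand here, which is exactly the dependence the talk proposes to remove by estimating H from the sparse labels.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def propagate_labels(W, X, H, eps=0.1, iterations=20):
    """Iterate F <- X + eps * W F H and return the highest-scoring class per node.

    W: adjacency matrix, X: seed beliefs (one row per node, one column per class),
    H: centered class-compatibility matrix. All names are hypothetical.
    """
    F = [row[:] for row in X]
    for _ in range(iterations):
        WFH = matmul(matmul(W, F), H)
        F = [[x + eps * m for x, m in zip(x_row, m_row)]
             for x_row, m_row in zip(X, WFH)]
    return [max(range(len(row)), key=row.__getitem__) for row in F]


# Toy graph (assumption): a path 0 - 1 - 2 - 3.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]

# Only node 0 is labeled (class 0); all other nodes are unlabeled.
X = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]

# Heterophily: neighbors tend to belong to opposite classes.
H_hetero = [[-0.5, 0.5], [0.5, -0.5]]
# Homophily: neighbors tend to belong to the same class.
H_homo = [[0.5, -0.5], [-0.5, 0.5]]

labels_hetero = propagate_labels(W, X, H_hetero)
labels_homo = propagate_labels(W, X, H_homo)
print(labels_hetero)  # [0, 1, 0, 1]
print(labels_homo)    # [0, 0, 0, 0]
```

The same seed label yields opposite predictions for nodes 1 and 3 depending on H, which shows why a wrong or guessed compatibility matrix can undermine the whole classification.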

RELATED WORK - VLDB 2015: Linearized and single-pass belief propagation
http://www.vldb.org/pvldb/vol8/p581-gatterbauer.pdf
https://gatterbauer.name/download/vldb2015_LinBP_presentation_narrated.pptx
https://www.youtube.com/watch?v=DPSW8SF6gPc&list=PLEWtRs08n5UUykWVmXZnWt...

BIO Wolfgang Gatterbauer is an Associate Professor in the Khoury College of Computer Sciences at Northeastern University. Prior to joining Northeastern, he was a postdoctoral fellow in the database group at the University of Washington and an Assistant Professor in the Tepper School of Business at Carnegie Mellon University. One major focus of his research is to extend the capabilities of modern data management systems in generic ways and to allow them to support novel functionalities that seem hard at first. Examples of such functionalities are managing trust, provenance, explanations, and uncertain and inconsistent data. He is a recipient of the NSF CAREER award and of “best-of-conference” mentions from VLDB 2015, SIGMOD 2017, and WALCOM 2017. Earlier, he won a bronze medal at the International Physics Olympiad and worked in the steam turbine development department of ABB Alstom Power and in the German office of McKinsey & Company. https://db.khoury.northeastern.edu/