Query-driven Data Completeness Assessment

Friday, February 17, 2017 - 14:00 to 15:00
Nautibus, salle C2

In many applications, including loosely coupled cloud databases, collaborative editing and network monitoring, data from multiple sources is regularly used for query answering. For reasons such as system failures, insufficient author knowledge or network issues, data may be temporarily unavailable or generally nonexistent. Hence, not all data needed for query answering may be available. In this talk, I will give an overview of techniques for reasoning about data completeness. I will particularly focus on the work in [1], where we propose a natural class of completeness patterns, expressed by selections on database tables, to specify complete parts of database tables. We then show how to adapt the operators of relational algebra so that they manipulate these completeness patterns to compute completeness patterns pertaining to query answers. Our proposed algebra is computationally sound and complete with respect to the information that the patterns provide. We show that stronger completeness patterns can be obtained by considering not only the schema but also the database instance, and we extend the algebra to take this additional information into account. We develop novel techniques to efficiently implement the computation of completeness patterns on query answers and demonstrate their scalability on real data. In the final part, I will touch upon recent topics of completeness and relevance assessment in large knowledge bases.
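To give a rough feel for the idea of a completeness pattern, here is a minimal Python sketch. It is an illustration under stated assumptions, not the algebra from [1]: the wildcard notation, the function names `matches` and `select_eq`, and the example schema (day, team, status) are all hypothetical. A pattern is a tuple of constants and wildcards stating that the table contains all rows matching it; the sketch shows one sound way such patterns could be carried through an equality selection.

```python
WILDCARD = "*"  # assumed notation: "*" stands for "any value"


def matches(pattern, row):
    """Return True if the row falls inside the part described by the pattern."""
    return all(p == WILDCARD or p == v for p, v in zip(pattern, row))


def select_eq(patterns, attr, value):
    """Patterns that hold for the answer of the selection 'column attr = value'.

    A table pattern carries over if its entry at position attr is the
    selected value or a wildcard; that entry is then fixed to the value.
    This is a simplified rule chosen for illustration only.
    """
    result = set()
    for p in patterns:
        if p[attr] in (WILDCARD, value):
            result.add(p[:attr] + (value,) + p[attr + 1:])
    return result


# Hypothetical table with schema (day, team, status).
# Assumed completeness statements: all Monday rows, all rows of team A,
# and all Tuesday rows of team B are present.
patterns = {("Mon", "*", "*"), ("*", "A", "*"), ("Tue", "B", "*")}

# Patterns for the answer of the selection day = "Mon".
monday = select_eq(patterns, 0, "Mon")
```

In this sketch the Tuesday pattern says nothing about Monday rows and is dropped, while the team A pattern is narrowed to Monday rows of team A.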

[1] Identifying the Extent of Completeness of Query Answers over Partially Complete Databases; Simon Razniewski, Flip Korn, Werner Nutt and Divesh Srivastava; SIGMOD 2015

Bio: Simon Razniewski is an Assistant Professor at the Faculty of Computer Science of the Free University of Bozen-Bolzano, Italy. His research interests include Data Quality and Management, Semantic Web, Knowledge Engineering and Machine Learning. He holds a PhD from the Free University of Bozen-Bolzano (2014) and a Diplom (MSc.) from TU Dresden (2010). He has spent time as a visitor at the Max-Planck Institute for Informatics (2016), the University of Queensland (2015), AT&T Labs-Research (2013) and the University of California, San Diego (2012), and has previous industrial experience from Globalfoundries (2010) and Siemens IT (2009).