Laboratoire d'InfoRmatique en Images et Systèmes d'information
UMR 5205 CNRS/INSA de Lyon/Université Claude Bernard Lyon 1/Université Lumière Lyon 2/Ecole Centrale de Lyon
The Software Heritage project has assembled the largest existing archive of publicly available software source code and associated development history, comprising more than 6 billion unique source code files and 1 billion unique commits collected from more than 90 million software development projects.
In this talk we will review the project's background, current status, and future directions, with a focus on its graph-based data model and its exploitation. The archive is a Merkle DAG whose nodes represent source code development artifacts such as source files, code trees, commits, releases, and version control system (VCS) snapshots. The graph is typed, fully deduplicated, and global, which makes it possible to keep track of all the different places (e.g., different VCS repositories) from which a given artifact has been distributed. The graph is big, with about 20 billion nodes and 200 billion edges, and it is growing exponentially, doubling every 2 years. Its network topology and growth dynamics are being studied but remain largely unknown at this stage.
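As an illustration of the data model described above, here is a minimal sketch of a typed, deduplicated Merkle DAG in Python. It is a toy under stated assumptions and not the Software Heritage implementation: the class name MerkleDAG, the SHA-256-based serialization, the node kinds, and the example.org URLs are all hypothetical choices made for this example. The point it shows is that identical artifacts hash to the same node, so a file shared by two repositories is stored once while both origins remain discoverable.

```python
import hashlib


class MerkleDAG:
    """Toy content-addressed graph: a node id is a hash of its kind,
    its payload, and the ids of its (named) children."""

    def __init__(self):
        self.nodes = {}    # node id -> (kind, payload, tuple of (name, child id))
        self.origins = {}  # snapshot id -> set of origin URLs (hypothetical)

    def add(self, kind, payload=b"", children=()):
        """Insert a node and return its id; re-adding an identical node is a no-op."""
        edges = tuple(sorted(children))
        h = hashlib.sha256()
        h.update(kind.encode() + b"\0")
        h.update(len(payload).to_bytes(8, "big") + payload)
        for name, child in edges:
            h.update(name.encode() + b"\0" + child.encode() + b"\0")
        node_id = h.hexdigest()
        self.nodes.setdefault(node_id, (kind, payload, edges))  # deduplication
        return node_id

    def add_origin(self, url, snapshot_id):
        """Record that a snapshot was observed at the given origin URL."""
        self.origins.setdefault(snapshot_id, set()).add(url)

    def reaches(self, src, dst):
        """Depth-first search: is dst reachable from src?"""
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(child for _, child in self.nodes[node][2])
        return False

    def origins_of(self, node_id):
        """All origins whose snapshots contain the given artifact."""
        return {
            url
            for snapshot, urls in self.origins.items()
            if self.reaches(snapshot, node_id)
            for url in urls
        }


# Two hypothetical repositories sharing the same commit under different branch names.
dag = MerkleDAG()
file_id = dag.add("content", b"print('hello')\n")
same_file_id = dag.add("content", b"print('hello')\n")  # deduplicated
dir_id = dag.add("directory", children=[("hello.py", file_id)])
rev_id = dag.add("revision", b"initial commit", [("root", dir_id)])
snap_a = dag.add("snapshot", children=[("main", rev_id)])
snap_b = dag.add("snapshot", children=[("master", rev_id)])
dag.add_origin("https://example.org/alice/hello.git", snap_a)
dag.add_origin("https://example.org/bob/hello-fork.git", snap_b)
found = dag.origins_of(file_id)
```

In this sketch the origin lookup walks down from every snapshot, which is fine for a toy but would not scale to billions of nodes; a real system would more plausibly traverse a stored transposed graph from the artifact upward.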
We will discuss the state of the art in operating, analyzing, and querying the Software Heritage graph, as well as early results in applying graph compression techniques to make it more manageable. We will conclude with an in-depth discussion of open questions, challenges, and actionable research directions.
Speaker bio:
Stefano Zacchiroli is Associate Professor of Computer Science at Université de Paris, on leave at Inria. His research interests span formal methods, software preservation, and Free/Open Source Software engineering. He is co-founder and current CTO of the Software Heritage project. He has been an official member of the Debian Project since 2001, where he was elected to serve as Debian Project Leader for three consecutive terms from 2010 to 2013. He is a former Board Director of the Open Source Initiative (OSI) and recipient of a 2015 O'Reilly Open Source Award.