The Software Heritage Graph Dataset: Public software development under one roof



Citation: Antoine Pietri, Diomidis Spinellis, Stefano Zacchiroli The Software Heritage Graph Dataset: Public software development under one roof.


Wikidata: Q67020258

Download: https://upsilon.cc/~zack/research/publications/msr-2019-swh.pdf

Tagged:


Summary:

Describes the Software Heritage Graph Dataset, which deduplicates content from public software forges (e.g., git revisions) and releases (e.g., from operating system distributions and language-specific package managers). The dataset is available as CSV (for PostgreSQL import), as Parquet (for Spark and similar tools), and for online querying on Amazon Athena. The paper provides example SQL queries that demonstrate basic results (e.g., the most common filename is index.html) and potential software research queries.
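To illustrate the kind of basic query mentioned above, here is a minimal sketch in Python that runs a "most common filename" aggregation against a tiny in-memory SQLite table. The table name `directory_entry_file`, its columns `name` and `target`, the helper `most_common_filename`, and the toy rows are all assumptions made for illustration; this is not the paper's exact query or the dataset's actual schema, and the real dataset would be queried through PostgreSQL, Spark, or Athena rather than SQLite.

```python
import sqlite3


def most_common_filename(rows):
    """Return (name, distinct_target_count) for the most frequent file name.

    `rows` is an iterable of (name, target) pairs, where `target` stands in
    for a content identifier. Table and column names are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE directory_entry_file (name TEXT, target TEXT)")
    conn.executemany("INSERT INTO directory_entry_file VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT name, COUNT(DISTINCT target) AS cnt
        FROM directory_entry_file
        GROUP BY name
        ORDER BY cnt DESC, name ASC
        LIMIT 1
        """
    ).fetchone()
    conn.close()
    return result  # None when the table is empty


# Toy data only: three distinct blobs named index.html, two named README.md,
# and one Makefile blob referenced twice (deduplicated by COUNT(DISTINCT ...)).
toy_rows = [
    ("index.html", "blob-a"),
    ("index.html", "blob-b"),
    ("index.html", "blob-c"),
    ("README.md", "blob-d"),
    ("README.md", "blob-e"),
    ("Makefile", "blob-f"),
    ("Makefile", "blob-f"),
]

print(most_common_filename(toy_rows))
```

Counting distinct targets rather than raw rows reflects the deduplicated nature of the dataset: the same content referenced from many directories is counted once per file name.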

Theoretical and practical relevance:

Paper review: https://blog.sourced.tech/post/msr-paper-review-the-software-heritage-graph-dataset/