The Software Heritage Graph Dataset: Public software development under one roof

From AcaWiki

Citation: Antoine Pietri, Diomidis Spinellis, Stefano Zacchiroli. The Software Heritage Graph Dataset: Public software development under one roof.
Internet Archive Scholar (search for fulltext): The Software Heritage Graph Dataset: Public software development under one roof
Wikidata (metadata): Q67020258
Download: https://upsilon.cc/~zack/research/publications/msr-2019-swh.pdf

Summary

Describes the Software Heritage Graph Dataset, which deduplicates content from public software forges (e.g., Git revisions) and from releases (e.g., from operating system distributions and language-specific package managers). The dataset is available as CSV (for PostgreSQL import), as Parquet (for Spark and similar tools), and for online querying on Amazon Athena. The paper provides example SQL queries that demonstrate basic facts (e.g., the most common filename is index.html) and potential software research queries.
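
To give a feel for the "most common filename" kind of query, here is a minimal sketch in Python using an in-memory SQLite database. The table name directory_entry_file and its columns name and target are a simplified stand-in assumed for illustration, not the dataset's actual schema, and the rows are toy data. Against the real dataset the same idea would run on PostgreSQL, Spark, or Athena rather than SQLite.

```python
import sqlite3


def most_common_filenames(conn, limit=1):
    """Return (name, distinct_target_count) rows, most frequent name first."""
    query = """
        SELECT name, COUNT(DISTINCT target) AS cnt
        FROM directory_entry_file
        GROUP BY name
        ORDER BY cnt DESC, name ASC
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()


# Toy stand-in for the dataset: one row per file entry in some directory.
# Table and column names are assumptions made for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE directory_entry_file (name TEXT, target TEXT)")
conn.executemany(
    "INSERT INTO directory_entry_file (name, target) VALUES (?, ?)",
    [
        ("index.html", "blob-a"),
        ("index.html", "blob-b"),
        ("index.html", "blob-c"),
        ("README.md", "blob-d"),
        ("README.md", "blob-e"),
        ("Makefile", "blob-f"),
        ("Makefile", "blob-f"),  # same content seen twice: counted once
    ],
)

top = most_common_filenames(conn, limit=2)
print(top)
```

Counting distinct targets rather than raw rows mirrors the deduplicated nature of the dataset: identical content referenced from several places contributes only once to a filename's count.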

Theoretical and Practical Relevance

Paper review: https://blog.sourced.tech/post/msr-paper-review-the-software-heritage-graph-dataset/