The Software Heritage Graph Dataset: Public software development under one roof
Citation: Antoine Pietri, Diomidis Spinellis, Stefano Zacchiroli. The Software Heritage Graph Dataset: Public software development under one roof.
Internet Archive Scholar (search for fulltext): The Software Heritage Graph Dataset: Public software development under one roof
Wikidata (metadata): Q67020258
Download: https://upsilon.cc/~zack/research/publications/msr-2019-swh.pdf
Summary
Describes the Software Heritage Graph Dataset, which deduplicates content from public software forges (e.g., Git revisions) and releases (e.g., from operating system distributions and language-specific package managers). The dataset is available as CSV (for PostgreSQL import), as Parquet (for Spark and similar tools), and for online querying on Amazon Athena. The paper provides example SQL queries that demonstrate basic findings (e.g., the most common filename is index.html) and potential software research queries.
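To give a feel for the kind of basic query the summary mentions, here is a minimal sketch that finds the most common filename in a toy in-memory SQLite table. The table name `directory_entry_file`, its columns (`target`, `name`), and the sample rows are assumptions made for illustration; they are not taken from the dataset's actual schema, and the real dataset is queried through PostgreSQL, Spark, or Athena rather than SQLite.

```python
import sqlite3


def most_common_filename(conn):
    """Return (name, count) for the filename attached to the most distinct contents."""
    return conn.execute(
        """
        SELECT name, COUNT(DISTINCT target) AS cnt
        FROM directory_entry_file
        GROUP BY name
        ORDER BY cnt DESC, name ASC
        LIMIT 1
        """
    ).fetchone()


# Toy stand-in for the dataset: each row links a filename to a content id.
# Schema and rows are hypothetical, chosen only to make the query runnable.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE directory_entry_file ("
    "id INTEGER PRIMARY KEY, target TEXT, name TEXT)"
)
conn.executemany(
    "INSERT INTO directory_entry_file (target, name) VALUES (?, ?)",
    [
        ("c1", "index.html"),
        ("c2", "index.html"),
        ("c3", "index.html"),
        ("c1", "index.html"),  # duplicate content, counted once
        ("c4", "README.md"),
        ("c5", "README.md"),
        ("c6", "Makefile"),
    ],
)

top = most_common_filename(conn)
print(top)
```

Counting distinct `target` values rather than rows mirrors the deduplicated nature of the dataset: the same file content appearing under the same name in many places is counted once.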
Theoretical and Practical Relevance
Paper review: https://blog.sourced.tech/post/msr-paper-review-the-software-heritage-graph-dataset/