Citation: Roberto Di Cosmo, Stefano Zacchiroli (2017) Software Heritage: Why and How to Preserve Software Source Code. iPRES 2017: 14th International Conference on Digital Preservation
Internet Archive Scholar (search for fulltext): Software Heritage: Why and How to Preserve Software Source Code
Download: https://hal.archives-ouvertes.fr/hal-01590958

Summary

Overview and status of the Software Heritage project.

Reviews existing work and claims that "software archival in source code form has not been addressed in its own right before."

Source code is at risk: it is spread in a "diaspora" across many platforms and institutional forges, and those platforms are fragile and subject to shutdown. There is also no research instrument for analyzing the whole body of software; a "very large telescope" of software is needed.

The mission of Software Heritage is to "collect, organize, preserve, and make easily accessible all publicly available source code", guided by the following principles:

transparency and free software
replication
multi-stakeholder and non-profit
no a priori selection (save all the code)
source code first (other projects, including some discussed as prior work, archive context such as development mailing lists and binaries built from source code)
intrinsic identifiers
provenance of facts
minimalism
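
The "intrinsic identifiers" principle can be illustrated with a small sketch: the identifier is computed from the content itself rather than assigned by a registry. The summary does not say which hash is used; the example below assumes Git-style blob hashing (SHA-1 over a short header plus the file bytes), and the function name git_blob_id is hypothetical.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Return a Git-style blob identifier for the given file contents.

    Assumption: the identifier is SHA-1 over b"blob <length>\\0" followed
    by the raw bytes, which is how Git hashes a blob.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


# Identical content always yields the same identifier, wherever it is hosted.
same_a = git_blob_id(b"int main(void) { return 0; }\n")
same_b = git_blob_id(b"int main(void) { return 0; }\n")
changed = git_blob_id(b"int main(void) { return 1; }\n")

print(same_a == same_b)   # identical bytes, identical identifier
print(same_a == changed)  # any change to the bytes changes the identifier
```

Because the identifier depends only on the bytes, anyone holding a copy of the file can recompute and verify it without consulting the archive.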

Outlines applications in cultural heritage, scientific research, and industrial uses (such as "part numbers" for free software).

Describes the technical design, which treats the massive duplication of published source code as both a challenge and an opportunity: "Software Heritage archive is conceptually a single (big) Merkle Direct Acyclic Graph".
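
A toy sketch may help show why a Merkle DAG absorbs duplication: every node is keyed by a hash of its own content, so a file shared by many projects is stored once and referenced from each. This is only an illustration under assumptions; the class MerkleStore, its methods, and the choice of SHA-256 are hypothetical and do not describe the archive's actual data model.

```python
import hashlib


class MerkleStore:
    """Toy content-addressed store: each node is keyed by a hash of its content."""

    def __init__(self):
        self.objects = {}  # node id -> (kind, payload bytes)

    def _put(self, kind: str, payload: bytes) -> str:
        node_id = hashlib.sha256(kind.encode() + b"\0" + payload).hexdigest()
        # setdefault keeps the existing entry, so duplicates cost nothing extra.
        self.objects.setdefault(node_id, (kind, payload))
        return node_id

    def add_file(self, data: bytes) -> str:
        return self._put("file", data)

    def add_dir(self, entries: dict) -> str:
        # A directory's identity depends on the names and ids of its children,
        # sorted so that the same directory always hashes the same way.
        lines = [f"{name} {child}" for name, child in sorted(entries.items())]
        return self._put("dir", "\n".join(lines).encode())


store = MerkleStore()
license_text = b"MIT License\n"

project_a = store.add_dir({
    "LICENSE": store.add_file(license_text),
    "main.c": store.add_file(b"int main(void) { return 0; }\n"),
})
project_b = store.add_dir({
    "LICENSE": store.add_file(license_text),
    "main.py": store.add_file(b"print('hi')\n"),
})

# Three distinct files plus two directories: the shared LICENSE is stored once.
print(len(store.objects))
```

The same mechanism applies at every level: two identical directories, or two forks sharing most of their history, collapse into shared nodes of the graph.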

This DAG is populated through workflows that include:

listing
loading
scheduling
archiving
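
The four workflows above can be sketched as a tiny in-memory pipeline. The summary only names the stages, so the role given to each one here is an illustrative assumption: listing enumerates origins on a hosting platform, scheduling decides the visit order, loading ingests an origin's content, and archiving keeps a second copy. All names (FORGE, list_origins, load_origin, replicate, run_once) and the example.org URLs are hypothetical.

```python
import hashlib
from collections import deque

# Hypothetical hosting platform: origin URL -> file contents found there.
FORGE = {
    "https://example.org/a.git": [b"README\n", b"shared\n"],
    "https://example.org/b.git": [b"shared\n", b"other\n"],
}


def list_origins(forge):
    """Listing: enumerate the origins a hosting platform exposes."""
    return sorted(forge)


def load_origin(forge, url, archive):
    """Loading: ingest one origin, skipping content already archived."""
    added = 0
    for data in forge[url]:
        key = hashlib.sha256(data).hexdigest()
        if key not in archive:
            archive[key] = data
            added += 1
    return added


def replicate(archive, mirror):
    """Archiving: copy any object the mirror is missing."""
    for key, data in archive.items():
        mirror.setdefault(key, data)


def run_once(forge):
    archive, mirror, stats = {}, {}, {}
    queue = deque(list_origins(forge))  # Scheduling: plain FIFO visit order.
    while queue:
        url = queue.popleft()
        stats[url] = load_origin(forge, url, archive)
    replicate(archive, mirror)
    return archive, mirror, stats


archive, mirror, stats = run_once(FORGE)
print(stats)  # new objects contributed by each origin
```

A real scheduler would revisit origins periodically rather than once; the FIFO queue is the simplest stand-in that keeps the stages separate.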

Briefly describes initial implementation progress on the above workflows and on access to the archive.
