Reproducible and User-Controlled Software Environments in HPC with Guix
From AcaWiki
Citation: Ludovic Courtès, Ricardo Wurmus (2015/12/18) Reproducible and User-Controlled Software Environments in HPC with Guix. Lecture Notes in Computer Science
DOI (original publisher): 10.1007/978-3-319-27308-2_47
Download: https://link.springer.com/chapter/10.1007/978-3-319-27308-2_47
Tagged: Computer Science, reproducibility, high-performance computing
Summary
Functional Package Managers (like Guix and Nix) make it easy to develop reproducible HPC environments.
Problem
- Sysadmins want stability, but devs want to improve things.
- This sounds like the site-reliability problem in industry.
Prior attempts
- System package managers (e.g. apt): packages are too old, packages are built on the publisher's machine rather than the client's, packages are difficult to write, multiple channels are difficult to combine, and package management is imperative/stateful.
- Traditional third-party package managers (e.g. EasyBuild, Spack): clobber /usr, imperative/stateful package management, don't capture system configuration, built artifacts are not safely shareable.
- Writing down every version: doesn't capture system configuration.
- Snapshot system image (e.g. Docker image, VM image approach): hard to ship, hard to verify the environment, hard to compose.
- Snapshot recipes (e.g. Dockerfile, Vagrantfile): too broad, almost always talk to the internet (which introduces non-determinism), imperative/stateful.
Their Solution: Functional Package Managers (FPM)
- All packages are pure functions from {files of packages they depend on} to {files produced by the package}.
- This is not just for libraries; even the C compiler is considered a dependency.
- This encodes a DAG of packages.
- Files (both inputs and outputs) are read-only/immutable.
- Can safely share the cache across machines.
- Since every node in the network needs the same packages, this reduces build burden.
- Since they are pure, cache results on disk.
- Each result stores the hash of its inputs, so we know when the cached result can be safely used.
- How to maintain purity while maintaining ease-of-use?
- For purity, the FPM runs the package-function in an isolated build environment: a chroot (filesystem isolation), well-defined environment variables, a separate PID namespace, etc.
- For ease-of-use, the FPM inserts the packages you depend on into the chrooted filesystem, $PATH, and other env-vars.
- This makes it easy to explicitly depend on packages, but hard to implicitly do so.
- Implementations: Guix (considered in this paper) and Nix.
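The points above (packages as pure functions, a DAG of inputs, a cache keyed by the hash of the inputs, and a build environment that exposes only declared dependencies) can be illustrated with a toy model. This is a minimal sketch and not Guix's or Nix's actual implementation: `Package`, `store_key`, `build_path`, `build`, and the `/store/...` path layout are hypothetical names invented for the example, and a string stands in for the files a real build would produce.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    """Hypothetical package: a name, a build recipe, and explicit inputs."""
    name: str
    recipe: str          # stands in for the build script
    inputs: tuple = ()   # other Package objects, including the compiler


def store_key(pkg):
    """Hash the recipe together with the keys of all inputs, recursively.

    Any change upstream in the DAG (even to the compiler) changes the key.
    """
    h = hashlib.sha256()
    h.update(pkg.name.encode())
    h.update(b"\0")
    h.update(pkg.recipe.encode())
    for dep in pkg.inputs:
        h.update(b"\0")
        h.update(store_key(dep).encode())
    return h.hexdigest()


def build_path(pkg):
    """PATH seen by the build: only the bin directories of declared inputs."""
    return ":".join(
        f"/store/{store_key(dep)[:12]}-{dep.name}/bin" for dep in pkg.inputs
    )


cache = {}        # store key -> build output (assumed shareable across machines)
builds_run = []   # records which packages were actually built


def build(pkg):
    """Build a package, reusing the cached result when the input hash matches."""
    key = store_key(pkg)
    if key in cache:
        return cache[key]
    dep_outputs = [build(dep) for dep in pkg.inputs]
    builds_run.append(pkg.name)
    output = f"{pkg.name}({', '.join(dep_outputs)})"
    cache[key] = output
    return output


if __name__ == "__main__":
    # Two hypothetical compiler variants; only the recipe text differs.
    gcc_a = Package("gcc", "build gcc, variant A")
    gcc_b = Package("gcc", "build gcc, variant B")
    hello_a = Package("hello", "make", (gcc_a,))
    hello_b = Package("hello", "make", (gcc_b,))

    print(store_key(hello_a) == store_key(hello_b))  # compiler is part of the key
    build(hello_a)
    build(hello_a)  # second call is served from the cache
    print(builds_run)
    print(build_path(hello_a))
```

Because the key covers the whole dependency graph, two machines that compute the same key can share the cached output, and a package that does not list a tool among its inputs simply does not find it on its `$PATH`.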
Use case of FPM
- Guix is deployed at the Max Delbrück Center for Molecular Medicine (MDC), Berlin.
- The package cache is shared among 250 cluster nodes and some user workstations.
- Custom packages are easy to write.
- This claim is made by the authors, not the users. I would be curious to know what the users think.
Downsides of FPM
- The Guix daemon requires root privileges.
- Guix does not have a remote daemon.
- At MDC, users have to manage their environment from a specific node. Other nodes can use but not change this environment.
- FPM can't easily specify the kernel/OS-level things.
- Guix's Sandboxing (chroot + ...) isn't perfect; information can still leak, causing non-determinism.
- Some packages try to query and specialize for the specific processor, which makes them impure.
- Guix doesn't have proprietary software.
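The processor-specialization leak listed above can be sketched as follows. This is a simulation under stated assumptions, not code from Guix or from the paper: `input_hash`, `impure_build`, `pure_build`, and the `simd` input are hypothetical, and the host's CPU features are passed in as a plain set so that two different machines can be imitated in one process.

```python
import hashlib


def input_hash(recipe, declared_inputs):
    """Cache key: covers the recipe and the declared inputs, nothing else."""
    h = hashlib.sha256(recipe.encode())
    for name in sorted(declared_inputs):
        h.update(f"\0{name}={declared_inputs[name]}".encode())
    return h.hexdigest()


def impure_build(recipe, declared_inputs, host_cpu_flags):
    """Specializes for the build host; the cache key never sees host_cpu_flags."""
    simd = "avx2" if "avx2" in host_cpu_flags else "generic"
    return f"{recipe}:{simd}"


def pure_build(recipe, declared_inputs):
    """Specializes only according to a declared input (hypothetical 'simd')."""
    simd = declared_inputs.get("simd", "generic")
    return f"{recipe}:{simd}"


if __name__ == "__main__":
    recipe = "make fft"
    inputs = {"gcc": "variant A"}

    # Same cache key, but the output differs between the two imitated hosts.
    print(input_hash(recipe, inputs))
    print(impure_build(recipe, inputs, {"sse2", "avx2"}))
    print(impure_build(recipe, inputs, {"sse2"}))

    # Declaring the target as an input makes the difference visible in the key.
    tuned = dict(inputs, simd="avx2")
    print(input_hash(recipe, tuned) != input_hash(recipe, inputs))
    print(pure_build(recipe, tuned))
```

In the impure variant, a cached result built on one node could be served to another node under the same key even though a local build there would have produced something different; that is the kind of non-determinism the bullet describes.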