Reducing Technical Debt with Reproducible Containers

From AcaWiki

Citation: Tanu Malik (2020/11/04) Reducing Technical Debt with Reproducible Containers. IDEAS-ECP Webinar
Tagged: Computer Science, computational science


  • Technical debt := short-term gain at the cost of increased long-term maintenance effort.
  • Difficulties include:
    • Reproducibility is usually an afterthought.
    • Identifying all relevant I/O.
    • No mapping from artifacts to the paper.
  • "Containers do not reduce technical-debt"
    • Containers still have incompletely specified dependencies and are still non-deterministic.
  • Sciunit can reduce technical debt by putting experiments in an auditable, modifiable package.
    • Use strace to capture inputs and outputs automatically.
    • A container either includes the data or excludes it; including the data is expensive, but a container that excludes it is not reproducible.
      • Why not include only part of the data?
  • MiDAS: Minimizing DAtaSets
    • Applications only access a subset of datasets.
    • Identify only relevant chunks.
    • Use partial evaluation to prune codebase.
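
The strace-based capture of inputs and outputs mentioned above can be illustrated with a minimal sketch. This is not Sciunit's actual implementation; it only assumes a text trace produced by something like `strace -f -e trace=open,openat -o trace.log ./experiment` (where `./experiment` is a hypothetical program) and classifies each successfully opened path by its open flags. The names `OPEN_RE` and `classify_io` are made up for this example, and paths containing escaped quotes are not handled.

```python
import re

# Matches open/openat lines as strace prints them, for example:
#   openat(AT_FDCWD, "data/in.csv", O_RDONLY|O_CLOEXEC) = 3
# Assumption: paths contain no escaped double quotes.
OPEN_RE = re.compile(
    r'\b(?:open|openat)\((?:[^,"]+,\s*)?"(?P<path>[^"]*)",\s*'
    r'(?P<flags>[A-Z_|0-9x]+)[^)]*\)\s*=\s*(?P<ret>-?\d+)'
)


def classify_io(trace_lines):
    """Return (inputs, outputs): sets of paths opened for reading / writing."""
    inputs, outputs = set(), set()
    for line in trace_lines:
        m = OPEN_RE.search(line)
        if m is None or int(m.group("ret")) < 0:
            continue  # not an open call, or the call failed
        flags = m.group("flags").split("|")
        if "O_WRONLY" in flags or "O_RDWR" in flags:
            outputs.add(m.group("path"))
        if "O_WRONLY" not in flags:
            inputs.add(m.group("path"))  # O_RDONLY or O_RDWR
    return inputs, outputs


# Hand-written sample trace lines (illustrative, not from a real run).
sample = [
    'openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3',
    'openat(AT_FDCWD, "data/in.csv", O_RDONLY) = 3',
    'openat(AT_FDCWD, "missing.cfg", O_RDONLY) = -1 ENOENT (No such file or directory)',
    '[pid 4242] openat(AT_FDCWD, "out/result.txt", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 4',
]
inputs, outputs = classify_io(sample)
print(sorted(inputs))
print(sorted(outputs))
```

A list like `inputs` is the kind of information a packaging tool would need in order to decide which files to copy into a container; deciding which chunks within a file are relevant, as MiDAS does, would additionally require tracking the offsets of individual reads.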