A Workflow for Increasing the Quality of Scientific Software (in Computational Science and Engineering)

From AcaWiki

Citation: T. Marić, JP. Lehr, I. Pappagianidis, B. Lambie, D. Bothe, C. Bischof (2021/04/07) A Workflow for Increasing the Quality of Scientific Software (in Computational Science and Engineering). IDEAS Productivity Project Webinar
Internet Archive Scholar (fulltext): A Workflow for Increasing the Quality of Scientific Software (in Computational Science and Engineering)
Download: http://ideas-productivity.org/wordpress/wp-content/uploads/2021/04/webinar051-workflow4scisoft.pdf
Tagged: Computer Science, computational science

Summary (Abstract)

  • Underlying causes
    • Publish or perish
    • Lack of resources
    • PhD students rotate every 4-5 years, postdocs every 1-2 years.
    • Large-scale software engineering not taught
  • Resulting problems
    • Not able to continue development from an earlier state
    • Not reproducible (not versioned, not archived)
    • Not reusable (not modular)
    • Difficult to estimate impact of changes
  • Process recommendations
    1. Use Kanban with progress tracking cards
    2. Use branching version control
    3. Use test-driven development
    4. Enable continuous integration with emphasis on results visualization
    5. Record versions at milestones
    6. Publish code and data openly
  • Software engineering design
  • Branching model
    • Keep branching simple: main, development, and feature-branches
    • Only maintainers should merge into the main and development branches
  • TDD for CSE
    • Define verification and validation tests
    • Focus on the published result (top-down instead of bottom-up)
    • Don't go overboard with unit tests; write them as you go, to debug failing integration tests
  • Data organization
    • Don't encode metadata in filenames, if you can help it
    • HDF5 and Exdir
  • CI
    • Balance test completeness with duration
    • Use a GitLab runner; artifacts can be passed between jobs
    • Use out-of-source installation, driven by environment variables
  • Cross-linking with other data
    • Singularity is more intuitive than Docker.
  • Lessons
    • Keep the workflow simple
    • Focusing on secondary data simplifies the workflow
    • Cross-linking data is beneficial
    • Define roles within research teams
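
The "TDD for CSE" items above (define verification tests, and work top-down from the result that will be published) can be illustrated with a small sketch. It is not taken from the webinar: the trapezoidal rule stands in for a real solver, and the function names, grid sizes, and the 0.1 tolerance are all hypothetical choices. The test checks the quantity a paper would typically report, the observed order of convergence against a known exact solution, rather than individual helper functions.

```python
import math


def integrate_trapezoid(f, a, b, n):
    """Composite trapezoidal rule on n equal subintervals (stand-in solver)."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return h * total


def observed_order(errors, refinement=2.0):
    """Estimate the convergence order from errors on successively refined grids."""
    return [
        math.log(coarse / fine) / math.log(refinement)
        for coarse, fine in zip(errors, errors[1:])
    ]


def test_trapezoid_converges_at_second_order():
    """Verification test: compare against a known exact solution."""
    exact = 2.0  # integral of sin(x) over [0, pi]
    errors = [
        abs(integrate_trapezoid(math.sin, 0.0, math.pi, n) - exact)
        for n in (16, 32, 64, 128)
    ]
    orders = observed_order(errors)
    # Hypothetical tolerance: accept orders within 0.1 of the theoretical 2.
    assert all(abs(p - 2.0) < 0.1 for p in orders), orders


if __name__ == "__main__":
    test_trapezoid_converges_at_second_order()
    print("verification test passed")
```

A test of this kind can run in a CI job on every merge request. If it fails, narrower unit tests can then be written as needed to locate the fault, in line with the write-as-you-go advice above.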