A Workflow for Increasing the Quality of Scientific Software (in Computational Science and Engineering)
From AcaWiki
Citation: T. Marić, JP. Lehr, I. Pappagianidis, B. Lambie, D. Bothe, C. Bischof (2021/04/07) A Workflow for Increasing the Quality of Scientific Software (in Computational Science and Engineering). IDEAS Productivity Project Webinar
Download: http://ideas-productivity.org/wordpress/wp-content/uploads/2021/04/webinar051-workflow4scisoft.pdf
Tagged: Computer Science, computational science
Summary
- Causes of the problems
  - Publish or perish
  - Lack of resources
  - Staff turnover: PhD students rotate every 4-5 years, postdocs every 1-2 years
  - Large-scale software engineering is not taught
- Resulting problems
  - Not able to continue development from an earlier state
  - Not reproducible (not versioned, not archived)
  - Not reusable (not modular)
  - Difficult to estimate the impact of changes
- Process recommendations
  - Use Kanban with progress-tracking cards
  - Use branching version control
  - Use test-driven development
  - Enable continuous integration with an emphasis on results visualization
  - Record versions at milestones
  - Publish code and data openly
- Software engineering design
  - Branching model
    - Keep branching simple: main, development, and feature branches
    - Only maintainers should merge into the main and development branches
  - TDD for CSE
    - Define verification and validation tests
    - Focus on the published result (top-down instead of bottom-up)
    - Don't overdo unit tests; write them as you go, to debug failing integration tests
  - Data organization
    - Avoid encoding metadata in filenames where possible
    - Use self-describing formats such as HDF5 and Exdir
  - CI
    - Balance test completeness against test duration
    - Use GitLab Runner, which can pass artifacts between jobs
    - Use out-of-source installation, driven by environment variables
    - Cross-linking with other data
    - Singularity is more intuitive than Docker
- Lessons
  - Keep the workflow simple
  - Focusing on secondary data simplifies the workflow
  - Cross-linking data is beneficial
  - Define roles within research teams
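A minimal sketch of a verification test in the "TDD for CSE" spirit summarized above, written in Python (the webinar prescribes no language). It checks a published-style result, the observed convergence order, instead of testing individual functions bottom-up. The composite trapezoidal rule is only a stand-in for a real solver; `trapezoid`, `observed_order`, and the 0.1 tolerance are hypothetical choices.

```python
import math


def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n uniform intervals (stand-in for a solver)."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def observed_order(errors, refinement=2.0):
    """Estimate the convergence order from errors on successively refined grids."""
    return [math.log(e_coarse / e_fine, refinement)
            for e_coarse, e_fine in zip(errors, errors[1:])]


def test_trapezoid_is_second_order():
    # Verification against an exact solution: the integral of sin on [0, pi] is 2.
    exact = 2.0
    errors = [abs(trapezoid(math.sin, 0.0, math.pi, n) - exact) for n in (8, 16, 32, 64)]
    orders = observed_order(errors)
    # Halving h should cut the error by about 4x, i.e. an observed order near 2.
    assert all(abs(p - 2.0) < 0.1 for p in orders), orders


if __name__ == "__main__":
    test_trapezoid_is_second_order()
    print("verification test passed")
```

If a change breaks the order of accuracy, this test fails first; unit tests can then be written as needed to localize the bug, in line with the write-as-you-go advice.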
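A sketch of the "avoid metadata in filenames" advice from the data-organization items, using only the Python standard library. The webinar points to HDF5 and Exdir, where such parameters would be stored as attributes next to the data; JSON is swapped in here only so the example needs no extra packages. `save_result`, `find_results`, and the parameter names `mesh_cells` and `reynolds` are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def save_result(directory, name, values, metadata):
    """Store data and its metadata together in one self-describing file.

    The filename stays a plain identifier; simulation parameters live
    inside the file instead of being encoded in the name.
    """
    path = Path(directory) / f"{name}.json"
    path.write_text(json.dumps({"metadata": metadata, "values": values}, indent=2))
    return path


def find_results(directory, **criteria):
    """Return paths of result files whose metadata matches all given criteria."""
    matches = []
    for path in sorted(Path(directory).glob("*.json")):
        record = json.loads(path.read_text())
        if all(record["metadata"].get(key) == value for key, value in criteria.items()):
            matches.append(path)
    return matches


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        save_result(tmp, "run-001", [0.1, 0.05], {"mesh_cells": 64, "reynolds": 100})
        save_result(tmp, "run-002", [0.08, 0.02], {"mesh_cells": 128, "reynolds": 100})
        print([p.name for p in find_results(tmp, mesh_cells=128)])  # ['run-002.json']
```

Queries go through the stored metadata, so adding a new parameter requires neither renaming files nor updating a filename parser.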
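One possible way to balance test completeness against duration in CI, assuming each test's typical runtime is known from earlier runs. The greedy selection by priority within a time budget is an illustrative technique, not one described in the webinar; `CiTest`, `select_tests`, the test names, and the priority convention are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CiTest:
    name: str
    seconds: float  # typical runtime, e.g. taken from earlier CI logs
    priority: int   # lower value = more important (hypothetical convention)


def select_tests(tests, budget_seconds):
    """Greedily pick tests by priority (then runtime) while they fit the budget.

    A test that does not fit is skipped, but cheaper tests later in the
    order may still be added.
    """
    selected, used = [], 0.0
    for test in sorted(tests, key=lambda t: (t.priority, t.seconds)):
        if used + test.seconds <= budget_seconds:
            selected.append(test.name)
            used += test.seconds
    return selected


if __name__ == "__main__":
    suite = [
        CiTest("unit", 5, 2),
        CiTest("verification-coarse", 60, 0),
        CiTest("validation-fine", 3600, 1),
        CiTest("verification-medium", 300, 1),
    ]
    print(select_tests(suite, 600))  # ['verification-coarse', 'verification-medium', 'unit']
```

A short budget could, for example, apply to feature-branch pipelines, while merges into the development branch run the full set.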
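A sketch of the "out-of-source installation, driven by environment variables" item, as a CI job might call it before building. The variable name `PROJECT_INSTALL_PREFIX` and the helper `resolve_install_prefix` are hypothetical; the point is that the install location comes from the environment and is refused if it falls inside the versioned source tree.

```python
import os
from pathlib import Path

# Hypothetical variable name; each CI job would export its own value.
PREFIX_VAR = "PROJECT_INSTALL_PREFIX"


def resolve_install_prefix(source_dir, env=None):
    """Return the install prefix given by the environment.

    Raises RuntimeError if the variable is unset or empty, or if the prefix
    lies inside the source tree, so installed files never mix with sources.
    """
    env = os.environ if env is None else env
    value = env.get(PREFIX_VAR)
    if not value:
        raise RuntimeError(f"{PREFIX_VAR} is not set")
    prefix = Path(value).resolve()
    source = Path(source_dir).resolve()
    if prefix == source or source in prefix.parents:
        raise RuntimeError(f"install prefix {prefix} is inside source tree {source}")
    return prefix


if __name__ == "__main__":
    # Usage with an explicit mapping instead of the real process environment.
    print(resolve_install_prefix("/work/src", {PREFIX_VAR: "/work/install/job-42"}))
```

Passing `env` explicitly keeps the helper testable; in a real job it would read `os.environ`, and different jobs could install side by side simply by exporting different prefixes.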