Sciunits: Reusable Research Objects
Citation: Dai Hai Ton That, Gabriel Fils, Zhihao Yuan, Tanu Malik (2017/11/16) Sciunits: Reusable Research Objects. e-Science
DOI (original publisher): 10.1109/eScience.2017.51
arXiv (preprint): arXiv:1707.05731
Semantic Scholar (metadata): 10.1109/eScience.2017.51
Sci-Hub (fulltext): 10.1109/eScience.2017.51
Internet Archive Scholar (search for fulltext): Sciunits: Reusable Research Objects
Tagged: computational science
- Research objects := collections of digital artifacts (e.g. code, data, scripts, and temporary experiment results).
- Sciunit := a research object collected automatically by application virtualization.
- Application virtualization := use strace to collect spawned processes and file opens.
- The captured container can be modified and rerun manually.
- Capture also constructs a graph over processes, the processes they spawn, and their file inputs and outputs.
- Naively, this is too fine-grained and generates too many dependencies.
- Deduplicate storage by checking rolling hashes of each file against files already stored. Because a rolling hash follows the content rather than fixed offsets, matches survive insertions and deletions.
- The uncontracted graph also has too many nodes to visualize, so the authors develop a way of contracting it.
- Running in the container adds overhead (0 to 40%, depending on the application).
- It takes a while to deduplicate a new stream for storage (~60s), but reconstructing is fast (<5s).
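
The graph notes above (processes, spawns, file inputs and outputs, then contraction) can be sketched as follows. This is a minimal illustration, not the paper's method: the event tuples are a hypothetical, already-parsed stand-in for strace output, and `contract_children` is one made-up contraction rule (merge a process with everything it spawned), not the authors' summarization algorithm.

```python
from collections import defaultdict

# Hypothetical, pre-parsed trace events (real strace output would need parsing):
#   ("spawn", parent_pid, child_pid), ("read", pid, path), ("write", pid, path)
EVENTS = [
    ("spawn", 1, 2),
    ("spawn", 1, 3),
    ("read", 2, "input.csv"),
    ("write", 2, "tmp.dat"),
    ("read", 3, "tmp.dat"),
    ("write", 3, "result.txt"),
]


def build_graph(events):
    """Return a set of directed edges (src, dst) over process and file nodes."""
    edges = set()
    for kind, a, b in events:
        if kind == "spawn":
            edges.add((("proc", a), ("proc", b)))   # parent -> child
        elif kind == "read":
            edges.add((("file", b), ("proc", a)))   # file -> reader
        elif kind == "write":
            edges.add((("proc", a), ("file", b)))   # writer -> file
        else:
            raise ValueError(f"unknown event kind: {kind}")
    return edges


def contract_children(edges, root):
    """Merge every process spawned (directly or transitively) by root into root.

    Illustrative contraction rule only; assumed for this sketch.
    """
    children = defaultdict(set)
    for src, dst in edges:
        if src[0] == "proc" and dst[0] == "proc":
            children[src].add(dst)

    # Collect root and all of its descendants in the spawn tree.
    group, stack = {root}, [root]
    while stack:
        node = stack.pop()
        for child in children[node]:
            if child not in group:
                group.add(child)
                stack.append(child)

    def rename(node):
        return root if node in group else node

    # Rewrite endpoints and drop edges that became self-loops.
    return {(rename(s), rename(d)) for s, d in edges if rename(s) != rename(d)}


if __name__ == "__main__":
    graph = build_graph(EVENTS)
    for edge in sorted(contract_children(graph, ("proc", 1))):
        print(edge)
```

Contracting at the top-level process hides the internal spawn structure but keeps every file edge, which is the kind of trade-off a coarser view makes.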
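
The deduplication note above can be illustrated with content-defined chunking. This is a sketch under stated assumptions, not the Sciunit implementation: it uses a Gear-style rolling hash to choose chunk boundaries and SHA-256 to identify chunks, and the names `chunks`, `ChunkStore`, and `avg_bits` are invented for the example. Because each boundary depends only on the most recent bytes, an insertion changes the chunks around the edit and leaves the others reusable.

```python
import hashlib
import random

# Fixed pseudo-random table for the Gear-style rolling hash (assumed for this sketch).
_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]
MASK64 = (1 << 64) - 1


def chunks(data, avg_bits=6):
    """Split bytes into content-defined chunks.

    A boundary is declared wherever the top `avg_bits` bits of the rolling hash
    are zero. Each byte's contribution is shifted out after 64 steps, so the
    hash depends only on the last 64 bytes, not on absolute file offsets.
    """
    start, h = 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & MASK64
        if h >> (64 - avg_bits) == 0:
            yield data[start:i + 1]
            start = i + 1
    if start < len(data):
        yield data[start:]  # trailing partial chunk


class ChunkStore:
    """Stores each distinct chunk once, keyed by its SHA-256 digest."""

    def __init__(self):
        self.blobs = {}  # digest -> chunk bytes

    def put(self, data):
        """Store data; return (recipe of digests, number of newly stored bytes)."""
        recipe, new_bytes = [], 0
        for chunk in chunks(data):
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.blobs:
                self.blobs[digest] = chunk
                new_bytes += len(chunk)
            recipe.append(digest)
        return recipe, new_bytes

    def get(self, recipe):
        """Reconstruct the original bytes from a recipe."""
        return b"".join(self.blobs[d] for d in recipe)


if __name__ == "__main__":
    rng = random.Random(1)
    original = bytes(rng.getrandbits(8) for _ in range(4096))
    edited = original[:2000] + b"INSERTED" + original[2000:]

    store = ChunkStore()
    _, first = store.put(original)
    _, second = store.put(edited)
    print(f"first version stored {first} new bytes, edited version stored {second}")
```

A production chunker would normally also enforce minimum and maximum chunk sizes, which this sketch omits for brevity.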