Mining Development Data to Understand and Improve Software Engineering Processes in HPC Projects

Citation: Boyana Norris (2021/07) Mining Development Data to Understand and Improve Software Engineering Processes in HPC Projects. IDEAS-ECP Webinar
Download: http://ideas-productivity.org/wordpress/wp-content/uploads/2021/07/hpcbp054-miningdevdata.pdf
Tagged: Computer Science, computational science, high-performance computing

Data mining

Data sources

Git metadata: commits, forks, branches, developers
Issues and associated discussions
Pull requests (GitHub, GitLab) and associated discussions
Mailing list archives
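
As an illustration of the first data source, the following is a minimal sketch of pulling commit metadata out of a local clone with `git log`. It is not taken from the talk; the names `Commit`, `parse_git_log`, and `read_commits` are hypothetical, and the choice of fields (hash, author, email, author date, subject) is an assumption about what a later analysis might need.

```python
import subprocess
from dataclasses import dataclass

# %x1f / %x1e make git emit ASCII unit/record separators, which are
# unlikely to occur inside author names or commit subjects.
GIT_FORMAT = "%H%x1f%an%x1f%ae%x1f%aI%x1f%s%x1e"
FIELD_SEP = "\x1f"
RECORD_SEP = "\x1e"


@dataclass
class Commit:
    sha: str
    author: str
    email: str
    date: str      # author date, strict ISO 8601
    subject: str


def parse_git_log(raw: str) -> list[Commit]:
    """Turn the separator-delimited output of `git log` into Commit records."""
    commits = []
    for record in raw.split(RECORD_SEP):
        # Strip only newlines: Python treats \x1f as whitespace, so a bare
        # strip() could eat a trailing field separator.
        record = record.strip("\n")
        if not record:
            continue
        sha, author, email, date, subject = record.split(FIELD_SEP, 4)
        commits.append(Commit(sha, author, email, date, subject))
    return commits


def read_commits(repo_path: str) -> list[Commit]:
    """Run `git log` in repo_path (assumed to be a local clone) and parse it."""
    raw = subprocess.run(
        ["git", "-C", repo_path, "log", f"--pretty=format:{GIT_FORMAT}"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_git_log(raw)
```

Keeping the parser separate from the `git` call makes it possible to test the parsing on canned text; issues, pull requests, and mailing lists would need their own collectors.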

Aggregates

Bug-fix rate
Feature-request rate
Number of issues
Issue categories
For each issue, number of followers and watchers
Number of contributors
Code complexity
Proportion of commits by most active developer
Churn (LoC, cosine distance, commits, PRs, versions, files)
Group developers into sub-teams
Timestamp of commits
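
A few of these aggregates can be sketched in a handful of lines. The example below is an assumption-laden illustration, not the tooling from the talk: the function name `summarize` and the input shape (one `(author, lines_added, lines_deleted)` tuple per commit) are hypothetical, and churn is taken here to mean added plus deleted lines, which is only one of the churn measures listed above.

```python
from collections import Counter


def summarize(commits):
    """Compute simple aggregates from (author, lines_added, lines_deleted) tuples."""
    commits = list(commits)
    by_author = Counter(author for author, _, _ in commits)
    total = sum(by_author.values())
    if by_author:
        top_author, top_count = by_author.most_common(1)[0]
    else:
        top_author, top_count = None, 0
    return {
        # Number of contributors
        "contributors": len(by_author),
        # Proportion of commits by most active developer
        "top_author": top_author,
        "top_author_share": top_count / total if total else 0.0,
        # Churn in lines of code (added + deleted), an assumed definition
        "loc_churn": sum(added + deleted for _, added, deleted in commits),
    }


stats = summarize([("ada", 10, 2), ("ada", 5, 5), ("bob", 1, 0), ("ada", 0, 3)])
print(stats)
```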

Example queries

Identify domain champions (many changes over a small number of files)
Identify areas and people with high churn
Identify when someone is at risk of burning out
Estimate the impact of a change
How do projects weather interesting times?
Where is development effort going?
Does mood affect productivity?
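
The first query, finding domain champions, could be approximated as follows. This is a sketch under stated assumptions: the function name `domain_champions`, the `(author, file_path)` input records, and both thresholds are hypothetical, since the summary does not say how "many changes" or "a small number of files" were quantified.

```python
from collections import defaultdict


def domain_champions(changes, min_changes=20, max_files=5):
    """Return authors with many changes concentrated in few files.

    changes: iterable of (author, file_path) pairs, one per file touched
    by a commit. Thresholds are illustrative defaults, not values from
    the talk.
    """
    change_count = defaultdict(int)
    files_touched = defaultdict(set)
    for author, path in changes:
        change_count[author] += 1
        files_touched[author].add(path)
    return sorted(
        author
        for author in change_count
        if change_count[author] >= min_changes
        and len(files_touched[author]) <= max_files
    )


changes = (
    [("ada", "solver.c")] * 4
    + [("ada", "solver.h")] * 2
    + [("bob", f"module{i}.c") for i in range(6)]
)
print(domain_champions(changes, min_changes=5, max_files=2))
```

In this toy input both authors make six changes, but only the one whose changes fall in two files is reported.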

Program analysis

Examples

Security: Buffer overruns, improperly validated input.
Memory safety: Null dereference, uninitialized data.
Resource leaks: Memory, OS resources.
API protocols: Improper use of APIs, incomplete/incorrect implementations.
Exceptions: Arithmetic/library/user-defined.
Encapsulation: Accessing internal data, calling private functions.
Data races: Two threads access the same data without synchronization.

Tools

clang-tidy
Clang Static Analyzer (clang --analyze)
scan-build (wraps the Clang Static Analyzer)
Flang (Fortran front end for LLVM)
fortran-linter

Workflow

clang-format
clang-tidy
Clang Static Analyzer (clang --analyze)
xSDK specific analysis
compilation
tests, run under each of {Valgrind, ASan, MSan, TSan, UBSan}
- I think Valgrind is overkill if you already have ASan and MSan
code coverage
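
One way to make such a workflow easy for others to follow is to generate the command lines from a single script. The sketch below only builds the commands and does not run them; the function names, the single-file layout, and the output naming are assumptions for illustration, and the xSDK-specific analysis, test execution, and coverage steps are left out because the summary gives no details on them. Not every sanitizer is available on every platform.

```python
# Sanitizer short names mapped to the value of clang's -fsanitize= flag.
SANITIZERS = {
    "ASan": "address",
    "MSan": "memory",
    "TSan": "thread",
    "UBSan": "undefined",
}


def static_checks(source):
    """Formatting check, lint, and static analysis for one source file."""
    return [
        # Report formatting violations as errors without rewriting the file.
        ["clang-format", "--dry-run", "--Werror", source],
        # Trailing "--" means: no compilation database, no extra compiler flags.
        ["clang-tidy", source, "--"],
        # Run the Clang Static Analyzer through the compiler driver.
        ["clang", "--analyze", source],
    ]


def sanitizer_builds(source, out_prefix="test"):
    """One instrumented build per sanitizer; each binary then runs the tests."""
    return {
        name: ["clang", "-g", f"-fsanitize={flag}", source,
               "-o", f"{out_prefix}-{name.lower()}"]
        for name, flag in SANITIZERS.items()
    }


for cmd in static_checks("solver.c"):
    print(" ".join(cmd))
for name, cmd in sanitizer_builds("solver.c").items():
    print(name, "->", " ".join(cmd))
```

Each sanitizer gets its own build here, which matches the "tests crossed with sanitizers" step above.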

Goals

Integrate static and dynamic program analysis into dev process
Make it easy to follow for others
