Mining Development Data to Understand and Improve Software Engineering Processes in HPC Projects

From AcaWiki

Citation: Boyana Norris (2021/07) Mining Development Data to Understand and Improve Software Engineering Processes in HPC Projects. IDEAS-ECP Webinar
Download: http://ideas-productivity.org/wordpress/wp-content/uploads/2021/07/hpcbp054-miningdevdata.pdf
Tagged: Computer Science, computational science, high-performance computing


Data mining

Data sources

  • Git metadata: commits, forks, branches, developers
  • Issues and associated discussions
  • Pull requests (GitHub, GitLab) and associated discussions
  • Mailing list archives
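
As a minimal sketch of mining the first data source, the Python below parses the text that `git log --numstat --format='@%H|%an|%aI'` prints into per-commit records. The sample log, the `Commit` record, and the `parse_git_log` helper are hypothetical names made up for illustration, not something from the talk.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical sample of what
#   git log --numstat --format='@%H|%an|%aI'
# prints; the hashes, names, and paths are made up.
SAMPLE_LOG = """\
@a1b2c3|Alice|2021-07-01T10:00:00+00:00

3\t1\tsrc/solver.c
10\t0\tsrc/mesh.c
@d4e5f6|Bob|2021-07-02T23:30:00+00:00

-\t-\tdocs/logo.png
2\t2\tsrc/solver.c
"""


@dataclass
class Commit:
    sha: str
    author: str
    when: datetime
    # (path, lines added, lines deleted); binary files are recorded as 0/0
    files: list = field(default_factory=list)


def parse_git_log(text):
    """Turn the log text into a list of Commit records."""
    commits = []
    for line in text.splitlines():
        if line.startswith("@"):
            # Header line: "@<sha>|<author>|<ISO 8601 author date>"
            sha, rest = line[1:].split("|", 1)
            author, when = rest.rsplit("|", 1)
            commits.append(Commit(sha, author, datetime.fromisoformat(when)))
        elif line.strip() and commits:
            # numstat line: "<added>\t<deleted>\t<path>"
            added, deleted, path = line.split("\t", 2)
            # numstat prints "-" instead of a count for binary files
            n_add = int(added) if added.isdigit() else 0
            n_del = int(deleted) if deleted.isdigit() else 0
            commits[-1].files.append((path, n_add, n_del))
    return commits


commits = parse_git_log(SAMPLE_LOG)
for c in commits:
    print(c.sha, c.author, c.when.isoformat(), c.files)
```

On a real repository the same text could be captured with `subprocess.run(["git", "log", ...], capture_output=True, text=True)`. Issues, pull requests, and mailing lists would need their own importers.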

Aggregates

  • Bug-fix rate
  • Feature-request rate
  • Number of issues
  • Issue categories
  • For each issue, number of followers and watchers
  • Number of contributors
  • Code complexity
  • Proportion of commits by most active developer
  • Churn (LoC, cosine distance, commits, PRs, versions, files)
  • Grouping of developers into sub-teams
  • Timestamps of commits
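
Two of these aggregates can be sketched in a few lines of Python: the proportion of commits by the most active developer, and churn measured in lines of code per file. The commit records and function names are hypothetical, and "added plus deleted lines" is only one possible definition of churn, assumed here for illustration.

```python
from collections import Counter

# Hypothetical commit records: (author, [(path, added, deleted), ...])
COMMITS = [
    ("alice", [("src/solver.c", 30, 5)]),
    ("alice", [("src/solver.c", 4, 4), ("src/mesh.c", 10, 0)]),
    ("bob", [("docs/guide.md", 12, 3)]),
    ("alice", [("src/solver.c", 1, 1)]),
]


def top_committer_share(commits):
    """Return (author, fraction of all commits) for the most active author."""
    counts = Counter(author for author, _ in commits)
    author, n = counts.most_common(1)[0]
    return author, n / len(commits)


def loc_churn_per_file(commits):
    """Sum added plus deleted lines per file (an assumed churn definition)."""
    churn = Counter()
    for _, files in commits:
        for path, added, deleted in files:
            churn[path] += added + deleted
    return churn


author, share = top_committer_share(COMMITS)
churn = loc_churn_per_file(COMMITS)
print(f"{author}: {share:.0%} of commits")
for path, lines in churn.most_common():
    print(path, lines)
```

A high share for one developer points at a possible bus-factor risk, and files with high churn are candidates for closer review.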

Example queries

  • Identify domain champions (many changes over a small number of files)
  • Identify areas and people with high churn
  • Identify when someone is at risk of burning out
  • Estimate the impact of a change
  • How do projects weather interesting times?
  • Where is development effort going?
  • Does mood affect productivity?
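
The first query, identifying domain champions, could look roughly like the sketch below. It counts changes and distinct files per author and keeps authors with many changes over few files. The data, the `domain_champions` helper, and both thresholds are assumptions chosen for illustration; the talk does not specify them.

```python
from collections import defaultdict

# Hypothetical (author, path) pairs, one per file touched in a commit.
CHANGES = [
    ("alice", "src/solver.c"), ("alice", "src/solver.c"),
    ("alice", "src/solver.c"), ("alice", "src/solver.h"),
    ("alice", "src/solver.c"), ("alice", "src/solver.h"),
    ("bob", "src/mesh.c"), ("bob", "docs/guide.md"),
    ("bob", "src/io.c"), ("bob", "tests/test_io.c"),
    ("carol", "src/io.c"),
]


def domain_champions(changes, min_changes=5, max_files=2):
    """Return {author: sorted files} for authors with many changes on few files.

    min_changes and max_files are arbitrary illustration values.
    """
    n_changes = defaultdict(int)
    files = defaultdict(set)
    for author, path in changes:
        n_changes[author] += 1
        files[author].add(path)
    return {
        a: sorted(files[a])
        for a in n_changes
        if n_changes[a] >= min_changes and len(files[a]) <= max_files
    }


champions = domain_champions(CHANGES)
print(champions)
```

Here alice qualifies (six changes over two files), while bob spreads four changes over four files and carol has too few changes to judge.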

Program analysis

Examples

  • Security: Buffer overruns, improperly validated input.
  • Memory safety: Null dereference, uninitialized data.
  • Resource leaks: Memory, OS resources.
  • API protocols: Improper use of APIs, incomplete/incorrect implementations.
  • Exceptions: Arithmetic/library/user-defined.
  • Encapsulation: Accessing internal data, calling private functions.
  • Data races: Two threads access the same data without synchronization, and at least one access is a write.

Tools

  • clang-tidy
  • clang --analyze (Clang Static Analyzer)
  • scan-build (wraps the Clang Static Analyzer)
  • flang (Fortran to LLVM)
  • fortran-linter

Workflow

  1. clang-format
  2. clang-tidy
  3. clang --analyze
  4. xSDK-specific analysis
  5. compilation
  6. tests x {Valgrind, ASan, MSan, TSan, UBSan}, i.e. every test runs under every tool
    • I think Valgrind is overkill if you already have ASan and MSan
  7. code coverage
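
Step 6 is a cross product of tests and dynamic-analysis tools. The sketch below enumerates that matrix in Python. The `-fsanitize=...` flags are the standard Clang/GCC spellings (MemorySanitizer is Clang-only) and `--error-exitcode` is a real Valgrind option, but the test names, the job layout, and the `build_matrix` helper are hypothetical.

```python
from itertools import product

# Compiler flags that enable each sanitizer (Clang/GCC spelling).
SANITIZER_FLAGS = {
    "ASan": "-fsanitize=address",
    "MSan": "-fsanitize=memory",   # Clang only
    "TSan": "-fsanitize=thread",
    "UBSan": "-fsanitize=undefined",
}
TOOLS = ["valgrind", *SANITIZER_FLAGS]

# Hypothetical test executables
TESTS = ["test_solver", "test_mesh"]


def build_matrix(tests, tools):
    """Return one job description per (test, tool) pair."""
    jobs = []
    for test, tool in product(tests, tools):
        if tool == "valgrind":
            # Valgrind instruments an ordinary binary at run time
            cflags = "-g"
            run = ["valgrind", "--error-exitcode=1", f"./{test}"]
        else:
            # Sanitizers need a build compiled with the matching flag
            cflags = f"-g {SANITIZER_FLAGS[tool]}"
            run = [f"./{test}"]
        jobs.append({"test": test, "tool": tool, "cflags": cflags, "run": run})
    return jobs


jobs = build_matrix(TESTS, TOOLS)
for job in jobs:
    print(job["test"], job["tool"], job["cflags"], " ".join(job["run"]))
```

Each sanitizer gets its own build because several of them cannot be combined in one binary (for example ASan with TSan or MSan). Passing a shorter `tools` list, say without `"valgrind"`, shrinks the matrix accordingly.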

Goals

  • Integrate static and dynamic program analysis into the development process
  • Make the process easy for others to follow