Developers Perception of Peer Code Review in Research Software Development
Citation: Nasir U. Eisty, Jeffrey C. Carver (2021/09/22) Developers Perception of Peer Code Review in Research Software Development. arXiv
arXiv (preprint): arXiv:2109.10971
Download: https://arxiv.org/abs/2109.10971
Tagged: Computer Science, software engineering
Background
- Software is important for research.
- Research software engineers should follow standard software practices.
- However, research software practices differ from industry:
  - Risks due to exploration.
  - Constantly changing requirements.
  - Complex communication or I/O patterns.
  - Need for highly-specialized knowledge.
  - Larger scale of single executions.
  - Complex software due to modeling complex phenomena.
  - Different goals, knowledge, and skills than industry developers.
- Testing is hard because there is no oracle, a large number of parameters, and legacy code.
Solution
- Despite these differences, peer code review could work well for research software.
- Numerous benefits:
  - Reviewers suggest comments that improve code quality.
  - Authors are more likely to make code readable.
  - Spreads knowledge of the change.
  - Community building.
- Google developers' expectations of peer code review fall into four key themes: education, maintaining norms, gatekeeping, and accident prevention.
- A Microsoft developer spends 15--25% of their time reviewing.
- 25% of review comments improve core functionality.
Study
- RQ1: How do research software developers perform peer code review?
- RQ2: What effect does peer code review have on research software?
- RQ3: What difficulties do research software developers face with peer code review?
- RQ4: What improvements to the peer code review process do research software developers need?
Methodology
Survey Design
- Questions from prior literature on peer code review.
Pilot Interviews
- Pilot interviews suggested ways to revise the questions and to develop multiple-choice answers.
- Interview audience: 13 from NCSA, 9 from Einstein Toolkit Workshop.
- "Convenience sampling"
Survey
- Solicitation:
- Emailed mailing lists of projects associated with the interviewers,
- 2017 International Workshop on Software Engineering for High Performance Computing in Computational and Data-Enabled Science and Engineering,
- Emailed mailing list for UK RSEs,
- Pinged RSE Slack channel,
- Advertised in Better Scientific Software newsletter,
- All excluding the pilot interviewees.
Data Analysis
- Valid response := answered all quantitative questions and at least one qualitative question.
- What motivates requiring one qualitative answer?
- The authors coded the qualitative answers individually, then merged codes, resolving differences case-by-case.
Results
- Most respondents are financially compensated for their participation, have been on both sides of code review, and have more than five years of experience.
RQ1: Code review details
- Most respondents spend less than 5 hours per review; half spend 1 to 5 hours.
- Most requests get a response within 3 days, 40% within 1 day.
- Most commits go through review.
- Most reviews are resolved within a month; half within a week.
- Number of LoC and number of reviewers varies widely.
- Common criteria when deciding reviews: coding standards and domain knowledge are roughly tied, followed by functionality, correctness, time, tests, documentation, and always-accept.
- Common mistakes corrected during review: code mistakes, design, style, testing, documentation, performance, readability, maintainability.
Positive experiences about code review
- Knowledge sharing, improved code quality, helpful feedback, positive feeling, problems identified
- "In a big project it is rare that anyone understands the whole picture... It [code review] can lead to more complete understanding of the task."
- "It [code review] leads to design discussions happening that would not have happened otherwise."
- "It makes the team more knowledgeable about what work is."
- "People found mistakes in code that I wrote, that I would have missed and only found out about much further on the validation process."
- Code review results in "much better code and a better understanding of different parts of the code."
Negative experiences about code review
- Takes too long, requestors misunderstand criticism, disagreements, bottleneck, hard to find reviewers, difficult task, unresponsive author
- "it [peer code review] can be long and time consuming for very small changes, as the process must be followed for even a single character change if it affects results."
- There are also problems when the "review process gets stalled while nit-picking irrelevant details."
- "Sometimes people get annoyed when they get feedback especially if they think they are experts"
RQ2: Impact of code review on research software
- By a large margin, respondents strongly agreed code review is important for their project.
- This could be due to selection bias.
- Impacts: improves code quality followed by knowledge sharing
- Why does code review improve code quality? Correctness, followed by a tie among improved readability, more eyes, and better maintainability.
- On correctness: "If you’ve written code yourself, it’s hard to see the assumptions you’ve made. Others can spot these and ask you to clarify, also spot your mistakes"
- On readability: "make[ing] the codebase more uniform and improves the quality of the code"
RQ3: Difficulties research software developers face with code review
- Difficulties: understanding code, understanding system, administrative issues
- Barriers: Finding time followed distantly by phrasing comments, finding the right people, participation, developer egos, takes too long
RQ4: What improvements do research software developers need?
- Formalizing process, followed by tooling, more people, better incentives, more training, more time
- Formalizing process: "a more formal structure of at least one science review followed by one technical review. It’s currently a bit of a free-for-all"
- Tooling: branching VCS and automatic analysis
Threats to validity
- Participants might not know what certain terms mean, but authors think they do.
- Human perception can be wrong, but there is no better source of truth.
- Perhaps the sample is not representative of the population.
- Those willing to answer a survey on code review are more likely to be aware of it.
- Participants may have misunderstood questions, but authors tried to be clear.
Conclusion
- Similar results to commercial software engineering, despite differences in research context.
- Code review largely beneficial, but could benefit from explicit process.
- The authors plan to raise awareness of code review, its flaws, and its benefits within the community.