The promises and perils of mining git

Citation: Christian Bird, Peter C. Rigby, Earl T. Barr, David J. Hamilton, Daniel M. German, Premkumar Devanbu (2009) The promises and perils of mining git. 2009 6th IEEE International Working Conference on Mining Software Repositories
DOI (original publisher): 10.1109/MSR.2009.5069475
Semantic Scholar (metadata): 10.1109/MSR.2009.5069475
Sci-Hub (fulltext): 10.1109/MSR.2009.5069475
Internet Archive Scholar (search for fulltext): The promises and perils of mining git
Wikidata (metadata): Q57726828
Download: http://turingmachine.org/~dmg/papers/dmg2009 msr git.pdf
Tagged: Git

Summary

Contrasts decentralized and centralized source code management (DSCM and CCSM), focusing on Git and Subversion respectively, and examines the "promises and perils" that information in Git repositories holds for software engineering researchers. Git is faster and more flexible, which makes some kinds of research more feasible, but its data must be interpreted with care. The authors identify the following promises and perils.

Promises

  1. Each Git developer is more likely to make their repository publicly accessible, including work in progress and work that is never accepted into the stable codebase
  2. Git facilitates recovery of richer project history
  3. Git records information needed to correct Perils 3-6 in private logs
  4. Signed-off-by and other attributes create a "paper trail"
  5. Git records authorship information for contributors who are not part of the core set of developers
  6. All metadata is local
  7. Git tracks content, so the history of lines can be tracked as they are moved or copied
  8. Git is faster and uses less space
  9. Most SCMs can be converted to Git with history intact
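
As an illustration of Promise 4, the sketch below extracts a "paper trail" from a commit message by collecting trailer lines such as Signed-off-by. It is a minimal example and not the authors' tooling: the function name paper_trail, the regular expression, and the sample commit message are all assumptions made here for illustration, and the pattern only handles the simple "Key-by: Name <email>" form.

```python
import re

# Matches trailer lines of the simple form "Some-key-by: Name <email>".
# This pattern is an illustrative assumption, not a full trailer parser.
TRAILER_RE = re.compile(
    r"^(?P<key>[A-Za-z][A-Za-z-]*-by):\s*(?P<name>.*?)\s*<(?P<email>[^<>]+)>\s*$"
)


def paper_trail(message):
    """Return (key, name, email) tuples for every '*-by:' trailer line."""
    trail = []
    for line in message.splitlines():
        match = TRAILER_RE.match(line.strip())
        if match:
            trail.append((match["key"], match["name"], match["email"]))
    return trail


# Hypothetical commit message, invented for this example.
EXAMPLE_MESSAGE = """\
Fix off-by-one in buffer handling

Signed-off-by: Alice Example <alice@example.org>
Acked-by: Bob Example <bob@example.org>
Signed-off-by: Carol Maintainer <carol@example.org>
"""

if __name__ == "__main__":
    for key, name, email in paper_trail(EXAMPLE_MESSAGE):
        print(f"{key}: {name} <{email}>")
```

Read in order, the trailers suggest who wrote, reviewed, and passed along a change, which is information a centralized commit log typically does not keep for non-committers.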

Perils

  1. Similar actions can have different commands, and shared terms can have different meanings
  2. Many branches are implicit
  3. Git has no mainline, so analysis methods for CCSM must be modified appropriately
  4. Git history can be rewritten relatively easily
  5. It is not always possible to determine what branch a commit was made on
  6. It is not always possible to determine the source of a merge or if a merge occurred
  7. Accessible data may contain only successful commits, that is, those selected for inclusion
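
Peril 5 can be illustrated with a toy commit graph. The sketch below assumes a history stored as a mapping from each commit to its parents, plus a mapping from branch names to head commits; the names PARENTS, HEADS, reachable, and branches_containing are hypothetical and chosen for this example. It lists, for each commit, every branch whose head can reach it.

```python
def reachable(parents, head):
    """Return the set of commits reachable from head via parent links."""
    seen = set()
    stack = [head]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return seen


def branches_containing(parents, heads):
    """Map each commit to the sorted names of branches whose head reaches it."""
    containing = {}
    for name, head in heads.items():
        for commit in reachable(parents, head):
            containing.setdefault(commit, []).append(name)
    return {commit: sorted(names) for commit, names in containing.items()}


# Toy history (hypothetical): a <- b <- c on master, b <- d on topic,
# and m is a merge commit with parents c and d.
PARENTS = {"a": [], "b": ["a"], "c": ["b"], "d": ["b"], "m": ["c", "d"]}
HEADS = {"master": "m", "topic": "d"}

if __name__ == "__main__":
    for commit, names in sorted(branches_containing(PARENTS, HEADS).items()):
        print(commit, names)
```

In this toy graph, commit d was made on topic, but after the merge it is reachable from both branch heads, so the graph alone does not say where it originated. A fast-forward merge leaves even less evidence, since it creates no merge commit at all.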

The authors evaluate some of these promises and perils by mining data from the Linux kernel and 30 other OSS projects.