Why Google Stores Billions of Lines of Code in a Single Repository

From AcaWiki

Citation: Rachel Potvin, Josh Levenberg (2016) Why Google Stores Billions of Lines of Code in a Single Repository. Communications of the ACM
DOI (original publisher): 10.1145/2854146
Semantic Scholar (metadata): 10.1145/2854146
Sci-Hub (fulltext): 10.1145/2854146
Internet Archive Scholar (search for fulltext): Why Google Stores Billions of Lines of Code in a Single Repository
Download: http://cacm.acm.org/magazines/2016/7/204032-why-google-stores-billions-of-lines-of-code-in-a-single-repository/fulltext
Tagged: google

Summary

Describes how most Google software is developed in a single "monorepo", the tooling and practices that support this scale (86 TB of data, 35 million commits, 25,000 engineers, 15 million lines of code changed weekly), and the benefits and costs of the approach.

  • Google-custom monorepo VCS known as "Piper"
  • Most developers access Piper through Clients in the Cloud (CitC), which uses a FUSE filesystem so that only the files a developer is working on are copied locally
  • Trunk-based development: branches rarely used
  • Workflow: pre-commit review, static analysis ("Tricorder"), and large-scale code changes and cleanups ("Rosie")
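
The CitC idea of copying only the files a developer is working on can be illustrated with a toy copy-on-write overlay. This is a minimal sketch under stated assumptions, not CitC's actual design: the class name `OverlayWorkspace`, its methods, and the file paths are hypothetical, and an in-memory dictionary stands in for the shared repository snapshot.

```python
class OverlayWorkspace:
    """Toy copy-on-write view over a shared, read-only repository snapshot.

    Hypothetical illustration only; it does not model FUSE or Piper.
    """

    def __init__(self, snapshot):
        self._snapshot = snapshot  # shared mapping: path -> contents
        self._local = {}           # only files written in this workspace

    def read(self, path):
        # Locally modified files shadow the shared snapshot.
        if path in self._local:
            return self._local[path]
        try:
            return self._snapshot[path]
        except KeyError:
            raise FileNotFoundError(path) from None

    def write(self, path, contents):
        # Writes never touch the snapshot; they land in the local layer.
        self._local[path] = contents

    def local_files(self):
        # The workspace footprint is just the set of modified files.
        return sorted(self._local)


# Hypothetical two-file repository snapshot.
snapshot = {"base/strings.py": "v1", "search/index.py": "v1"}
ws = OverlayWorkspace(snapshot)
ws.write("search/index.py", "v2")
```

Reads of untouched files fall through to the shared snapshot, so the workspace holds only `search/index.py` while the rest of the tree remains visible and unchanged.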

Advantages (quote):

  • Unified versioning, one source of truth;
  • Extensive code sharing and reuse;
  • Simplified dependency management;
  • Atomic changes;
  • Large-scale refactoring;
  • Collaboration across teams;
  • Flexible team boundaries and code ownership; and
  • Code visibility and clear tree structure providing implicit team namespacing.
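
The "atomic changes" advantage can be sketched as a commit that lands every edit or none of them. This is a hedged illustration, not how Piper works: the `Trunk` class, the `callers_resolve` check, and the `def`/`call` line format are all hypothetical stand-ins for a real repository and build.

```python
class Trunk:
    """Toy single-branch repository with all-or-nothing commits."""

    def __init__(self, files):
        self.files = dict(files)
        self.revision = 0

    def commit(self, edits, check):
        # Stage every edit together, then validate the whole tree.
        staged = {**self.files, **edits}
        if not check(staged):
            return False  # nothing is applied
        self.files = staged
        self.revision += 1
        return True


def names(files, keyword):
    # Collect the second word of each line that starts with `keyword`.
    return {
        line.split()[1]
        for text in files.values()
        for line in text.splitlines()
        if line.startswith(keyword + " ")
    }


def callers_resolve(files):
    # Hypothetical stand-in for a build: every called name must be defined.
    return names(files, "call") <= names(files, "def")


trunk = Trunk({"lib/api": "def old_name", "app/main": "call old_name"})

# Renaming only the library would break its caller, so the commit is rejected.
partial = trunk.commit({"lib/api": "def new_name"}, callers_resolve)

# Renaming the library and its caller together lands as one revision.
full = trunk.commit(
    {"lib/api": "def new_name", "app/main": "call new_name"},
    callers_resolve,
)
```

With separate repositories the library and its caller would have to be changed in two steps, with a window in which they disagree; in this sketch the rename is visible to everyone at a single revision.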

Costs and trade-offs (quote):

  • Tooling investments for both development and execution;
  • Codebase complexity, including unnecessary dependencies and difficulties with code discovery; and
  • Effort invested in code health.

Alternatives:

  • Using Git would require splitting the codebase into thousands of repositories and adopting different tools and workflows
  • Work on Mercurial so that it can support huge monorepos

Theoretical and Practical Relevance

Recording of a presentation of the paper's material by one of the authors, Rachel Potvin: https://www.youtube.com/watch?v=W71BTkUbdqE

Paper discussed at https://news.ycombinator.com/item?id=11991479