3D-Stacked Memory Architectures for Multi-core Processors

From AcaWiki

Citation: Gabriel H. Loh (2008/06) 3D-Stacked Memory Architectures for Multi-core Processors. International Symposium on Computer Architecture
DOI (original publisher): 10.1145/1394608.1382159
Semantic Scholar (metadata): 10.1145/1394608.1382159
Sci-Hub (fulltext): 10.1145/1394608.1382159
Internet Archive Scholar (search for fulltext): 3D-Stacked Memory Architectures for Multi-core Processors
Download: https://dl.acm.org/doi/abs/10.1145/1394608.1382159
Tagged: Computer Science, computer architecture




Problem

  • CPU performance has scaled faster than memory bandwidth, so many applications are memory-bound (see Memory Wall). Conventional ways to add memory bandwidth or parallelism each carry a cost:
    • Adding ranks increases parallelism but also increases the total chip count, socket count, and bus length.
    • Adding banks increases parallelism but costs area (each bank requires additional decoders, sense amplifiers, column muxes, and a row buffer).
    • Widening the bus increases bandwidth but raises pin count and area requirements.
    • Raising the bus clock rate increases bandwidth but is limited by the capacitance of the PCB traces.
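The width and clock levers above follow from a simple relation: peak bandwidth = bytes per transfer × transfers per second. The sketch below uses hypothetical channel parameters (a 64-bit bus at 800 million transfers/s and 4 cores) chosen only for illustration; they are not figures from the paper. Ranks and banks do not appear in the formula because they raise the number of requests that can be in flight rather than the peak bus rate.

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfers_per_sec: float) -> float:
    """Peak bandwidth in GB/s (10**9 bytes/s) of a bus that moves
    bus_width_bits on every transfer."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_per_sec / 1e9


# Hypothetical off-chip channel: 64-bit bus at 800 million transfers/s.
base = peak_bandwidth_gbs(64, 800e6)
wider = peak_bandwidth_gbs(128, 800e6)   # 2x width: costs pins and area
faster = peak_bandwidth_gbs(64, 1600e6)  # 2x rate: limited by PCB traces

for name, bw in [("base", base), ("2x width", wider), ("2x clock", faster)]:
    print(f"{name:>8}: {bw:.1f} GB/s")

# With a hypothetical 4 cores sharing the channel, each gets a quarter.
print(f"per-core share (4 cores): {base / 4:.1f} GB/s")
```

Doubling either the width or the clock doubles peak bandwidth, but on a conventional off-chip bus both are constrained by pins and board-level wiring, which is the constraint 3D stacking sidesteps.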

Solution

  • Stack memory directly on top of the processor cores (3D stacking).
    • Prior studies considered 3D stacking with traditional memory architectures.
      • 3D stacking permits low-latency, high-bandwidth, high-density vertical interconnects.
        • These make it practical to increase the number of ranks, memory controllers, and row buffers.
    • Memory sits physically closer to the cores.
    • Memory organizations designed specifically for 3D stacking do even better:
      • Change how banking works to simplify the L2 design.
      • Change how MSHRs (miss status holding registers) work, using a novel data structure, the vector Bloom filter.
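The MSHR change can be illustrated with a small model. An MSHR file tracks outstanding cache misses, and a lookup must find whether a miss to the same block is already pending; searching every entry gets expensive as the file grows. The sketch assumes an organization in the spirit of the vector Bloom filter: a table of bit vectors, one row per hash bucket, with one bit per MSHR entry, so a lookup probes only the entries whose bit is set. The class name, sizes, and modulo hash are hypothetical, and the paper's exact design may differ.

```python
class VectorFilteredMSHR:
    """Hypothetical MSHR file with a bit-vector filter (an assumed organization,
    in the spirit of a vector Bloom filter; not the paper's exact design).

    Row r of the filter holds one bit per MSHR entry. Bit i is set while
    entry i tracks a miss whose block address hashes to row r, so a lookup
    probes only those entries instead of searching all of them.
    """

    def __init__(self, num_entries=32, num_rows=8, block_bytes=64):
        self.entries = [None] * num_entries  # block address, or None if free
        self.rows = [0] * num_rows           # each int is an N-bit vector
        self.num_rows = num_rows
        self.block_bytes = block_bytes
        self.last_probes = 0                 # entries compared by last lookup

    def _row(self, block):
        return block % self.num_rows         # simple modulo hash (assumption)

    def lookup(self, addr):
        """Return the entry index tracking addr's block, or None."""
        block = addr // self.block_bytes
        vec = self.rows[self._row(block)]
        self.last_probes = 0
        i = 0
        while vec:
            if vec & 1:
                self.last_probes += 1
                if self.entries[i] == block:
                    return i
            vec >>= 1
            i += 1
        return None

    def allocate(self, addr):
        """Record a miss; merge with an existing entry for the same block."""
        hit = self.lookup(addr)
        if hit is not None:
            return hit                       # secondary miss to same block
        block = addr // self.block_bytes
        for i, entry in enumerate(self.entries):
            if entry is None:
                self.entries[i] = block
                self.rows[self._row(block)] |= 1 << i
                return i
        return None                          # MSHR file full: cache stalls

    def release(self, index):
        """Free an entry when its miss returns from memory."""
        block = self.entries[index]
        if block is None:
            return
        self.rows[self._row(block)] &= ~(1 << index)
        self.entries[index] = None


mshr = VectorFilteredMSHR(num_entries=4)
print(mshr.allocate(0x1000), mshr.allocate(0x1040), mshr.allocate(0x1008))
print(mshr.lookup(0x2000), mshr.last_probes)
```

Run as written, this prints `0 1 0` and then `None 1`. The third allocation hits the same 64-byte block as the first and merges into entry 0. The lookup of 0x2000 hashes to the same row as entry 0 and probes that one entry before rejecting it, the analogue of a Bloom-filter false positive; the filter never skips an entry that actually matches.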

Evaluation

  • 2D: baseline; memory is off-chip.
  • 3D (traditional 3D): the FSB and memory controller are assumed to run at the same clock rate as the processor core, but DRAM access times are unchanged. This configuration alone yields roughly a 35% performance improvement.
  • 3D-wide: 3D with a wider memory bus. This yields a further ~35% improvement.
  • 3D-fast: 3D-wide with 9 layers. Applications with only moderate miss rates benefit less.
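The phrase "a further ~35%" for 3D-wide can be read two ways. If that gain is measured relative to the 3D configuration, the two gains compose multiplicatively; if both are measured against the 2D baseline, they add. Using only the approximate figures reported above:

```latex
\text{multiplicative: } 1.35 \times 1.35 \approx 1.82
\qquad
\text{additive: } 1 + 0.35 + 0.35 = 1.70
```

Either way, 3D-wide would land at roughly 1.7x to 1.8x the performance of the 2D baseline, depending on which configuration the second figure is measured against.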