3D-Stacked Memory Architectures for Multi-core Processors
Citation: Gabriel H. Loh (2008/06) 3D-Stacked Memory Architectures for Multi-core Processors. International Symposium on Computer Architecture
DOI (original publisher): 10.1145/1394608.1382159
Download: https://dl.acm.org/doi/abs/10.1145/1394608.1382159
Tagged: Computer Science, computer architecture
Problem
- CPU performance has scaled faster than memory bandwidth, so many applications are memory-bound (see Memory Wall).
- Increasing the number of ranks helps parallelism but raises the total chip count, socket count, and bus length.
- Increasing the number of banks helps parallelism but costs area (it requires additional decoders, sense amplifiers, column muxes, and row buffers).
- Increasing the bus width helps bandwidth but raises pin count and area requirements.
- Increasing the clock speed raises the transfer rate but is limited by capacitive PCB traces.
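The bus-width and clock-speed trade-offs above can be made concrete with a back-of-the-envelope peak-bandwidth calculation. The sketch below is illustrative only: the function name, the 400 MHz clock, the double-data-rate assumption, and the 64-bit versus 512-bit widths are assumptions chosen for the example, not figures from the paper.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, clock_mhz: float,
                        transfers_per_clock: int = 2) -> float:
    """Peak bus bandwidth in decimal GB/s.

    transfers_per_clock=2 assumes a double-data-rate bus (an assumption
    made for this example, not a detail taken from the paper).
    """
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9


# Hypothetical off-chip bus: 64 bits wide, limited by pin count.
narrow = peak_bandwidth_gb_s(bus_width_bits=64, clock_mhz=400)

# Hypothetical wide bus at the same clock: 512 bits (64 bytes) wide.
wide = peak_bandwidth_gb_s(bus_width_bits=512, clock_mhz=400)

print(f"narrow bus: {narrow:.1f} GB/s")  # narrow bus: 6.4 GB/s
print(f"wide bus:   {wide:.1f} GB/s")    # wide bus:   51.2 GB/s
```

At a fixed clock, peak bandwidth grows linearly with bus width, which is why width is attractive whenever the pin-count and area costs can be avoided.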
Solution
- 3D-stacking memory on top of the processing elements
- Prior studies consider 3D stacking in traditional architectures.
- Permits low-latency, high-bandwidth, high-density vertical interconnects
- This makes it possible to increase the number of ranks, memory controllers, and row buffers.
- Can embed memory closer to processing
- Non-traditional 3D memory architectures can do even better.
- Change how banking works to simplify L2 design
- Change how MSHRs (miss status handling registers) work, using a novel data structure: the vector Bloom filter
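To make the vector Bloom filter idea more tangible, here is a minimal sketch of one way such a structure could speed up MSHR lookups. It assumes the MSHR is organized as a hash table with linear probing, and that each home slot keeps a bit vector recording which displacements hold entries that hashed to it. The class name `VbfMshr`, its methods, and the table size are hypothetical, and the details may differ from the paper's actual design.

```python
class VbfMshr:
    """Toy MSHR: linear-probing hash table plus a per-home-slot bit vector."""

    def __init__(self, size: int = 8):
        self.size = size
        self.slots = [None] * size  # outstanding miss addresses
        # vbf[home][d] is True if an entry whose home slot is `home`
        # is stored at slot (home + d) % size.
        self.vbf = [[False] * size for _ in range(size)]

    def _home(self, line_addr: int) -> int:
        return line_addr % self.size

    def insert(self, line_addr: int) -> bool:
        """Place the address in the first free slot at or after its home."""
        home = self._home(line_addr)
        for d in range(self.size):
            idx = (home + d) % self.size
            if self.slots[idx] is None:
                self.slots[idx] = line_addr
                self.vbf[home][d] = True
                return True
        return False  # table full

    def lookup(self, line_addr: int):
        """Return (slot index or None, number of slots actually probed)."""
        home = self._home(line_addr)
        probes = 0
        for d in range(self.size):
            if not self.vbf[home][d]:
                continue  # no entry from this home lives here: skip the probe
            probes += 1
            idx = (home + d) % self.size
            if self.slots[idx] == line_addr:
                return idx, probes
        return None, probes

    def remove(self, line_addr: int) -> bool:
        idx, _ = self.lookup(line_addr)
        if idx is None:
            return False
        home = self._home(line_addr)
        self.slots[idx] = None
        self.vbf[home][(idx - home) % self.size] = False
        return True


mshr = VbfMshr(size=8)
for addr in (3, 11, 19, 4):   # 3, 11, 19 share home slot 3; 4 is displaced to slot 6
    mshr.insert(addr)
print(mshr.lookup(4))    # (6, 1)
print(mshr.lookup(5))    # (None, 0)
```

In this sketch a miss on an address whose home slot has no set bits costs zero probes, and a hit on a displaced entry skips the slots occupied by entries from other home slots, which is the property that makes a larger MSHR cheaper to search.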
Evaluation
- 2D: Baseline, memory is off-chip.
- 3D (traditional 3D): Assume the front-side bus (FSB) and memory controller run at the same rate as the processor core, but memory access times are unchanged. This configuration alone provides a ~35% increase in performance.
- 3D-wide: 3D with an increased bus width. This gets an additional 35% improvement.
- 3D-fast: 3D-wide with 9 layers. The benefit is smaller for applications with moderate miss rates.
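The summary does not say whether the "additional 35%" for 3D-wide is measured relative to the 3D configuration or to the 2D baseline. The sketch below shows how the overall speedup over 2D differs under the two readings. The function names are hypothetical, and the results are arithmetic on the rounded figures above, not numbers reported by the paper.

```python
def compound_speedup(*gains: float) -> float:
    """Each gain is relative to the previous configuration (multiplicative)."""
    total = 1.0
    for gain in gains:
        total *= 1.0 + gain
    return total


def additive_speedup(*gains: float) -> float:
    """Each gain is relative to the original 2D baseline (additive)."""
    return 1.0 + sum(gains)


# 2D -> 3D (~35%), then 3D -> 3D-wide (additional 35%).
print(f"compounded: {compound_speedup(0.35, 0.35):.2f}x over 2D")  # compounded: 1.82x over 2D
print(f"additive:   {additive_speedup(0.35, 0.35):.2f}x over 2D")  # additive:   1.70x over 2D
```

Either way, the two steps together would put 3D-wide at roughly 1.7x to 1.8x the 2D baseline under these rounded figures.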