3D-Stacked Memory Architectures for Multi-core Processors

Citation: Gabriel H. Loh (2008/06) 3D-Stacked Memory Architectures for Multi-core Processors. International Symposium on Computer Architecture
DOI (original publisher): 10.1145/1394608.1382159
Download: https://dl.acm.org/doi/abs/10.1145/1394608.1382159
Tagged: Computer Science, computer architecture

CPU performance has scaled faster than memory bandwidth, so many applications are memory-bound (see Memory Wall). The conventional ways of scaling memory performance each come at a cost:
- Adding ranks increases parallelism but raises the total chip count, socket count, and bus length.
- Adding banks increases parallelism but costs area (each bank requires additional decoders, sense amplifiers, column muxes, and row buffers).
- Widening the bus increases bandwidth but raises pin count and area requirements.
- Raising the clock speed increases the transfer rate but is limited by the capacitance of PCB traces.
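
As a rough illustration of the bus width and clock speed trade-offs above, peak bandwidth can be modeled as bus width times transfer rate. The sketch below is a minimal back-of-the-envelope model; the function name and all numbers are hypothetical and are not taken from the paper.

```python
def peak_bandwidth_bytes_per_s(bus_width_bits: int, transfers_per_s: float) -> float:
    """Peak bandwidth = bus width in bytes x transfers per second."""
    return (bus_width_bits / 8) * transfers_per_s


# Hypothetical configurations (illustrative values only, not from the paper).
narrow = peak_bandwidth_bytes_per_s(64, 800e6)          # 64-bit bus, 800 MT/s
wide = peak_bandwidth_bytes_per_s(512, 800e6)           # 8x wider bus, same rate
faster_clock = peak_bandwidth_bytes_per_s(64, 1600e6)   # same bus, 2x transfer rate

print(f"narrow:       {narrow / 1e9:.1f} GB/s")
print(f"wide:         {wide / 1e9:.1f} GB/s")
print(f"faster clock: {faster_clock / 1e9:.1f} GB/s")
```

Under this simple model, widening the bus and raising the transfer rate scale bandwidth linearly; the costs listed above (pins, area, trace capacitance) are what limit each option off-chip.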

2D: Baseline; memory is off-chip.
3D (traditional 3D): Memory is stacked on the processor. The front-side bus (FSB) and memory controller are assumed to run at the same rate as the processor core, but memory access times are unchanged. This configuration alone provides a ~35% increase in performance.
3D-wide: 3D with a wider memory bus. This yields an additional 35% improvement.
3D-fast: 3D-wide with 9 layers. The benefit is smaller for applications with only a moderate number of misses.
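
The notes do not say whether the "additional 35%" for 3D-wide is measured relative to 3D or relative to the 2D baseline. The sketch below works through both readings as plain arithmetic; the helper names and the choice of interpretations are assumptions made for illustration, not results from the paper.

```python
def compounded_speedup(gains):
    """Each gain is relative to the previous configuration (multiplicative)."""
    total = 1.0
    for gain in gains:
        total *= 1.0 + gain
    return total


def additive_speedup(gains):
    """Each gain is relative to the 2D baseline (additive)."""
    return 1.0 + sum(gains)


# Gains quoted in the notes above: ~35% for 3D, an additional 35% for 3D-wide.
gains = [0.35, 0.35]

print(f"compounded: {compounded_speedup(gains):.4f}x over 2D")
print(f"additive:   {additive_speedup(gains):.4f}x over 2D")
```

The two readings differ by roughly 0.12x, so the interpretation matters when comparing 3D-wide against the 2D baseline.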