Citation: Daniel Lenoski, James Pierce Laudon, Kourosh Gharachorloo, Anoop Gupta, John L. Hennessy (1990/05) The directory-based cache coherence protocol for the DASH multiprocessor. ACM SIGARCH Computer Architecture News
DOI (original publisher): 10.1145/325096.325132
Semantic Scholar (metadata): 10.1145/325096.325132
Sci-Hub (fulltext): 10.1145/325096.325132
Internet Archive Scholar (search for fulltext): The directory-based cache coherence protocol for the DASH multiprocessor
Download: https://dl.acm.org/doi/abs/10.1145/325096.325132
Tagged: Computer Science, Computer Architecture

Summary

A single processor can only handle problems up to a limited size, so larger problems call for multiprocessor systems. These systems have multiple physically separate memories and caches, so data must be communicated between them and kept coherent. DASH does this with a directory-based coherence protocol that uses point-to-point messages.

Theoretical and Practical Relevance

Directory-based cache coherence is essential to modern high-performance computers (supercomputers), although directory structures have since become more complex. The larger Stanford DASH project was hugely influential in academia and industry, demonstrating the viability of both a cache-coherent NUMA machine and release consistency.

Elsewhere

Problem

A single uniprocessor is inherently limited in the size of problem it can process, but uniprocessors are easy to replicate into a multiprocessor.
The replicated processors communicate through either shared memory (with cache coherence) or message passing.
- Shared memory: Easier to program (no need for explicit data partitioning, especially under dynamic load distribution), but harder to scale up.
- Message passing: Scales up better, but the programmer has to partition data and communicate it explicitly.
How do we implement scalable shared memory?

Background

Memory consistency model: What values can a load return (when there have been multiple stores to the same location on different processors)?
- Stronger models permit fewer values for each load. They are easier to program against, but harder to optimize.
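To make "permissible values for loads" concrete, here is a minimal sketch (not from the paper) that enumerates the outcomes of the classic store-buffering litmus test under sequential consistency, a strong model. The thread programs `P0` and `P1`, the register names, and all helper functions are illustrative assumptions.

```python
# Toy litmus test, assuming sequential consistency: every execution is some
# interleaving of the two threads that preserves each thread's program order.
# ("st", var, value) stores a value; ("ld", var, reg) loads var into a register.
P0 = [("st", "x", 1), ("ld", "y", "r0")]
P1 = [("st", "y", 1), ("ld", "x", "r1")]


def interleavings(a, b):
    """Yield every merge of a and b that keeps each list's internal order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule):
    """Execute one interleaving against a single shared memory."""
    mem = {"x": 0, "y": 0}
    regs = {}
    for op, var, arg in schedule:
        if op == "st":
            mem[var] = arg
        else:
            regs[arg] = mem[var]
    return regs["r0"], regs["r1"]


def sc_outcomes():
    """Set of (r0, r1) results reachable under sequential consistency."""
    return {run(s) for s in interleavings(P0, P1)}


print(sorted(sc_outcomes()))
```

Under this strong model the outcome `(0, 0)` never appears. A weaker model that lets a store sit in a write buffer while a later load proceeds would additionally permit it, which is the sense in which weaker models allow more load values.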
Case study on DASH Multiprocessor:
- processing node := a processor and its private cache.
- cluster := N processing nodes (shared memory through snooping-based cache coherence), memory, directory, and remote access cache.
- System := M clusters (shared memory through directory-based cache coherence) connected with a high-bandwidth low-latency interconnect.
Cache coherence: Either snoopy or directory-based.
- This system chooses directory, because it scales better.
- Correctness: the protocol should guarantee a specified memory consistency model, be deadlock-free, and detect processor failures.
- Performance metrics: latency, bandwidth (throughput)

Solution

Release consistency (not novel to this paper): the completion of ordinary accesses can be delayed until the next release (aka fence).
- DASH provides a full fence (all earlier accesses have to be completed) and a write fence (all earlier writes have to be completed).
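The difference between the two fences can be illustrated with a toy bookkeeping model. This is only a sketch under stated assumptions: the `Processor` class and its method names are hypothetical and do not describe DASH's hardware, which the summary above does not detail.

```python
class Processor:
    """Toy model: tracks accesses that were issued but have not yet completed."""

    def __init__(self):
        self.pending_reads = set()
        self.pending_writes = set()

    def issue_read(self, addr):
        self.pending_reads.add(addr)

    def issue_write(self, addr):
        self.pending_writes.add(addr)

    def complete_read(self, addr):
        self.pending_reads.discard(addr)

    def complete_write(self, addr):
        self.pending_writes.discard(addr)

    def write_fence_ready(self):
        """A write fence may be passed once no earlier write is outstanding."""
        return not self.pending_writes

    def full_fence_ready(self):
        """A full fence may be passed once no earlier access is outstanding."""
        return not self.pending_reads and not self.pending_writes


p = Processor()
p.issue_write("data")
p.issue_read("other")
p.complete_write("data")
print(p.write_fence_ready(), p.full_fence_ready())
```

In this model the write fence is satisfied as soon as the write to `data` completes, while the full fence still waits on the outstanding read.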
DASH Cache Coherence protocol: invalidation-based, ownership-based protocol between clusters, which connects to a snooping-based, MESI protocol within clusters. This paper is primarily about the inter-cluster protocol.
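- Directory states for each memory block: uncached remote (no remote cluster caches the block), shared remote (one or more remote clusters hold unmodified copies), and dirty remote (exactly one remote cluster holds a modified copy).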
- Directory: bit-vector with one bit per remote cluster.
  - Future work needs to reduce this space for the system to become scalable, since the bit-vector grows linearly with the number of clusters. Possible approaches are grouping clusters into regions with one bit per region, or using a sparse directory.
- No formal verification, which is mostly a product of the time. Modern model-checkers make this feasible.
- However, there is some possibility of livelock.
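The directory states and bit-vector described above can be sketched as a small state machine. This is a simplified illustration, not the paper's full protocol: it ignores message forwarding, acknowledgements, and races, and the names `DirectoryEntry`, `read`, `read_exclusive`, and `directory_overhead`, as well as the single state bit, are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    UNCACHED_REMOTE = "uncached remote"
    SHARED_REMOTE = "shared remote"
    DIRTY_REMOTE = "dirty remote"


@dataclass
class DirectoryEntry:
    """Directory state kept at the home cluster for one memory block."""
    state: State = State.UNCACHED_REMOTE
    presence: int = 0  # bit i set => remote cluster i may hold a copy

    def sharers(self):
        """Cluster numbers whose presence bit is set, in ascending order."""
        return [i for i in range(self.presence.bit_length())
                if (self.presence >> i) & 1]

    def read(self, cluster):
        """Handle a remote read. Returns the dirty owner that must supply
        the data, or None if the home memory copy is up to date."""
        owner = None
        if self.state is State.DIRTY_REMOTE:
            owner = self.sharers()[0]
            if owner == cluster:
                return None  # requester already owns the block
        self.presence |= 1 << cluster
        self.state = State.SHARED_REMOTE
        return owner

    def read_exclusive(self, cluster):
        """Handle a remote ownership (write) request. Returns the clusters
        whose copies must be invalidated or handed over."""
        victims = [c for c in self.sharers() if c != cluster]
        self.presence = 1 << cluster
        self.state = State.DIRTY_REMOTE
        return victims


def directory_overhead(num_clusters, block_bytes, state_bits=1):
    """Directory bits per data bit for a full bit-vector directory."""
    return (num_clusters + state_bits) / (block_bytes * 8)


entry = DirectoryEntry()
entry.read(2)
entry.read(5)
print(entry.state, entry.sharers())
print(entry.read_exclusive(5), entry.state)
print(directory_overhead(num_clusters=64, block_bytes=16))
```

The `directory_overhead` helper shows why the bit-vector limits scalability: with a fixed block size, the directory cost per block grows linearly with the number of clusters. The parameter values in the last line are hypothetical, chosen only to illustrate the trend.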
The paper contains no performance or scalability evaluation. This is probably also a product of its time.
