Naiad: a timely dataflow system
Citation: Derek G. Murray, Frank McSherry, Rebecca Isaacs, Michael Isard, Paul Barham, Martín Abadi (2013/11/03) Naiad: a timely dataflow system. Symposium on Operating System Principles (RSS)
DOI (original publisher): 10.1145/2517349.2522738
Semantic Scholar (metadata): 10.1145/2517349.2522738
Sci-Hub (fulltext): 10.1145/2517349.2522738
Internet Archive Scholar (search for fulltext): Naiad: a timely dataflow system
Download: https://doi.org/10.1145/2517349.2522738
Tagged: Computer Science
(RSS) robotics (RSS), scheduilng (RSS)
Introduction
Many data processing tasks require low-latency, iterative subcomputations (cycle in DFG), consistent intermediate outputs for distributed, in-memory, streaming, data processing. Naiad is the first to have all three.
- Stream processors have low-latency for acyclic programs
- Batch systems have high throughput on cyclic programs the expense of latency
- Trigger-based approaches support cyclic programs with only weak consistency guarantees
Timely Dataflow
Timely dataflow :=
- External input vertices, external output vertices
- Stateful computational vertices
- Each vertex implement callbacks:
OnRecv(e: Edge, m: Message, t: Timestamp)
andOnNotify(t: Timestamp)
- Gets two functions from the runtime:
SendBy(e: Edge, m: Message, t: Timestamp)
andNotifyAt(t: Timestamp)
- Both of these issue or receive events. All events have a pointstamp := timestamp and location in graph.
- The runtime keeps track of all unprocessed events by pointstamp. When there are no "preceding pointstamps," that event is on the frontier, and it is legal to execute its
OnNotify
.
- Each vertex implement callbacks:
- The graph can have loops, but each loop must have a single ingress node, single egress node, and at least one feedback node (on the backedge).
- Messages can travel along edges.
- All messages have logical timestamps := the value of every cycle counter.
- Aside: the whole program is run in an infinite loop, streaming over newly seen input data. If you view the whole program as a "cycle," then the program's cycle counter is called the epoch counter, which is included in the timestamp.
- Ingress nodes append a counter for their loop, egress nodes remove a counter, and feedback nodes increment the last counter.
Timely dataflow supports:
- dataflow cycles (aka iteration)
- stateful nodes without global coordination
- push-based activation of nodes when inputs are ready
Distributed Implementation
- Logical graph (stages and typed connectors) -> physical graph (Timely dataflow vertices and edges)
- Each connector can be a 1-to-1 map or a groupby.
- Each worker is in charge of a subset of vertices.
- The scheduling algorithm can be distributed, with local frontiers as a subset of global frontiers.
- Project timestamps to local timestamps consisting of counters contained in that particular worker.
- Accumulate updates to the scheduling stuff locally before broadcasting them globally.
- Fault tolerance: Each vertex also implements
Checkpoint()
andRestore()
- In a data-parallel operation, the result will be limited the latency of the slowest worker, the straggler.
- When tasks are very fine-grained, even small latencies such as network can cause so-called micro-stragglers.
- In traditional batch processing, one can run duplicate nodes and take the fastest, but since Naiad nodes are stateful and unsynchronized that won't work. Instead:
- Tweaking the network stack to be more deterministic
- Decrease scheduling quantum
- Decrease garbage-collection frequency
Writing programs in Naiad
- LINQ-like and MapReduce-like interface can be implemented in Naiad.
Performance evaluation
- Authors have physical access to a cluster
- Average of 5 runs.
- Linear throughput
- Latency is still heavily impacted by micro-stragglers
- Slightly less than perfect weak-scaling
Real-world Applications
- Batched iterative graph-processing
- Batched iterative machine learning
- Streaming acyclic computation
- Streaming iterative graph analytics