Naiad: a timely dataflow system

From AcaWiki
Jump to: navigation, search

Citation: Derek G. Murray, Frank McSherry, Rebecca Isaacs, Michael Isard, Paul Barham, Martín Abadi (2013/11/03) Naiad: a timely dataflow system. Symposium on Operating System Principles (RSS)
DOI (original publisher): 10.1145/2517349.2522738
Semantic Scholar (metadata): 10.1145/2517349.2522738
Sci-Hub (fulltext): 10.1145/2517349.2522738
Internet Archive Scholar (search for fulltext): Naiad: a timely dataflow system
Download: https://doi.org/10.1145/2517349.2522738
Tagged: Computer Science (RSS) robotics (RSS), scheduilng (RSS)



Introduction

Many data processing tasks require low-latency, iterative subcomputations (cycle in DFG), consistent intermediate outputs for distributed, in-memory, streaming, data processing. Naiad is the first to have all three.

  • Stream processors have low-latency for acyclic programs
  • Batch systems have high throughput on cyclic programs the expense of latency
  • Trigger-based approaches support cyclic programs with only weak consistency guarantees

Timely Dataflow

Timely dataflow :=

  • External input vertices, external output vertices
  • Stateful computational vertices
    • Each vertex implement callbacks: OnRecv(e: Edge, m: Message, t: Timestamp) and OnNotify(t: Timestamp)
    • Gets two functions from the runtime: SendBy(e: Edge, m: Message, t: Timestamp) and NotifyAt(t: Timestamp)
    • Both of these issue or receive events. All events have a pointstamp := timestamp and location in graph.
    • The runtime keeps track of all unprocessed events by pointstamp. When there are no "preceding pointstamps," that event is on the frontier, and it is legal to execute its OnNotify.
  • The graph can have loops, but each loop must have a single ingress node, single egress node, and at least one feedback node (on the backedge).
  • Messages can travel along edges.
  • All messages have logical timestamps := the value of every cycle counter.
    • Aside: the whole program is run in an infinite loop, streaming over newly seen input data. If you view the whole program as a "cycle," then the program's cycle counter is called the epoch counter, which is included in the timestamp.
    • Ingress nodes append a counter for their loop, egress nodes remove a counter, and feedback nodes increment the last counter.

Timely dataflow supports:

  • dataflow cycles (aka iteration)
  • stateful nodes without global coordination
  • push-based activation of nodes when inputs are ready

Distributed Implementation

  • Logical graph (stages and typed connectors) -> physical graph (Timely dataflow vertices and edges)
    • Each connector can be a 1-to-1 map or a groupby.
  • Each worker is in charge of a subset of vertices.
    • The scheduling algorithm can be distributed, with local frontiers as a subset of global frontiers.
    • Project timestamps to local timestamps consisting of counters contained in that particular worker.
    • Accumulate updates to the scheduling stuff locally before broadcasting them globally.
  • Fault tolerance: Each vertex also implements Checkpoint() and Restore()
  • In a data-parallel operation, the result will be limited the latency of the slowest worker, the straggler.
    • When tasks are very fine-grained, even small latencies such as network can cause so-called micro-stragglers.
    • In traditional batch processing, one can run duplicate nodes and take the fastest, but since Naiad nodes are stateful and unsynchronized that won't work. Instead:
      • Tweaking the network stack to be more deterministic
      • Decrease scheduling quantum
      • Decrease garbage-collection frequency

Writing programs in Naiad

  • LINQ-like and MapReduce-like interface can be implemented in Naiad.

Performance evaluation

  • Authors have physical access to a cluster
  • Average of 5 runs.
  • Linear throughput
  • Latency is still heavily impacted by micro-stragglers
  • Slightly less than perfect weak-scaling

Real-world Applications

  • Batched iterative graph-processing
  • Batched iterative machine learning
  • Streaming acyclic computation
  • Streaming iterative graph analytics

See Also