Hydroflow: A Compiler Target for Fast, Correct Distributed Programs

Traditional compilers offer little assistance in ensuring the correctness of distributed programs.

Joe Hellerstein

November 14, 2024

Transcript

  1. Joe Hellerstein, Prof. Alvin Cheung, Prof. Natacha Crooks, Conor Power, Shadaj Laddad, Prof.* Mae Milano, Mingwei Samuel, David Chu, Dr. Tiemo Bang, Lucky Katahanas, Chris Douglas
  2. Sea Changes in Computing: PDP-11, 1970 (minicomputers); Cray-1, 1976 (supercomputers); Macintosh, 1984 (personal computers); iPhone, 2007 (smart phones).
  3. New Platform + New Language = Innovation: PDP-11, 1970 (minicomputers); Cray-1, 1976 (supercomputers); Macintosh, 1984 (personal computers); iPhone, 2007 (smart phones).
  4. The Big Question: Programming the Cloud, a Grand Challenge for Computing. How will folks program the cloud, in a way that fosters unexpected innovation? Distributed programming is hard: parallelism, consistency, partial failure, … Autoscaling makes it harder! Today's compilers don't address distributed concerns.
  5. Declarative Programming for the Cloud. Relational databases were invented to hide how data is laid out and how queries are executed. The cloud was invented to hide how computing resources are laid out and how computations are executed.
  6. Prior Systems Work: the Anna Key-Value Store. A KVS is a Petri dish for distributed systems: algorithmically trivial, with a focus on consistency guarantees and performance, and a wide range of deployment options. [ICDE 18, VLDB 19]
  7. Anna KVS Performance + Consistency: 700x! Hand-written in C++ for a Ph.D. dissertation; implementation correct by assertion. Fast, especially under contention: up to 700x faster than Masstree and Intel TBB on multicore; up to 10x faster than Cassandra in a geo-deployment; 350x the performance of DynamoDB for the same price. Consistency guarantees: pluggable! Causal, read-committed, … via Bloom-style compositions of trivial semi-lattices ("CRDTs").
  8. A New Language Stack: LLVM for the Cloud. Compiler optimizations for elastic distributed computing; compiler guarantees for distributed correctness checks. Goals: fast (low latency and high throughput), flexible (polyglot, easy to adopt, easy to extend), cost-effective (good use of cloud resources). [CIDR 19]
  9. HYDRO Stack: many languages for programmers; a declarative "global" IR; a single-core IR; adaptive deployment. [Stack diagram: cloud services (FaaS, storage, ML frameworks); programmer-facing models (Actors, e.g. Orleans; Functional, e.g. Spark; Logic, e.g. Bloom; Futures, e.g. Ray; new DSLs; sequential code via HYDRAULIC verified lifting) feeding HYDROLOGIC (global), compiled by HYDROLYSIS to HYDROFLOW (local), deployed by HYDRODEPLOY.]
  10. (Team credits, as on slide 1.) Hydroflow: An IR and Compiler.
  11. Body of the Talk: Four Chapters. 1) Hydroflow IR: a shared-nothing dataflow transducer per core; 2) compilation and performance; 3) LatticeFlows for distributed consistency; 4) cross-node optimization.
  12. Hydroflow IR: A Graph Specification Language. A human-programmable IR, in the spirit of LLVM, Halide, etc. Still: a compiler target. Each transducer is to be run on a single core. A dataflow graph specification language (https://hydro.run/docs/hydroflow/):
      my_flow = op1() -> op2();
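A minimal sketch of a complete, runnable transducer, assuming the hydroflow crate's surface syntax as documented at the link above (the pipeline itself is invented for illustration):

    use hydroflow::hydroflow_syntax;

    fn main() {
        // A two-operator pipeline in the spirit of my_flow = op1() -> op2():
        // a source operator feeding a sink operator via the -> edge syntax.
        let mut flow = hydroflow_syntax! {
            my_flow = source_iter(1..=3) -> for_each(|x| println!("got {x}"));
        };
        flow.run_available(); // run until no more input is immediately available
    }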
  13. Simple KVS Dataflow. [Diagram: source -> demux -> Put/Get -> join (with hash table) -> dest.]
      // Demux network inputs
      network_recv = source_stream_serde(inbound)
          -> demux_enum::<KvsMessageWithAddr>();
  14. Simple KVS Dataflow (build continues).
      // Demux network inputs
      network_recv = source_stream_serde(inbound)
          -> demux_enum::<KvsMessageWithAddr>();
      puts = network_recv[Put];
      gets = network_recv[Get];
      puts -> [0]lookup;
      gets -> [1]lookup;
  15. Simple KVS Dataflow (build continues; earlier lines as above).
      // Join PUTs and GETs by key, persisting the PUTs.
      lookup = join();
  16. Simple KVS Dataflow (build completed; earlier lines as above, now with persistence lifetimes on the join).
      // Join PUTs and GETs by key, persisting the PUTs.
      lookup = join::<'static, 'tick>();
      // Send GET responses back to the client.
      lookup -> dest_sink_serde(outbound);
  17. The complete flow, with the glue maps filled in:
      // Demux network inputs
      network_recv = source_stream_serde(inbound)
          -> _upcast(Some(Delta))
          -> map(Result::unwrap)
          -> map(|(msg, addr)| KvsMessageWithAddr::from_message(msg, addr))
          -> demux_enum::<KvsMessageWithAddr>();
      puts = network_recv[Put];
      gets = network_recv[Get];
      // Join PUTs and GETs by key, persisting the PUTs.
      puts -> map(|(key, value, _addr)| (key, value)) -> [0]lookup;
      gets -> [1]lookup;
      lookup = join::<'static, 'tick>();
      // Send GET responses back to the client address.
      lookup -> map(|(key, (value, client_addr))| (KvsResponse { key, value }, client_addr))
          -> dest_sink_serde(outbound);
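For context, a sketch of how such a flow is typically wired to a UDP socket, in the style of the hydroflow examples; bind_udp_bytes and ipv4_resolve are utility names assumed from the crate's util module, so treat the exact signatures as approximate:

    use hydroflow::hydroflow_syntax;
    use hydroflow::util::{bind_udp_bytes, ipv4_resolve};

    #[hydroflow::main]
    async fn main() {
        // Bind a UDP socket; this yields the (outbound, inbound) sink/stream
        // pair that dest_sink_serde / source_stream_serde consume above.
        let addr = ipv4_resolve("localhost:3000").unwrap();
        let (_outbound, inbound, _addr) = bind_udp_bytes(addr).await;
        // (_outbound would feed dest_sink_serde in the full KVS.)
        let mut flow = hydroflow_syntax! {
            // Placeholder body: the KVS pipeline from the slide goes here.
            source_stream_serde(inbound)
                -> map(Result::unwrap)
                -> for_each(|(msg, addr): (String, _)| println!("{addr}: {msg}"));
        };
        flow.run_async().await;
    }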
  18. (The same code as slide 17, now shown with its compiled dataflow graph: source_stream_serde -> _upcast -> map -> map -> demux_enum, with the Put branch mapped into join input 0, the Get branch into join input 1, and the join output mapped into dest_sink_serde.)
  19. Remaining Code / Entire Dataflow. [Figure: the entire compiled dataflow graph of the KVS, including server response, peer join, and gossip.]
      // Join as a peer if peer_server is set.
      source_iter_delta(peer_server)
          -> map(|peer_addr| (KvsMessage::PeerJoin, peer_addr))
          -> network_send;
      // Peers: When a new peer joins, send them all data.
      writes_store -> [0]peer_join;
      peers -> [1]peer_join;
      peer_join = cross_join()
          -> map(|((key, value), peer_addr)| (KvsMessage::PeerGossip { key, value }, peer_addr))
          -> network_send;
      // Outbound gossip. Send updates to peers.
      peers -> peer_store;
      source_iter_delta(peer_server) -> peer_store;
      peer_store = union() -> persist();
      writes -> [0]outbound_gossip;
      peer_store -> [1]outbound_gossip;
      outbound_gossip = cross_join()
          // Don't send gossip back to same sender.
          -> filter(|((_key, _value, writer_addr), peer_addr)| writer_addr != peer_addr)
          -> map(|((key, value, _writer_addr), peer_addr)| (KvsMessage::PeerGossip { key, value }, peer_addr))
          -> network_send;
  20. More on the IR: integration with Rust via staged programming (Hydroflow+); cyclic graphs & recursion as in Datalog (see the reachability sketch below); stratification of blocking operators (negation/∀, fold, etc.); a single clock per transducer, as in Dedalus. See the Hydro Book: https://hydro.run/docs/hydroflow/
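A sketch of the cyclic style this enables, adapted from the graph-reachability example in the Hydro Book (edge data invented; operator spellings assumed from the book):

    use hydroflow::hydroflow_syntax;

    fn main() {
        let mut flow = hydroflow_syntax! {
            // Transitive reachability from vertex 0: the join's output feeds
            // back into the union, a cycle run to fixpoint within a tick.
            origin = source_iter(vec![0]);
            edges = source_iter(vec![(0, 1), (1, 2), (2, 4)]);
            reached = union();
            origin -> [0]reached;
            reached -> map(|v| (v, ())) -> [0]my_join;
            edges -> [1]my_join;
            my_join = join()
                -> flat_map(|(src, ((), dst))| [src, dst])
                -> tee();
            my_join[0] -> [1]reached;
            my_join[1] -> unique() -> for_each(|v| println!("reached: {v}"));
        };
        flow.run_available();
    }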
  21. In-Out Tree Partitioning / Inlining. Hydroflow is a Rust crate that generates Rust code from flow graphs. Partition a flow graph into "in-out trees"; this leads rustc to perform aggressive inlining. [Figure: the graph-reachability dataflow, partitioned into pull and push halves.]
  22. In-Out Tree Partitioning / Inlining. [Figure: the same flow, split at the pivot.] One inlined function per in-out tree!
  23. In-Out Tree Partitioning / Inlining. [Figure: in-out trees connected through a handoff at the pivot.]
  24. Reminder: Anna KVS Performance + Consistency, 700x! Fast, especially under contention: up to 700x faster than Masstree and Intel TBB on multicore; up to 10x faster than Cassandra in a geo-deployment; 350x the performance of DynamoDB for the same price. Consistency guarantees: causal, read-committed, … via Bloom-style compositions of trivial semi-lattices ("CRDTs").
  25. Fast? ✅ Original Anna KVS: C++, 2018, Amazon m4.16xlarge instances (64 vCPU, 256 GB RAM). Anna KVS in Hydro: 2023, GCP n2-standard-64 instances (64 vCPU, 256 GB RAM).
  26. Challenge: Replica Consistency. Ensure that distant agents agree (or will agree) on common knowledge. Classic example: data replication. How do we know if they agree on the value of a mutable variable x? x = ❤
  27. Challenge: Replica Consistency. Ensure that distant agents agree (or will agree) on common knowledge. Classic example: data replication. How do we know if they agree on the value of a mutable variable x? If they disagree now (x = ❤ vs. x = 💩), what could happen later? Split-brain divergence! We want to generalize to program outcomes!
  28. Classical Solution: Coordination. A global total order of operations via atomics, critical sections, distributed protocols like Paxos and 2-phase commit, etc. Expensive at every scale. When can we avoid it?
  29. Big Queries: When? Why? When do I need coordination? Why? No really: why? When is coordination required?
  30. CALM: Consistency As Logical Monotonicity. Theorem (CALM): A distributed program has a consistent, coordination-free distributed implementation if and only if it is monotonic. References: Hellerstein JM. The Declarative Imperative: Experiences and Conjectures in Distributed Logic. ACM PODS keynote, June 2010; ACM SIGMOD Record, Sep 2010. Ameloot TJ, Neven F, Van den Bussche J. Relational Transducers for Declarative Networking. JACM, Apr 2013. Ameloot TJ, Ketsman B, Neven F, Zinn D. Weaker Forms of Monotonicity for Declarative Networking: A More Fine-Grained Answer to the CALM-Conjecture. ACM TODS, Feb 2016. Hellerstein JM, Alvaro P. Keeping CALM: When Distributed Consistency Is Easy. CACM, Sept 2020.
  31. Semi-Lattices: CALM Algebra. A semi-lattice is ⟨S, +⟩ with + associative (x + (y + z) = (x + y) + z), commutative (x + y = y + x), and idempotent (x + x = x). Every semi-lattice corresponds to a partial order: x ≤ y ⇔ x + y = y. Partial orders are compatible with many total orders! CALM connection: monotonicity in the lattice's partial order.
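A self-contained sketch of the algebra in plain Rust: set union as the lattice's +, with the induced partial order defined exactly as above:

    use std::collections::HashSet;

    // Semi-lattice merge for sets: union is associative, commutative,
    // and idempotent by construction.
    fn merge(x: &HashSet<u32>, y: &HashSet<u32>) -> HashSet<u32> {
        x.union(y).copied().collect()
    }

    // Induced partial order: x <= y iff x + y = y.
    fn leq(x: &HashSet<u32>, y: &HashSet<u32>) -> bool {
        &merge(x, y) == y
    }

    fn main() {
        let a: HashSet<u32> = [1, 2].into();
        let b: HashSet<u32> = [2, 3].into();
        let ab = merge(&a, &b); // {1, 2, 3}
        // a and b are incomparable, but both are <= their merge:
        assert!(!leq(&a, &b) && !leq(&b, &a));
        assert!(leq(&a, &ab) && leq(&b, &ab));
        assert_eq!(merge(&a, &a), a); // idempotence: x + x = x
    }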
  32. When is it consistent to read without coordination? … when we can apply monotone functions to get to the top of smaller lattices.
  33. Hydroflow's Approach. Coordination protocols in the standard library (a Paxos implementation, described below, is WIP). Object types: the lattices crate, with composable lattice types (example: causal consistency); clean and amenable to synthesis! A sketch follows. Monotone & non-monotone functions: maps, folds, etc. Dataflow types & properties. [OOPSLA 22]
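A sketch against the lattices crate; the type and method names here (SetUnionHashSet::new_from, Max::new, Merge::merge, into_reveal) are recalled from its docs rather than verified, so treat them as assumptions:

    use lattices::set_union::SetUnionHashSet;
    use lattices::{Max, Merge};

    fn main() {
        // Composable lattice objects: merge() is the only mutator, and it
        // reports whether the value actually grew. (Names assumed, see above.)
        let mut seen = SetUnionHashSet::new_from(["put:a"]);
        seen.merge(SetUnionHashSet::new_from(["put:b"])); // set-union merge
        let mut high_water = Max::new(3u64);
        high_water.merge(Max::new(7)); // max merge
        assert_eq!(high_water.into_reveal(), 7);
    }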
  34. Dataflow Types & Properties (WIP). [Flow: source_stream_serde -> _upcast -> map -> fold -> demux_enum, with a type and property set on each edge.] Type (Seq<T>, 'a), Props {RandomBatch, Dups}: a "happenstance" total order 'a. Type Seq<Lattice<L>>, Props {RandomBatch, Dups}: a latticeflow of points from L, to be monotonically accumulated. After the fold, Type Lattice<L>, Props {NonDeterministic}: a cumulative lattice point of L. A RandomBatch input to a non-monotonic op produces NonDeterministic output!
  35. Dataflow Types & Properties (WIP). [Flow: source_iter -> _upcast -> map -> fold -> demux_enum.] With a deterministic source, the properties are empty throughout: Type (Seq<T>, 'a), Props: None ("happenstance" total order 'a); Type Seq<Lattice<L>>, Props: None (a latticeflow of points from L, to be monotonically accumulated); and the fold's output Type T', Props: None.
  36. Dataflow Types & Properties (WIP). [Flow: source_stream_serde -> _upcast -> map -> lattice_merge -> demux_enum.] The inputs carry Props {RandomBatch, Dups} as in slide 34, but the monotone lattice_merge yields Type Lattice<L>, Props: None: a cumulative lattice point of L, with no nondeterminism despite the random batching.
  37. Eventually consistent, at a glance! [Figure: the entire KVS dataflow graph.]
  38. HYDRO Stack, revisited, with Dedalus as its foundation. [Stack diagram, as on slide 9.]
  39. Hydrologic can be analyzed for many properties! Monotonicity analysis from the type system; functional dependencies from DB theory to co-partition state; data provenance helps us understand how results get generated. No surprise that a data-centric approach helps! Theme: in distributed systems, the hard part is the data (state + messages). [Dedalus]
  40. Challenges in Optimizing Protocols. The compiler cannot understand high-level protocol semantics; many global invariants are implicit in distributed programs. How can we prove that our optimizations are always correct?
  41. Keep It Simple, Preserve Equivalence! Two forms of "compartmentalization". The really hard part: correctly applying transformations automatically! (Hand-written in Scala, correct by assertion.)
  42. Fast? ✅ Beats SOTA Paxos implementations. [Chart comparing: rule-based optimization (Hydroflow); Whittaker's Compartmentalized Paxos (Hydroflow); Whittaker's Compartmentalized Paxos (Scala).]
  43. Halfway there! The rewrite rules are proven correct and provide the desired wins. Still needed: a cost model for an objective function, and search techniques to find optimal rewritings. E-graphs meet query optimizers: very similar technologies!
  44. What should Hydrologic contain? Hydroflow+ is a functional variant of Hydroflow. Maybe Hydrologic starts as single-transducer Hydroflow+? With specs for SLOs: cost, latency, uptime. Other "facets"? Consistency? Security? [Stack diagram, as on slide 9.]
  45. Deployment, Autoscaling, Fault Tolerance. Heterogeneous cloud assignment challenges; dynamics of autoscaling; sensing & monitoring; live autoscaling (deployment + recompilation); declarative fault tolerance: what does a spec look like? Many mechanisms to choose from! [Stack diagram, as on slide 9.]
  46. Handling Multiple Input Languages/Patterns. In general: LLMs arrived just in time! Still need verification. Hypotheses on specific examples: shallow syntactic compilation could work for actors and distributed futures? Early results on sequential code: Katara. [Stack diagram, as on slide 9.] [OOPSLA 22]
  47. Language/Theory Work, 2010-15. Formalism: Dedalus. CALM Theorem: coordination in its place; Consistency ⇔ Monotonicity. Logic + lattices in a functional notation.
  48. FAQ #2: Isn't monotonicity a rare corner case? Actually, most code is mostly monotonic. Even coordination protocols are mostly monotonic! (Michael Whittaker's work on Compartmentalized Paxos, VLDB 2021.) Next challenge: automatically move non-monotonic (sequential/coordinated) code off the critical path and autoscale everything else.
  49. Easy and Hard Questions. Is anyone over 18? ∃x (x > 18). Who is the youngest? ∃x ∀y (x ≤ y).
  50. Easy and Hard Questions. Is anyone over 18? ∃x (x > 18). Who is the person nobody is younger than? ∃x ¬∃y (y < x).
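To see why one question is easy and the other hard when data arrives over time, a small sketch in plain Rust (ages invented):

    fn main() {
        let arriving_ages = [25u32, 16, 19, 12];
        let mut anyone_over_18 = false;       // Boolean or-lattice: only grows
        let mut youngest: Option<u32> = None; // min so far: may be revised

        for age in arriving_ages {
            anyone_over_18 |= age > 18;
            // Monotone: once true, no future arrival can make it false, so
            // each site can announce "yes" the moment it sees one witness.
            youngest = Some(youngest.map_or(age, |y| y.min(age)));
            // Non-monotone: "youngest so far" can change with every arrival;
            // announcing a final answer early requires knowing that no smaller
            // age is still in flight anywhere, i.e. coordination.
        }
        println!("over 18: {anyone_over_18}, youngest: {youngest:?}");
    }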
  51. What is Time for? "Time is what keeps everything from happening at once." (Ray Cummings, The Girl in the Golden Atom, 1922)
  52. Coordination Avoidance (a poem). "the first principle of successful scalability is to batter the consistency mechanisms down to a minimum, move them off the critical path, hide them in a rarely visited corner of the system, and then make it as hard as possible for application developers to get permission to use them" (James Hamilton (IBM, MS, Amazon), quoted in Birman & Chockler, "Toward a Cloud Computing Research Agenda", LADIS 2009)
  53. Queries are easy to analyze for distributed properties! (Non-)monotonicity is clear in the syntax: NOT EXISTS, EXCEPT, NOT IN, …
  54. Queries are easy to analyze for distributed properties! (Non-)monotonicity is clear in the syntax (NOT EXISTS, EXCEPT, NOT IN, …). Explicit data relationships help us understand how to co-partition state: keys and foreign keys, and more generally, dependency theory.
      Messages: UserID | MessageID               | ChannelID
                27     | "What's for dinner"     | 101
                11     | "Burritos please!"      | 101
                14     | "Monotonicity is cool!" | 102
                27     | "Excelente!"            | 101
      Channels: CID | OwnerID
                101 | 11
                102 | 14
      with the dependency ChannelID → CID.
  55. (Build slide: repeats the content of slide 54.)
  56. (Build slide: the same tables, highlighting the join columns ChannelID and CID and the dependency ChannelID → CID. A co-partitioning sketch follows.)
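A sketch of what the dependency buys: hash both tables on the shared channel key and the messages-channels join becomes node-local (the shard function and node count are invented):

    // Route rows by the shared channel key; rows that join always land together.
    fn shard(channel_id: u64, n_nodes: u64) -> u64 {
        channel_id % n_nodes
    }

    fn main() {
        let n = 4;
        let messages = [(27u64, "What's for dinner", 101u64), (14, "Monotonicity is cool!", 102)];
        let channels = [(101u64, 11u64), (102, 14)];
        for &(user, text, chan) in &messages {
            println!("msg ({user}, {text:?}) -> node {}", shard(chan, n));
        }
        for &(cid, owner) in &channels {
            // Because ChannelID -> CID, each channel row goes to the same node
            // as its messages: the join needs no cross-node traffic.
            println!("chan ({cid}, owner {owner}) -> node {}", shard(cid, n));
        }
    }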
  57. Queries are easy to analyze for distributed properties! (Non-)monotonicity is clear in the syntax (NOT EXISTS, EXCEPT, NOT IN, …). Explicit data relationships help us understand how to co-partition state. Data provenance helps us understand how results get generated, and why they're generated … or even why not!
  58. Two Canonical Examples. Distributed deadlock: once you observe the existence of a waits-for cycle, you can (autonomously) declare deadlock; more information will not change the result. Deadlock! Garbage collection: suspecting garbage (the non-existence of a path from the root) is not enough; more information may change the result. Garbage? Hence you are required to check all nodes for information (under any assignment of objects to nodes!).
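A sketch of the deadlock half in plain Rust (waits-for edges invented): cycle existence is monotone in the edge set, so a positive answer is final even on a partial view:

    use std::collections::HashMap;

    // Does the waits-for graph contain a cycle? Adding edges can only turn
    // "no" into "yes", never the reverse, so "yes" can be declared
    // autonomously, without waiting for more information.
    fn has_cycle(edges: &[(u32, u32)]) -> bool {
        let mut adj: HashMap<u32, Vec<u32>> = HashMap::new();
        for &(a, b) in edges {
            adj.entry(a).or_default().push(b);
        }
        // state: 1 = on the current DFS path, 2 = fully explored
        fn dfs(v: u32, adj: &HashMap<u32, Vec<u32>>, state: &mut HashMap<u32, u8>) -> bool {
            match state.get(&v).copied() {
                Some(1) => return true, // back edge onto the current path
                Some(2) => return false,
                _ => {}
            }
            state.insert(v, 1);
            for &w in adj.get(&v).into_iter().flatten() {
                if dfs(w, adj, state) {
                    return true;
                }
            }
            state.insert(v, 2);
            false
        }
        let mut state = HashMap::new();
        let verts: Vec<u32> = adj.keys().copied().collect();
        verts.into_iter().any(|v| dfs(v, &adj, &mut state))
    }

    fn main() {
        let mut waits_for = vec![(1, 2), (2, 3)];
        assert!(!has_cycle(&waits_for));
        waits_for.push((3, 1)); // more edges can only create cycles
        assert!(has_cycle(&waits_for)); // deadlock: safe to declare, forever
    }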
  59. Extensions to Other Lattices. Set (merge = union): {a}, {b}, {c} merge up through {a,b}, {b,c}, {a,c} to {a,b,c}. Increasing int (merge = max): 5 + 7 = 7, 3 + 7 = 7. Boolean (merge = or): false + true = true. We can use monotone functions to map to other lattices: count(S), then x > 6. E.g. SELECT key FROM input HAVING COUNT(*) > 10.
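The same chain as a sketch in plain Rust, mirroring count(S) and x > 6 above (threshold and data invented):

    use std::collections::HashSet;

    fn main() {
        let mut s: HashSet<u32> = HashSet::new();
        for x in 1..=8 {
            s.insert(x); // the set only grows under union
            let count = s.len();      // SetUnion -> Max: monotone
            let over_six = count > 6; // Max -> Bool(Or): monotone
            println!("|S| = {count}, count(S) > 6: {over_six}");
            // Once over_six flips to true it can never flip back, so the
            // HAVING-style predicate can be emitted without coordination.
        }
    }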
  60. Generational Shift to Reasoning at the App Level. Tired (20th century): reasoning about memory access: read/write, access/store, linearizability, serializability, … worst-case assumptions. Wired (21st century): reasoning about app semantics: immutable state, monotonicity analysis, functional dependencies, data provenance, … app-specific assumptions.
  61. CRDTs: OOP. Semi-lattices as objects: CRDTs, where + is the only method. "Mathematically sound rules to guarantee state convergence" (Shapiro, et al.): guarantees eventual consistency of state in the end times. But what do folks do with them in the mean time? "Multiple active copies present accurate views of the shared datasets at low latencies" (TechBeacon blog, 2022). Hmmm…. [VLDB 23]
  62. CRDTs: Oops! Amazon shopping carts, a.k.a. the 2-phase set: a composite semilattice (SetUnion, SetUnion) of Adds and Deletes.
  63. CRDTs: Oops! Amazon shopping carts, a.k.a. the 2-phase set: a composite semilattice (SetUnion, SetUnion) of Adds and Deletes. What everybody wants to do: "read" the contents, i.e. compute Adds − Deletes. Not part of the object API; non-monotonic, and hence inconsistent, non-deterministic, etc.
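A self-contained sketch of the 2-phase set and its problematic read, in plain Rust rather than any particular CRDT library:

    use std::collections::HashSet;

    // Two grow-only sets composed pairwise: the composite is still a lattice.
    #[derive(Clone, Debug, Default)]
    struct TwoPhaseSet {
        adds: HashSet<String>,
        dels: HashSet<String>,
    }

    impl TwoPhaseSet {
        // Pointwise union: associative, commutative, idempotent.
        fn merge(&mut self, other: &Self) {
            self.adds.extend(other.adds.iter().cloned());
            self.dels.extend(other.dels.iter().cloned());
        }
        // The read everybody wants, Adds − Deletes, is NOT monotone:
        // a delete arriving later shrinks the result, so replicas reading
        // "in the mean time" can observe carts that later vanish.
        fn read(&self) -> HashSet<String> {
            self.adds.difference(&self.dels).cloned().collect()
        }
    }

    fn main() {
        let mut cart = TwoPhaseSet::default();
        cart.adds.insert("book".into());
        let mut remote = TwoPhaseSet::default();
        remote.dels.insert("book".into()); // a concurrent delete elsewhere
        assert_eq!(cart.read().len(), 1);  // before merging: "book" is visible
        cart.merge(&remote);
        assert_eq!(cart.read().len(), 0);  // after merging: it was never safe
    }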