Advance Calmly: Serverless Computing and Cloud Programming

For over 10 years, the cloud has been "the next platform" for computing. Yet we still haven't settled on a programming paradigm to suit it. Serverless computing is finally providing an entry point for a deeper and more urgent conversation on this topic.

We examine key architectural tropes like disaggregation of storage and compute, revisit critical theoretical foundations like the CALM Theorem, and take a look at new systems being explored in the Hydro project at Berkeley's RISELab to push the state of the art.

Joe Hellerstein

November 18, 2019

Transcript

  1. 3.

    Sea Changes in Computing
    PDP-11, 1970: Minicomputers
    Cray-1, 1976: Supercomputers
    Macintosh, 1984: Personal Computers
    iPhone, 2007: Smart Phones
  2. 4.

    New Platform + New Language = Innovation
    PDP-11, 1970: Minicomputers
    Cray-1, 1976: Supercomputers
    Macintosh, 1984: Personal Computers
    iPhone, 2007: Smart Phones
  3. 6.

    Some SWAGs at numbers
    AWS: ~10 million servers
    60 Availability Zones [1]
    1-8 datacenters per AZ [2]
    50K-80K servers per DC [2]
    ½ of all storage bytes shipped are now to “hyperscalers”, the likes of Amazon [3]
    I’m Not Joking
    [1] https://aws.amazon.com/about-aws/global-infrastructure/
    [2] https://www.forbes.com/sites/johnsonpierr/2017/06/15/with-the-public-clouds-of-amazon-microsoft-and-google-big-data-is-the-proverbial-big-deal/
    [3] http://chansblog.com/impact-from-public-cloud-on-the-storage-industry-an-insight-from-snia-at-sfd12/
  4. 7.

    The Big Query
    How will folks program the cloud? In a way that fosters unexpected innovation.
    Distributed programming is hard!
    • Parallelism, consistency, partial failure, …
    Autoscaling makes it harder!
  5. 10.

    Serverless 101: Functions-as-a-Service (FaaS)
    Enable developers (outside of Amazon, etc.) to program the cloud
    Access to 1000s of cores, PBs of RAM
    Fine-grained resource usage and efficiency
    Enables new economic, pricing models, etc.
  6. 13.

    Three Limitations of Current FaaS (Lambda)
    I/O Bottlenecks: 10-100x higher latency than SSD disks, charges for each I/O.
    15-min lifetimes: Functions routinely fail, can’t assume any session context.
    No Inbound Network Communication: Instead, “communicate” through global services on every call.
    Wang L, Li M, Zhang Y, Ristenpart T, Swift M. "Peeking behind the curtains of serverless platforms." USENIX ATC, 2018.
    Hellerstein JM, Faleiro J, Gonzalez JE, Schleier-Smith J, Sreekanti V, Tumanov A, Wu C. “Serverless computing: One step forward, two steps back.” CIDR, 2019.
  7. 14.

    Serverless & the Three Promises of the Cloud
    Autoscaling
    Massive Data Processing
    Unbounded Distributed Computing
    1 STEP FORWARD, 2 STEPS BACK ❌
  8. 15.

    Hydro: Stateful Serverless and Beyond Avoiding Coordination Serverless Computing +

    Autoscaling — Latency-Sensitive Data Access — Distributed Computing The CALM Theorem
  9. 16.

    16 Embracing State Program State: Local data that is managed

    across invocations Challenge 1: Data Gravity Expensive to move state around. This policy problem is not so hard. Challenge 2: Distributed Consistency This correctness problem is challenging and unavoidable!
  10. 18.

    18 The Challenge: Consistency Ensure that distant agents agree (or

    will agree) on common knowledge. Classic example: data replication How do we know if they agree on the value of a mutable variable x? x = ❤
  11. 19.

    19 The Challenge: Consistency Ensure that distant agents agree (or

    will agree) on common knowledge. Classic example: data replication How do we know if they agree on the value of a mutable variable x? If they disagree now, what could happen later? x = ❤ x =
  12. 20.

    Classical Consistency Mechanisms: Coordination
    Consensus Protocols:
    • Get a (fixed) set of machines to choose the same thing. (Paxos, etc.)
    • Get a (fixed) set of machines to choose the same sequence of things. (Multipaxos, etc.)
    Commit Protocols:
    • Get a (fixed) set of machines to agree on the outcome of a unanimous vote. (Two-Phase Commit)
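
A minimal sketch of the "unanimous vote" idea behind Two-Phase Commit, written in Python for illustration. The `Participant` class and `two_phase_commit` function are hypothetical names, and the sketch ignores failures, timeouts, and logging, which a real commit protocol must handle. Its purpose is only to show why this is coordination: the coordinator cannot decide until every vote has arrived.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """Hypothetical participant: votes in phase 1, applies the decision in phase 2."""
    name: str
    will_commit: bool
    state: str = "init"

    def prepare(self) -> bool:
        # Phase 1: cast a yes/no vote.
        self.state = "prepared" if self.will_commit else "aborted"
        return self.will_commit

    def finish(self, decision: str) -> None:
        # Phase 2: adopt the coordinator's decision.
        self.state = decision


def two_phase_commit(participants):
    # Phase 1: the coordinator waits for ALL votes (this is the coordination).
    votes = [p.prepare() for p in participants]
    # Commit only on a unanimous yes; a single no aborts everyone.
    decision = "committed" if all(votes) else "aborted"
    # Phase 2: broadcast the outcome.
    for p in participants:
        p.finish(decision)
    return decision


group = [Participant("a", True), Participant("b", True), Participant("c", False)]
print(two_phase_commit(group))
```

Because one participant votes no, every participant ends in the `aborted` state.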
  13. 21.

    Coordination Avoidance (a poem)
    “the first principle of successful scalability is to batter the consistency mechanisms down to a minimum
    move them off the critical path
    hide them in a rarely visited corner of the system,
    and then make it as hard as possible for application developers to get permission to use them”
    James Hamilton (IBM, MS, Amazon), in Birman, Chockler: “Toward a Cloud Computing Research Agenda”, LADIS 2009
  14. 22.

    22 Why Avoid Coordination? Waiting for control is bad Tail

    latency of a quorum of machines can be very high (straggler effects) Waiting leads to queueing It’s not just “your” problem!
  15. 23.

    23 Towards a Solution Traditional distributed systems is all about

    I/O What if we reason about application semantics? With thanks to Peter Bailis…
  16. 24.
  17. 25.

    Hydro: Stateful Serverless and Beyond • Autoscaling stateful? Avoid Coordination!

    • Semantics to the rescue? Avoiding Coordination Serverless Computing + Autoscaling — Latency-Sensitive Data Access — Distributed Computing The CALM Theorem
  18. 26.

    CALM: CONSISTENCY AS LOGICAL MONOTONICITY
    Conjecture (CALM): A distributed program P has a consistent, coordination-free distributed implementation if and only if it is monotonic.
    Hellerstein, 2010
  19. 27.

    CALM: CONSISTENCY AS LOGICAL MONOTONICITY
    Theorem (CALM): A distributed program P has a consistent, coordination-free distributed implementation if and only if it is monotonic.
    Ameloot, Neven, Van den Bussche, 2013
  20. 29.

    29 Consistency: Confluent Distributed Execution Definition: A distributed program P

    is consistent if it is a deterministic function from sets to sets, regardless of non-deterministic message ordering and duplication.
  21. 30.

    30 Consistency: Confluent Distributed Execution Definition: A distributed program P

    is consistent if it is a deterministic function from sets to sets, regardless of non-deterministic message ordering and duplication. E.g. replica consistency: Two replicas of a consistent program P will agree on outcomes once they receive the same messages. x = ❤ x = ❤
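
The definition above can be illustrated with a small Python sketch. It replays the same multiset of messages in every possible order, once against a replica that accumulates a set and once against a replica that overwrites a mutable variable. The function names and the message values are assumptions made for illustration only.

```python
from itertools import permutations


def replay_set_union(messages):
    """Replica state is a set; each message is merged in with union."""
    state = set()
    for m in messages:
        state |= {m}
    return frozenset(state)


def replay_overwrite(messages):
    """Replica state is a mutable variable x; each message overwrites it."""
    x = None
    for m in messages:
        x = m
    return x


# The same messages, including a duplicate, delivered in every order.
messages = ["heart", "star", "heart"]
orders = list(permutations(messages))

union_outcomes = {replay_set_union(o) for o in orders}
overwrite_outcomes = {replay_overwrite(o) for o in orders}

print(len(union_outcomes))      # number of distinct outcomes for the set replica
print(len(overwrite_outcomes))  # number of distinct outcomes for the overwrite replica
```

The set-based replica reaches a single outcome regardless of ordering and duplication, so it is confluent in the sense of the definition. The overwriting replica can end with either value, depending on which message arrives last.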
  22. 31.

    31 Consistency: Confluent Distributed Execution Definition: A distributed program P

    is consistent if it is a deterministic function from sets to sets, regardless of non-deterministic message ordering and duplication. x = ❤ x = ❤
  23. 32.

    Monotonicity
    Definition: A distributed program P is consistent if it is a deterministic function from sets to sets, regardless of non-deterministic message ordering and duplication.
    Definition: A distributed program P is monotonic if for any input sets S, T: if S ⊆ T, then P(S) ⊆ P(T).
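
As a hedged illustration of the monotonicity definition, the Python sketch below checks P(S) ⊆ P(T) along one growing chain of inputs. The two example programs and the helper `is_monotonic_on` are hypothetical, and checking a single chain is a spot check, not a proof of monotonicity.

```python
def select_large(s):
    """Monotonic: a filter can only add outputs as the input grows."""
    return {v for v in s if v > 10}


def is_empty_flag(s):
    """Non-monotonic: 'nothing has arrived' is retracted once input arrives."""
    return {"empty"} if not s else set()


def is_monotonic_on(program, chain):
    """Check P(S) ⊆ P(T) for each consecutive pair S ⊆ T in the chain."""
    return all(program(s) <= program(t) for s, t in zip(chain, chain[1:]))


# A growing chain of input sets: each is a subset of the next.
chain = [set(), {5}, {5, 20}, {5, 20, 30}]

print(is_monotonic_on(select_large, chain))
print(is_monotonic_on(is_empty_flag, chain))
```

The filter never retracts an output, so early answers stay valid as more input arrives. The emptiness test emits an answer on the empty input that later input invalidates, which is the kind of program that needs coordination to know the input is complete.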
  25. 34.

    Coordination: Data-Independent Messaging
    Definition: A distributed program P is consistent if it is a deterministic function from sets to sets, regardless of non-deterministic message ordering and duplication.
    Definition: A distributed program P is monotonic if for any input sets S, T: if S ⊆ T, then P(S) ⊆ P(T).
    Definition: A distributed program P(T) uses coordination if it requires messages to be sent under all possible input partitionings of T.
  26. 35.

    35 Distributed Deadlock: Once you observe the existence of a

    waits-for cycle, you can (autonomously) declare deadlock. More information will not change the result. Garbage Collection: Suspecting garbage (the non-existence of a path from root) is not enough; more information may change the result. Hence you are required to check all nodes for information (under any assignment of objects to nodes!) Two Canonical Examples Deadlock! Garbage?
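
The contrast between the two canonical examples can be sketched in Python on a single machine. The functions `has_cycle` and `garbage` and the sample graphs are assumptions for illustration; the point is only that adding edges never removes a detected cycle, while adding references can remove an object from the garbage set.

```python
def _adjacency(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
    return graph


def has_cycle(edges):
    """Deadlock check: is some node reachable from itself in the waits-for graph?"""
    graph = _adjacency(edges)

    def reachable_from(start):
        seen, stack = set(), list(graph.get(start, ()))
        while stack:
            n = stack.pop()
            if n not in seen:
                seen.add(n)
                stack.extend(graph.get(n, ()))
        return seen

    return any(n in reachable_from(n) for n in graph)


def garbage(objects, refs, root):
    """Garbage check: objects with no path from the root."""
    graph = _adjacency(refs)
    live, stack = {root}, [root]
    while stack:
        n = stack.pop()
        for m in graph.get(n, ()):
            if m not in live:
                live.add(m)
                stack.append(m)
    return set(objects) - live


waits_for = [("t1", "t2"), ("t2", "t1")]
print(has_cycle(waits_for), has_cycle(waits_for + [("t3", "t1")]))

objects = {"root", "a", "b"}
print(garbage(objects, [("root", "a")], "root"))
print(garbage(objects, [("root", "a"), ("a", "b")], "root"))
```

Once the cycle is seen, more waits-for edges cannot undo it. In contrast, object `b` looks like garbage until the extra reference from `a` shows up, so a node holding only part of the reference graph cannot safely conclude that `b` is garbage.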
  27. 36.

    36 CALM (Consistency As Logical Monotonicity) Theorem (CALM): A distributed

    program P has a consistent, coordination-free distributed implementation if and only if it is monotonic. Hellerstein JM. The declarative imperative: Experiences and conjectures in distributed logic. ACM SIGMOD Record, Sep 2010. Ameloot TJ, Neven F, Van den Bussche J. Relational transducers for declarative networking. JACM, Apr 2013. Ameloot TJ, Ketsman B, Neven F, Zinn D. Weaker forms of monotonicity for declarative networking: a more fine-grained answer to the CALM-conjecture. ACM TODS, Feb 2016. Hellerstein JM, Alvaro P. Keeping CALM: When Distributed Consistency is Easy. To appear, CACM 2020.
  28. 38.

    Related Work
    Koutris & Suciu. “Parallel Evaluation of Conjunctive Queries.” PODS 2011
    Beame, Koutris & Suciu. “Communication Steps for Parallel Query Processing.” PODS 2013
    Beame, Koutris & Suciu. “Skew in Parallel Query Processing.” PODS 2014
    Koutris, Beame & Suciu. “Worst-Case Optimal Algorithms for Parallel Query Processing.” ICDT 2016
  29. 39.

    39 Logic and Lattice Composition Bloom: a “disorderly” programming language

    based on logic and lattices Monotone composition of “lego blocks” lattices into bigger programs. Allows non-monotonic expressions but they require “extra” syntax. Syntactic CALM analysis Alvaro P, Conway N, Hellerstein JM, Marczak WR. Consistency Analysis in Bloom: a CALM and Collected Approach. CIDR 2011. Conway N, Marczak WR, Alvaro P, Hellerstein JM, Maier D. Logic and lattices for distributed programming. ACM SoCC, 2012.
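
A rough Python sketch of what "lego block" lattice composition can look like. This is not Bloom syntax and not code from the Hydro project; the class names `MaxInt`, `SetUnion`, and `MapLattice` are assumptions chosen for illustration. Each lattice exposes a `merge` that is commutative, associative, and idempotent, and a map whose values are lattices is itself a lattice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MaxInt:
    """Integers under max."""
    value: int = 0

    def merge(self, other):
        return MaxInt(max(self.value, other.value))


@dataclass(frozen=True)
class SetUnion:
    """Sets under union."""
    items: frozenset = frozenset()

    def merge(self, other):
        return SetUnion(self.items | other.items)


@dataclass
class MapLattice:
    """A map from keys to lattice values, merged key by key."""
    entries: dict

    def merge(self, other):
        merged = dict(self.entries)
        for key, val in other.entries.items():
            merged[key] = merged[key].merge(val) if key in merged else val
        return MapLattice(merged)


a = MapLattice({"clock": MaxInt(3), "tags": SetUnion(frozenset({"x"}))})
b = MapLattice({"clock": MaxInt(5), "tags": SetUnion(frozenset({"y"}))})

# Merge order does not matter, and re-merging changes nothing.
print(a.merge(b) == b.merge(a))
print(a.merge(b).merge(b) == a.merge(b))
```

Because the composite inherits the merge properties of its parts, replicas can exchange and merge state in any order, with duplicates, and still converge.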
  30. 40.

    Hydro: Stateful Serverless and Beyond • Autoscaling stateful? Avoid Coordination!

    • Semantics to the rescue? Avoiding Coordination Serverless Computing + Autoscaling — Latency-Sensitive Data Access — Distributed Computing The CALM Theorem Monotonicity is the “bright line” between what can and cannot be done coordination-free
  31. 41.

    Hydro: A Platform for Programming the Cloud
    Storage & overlay network built with lattice composition
    Anna: autoscaling multi-tier KVS (ICDE18, VLDB19)
    HydroLogic: a disorderly IR
    A universal disorderly algebra for cloud computing.
    Hydrolysis: a cloud compiler toolkit
    Containerized FaaS with consistent caching & fault tolerance
    Cloudburst: Stateful FaaS
    Currently under submission
    f(x)
    HydroCache: Coordination-free consistency and fault tolerance
    ? Logic Programming, Functional Reactive, Actors, Futures
    Polyglot programming models for developers.
  32. 42.

    42 Anna Serverless KVS • Anyscale • CALM consistency •

    Autoscaling & multitier Chenggang Wu, Vikram Sreekanti, JM Hellerstein. “Autoscaling Tiered Cloud Storage in Anna.” PVLDB 2019 Chenggang Wu, Jose M. Faleiro, Yihan Lin, JM Hellerstein. “Anna: A KVS For Any Scale”. TKDE 2019.
  33. 43.

    43 Anyscale Shared-nothing at all scales (even across threads) Crazy

    fast under contention Up to 700x faster than Masstree within a multicore machine Up to 10x faster than Cassandra in a geo-distributed deployment Coordination-free consistency. No atomics, no locks, no waiting ever! 700x!
  34. 44.

    44 CALM Consistency Simple, clean lattice composition gives range of

    consistency levels Lines of C++ code modified by system component KEEP CALM AND WRITE(X)
  35. 46.
  36. 47.

    Cloudburst/Hydrocache: A Stateful Serverless Platform Main Challenge: Cache consistency! Hydrocache:

    new consistency protocols for distributed client “sessions” Compute Storage
  37. 48.

    One Consistency Level: Multisite TCC
    Multisite Transactional Causal Consistency (MTCC)
    Causal: Provide a consistent time based on Lamport’s happened-before relation
    Multisite transactional: an arbitrary functional expression, running across multiple machines.
    Lamport’s partial ordering
    Step 1: Causal Cut in a cache. “Bolt-On” Causal Consistency: single client, non-transactional
    Step 2: Causal Cut across caches and keys! Hydrocache MTCC
    Bailis P, Ghodsi A, Hellerstein JM, Stoica I. "Bolt-on causal consistency." SIGMOD 2013.
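
To make the happened-before partial order concrete, here is a small Python sketch using vector clocks, one standard way to track it. This is an illustrative assumption, not a description of the HydroCache protocol; the function names and the cache identifiers are hypothetical.

```python
def happened_before(a, b):
    """True if vector clock a causally precedes b: a <= b everywhere, < somewhere."""
    nodes = set(a) | set(b)
    return (all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
            and any(a.get(n, 0) < b.get(n, 0) for n in nodes))


def concurrent(a, b):
    """True if neither clock precedes the other and they are not equal."""
    nodes = set(a) | set(b)
    same = all(a.get(n, 0) == b.get(n, 0) for n in nodes)
    return not same and not happened_before(a, b) and not happened_before(b, a)


write_x = {"cache1": 1}
write_y = {"cache1": 1, "cache2": 1}  # issued after observing write_x
write_z = {"cache3": 1}               # issued without seeing either write

print(happened_before(write_x, write_y))
print(concurrent(write_x, write_z))
```

Some pairs of writes are ordered and others are simply concurrent, which is what makes the order partial. A causally consistent read must never expose `write_y` without also exposing `write_x`.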
  38. 49.
  39. 50.
  40. 51.
  41. 52.

    There are more cases to handle: both “fixup” and “orchestrate from scratch”.
    A whole other story for Read Atomic isolation!
  42. 53.

    CloudBurst/Hydrocache: A Stateful Serverless Platform
    [Figure: Competitive performance for a prediction serving pipeline. Latency (ms) for Python, Droplet, AWS Lambda (Mock), AWS SageMaker, AWS Lambda (Actual).]
    [Figure: Performant consistency on a real-world web app. Read and write latency (ms, log scale) for Droplet (LWW), Droplet (Causal), Redis.]
    [Figure: End-to-end latency for a task with large data inputs. Latency (ms, log scale) for Droplet (Hot), Droplet (Cold), Lambda (Redis), Lambda (S3) at sizes 80KB, 800KB, 8MB, 80MB.]
  43. 54.

    • Anna: CALM KVS w/o semantics • LDPC: disaggregate &

    colocate • Coordination-free consistent cache Hydro: Stateful Serverless and Beyond • Autoscaling stateful? Avoid Coordination! • Semantics to the rescue? Avoiding Coordination Serverless Computing + Autoscaling — Latency-Sensitive Data Access — Distributed Computing The CALM Theorem Monotonicity is the “bright line” between what can and cannot be done coordination-free
  44. 56.

    56

  45. 57.

    More Information
    Hydro: https://github.com/hydro-project
    Bloom: http://bloom-lang.net
    RiseLab: https://rise.cs.berkeley.edu
    hellerstein@berkeley.edu
    @joe_hellerstein
    Chenggang Wu, Jose M. Faleiro, Vikram Sreekanti, Alexey Tumanov, Joseph Gonzalez, Johann Schleier-Smith, Charles Lin