Slide 1

Slide 1 text

Advance Calmly: Serverless Computing and Cloud Programming JOE HELLERSTEIN, UC BERKELEY 11/18/19

Slide 2

Slide 2 text

Hydro: Stateful Serverless and Beyond Avoiding Coordination Serverless Computing The CALM Theorem

Slide 3

Slide 3 text

3 Cray-1, 1976 Supercomputers iPhone, 2007 Smart Phones Macintosh, 1984 Personal Computers PDP-11, 1970 Minicomputers Sea Changes in Computing

Slide 4

Slide 4 text

4 New Platform + New Language = Innovation Cray-1, 1976 Supercomputers iPhone, 2007 Smart Phones Macintosh, 1984 Personal Computers PDP-11, 1970 Minicomputers

Slide 5

Slide 5 text

5 Suppose I offered you a computer that was actually…

Slide 6

Slide 6 text

6 Some SWAGs at the numbers: AWS runs ~10 million servers, 60 Availability Zones [1], 1-8 datacenters per AZ [2], and 50K-80K servers per DC [2]. Half of all storage bytes shipped now go to “hyperscalers”, the likes of Amazon [3]. I’m Not Joking. [1] https://aws.amazon.com/about-aws/global-infrastructure/ [2] https://www.forbes.com/sites/johnsonpierr/2017/06/15/with-the-public-clouds-of-amazon-microsoft-and-google-big-data-is-the-proverbial-big-deal/ [3] http://chansblog.com/impact-from-public-cloud-on-the-storage-industry-an-insight-from-snia-at-sfd12/

Slide 7

Slide 7 text

7 How will folks program the cloud? In a way that fosters unexpected innovation Distributed programming is hard! • Parallelism, consistency, partial failure, … Autoscaling makes it harder! The Big Query

Slide 8

Slide 8 text

8 We’ve been talking about this for a while!

Slide 9

Slide 9 text

9 Industry finally woke up. Industry Response: Serverless Computing

Slide 10

Slide 10 text

Serverless 101: Functions-as-a-Service (FaaS) Enable developers (outside of Amazon, etc.) to program the cloud Access to 1000s of cores, PBs of RAM Fine-grained resource usage and efficiency Enables new economics, pricing models, etc.

Slide 11

Slide 11 text

Serverless 101: On Event, call function.
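As a concrete illustration (not from the deck itself), here is a minimal event-driven function in the style of AWS Lambda's Python runtime; the event fields `order_id` and `amount` are hypothetical.

```python
import json

def handler(event, context):
    # The platform invokes this function once per event; there is no server to manage.
    # Hypothetical event fields: the shape of `event` depends on the trigger.
    order_id = event.get("order_id")
    amount = event.get("amount", 0)

    # Do a small unit of work, then return; the platform handles scaling and retries.
    receipt = {"order_id": order_id, "charged": amount, "status": "ok"}
    return {"statusCode": 200, "body": json.dumps(receipt)}
```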

Slide 12

Slide 12 text

Serverless 101: Autoscaling and Disaggregation Storage Compute Transactional Causal Consistency for Serverless Computing

Slide 13

Slide 13 text

13 Three Limitations of Current FaaS (Lambda). I/O Bottlenecks: 10-100x higher latency than an SSD, and charges for each I/O. 15-min lifetimes: functions routinely fail, so you can’t assume any session context. No Inbound Network Communication: instead, “communicate” through global services on every call. Wang L, Li M, Zhang Y, Ristenpart T, Swift M. “Peeking behind the curtains of serverless platforms.” USENIX ATC, 2018. Hellerstein JM, Faleiro J, Gonzalez JE, Schleier-Smith J, Sreekanti V, Tumanov A, Wu C. “Serverless computing: One step forward, two steps back.” CIDR, 2019.

Slide 14

Slide 14 text

14 Autoscaling Massive Data Processing Unbounded Distributed Computing 1 STEP FORWARD 2 STEPS BACK ❌ Serverless & the Three Promises of the Cloud

Slide 15

Slide 15 text

Hydro: Stateful Serverless and Beyond Avoiding Coordination Serverless Computing + Autoscaling — Latency-Sensitive Data Access — Distributed Computing The CALM Theorem

Slide 16

Slide 16 text

16 Embracing State Program State: Local data that is managed across invocations Challenge 1: Data Gravity Expensive to move state around. This policy problem is not so hard. Challenge 2: Distributed Consistency This correctness problem is challenging and unavoidable!

Slide 17

Slide 17 text

Embracing State: LDPC Compute Storage Logical Disaggregation, Physical Colocation

Slide 18

Slide 18 text

18 The Challenge: Consistency Ensure that distant agents agree (or will agree) on common knowledge. Classic example: data replication How do we know if they agree on the value of a mutable variable x? x = ❤

Slide 19

Slide 19 text

19 The Challenge: Consistency Ensure that distant agents agree (or will agree) on common knowledge. Classic example: data replication How do we know if they agree on the value of a mutable variable x? If they disagree now, what could happen later? x = ❤ x =
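A minimal sketch of the problem in Python (not Hydro code): two replicas receive the same writes to x in different delivery orders and end up disagreeing, unless updates are combined with an order-insensitive merge.

```python
# Two replicas apply the same writes to x, but in different delivery orders.
def apply_in_order(writes):
    x = None
    for value in writes:
        x = value                     # naive "apply each write as it arrives" policy
    return x

writes = ["❤", "💔"]
replica_a = apply_in_order(writes)            # sees ❤ then 💔 -> 💔
replica_b = apply_in_order(reversed(writes))  # sees 💔 then ❤ -> ❤
assert replica_a != replica_b                 # same messages, different outcomes

# A deterministic, order-insensitive merge (here: take the max) restores agreement.
merge = max
assert merge(*writes) == merge(*reversed(writes))
```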

Slide 20

Slide 20 text

20 Classical Consistency Mechanisms: Coordination Consensus Protocols: Get a (fixed) set of machines to choose the same thing. (Paxos, etc.) Get a (fixed) set of machines to choose the same sequence of things. (Multipaxos, etc.) Commit Protocols: Get a (fixed) set of machines to agree on the outcome of a unanimous vote. (Two-Phase Commit)
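For concreteness, a toy sketch of the two-phase commit pattern mentioned above; real protocols add durable logging, timeouts, and recovery, none of which is shown here.

```python
# Toy two-phase commit: a coordinator collects unanimous votes, then broadcasts
# the outcome. Everyone waits on everyone before anything becomes visible.

class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "pending"

    def prepare(self):
        # Phase 1: vote yes only if this participant can make the change durable.
        return self.can_commit

    def finish(self, decision):
        # Phase 2: every participant applies the same decision.
        self.state = decision

def two_phase_commit(participants):
    votes = [p.prepare() for p in participants]       # phase 1: prepare/vote
    decision = "commit" if all(votes) else "abort"    # unanimity required
    for p in participants:                            # phase 2: decide
        p.finish(decision)
    return decision

group = [Participant("a"), Participant("b"), Participant("c", can_commit=False)]
assert two_phase_commit(group) == "abort"             # one "no" vote aborts everyone
```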

Slide 21

Slide 21 text

21 Coordination Avoidance (a poem) the first principle of successful scalability is to batter the consistency mechanisms down to a minimum move them off the critical path hide them in a rarely visited corner of the system, and then make it as hard as possible for application developers to get permission to use them —James Hamilton (IBM, MS, Amazon) in Birman, Chockler: “Toward a Cloud Computing Research Agenda”, LADIS 2009

Slide 22

Slide 22 text

22 Why Avoid Coordination? Waiting for control is bad Tail latency of a quorum of machines can be very high (straggler effects) Waiting leads to queueing It’s not just “your” problem!

Slide 23

Slide 23 text

23 Towards a Solution Traditional distributed systems work is all about I/O What if we reason about application semantics? With thanks to Peter Bailis…

Slide 24

Slide 24 text

No content

Slide 25

Slide 25 text

Hydro: Stateful Serverless and Beyond • Autoscaling stateful? Avoid Coordination! • Semantics to the rescue? Avoiding Coordination Serverless Computing + Autoscaling — Latency-Sensitive Data Access — Distributed Computing The CALM Theorem

Slide 26

Slide 26 text

26 Conjecture (CALM): A distributed program P has a consistent, coordination-free distributed implementation if and only if it is monotonic. Hellerstein, 2010 CALM: CONSISTENCY AS LOGICAL MONOTONICITY

Slide 27

Slide 27 text

27 Theorem (CALM): A distributed program P has a consistent, coordination-free distributed implementation if and only if it is monotonic. Ameloot, Neven, Van den Bussche, 2013 CALM: CONSISTENCY AS LOGICAL MONOTONICITY

Slide 28

Slide 28 text

28 We’ll need some formal definitions

Slide 29

Slide 29 text

29 Consistency: Confluent Distributed Execution Definition: A distributed program P is consistent if it is a deterministic function from sets to sets, regardless of non-deterministic message ordering and duplication.

Slide 30

Slide 30 text

30 Consistency: Confluent Distributed Execution Definition: A distributed program P is consistent if it is a deterministic function from sets to sets, regardless of non-deterministic message ordering and duplication. E.g. replica consistency: Two replicas of a consistent program P will agree on outcomes once they receive the same messages. x = ❤ x = ❤

Slide 31

Slide 31 text

31 Consistency: Confluent Distributed Execution Definition: A distributed program P is consistent if it is a deterministic function from sets to sets, regardless of non-deterministic message ordering and duplication. x = ❤ x = ❤

Slide 32

Slide 32 text

32 Monotonicity Definition: A distributed program P is consistent if it is a deterministic function from sets to sets, regardless of non-deterministic message ordering and duplication. Definition: A distributed program P is monotonic if for any input sets S, T if S ⊆ T, then P(S) ⊆ P(T).

Slide 33

Slide 33 text

33 Monotonicity Definition: A distributed program P is consistent if it is a deterministic function from sets to sets, regardless of non-deterministic message ordering and duplication. Definition: A distributed program P is monotonic if for any input sets S, T if S ⊆ T, then P(S) ⊆ P(T).
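To make the definition concrete (a sketch, not code from the talk): a selection is monotonic, while "the minimum seen so far" is not, because new inputs can retract an earlier answer.

```python
# Monotone: adding inputs can only add outputs.
def monotone_P(inputs):
    return {x for x in inputs if x > 10}          # selection/filter is monotone

# Non-monotone: adding inputs can retract earlier outputs.
def non_monotone_P(inputs):
    return {min(inputs)} if inputs else set()     # "the minimum so far" can change

S = {12, 5}
T = S | {3, 42}
assert monotone_P(S) <= monotone_P(T)                  # P(S) ⊆ P(T) holds
assert not (non_monotone_P(S) <= non_monotone_P(T))    # ⊆ fails: the min moved from 5 to 3
```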

Slide 34

Slide 34 text

34 Coordination: Data-Independent Messaging Definition: A distributed program P is consistent if it is a deterministic function from sets to sets, regardless of non-deterministic message ordering and duplication. Definition: A distributed program P is monotonic if for any input sets S, T if S ⊆ T, then P(S) ⊆ P(T). Definition: A distributed program P(T) uses coordination if it requires messages to be sent under all possible input partitionings of T.
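A small sketch of what coordination-freedom buys under these definitions (a toy example, not Hydro code): a monotonic program can be evaluated independently on any partitioning of its input, and the partial answers simply unioned, with no cross-partition messages.

```python
# A monotone program evaluated coordination-free on an arbitrary partitioning.
def monotone_P(inputs):
    return {x for x in inputs if x > 10}

T = {3, 5, 12, 42}
partition_1, partition_2 = {3, 42}, {5, 12}           # one possible partitioning of T

no_messages = monotone_P(partition_1) | monotone_P(partition_2)
assert no_messages == monotone_P(T)                   # no cross-partition messages needed

# A non-monotone question such as "is 7 absent from T?" cannot be answered from
# any single partition alone: it requires messages under every partitioning.
```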

Slide 35

Slide 35 text

35 Distributed Deadlock: Once you observe the existence of a waits-for cycle, you can (autonomously) declare deadlock. More information will not change the result. Garbage Collection: Suspecting garbage (the non-existence of a path from root) is not enough; more information may change the result. Hence you are required to check all nodes for information (under any assignment of objects to nodes!) Two Canonical Examples Deadlock! Garbage?
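A sketch of the deadlock side of this contrast, using a hypothetical waits-for graph: once any node observes a cycle among the edges it has seen so far, later information cannot retract the conclusion.

```python
# Observing a waits-for cycle is monotone evidence: edges that arrive later can
# never make an already-observed cycle go away.

def has_cycle(edges):
    # Depth-first search for a cycle in a directed waits-for graph,
    # given as a set of (waiter, holder) pairs.
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)

    def visit(node, stack):
        if node in stack:
            return True
        return any(visit(nxt, stack | {node}) for nxt in graph.get(node, ()))

    return any(visit(n, frozenset()) for n in graph)

observed = {("t1", "t2"), ("t2", "t1")}         # a local observation: t1 waits for t2 and vice versa
assert has_cycle(observed)                      # declare deadlock autonomously
assert has_cycle(observed | {("t3", "t1")})     # more information cannot retract it

# Garbage collection is the opposite: "no path from root" can be refuted by a
# later-arriving edge, so all nodes must be consulted before deciding.
```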

Slide 36

Slide 36 text

36 CALM (Consistency As Logical Monotonicity) Theorem (CALM): A distributed program P has a consistent, coordination-free distributed implementation if and only if it is monotonic. Hellerstein JM. The declarative imperative: Experiences and conjectures in distributed logic. ACM SIGMOD Record, Sep 2010. Ameloot TJ, Neven F, Van den Bussche J. Relational transducers for declarative networking. JACM, Apr 2013. Ameloot TJ, Ketsman B, Neven F, Zinn D. Weaker forms of monotonicity for declarative networking: a more fine-grained answer to the CALM-conjecture. ACM TODS, Feb 2016. Hellerstein JM, Alvaro P. Keeping CALM: When Distributed Consistency is Easy. To appear, CACM 2020.

Slide 37

Slide 37 text

37 Related Work

Slide 38

Slide 38 text

38 Related Work Koutris & Suciu. “Parallel Evaluation of Conjunctive Queries.” PODS 2011. Beame, Koutris & Suciu. “Communication Steps for Parallel Query Processing.” PODS 2013. Beame, Koutris & Suciu. “Skew in Parallel Query Processing.” PODS 2014. Koutris, Beame & Suciu. “Worst-Case Optimal Algorithms for Parallel Query Processing.” ICDT 2016.

Slide 39

Slide 39 text

39 Logic and Lattice Composition Bloom: a “disorderly” programming language based on logic and lattices Monotone composition of “lego block” lattices into bigger programs. Allows non-monotonic expressions, but they require “extra” syntax. Syntactic CALM analysis Alvaro P, Conway N, Hellerstein JM, Marczak WR. Consistency Analysis in Bloom: a CALM and Collected Approach. CIDR 2011. Conway N, Marczak WR, Alvaro P, Hellerstein JM, Maier D. Logic and lattices for distributed programming. ACM SoCC, 2012.

Slide 40

Slide 40 text

Hydro: Stateful Serverless and Beyond • Autoscaling stateful? Avoid Coordination! • Semantics to the rescue? Avoiding Coordination Serverless Computing + Autoscaling — Latency-Sensitive Data Access — Distributed Computing The CALM Theorem Monotonicity is the “bright line” between what can and cannot be done coordination-free

Slide 41

Slide 41 text

Hydro: A Platform for Programming the Cloud Storage & overlay network built with lattice composition Anna: autoscaling multi-tier KVS ICDE18, VLDB19 HydroLogic: a disorderly IR A universal disorderly algebra for cloud computing. Hydrolysis: a cloud compiler toolkit Containerized FaaS with consistent caching & fault tolerance Cloudburst: Stateful FaaS Currently under submission f(x) HydroCache: Coordination-free consistency and fault tolerance ? Logic Programming Functional Reactive Actors Futures Polyglot programming models for developers.

Slide 42

Slide 42 text

42 Anna Serverless KVS • Anyscale • CALM consistency • Autoscaling & multitier Chenggang Wu, Vikram Sreekanti, JM Hellerstein. “Autoscaling Tiered Cloud Storage in Anna.” PVLDB 2019 Chenggang Wu, Jose M. Faleiro, Yihan Lin, JM Hellerstein. “Anna: A KVS For Any Scale”. TKDE 2019.

Slide 43

Slide 43 text

43 Anyscale Shared-nothing at all scales (even across threads) Crazy fast under contention Up to 700x faster than Masstree within a multicore machine Up to 10x faster than Cassandra in a geo-distributed deployment Coordination-free consistency. No atomics, no locks, no waiting ever! 700x!

Slide 44

Slide 44 text

44 CALM Consistency Simple, clean lattice composition gives a range of consistency levels (chart: lines of C++ code modified, by system component) KEEP CALM AND WRITE(X)
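A minimal Python sketch of the lattice idea behind Anna (not its actual C++ code): state exposes only a commutative, associative, idempotent merge, and lattices compose, so replicas converge regardless of update order.

```python
class MaxIntLattice:
    """A simple lattice: merge is max, which is commutative, associative, idempotent."""
    def __init__(self, value=0):
        self.value = value
    def merge(self, other):
        return MaxIntLattice(max(self.value, other.value))

class MapLattice:
    """Composition: a map whose values are themselves lattices, merged key-wise."""
    def __init__(self, entries=None):
        self.entries = dict(entries or {})
    def merge(self, other):
        out = dict(self.entries)
        for key, lat in other.entries.items():
            out[key] = out[key].merge(lat) if key in out else lat
        return MapLattice(out)

# Two replicas apply the same updates in different orders and still agree.
u1 = MapLattice({"x": MaxIntLattice(3)})
u2 = MapLattice({"x": MaxIntLattice(7)})
replica_a = MapLattice().merge(u1).merge(u2)   # one delivery order
replica_b = MapLattice().merge(u2).merge(u1)   # the other order
assert replica_a.entries["x"].value == replica_b.entries["x"].value == 7
```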

Slide 45

Slide 45 text

45 Autoscaling & Multi-Tier Cost Tradeoffs 350x the performance of DynamoDB for the same price!

Slide 46

Slide 46 text

Hydrocache and LDPC Transactional Causal Consistency for Serverless Computing Compute Storage Logical Disaggregation, Physical Colocation

Slide 47

Slide 47 text

Cloudburst/Hydrocache: A Stateful Serverless Platform Main Challenge: Cache consistency! Hydrocache: new consistency protocols for distributed client “sessions” Compute Storage

Slide 48

Slide 48 text

48 One Consistency Level: Multisite TCC Multisite Transactional Causal Consistency (MTCC) Causal: provide consistency based on Lamport’s happened-before relation (Lamport’s partial ordering) Multisite transactional: an arbitrary functional expression, running across multiple machines. Step 1: Causal cut in a cache (“Bolt-On” Causal Consistency: single client, non-transactional) Step 2: Causal cut across caches and keys! Hydrocache MTCC Bailis P, Ghodsi A, Hellerstein JM, Stoica I. “Bolt-on causal consistency.” SIGMOD 2013.
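A sketch of the underlying building block, not the Hydrocache protocol itself: each cached version records the versions it causally depends on, and a set of cached values forms a causal cut only if every dependency is satisfied within the cache. The keys and version numbers below are hypothetical.

```python
def causal_cut(cache, dependencies):
    """cache: {key: version held locally}
    dependencies: {(key, version): set of (dep_key, dep_version) it was written after}"""
    for key, version in cache.items():
        for dep_key, dep_version in dependencies.get((key, version), set()):
            if cache.get(dep_key, -1) < dep_version:
                return False          # we hold a value whose cause we have not yet seen
    return True

# The reply was written after reading version 1 of the post.
deps = {("reply", 1): {("post", 1)}}

assert causal_cut({"post": 1, "reply": 1}, deps)   # consistent snapshot: cause and effect
assert not causal_cut({"reply": 1}, deps)          # effect without its cause: not a causal cut
```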

Slide 49

Slide 49 text

No content

Slide 50

Slide 50 text

No content

Slide 51

Slide 51 text

No content

Slide 52

Slide 52 text

52 There are more cases to handle, both “fixup” and “orchestrate from scratch”. And a whole other story for Read Atomic isolation!

Slide 53

Slide 53 text

CloudBurst/Hydrocache: A Stateful Serverless Platform. Competitive performance for a prediction-serving pipeline (chart: latency in ms for Python, Droplet, AWS Lambda (Mock), AWS SageMaker, and AWS Lambda (Actual)). Performant consistency on a real-world web app (chart: read and write latency in ms for Droplet (LWW), Droplet (Causal), and Redis). End-to-end latency for a task with large data inputs (chart: latency in ms for Droplet (Hot), Droplet (Cold), Lambda (Redis), and Lambda (S3), at input sizes from 80KB to 80MB).

Slide 54

Slide 54 text

• Anna: CALM KVS w/o semantics • LDPC: disaggregate & colocate • Coordination-free consistent cache Hydro: Stateful Serverless and Beyond • Autoscaling stateful? Avoid Coordination! • Semantics to the rescue? Avoiding Coordination Serverless Computing + Autoscaling — Latency-Sensitive Data Access — Distributed Computing The CALM Theorem Monotonicity is the “bright line” between what can and cannot be done coordination-free

Slide 55

Slide 55 text

55 A Tale of Two Philosophies Correctness, optimizability, analysis Familiarity, agility, reuse

Slide 56

Slide 56 text

56

Slide 57

Slide 57 text

Hydro: https://github.com/hydro-project Bloom: http://bloom-lang.net RiseLab: https://rise.cs.berkeley.edu [email protected] @joe_hellerstein 57 More Information Chenggang Wu Jose M. Faleiro Vikram Sreekanti Alexey Tumanov Joseph Gonzalez Johann Schleier-Smith Charles Lin

Slide 58

Slide 58 text

58 Backup Slides