Advance Calmly: Serverless Computing and Cloud Programming

Advance Calmly: Serverless Computing and Cloud Programming JOE HELLERSTEIN, UC
BERKELEY 11/18/19

Hydro: Stateful Serverless and Beyond Avoiding Coordination Serverless Computing The
CALM Theorem

3 Sea Changes in Computing
• PDP-11, 1970: Minicomputers
• Cray-1, 1976: Supercomputers
• Macintosh, 1984: Personal Computers
• iPhone, 2007: Smart Phones

4 New Platform + New Language = Innovation
• PDP-11, 1970: Minicomputers
• Cray-1, 1976: Supercomputers
• Macintosh, 1984: Personal Computers
• iPhone, 2007: Smart Phones

5 Suppose I offered you a computer that was actually…

6 I’m Not Joking
Some SWAGs at numbers. AWS: ~10 million servers
• 60 Availability Zones [1]
• 1-8 Datacenters per AZ [2]
• 50K-80K servers per DC [2]
• ½ of all storage bytes shipped are now to “hyperscalers”, the likes of Amazon [3]
[1] https://aws.amazon.com/about-aws/global-infrastructure/
[2] https://www.forbes.com/sites/johnsonpierr/2017/06/15/with-the-public-clouds-of-amazon-microsoft-and-google-big-data-is-the-proverbial-big-deal/
[3] http://chansblog.com/impact-from-public-cloud-on-the-storage-industry-an-insight-from-snia-at-sfd12/

7 The Big Query
How will folks program the cloud? In a way that fosters unexpected innovation.
Distributed programming is hard!
• Parallelism, consistency, partial failure, …
Autoscaling makes it harder!

8 We’ve been talking about this for a while!

9 Industry finally woke up: Serverless Computing
Industry Response: Serverless Computing

Serverless 101: Functions-as-a-Service (FaaS)
• Enable developers (outside of Amazon, etc.) to program the cloud
• Access to 1000s of cores, PBs of RAM
• Fine-grained resource usage and efficiency
• Enables new economic, pricing models, etc.

Serverless 101: On Event, call function.
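
The "on event, call function" model can be sketched in a few lines of Python. This is an in-process toy, not any provider's API: the names `on_event`, `fire`, and the event type `"image.uploaded"` are hypothetical and exist only for illustration.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for a FaaS event router (not a real cloud API).
_handlers: Dict[str, List[Callable[[dict], Any]]] = defaultdict(list)

def on_event(event_type: str):
    """Decorator: register a function to be called when `event_type` fires."""
    def register(fn):
        _handlers[event_type].append(fn)
        return fn
    return register

def fire(event_type: str, payload: dict) -> list:
    """Deliver one event. Each registered function is invoked with only the payload,
    so nothing is carried over from earlier invocations."""
    return [fn(payload) for fn in _handlers[event_type]]

@on_event("image.uploaded")
def make_thumbnail(event: dict) -> str:
    # A stateless function: its result depends only on the event it receives.
    return f"thumbnail for {event['key']}"

results = fire("image.uploaded", {"key": "cat.png"})
```

The function body sees only the event payload, which is the property the later slides on FaaS limitations push against.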

Serverless 101: Autoscaling and Disaggregation Storage Compute Transactional Causal Consistency
for Serverless Computing

13 Three Limitations of Current FaaS (Lambda)
• I/O Bottlenecks: 10-100x higher latency than SSD disks, charges for each I/O.
• 15-min lifetimes: Functions routinely fail, can’t assume any session context.
• No Inbound Network Communication: Instead, “communicate” through global services on every call.
Wang, Liang, Mengyuan Li, Yinqian Zhang, Thomas Ristenpart, and Michael Swift. "Peeking behind the curtains of serverless platforms." USENIX ATC, 2018.
Hellerstein JM, Faleiro J, Gonzalez JE, Schleier-Smith J, Sreekanti V, Tumanov A, Wu C. “Serverless computing: One step forward, two steps back.” CIDR, 2019.

14 Serverless & the Three Promises of the Cloud
• 1 STEP FORWARD: Autoscaling
• 2 STEPS BACK ❌: Massive Data Processing, Unbounded Distributed Computing

Hydro: Stateful Serverless and Beyond Avoiding Coordination Serverless Computing +
Autoscaling — Latency-Sensitive Data Access — Distributed Computing The CALM Theorem

16 Embracing State
Program State: Local data that is managed across invocations
• Challenge 1: Data Gravity. Expensive to move state around. This policy problem is not so hard.
• Challenge 2: Distributed Consistency. This correctness problem is challenging and unavoidable!

Embracing State: LDPC Compute Storage Logical Disaggregation, Physical Colocation

18 The Challenge: Consistency Ensure that distant agents agree (or
will agree) on common knowledge. Classic example: data replication How do we know if they agree on the value of a mutable variable x? x = ❤

19 The Challenge: Consistency Ensure that distant agents agree (or
will agree) on common knowledge. Classic example: data replication How do we know if they agree on the value of a mutable variable x? If they disagree now, what could happen later? x = ❤ x =

20 Classical Consistency Mechanisms: Coordination
Consensus Protocols:
• Get a (fixed) set of machines to choose the same thing. (Paxos, etc.)
• Get a (fixed) set of machines to choose the same sequence of things. (Multipaxos, etc.)
Commit Protocols:
• Get a (fixed) set of machines to agree on the outcome of a unanimous vote. (Two-Phase Commit)
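
To make "agree on the outcome of a unanimous vote" concrete, here is a toy two-phase commit coordinator in Python. It is a sketch under strong assumptions: participants are modelled as plain vote functions (a hypothetical interface), there is no logging, timeout handling, or recovery, and a participant that raises is simply counted as a "no".

```python
from typing import Callable, Dict

def two_phase_commit(participants: Dict[str, Callable[[], bool]]) -> str:
    """Toy coordinator for a fixed set of participants.

    Phase 1 (prepare): ask every participant to vote.
    Phase 2 (decide):  commit only if every vote was "yes", otherwise abort.
    """
    votes = {}
    for name, prepare in participants.items():
        try:
            votes[name] = bool(prepare())
        except Exception:
            # Assumption: an unreachable or failing participant counts as a "no" vote.
            votes[name] = False
    # The decision cannot be made until every vote is in: this wait is the coordination.
    return "commit" if all(votes.values()) else "abort"

def unreachable() -> bool:
    raise TimeoutError("participant did not answer")

all_yes = two_phase_commit({"a": lambda: True, "b": lambda: True})
one_no = two_phase_commit({"a": lambda: True, "b": lambda: False})
one_down = two_phase_commit({"a": lambda: True, "b": unreachable})
```

The coordinator has to hear from the whole set before deciding, so one slow participant delays everyone.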

21 Coordination Avoidance (a poem)
“the first principle of successful scalability
is to batter the consistency mechanisms down to a minimum
move them off the critical path
hide them in a rarely visited corner of the system,
and then make it as hard as possible for application developers
to get permission to use them”
James Hamilton (IBM, MS, Amazon), in Birman, Chockler: “Toward a Cloud Computing Research Agenda”, LADIS 2009

22 Why Avoid Coordination? Waiting for control is bad Tail
latency of a quorum of machines can be very high (straggler effects) Waiting leads to queueing It’s not just “your” problem!

23 Towards a Solution Traditional distributed systems is all about
I/O What if we reason about application semantics? With thanks to Peter Bailis…

Hydro: Stateful Serverless and Beyond • Autoscaling stateful? Avoid Coordination!
• Semantics to the rescue? Avoiding Coordination Serverless Computing + Autoscaling — Latency-Sensitive Data Access — Distributed Computing The CALM Theorem

26 Conjecture (CALM): A distributed program P has a consistent,
coordination-free distributed implementation if and only if it is monotonic. Hellerstein, 2010 CALM: CONSISTENCY AS LOGICAL MONOTONICITY

27 Theorem (CALM): A distributed program P has a consistent,
coordination-free distributed implementation if and only if it is monotonic. Ameloot, Neven, Van den Bussche, 2013 CALM: CONSISTENCY AS LOGICAL MONOTONICITY

28 We’ll need some formal definitions

29 Consistency: Confluent Distributed Execution
Definition: A distributed program P is consistent if it is a deterministic function from sets to sets, regardless of non-deterministic message ordering and duplication.

30 Consistency: Confluent Distributed Execution
Definition: A distributed program P is consistent if it is a deterministic function from sets to sets, regardless of non-deterministic message ordering and duplication.
E.g. replica consistency: Two replicas of a consistent program P will agree on outcomes once they receive the same messages. x = ❤ x = ❤
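
A small Python sketch of what the definition rules in and out. It replays the same messages in every possible delivery order, once against a mutable variable that is overwritten and once against a set that is only added to. The message names and both replay functions are illustrative assumptions, not code from the talk.

```python
from itertools import permutations

def replay_overwrite(messages):
    """Mutable variable x: each message overwrites x, so the last arrival wins."""
    x = None
    for m in messages:
        x = m
    return x

def replay_union(messages):
    """Set-valued state: each message is added to a set, so duplicates are absorbed."""
    state = set()
    for m in messages:
        state.add(m)
    return frozenset(state)

# The same messages, including one duplicated delivery.
msgs = ["heart", "spade", "heart"]
orders = list(permutations(msgs))

# Collect every distinct outcome across all delivery orders.
overwrite_outcomes = {replay_overwrite(o) for o in orders}
union_outcomes = {replay_union(o) for o in orders}
```

`overwrite_outcomes` contains more than one value, so replicas that see different orders can disagree. `union_outcomes` contains exactly one, which is the confluent behaviour the definition asks for.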

31 Consistency: Confluent Distributed Execution
Definition: A distributed program P is consistent if it is a deterministic function from sets to sets, regardless of non-deterministic message ordering and duplication. x = ❤ x = ❤

32 Monotonicity
Definition: A distributed program P is consistent if it is a deterministic function from sets to sets, regardless of non-deterministic message ordering and duplication.
Definition: A distributed program P is monotonic if for any input sets S, T if S ⊆ T, then P(S) ⊆ P(T).
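
The monotonicity definition can be checked by brute force on a tiny input universe. The sketch below is only an illustration: `is_monotonic_on` tests every pair S ⊆ T drawn from one small edge set, which can refute monotonicity but cannot prove it in general, and the two sample programs are assumptions chosen for contrast.

```python
from itertools import combinations

def is_monotonic_on(program, universe):
    """Check S ⊆ T ⇒ P(S) ⊆ P(T) for all subsets S, T of a small finite universe."""
    items = list(universe)
    subsets = [frozenset(c) for r in range(len(items) + 1)
               for c in combinations(items, r)]
    return all(program(s) <= program(t)
               for s in subsets for t in subsets if s <= t)

def reachable_pairs(edges):
    """Transitive closure: adding edges can only add reachable pairs."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return frozenset(closure)

def nodes_without_incoming(edges):
    """Nodes with no incoming edge: a later edge can retract an earlier answer."""
    sources = {a for a, _ in edges}
    targets = {b for _, b in edges}
    return frozenset(sources - targets)

EDGES = {(1, 2), (2, 3), (3, 1)}
closure_is_monotonic = is_monotonic_on(reachable_pairs, EDGES)
no_incoming_is_monotonic = is_monotonic_on(nodes_without_incoming, EDGES)
```

With only the edge (1, 2), node 1 has no incoming edge; once (3, 1) arrives that answer has to be withdrawn, which is the kind of retraction monotonic programs never perform.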

34 Coordination: Data-Independent Messaging Definition: A distributed program P is
consistent if it is a deterministic function from sets to sets, regardless of non-deterministic message ordering and duplication. Definition: A distributed program P is monotonic if for any input sets S, T if S ⊆ T, then P(S) ⊆ P(T). Definition: A distributed program P(T) uses coordination if it requires messages to be sent under all possible input partitionings of T.

35 Two Canonical Examples
• Distributed Deadlock: Once you observe the existence of a waits-for cycle, you can (autonomously) declare deadlock. More information will not change the result. Deadlock!
• Garbage Collection: Suspecting garbage (the non-existence of a path from root) is not enough; more information may change the result. Hence you are required to check all nodes for information (under any assignment of objects to nodes!) Garbage?
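
The contrast between the two examples can be sketched in Python. Both functions work on whatever edges a node has seen so far; the graph encodings and names are hypothetical. The deadlock answer survives the arrival of more edges, while the garbage answer may flip.

```python
def has_waits_for_cycle(edges):
    """True if the waits-for graph, given as (waiter, holder) pairs, contains a cycle."""
    graph = {}
    for waiter, holder in edges:
        graph.setdefault(waiter, set()).add(holder)

    def reaches(start, target, seen):
        for nxt in graph.get(start, ()):
            if nxt == target:
                return True
            if nxt not in seen:
                seen.add(nxt)
                if reaches(nxt, target, seen):
                    return True
        return False

    return any(reaches(n, n, set()) for n in graph)

def is_garbage(obj, root, refs):
    """True if no path from root to obj exists among the references seen so far."""
    seen, frontier = {root}, [root]
    while frontier:
        cur = frontier.pop()
        for src, dst in refs:
            if src == cur and dst not in seen:
                seen.add(dst)
                frontier.append(dst)
    return obj not in seen

# Deadlock: a cycle found in partial information stays a cycle when more edges arrive.
partial_waits = {("t1", "t2"), ("t2", "t1")}
more_waits = partial_waits | {("t3", "t1")}

# Garbage: "unreachable so far" can be overturned by one more reference.
partial_refs = {("root", "a")}
more_refs = partial_refs | {("a", "c")}
```

A node holding only `partial_refs` would wrongly collect `c`, so it must first hear from every node that might hold a reference.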

36 CALM (Consistency As Logical Monotonicity)
Theorem (CALM): A distributed program P has a consistent, coordination-free distributed implementation if and only if it is monotonic.
Hellerstein JM. The declarative imperative: Experiences and conjectures in distributed logic. ACM SIGMOD Record, Sep 2010.
Ameloot TJ, Neven F, Van den Bussche J. Relational transducers for declarative networking. JACM, Apr 2013.
Ameloot TJ, Ketsman B, Neven F, Zinn D. Weaker forms of monotonicity for declarative networking: a more fine-grained answer to the CALM-conjecture. ACM TODS, Feb 2016.
Hellerstein JM, Alvaro P. Keeping CALM: When Distributed Consistency is Easy. To appear, CACM 2020.

37 Related Work

38 Related Work
Koutris & Suciu. “Parallel Evaluation of Conjunctive Queries.” PODS 2011
Beame, Koutris & Suciu. “Communication Steps for Parallel Query Processing.” PODS 2013
Beame, Koutris & Suciu. “Skew in Parallel Query Processing.” PODS 2014
Koutris, Beame & Suciu. “Worst-Case Optimal Algorithms for Parallel Query Processing.” ICDT 2016

39 Logic and Lattice Composition
Bloom: a “disorderly” programming language based on logic and lattices
• Monotone composition of “lego blocks” lattices into bigger programs.
• Allows non-monotonic expressions but they require “extra” syntax.
• Syntactic CALM analysis
Alvaro P, Conway N, Hellerstein JM, Marczak WR. Consistency Analysis in Bloom: a CALM and Collected Approach. CIDR 2011.
Conway N, Marczak WR, Alvaro P, Hellerstein JM, Maier D. Logic and lattices for distributed programming. ACM SoCC, 2012.
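
The "lego blocks" idea can be illustrated in Python rather than Bloom. In this sketch a lattice is represented only by its merge function, and `merge_map` builds a bigger lattice out of a smaller one. All names here are hypothetical and this is not Bloom syntax.

```python
def merge_max(a, b):
    """Max lattice over numbers."""
    return max(a, b)

def merge_union(a, b):
    """Set-union lattice over frozensets."""
    return a | b

def merge_map(value_merge):
    """Build a map lattice whose values are merged, key by key, with `value_merge`."""
    def merge(a, b):
        out = dict(a)
        for k, v in b.items():
            out[k] = value_merge(out[k], v) if k in out else v
        return out
    return merge

# Composition: a map of sets, and a map of maps of max-counters.
merge_carts = merge_map(merge_union)
merge_counters = merge_map(merge_map(merge_max))

a = {"alice": frozenset({"book"})}
b = {"alice": frozenset({"pen"}), "bob": frozenset({"ink"})}
c = {"bob": frozenset({"cap"})}
merged = merge_carts(a, b)
```

Because each building block is associative, commutative, and idempotent, the composed merge is too, so replicas can apply updates in any order, any number of times, and still converge.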

Hydro: Stateful Serverless and Beyond • Autoscaling stateful? Avoid Coordination!
• Semantics to the rescue? Avoiding Coordination Serverless Computing + Autoscaling — Latency-Sensitive Data Access — Distributed Computing The CALM Theorem Monotonicity is the “bright line” between what can and cannot be done coordination-free

Hydro: A Platform for Programming the Cloud
• Anna: autoscaling multi-tier KVS (ICDE18, VLDB19). Storage & overlay network built with lattice composition.
• HydroCache: Coordination-free consistency and fault tolerance
• Cloudburst: Stateful FaaS. Containerized FaaS with consistent caching & fault tolerance. Currently under submission.
• Hydrolysis: a cloud compiler toolkit
• HydroLogic: a disorderly IR. A universal disorderly algebra for cloud computing.
• Polyglot programming models for developers: Logic Programming, Functional Reactive, Actors, Futures, ?

42 Anna Serverless KVS • Anyscale • CALM consistency •
Autoscaling & multitier Chenggang Wu, Vikram Sreekanti, JM Hellerstein. “Autoscaling Tiered Cloud Storage in Anna.” PVLDB 2019 Chenggang Wu, Jose M. Faleiro, Yihan Lin, JM Hellerstein. “Anna: A KVS For Any Scale”. TKDE 2019.

43 Anyscale
• Shared-nothing at all scales (even across threads)
• Crazy fast under contention
• Up to 700x faster than Masstree within a multicore machine
• Up to 10x faster than Cassandra in a geo-distributed deployment
• Coordination-free consistency. No atomics, no locks, no waiting ever!

44 CALM Consistency
Simple, clean lattice composition gives range of consistency levels
Lines of C++ code modified by system component
KEEP CALM AND WRITE(X)
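
As one illustration of a consistency level expressed as a lattice, here is a last-writer-wins register in Python. It is a sketch, not Anna's C++ code: the `LWW` type, the integer timestamps, and the `repr`-based tie-break are assumptions made so that the merge stays commutative.

```python
from typing import Any, NamedTuple

class LWW(NamedTuple):
    """A value tagged with the timestamp of the write that produced it."""
    timestamp: int
    value: Any

def merge_lww(a: LWW, b: LWW) -> LWW:
    """Keep the write with the larger timestamp.

    Ties are broken by comparing repr(value), an arbitrary but deterministic rule,
    so merge_lww(a, b) and merge_lww(b, a) always pick the same write.
    """
    return max(a, b, key=lambda w: (w.timestamp, repr(w.value)))

# Three writes to the same key, merged in one particular arrival order.
writes = [LWW(1, "a"), LWW(3, "c"), LWW(2, "b")]
state = writes[0]
for w in writes[1:]:
    state = merge_lww(state, w)
```

Swapping this merge function for a different lattice changes the consistency level without touching the surrounding replication logic, which is the point of composing lattices.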

45 Autoscaling & Multi-Tier Cost Tradeoffs 350x the performance of
DynamoDB for the same price!

Hydrocache and LDPC Transactional Causal Consistency for Serverless Computing Compute
Storage Logical Disaggregation, Physical Colocation

Cloudburst/Hydrocache: A Stateful Serverless Platform Main Challenge: Cache consistency! Hydrocache:
new consistency protocols for distributed client “sessions” Compute Storage

48 One Consistency Level: Multisite TCC
Multisite Transactional Causal Consistency (MTCC)
• Causal: Provide a consistent time based on Lamport’s happened-before relation (Lamport’s partial ordering)
• Multisite transactional: an arbitrary functional expression, running across multiple machines.
Step 1: Causal Cut in a cache. “Bolt-On” Causal Consistency: single client, non-transactional
Step 2: Causal Cut across caches and keys! Hydrocache MTCC
Bailis P, Ghodsi A, Hellerstein JM, Stoica I. "Bolt-on causal consistency." SIGMOD 2013.
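
Happened-before is a partial order: some pairs of events are ordered and some are concurrent. One common way to track it is with vector clocks, sketched below in Python. Vector clocks are a swapped-in technique for illustration; the slides do not say how Hydrocache represents causal metadata, and the process names "A", "B", "C" are made up.

```python
from typing import Dict

VectorClock = Dict[str, int]

def happened_before(a: VectorClock, b: VectorClock) -> bool:
    """True if a precedes b: a <= b in every component and a < b in at least one.
    Processes missing from a clock count as 0."""
    keys = set(a) | set(b)
    no_greater = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    strictly_less = any(a.get(k, 0) < b.get(k, 0) for k in keys)
    return no_greater and strictly_less

def concurrent(a: VectorClock, b: VectorClock) -> bool:
    """True if the clocks differ and neither event happened before the other."""
    keys = set(a) | set(b)
    differ = any(a.get(k, 0) != b.get(k, 0) for k in keys)
    return differ and not happened_before(a, b) and not happened_before(b, a)

w1 = {"A": 1}            # a write at process A
w2 = {"A": 1, "B": 1}    # a write at B made after B had seen w1
w3 = {"C": 1}            # an unrelated write at process C
```

Here `w1` precedes `w2`, so a causally consistent cache must not expose `w2` without `w1`. `w1` and `w3` are concurrent and may be observed in either order.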

52 There are more cases to handle
both “fixup” and “orchestrate from scratch”
a whole other story for Read Atomic isolation!

CloudBurst/Hydrocache: A Stateful Serverless Platform
[Figure: Competitive performance for a prediction serving pipeline. Latency (ms) for Python, Droplet, AWS Lambda (Mock), AWS SageMaker, and AWS Lambda (Actual).]
[Figure: Performant consistency on a real-world web app. Read and write latency (ms) for Droplet (LWW), Droplet (Causal), and Redis.]
[Figure: End-to-end latency for a task with large data inputs. Latency (ms) for Droplet (Hot), Droplet (Cold), Lambda (Redis), and Lambda (S3) at sizes 80KB, 800KB, 8MB, and 80MB.]

• Anna: CALM KVS w/o semantics • LDPC: disaggregate &
colocate • Coordination-free consistent cache Hydro: Stateful Serverless and Beyond • Autoscaling stateful? Avoid Coordination! • Semantics to the rescue? Avoiding Coordination Serverless Computing + Autoscaling — Latency-Sensitive Data Access — Distributed Computing The CALM Theorem Monotonicity is the “bright line” between what can and cannot be done coordination-free

55 A Tale of Two Philosophies Correctness, optimizability, analysis Familiarity,
agility, reuse

Hydro: https://github.com/hydro-project Bloom: http://bloom-lang.net RiseLab: https://rise.cs.berkeley.edu [email protected] @joe_hellerstein 5 More
Information Chenggang Wu Jose M. Faleiro Vikram Sreekanti Alexey Tumanov Joseph Gonzalez Johann Schleier-Smith Charles Lin

58 Backup Slides
