A Data-Centric Lens on Cloud Programming and Serverless Computing

ICDE 2020 Keynote talk. Video online at https://www.youtube.com/watch?v=cRxa-PeUk6w

Major shifts in computing platforms are typically accompanied by new programming models. The public cloud emerged a decade ago, but we have yet to see a new generation of programming platforms arise in response. All the traditional challenges of distributed programming and data are present in the cloud, only they are now faced by the general population of software developers. Added to these challenges are new desires for "serverless" computing, including consumption-based pricing and autoscaling, which raise particular challenges for data-centric applications.

This talk will highlight some key principles for cloud programming that came out of database research, including the CALM Theorem and constructive approaches to monotonic coordination-free consistency. I will discuss a new platform called Hydro that we are building at Berkeley to take these ideas and combine them into a polyglot, pay-as-you-go platform for cloud programming and deployment. Early results on Hydro---and its underlying key-value store, Anna---point to major improvements that researchers can offer to Serverless Computing and public clouds. The talk will also illustrate emerging cloud opportunities for application areas of interest to our community, including prediction serving, data science and robotics.


Joe Hellerstein

April 22, 2020

Transcript

  1. A Data-Centric Lens on Cloud Programming and Serverless Computing JOE

    HELLERSTEIN, UC BERKELEY
  2. Hydro: Stateful Serverless and Beyond Avoiding Coordination Serverless Computing The

    CALM Theorem
  3. 3 Cray-1, 1976 Supercomputers iPhone, 2007 Smart Phones Macintosh, 1984

    Personal Computers PDP-11, 1970 Minicomputers Sea Changes in Computing
  4. 4 New Platform + New Language = Innovation Cray-1, 1976

    Supercomputers iPhone, 2007 Smart Phones Macintosh, 1984 Personal Computers PDP-11, 1970 Minicomputers
  5. 5 How will folks program the cloud? In a way

    that fosters unexpected innovation Distributed programming is hard! • Parallelism, consistency, partial failure, … Autoscaling makes it harder! The Big Question
  6. 6 We’ve been talking about this for a while!

  7. 7 Industry finally woke up. Industry Response: Serverless Computing
  8. Serverless 101: Functions-as-a-Service (FaaS) Enable developers (outside of AWS, Azure,

    Google, etc.) to program the cloud Access to 1000s of cores, PBs of RAM Fine-grained resource usage and efficiency Enables new economic, pricing models, etc.
  9. 9 Autoscaling Massive Data Processing Unbounded Distributed Computing 1 STEP

    FORWARD 2 STEPS BACK ✅ Serverless & the Three Promises of the Cloud
  10. 10 Three Limitations of Current FaaS (e.g. AWS Lambda) I/O

    Bottlenecks 10-100x higher latency than SSD disks, charges for each I/O. 15-min lifetimes Functions routinely fail, can’t assume any session context No Inbound Network Communication Instead, “communicate” through global services on every call
  11. 11 Still, Serverless Opens the Conversation Small steps to Big

    Questions
  12. Hydro: Stateful Serverless and Beyond Avoiding Coordination Serverless Computing +

    Autoscaling — Latency-Sensitive Data Access — Distributed Computing The CALM Theorem
  13. 13 A First Step: Embracing State Program State: Local data

    that is managed across invocations Challenge 1: Data Gravity Expensive to move state around. This policy problem is not so hard. Challenge 2: Distributed Consistency This correctness problem is difficult and unavoidable!
  14. 14 The Challenge: Consistency Ensure that distant agents agree (or

    will agree) on common knowledge. Classic example: data replication How do we know if they agree on the value of a mutable variable x? x = ❤
  15. 15 The Challenge: Consistency Ensure that distant agents agree (or

    will agree) on common knowledge. Classic example: data replication How do we know if they agree on the value of a mutable variable x? If they disagree now, what could happen later? x = ❤ x =
  16. 16 Classical Consistency Mechanisms: Coordination Consensus (Paxos, etc), Commit (Two-Phase

    Commit, etc)
  17. 17 Coordination Avoidance (a poem) “the first principle of successful

    scalability is to batter the consistency mechanisms down to a minimum, move them off the critical path, hide them in a rarely visited corner of the system, and then make it as hard as possible for application developers to get permission to use them” —James Hamilton (IBM, MS, Amazon) in Birman, Chockler: “Toward a Cloud Computing Research Agenda”, LADIS 2009
  18. 18 Why Avoid Coordination? Waiting for control is bad Tail

    latency of a quorum of machines can be very high (straggler effects) Waiting leads to slowdown cascades It’s not just “your” problem!
  19. 19 Towards a Solution Traditional distributed systems work is all about

    I/O. What if we reason about application semantics? With thanks to Peter Bailis…
  20. None
  21. Hydro: Stateful Serverless and Beyond • Autoscaling stateful? Avoid Coordination!

    • Semantics to the rescue? Avoiding Coordination Serverless Computing + Autoscaling — Latency-Sensitive Data Access — Distributed Computing The CALM Theorem
  22. 22 Hellerstein JM. The Declarative Imperative: Experiences and conjectures in

    distributed logic. ACM PODS Keynote, June 2010 ACM SIGMOD Record, Sep 2010. Ameloot TJ, Neven F, Van den Bussche J. Relational transducers for declarative networking. JACM, Apr 2013. Ameloot TJ, Ketsman B, Neven F, Zinn D. Weaker forms of monotonicity for declarative networking: a more fine-grained answer to the CALM-conjecture. ACM TODS, Feb 2016. Hellerstein JM, Alvaro P. Keeping CALM: When Distributed Consistency is Easy. To appear, CACM 2020. Theorem (CALM): A distributed program P has a consistent, coordination-free distributed implementation if and only if it is monotonic. CALM: CONSISTENCY AS LOGICAL MONOTONICITY
  23. 23 We’ll need some formal definitions

  24. 24 Intuitively… Consistency: A unique outcome guaranteed regardless of

    network shenanigans. Monotonicity: The set of outcomes only grows during execution. Emit outputs without regret! Coordination: Responses we await even though we have all the data.
  25. 25 Distributed Deadlock: Once you observe the existence of a

    waits-for cycle, you can (autonomously) declare deadlock. More information will not change the result. Garbage Collection: Suspecting garbage (the non-existence of a path from root) is not enough; more information may change the result. Hence you are required to check all nodes for information (under any assignment of objects to nodes!) Two Canonical Examples Deadlock! Garbage?
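The deadlock half of the example can be sketched in code. This is an illustrative Python sketch (not from the talk; `has_cycle` and the edge sets are hypothetical): once partial knowledge of the waits-for graph reveals a cycle, learning more edges never retracts the verdict, which is exactly the monotonicity that lets a node declare deadlock autonomously.

```python
# Deadlock detection is monotonic: once a waits-for cycle appears, adding
# more edges never retracts the verdict, so no coordination is needed.

def has_cycle(edges):
    """Detect a cycle in a directed waits-for graph given as (waiter, holder) edges."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)

    def reachable(start, target, seen):
        # Depth-first search: is target reachable from start?
        for nxt in graph.get(start, ()):
            if nxt == target:
                return True
            if nxt not in seen:
                seen.add(nxt)
                if reachable(nxt, target, seen):
                    return True
        return False

    return any(reachable(n, n, set()) for n in graph)

# Partial local knowledge already reveals a cycle ...
local = {("t1", "t2"), ("t2", "t1")}
assert has_cycle(local)
# ... and learning more edges preserves the answer (monotonicity).
assert has_cycle(local | {("t2", "t3"), ("t3", "t4")})
```

Garbage collection lacks this property: a newly learned edge from the root can turn "garbage" into "reachable", so the answer can be retracted, and that is why it needs global checking.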
  26. 26 That’s interesting. Who cares? CALM thinking inspires crazy-fast, infinitely-scalable

    systems No coordination = insane parallelism and smooth scalability E.g. we’ll see the Anna KVS in a few slides We can actually check monotonicity syntactically in a logic language! E.g. in SQL. Or Bloom. But who writes distributed programs in logic?! CALM explains CAP, the times when we get Safety+Liveness A conversation for another day… http://bit.ly/calm-cacm
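The syntactic check on this slide can be illustrated with a small Python sketch (the function names and toy queries are hypothetical, standing in for logic-language or SQL rules): a join-style query is monotone, while a query with negation is not, and that difference is visible in the query's syntax.

```python
# Monotone query: growing the input can only grow the output.
def monotone_query(edges):
    # A "join": two-hop paths. New edges can add paths, never remove them.
    return {(a, d) for (a, b) in edges for (c, d) in edges if b == c}

# Non-monotone query: negation means new facts can retract old answers.
def non_monotone_query(nodes, edges):
    # Nodes with no outgoing edge; a newly learned edge retracts an answer.
    return {n for n in nodes if not any(a == n for (a, _) in edges)}

e1 = {(1, 2), (2, 3)}
e2 = e1 | {(3, 4)}
assert monotone_query(e1) <= monotone_query(e2)   # output only grew
# Node 3 looked like a sink under e1, but edge (3, 4) retracts that answer:
assert not (non_monotone_query({1, 2, 3}, e1) <= non_monotone_query({1, 2, 3, 4}, e2))
```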
  27. Hydro: Stateful Serverless and Beyond • Autoscaling stateful? Avoid Coordination!

    • Semantics to the rescue? Avoiding Coordination Serverless Computing + Autoscaling — Latency-Sensitive Data Access — Distributed Computing The CALM Theorem Monotonicity is the “bright line” between what can and cannot be done coordination-free
  28. Hydro: A Platform for Programming the Cloud Anna: autoscaling

    multi-tier KVS ICDE18, VLDB19 HydroLogic: a disorderly IR Hydrolysis: a cloud compiler toolkit Cloudburst: Stateful FaaS https://arxiv.org/abs/2001.04592 f(x) ? Logic Programming Functional Reactive Actors Futures
  29. 29 Anna Serverless KVS • Anyscale: perform like Redis, scale

    like S3 • CALM consistency levels via simple lattices • Autoscaling & multi-tier serverless storage • Won best-of-conference at ICDE, VLDB¹,² ¹Wu, Chenggang, et al. “Anna: A KVS for any scale.” IEEE Transactions on Knowledge and Data Engineering (2019). ²Wu, Chenggang, Vikram Sreekanti, and Joseph M. Hellerstein. “Autoscaling tiered cloud storage in Anna.” PVLDB 12.6 (2019): 624-638.
  30. 30 Anna Performance Shared-nothing at all scales (even across threads)

    Crazy fast under contention Up to 700x faster than Masstree within a multicore machine Up to 10x faster than Cassandra in a geo-distributed deployment Coordination-free consistency. No atomics, no locks, no waiting ever! 700x!
  31. 31 CALM Consistency Simple, clean lattice composition gives a range of

    consistency levels [Chart: lines of C++ code modified, by system component] KEEP CALM AND WRITE(X)
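The lattice idea behind these consistency levels can be sketched in a few lines. This is an assumed Python illustration, not Anna's actual C++ lattices: each state type exposes a commutative, associative, idempotent merge, so replicas converge no matter the order in which updates arrive.

```python
class LWWRegister:
    """Last-writer-wins register: merge keeps the higher-timestamped value."""
    def __init__(self, ts, value):
        self.ts, self.value = ts, value

    def merge(self, other):
        return self if self.ts >= other.ts else other

class SetLattice:
    """Grow-only set: merge is union, which is commutative and idempotent."""
    def __init__(self, items=()):
        self.items = frozenset(items)

    def merge(self, other):
        return SetLattice(self.items | other.items)

# Two replicas apply the same updates in different orders and still agree:
a = LWWRegister(1, "x").merge(LWWRegister(3, "z")).merge(LWWRegister(2, "y"))
b = LWWRegister(2, "y").merge(LWWRegister(1, "x")).merge(LWWRegister(3, "z"))
assert a.value == b.value == "z"
```

Composing such lattices (registers, sets, maps of lattices) is what yields a range of consistency levels with only small, localized code changes.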
  32. 32 Autoscaling & Multi-Tier Cost Tradeoffs 350x the performance of

    DynamoDB for the same price!
  33. Cloudburst: A Stateful Serverless Platform Main Challenge: Cache consistency! Hydrocache:

    new consistency protocols for distributed client “sessions” Compute Storage
  34. 34 Multiple Consistency Levels Here Too Read Atomic transactions AFT¹:

    a fault-tolerance shim layer between any FaaS and any object store • Currently evaluated between AWS Lambda and AWS S3! Multisite Transactional Causal Consistency (MTCC)² Causal: preserve Lamport’s happened-before relation Multisite transactional: nested functions running across multiple machines. ¹Sreekanti, Vikram, et al. A Fault-Tolerance Shim for Serverless Computing. To appear, EuroSys (2020). ²Wu, Chenggang, et al. Transactional Causal Consistency for Serverless Computing. To appear, ACM SIGMOD (2020).
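The happened-before relation that causal consistency preserves can be sketched with vector clocks. This is an illustrative Python sketch of the relation itself, not the MTCC protocol:

```python
def merge(vc1, vc2):
    """Pointwise max of two vector clocks (itself a lattice merge)."""
    keys = set(vc1) | set(vc2)
    return {k: max(vc1.get(k, 0), vc2.get(k, 0)) for k in keys}

def happened_before(vc1, vc2):
    """vc1 happened-before vc2 iff vc1 <= vc2 pointwise and they differ."""
    keys = set(vc1) | set(vc2)
    return all(vc1.get(k, 0) <= vc2.get(k, 0) for k in keys) and vc1 != vc2

a = {"site1": 1}            # a write at site1
b = merge(a, {"site2": 1})  # site2 observes a, then writes
assert happened_before(a, b)

c = {"site3": 1}            # independent write: concurrent with a
assert not happened_before(a, c) and not happened_before(c, a)
```

A causally consistent cache must never reveal `b` to a reader without also reflecting `a`, since `a` happened-before `b`.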
  35. Running a Twitter Clone on Cloudburst [Chart: read and write latency

    (ms, log scale) for Cloudburst (LWW), Cloudburst (Causal), and Redis; values shown: 16.1, 18.0, 15.0, 397, 501, 810, 31.9, 79, 27.9, 503, 801, 921.3]
  36. Prediction Serving on Cloudburst [Chart: latency (ms) for native Python,

    Cloudburst, AWS SageMaker, and AWS Lambda; values shown: 182.5, 210.2, 355.8, 1181 and 191.5, 277.4, 416.6, 1364]
  37. 37 Applications on Cloudburst Hydro Anna: autoscaling multi-tier KVS ICDE18,

    VLDB19 HydroLogic: a disorderly IR Hydrolysis: a cloud compiler toolkit ? Logic Programming Functional Reactive Actors Futures Cloudburst: Stateful FaaS https://arxiv.org/abs/2001.04592 f(x)
  38. 38 Applications on Cloudburst Hydro Anna: autoscaling multi-tier KVS ICDE18,

    VLDB19 Cloudburst: Stateful FaaS https://arxiv.org/abs/2001.04592 f(x) Serverless Data Science Robot Motion Planning ML Prediction: ModelZoo Charles Lin Devin Petersohn Simon Mo Rehan Durrani Aditya Ramkumar Avinash Arjavalingam Jeffrey Ichnowski
  39. ModelZoo on Cloudburst

  40. Why Serverless Jupyter? Large Jupyter deployments! ⋅ Berkeley DataHub: Jupyter

    deployment that serves over 37,000 students! ⋅ Scaling issues ⋅ Resource efficiency issues
  41. A single user’s compute demands [Chart: resource demands over time:

    brief spikes while running a cell; near zero while typing, thinking, or not at the computer]
  42. [Chart: the same resource-demand curve summed across 37,000 users]

  43. [Chart: aggregate resource demands over time, alternating deadline rushes and light utilization]

  44. Jupyter on Cloudburst ⋅ A prototype Jupyter notebook that has

    been ported to execute on Cloudburst ⋅ Each cell is a serverless function execution ⋅ Notebooks hold zero provisioned compute when not running a cell!
  45. Cloudburst

  46. Cloudburst

  47. Cloudburst x: 3

  48. Cloudburst x: 3 Program state stored in Anna Each cell

    retrieves only the definitions it needs
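The idea on this slide can be sketched in Python with a hypothetical `run_cell`/`kvs` API (not Cloudburst's actual interface): each cell execution fetches from the store only the names it references, then writes its new bindings back.

```python
import ast

def free_names(cell_src):
    """All names a cell mentions; used to decide which definitions to fetch."""
    tree = ast.parse(cell_src)
    return {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}

def run_cell(cell_src, kvs):
    # Fetch only the definitions this cell needs (kvs stands in for Anna).
    env = {name: kvs[name] for name in free_names(cell_src) if name in kvs}
    exec(cell_src, env)
    # Write back the bindings the cell created or updated.
    for name, val in env.items():
        if not name.startswith("__"):
            kvs[name] = val

kvs = {}                    # stand-in for the Anna KVS
run_cell("x = 3", kvs)
run_cell("y = x + 1", kvs)  # fetches only x from the store
assert kvs["y"] == 4
```

Because all state lives in the store between cells, a notebook needs no provisioned compute while idle, matching the zero-provisioning claim on the earlier slide.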
  49. Demo: Jupyter on Cloudburst

  50. Beyond operational benefits ⋅ Serverless architecture does much more than

    just address Jupyter’s scaling and cost problems ⋅ Also enables new directions for Jupyter!
  51. Demo: Spiking memory usage

  52. This is only a taste of what’s possible. Future work:

    Choosing instance types for cells
  53. Cell returns immediately! `table` loads in background This is only

    a taste of what’s possible. Future work: Automatic asynchronous evaluation
  54. Cloudburst a: np.array(...) This is only a taste of what’s

    possible. Future work: True notebook sharing
  55. Scalable Serverless Robot Motion Planning with Cloudburst and Anna Jeffrey

    Ichnowski, Chenggang Wu, Jackson Chui, Raghav Anand, Samuel Paradis, Vikram Sreekanti, Joseph Hellerstein, Joseph E. Gonzalez, Ion Stoica, Ken Goldberg AUTOLAB
  56. Robot Wants to Get Around Room

  57. Motion Planning Compute Requirements Navigation Manipulation Different problems require different

    amounts of computing High CPU Low CPU “Go to office desk” “Declutter desk”
  58. Motion Planning Compute Requirements [Chart: robot CPU usage (%) over

    time: near zero for a simple planning problem, spiking to 100% for a complex planning problem, with the robot moving in between]
  59. Motion Planning in Serverless Computing λ λ λ λ λ

    λ λ λ Requirements: • Low-latency simultaneous launch of 100s of functions • Fast sharing of best path between functions • Conflict resolution based on path cost
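The third requirement, conflict resolution by path cost, is naturally a lattice: merging candidate plans by "keep the cheaper one" is commutative, associative, and idempotent, so the best plan wins without coordination. A hypothetical Python sketch (names and plan format assumed; the actual system stores plans in Anna):

```python
def merge_plans(p1, p2):
    """Keep the cheaper (path, cost) candidate; ties broken deterministically."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    return min(p1, p2, key=lambda p: (p[1], p[0]))

# Parallel planners publish candidates in any order; the merge picks the best.
best = None
for candidate in [(["A", "C", "B"], 7.2), (["A", "B"], 4.5), (["A", "D", "B"], 5.1)]:
    best = merge_plans(best, candidate)
assert best == (["A", "B"], 4.5)
```

Because the merge is order-insensitive, hundreds of concurrently launched functions can publish plans to a shared key and all converge on the same winner.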
  60. Lambda Communication λ λ λ λ λ λ λ λ

    AWS API Endpoint “start 8 lambdas” How do they communicate? Originally Built on AWS Lambda
  61. Lambda Communication λ λ λ λ λ λ λ λ

    AWS API Endpoint “start 8 lambdas” How do they communicate? Originally Built on AWS Lambda
  62. Lambda Communication λ λ λ λ λ λ λ λ

    AWS API Endpoint “start 8 lambdas” coordinator (EC2 1-core instance) my IP is 192.168.0.54
  63. Communication Bottleneck λ λ λ λ λ λ λ λ

    AWS API Endpoint “start 8 lambdas” coordinator (EC2 1-core instance) my IP is 192.168.0.54 Bottleneck on number of lambdas
  64. Overcoming Communication Bottleneck λ λ λ λ λ λ λ

    λ Cloudburst API Endpoint “start 8 executors” Anna “use Anna key: (…)” Anna
  65. Before Anna Lattices [Diagram: lambdas λa, λb, λc each emit update

    streams …, τ3, τ2, τ1; a single-core EC2 coordinator relays every other lambda’s full update stream to each lambda]
  66. After Anna Lattices [Diagram: the same lambdas exchange updates through

    the Anna KVS, which merges the streams as lattices, so each lambda receives a condensed, merged view]
  67. None
  68. None
  69. Decluttering with Cloudburst + Anna Bottle to grasp

  70. Cloudburst + Anna: Motion Plan Cost Over Time [Chart: plan cost (sum of

    joint angle changes) over 60 seconds for 10 vs. 100 concurrent Cloudburst functions; cost with 10 functions after 60 seconds equals cost with 100 functions after 2 seconds]
  71. ROBOT DEMO

  72. • Anna: CALM anyscale KVS • Stateful FaaS: cache consistency

    • Stateful FaaS applications Hydro: Stateful Serverless and Beyond • Autoscaling stateful? Avoid Coordination! • Semantics to the rescue? Avoiding Coordination Serverless Computing + Autoscaling — Latency-Sensitive Data Access — Distributed Computing The CALM Theorem Monotonicity is the “bright line” between what can and cannot be done coordination-free
  73. 73 We’re pushing the state of the art in FaaS But

    Stateful FaaS is a limited API Python functions + explicit storage With limited contracts from the PL Developer must reason about consistency guarantees And decide when app logic needs to coordinate The real dream takes time! Did We Answer the Big Question? Not Yet.
  74. 74 Research Futures Hydro Anna: autoscaling multi-tier KVS ICDE18, VLDB19

    HydroLogic: a disorderly IR Hydrolysis: a cloud compiler toolkit ? Logic Programming Functional Reactive Actors Futures Cloudburst: Stateful FaaS https://arxiv.org/abs/2001.04592 f(x)
  75. Hydro: https://github.com/hydro-project Bloom: http://bloom-lang.net RiseLab: https://rise.cs.berkeley.edu hellerstein@berkeley.edu @joe_hellerstein

    More Information Chenggang Wu Vikram Sreekanti Joseph Gonzalez Johann Schleier-Smith Charles Lin Music composed and performed by: