
A Data-Centric Lens on Cloud Programming and Serverless Computing


ICDE 2020 Keynote talk. Video online at https://www.youtube.com/watch?v=cRxa-PeUk6w

Major shifts in computing platforms are typically accompanied by new programming models. The public cloud emerged a decade ago, but we have yet to see a new generation of programming platforms arise in response. All the traditional challenges of distributed programming and data are present in the cloud, only they are now faced by the general population of software developers. Added to these challenges are new desires for "serverless" computing, including consumption-based pricing and autoscaling, which raise particular challenges for data-centric applications.

This talk will highlight some key principles for cloud programming that came out of database research, including the CALM Theorem and constructive approaches to monotonic coordination-free consistency. I will discuss a new platform called Hydro that we are building at Berkeley to take these ideas and combine them into a polyglot, pay-as-you-go platform for cloud programming and deployment. Early results on Hydro and its underlying key-value store, Anna, point to major improvements that researchers can offer to serverless computing and public clouds. The talk will also illustrate emerging cloud opportunities for application areas of interest to our community, including prediction serving, data science, and robotics.

Joe Hellerstein

April 22, 2020

Transcript

  1. A Data-Centric Lens on
    Cloud Programming and
    Serverless Computing
    JOE HELLERSTEIN, UC BERKELEY


  2. Hydro: Stateful
    Serverless and Beyond
    Avoiding Coordination
    Serverless Computing
    The CALM Theorem


  3. Sea Changes in Computing
    Cray-1, 1976: Supercomputers
    iPhone, 2007: Smart Phones
    Macintosh, 1984: Personal Computers
    PDP-11, 1970: Minicomputers

  4. New Platform + New Language = Innovation
    Cray-1, 1976: Supercomputers
    iPhone, 2007: Smart Phones
    Macintosh, 1984: Personal Computers
    PDP-11, 1970: Minicomputers

  5. The Big Question
    How will folks program the cloud?
    In a way that fosters unexpected innovation
    Distributed programming is hard!
    • Parallelism, consistency, partial failure, …
    Autoscaling makes it harder!

  6. We’ve been talking about this for a while!

  7. Industry Response: Serverless Computing
    Industry finally woke up: Serverless Computing

  8. Serverless 101: Functions-as-a-Service (FaaS)
    Enable developers (outside of AWS, Azure, Google, etc.) to program the cloud
    Access to 1000s of cores, PBs of RAM
    Fine-grained resource usage and efficiency
    Enables new economic, pricing models, etc.

  9. Serverless & the Three Promises of the Cloud
    Autoscaling
    Massive Data Processing
    Unbounded Distributed Computing
    1 STEP FORWARD, 2 STEPS BACK

  10. Three Limitations of Current FaaS (e.g. AWS Lambda)
    I/O Bottlenecks
    10-100x higher latency than SSDs, charges for each I/O.
    15-min lifetimes
    Functions routinely fail, can’t assume any session context
    No Inbound Network Communication
    Instead, “communicate” through global services on every call


  11. Still, Serverless Opens the Conversation
    Small steps to Big Questions

  12. Hydro: Stateful
    Serverless and Beyond
    Avoiding Coordination
    Serverless Computing
    + Autoscaling
    — Latency-Sensitive Data Access
    — Distributed Computing
    The CALM Theorem


  13. A First Step: Embracing State
    Program State: Local data that is managed across invocations
    Challenge 1: Data Gravity
    Expensive to move state around. This policy problem is not so hard.
    Challenge 2: Distributed Consistency
    This correctness problem is difficult and unavoidable!


  14. The Challenge: Consistency
    Ensure that distant agents agree (or will agree) on common knowledge.
    Classic example: data replication
    How do we know if they agree on the value of a mutable variable x?
    x = ❤


  15. The Challenge: Consistency
    Ensure that distant agents agree (or will agree) on common knowledge.
    Classic example: data replication
    How do we know if they agree on the value of a mutable variable x?
    If they disagree now, what could happen later?
    x = ❤
    x =


  16. Classical Consistency Mechanisms: Coordination
    Consensus (Paxos, etc.), Commit (Two-Phase Commit, etc.)

  17. Coordination Avoidance (a poem)
    the first principle of successful scalability is
    to batter the consistency mechanisms down to a minimum
    move them off the critical path
    hide them in a rarely visited corner of the system, and then
    make it as hard as possible
    for application developers
    to get permission to use them
    —James Hamilton (IBM, MS, Amazon)
    in Birman, Chockler: “Toward a Cloud Computing Research Agenda”, LADIS 2009



  18. Why Avoid Coordination?
    Waiting for control is bad
    Tail latency of a quorum of machines can be very high (straggler effects)
    Waiting leads to slowdown cascades
    It’s not just “your” problem!
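
A back-of-the-envelope sketch of the straggler effect mentioned on this slide. It assumes, purely for illustration, that each machine is independently slow with probability `p_slow` on a given request; the function name and the 1% figure are hypothetical and do not come from the talk.

```python
from math import comb


def prob_too_few_fast_replies(n: int, needed: int, p_slow: float) -> float:
    """Probability that fewer than `needed` of `n` machines reply quickly.

    Assumes each machine is slow independently with probability p_slow,
    so the number of fast replies follows a binomial distribution.
    """
    p_fast = 1.0 - p_slow
    return sum(
        comb(n, i) * p_fast**i * p_slow ** (n - i) for i in range(needed)
    )


# A request that must hear back from every machine before proceeding.
for n in (1, 10, 100):
    delayed = prob_too_few_fast_replies(n, n, 0.01)
    print(f"{n:>3} machines, all must reply: P(delayed) = {delayed:.3f}")
```

Under these assumptions, a request that waits on all of 100 machines is delayed by at least one straggler on well over half of its calls, even though each machine is slow only 1% of the time.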


  19. Towards a Solution
    Traditional distributed systems work is all about I/O
    What if we reason about application semantics?
    With thanks to Peter Bailis…


  21. Hydro: Stateful
    Serverless and Beyond
    • Autoscaling stateful?
    Avoid Coordination!
    • Semantics to the rescue?
    Avoiding Coordination
    Serverless Computing
    + Autoscaling
    — Latency-Sensitive Data Access
    — Distributed Computing
    The CALM Theorem


  22. CALM: CONSISTENCY AS LOGICAL MONOTONICITY
    Theorem (CALM): A distributed program P has a consistent,
    coordination-free distributed implementation if and only if it is
    monotonic.
    Hellerstein JM. The Declarative Imperative: Experiences and conjectures in distributed logic. ACM PODS Keynote, June 2010; ACM SIGMOD Record, Sep 2010.
    Ameloot TJ, Neven F, Van den Bussche J. Relational transducers for declarative networking. JACM, Apr 2013.
    Ameloot TJ, Ketsman B, Neven F, Zinn D. Weaker forms of monotonicity for declarative networking: a more fine-grained answer to the CALM-conjecture. ACM TODS, Feb 2016.
    Hellerstein JM, Alvaro P. Keeping CALM: When Distributed Consistency is Easy. To appear, CACM 2020.

  23. We’ll need some formal definitions

  24. Intuitively…
    Consistency: A unique outcome guaranteed regardless of network shenanigans.
    Monotonicity: The set of outcomes only grows during execution.
    Emit outputs without regret!
    Coordination: Responses we await even though we have all the data.
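
A minimal sketch of these intuitions, using two toy queries that are not from the talk. The function names and the four-item universe are assumptions made for illustration. The monotone query ("which items have arrived?") can emit answers eagerly and reaches one outcome under every delivery order; the non-monotone query ("which items are missing?") emits eager answers that later messages contradict, so what it has emitted depends on the order.

```python
from itertools import permutations


def emitted_by_monotone(messages):
    """Query: 'which items have arrived?' Emit each item on first sight."""
    emitted = set()
    for item in messages:
        emitted.add(item)  # an emitted item never needs to be retracted
    return emitted


def emitted_by_non_monotone(messages, universe):
    """Query: 'which items are missing?' answered eagerly after each message.

    Every eager answer is recorded, including ones that later arrivals
    contradict, which is exactly the 'regret' that monotone queries avoid.
    """
    seen, emitted = set(), set()
    for item in messages:
        seen.add(item)
        emitted |= universe - seen
    return emitted


messages = ["a", "b", "c"]
universe = {"a", "b", "c", "d"}

monotone_outcomes = {
    frozenset(emitted_by_monotone(order)) for order in permutations(messages)
}
non_monotone_outcomes = {
    frozenset(emitted_by_non_monotone(order, universe))
    for order in permutations(messages)
}

print(len(monotone_outcomes), "distinct outcome(s) for the monotone query")
print(len(non_monotone_outcomes), "distinct outcome(s) for the eager non-monotone query")
```

To answer the non-monotone query correctly, a node has to wait until it knows no more messages are coming, which is the kind of waiting the slide calls coordination.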


  25. Two Canonical Examples
    Distributed Deadlock: Once you observe the existence
    of a waits-for cycle, you can (autonomously) declare
    deadlock. More information will not change the result.
    Garbage Collection: Suspecting garbage (the
    non-existence of a path from root) is not enough; more
    information may change the result. Hence you are
    required to check all nodes for information (under any
    assignment of objects to nodes!)
    Deadlock!
    Garbage?
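
The two examples can be sketched on small edge lists. This is an illustrative sketch only: the graphs, names, and helper functions below are hypothetical, and a real system would hold the edges on different nodes. A cycle found in a partial waits-for graph survives any additional edges, whereas "no path from root" can be overturned by one more edge.

```python
def has_cycle(edges):
    """Return True if the directed graph given as (src, dst) pairs has a cycle."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}

    def visit(node):
        color[node] = GRAY  # node is on the current depth-first path
        for nxt in graph.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GRAY:
                return True  # back edge: a cycle exists
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in list(graph))


def reachable(root, edges):
    """Return the set of nodes reachable from root over the known edges."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
    seen, stack = {root}, [root]
    while stack:
        for nxt in graph.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


# Deadlock is monotone: more waits-for edges cannot remove the cycle.
partial_waits = [("T1", "T2"), ("T2", "T1")]
more_waits = partial_waits + [("T3", "T1")]
deadlock_early, deadlock_late = has_cycle(partial_waits), has_cycle(more_waits)

# Garbage is non-monotone: one more reference edge changes the answer.
partial_refs = [("root", "A")]
more_refs = partial_refs + [("A", "B")]
garbage_early = "B" not in reachable("root", partial_refs)
garbage_late = "B" not in reachable("root", more_refs)

print(deadlock_early, deadlock_late, garbage_early, garbage_late)
```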


  26. That’s interesting. Who cares?
    CALM thinking inspires crazy-fast, infinitely-scalable systems
    No coordination = insane parallelism and smooth scalability
    E.g. we’ll see the Anna KVS in a few slides
    We can actually check monotonicity syntactically in a logic language!
    E.g. in SQL. Or Bloom.
    But who writes distributed programs in logic?!
    CALM explains CAP, the times when we get Safety+Liveness
    A conversation for another day…
    http://bit.ly/calm-cacm


  27. Hydro: Stateful
    Serverless and Beyond
    • Autoscaling stateful?
    Avoid Coordination!
    • Semantics to the rescue?
    Avoiding Coordination
    Serverless Computing
    + Autoscaling
    — Latency-Sensitive Data Access
    — Distributed Computing
    The CALM Theorem
    Monotonicity is the “bright line”
    between what can and cannot be done
    coordination-free


  28. Hydro
    Hydro: A Platform for Programming the Cloud
    Anna: autoscaling multi-tier KVS
    ICDE18, VLDB19
    HydroLogic: a disorderly IR
    Hydrolysis: a cloud compiler toolkit
    Cloudburst: Stateful FaaS
    https://arxiv.org/abs/2001.04592
    f(x)
    ?
    Logic
    Programming
    Functional
    Reactive
    Actors Futures


  29. Anna Serverless KVS

    Anyscale: perform like Redis, scale like S3

    CALM consistency levels via simple lattices

    Autoscaling & multitier serverless storage

    Won best-of-conference at ICDE, VLDB [1, 2]
    [1] Wu, Chenggang, et al. "Anna: A KVS for any scale." IEEE Transactions on Knowledge and Data Engineering (2019).
    [2] Wu, Chenggang, Vikram Sreekanti, and Joseph M. Hellerstein. "Autoscaling tiered cloud storage in Anna." PVLDB 12.6 (2019): 624-638.

  30. Anna Performance
    Shared-nothing at all scales (even across threads)
    Crazy fast under contention
    Up to 700x faster than Masstree within a multicore machine
    Up to 10x faster than Cassandra in a geo-distributed deployment
    Coordination-free consistency. No atomics, no locks,
    no waiting ever!
    700x!


  31. CALM Consistency
    Simple, clean lattice composition
    gives range of consistency levels
    Lines of C++ code modified by system component
    KEEP CALM AND WRITE(X)
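
A minimal sketch of what lattice composition can look like. Anna itself is written in C++; the Python classes and names below are hypothetical illustrations, not Anna's API. Each lattice has a `merge` that is associative, commutative, and idempotent, and a map lattice is composed from per-key lattices, so replicas that see the same updates in different orders converge without coordinating.

```python
from dataclasses import dataclass
from functools import reduce
from itertools import permutations


@dataclass(frozen=True)
class SetUnion:
    """Grow-only set: merge is set union."""
    items: frozenset

    def merge(self, other: "SetUnion") -> "SetUnion":
        return SetUnion(self.items | other.items)


@dataclass(frozen=True)
class LastWriterWins:
    """Register composed from a timestamp and a value; the larger pair wins."""
    timestamp: int
    value: str

    def merge(self, other: "LastWriterWins") -> "LastWriterWins":
        # Ties on timestamp are broken by value so merge stays commutative.
        return max(self, other, key=lambda r: (r.timestamp, r.value))


def merge_maps(left: dict, right: dict) -> dict:
    """Map lattice: merge two key -> lattice maps key by key."""
    merged = dict(left)
    for key, incoming in right.items():
        merged[key] = merged[key].merge(incoming) if key in merged else incoming
    return merged


updates = [
    {"x": LastWriterWins(1, "a")},
    {"x": LastWriterWins(3, "c")},
    {"x": LastWriterWins(2, "b")},
    {"followers": SetUnion(frozenset({"u1"}))},
    {"followers": SetUnion(frozenset({"u2"}))},
]

# Deliver the same updates in every possible order, as a network might.
final_states = [reduce(merge_maps, order, {}) for order in permutations(updates)]
print(all(state == final_states[0] for state in final_states))
```

Swapping the per-key lattice (for example, a set union instead of a last-writer-wins register) changes the guarantee a key provides without touching the merge loop, which is one reading of "a range of consistency levels" from simple composition.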


  32. Autoscaling & Multi-Tier Cost Tradeoffs
    350x the performance of
    DynamoDB for the same price!

  33. Cloudburst:
    A Stateful Serverless Platform
    Main Challenge: Cache consistency!
    Hydrocache: new consistency protocols for
    distributed client “sessions”
    Compute
    Storage


  34. Multiple Consistency Levels Here Too
    Read Atomic transactions
    AFT [1]: a fault tolerance shim layer between any FaaS and any object store
    • Currently evaluated between AWS Lambda and AWS S3!
    Multisite Transactional Causal Consistency (MTCC) [2]
    Causal: Preserve Lamport’s happened-before relation
    Multisite transactional: Nested functions running across multiple machines.
    [1] Sreekanti, Vikram, et al. A Fault-Tolerance Shim for Serverless Computing. To appear, EuroSys (2020).
    [2] Wu, Chenggang, et al. Transactional Causal Consistency for Serverless Computing. To appear, ACM SIGMOD (2020).
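
For readers unfamiliar with the happened-before relation, here is a small sketch using vector clocks, a standard textbook way to track it. The slide does not say how MTCC tracks causality, so this is an assumption for illustration only; the function names and the `f1`, `f2`, `f3` identifiers are hypothetical.

```python
def happened_before(a: dict, b: dict) -> bool:
    """True if vector clock a precedes b: a <= b everywhere and a < b somewhere."""
    keys = a.keys() | b.keys()
    no_entry_ahead = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    some_entry_behind = any(a.get(k, 0) < b.get(k, 0) for k in keys)
    return no_entry_ahead and some_entry_behind


def concurrent(a: dict, b: dict) -> bool:
    """True if neither clock precedes the other (each is ahead somewhere)."""
    keys = a.keys() | b.keys()
    a_ahead = any(a.get(k, 0) > b.get(k, 0) for k in keys)
    b_ahead = any(a.get(k, 0) < b.get(k, 0) for k in keys)
    return a_ahead and b_ahead


def merge(a: dict, b: dict) -> dict:
    """Entry-wise maximum: the clock of an event that has seen both a and b."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}


w1 = {"f1": 1}                # function f1 writes
w2 = merge(w1, {"f2": 1})     # f2 reads w1, then writes: causally after w1
w3 = {"f3": 1}                # f3 writes without having seen w1 or w2

print(happened_before(w1, w2), concurrent(w1, w3))
```

A causally consistent store must never show a reader `w2` without also making `w1` visible, while `w1` and `w3` may be observed in either order.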


  35. Running a Twitter Clone on Cloudburst
    [Bar chart: read and write latency (ms, axis ticks 1 to 1000) for Cloudburst (LWW), Cloudburst (Causal), and Redis]

  36. Prediction Serving on Cloudburst
    [Bar chart: latency (ms, axis ticks 200 to 1400) for Python, Cloudburst, AWS SageMaker, and AWS Lambda]

  37. Applications on Cloudburst
    Hydro
    Anna: autoscaling multi-tier KVS
    ICDE18, VLDB19
    HydroLogic: a disorderly IR
    Hydrolysis: a cloud compiler toolkit
    ?
    Logic
    Programming
    Functional
    Reactive
    Actors Futures
    Cloudburst: Stateful FaaS
    https://arxiv.org/abs/2001.04592
    f(x)


  38. Applications on Cloudburst
    Hydro
    Anna: autoscaling multi-tier KVS
    ICDE18, VLDB19
    Cloudburst: Stateful FaaS
    https://arxiv.org/abs/2001.04592
    f(x)
    Serverless Data Science; Robot Motion Planning; ML Prediction: ModelZoo
    Charles Lin
    Devin Petersohn
    Simon Mo
    Rehan Durrani
    Aditya Ramkumar
    Avinash Arjavalingam
    Jeffrey Ichnowski


  39. ModelZoo on Cloudburst


  40. Why Serverless Jupyter?
    Large Jupyter deployments!

    Berkeley DataHub: Jupyter deployment that serves over
    37,000 students!

    Scaling issues

    Resource efficiency issues


  41. A single user’s compute demands
    RESOURCE
    DEMANDS
    TIME
    Running a cell Typing; thinking;
    not at computer


  42. RESOURCE
    DEMANDS
    +
    … (x 37000)
    + 37000


  43. Deadline rush
    Light utilization
    RESOURCE
    DEMANDS
    TIME


  44. Jupyter on Cloudburst

    A prototype Jupyter notebook that has been ported to
    execute on Cloudburst

    Each cell is a serverless function execution

    Notebooks hold zero provisioned compute when not
    running a cell!


  45. Cloudburst


  46. Cloudburst


  47. Cloudburst
    x: 3


  48. Cloudburst
    x: 3
    Program state stored in Anna
    Each cell retrieves only the
    definitions it needs
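
One way to picture "each cell retrieves only the definitions it needs" is a static scan of the cell's source for names it reads before defining. This is a rough sketch, not the prototype's implementation: a plain `dict` stands in for Anna, the helper names are hypothetical, and the scan only understands simple assignments (it ignores `def`, `import`, and comprehension scoping).

```python
import ast
import builtins


def names_needed(cell_source: str) -> set:
    """Names a cell reads before defining them itself (builtins excluded)."""
    defined, needed = set(), set()
    for stmt in ast.parse(cell_source).body:
        names = [n for n in ast.walk(stmt) if isinstance(n, ast.Name)]
        for n in names:
            is_read = isinstance(n.ctx, ast.Load)
            if is_read and n.id not in defined and not hasattr(builtins, n.id):
                needed.add(n.id)
        # Names assigned by this statement are available to later statements.
        defined.update(n.id for n in names if isinstance(n.ctx, ast.Store))
    return needed


def run_cell(cell_source: str, store: dict) -> set:
    """Run a cell, fetching only the names it needs from the store."""
    wanted = {name for name in names_needed(cell_source) if name in store}
    env = {name: store[name] for name in wanted}
    exec(cell_source, env)
    # Write the cell's resulting definitions back to the shared store.
    store.update({k: v for k, v in env.items() if k != "__builtins__"})
    return wanted


store = {"x": 3, "big_table": list(range(1000))}  # stand-in for Anna
fetched = run_cell("y = x * 2", store)
print(fetched, store["y"])
```

Here the cell reads `x` but never touches `big_table`, so only `x` is fetched before the cell runs.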


  49. Demo: Jupyter on Cloudburst


  50. Beyond operational benefits

    Serverless architecture does much more than just
    address Jupyter’s scaling and cost problems

    Also enables new directions for Jupyter!

  51. Demo: Spiking memory usage


  52. This is only a taste of what’s possible.
    Future work: Choosing instance types for cells


  53. Cell returns immediately!
    `table` loads in background
    This is only a taste of what’s possible.
    Future work: Automatic asynchronous evaluation
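
The "cell returns immediately, `table` loads in background" behavior resembles a future. The sketch below uses Python's standard `concurrent.futures` on a local thread to show the idea; it is not how the Cloudburst prototype is implemented, and `load_table` with its 0.2 second delay is a made-up stand-in for a slow load.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def load_table():
    """Hypothetical slow load, standing in for reading a large table."""
    time.sleep(0.2)
    return [("row", i) for i in range(3)]


executor = ThreadPoolExecutor(max_workers=1)

start = time.perf_counter()
table = executor.submit(load_table)  # returns a Future right away
submit_seconds = time.perf_counter() - start

# The "cell" is done at this point; the load continues in the background.
# A later cell that actually uses the data blocks only when it asks for it.
rows = table.result()
executor.shutdown()

print(f"submit took {submit_seconds:.4f}s, loaded {len(rows)} rows")
```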


  54. Cloudburst
    a:
    np.array(...)
    This is only a taste of what’s possible.
    Future work: True notebook sharing


  55. Scalable Serverless Robot Motion
    Planning with Cloudburst and Anna
    Jeffrey Ichnowski, Chenggang Wu, Jackson Chui, Raghav Anand, Samuel Paradis,
    Vikram Sreekanti, Joseph Hellerstein, Joseph E. Gonzalez, Ion Stoica, Ken Goldberg
    AUTOLAB


  56. Robot Wants to Get Around Room


  57. Motion Planning Compute Requirements
    Navigation Manipulation
    Different
    problems require
    different amounts
    of computing
    High CPU
    Low CPU
    “Go to office desk”
    “Declutter desk”


  58. Motion Planning Compute Requirements
    [Chart: Robot’s CPU usage (0 to 100) over time (0 to 20), annotated in order: simple planning problem, robot moving, complex planning problem, robot moving]

  59. Motion Planning in Serverless Computing
    λ × 8
    Requirements:
    • Low-latency simultaneous
    launch of 100s of functions

    • Fast sharing of best path
    between functions

    • Conflict resolution based on
    path cost
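
The last two requirements can be pictured as a merge rule that always keeps the cheaper path, so functions can share candidates in any order and still agree on the best one. The class, costs, and waypoint names below are hypothetical; this is a sketch of the idea, not the planner's code or Anna's API.

```python
from dataclasses import dataclass
from functools import reduce
from itertools import permutations


@dataclass(frozen=True)
class BestPath:
    """A candidate motion plan; merging two candidates keeps the cheaper one."""
    cost: float
    waypoints: tuple

    def merge(self, other: "BestPath") -> "BestPath":
        # Ties on cost are broken by waypoints so merge stays commutative.
        return min(self, other, key=lambda p: (p.cost, p.waypoints))


# Candidate paths reported by three different functions.
reports = [
    BestPath(92.0, ("start", "a", "goal")),
    BestPath(41.5, ("start", "b", "goal")),
    BestPath(60.0, ("start", "c", "goal")),
]

# Whatever order the reports arrive in, every replica keeps the same winner.
winners = {reduce(BestPath.merge, order) for order in permutations(reports)}
print(winners)
```

Because the merge only ever lowers the stored cost, a function can act on the best path it currently sees without waiting to hear from the others.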


  60. Lambda Communication
    λ × 8
    AWS API Endpoint
    “start 8 lambdas”
    How do they communicate?
    Originally Built on AWS Lambda


  61. Lambda Communication
    λ × 8
    AWS API Endpoint
    “start 8 lambdas”
    How do they communicate?
    Originally Built on AWS Lambda


  62. Lambda Communication
    λ × 8
    AWS API Endpoint
    “start 8 lambdas”
    coordinator
    (EC2 1-core
    instance)
    my IP is 192.168.0.54


  63. Communication Bottleneck
    λ × 8
    AWS API Endpoint
    “start 8 lambdas”
    coordinator
    (EC2 1-core
    instance)
    my IP is 192.168.0.54
    Bottleneck
    on number of
    lambdas


  64. Overcoming Communication Bottleneck
    λ × 8
    Cloudburst API Endpoint
    “start 8 executors”
    Anna
    “use Anna key: (…)”
    Anna


  65. Before Anna Lattices
    λa, λb, λc
    …, τa₃, τa₂, τa₁
    …, τb₃, τb₂, τb₁
    …, τc₃, τc₂, τc₁
    τb₁, τc₁, τb₂, τc₂, τc₃, τb₃, …
    τa₁, τc₁, τa₂, τc₂, τc₃, τa₃, …
    τa₁, τb₁, τa₂, τb₂, τb₃, τa₃, …
    coordinator (EC2 1-core instance)

  66. After Anna Lattices
    λa, λb, λc
    …, τa₃, τa₂, τa₁
    …, τb₃, τb₂, τb₁
    …, τc₃, τc₂, τc₁
    τb₁, τc₁, τa₃, …
    τa₁, τa₂, τc₃, …
    τa₁, τb₁, τa₃, …
    Anna KVS


  69. Decluttering with Cloudburst + Anna
    Bottle to grasp


  70. Cloudburst + Anna: Motion Plan Cost Over Time
    [Line chart: motion plan cost (sum of joint angle changes, axis 30 to 100) over time (10 s to 60 s) for 10 and for 100 concurrent Cloudburst functions]
    Cost w/10 functions after 60 seconds
    = cost w/100 functions after 2 seconds
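
A quick worked calculation of what that equivalence implies, under two simplifying assumptions that are not stated on the slide: every function runs for the whole interval, and compute consumed is proportional to function-seconds. The helper name is hypothetical.

```python
def function_seconds(num_functions: int, wall_clock_seconds: float) -> float:
    """Total compute consumed, assuming every function runs the whole time."""
    return num_functions * wall_clock_seconds


small_fleet = function_seconds(10, 60)   # 10 functions for 60 seconds
large_fleet = function_seconds(100, 2)   # 100 functions for 2 seconds

wall_clock_speedup = 60 / 2
compute_ratio = small_fleet / large_fleet

print(f"{wall_clock_speedup:.0f}x faster, {compute_ratio:.0f}x less total compute")
```

Under those assumptions, reaching the same plan cost with 100 functions is 30 times faster in wall-clock time and uses a third of the function-seconds.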


  71. ROBOT DEMO


  72. • Anna: CALM anyscale KVS
    • Stateful FaaS: cache consistency
    • Stateful FaaS applications
    Hydro: Stateful
    Serverless and Beyond
    • Autoscaling stateful?
    Avoid Coordination!
    • Semantics to the rescue?
    Avoiding Coordination
    Serverless Computing
    + Autoscaling
    — Latency-Sensitive Data Access
    — Distributed Computing
    The CALM Theorem
    Monotonicity is the “bright line”
    between what can and cannot be done
    coordination-free


  73. Did We Answer the Big Question? Not Yet.
    We’re pushing the state of the art in FaaS
    But Stateful FaaS is a limited API
    Python functions + explicit storage
    With limited contracts from the PL
    Developer must reason about consistency guarantees
    And decide when app logic needs to coordinate
    The real dream takes time!

  74. Research Futures
    Hydro
    Anna: autoscaling multi-tier KVS
    ICDE18, VLDB19
    HydroLogic: a disorderly IR
    Hydrolysis: a cloud compiler toolkit
    ?
    Logic
    Programming
    Functional
    Reactive
    Actors Futures
    Cloudburst: Stateful FaaS
    https://arxiv.org/abs/2001.04592
    f(x)


  75. More Information
    Hydro: https://github.com/hydro-project
    Bloom: http://bloom-lang.net
    RiseLab: https://rise.cs.berkeley.edu
    [email protected]
    @joe_hellerstein
    Chenggang Wu
    Vikram Sreekanti
    Joseph Gonzalez
    Johann Schleier-Smith
    Charles Lin
    Music composed and performed by: