When Worst is Best in Distributed Systems Design

pbailis
September 25, 2015

StrangeLoop 2015
25 September 2015
St. Louis, MO

Talk video: https://www.youtube.com/watch?v=ZGIAypUUwoQ
More information: http://bailis.org/

In many areas of systems design, provisioning for worst-case behavior (e.g., load spikes and anomalous user activity) incurs sizable penalties (e.g., performance and operational overheads) in the typical and best cases. However, in distributed systems, building software that is resilient to worst-case network behavior can, perhaps paradoxically, lead to improved behavior in typical and best-case scenarios. That is, systems that don't rely on synchronous communication (or coordination) in the worst case frequently aren't forced to wait in any case, which improves latency, scalability, and performance via increased concurrency.

In this talk, we'll explore how to use this worst-case analysis as a more general design principle for scalable systems design. As developers increasingly interact with and build our own distributed systems, we tend to fixate only on failure scenarios (e.g., "partition tolerance" in the CAP Theorem); this is an important first step, but it's not the whole story. To illustrate why, I'll present practical lessons learned from applying this principle to both web and transaction processing applications as well as database internals such as integrity constraints and indexes. We've found considerable evidence that many of these common tasks and workloads can benefit substantially (e.g., regular order-of-magnitude speedups) from this analysis. In all likelihood, you can too.

Transcript

  1. WHEN WORST IS BEST
    in distributed systems design

    Peter Bailis
    Stanford CS
    @pbailis

    StrangeLoop 2015
    25 September, St. Louis

  2. What if we designed computer
    systems for worst case scenarios?

  3. What if we designed computer
    systems for worst case scenarios?
    Cluster provisioning: 7.3B simultaneous users
    many idle resources!

  4. What if we designed computer
    systems for worst case scenarios?
    Cluster provisioning: 7.3B simultaneous users
    many idle resources!
    Hardware: chips for the next Mars rover
    hugely expensive packaging!

  5. What if we designed computer
    systems for worst case scenarios?
    Cluster provisioning: 7.3B simultaneous users
    many idle resources!
    Hardware: chips for the next Mars rover
    hugely expensive packaging!
    Security: all our developers are malicious
    expensive code deployment!

  6. Designing for
    the worst case

    often penalizes
    the average case

  7. Designing for the worst case
    often penalizes the average case
    [Plot axes: Average case performance vs. Worst case performance]

  8. Designing for the worst case
    often penalizes the average case
    [Plot axes: Average case performance vs. Worst case performance;
    annotation: ??? (this talk)]

  9. This talk: When can designing for
    the worst case improve the
    average case?
    Structure
    Distributed systems and the network

    Beyond the network

    Lessons

  10. This talk: When can designing for
    the worst case improve the
    average case?
    Structure
    Distributed systems and the network

    Beyond the network

    Lessons

  11. Almost every non-trivial application today is (or is
    becoming) distributed
    Distributed Systems Matter
    Distribution happens over a network

  12. Almost every non-trivial application today is (or is
    becoming) distributed
    Corollary:
    Almost every non-trivial application today needs to
    worry about the network
    Distributed Systems Matter
    Distribution happens over a network

  13. Networks make design hard
    Many things can go wrong:

  14. Networks make design hard
    Many things can go wrong:
    Packets may be delayed
    Packets may be dropped
    Sometimes called an asynchronous network

  15. Handling Worst-Case Net Behavior
    availability addresses delays, drops:
    any replica can respond to any request

  16. Handling Worst-Case Net Behavior
    availability addresses delays, drops:
    any replica can respond to any request

  17. Handling Worst-Case Net Behavior
    availability addresses delays, drops:
    any replica can respond to any request

  18. Handling Worst-Case Net Behavior
    availability addresses delays, drops:
    any replica can respond to any request

  19. Handling Worst-Case Net Behavior
    availability addresses delays, drops:
    any replica can respond to any request

  20. Handling Worst-Case Net Behavior
    availability addresses delays, drops:
    any replica can respond to any request

  21. Handling Worst-Case Net Behavior
    availability addresses delays, drops:
    any replica can respond to any request
    if our system is available,
    then even when network is fine,
    we still don’t have to talk!

  22. Handling Worst-Case Net Behavior
    availability addresses delays, drops:
    any replica can respond to any request
    if our system is available,
    then even when network is fine,
    we still don’t have to talk!
    NO COORDINATION

  23. Coordination-free systems
    What if we don’t have to talk?

  24. Coordination-free systems:

    1.) Enable infinite scale-out
    What if we don’t have to talk?

  25. Coordination-free systems:

    1.) Enable infinite scale-out
    What if we don’t have to talk?

  26. Coordination-free systems:

    1.) Enable infinite scale-out
    What if we don’t have to talk?

  27. Coordination-free systems:

    1.) Enable infinite scale-out
    What if we don’t have to talk?

  28. Coordination-free systems:

    1.) Enable infinite scale-out
    What if we don’t have to talk?

  29. [Figure: DISTRIBUTED TRANSACTIONS (EC2)
    diagram labels: A B C D E F G H
    x-axis: Number of Servers (Items) Accessed per Transaction (1 to 7)
    y-axis: Throughput (txns/s)]

  30. [Figure: DISTRIBUTED TRANSACTIONS (EC2)
    diagram labels: A B C D E F G H
    plot labels: IN-MEMORY LOCKING, COORDINATED
    x-axis: Number of Servers (Items) Accessed per Transaction (1 to 7)
    y-axis: Throughput (txns/s)]

  31. [Figure: DISTRIBUTED TRANSACTIONS (EC2)
    diagram labels: A B C D E F G H
    plot labels: IN-MEMORY LOCKING, COORDINATED
    annotations: LOG SCALE!, -398x
    x-axis: Number of Servers (Items) Accessed per Transaction (1 to 7)
    y-axis: Throughput (txns/s)]

  32. [Figure: DISTRIBUTED TRANSACTIONS (EC2)
    diagram labels: A B C D E F G H
    plot labels: IN-MEMORY LOCKING, COORDINATED, COORDINATION-FREE
    annotation: -398x
    x-axis: Number of Servers (Items) Accessed per Transaction (1 to 7)
    y-axis: Throughput (txns/s)]

  33. Coordination-free systems:

    1.) Enable infinite scale-out

    2.) Improve throughput

    What if we don’t have to talk?

  34. 133.7+ ms
    RTT

  35. 133.7+ ms
    RTT

  36. 133.7+ ms
    RTT
    85.1+ ms
    RTT

  37. What if we don’t have to talk?
    Coordination-free systems:

    1.) Enable infinite scale-out

    2.) Improve throughput

    3.) Ensure low latency

    4.) Guarantee “always on” response

  38. What if we don’t have to talk?
    Coordination-free systems:

    1.) Enable infinite scale-out

    2.) Improve throughput

    3.) Ensure low latency

    4.) Guarantee “always on” response

  39. Coordination-free systems:

    1.) Enable infinite scale-out

    2.) Improve throughput

    3.) Ensure low latency

    4.) Guarantee “always on” response
    What if we don’t have to talk?

  40. But wait! What about CAP?!?!
    • CAP Thm.: Famous result from Eric Brewer, Inktomi
    • Takeaway (+ related results): properties like serializability
    require unavailability (or require coordination)
    • Common (incorrect) conclusion: availability is too
    expensive, only matters during failures, so forget about it

  41. But wait! What about CAP?!?!
    • CAP Thm.: Famous result from Eric Brewer, Inktomi
    • Takeaway (+ related results): properties like serializability
    require unavailability (or require coordination)
    • Common (incorrect) conclusion: availability is too
    expensive, only matters during failures, so forget about it
    surprise: many useful guarantees don’t
    require coordination (or unavailability)!

  42. “Worst” is a Design Tool
    legacy implementations: designed for single-
    node context, use coordination
    research question: what if we built systems that
    didn’t have to coordinate?
    result: new designs that avoid coordination unless
    strictly necessary
    Example: Coordination-Avoiding Databases

  43. Simple Example: Read Committed
    goal: never read from uncommitted transactions
    legacy implementation: lock records during access
    research question: is coordination necessary?

  44. Simple Example: Read Committed
    goal: never read from uncommitted transactions
    legacy implementation: lock records during access
    research question: is coordination necessary?
    result: no! for example, buffer writes until commit
    result: order-of-magnitude (OOM) speedups over classic implementations
    VLDB 2014, SIGMOD 2015
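
    Note: the "buffer writes until commit" idea can be sketched roughly as
    follows. This is a minimal single-process illustration, not the
    implementation from the cited VLDB/SIGMOD work; the names Store and
    Transaction are hypothetical, and durability, concurrency control, and
    replication are deliberately left out.

```python
class Store:
    """Committed key-value state; the only data other transactions can see."""

    def __init__(self):
        self.committed = {}

    def begin(self):
        return Transaction(self)


class Transaction:
    """Buffers writes privately so readers never observe uncommitted data."""

    def __init__(self, store):
        self.store = store
        self.write_buffer = {}

    def write(self, key, value):
        # No lock is taken: the write stays local until commit().
        self.write_buffer[key] = value

    def read(self, key):
        # A transaction sees its own pending writes, otherwise committed data.
        if key in self.write_buffer:
            return self.write_buffer[key]
        return self.store.committed.get(key)

    def commit(self):
        # Publish all buffered writes, then clear the buffer.
        self.store.committed.update(self.write_buffer)
        self.write_buffer = {}

    def abort(self):
        # Dropping the buffer is enough: nothing was ever made visible.
        self.write_buffer = {}


store = Store()
t1, t2 = store.begin(), store.begin()
t1.write("x", 1)
before = t2.read("x")  # t1 has not committed, so t2 cannot see its write
t1.commit()
after = t2.read("x")   # the committed value is now visible to t2
```

    Because readers only ever consult committed state, they never wait on a
    writer's lock, which is the point the slide makes about coordination.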

  45. What if we don’t have to talk?
    Coordination-free systems:

    1.) Enable infinite scale-out

    2.) Improve throughput

    3.) Ensure low latency

    4.) Guarantee “always on” response

  46. Coordination-free systems:

    1.) Enable infinite scale-out

    2.) Improve throughput

    3.) Ensure low latency

    4.) Guarantee “always on” response
    What if we don’t have to talk?

  47. Coordination-free systems:

    1.) Enable infinite scale-out

    2.) Improve throughput

    3.) Ensure low latency

    4.) Guarantee “always on” response
    What if we don’t have to talk?
    Accounting for worst case improves average case

  48. Punchline: Distributed Systems & Networks
    • Systems that behave well during network faults can
    behave better in non-faulty environments too
    • With good designs, popular guarantees from today’s
    RDBMSs can benefit! (see also Martin’s talk, 11AM Sat)
    • Research on coordination-avoiding systems highlights
    potential for huge speedups (see bailis.org)
    • Keywords: CRDTs, I-confluence, RAMP, HAT, Bloom^L
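
    Note: as one concrete instance of the "CRDTs" keyword, a grow-only
    counter (G-Counter) lets every replica accept increments locally and
    reconcile later. The sketch below is illustrative only; the class name
    and replica ids are assumptions, and it is not taken from the talk.

```python
class GCounter:
    """Grow-only counter CRDT: each replica increments only its own slot."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> increments observed from that replica

    def increment(self, amount=1):
        # Purely local update: no message to any other replica is needed.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Per-slot max is commutative, associative, and idempotent, so
        # replicas converge regardless of message order or duplication.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


a, b = GCounter("a"), GCounter("b")
a.increment()
a.increment()
b.increment()
a.merge(b)  # state exchange can happen at any later time
b.merge(a)
```

    Either replica can answer value() at any moment without talking to the
    other; merging only makes the answer more up to date.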

  49. This talk: When can designing for
    the worst case improve the
    average case?
    Structure
    Distributed systems and the network

    Beyond the network

    Lessons

  50. Replication for fault tolerance
    can increase request capacity
    Replication helps Capacity

  51. Fail-over helps (Dev)Ops

  52. Fail-over helps (Dev)Ops

  53. If services can
    auto-fail-over…
    can kill processes:

    to perform upgrades

    to manage stragglers

    to revoke resources
    Fail-over helps (Dev)Ops

  54. 99.9th %ile latency: 100ms
    avg latency: 1.2ms
    YOUR
    SERVICE
    HERE
    Tail Latency in (Micro)services

  55. 99.9th %ile latency: 100ms
    avg latency: 1.2ms
    YOUR
    SERVICE
    HERE
    10ms
    Tail Latency in (Micro)services

  56. 99.9th %ile latency: 100ms
    avg latency: 1.2ms
    YOUR
    SERVICE
    HERE
    10ms
    1.09ms
    Tail Latency in (Micro)services

  57. 99.9th %ile latency: 100ms
    Tail Latency in (Micro)services

  58. front-end avg. latency: 64ms
    at 100x fan-out,
    99.9th %ile latency: 100ms
    Tail Latency in (Micro)services

  59. front-end avg. latency: 64ms
    at 100x fan-out,
    99.9th %ile latency: 100ms 10ms
    Tail Latency in (Micro)services

  60. front-end avg. latency: 64ms
    at 100x fan-out,
    6.7ms
    99.9th %ile latency: 100ms 10ms
    Tail Latency in (Micro)services
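
    Note: a rough way to see why fan-out turns a rare tail into a common
    case is to compute the chance that at least one of N parallel backend
    calls lands in the slow tail. The sketch assumes backend latencies are
    independent, which real services may not satisfy. It does not reproduce
    the slide's 64ms and 6.7ms figures, which depend on a latency
    distribution the transcript does not give.

```python
def p_any_slow(fan_out, percentile=0.999):
    """Probability that at least one of `fan_out` independent calls
    exceeds the given latency percentile."""
    return 1.0 - percentile ** fan_out


p1 = p_any_slow(1)      # a single call: 0.1% chance of hitting the tail
p100 = p_any_slow(100)  # 100x fan-out: roughly a 9.5% chance
```

    Under this assumption, about one front-end request in ten waits on a
    99.9th-percentile backend response at 100x fan-out, so trimming the
    backend tail directly improves the front-end's typical latency.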

  61. YOUR SERVICE’S
    CORNER CASE
    MAY BE ITS
    CONSUMER’S
    AVERAGE CASE

  62. Universal Design

  63. Universal Design

  64. (no slide text)

  65. There is also a strong business case for accessibility.
    Accessibility overlaps with other best practices such as mobile web
    design, device independence, multi-modal interaction, usability,
    design for older users, and search engine optimization (SEO).

    Case studies show that accessible websites have better search
    results, reduced maintenance costs, and increased audience reach,
    among other benefits.

  66. x
    f(x)
    When “Best” Is Brittle
    Idealized function
    Optimum

  67. x
    f(x)
    When “Best” Is Brittle
    Idealized function
    Optimum
    Less well-behaved
    x
    f(x)
    Optimum

  68. x
    f(x)
    When “Best” Is Brittle
    Idealized function
    Optimum
    Less well-behaved
    x
    f(x)
    Optimum
    Missed the target

  69. x
    f(x)
    When “Best” Is Brittle
    Idealized function
    Optimum
    Less well-behaved
    x
    f(x)
    Optimum
    Missed the target
    “Stable”
    solution

  70. x
    f(x)
    When “Best” Is Brittle
    Idealized function
    Optimum
    Less well-behaved
    x
    f(x)
    Optimum
    Missed the target
    “Stable”
    solution
    Robust Optimization studies finding the stable solution
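
    Note: the contrast between the "optimum" and the "stable" solution can
    be sketched with a brute-force grid search. The objective f below is a
    made-up function (a tall narrow spike plus a lower broad hill), and the
    perturbation radius is an arbitrary assumption; real robust optimization
    methods are far more sophisticated than this max-min search.

```python
import math


def f(x):
    # Hypothetical objective: narrow spike near x=2, broad hill near x=7.
    spike = 10.0 * math.exp(-((x - 2.0) ** 2) / 0.01)
    hill = 8.0 * math.exp(-((x - 7.0) ** 2) / 4.0)
    return spike + hill


def worst_case(x, radius=0.5, steps=21):
    # Lowest value of f if the input may drift anywhere in [x - radius, x + radius].
    return min(f(x - radius + 2 * radius * i / (steps - 1)) for i in range(steps))


grid = [i / 100 for i in range(0, 1001)]     # candidate inputs 0.00 .. 10.00
nominal_best = max(grid, key=f)              # best if we hit the target exactly
robust_best = max(grid, key=worst_case)      # best under the worst perturbation
```

    Here the nominal optimum sits on the spike, where a small miss drops
    the value to nearly zero, while the max-min choice sits on the broad
    hill and loses only a little when the target is missed.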

  71. This talk: When can designing for
    the worst case improve the
    average case?
    Structure
    Distributed systems and the network

    Beyond the network

    Lessons

  72. This talk: When can designing for
    the worst case improve the
    average case?

  73. When does this apply?
    When corner cases are common
    When environmental conditions are variable
    When “normal” isn’t what we think
    This talk: When can designing for
    the worst case improve the
    average case?

  74. DEFINING
    “NORMAL”
    DEFINES OUR
    DESIGNS

  75. “Worst” raises tough questions

  76. Cluster provisioning:
    what’s our scale-out strategy?
    “Worst” raises tough questions

  77. Cluster provisioning:
    what’s our scale-out strategy?
    Hardware:
    what happens during bit flips? do we need ECC?
    “Worst” raises tough questions

  78. Cluster provisioning:
    what’s our scale-out strategy?
    Hardware:
    what happens during bit flips? do we need ECC?
    Security:
    how do we manage internal data accesses?
    “Worst” raises tough questions

  79. EXAMINE
    YOUR
    BIASES

  80. Reasoning about worst-case scenarios
    can be a powerful design tool
    Key to coordination-avoiding distributed
    systems designs
    Can often improve performance and
    robustness, also combat bias
    @PBAILIS // bailis.org

  81. Special thanks to

    David Andersen, Ali Ghodsi, Joe Hellerstein, Eddie Kohler,
    Phil Levis, Alex Miller, Oscar Moll, Barzan Mozafari, Ion
    Stoica, Eugene Wu, Jean Yang, Matei Zaharia

  82. Reasoning about worst-case scenarios
    can be a powerful design tool
    Key to coordination-avoiding distributed
    systems designs
    Can often improve performance and
    robustness, also combat bias
    @PBAILIS // bailis.org
