When Worst is Best in Distributed Systems Design

pbailis
September 25, 2015

When Worst is Best in Distributed Systems Design

StrangeLoop 2015
25 September 2015
St. Louis, MO

Talk video: https://www.youtube.com/watch?v=ZGIAypUUwoQ
More information: http://bailis.org/

In many areas of systems design, provisioning for worst-case behavior (e.g., load spikes and anomalous user activity) incurs sizable penalties (e.g., performance and operational overheads) in the typical and best cases. However, in distributed systems, building software that is resilient to worst-case network behavior can -- perhaps paradoxically -- lead to improved behavior in typical and best-case scenarios. That is, systems that don't rely on synchronous communication (or coordination) in the worst case frequently aren't forced to wait in any case -- improving latency, scalability, and performance via increased concurrency.

In this talk, we'll explore how to use this worst-case analysis as a more general design principle for scalable systems design. As developers, we are increasingly interacting with and building our own distributed systems, and we tend to fixate only on failure scenarios (e.g., "partition tolerance" in the CAP Theorem); this is an important first step, but it's not the whole story. To illustrate why, I'll present practical lessons learned from applying this principle to both web and transaction processing applications as well as database internals such as integrity constraints and indexes. We've found considerable evidence that many of these common tasks and workloads can benefit substantially (e.g., regular order-of-magnitude speedups) from this analysis. In all likelihood, you can too.


Transcript

  1. WHEN WORST IS BEST Peter Bailis Stanford CS @pbailis in

    distributed systems design StrangeLoop 2015 25 September, St. Louis
  2. What if we designed computer systems for worst case scenarios?

  3. Cluster provisioning: 7.3B simultaneous users many idle resources! What if

    we designed computer systems for worst case scenarios?
  4. Cluster provisioning: 7.3B simultaneous users many idle resources! Hardware: chips

    for the next Mars rover hugely expensive packaging! What if we designed computer systems for worst case scenarios?
  5. Cluster provisioning: 7.3B simultaneous users many idle resources! Hardware: chips

    for the next Mars rover hugely expensive packaging! Security: all our developers are malicious expensive code deployment! What if we designed computer systems for worst case scenarios?
  6. Designing for the worst case often penalizes the average case

  7. Designing for the worst case often penalizes the average case

    Average case performance Worst case performance
  8. Designing for the worst case often penalizes the average case

    Average case performance Worst case performance ??? this talk
  9. This talk: When can designing for the worst case improve

    the average case? Structure Distributed systems and the network Beyond the network Lessons
  10. This talk: When can designing for the worst case improve

    the average case? Structure Distributed systems and the network Beyond the network Lessons
  11. Almost every non-trivial application today is (or is becoming) distributed

    Distributed Systems Matter Distribution happens over a network
  12. Almost every non-trivial application today is (or is becoming) distributed

    Corollary: Almost every non-trivial application today needs to worry about the network Distributed Systems Matter Distribution happens over a network
  13. Networks make design hard Many things can go wrong:

  14. Networks make design hard Many things can go wrong: Packets

    may be delayed Packets may be dropped Sometimes called an asynchronous network
  15. any replica can respond to any request Handling Worst-Case Net

    Behavior availability addresses delays, drops:
  16. any replica can respond to any request Handling Worst-Case Net

    Behavior availability addresses delays, drops:
  17. any replica can respond to any request Handling Worst-Case Net

    Behavior availability addresses delays, drops:
  18. any replica can respond to any request Handling Worst-Case Net

    Behavior availability addresses delays, drops:
  19. any replica can respond to any request Handling Worst-Case Net

    Behavior availability addresses delays, drops:
  20. any replica can respond to any request Handling Worst-Case Net

    Behavior availability addresses delays, drops:
  21. any replica can respond to any request Handling Worst-Case Net

    Behavior if our system is available, then even when network is fine, we still don’t have to talk! availability addresses delays, drops:
  22. any replica can respond to any request Handling Worst-Case Net

    Behavior if our system is available, then even when network is fine, we still don’t have to talk! NO COORDINATION availability addresses delays, drops:
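The "no coordination" idea behind these slides is concretely realized by CRDTs, one of the keywords from the closing summary. As a hedged illustration (not code from the talk), here is a minimal grow-only counter: any replica can accept an increment locally, and replicas converge by merging state, with no synchronous communication on the write path.

```python
# Illustrative sketch of a grow-only counter (G-Counter) CRDT: every
# replica accepts increments locally (coordination-free writes) and
# replicas converge by merging per-replica counts with element-wise max.

class GCounter:
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica_id -> count contributed by that replica

    def increment(self, n=1):
        # Local write: no network round trip, always available.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self):
        # The global count is the sum of every replica's contribution.
        return sum(self.counts.values())

    def merge(self, other):
        # Merge is commutative, associative, and idempotent, so replicas
        # can exchange state in any order, any number of times, and still
        # converge to the same value.
        for rid, c in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), c)

a, b = GCounter("a"), GCounter("b")
a.increment(3)   # served entirely at replica a
b.increment(2)   # served entirely at replica b
a.merge(b)
b.merge(a)
assert a.value() == b.value() == 5
```

Because merge is idempotent and order-insensitive, even a misbehaving network that delays or redelivers messages cannot prevent convergence; the same design that survives the worst-case network never blocks in the best case.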
  23. Coordination-free systems What if we don’t have to talk?

  24. Coordination-free systems: 1.) Enable infinite scale-out What if we don’t

    have to talk?
  25. Coordination-free systems: 1.) Enable infinite scale-out What if we don’t

    have to talk?
  26. Coordination-free systems: 1.) Enable infinite scale-out What if we don’t

    have to talk?
  27. Coordination-free systems: 1.) Enable infinite scale-out What if we don’t

    have to talk?
  28. Coordination-free systems: 1.) Enable infinite scale-out What if we don’t

    have to talk?
  29. [Chart: throughput (txns/s) vs. number of servers (items) accessed per transaction, for distributed transactions on EC2]

  30. [Chart: adds coordinated in-memory locking to the previous plot for comparison]

  31. [Chart: same comparison on a log scale; coordination costs roughly 398x in throughput]

  32. [Chart: coordinated (distributed transactions on EC2) vs. coordination-free (in-memory locking); the ~398x throughput gap]
  33. Coordination-free systems: 1.) Enable infinite scale-out 2.) Improve throughput What

    if we don’t have to talk?
  34. 133.7+ ms RTT

  35. 133.7+ ms RTT

  36. 133.7+ ms RTT 85.1+ ms RTT

  37. What if we don’t have to talk? Coordination-free systems: 1.)

    Enable infinite scale-out 2.) Improve throughput 3.) Ensure low latency 4.) Guarantee “always on" response
  38. What if we don’t have to talk? Coordination-free systems: 1.)

    Enable infinite scale-out 2.) Improve throughput 3.) Ensure low latency 4.) Guarantee “always on" response
  39. Coordination-free systems: 1.) Enable infinite scale-out 2.) Improve throughput 3.)

    Ensure low latency 4.) Guarantee “always on" response What if we don’t have to talk?
  40. But wait! What about CAP?!?! • CAP Thm.: Famous result

    from Eric Brewer, Inktomi • Takeaway (+ related results): properties like serializability require unavailability (or require coordination) • Common (incorrect) conclusion: availability is too expensive, only matters during failures, so forget about it
  41. But wait! What about CAP?!?! • CAP Thm.: Famous result

    from Eric Brewer, Inktomi • Takeaway (+ related results): properties like serializability require unavailability (or require coordination) • Common (incorrect) conclusion: availability is too expensive, only matters during failures, so forget about it surprise: many useful guarantees don’t require coordination (or unavailability)!
  42. “Worst” is a Design Tool legacy implementations: designed for

    single-node context, use coordination research question: what if we built systems that didn’t have to coordinate? result: new designs that avoid coordination unless strictly necessary Example: Coordination-Avoiding Databases
  43. Simple Example: Read Committed legacy implementation: lock records during access

    research question: is coordination necessary? goal: never read from uncommitted transactions
  44. Simple Example: Read Committed legacy implementation: lock records during access

    research question: is coordination necessary? result: no! for example, buffer writes until commit result: OOM speedups over classic implementations goal: never read from uncommitted transactions VLDB 2014, SIGMOD 2015
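The slide's "buffer writes until commit" idea can be sketched in a few lines. This is a hedged, single-threaded illustration of the principle, not the paper's implementation: a transaction stages writes privately and installs them only at commit, so concurrent readers can never observe uncommitted data and no record locks are needed.

```python
# Sketch of lock-free Read Committed via buffered writes: uncommitted
# writes live in a per-transaction buffer, invisible to everyone else;
# commit installs them into the shared store.

class Store:
    def __init__(self):
        self.committed = {}  # key -> last committed value

    def read(self, key):
        # Readers only ever see committed state; no locks taken.
        return self.committed.get(key)

class Txn:
    def __init__(self, store):
        self.store = store
        self.buffer = {}  # private write buffer

    def write(self, key, value):
        self.buffer[key] = value  # invisible to other transactions

    def read(self, key):
        # Read your own buffered writes first, else committed state.
        return self.buffer.get(key, self.store.read(key))

    def commit(self):
        self.store.committed.update(self.buffer)
        self.buffer = {}

store = Store()
t1, t2 = Txn(store), Txn(store)
t1.write("x", 10)
assert t2.read("x") is None   # t1's uncommitted write is not visible
t1.commit()
assert t2.read("x") == 10     # visible only after commit
```

A real engine must install the buffered writes atomically under concurrency (the cited VLDB 2014 / SIGMOD 2015 work addresses this); the point here is that the goal, never reading uncommitted data, does not inherently require locking records during access.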
  45. What if we don’t have to talk? Coordination-free systems: 1.)

    Enable infinite scale-out 2.) Improve throughput 3.) Ensure low latency 4.) Guarantee “always on" response
  46. Coordination-free systems: 1.) Enable infinite scale-out 2.) Improve throughput 3.)

    Ensure low latency 4.) Guarantee “always on" response What if we don’t have to talk?
  47. Coordination-free systems: 1.) Enable infinite scale-out 2.) Improve throughput 3.)

    Ensure low latency 4.) Guarantee “always on" response What if we don’t have to talk? Accounting for worst case improves average case
  48. Punchline: Distributed Systems & Networks • Systems that behave well

    during network faults can behave better in non-faulty environments too • With good designs, popular guarantees from today’s RDBMSs can benefit! (see also Martin’s talk, 11AM Sat) • Research on coordination-avoiding systems highlights potential for huge speedups (see bailis.org) • Keywords: CRDTs, I-confluence, RAMP, HAT, Bloom^L
  49. This talk: When can designing for the worst case improve

    the average case? Structure Distributed systems and the network Beyond the network Lessons
  50. Replication for fault tolerance can increase request capacity Replication helps

    Capacity
  51. Fail-over helps (Dev)Ops

  52. Fail-over helps (Dev)Ops

  53. If services can auto-fail-over… can kill processes: to perform upgrades

    to manage stragglers to revoke resources Fail-over helps (Dev)Ops
  54. 99.9th %ile latency: 100ms avg latency: 1.2ms YOUR SERVICE HERE

    Tail Latency in (Micro)services
  55. 99.9th %ile latency: 100ms avg latency: 1.2ms YOUR SERVICE HERE

    10ms Tail Latency in (Micro)services
  56. 99.9th %ile latency: 100ms avg latency: 1.2ms YOUR SERVICE HERE

    10ms 1.09ms Tail Latency in (Micro)services
  57. 99.9th %ile latency: 100ms Tail Latency in (Micro)services

  58. front-end avg. latency: 64ms at 100x fan-out, 99.9th %ile latency:

    100ms Tail Latency in (Micro)services
  59. front-end avg. latency: 64ms at 100x fan-out, 99.9th %ile latency:

    100ms 10ms Tail Latency in (Micro)services
  60. front-end avg. latency: 64ms at 100x fan-out, 6.7ms 99.9th %ile

    latency: 100ms 10ms Tail Latency in (Micro)services
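The fan-out arithmetic behind these slides follows from a simple probability: if each backend call independently lands in its 99.9th-percentile tail with probability p, a front-end request that fans out to n backends and waits for all of them hits at least one tail with probability 1 - (1 - p)^n. The sketch below uses assumed round numbers for illustration; the slides' exact figures depend on the real latency distribution.

```python
# Back-of-the-envelope tail-latency fan-out calculation. Assumption for
# illustration: each backend call independently exceeds its 99.9th
# percentile (~100 ms) with probability p = 0.001.

def p_hit_tail(p_tail, fan_out):
    """Probability that at least one of fan_out parallel calls is slow."""
    return 1 - (1 - p_tail) ** fan_out

p = 0.001  # 1 request in 1000 is slow at a single backend
for n in (1, 10, 100):
    print(f"fan-out {n:>3}: P(at least one tail request) = {p_hit_tail(p, n):.1%}")
```

At fan-out 100 this comes to roughly 9.5%: about one front-end request in ten is as slow as the slowest backend's tail, which is how a per-service corner case (0.1% of requests) becomes the consumer's common case.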
  61. YOUR SERVICE’S CORNER CASE MAY BE ITS CONSUMER’S AVERAGE CASE

  62. Universal Design

  63. Universal Design

  64. None
  65. There is also a strong business case for accessibility. Accessibility

    overlaps with other best practices such as mobile web design, device independence, multi-modal interaction, usability, design for older users, and search engine optimization (SEO). Case studies show that accessible websites have better search results, reduced maintenance costs, and increased audience reach, among other benefits.
  66. [Plot of f(x) vs. x] When “Best” Is Brittle: an idealized function with a clear optimum

  67. [Plot: the idealized function alongside a less well-behaved f(x), each with its optimum]

  68. [Plot: targeting the idealized optimum on the less well-behaved function misses the target]

  69. [Plot: a “stable” solution sits in a flat region rather than at the sharp optimum]

  70. [Plot: the same figure] Robust Optimization studies finding the stable solution
  71. This talk: When can designing for the worst case improve

    the average case? Structure Distributed systems and the network Beyond the network Lessons
  72. This talk: When can designing for the worst case improve

    the average case?
  73. When does this apply? When corner cases are common When

    environmental conditions are variable When “normal” isn’t what we think This talk: When can designing for the worst case improve the average case?
  74. DEFINING “NORMAL” DEFINES OUR DESIGNS

  75. “Worst” raises tough questions

  76. Cluster provisioning: what’s our scale-out strategy? “Worst” raises tough questions

  77. Cluster provisioning: what’s our scale-out strategy? Hardware: what happens during

    bit flips? do we need ECC? “Worst” raises tough questions
  78. Cluster provisioning: what’s our scale-out strategy? Hardware: what happens during

    bit flips? do we need ECC? Security: how do we manage internal data accesses? “Worst” raises tough questions
  79. EXAMINE YOUR BIASES

  80. Reasoning about worst-case scenarios can be a powerful design tool

    Key to coordination-avoiding distributed systems designs Can often improve performance and robustness, also combat bias @PBAILIS // bailis.org
  81. Special thanks to David Andersen, Ali Ghodsi, Joe Hellerstein, Eddie

    Kohler, Phil Levis, Alex Miller, Oscar Moll, Barzan Mozafari, Ion Stoica, Eugene Wu, Jean Yang, Matei Zaharia
  82. Reasoning about worst-case scenarios can be a powerful design tool

    Key to coordination-avoiding distributed systems designs Can often improve performance and robustness, also combat bias @PBAILIS // bailis.org