WHEN WORST IS BEST in distributed systems design
Peter Bailis, Stanford CS, @pbailis
StrangeLoop 2015, 25 September, St. Louis
Slide 2
Slide 2 text
What if we designed computer
systems for worst case scenarios?
Slide 3
Slide 3 text
Cluster provisioning: 7.3B simultaneous users
many idle resources!
What if we designed computer
systems for worst case scenarios?
Slide 4
Slide 4 text
Cluster provisioning: 7.3B simultaneous users
many idle resources!
Hardware: chips for the next Mars rover
hugely expensive packaging!
What if we designed computer
systems for worst case scenarios?
Slide 5
Slide 5 text
Cluster provisioning: 7.3B simultaneous users
many idle resources!
Hardware: chips for the next Mars rover
hugely expensive packaging!
Security: all our developers are malicious
expensive code deployment!
What if we designed computer
systems for worst case scenarios?
Slide 6
Slide 6 text
Designing for
the worst case
often penalizes
the average case
Slide 7
Slide 7 text
Designing for the worst case
often penalizes the average case
Average
case
performance
Worst case performance
Slide 8
Slide 8 text
Designing for the worst case
often penalizes the average case
Average
case
performance
Worst case performance
???
this talk
Slide 9
Slide 9 text
This talk: When can designing for
the worst case improve the
average case?
Structure
Distributed systems and the network
Beyond the network
Lessons
Slide 11
Slide 11 text
Almost every non-trivial application today is (or is
becoming) distributed
Distributed Systems Matter
Distribution happens over a network
Slide 12
Slide 12 text
Almost every non-trivial application today is (or is
becoming) distributed
Corollary:
Almost every non-trivial application today needs to
worry about the network
Distributed Systems Matter
Distribution happens over a network
Slide 13
Slide 13 text
Networks make design hard
Many things can go wrong:
Slide 14
Slide 14 text
Networks make design hard
Many things can go wrong:
Packets may be delayed
Packets may be dropped
Sometimes called an asynchronous network
Slide 15
Slide 15 text
Handling Worst-Case Net Behavior
availability addresses delays, drops:
any replica can respond to any request
Slide 21
Slide 21 text
Handling Worst-Case Net Behavior
availability addresses delays, drops:
any replica can respond to any request
if our system is available,
then even when network is fine,
we still don’t have to talk!
Slide 22
Slide 22 text
Handling Worst-Case Net Behavior
availability addresses delays, drops:
any replica can respond to any request
if our system is available,
then even when network is fine,
we still don’t have to talk!
NO COORDINATION
Slide 23
Slide 23 text
Coordination-free systems
What if we don’t have to talk?
Slide 24
Slide 24 text
Coordination-free systems:
1.) Enable infinite scale-out
What if we don’t have to talk?
Slide 29
Slide 29 text
DISTRIBUTED TRANSACTIONS (EC2)
[Figure: Throughput (txns/s) vs. Number of Servers (Items) Accessed per Transaction, 1 to 7; diagram labels A to H]
Slide 30
Slide 30 text
DISTRIBUTED TRANSACTIONS (EC2)
[Figure: Throughput (txns/s) vs. Number of Servers (Items) Accessed per Transaction, 1 to 7; diagram labels A to H; series: IN-MEMORY LOCKING, COORDINATED]
Slide 31
Slide 31 text
DISTRIBUTED TRANSACTIONS (EC2)
[Figure: Throughput (txns/s) vs. Number of Servers (Items) Accessed per Transaction, 1 to 7; diagram labels A to H; series: IN-MEMORY LOCKING, COORDINATED; annotations: LOG SCALE!, -398x]
Slide 32
Slide 32 text
DISTRIBUTED TRANSACTIONS (EC2)
[Figure: Throughput (txns/s) vs. Number of Servers (Items) Accessed per Transaction, 1 to 7; diagram labels A to H; series: IN-MEMORY LOCKING, COORDINATED, COORDINATION-FREE; annotation: -398x]
Slide 33
Slide 33 text
Coordination-free systems:
1.) Enable infinite scale-out
2.) Improve throughput
What if we don’t have to talk?
Slide 34
Slide 34 text
133.7+ ms
RTT
Slide 36
Slide 36 text
133.7+ ms
RTT
85.1+ ms
RTT
Slide 37
Slide 37 text
What if we don’t have to talk?
Coordination-free systems:
1.) Enable infinite scale-out
2.) Improve throughput
3.) Ensure low latency
4.) Guarantee “always on” response
Slide 40
Slide 40 text
But wait! What about CAP?!?!
• CAP Thm.: Famous result from Eric Brewer, Inktomi
• Takeaway (+ related results): properties like serializability
require unavailability (or require coordination)
• Common (incorrect) conclusion: availability is too
expensive, only matters during failures, so forget about it
Slide 41
Slide 41 text
But wait! What about CAP?!?!
• CAP Thm.: Famous result from Eric Brewer, Inktomi
• Takeaway (+ related results): properties like serializability
require unavailability (or require coordination)
• Common (incorrect) conclusion: availability is too
expensive, only matters during failures, so forget about it
surprise: many useful guarantees don’t
require coordination (or unavailability)!
Slide 42
Slide 42 text
“Worst” is a Design Tool
legacy implementations: designed for single-
node context, use coordination
research question: what if we built systems that
didn’t have to coordinate?
result: new designs that avoid coordination unless
strictly necessary
Example: Coordination-Avoiding Databases
Slide 43
Slide 43 text
Simple Example: Read Committed
legacy implementation: lock records during access
research question: is coordination necessary?
goal: never read from uncommitted transactions
Slide 44
Slide 44 text
Simple Example: Read Committed
legacy implementation: lock records during access
research question: is coordination necessary?
result: no! for example, buffer writes until commit
result: order-of-magnitude (OOM) speedups over classic implementations
goal: never read from uncommitted transactions
VLDB 2014, SIGMOD 2015
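The "buffer writes until commit" idea can be illustrated with a minimal single-process sketch. This is only an illustration under stated assumptions, not the implementation from the cited papers: the names `Store` and `Transaction` are hypothetical, and a plain dictionary stands in for a replica's committed state. Readers take no locks; they can never see another transaction's uncommitted write because such writes live only in that transaction's private buffer.

```python
class Store:
    """Hypothetical replica state: holds committed values only."""

    def __init__(self):
        self.committed = {}


class Transaction:
    """Hypothetical transaction that buffers its writes until commit."""

    def __init__(self, store):
        self.store = store
        self.buffer = {}  # private, uncommitted writes

    def write(self, key, value):
        # Nothing is published yet, so no other transaction can read it.
        self.buffer[key] = value

    def read(self, key):
        # A transaction sees its own writes; otherwise committed data only.
        if key in self.buffer:
            return self.buffer[key]
        return self.store.committed.get(key)

    def commit(self):
        # Publish the buffered writes; no read locks were ever taken.
        self.store.committed.update(self.buffer)
        self.buffer = {}

    def abort(self):
        # Dropping the buffer undoes the transaction with no cleanup.
        self.buffer = {}


store = Store()
t1, t2 = Transaction(store), Transaction(store)
t1.write("x", 1)
print(t2.read("x"))  # None: t1 has not committed
t1.commit()
print(t2.read("x"))  # 1
```

The sketch only shows why the "never read from uncommitted transactions" goal needs no locking; it does not model replication, durability, or stronger guarantees.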
Slide 45
Slide 45 text
What if we don’t have to talk?
Coordination-free systems:
1.) Enable infinite scale-out
2.) Improve throughput
3.) Ensure low latency
4.) Guarantee “always on” response
Slide 47
Slide 47 text
Coordination-free systems:
1.) Enable infinite scale-out
2.) Improve throughput
3.) Ensure low latency
4.) Guarantee “always on” response
What if we don’t have to talk?
Accounting for worst case improves average case
Slide 48
Slide 48 text
Punchline: Distributed Systems & Networks
• Systems that behave well during network faults can
behave better in non-faulty environments too
• With good designs, popular guarantees from today’s
RDBMSs can benefit! (see also Martin’s talk, 11AM Sat)
• Research on coordination-avoiding systems highlights
potential for huge speedups (see bailis.org)
• Keywords: CRDTs, I-confluence, RAMP, HAT, Bloom^L
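As one concrete instance of the CRDT keyword above, here is a minimal sketch of a grow-only counter (G-Counter), a standard textbook CRDT. It is an illustration, not code from the talk; the class name `GCounter` and the replica ids are assumptions. Each replica increments only its own slot without talking to anyone, and replicas converge whenever they exchange state, because merging by element-wise maximum is commutative, associative, and idempotent.

```python
class GCounter:
    """Grow-only counter: one slot per replica, merged by element-wise max."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> increments observed from that replica

    def increment(self, n=1):
        # Purely local update: no coordination with other replicas.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Safe to apply in any order and any number of times.
        for rid, count in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), count)


a, b = GCounter("a"), GCounter("b")
a.increment()
a.increment()
b.increment()
a.merge(b)
b.merge(a)
print(a.value(), b.value())  # 3 3
```

Any replica can serve a read or an increment at any time, which matches the "any replica can respond to any request" notion of availability used earlier in the talk.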
Slide 49
Slide 49 text
This talk: When can designing for
the worst case improve the
average case?
Structure
Distributed systems and the network
Beyond the network
Lessons
Slide 50
Slide 50 text
Replication for fault tolerance
can increase request capacity
Replication helps Capacity
Slide 51
Slide 51 text
Fail-over helps (Dev)Ops
Slide 53
Slide 53 text
Fail-over helps (Dev)Ops
If services can auto-fail-over, we can kill processes:
to perform upgrades
to manage stragglers
to revoke resources
Slide 54
Slide 54 text
99.9th %ile latency: 100ms
avg latency: 1.2ms
YOUR
SERVICE
HERE
Tail Latency in (Micro)services
Slide 55
Slide 55 text
99.9th %ile latency: 100ms
avg latency: 1.2ms
YOUR
SERVICE
HERE
10ms
Tail Latency in (Micro)services
Slide 56
Slide 56 text
99.9th %ile latency: 100ms → 10ms
avg latency: 1.2ms → 1.09ms
YOUR SERVICE HERE
Tail Latency in (Micro)services
Slide 57
Slide 57 text
99.9th %ile latency: 100ms
Tail Latency in (Micro)services
Slide 58
Slide 58 text
front-end avg. latency: 64ms
at 100x fan-out,
99.9th %ile latency: 100ms
Tail Latency in (Micro)services
Slide 59
Slide 59 text
front-end avg. latency: 64ms
at 100x fan-out,
99.9th %ile latency: 100ms → 10ms
Tail Latency in (Micro)services
Slide 60
Slide 60 text
at 100x fan-out:
99.9th %ile latency: 100ms → 10ms
front-end avg. latency: 64ms → 6.7ms
Tail Latency in (Micro)services
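Why fan-out turns a rare tail into a common case can be checked with a short calculation. The sketch below assumes the front end waits for all of its backend calls and that the calls are independent; both are assumptions, not statements from the slides. It does not reproduce the 64ms and 6.7ms averages, which depend on a latency distribution the slides do not give. It only computes how often a fanned-out request hits at least one tail-latency call.

```python
def p_any_slow(fan_out, tail_fraction):
    """Probability that at least one of `fan_out` independent calls is in the tail."""
    return 1.0 - (1.0 - tail_fraction) ** fan_out


# 0.001 is the fraction of calls slower than the 99.9th percentile.
for n in (1, 10, 100):
    print(n, p_any_slow(n, 0.001))
```

Under these assumptions, roughly 9.5% of requests at 100x fan-out wait on at least one call that is at or beyond the backend's 99.9th percentile, compared with 0.1% for a single call.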
Slide 61
Slide 61 text
YOUR SERVICE’S
CORNER CASE
MAY BE ITS
CONSUMER’S
AVERAGE CASE
Slide 62
Slide 62 text
Universal Design
Slide 65
Slide 65 text
There is also a strong business case for accessibility.
Accessibility overlaps with other best practices such as mobile web
design, device independence, multi-modal interaction, usability,
design for older users, and search engine optimization (SEO).
Case studies show that accessible websites have better search
results, reduced maintenance costs, and increased audience reach,
among other benefits.
Slide 66
Slide 66 text
When “Best” Is Brittle
[Figure: idealized function f(x) plotted against x, with its optimum marked]
Slide 70
Slide 70 text
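When “Best” Is Brittle
[Figure, left panel: idealized function f(x) plotted against x, with its optimum marked]
[Figure, right panel: less well-behaved function f(x), with labels: Optimum, Missed the target, “Stable” solution]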
Robust Optimization studies finding the stable solution
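The difference between the optimum and the “stable” solution can be sketched numerically. Everything in this example is an assumption made for illustration: the objective `f` (a tall narrow peak near x=1 plus a lower, wide peak near x=3), the grid of candidates, and the error bound `eps`. The nominal choice maximizes f(x) directly; the robust choice maximizes the worst value of f over all settings within `eps` of x, which is the max-min formulation commonly used in robust optimization.

```python
import math


def f(x):
    """Hypothetical objective: tall narrow peak at x=1, lower wide peak at x=3."""
    narrow = 1.0 * math.exp(-((x - 1.0) / 0.05) ** 2)
    wide = 0.8 * math.exp(-((x - 3.0) / 0.8) ** 2)
    return narrow + wide


def worst_case(x, eps, steps=6):
    """Lowest value of f if the setting may miss x by up to eps either way."""
    deltas = [eps * k / steps for k in range(-steps, steps + 1)]
    return min(f(x + d) for d in deltas)


xs = [i / 100 for i in range(0, 501)]  # candidate settings in [0, 5]
nominal = max(xs, key=f)  # best if we hit x exactly
robust = max(xs, key=lambda x: worst_case(x, eps=0.3))  # best if we may miss by 0.3

print("nominal:", nominal, f(nominal), worst_case(nominal, 0.3))
print("robust: ", robust, f(robust), worst_case(robust, 0.3))
```

With these assumed numbers the nominal choice is x=1.0, whose value collapses to nearly zero if the target is missed by 0.3, while the robust choice is x=3.0, which gives up some peak value but stays close to it under the same miss.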
Slide 71
Slide 71 text
This talk: When can designing for
the worst case improve the
average case?
Structure
Distributed systems and the network
Beyond the network
Lessons
Slide 72
Slide 72 text
This talk: When can designing for
the worst case improve the
average case?
Slide 73
Slide 73 text
When does this apply?
When corner cases are common
When environmental conditions are variable
When “normal” isn’t what we think
This talk: When can designing for
the worst case improve the
average case?
Slide 78
Slide 78 text
Cluster provisioning:
what’s our scale-out strategy?
Hardware:
what happens during bit flips? do we need ECC?
Security:
how do we manage internal data accesses?
“Worst” raises tough questions
Slide 79
Slide 79 text
EXAMINE
YOUR
BIASES
Slide 80
Slide 80 text
Reasoning about worst-case scenarios
can be a powerful design tool
Key to coordination-avoiding distributed
systems designs
Can often improve performance and
robustness, also combat bias
@PBAILIS // bailis.org
Slide 81
Slide 81 text
Special thanks to
David Andersen, Ali Ghodsi, Joe Hellerstein, Eddie Kohler,
Phil Levis, Alex Miller, Oscar Moll, Barzan Mozafari, Ion
Stoica, Eugene Wu, Jean Yang, Matei Zaharia