Slide 1
Building Adaptive Systems
Chris Keathley / @ChrisKeathley
Slides 2-25
(Animated diagram: two servers exchanging requests.)
I have a request
No Problem!
Thanks!
I have a request
I’m a little busy
I have more requests!
I don’t feel so good
(One server disappears from the diagram.)
Welp
Slide 26
All services have objectives
Slide 27
A resilient service should be able to withstand a 10x traffic spike and continue to meet those objectives
Slides 28-29
Let’s Talk About… Queues, Overload Mitigation, Adaptive Concurrency
What causes overload?
Slide 31
Slide 31 text
What causes overload? Server Queue
Slide 32
Slide 32 text
What causes overload? Server Queue Processing Time Arrival Rate >
Slide 33
Little’s Law: Elements in the queue = Arrival Rate * Processing Time
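In conventional queueing notation (the symbols are an addition, not from the slides), with L the average number of elements in the system, λ the average arrival rate, and W the average time each element spends in the system, the law can be written as below. The time unit has to match the rate's unit, so a processing time of 100 ms at 10 requests per second is entered as 0.1 s:

```latex
L = \lambda W
\qquad\text{for example}\qquad
L = 10\,\frac{\text{requests}}{\text{s}} \times 0.1\,\text{s} = 1\ \text{request}
```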
Slides 34-36
Little’s Law: 1 request = 10 rps * 100 ms
Slides 37-41
Little’s Law: 2 requests = 10 rps * 200 ms
Slides 42-43
Little’s Law: 2 requests = 10 rps * 200 ms (diagram adds BEAM Processes, then CPU Pressure)
Slide 44
Little’s Law: 3 requests = 10 rps * 300 ms
Slide 45
Little’s Law: 30 requests = 10 rps * 3000 ms
Slides 46-47
Little’s Law: 30 requests = 10 rps * ∞ ms
Slides 48-49
Little’s Law: ∞ requests = 10 rps * ∞ ms
This is bad
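The arithmetic on these slides can be reproduced with a few lines of Elixir. This is only a sketch: the `LittlesLaw` module and its `in_flight/2` function are hypothetical names, not something from the talk.

```elixir
defmodule LittlesLaw do
  @moduledoc """
  Hypothetical helper (not from the talk): average number of in-flight
  requests for a given arrival rate and processing time.
  """

  # arrival_rate_rps is in requests per second, processing_time_ms in
  # milliseconds. Dividing by 1000 converts milliseconds to seconds so
  # the units cancel.
  def in_flight(arrival_rate_rps, processing_time_ms) do
    arrival_rate_rps * processing_time_ms / 1000
  end
end

# Processing times taken from the slides, at a constant 10 rps.
for ms <- [100, 200, 300, 3000] do
  IO.puts("10 rps * #{ms} ms = #{LittlesLaw.in_flight(10, ms)} requests in flight")
end
```

With the arrival rate held constant, the number of requests in flight grows in direct proportion to processing time, which is why an unbounded processing time implies an unbounded queue.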
Slides 50-51
Let’s Talk About… Queues, Overload Mitigation, Adaptive Concurrency
Slides 52-53
Overload: Arrival Rate > Processing Time
We need to get these under control
Slides 54-56
Load Shedding (diagram: Server, Queue, Server)
Drop requests
Stop sending
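One way to picture "drop requests" is a queue with a hard bound that rejects new work once it is full. The sketch below assumes a simple size-based policy; the `BoundedQueue` module and its functions are hypothetical and are not taken from the talk or from any library it mentions.

```elixir
defmodule BoundedQueue do
  @moduledoc """
  Hypothetical bounded FIFO queue (not from the talk). Once `max` items
  are waiting, new items are rejected instead of queued.
  """

  defstruct items: :queue.new(), size: 0, max: 0

  def new(max) when is_integer(max) and max > 0 do
    %__MODULE__{max: max}
  end

  # Shed load: refuse the item when the queue is already full.
  def push(%__MODULE__{size: size, max: max} = q, _item) when size >= max do
    {:dropped, q}
  end

  def push(%__MODULE__{} = q, item) do
    {:ok, %{q | items: :queue.in(item, q.items), size: q.size + 1}}
  end

  # Take the oldest waiting item, if there is one.
  def pop(%__MODULE__{} = q) do
    case :queue.out(q.items) do
      {{:value, item}, rest} -> {{:ok, item}, %{q | items: rest, size: q.size - 1}}
      {:empty, _} -> {:empty, q}
    end
  end
end

# Usage: the third request is dropped because the bound is two.
q = BoundedQueue.new(2)
{:ok, q} = BoundedQueue.push(q, :a)
{:ok, q} = BoundedQueue.push(q, :b)
{:dropped, q} = BoundedQueue.push(q, :c)
{{:ok, :a}, _q} = BoundedQueue.pop(q)
```

Returning `{:dropped, q}` rather than raising lets the caller decide how to react, for example by telling the upstream server to stop sending.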
Slides 57-62
Autoscaling (diagram: Server, DB, Server)
Requests start queueing
(Diagram adds another Server.)
Now it’s worse
Slide 63
Autoscaling needs to be in response to load shedding
Slides 64-70
Circuit Breakers (diagram: Server, Server)
Shut off traffic
I’m not quite dead yet
Slide 71
Circuit Breakers are your last line of defense
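To make "shut off traffic" concrete, here is a minimal sketch of circuit breaker state. The talk does not specify a policy, so the consecutive-failure threshold is an assumption, and the `Breaker` module is hypothetical; it is not the API of any library listed in the talk.

```elixir
defmodule Breaker do
  @moduledoc """
  Hypothetical, minimal circuit breaker state (not from the talk).
  Assumes a consecutive-failure threshold as the trip condition.
  """

  defstruct failures: 0, threshold: 3, state: :closed

  def new(threshold) when is_integer(threshold) and threshold > 0 do
    %__MODULE__{threshold: threshold}
  end

  # Callers check this before sending a request downstream.
  def allow?(%__MODULE__{state: state}), do: state == :closed

  # A success resets the consecutive-failure count.
  def record(%__MODULE__{state: :closed} = b, :ok), do: %{b | failures: 0}

  # Each failure counts toward the threshold; reaching it opens the breaker.
  def record(%__MODULE__{state: :closed} = b, :error) do
    failures = b.failures + 1
    state = if failures >= b.threshold, do: :open, else: :closed
    %{b | failures: failures, state: state}
  end

  # While open, results are ignored because traffic is already shut off.
  def record(%__MODULE__{state: :open} = b, _result), do: b

  # Something external (a timer or an operator) closes the breaker again.
  def reset(%__MODULE__{} = b), do: %{b | failures: 0, state: :closed}
end
```

A breaker like this only reacts after failures have already happened, which fits the talk's framing of it as a last line of defense rather than a primary overload control.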
Slides 72-73
Let’s Talk About… Queues, Overload Mitigation, Adaptive Concurrency
Slide 74
We want to allow as many requests as we can actually handle
Slides 76-78
Adaptive Limits (chart: Concurrency over Time)
Actual limit
Dynamic Discovery
Slides 79-83
Load Shedding (diagram: Server, Server)
Are we at the limit?
Am I still healthy?
Update Limits
Slide 84
Adaptive Limits (chart: Concurrency over Time)
Increased latency
Slide 85
Signals for Adjusting Limits
Latency
Successful vs. Failed requests
Slide 86
Additive Increase Multiplicative Decrease (chart: Concurrency over Time)
Success state: limit + 1
Backoff state: limit * 0.95
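The two update rules on this slide can be sketched as a pair of function clauses. The `AIMD` module below is hypothetical and is not Regulator's implementation. Rounding down, the minimum limit of 1, and treating a slow or failed request as a backoff signal are all assumptions made for illustration.

```elixir
defmodule AIMD do
  @moduledoc """
  Hypothetical sketch of an additive-increase/multiplicative-decrease
  limit update. This is not Regulator's implementation.
  """

  @backoff_ratio 0.95
  @min_limit 1

  # Success state: probe for more capacity, one slot at a time.
  def update(limit, :success), do: limit + 1

  # Backoff state: shrink by 5%, rounding down, never below the minimum.
  def update(limit, :backoff), do: max(@min_limit, trunc(limit * @backoff_ratio))

  # Assumption: a failed request, or one slower than `timeout_ms`,
  # counts as a backoff signal.
  def classify(result, latency_ms, timeout_ms) do
    if result == :ok and latency_ms <= timeout_ms, do: :success, else: :backoff
  end
end

# Usage: fold a stream of signals into a running concurrency limit.
signals = [:success, :success, :backoff, :success]
final_limit = Enum.reduce(signals, 10, fn signal, limit -> AIMD.update(limit, signal) end)
IO.puts("limit after #{length(signals)} signals: #{final_limit}")
```

Increasing slowly and decreasing quickly means the limit creeps up toward the actual capacity and retreats promptly when latency or errors indicate it has overshot.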
Slide 87
Prior Art/Alternatives
https://github.com/ferd/pobox/
https://github.com/fishcakez/sbroker/
https://github.com/heroku/canal_lock
https://github.com/jlouis/safetyvalve
https://github.com/jlouis/fuse
Slide 88
Regulator
https://github.com/keathley/regulator
Slide 89
Regulator
    Regulator.install(:service, [
      limit: {Regulator.Limit.AIMD, [timeout: 500]}
    ])

    Regulator.ask(:service, fn ->
      {:ok, Finch.request(:get, "https://keathley.io")}
    end)
Slide 90
Conclusion
Slide 91
Queues are everywhere
Slide 92
Those queues need to be bounded to avoid overload
Slide 93
If your system is dynamic, your solution will also need to be dynamic
Slide 94
Go and build awesome stuff
Slide 95
Thanks
Chris Keathley / @ChrisKeathley