Building Adaptive Systems

Chris Keathley

May 28, 2020

Transcript

  1. Chris Keathley / @ChrisKeathley / c@keathley.io Building Adaptive Systems

  2. Server Server

  3. Server Server I have a request

  4. Server Server

  5. Server Server

  6. Server Server No Problem!

  7. Server Server

  8. Server Server Thanks!

  9. Server Server

  10. Server Server I have a request

  11. Server Server

  12. Server Server

  13. Server Server I’m a little busy

  14. Server Server I’m a little busy I have more requests!

  15. Server Server I’m a little busy I have more requests!

  16. Server Server I’m a little busy I have more requests!

  17. Server Server I’m a little busy I have more requests!

  18. Server Server I’m a little busy I have more requests!

  19. Server Server I’m a little busy I have more requests!

  20. Server Server I’m a little busy I have more requests!

  21. Server Server I’m a little busy I have more requests!

  22. Server Server I don’t feel so good

  23. Server

  24. Server Welp

  25. Server Welp

  26. All services have objectives

  27. A resilient service should be able to withstand a 10x traffic spike and continue to meet those objectives

  28. Let’s Talk About… Queues Overload Mitigation Adaptive Concurrency

  29. Let’s Talk About… Queues Overload Mitigation Adaptive Concurrency

  30. What causes overload?

  31. What causes overload? Server Queue

  32. What causes overload? Server Queue Arrival Rate > Processing Time

  33. Little’s Law: Elements in the queue = Arrival Rate * Processing Time

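
Little’s Law as stated on this slide is simple enough to check with a few lines of Python (a sketch; the function name is mine, not from the talk):

```python
def elements_in_queue(arrival_rate_rps: float, processing_time_s: float) -> float:
    """Little's Law: average elements in the queue = arrival rate * processing time."""
    return arrival_rate_rps * processing_time_s

# The numbers worked through on the next slides: 10 rps at 100 ms, 200 ms, 3000 ms.
print(elements_in_queue(10, 0.100))  # 1.0
print(elements_in_queue(10, 0.200))  # 2.0
print(elements_in_queue(10, 3.000))  # 30.0
```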
  34. Little’s Law Server 1 request = 10 rps * 100 ms

  35. Little’s Law Server 1 request = 10 rps * 100 ms

  36. Little’s Law Server 1 request = 10 rps * 100 ms

  37. Little’s Law Server 2 requests = 10 rps * 200 ms

  38. Little’s Law Server 2 requests = 10 rps * 200 ms

  39. Little’s Law Server 2 requests = 10 rps * 200 ms

  40. Little’s Law Server 2 requests = 10 rps * 200 ms

  41. Little’s Law Server 2 requests = 10 rps * 200 ms

  42. Little’s Law Server 2 requests = 10 rps * 200 ms BEAM Processes

  43. Little’s Law Server 2 requests = 10 rps * 200 ms BEAM Processes CPU Pressure

  44. Little’s Law Server 3 requests = 10 rps * 300 ms BEAM Processes CPU Pressure

  45. Little’s Law Server 30 requests = 10 rps * 3000 ms BEAM Processes CPU Pressure

  46. Little’s Law Server 30 requests = 10 rps * ∞ ms BEAM Processes CPU Pressure
  47. Little’s Law 30 requests = 10 rps * ∞ ms

  48. Little’s Law ∞ requests = 10 rps * ∞ ms

  49. Little’s Law ∞ requests = 10 rps * ∞ ms. This is bad

  50. Let’s Talk About… Queues Overload Mitigation Adaptive Concurrency

  51. Let’s Talk About… Queues Overload Mitigation Adaptive Concurrency

  52. Overload Arrival Rate > Processing Time

  53. Overload Arrival Rate > Processing Time. We need to get these under control

  54. Load Shedding Server Queue Server

  55. Load Shedding Server Queue Server Drop requests

  56. Load Shedding Server Queue Server Drop requests Stop sending
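
One way to read “drop requests / stop sending” in code is a bounded queue that sheds new work once it is full, so latency stays bounded instead of growing without limit. This is a toy Python sketch of the idea (names are mine, not code from the talk):

```python
from collections import deque

class BoundedQueue:
    """Shed load by rejecting work once the queue hits its bound."""

    def __init__(self, max_depth: int):
        self.max_depth = max_depth
        self.items = deque()

    def offer(self, request) -> bool:
        if len(self.items) >= self.max_depth:
            return False  # shed: signal the caller to stop sending
        self.items.append(request)
        return True

q = BoundedQueue(max_depth=2)
print([q.offer(r) for r in ("a", "b", "c")])  # [True, True, False]
```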

  57. Autoscaling

  58. Autoscaling

  59. Autoscaling Server DB Server

  60. Autoscaling Server DB Server Requests start queueing

  61. Autoscaling Server DB Server Server

  62. Autoscaling Server DB Server Server Now it’s worse

  63. Autoscaling needs to be in response to load shedding

  64. Circuit Breakers

  65. Circuit Breakers

  66. Circuit Breakers Server Server

  67. Circuit Breakers Server Server

  68. Circuit Breakers Server Server Shut off traffic

  69. Circuit Breakers Server Server

  70. Circuit Breakers Server Server I’m not quite dead yet

  71. Circuit Breakers are your last line of defense
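
The “shut off traffic” behavior in slides 64–70 can be sketched in a few lines. This toy Python breaker (names and thresholds are mine, not from the talk) opens after consecutive failures and blocks traffic until a cool-down passes:

```python
class CircuitBreaker:
    """Toy breaker: open after `threshold` consecutive failures,
    allow a probe request again once `reset_after` seconds pass."""

    def __init__(self, threshold: int = 3, reset_after: float = 30.0):
        self.threshold = threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def allow(self, now: float) -> bool:
        if self.opened_at is None:
            return True
        if now - self.opened_at >= self.reset_after:
            self.opened_at = None  # half-open: let one request probe
            self.failures = 0
            return True
        return False

    def record_failure(self, now: float) -> None:
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = now

    def record_success(self) -> None:
        self.failures = 0

breaker = CircuitBreaker(threshold=2, reset_after=10.0)
breaker.record_failure(now=0.0)
breaker.record_failure(now=1.0)
print(breaker.allow(now=2.0))   # False: the circuit is open
print(breaker.allow(now=12.0))  # True: cool-down expired, probe allowed
```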

  72. Let’s Talk About… Queues Overload Mitigation Adaptive Concurrency

  73. Let’s Talk About… Queues Overload Mitigation Adaptive Concurrency

  74. We want to allow as many requests as we can actually handle

  75. None
  76. Adaptive Limits Time Concurrency

  77. Adaptive Limits Actual limit Time Concurrency

  78. Adaptive Limits Actual limit Dynamic Discovery Time Concurrency

  79. Load Shedding Server Server

  80. Load Shedding Server Server Are we at the limit?

  81. Load Shedding Server Server Am I still healthy?

  82. Load Shedding Server Server

  83. Load Shedding Server Server Update Limits

  84. Adaptive Limits Time Concurrency Increased latency

  85. Latency Successful vs. Failed requests Signals for Adjusting Limits

  86. Additive Increase Multiplicative Decrease Success state: limit + 1 Backoff state: limit * 0.95 Time Concurrency

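
The AIMD rule on this slide is a one-liner; a Python sketch (the function name is mine) of the two states:

```python
def aimd(limit: float, success: bool) -> float:
    """Additive increase on success, multiplicative decrease on backoff,
    never dropping the concurrency limit below 1."""
    return limit + 1 if success else max(1.0, limit * 0.95)

limit = 100.0
limit = aimd(limit, success=True)   # 101.0
limit = aimd(limit, success=False)  # roughly 95.95
```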
  87. Prior Art/Alternatives https://github.com/ferd/pobox/ https://github.com/fishcakez/sbroker/ https://github.com/heroku/canal_lock https://github.com/jlouis/safetyvalve https://github.com/jlouis/fuse

  88. Regulator https://github.com/keathley/regulator

  89. Regulator

      Regulator.install(:service, [
        limit: {Regulator.Limit.AIMD, [timeout: 500]}
      ])

      Regulator.ask(:service, fn ->
        {:ok, Finch.request(:get, "https://keathley.io")}
      end)

  90. Conclusion

  91. Queues are everywhere

  92. Those queues need to be bounded to avoid overload

  93. If your system is dynamic, your solution will also need to be dynamic

  94. Go and build awesome stuff

  95. Thanks Chris Keathley / @ChrisKeathley / c@keathley.io