Handling Failure in Microservice Architectures

mattheath
January 15, 2016

Presented at NDC London on 15th January 2016

Microservice architectures let us decompose domain logic into small services, each with a bounded context. This buys simplicity within each service at the expense of complexity in the interactions between services.

However, any distributed system operating at scale will experience failure, and this interaction complexity makes failure harder to deal with. This matters most when requests traverse many systems and the failure of a single component can cascade through several more. In this talk we look at a number of common patterns, from simple use of concurrency primitives and timeouts to control and throttle concurrency, to more complex patterns such as the circuit breaker, which can prevent cascading failures and so increase the reliability of our systems.


Transcript

  1. Handling Failure in Microservice Architectures Matt Heath, Mondo #ndclondon

  2. @mattheath

  3. None
  4. None
  5. 1895

  6. monoliths traditional dev

  7. None
  8. None
  9. None
  10. None
  11. None
  12. None
  13. None
  14. None
  15. None
  16. ?

  17. DATABASE APPLICATION

  18. DATABASE APPLICATION

  19. DATABASE DATABASES APPLICATION

  20. DATABASE DATABASES APPLICATION SEARCH

  21. DATABASE DATABASES APPLICATION CACHE SEARCH

  22. DATABASE DATABASES APPLICATION CACHE SEARCH CAT GIFS

  23. ALL HAIL THE MONOLITH

  24. DATABASE DATABASES APPLICATION CACHE SEARCH CAT GIFS

  25. APPLICATION

  26. None
  27. None
  28. Are your systems reliable?

  29. 2012 June - RBS - Batch processing causes 3 day outage
    2013 December - RBS - Card payments, cash withdrawals
    2015 June - RBS lose 600,000 payments
    2015 August - HSBC lose 275,000 payments
    2015 October - Barclays - Failure of accounts and cards
    2016 January - HSBC - 2 day outage of online systems
  30. Graceful handling of failure is essential

  31. Do microservices make this worse?

  32. The Fallacies of Distributed Computing

  33. Everything can and will fail

  34. Identifying Failure

  35. LOAD BALANCER HTTP API & ROUTING LAYER

  36. API SERVICE LOAD BALANCER HTTP API & ROUTING LAYER

  37. None
  38. /webhooks --> Webhook API

  39. WEBHOOK API LOAD BALANCER HTTP API & ROUTING LAYER

  40. WEBHOOK API AUTH SERVICE WEBHOOK SERVICE LOAD BALANCER HTTP API & ROUTING LAYER

  41. WEBHOOK API AUTH SERVICE WEBHOOK SERVICE LOAD BALANCER HTTP API & ROUTING LAYER

  42. WEBHOOK API AUTH SERVICE WEBHOOK SERVICE LOAD BALANCER HTTP API & ROUTING LAYER

  43. WEBHOOK API AUTH SERVICE WEBHOOK SERVICE LOAD BALANCER HTTP API & ROUTING LAYER
  44. Error Tracking and Propagation

  45. api webhook api webhook service api webhook api webhook service

  46. api webhook api webhook service api webhook api webhook service

  47. api webhook api webhook service api webhook api webhook service error

  48. api webhook api webhook service api webhook api webhook service error SENTRY

  49. None
  50. api webhook api webhook service api webhook api webhook service error SENTRY

  51. api webhook api webhook service api webhook api webhook service error SENTRY error partition

  52. api webhook api webhook service api webhook api webhook service ????? SENTRY error

  53. api webhook api webhook service api webhook api webhook service ????? SENTRY error timeout

  54. api webhook api webhook service api webhook api webhook service ????? SENTRY error timeout error timeout

  55. api webhook api webhook service api webhook api webhook service ????? SENTRY error error error error duplicated errors

  56. api webhook api webhook service api webhook api webhook service ????? SENTRY error error error error
  57. 8096820c-3b7b-47ec-bce6-1c239252ab40

  58. api webhook api webhook service api webhook api webhook service ????? SENTRY error error error error

  59. api webhook api webhook service api webhook api webhook service ????? SENTRY error error error error

  60. api webhook api webhook service api webhook api webhook service ????? SENTRY error error error error

  61. api webhook api webhook service api webhook api webhook service ????? SENTRY error error error error hash & deduplicate
  62. None
  63. None
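
The "hash & deduplicate" step on slide 61 can be sketched roughly as follows. This is a minimal illustration, not the speaker's implementation: the `ErrorEvent` type, its fields, and the choice to hash the request ID together with the message are all assumptions. The idea is that one failure reported by several hops of the same request collapses into a single tracked error.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// ErrorEvent is a hypothetical error report; the field names are assumptions.
type ErrorEvent struct {
	RequestID string // ID propagated through every hop of a request, e.g. a UUID
	Service   string // service that observed the error
	Message   string
}

// Fingerprint hashes the request ID and message. It deliberately ignores
// which service reported the error, so the same failure seen at several
// hops produces the same fingerprint.
func Fingerprint(e ErrorEvent) string {
	sum := sha256.Sum256([]byte(e.RequestID + "\x00" + e.Message))
	return hex.EncodeToString(sum[:])
}

// Deduplicate keeps the first event seen for each fingerprint.
func Deduplicate(events []ErrorEvent) []ErrorEvent {
	seen := make(map[string]bool)
	var out []ErrorEvent
	for _, e := range events {
		fp := Fingerprint(e)
		if seen[fp] {
			continue
		}
		seen[fp] = true
		out = append(out, e)
	}
	return out
}

func main() {
	id := "8096820c-3b7b-47ec-bce6-1c239252ab40"
	events := []ErrorEvent{
		{id, "webhook-service", "timeout"},
		{id, "webhook-api", "timeout"},
		{id, "api", "timeout"},
	}
	// Three reports of the same failure collapse into one.
	fmt.Println(len(Deduplicate(events)))
}
```

In practice upstream services often wrap the error message, so a real fingerprint would probably need a more stable key than the raw message text.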
  64. Service Healthchecks

  65. type Checker func() (error, map[string]string)

  66. None
  67. None
  68. Endpoint error % DB Connection Status Configuration loaded?

  69. Instrumentation

  70. None
  71. None
  72. metrics.Counter(1.0, "cassandra.read.error", 1)

  73. metrics.Timing(1.0, "cassandra.read", time.Since(start))

  74. STATSD SERVICE A UDP SERVICE B UDP HOST INSTANCES GRAFANA INFLUXDB METRICS

  75. TELEGRAF w/ STATSD PLUGIN SERVICE A UDP SERVICE B UDP HOST INSTANCES GRAFANA INFLUXDB TAGGED METRICS
  76. None
  77. Handling Failure

  78. Timing out and moving on

  79. Sensible Timeouts?

  80. Measure EVERYTHING!

  81. None
  82. TIMEOUT?

  83. api webhook api webhook service api webhook api webhook service

  84. api webhook api webhook service api webhook api webhook service Client Logic Server

  85. api webhook api webhook service api webhook api webhook service Client Logic Server
  86. None
  87. Client Remote Server Circuit Breaker

  88. Client Remote Server Circuit Breaker Error! Error!

  89. Client Remote Server Circuit Breaker Timeout! Timeout!

  90. Client Remote Server Circuit Breaker Circuit Open! OPEN

  91. None
  92. Client Remote Server Circuit Breaker Return Error or Cached Results OPEN
  93. Topology Management

  94. WEBHOOK API LOAD BALANCER HTTP API & ROUTING LAYER WEBHOOK SERVICE WEBHOOK SERVICE WEBHOOK SERVICE

  95. WEBHOOK API LOAD BALANCER HTTP API & ROUTING LAYER WEBHOOK SERVICE WEBHOOK SERVICE WEBHOOK SERVICE

  96. WEBHOOK API LOAD BALANCER HTTP API & ROUTING LAYER WEBHOOK SERVICE WEBHOOK SERVICE WEBHOOK SERVICE WEBHOOK SERVICE

  97. WEBHOOK API LOAD BALANCER HTTP API & ROUTING LAYER WEBHOOK SERVICE WEBHOOK SERVICE WEBHOOK SERVICE WEBHOOK SERVICE SLOW / ERRORS
  98. Fanout & Cancellation

  99. WEBHOOK API WEBHOOK SERVICE WEBHOOK SERVICE WEBHOOK SERVICE

  100. WEBHOOK API WEBHOOK SERVICE WEBHOOK SERVICE WEBHOOK SERVICE

  101. WEBHOOK API WEBHOOK SERVICE WEBHOOK SERVICE WEBHOOK SERVICE

  102. WEBHOOK API WEBHOOK SERVICE WEBHOOK SERVICE WEBHOOK SERVICE

  103. WEBHOOK API WEBHOOK SERVICE WEBHOOK SERVICE WEBHOOK SERVICE

  104. Event Driven Architectures

  105. API SERVICE SERVICE A SERVICE B LOAD BALANCER HTTP API & ROUTING LAYER

  106. API SERVICE SERVICE A SERVICE B LOAD BALANCER HTTP API & ROUTING LAYER

  107. API SERVICE SERVICE A SERVICE B LOAD BALANCER HTTP API & ROUTING LAYER SERVICE C SERVICE D E

  108. API SERVICE SERVICE A SERVICE B LOAD BALANCER HTTP API & ROUTING LAYER SERVICE C SERVICE D G E F
  109. Retry Strategies

  110. Bounded exponential backoff

  111. Bounded exponential backoff with Jitter

  112. Encouraging Failure?

  113. Antifragility

  114. Chaos Engineering

  115. None
  116. None
  117. Load Failure Degradation

  118. Eliminate primaries and special nodes

  119. Putting it all together

  120. API card-api card-processing cards transactions balance transaction-enrichment merchant feed-generator feed apns API card-api card-processing cards transactions balance transaction-enrichment merchant feed-generator feed apns

  121. API card-api card-processing cards transactions balance transaction-enrichment merchant feed-generator feed apns API card-api card-processing cards transactions balance transaction-enrichment merchant feed-generator feed apns

  122. API card-api card-processing cards transactions balance transaction-enrichment merchant feed-generator feed apns API card-api card-processing cards transactions balance transaction-enrichment merchant feed-generator feed apns

  123. API card-api card-processing cards transactions balance transaction-enrichment merchant feed-generator feed apns API card-api card-processing cards transactions balance transaction-enrichment merchant feed-generator feed apns

  124. API card-api card-processing cards transactions balance transaction-enrichment merchant feed-generator feed apns API card-api card-processing cards transactions balance transaction-enrichment merchant feed-generator feed apns

  125. None
  126. API card-api card-processing cards transactions balance transaction-enrichment merchant feed-generator feed apns API card-api card-processing cards transactions balance transaction-enrichment merchant feed-generator feed apns
  127. None
  128. None
  129. #ndclondon Thanks! @mattheath @getmondo

  130. Credits
    ATM: Thomas Hawk
    Bank of Commerce: ABQ Museum Archives
    IBM System/360: IBM
    Absorbed: Saxbald Photography
    Orbital Ion Cannon: www.rom.ac