Self-healing Software

Services that Handle Failure

Ben Darfler

May 07, 2019

Transcript

  1. A Toy System

    [Diagram: Mobile App API, Browser API, Search API, Data API, Search Index, Data Store]
  2. THE API IS DOWN

    [Diagram: Mobile App API, Browser API, Search API, Data API, Search Index, Data Store]
  3. Redundant Processes

    • Run multiple processes in case of failure
    • Load balance across the healthy processes
    • Processes should be physically separated
    • Examples
      ◦ Heroku: Multiple Dynos
      ◦ DigitalOcean: Droplets in different datacenters
      ◦ AWS: Instances in different availability zones
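The "load balance across the healthy processes" idea can be sketched as a round-robin picker that skips backends marked as down. This is a minimal illustration, not how any particular load balancer works; `RoundRobinBalancer`, `mark_down`, `mark_up`, and `pick` are hypothetical names, and a real balancer would drive the health state from periodic health checks.

```python
class RoundRobinBalancer:
    """Hypothetical round-robin balancer that only returns healthy backends."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._next = 0  # index of the next backend to try

    def mark_down(self, backend):
        # Called when a health check fails.
        self.healthy.discard(backend)

    def mark_up(self, backend):
        # Called when a health check passes again.
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        # Try each backend at most once, starting from the rotation point.
        for _ in range(len(self.backends)):
            backend = self.backends[self._next]
            self._next = (self._next + 1) % len(self.backends)
            if backend in self.healthy:
                return backend
        raise RuntimeError("no healthy backends")
```

With three backends and one marked down, requests alternate between the two that remain, which is why a single failed process does not take the API down.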
  4. The API Recovered

    [Diagram: Mobile App API, Browser API, two Load Balancers, multiple Search API and Data API instances, Search Index, Data Store]
  5. THE API IS DOWN AGAIN

    [Diagram: Mobile App API, Browser API, two Load Balancers, multiple Search API and Data API instances, Search Index, Data Store]
  6. Platform Automation

    • Platform infrastructure monitors and replaces failed processes
    • Examples
      ◦ Heroku
      ◦ Kubernetes
      ◦ AWS Auto Scaling groups
      ◦ AWS Elastic Beanstalk
  7. The API Recovered

    [Diagram: Mobile App API, Browser API, two Load Balancers, multiple Search API and Data API instances, Search Index, Data Store]
  8. THAT’S A LOT OF USERS!

    [Diagram: Mobile App API, Browser API, two Load Balancers, multiple Search API and Data API instances, Search Index, Data Store]
  9. Platform Automation

    • Platform infrastructure monitors load and adds more instances
    • Examples
      ◦ Heroku
      ◦ Kubernetes
      ◦ AWS Auto Scaling groups
      ◦ AWS Elastic Beanstalk
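One way a platform might decide how many instances to add is a proportional rule: scale the current count by the ratio of observed load to target load, then clamp the result. The sketch below assumes that rule (the Kubernetes Horizontal Pod Autoscaler documents a similar ratio-based calculation); `desired_instances` and all of its parameters are hypothetical names chosen for illustration.

```python
import math


def desired_instances(current, observed_load, target_load,
                      min_instances=1, max_instances=10):
    """Return how many instances a proportional autoscaler would ask for.

    `observed_load` and `target_load` are per-instance figures in the same
    unit, for example average requests per second per instance.
    """
    if target_load <= 0:
        raise ValueError("target_load must be positive")
    wanted = math.ceil(current * observed_load / target_load)
    # Clamp so a load spike or a bad metric cannot scale without bound.
    return max(min_instances, min(max_instances, wanted))
```

For example, two instances each seeing 90 requests per second against a target of 60 would be scaled to three.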
  10. Moar Instances

    [Diagram: Mobile App API, multiple Browser API instances, two Load Balancers, multiple Search API and Data API instances, Search Index, Data Store]
  11. OH NO l33t Hax0r!

    [Diagram: Mobile App API, Browser API, two Load Balancers, multiple Search API and Data API instances, Search Index, Data Store]
  12. Request Throttling

    • Throttle requests, typically by IP address or user credential
    • Protects the service from malicious attacks, but also from bugs
    • Examples
      ◦ dryruby/rack-throttle
      ◦ jhurliman/node-rate-limiter
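Throttling by IP address or credential can be sketched with a token bucket per client key. This is a simplified, single-process illustration under assumed names (`TokenBucketThrottle`, `allow`); it is not the API of the libraries listed above, and a multi-instance service would typically keep the buckets in shared storage.

```python
import time


class TokenBucketThrottle:
    """Hypothetical per-key token bucket: `rate` tokens per second, up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the behavior can be tested
        self._buckets = {}  # key -> (tokens, time of last update)

    def allow(self, key):
        now = self.clock()
        tokens, last = self._buckets.get(key, (self.capacity, now))
        # Refill in proportion to elapsed time, never above capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self._buckets[key] = (tokens - 1, now)
            return True
        self._buckets[key] = (tokens, now)
        return False
```

A request handler would call `allow(client_ip)` and reject the request (for HTTP, commonly with status 429) when it returns `False`. Because each key has its own bucket, one noisy client does not consume the budget of the others.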
  13. Go Home Hax0r

    [Diagram: Mobile App API, Browser API, two Load Balancers, multiple Search API and Data API instances, Search Index, Data Store]
  14. THE NETWORK IS FLAKEY

    [Diagram: Mobile App API, Browser API, two Load Balancers, multiple Search API and Data API instances, Search Index, Data Store]
  15. Retries

    • Retry operations in the face of transient errors
    • Exponential backoff + jitter reduces load on the flakey component
    • Give up after a given number of retries or time retrying
    • Examples
      ◦ zeit/async-retry
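The retry bullets can be sketched as a small helper that retries only transient errors, doubles the delay cap on each attempt, picks a random delay below that cap (often called "full jitter"), and gives up after a fixed number of attempts. `TransientError` and `retry` are hypothetical names, and the default values are arbitrary assumptions.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for errors that are worth retrying."""


def retry(operation, max_attempts=5, base_delay=0.1, max_delay=5.0,
          sleep=time.sleep, rng=random.random):
    """Call `operation` until it succeeds or `max_attempts` is reached."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Exponential backoff: the cap doubles each attempt, up to max_delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            # Jitter: a random delay in [0, cap) spreads out retrying clients.
            sleep(rng() * cap)
```

Only `TransientError` is retried here; any other exception propagates immediately, since retrying a permanent failure just adds load.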
  16. If At First You Don’t Succeed ...

    [Diagram: Mobile App API, Browser API, two Load Balancers, multiple Search API and Data API instances, Search Index, Data Store]
  17. SEARCH IS SLOW

    [Diagram: Mobile App API, Browser API, two Load Balancers, multiple Search API and Data API instances, Search Index, Data Store]
  18. Timeouts

    • Add timeouts to all operations that can block
    • Often combined with retries and backoff
    • Prevents the whole system from falling over
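A timeout around a blocking call can be sketched with the standard library's thread pool: the caller waits a bounded time for the result and otherwise returns a fallback value. `call_with_timeout` and its parameters are hypothetical names. Note the limitation stated in the comments: the caller stops waiting, but the worker thread is not interrupted, so network clients should also set their own socket-level timeouts.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Shared pool for blocking calls; the size is an arbitrary assumption.
_pool = ThreadPoolExecutor(max_workers=4)


def call_with_timeout(operation, timeout_seconds, fallback=None):
    """Run `operation` in a worker thread and wait at most `timeout_seconds`."""
    future = _pool.submit(operation)
    try:
        return future.result(timeout=timeout_seconds)
    except FutureTimeout:
        # cancel() only helps if the task has not started yet; a running
        # task keeps its worker thread busy until it finishes on its own.
        future.cancel()
        return fallback
```

Returning a fallback such as an empty result list lets the rest of the response be served when one dependency is slow.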
  19. Graceful Degradation

    [Diagram: Mobile App API, Browser API, two Load Balancers, multiple Data API and Search API instances, Search Index, Data Store]
  20. SEARCH IS GETTING WORSE

    [Diagram: Mobile App API, Browser API, two Load Balancers, multiple Data API and Search API instances, Search Index, Data Store]
  21. Circuit Breakers

    • Code that wraps an operation which can fail or hang
    • Tracks error rates and request latency
    • “Opens” the circuit under high latency or error rates
    • When available, a fallback response can be returned
    • Allows some requests through to test whether it should “Close”
    • Often incorporates timeouts and is combined with retries
  22. Circuit Breakers

    • Prevents exhausting resources
      ◦ Client requests fail fast and don’t tie up resources waiting
    • Allows the system to recover
      ◦ Removes load from dependencies so they can recover
    • Examples
      ◦ Shopify/semian
      ◦ nodeshift/opossum
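The open, fail-fast, and probe behavior described on the two circuit breaker slides can be sketched as a small wrapper class. To stay short, this sketch makes simplifying assumptions: it counts consecutive failures instead of tracking error rates and latency, and it is single-threaded. `CircuitBreaker`, `CircuitOpenError`, and the parameter names are hypothetical and do not mirror the API of the libraries listed above.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the circuit is open and no fallback was supplied."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0      # consecutive failures seen so far
        self.opened_at = None  # None means the circuit is closed

    def call(self, operation, fallback=None):
        is_open = self.opened_at is not None
        if is_open and self.clock() - self.opened_at < self.reset_timeout:
            # Open: fail fast without touching the struggling dependency.
            if fallback is not None:
                return fallback()
            raise CircuitOpenError("circuit is open")
        # Closed, or open long enough that this call acts as a probe.
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if is_open or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed probe
            if fallback is not None:
                return fallback()
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

While the circuit is open, callers get the fallback immediately, which both frees their resources and removes load from the dependency so it has a chance to recover.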
  23. Search Recovers

    [Diagram: Mobile App API, Browser API, two Load Balancers, multiple Data API and Search API instances, Search Index, Data Store]