Self-healing Software

Self-healing Software Services That Never Fail

Self-healing Software Services That N̶e̶v̶e̶r̶ ̶F̶a̶i̶l̶ Handle Failure

A Toy System

A Toy System
[Diagram: Mobile App API, Browser API, Search API, Data API, Search Index, Data Store]

Let’s Break It

THE API IS DOWN
[Diagram: Mobile App API, Browser API, Search API, Data API, Search Index, Data Store]

Redundant Processes
• Run multiple processes in case of failure
• Load balance across the healthy processes
• Processes should be physically separated
• Examples
  ◦ Heroku: multiple dynos
  ◦ DigitalOcean: Droplets in different datacenters
  ◦ AWS: instances in different Availability Zones
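To make "load balance across the healthy processes" concrete, here is a minimal sketch of a round-robin balancer that skips unhealthy processes. The `Process` and `RoundRobinBalancer` names, the `zone` field, and the way health is flagged are all assumptions for illustration; a real load balancer would discover health through its own checks.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Process:
    name: str
    zone: str            # hypothetical label for a datacenter or availability zone
    healthy: bool = True


class RoundRobinBalancer:
    """Rotates requests across whichever processes are currently healthy."""

    def __init__(self, processes):
        self.processes = list(processes)
        self._counter = count()

    def pick(self) -> Process:
        healthy = [p for p in self.processes if p.healthy]
        if not healthy:
            raise RuntimeError("no healthy processes available")
        return healthy[next(self._counter) % len(healthy)]


# Two processes in different zones; one crashes and traffic shifts to the other.
pool = [Process("api-1", "zone-a"), Process("api-2", "zone-b")]
balancer = RoundRobinBalancer(pool)
pool[0].healthy = False  # simulate a failure in zone-a
```

With `api-1` marked unhealthy, every call to `balancer.pick()` returns `api-2` until `api-1` is healthy again.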

The API Recovered
[Diagram: Mobile App API, Browser API, two Load Balancers, multiple Search API instances, multiple Data API instances, Search Index, Data Store]

THE API IS DOWN AGAIN
[Diagram: Mobile App API, Browser API, two Load Balancers, multiple Search API instances, multiple Data API instances, Search Index, Data Store]

Platform Automation
• Platform infrastructure monitors and replaces failed processes
• Examples
  ◦ Heroku
  ◦ Kubernetes
  ◦ AWS Auto Scaling groups
  ◦ AWS Elastic Beanstalk
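The "monitor and replace" behaviour can be pictured as a reconcile step that compares the running fleet with a desired count. This is only a sketch under assumed names (`Instance`, `launch`, `reconcile`); it does not reflect how any of the platforms listed above are implemented.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Instance:
    id: int
    healthy: bool = True


_ids = count(1)


def launch() -> Instance:
    """Hypothetical stand-in for asking the platform to start a fresh process."""
    return Instance(id=next(_ids))


def reconcile(instances, desired):
    """Drop failed instances, then launch replacements up to the desired count."""
    alive = [i for i in instances if i.healthy]
    while len(alive) < desired:
        alive.append(launch())
    return alive


fleet = reconcile([], desired=3)     # initial launch of three instances
fleet[1].healthy = False             # one instance fails its health check
fleet = reconcile(fleet, desired=3)  # the failed one is replaced
```

A platform would run something like `reconcile` continuously, so a failed process is replaced without anyone being paged.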

The API Recovered
[Diagram]

Oh Hai Internet

THAT’S A LOT OF USERS!
[Diagram: Mobile App API, Browser API, two Load Balancers, multiple Search API instances, multiple Data API instances, Search Index, Data Store]

Platform Automation
• Platform infrastructure monitors load and adds more instances
• Examples
  ◦ Heroku
  ◦ Kubernetes
  ◦ AWS Auto Scaling groups
  ◦ AWS Elastic Beanstalk
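One way to think about "monitors load and adds more instances" is a proportional rule: scale the instance count by how far observed load is from a target. The function below is an illustrative assumption (name, parameters, and the CPU-percentage metric are all hypothetical); each platform applies its own scaling policies.

```python
import math


def desired_instances(current, cpu_percent, target_percent=60, minimum=2, maximum=10):
    """Return an instance count that would bring average CPU near the target.

    Percentages are whole numbers (0-100) to keep the arithmetic exact.
    """
    wanted = math.ceil(current * cpu_percent / target_percent)
    # Never drop below the redundancy floor or exceed the budget ceiling.
    return max(minimum, min(maximum, wanted))


# Two instances running at 90% CPU against a 60% target: scale out.
scaled = desired_instances(current=2, cpu_percent=90)
```

The `minimum` keeps redundancy in place when traffic is quiet, and the `maximum` caps cost during a spike.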

Moar Instances
[Diagram: Mobile App API, multiple Browser API instances, two Load Balancers, multiple Search API instances, multiple Data API instances, Search Index, Data Store]

OH NO l33t Hax0r!
[Diagram: Mobile App API, Browser API, two Load Balancers, multiple Search API instances, multiple Data API instances, Search Index, Data Store]

Request Throttling
• Throttle requests, typically by IP address or user credential
• Protects the service from malicious attacks and also from bugs
• Examples
  ◦ dryruby/rack-throttle
  ◦ jhurliman/node-rate-limiter
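A common way to throttle per client is a token bucket keyed by IP address or credential. The sketch below is a generic illustration, not the API of either library listed above; `TokenBucket`, `Throttle`, and the injectable `clock` are assumed names.

```python
import time


class TokenBucket:
    """Holds up to `capacity` tokens, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        now = self.clock()
        # Refill for the time elapsed since the last request, up to capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class Throttle:
    """Keeps one bucket per client key (an IP address or a user credential)."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets = {}

    def allow(self, client_key):
        if client_key not in self.buckets:
            self.buckets[client_key] = TokenBucket(self.rate, self.capacity, self.clock)
        return self.buckets[client_key].allow()
```

A request handler would call `throttle.allow(ip)` and reject the request (for HTTP, typically with status 429) when it returns `False`. Because each key has its own bucket, one noisy client does not consume another client's allowance.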

Go Home Hax0r
[Diagram]

Let’s Get Unstable

THE NETWORK IS FLAKEY
[Diagram: Mobile App API, Browser API, two Load Balancers, multiple Search API instances, multiple Data API instances, Search Index, Data Store]

Retries
• Retry operations in the face of transient errors
• Exponential backoff + jitter reduces load on the flakey component
• Give up after a given number of retries or time retrying
• Examples
  ◦ zeit/async-retry
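The three retry bullets can be sketched in a few lines: retry only transient errors, wait an exponentially growing and randomized delay between attempts, and give up after a fixed number of tries. This is a minimal illustration with assumed names (`TransientError`, `retry`, and the injectable `sleep` and `rng`), not the interface of the library listed above.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (resets, brief outages)."""


def retry(operation, max_attempts=5, base_delay=0.1, max_delay=5.0,
          sleep=time.sleep, rng=random.random):
    """Call `operation`, retrying TransientError with backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # The ceiling doubles each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: sleep a random fraction of the ceiling so that many
            # clients retrying at once do not hit the dependency in lockstep.
            sleep(rng() * ceiling)
```

Errors other than `TransientError` propagate immediately, since retrying a request that is simply invalid only adds load.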

If At First You Don’t Succeed ...
[Diagram: Mobile App API, Browser API, two Load Balancers, multiple Search API instances, multiple Data API instances, Search Index, Data Store]

Let’s Slooow It Down

SEARCH IS SLOW
[Diagram: Mobile App API, Browser API, two Load Balancers, multiple Data API instances, multiple Search API instances, Search Index, Data Store]

Timeouts
• Add timeouts to all operations that can block
• Often combined with retries and backoff
• Prevents the whole system from falling over
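As a rough illustration of bounding a blocking call, the sketch below runs an operation on a worker thread and stops waiting after a deadline, returning a fallback instead. The `call_with_timeout` helper and its parameters are assumptions; where a client library exposes its own timeout setting, that is usually the better tool.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

_pool = ThreadPoolExecutor(max_workers=4)


def call_with_timeout(operation, timeout_seconds, fallback):
    """Wait at most `timeout_seconds` for `operation`, else return `fallback`."""
    future = _pool.submit(operation)
    try:
        return future.result(timeout=timeout_seconds)
    except FutureTimeout:
        # The caller stops waiting here. The worker thread may keep running
        # until the operation finishes; cancel() only helps if it never started.
        future.cancel()
        return fallback


def slow_search():
    """Hypothetical stand-in for a search call that has become slow."""
    time.sleep(0.5)
    return ["result"]


# The page renders with empty search results instead of hanging.
results = call_with_timeout(slow_search, timeout_seconds=0.05, fallback=[])
```

Returning an empty result rather than an error is one form of graceful degradation: the rest of the response is still served while search is slow.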

Graceful Degradation
[Diagram: Mobile App API, Browser API, two Load Balancers, multiple Data API instances, multiple Search API instances, Search Index, Data Store]

SEARCH IS GETTING WORSE
[Diagram: Mobile App API, Browser API, two Load Balancers, multiple Data API instances, multiple Search API instances, Search Index, Data Store]

Circuit Breakers
• Code that wraps an operation which can fail or hang
• Tracks error rates and request latency
• “Opens” the circuit under high latency or error rates
• When available, a fallback response can be returned
• Allows some requests through to test whether it should “close”
• Often incorporates timeouts and is combined with retries

Circuit Breakers
• Prevents exhausting resources
  ◦ Client requests fail fast and don’t tie up resources waiting
• Allows the system to recover
  ◦ Removes load from dependencies so they can recover
• Examples
  ◦ Shopify/semian
  ◦ nodeshift/opossum
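The open, fail-fast, and trial-request behaviour described on the two Circuit Breakers slides can be sketched as a small state machine. This is a deliberately simplified illustration: it counts consecutive failures rather than tracking error rates or latency, and `CircuitBreaker`, `CircuitOpenError`, and their parameters are assumed names, not the API of either library listed above.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the circuit is open and no fallback was supplied."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, operation, fallback=None):
        if self.opened_at is not None and self.clock() - self.opened_at < self.reset_timeout:
            # Open: fail fast without touching the struggling dependency.
            if fallback is not None:
                return fallback()
            raise CircuitOpenError("circuit is open")
        # Closed, or the reset timeout has passed and this is a trial request.
        try:
            result = operation()
        except Exception:
            self._record_failure()
            if fallback is not None:
                return fallback()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result

    def _record_failure(self):
        self.failures += 1
        # A failed trial request re-opens the circuit immediately.
        if self.opened_at is not None or self.failures >= self.failure_threshold:
            self.opened_at = self.clock()
```

A caller might wrap the search call as `breaker.call(search, fallback=list)`, so that users get empty results quickly while the circuit is open and the search service gets room to recover.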

Search Recovers
[Diagram: Mobile App API, Browser API, two Load Balancers, multiple Data API instances, multiple Search API instances, Search Index, Data Store]

Go Build Resilient Systems!

Questions

Self-healing Software

Ben Darfler