Little's law states that the long-term average number L of customers in a stationary system is equal to the long-term average effective arrival rate λ multiplied by the average time W that a customer spends in the system: L = λW. (Wikipedia)
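To make the law concrete for an HTTP server, here is a minimal Go sketch. The function name `InFlight` and the numbers are illustrative assumptions, not taken from the talk: L is read as the average number of in-flight requests, λ as requests per second, and W as the average time a request spends in the server.

```go
package main

import "fmt"

// InFlight applies Little's law, L = λ·W.
// arrivalRPS is λ (requests per second) and avgSeconds is W (average time a
// request spends in the system, in seconds). The result is L, the average
// number of requests inside the system at any moment.
func InFlight(arrivalRPS, avgSeconds float64) float64 {
	return arrivalRPS * avgSeconds
}

func main() {
	// Hypothetical figures: the same 200ms handling time at two arrival rates.
	fmt.Printf("15 RPS -> L = %.1f\n", InFlight(15, 0.2))
	fmt.Printf("50 RPS -> L = %.1f\n", InFlight(50, 0.2))
}
```

Read the other way around, a server that can only hold a fixed number of requests at once has a bounded sustainable arrival rate for a given latency, which is why limiting concurrency is a natural protection mechanism.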
Load shedding is a technique used in information systems, especially web services, to avoid overloading the system and making it unavailable for all users. The idea is to ignore some requests rather than crashing a system and making it fail to serve any request. (Wikipedia)
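A minimal sketch of load shedding in Go, using only the standard library. The `Shedder` type, its method names, and the fixed limit are assumptions made for illustration; this is not the goresilience API. Requests beyond the limit are answered immediately with 503 instead of being allowed to pile up.

```go
package main

import (
	"fmt"
	"net/http"
	"sync/atomic"
)

// Shedder rejects work once more than limit requests are in flight.
type Shedder struct {
	limit    int64
	inFlight int64 // accessed atomically
}

func NewShedder(limit int64) *Shedder { return &Shedder{limit: limit} }

// TryAcquire reserves a slot and reports whether the request may proceed.
func (s *Shedder) TryAcquire() bool {
	if atomic.AddInt64(&s.inFlight, 1) > s.limit {
		atomic.AddInt64(&s.inFlight, -1) // over the limit: give the slot back
		return false
	}
	return true
}

// Release frees a slot taken by a successful TryAcquire.
func (s *Shedder) Release() { atomic.AddInt64(&s.inFlight, -1) }

// Middleware answers 503 right away when the server is full, so the
// requests that were admitted can still be served.
func (s *Shedder) Middleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !s.TryAcquire() {
			http.Error(w, "overloaded, try again later", http.StatusServiceUnavailable)
			return
		}
		defer s.Release()
		next.ServeHTTP(w, r)
	})
}

func main() {
	s := NewShedder(2)
	fmt.Println(s.TryAcquire(), s.TryAcquire(), s.TryAcquire()) // true true false
	s.Release()
	fmt.Println(s.TryAcquire()) // true
}
```

In a real server the middleware would wrap the application handler, for example `http.ListenAndServe(":8080", s.Middleware(mux))`, where `mux` is whatever handler the service already uses.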
is the regular capacity
✘ Test1: 15 RPS for 3m (with a 50 RPS spike for 1m in the middle)
✘ Test2: 60 RPS for 15m
✘ Limited to 0.1 CPU and 50 MB of memory
✘ Goresilience library (github.com/slok/goresilience)
The tests demo can be found at https://github.com/slok/resilience-demo
encapsulates the logic of preventing a failure from constantly recurring, during maintenance, temporary external system failure or unexpected system difficulties.
State diagram: Closed (regular flow) moves to Open (fail fast) when the error limit is exceeded; Open moves to Half open (regular flow) after a timeout; Half open returns to Open if the tests fail and to Closed if the tests succeed.
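The three states and their transitions can be sketched as a small state machine. This is a simplified, single-goroutine illustration using only the standard library: the names (`Breaker`, `Run`, `ErrOpen`), the consecutive-error trip rule, and the fake clock used to keep the demo deterministic are all assumptions, not the goresilience or Hystrix implementation.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

type State int

const (
	Closed State = iota
	Open
	HalfOpen
)

var (
	ErrOpen    = errors.New("circuit open: failing fast")
	errBackend = errors.New("backend failure") // stand-in error for the demo
)

// Breaker is not safe for concurrent use; a real one needs locking.
type Breaker struct {
	maxErrors   int              // consecutive errors that trip the breaker
	openTimeout time.Duration    // time to stay open before probing again
	now         func() time.Time // injectable clock
	state       State
	failures    int
	openedAt    time.Time
}

func NewBreaker(maxErrors int, openTimeout time.Duration) *Breaker {
	return &Breaker{maxErrors: maxErrors, openTimeout: openTimeout, now: time.Now}
}

// Run executes f unless the breaker is open.
func (b *Breaker) Run(f func() error) error {
	if b.state == Open {
		if b.now().Sub(b.openedAt) < b.openTimeout {
			return ErrOpen // fail fast, f is never called
		}
		b.state = HalfOpen // timeout elapsed: let a test request through
	}
	err := f()
	switch {
	case err == nil:
		b.state, b.failures = Closed, 0 // regular flow, or tests succeeded
	case b.state == HalfOpen:
		b.trip() // tests failed
	default:
		b.failures++
		if b.failures >= b.maxErrors {
			b.trip() // error limit exceeded
		}
	}
	return err
}

func (b *Breaker) trip() { b.state, b.openedAt = Open, b.now() }

// fakeClock lets the demo advance time without sleeping.
type fakeClock struct{ t time.Time }

func (c *fakeClock) Now() time.Time          { return c.t }
func (c *fakeClock) Advance(d time.Duration) { c.t = c.t.Add(d) }

func main() {
	clock := &fakeClock{}
	b := NewBreaker(2, 10*time.Second)
	b.now = clock.Now
	fail := func() error { return errBackend }
	ok := func() error { return nil }

	fmt.Println(b.Run(fail)) // backend failure
	fmt.Println(b.Run(fail)) // backend failure (breaker trips)
	fmt.Println(b.Run(ok))   // circuit open: failing fast
	clock.Advance(10 * time.Second)
	fmt.Println(b.Run(ok)) // <nil> (half-open test succeeded, closed again)
}
```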
concurrent handlings.
✘ Circuit breaker will release the load (fast).
✘ Hystrix style pattern.
✘ Needs to be configured.
✘ Will protect us from bursts/spikes.
✘ Circuit breaker wraps bulkhead.
on congestion
Explanation and algorithm: https://queue.acm.org/detail.cfm?id=2839461
Original CoDel: https://queue.acm.org/detail.cfm?id=2209336
Airbnb also uses it: https://medium.com/airbnb-engineering/building-services-at-airbnb-part-3-ac6d4972fc2d
timeouts:
✘ Regular (interval): 100ms by default
✘ Aggressive (target): 5ms by default
By default, requests get the interval timeout while they wait in the queue. CoDel records the last time the queue was empty; if more time than the interval has passed since then, congestion is detected. While congested, requests get the target timeout on the queue instead.
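The timeout selection described above can be sketched in a few lines of Go. The type and method names (`CoDelTimeouts`, `QueueEmptied`, `Timeout`) are hypothetical, chosen for this illustration only; the 100ms and 5ms values are the defaults quoted on the slide.

```go
package main

import (
	"fmt"
	"time"
)

// CoDelTimeouts picks the queue timeout for an incoming request.
type CoDelTimeouts struct {
	Interval  time.Duration // regular queue timeout
	Target    time.Duration // aggressive queue timeout used on congestion
	lastEmpty time.Time     // last moment the queue was seen empty
}

// QueueEmptied records that the queue was empty at time now.
func (c *CoDelTimeouts) QueueEmptied(now time.Time) { c.lastEmpty = now }

// Congested reports whether the queue has gone longer than Interval
// without draining completely.
func (c *CoDelTimeouts) Congested(now time.Time) bool {
	return now.Sub(c.lastEmpty) > c.Interval
}

// Timeout returns how long a request arriving at now may wait in the queue.
func (c *CoDelTimeouts) Timeout(now time.Time) time.Duration {
	if c.Congested(now) {
		return c.Target
	}
	return c.Interval
}

func main() {
	start := time.Now()
	c := &CoDelTimeouts{Interval: 100 * time.Millisecond, Target: 5 * time.Millisecond}
	c.QueueEmptied(start)

	fmt.Println(c.Timeout(start.Add(50 * time.Millisecond)))  // 100ms
	fmt.Println(c.Timeout(start.Add(150 * time.Millisecond))) // 5ms
}
```

Passing `now` explicitly keeps the logic deterministic and easy to test; a production queue would call `QueueEmptied` from its dequeue path whenever it drains.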
✘ FIFO: in regular mode, first in first out.
✘ LIFO: in congestion mode, last in first out (unfair).
When CoDel detects congestion it changes the dequeue priority so that the most recent requests are served first (CoDel will clean old queued requests). The algorithm assumes that the clients behind long-queued requests have already given up, so newer requests are more likely to be served usefully.
Dropbox's proxy uses adaptive LIFO: https://blogs.dropbox.com/tech/2018/03/meet-bandaid-the-dropbox-service-proxy
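The switch between the two dequeue orders can be illustrated with a slice-backed queue. This is a sketch under assumptions: `AdaptiveQueue` is a hypothetical type, requests are represented by string IDs, and the `congested` flag is supplied by the caller (in practice it would come from the congestion check described earlier).

```go
package main

import "fmt"

// AdaptiveQueue dequeues FIFO in regular mode and LIFO under congestion.
type AdaptiveQueue struct {
	items []string // request IDs, oldest first
}

// Push enqueues a request at the back.
func (q *AdaptiveQueue) Push(id string) { q.items = append(q.items, id) }

// Pop returns the next request to serve and false if the queue is empty.
func (q *AdaptiveQueue) Pop(congested bool) (string, bool) {
	n := len(q.items)
	if n == 0 {
		return "", false
	}
	if congested {
		// LIFO: serve the newest request, whose client is most likely
		// still waiting for the answer.
		id := q.items[n-1]
		q.items = q.items[:n-1]
		return id, true
	}
	// FIFO: serve the oldest request.
	id := q.items[0]
	q.items = q.items[1:]
	return id, true
}

func main() {
	q := &AdaptiveQueue{}
	for _, id := range []string{"r1", "r2", "r3"} {
		q.Push(id)
	}
	a, _ := q.Pop(false) // regular mode: oldest first
	b, _ := q.Pop(true)  // congestion: newest first
	c, _ := q.Pop(false)
	fmt.Println(a, b, c) // r1 r3 r2
}
```

The old requests left at the front are the ones the aggressive queue timeout is expected to expire.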
CoDel algorithm and adapted by Facebook.
✘ Dynamic timeout and queue priority (adapts and changes policies on congestion).
✘ No configuration required (almost, safe defaults).
✘ Very aggressive timeouts on congestion.
to auto-detect concurrency limits for services in order to achieve optimal throughput with optimal latency.
[Figure: concurrent requests over time, showing the initial limit, the discovered limit and the real limit.]
AIMD but there are more like Vegas, Gradient...).
✘ Adaptive concurrency.
✘ Static queue timeout and priority.
✘ No configuration required.
✘ Adapts based on execution results (errors and latency).
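AIMD (additive increase, multiplicative decrease) can be sketched as follows. This is a deliberately simplified illustration, not the goresilience implementation: the `AIMDLimit` type, the per-result update, and the backoff factor are assumptions, and `ok` stands for "the execution finished without error and within an acceptable latency".

```go
package main

import "fmt"

// AIMDLimit adapts a concurrency limit from execution results:
// +1 after a good result, multiplied by backoff after a bad one.
type AIMDLimit struct {
	limit, minLimit, maxLimit float64
	backoff                   float64 // factor in (0, 1), e.g. 0.5
}

func NewAIMDLimit(initial, minLimit, maxLimit, backoff float64) *AIMDLimit {
	return &AIMDLimit{limit: initial, minLimit: minLimit, maxLimit: maxLimit, backoff: backoff}
}

// Limit returns the current concurrency limit, rounded down.
func (a *AIMDLimit) Limit() int { return int(a.limit) }

// OnResult feeds back the outcome of one execution.
func (a *AIMDLimit) OnResult(ok bool) {
	if ok {
		a.limit++ // additive increase: probe for more capacity
		if a.limit > a.maxLimit {
			a.limit = a.maxLimit
		}
		return
	}
	a.limit *= a.backoff // multiplicative decrease: back off quickly
	if a.limit < a.minLimit {
		a.limit = a.minLimit
	}
}

func main() {
	l := NewAIMDLimit(10, 1, 100, 0.5)
	for i := 0; i < 5; i++ {
		l.OnResult(true)
	}
	fmt.Println(l.Limit()) // 15
	l.OnResult(false)
	fmt.Println(l.Limit()) // 7
}
```

The asymmetry is the point of the design: the limit creeps up slowly while things go well and drops sharply at the first sign of trouble, so it oscillates just below the real capacity.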
and context.
✘ There is a loser: no protection (naked server).
✘ Adaptive algorithms add complexity (use libraries like goresilience) but are better for dynamic environments like cloud native.
✘ A bulkhead or circuit breaker can be enough.
✘ You can use a front proxy (or use the sidecar pattern).
✘ Don’t trust your clients, protect yourself.