When Failure is Not an Option: Processing Real Money at Monzo with Kubernetes and Linkerd

In this talk, we describe how Monzo processes financial transactions involving real money and real people in a way that's safe, secure, and resilient. We show how combining Kubernetes with Linkerd creates a highly adaptive system: Kubernetes provides a baseline level of protection against hardware and software failures, and Linkerd layers on request-level resilience, including latency-aware load balancing, intelligent retries, and service-level measures of success rates and latency. We show how the resulting system is resilient to a wide variety of failures and protects the financial transactions that flow through it from failure, yet still allows for a rapid pace of feature development and iteration.

Oliver Gould

March 29, 2017

Transcript

  1. When Failure is Not an Option: Processing Real Money at Monzo with Kubernetes and Linkerd
    Oliver Beattie, head of engineering, Monzo
    Oliver Gould, creator of Linkerd, CTO, Buoyant Inc.
    KubeCon EU, March 29, 2017
  2. Load balancing, tracing, circuit breakers, retries, canarying, load shedding, error tracking, metrics, service discovery, logging, timeouts, expirations, security policies, back-offs, retry budgets, dynamic routing
  3. The stack, bottom to top
    datacenter: [1] physical, [2] link, [3] network, [4] transport
      aws, azure, digitalocean, gce, …
      canal, weave, …
      kubernetes, mesos, swarm, …
    [5] session: rpc (linkerd)
    [6] presentation: json, protobuf, thrift, …
    [7] application: business languages, libraries
  4. The same stack as slide 3, now with linkerd-tcp at [4] transport alongside linkerd at [5] session (rpc)
  5. The new world of service discovery!
    host: app: b, app: a, app: c
    host: app: a, app: b, app: a
    service: a covers the app: a instances, whichever host they run on
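
In Kubernetes, a Service finds its endpoints with a label selector, which is why "service: a" on this slide can cover app: a instances spread over several hosts. The minimal Python sketch below models only that matching step; the `Pod` class, the `select_endpoints` helper, the host names, and the IP addresses are all hypothetical, and a real cluster does this through the API server rather than an in-memory list.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    # Hypothetical, simplified pod record: where it runs and how it is labeled.
    host: str
    ip: str
    labels: dict = field(default_factory=dict)


def select_endpoints(pods, selector):
    """Return the IPs of pods whose labels include every key/value in `selector`."""
    return [
        pod.ip
        for pod in pods
        if all(pod.labels.get(key) == value for key, value in selector.items())
    ]


# The layout from the slide: two hosts, each running a mix of apps.
pods = [
    Pod("host-1", "10.0.1.1", {"app": "b"}),
    Pod("host-1", "10.0.1.2", {"app": "a"}),
    Pod("host-1", "10.0.1.3", {"app": "c"}),
    Pod("host-2", "10.0.2.1", {"app": "a"}),
    Pod("host-2", "10.0.2.2", {"app": "b"}),
    Pod("host-2", "10.0.2.3", {"app": "a"}),
]

print(select_endpoints(pods, {"app": "a"}))
# ['10.0.1.2', '10.0.2.1', '10.0.2.3']
```

The caller only ever names the service; which hosts back it is the scheduler's concern, and that set changes as pods are rescheduled.
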
  6. Logical naming
    • applications refer to logical names
    • requests are bound to concrete names
    • delegations express routing
    logical name: /svc/users
    concrete names: /#/io.l5d.zk/prod/users, /#/io.l5d.k8s/staging/http/users
    delegation: /svc => /#/io.l5d.k8s/prod/http
  7. Timeouts & retries
    timelines → users → web → db
    timelines → users: timeout=400ms, retries=3
    users → web: timeout=400ms, retries=2
    web → db: timeout=200ms, retries=3
  8. Timeouts & retries (continued)
    timelines → users → web → db, with the same timeouts and retries as slide 7
    flagged: 800ms! 600ms!
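
The two flagged numbers are consistent with reading "retries=N" as N total attempts: users → web can spend 2 × 400ms = 800ms and web → db 3 × 200ms = 600ms, while each of their callers gives up after 400ms. That reading is an assumption inferred from the numbers, not something the slide states. The Python sketch below, with a hypothetical `Hop` type and `audit` helper, computes each hop's worst case as timeout × attempts (ignoring back-off delays) and flags hops that outlast the timeout their own caller applies.

```python
from dataclasses import dataclass


@dataclass
class Hop:
    caller: str
    callee: str
    timeout_ms: int
    attempts: int  # assumption: "retries=N" on the slide means N total attempts


def audit(hops):
    """Flag hops whose worst case (timeout x attempts) exceeds the timeout
    the upstream hop places on the same caller."""
    findings = []
    for upstream, hop in zip(hops, hops[1:]):
        worst_ms = hop.timeout_ms * hop.attempts
        if worst_ms > upstream.timeout_ms:
            findings.append((hop.caller, hop.callee, worst_ms, upstream.timeout_ms))
    return findings


chain = [
    Hop("timelines", "users", timeout_ms=400, attempts=3),
    Hop("users", "web", timeout_ms=400, attempts=2),
    Hop("web", "db", timeout_ms=200, attempts=3),
]

for caller, callee, worst_ms, limit_ms in audit(chain):
    print(f"{caller} -> {callee}: up to {worst_ms}ms, caller waits {limit_ms}ms")
# users -> web: up to 800ms, caller waits 400ms
# web -> db: up to 600ms, caller waits 400ms
```

Once the upstream caller has timed out, any downstream attempts still in flight are wasted work, and they add load precisely when the system is already slow.
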
  9. Request-level load balancing
    lb algorithms:
    • round-robin
    • fewest connections
    • queue depth
    • exponentially-weighted moving average (ewma)
    • aperture
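
Request-level balancing decides per request rather than per connection. As one illustration of the ewma entry, the Python sketch below keeps an exponentially-weighted moving average of each endpoint's latency, scales it by the number of requests in flight, and picks the cheaper of two randomly sampled endpoints ("power of two choices"). The cost formula, the smoothing factor `alpha=0.3`, the two-choice sampling, and the `Endpoint` and `pick` names are assumptions for this sketch, not a description of Linkerd's exact balancer.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Endpoint:
    name: str
    ewma_ms: Optional[float] = None  # smoothed latency; None until first response
    outstanding: int = 0             # requests currently in flight

    def observe(self, latency_ms: float, alpha: float = 0.3) -> None:
        """Fold one observed latency into the moving average."""
        if self.ewma_ms is None:
            self.ewma_ms = latency_ms
        else:
            # Recent samples get weight `alpha`; older history decays geometrically.
            self.ewma_ms = alpha * latency_ms + (1 - alpha) * self.ewma_ms

    def cost(self) -> float:
        """Estimated cost of sending one more request to this endpoint."""
        if self.ewma_ms is None:
            return 0.0  # untried endpoints look cheap, so they get probed
        return self.ewma_ms * (self.outstanding + 1)


def pick(endpoints: list[Endpoint], rng: random.Random) -> Endpoint:
    """Power of two choices: sample two distinct endpoints, keep the cheaper."""
    a, b = rng.sample(endpoints, 2)
    return a if a.cost() <= b.cost() else b


rng = random.Random(0)
fast, slow = Endpoint("fast"), Endpoint("slow")
for _ in range(5):
    fast.observe(20.0)
    slow.observe(200.0)

print(pick([fast, slow], rng).name)  # fast: lower latency, nothing in flight

fast.outstanding = 20  # fast is now heavily queued: about 20ms * 21 > 200ms * 1
print(pick([fast, slow], rng).name)  # slow
```

Multiplying latency by queue depth lets the balancer back off an endpoint that is fast in isolation but already busy, which round-robin and fewest-connections each capture only half of.
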
  10. github.com/linkerd/linkerd
    • Donated to CNCF in January 2017!
    • K8s Ingress API in the next release
    • gRPC and HTTP/2 battle testing
    • Fine-grained client policy API
    • Hitting 1.0 next month!
    • Help us test RC1 this week
  11. github.com/linkerd/linkerd-tcp
    • Lightweight, service-discovery-aware TCP LB
    • Supports endpoint weighting
    • Modern TLS: ALPN, SNI, forward secrecy, …
    • Written in Rust: native, safe, fast, & tiny!
    • Currently beta: get involved!
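
Endpoint weighting lets an operator send more or less traffic to particular backends, for example to shift a small share toward a new deployment. The Python sketch below shows only the selection step, as weighted random choice using the standard library's `random.choices`; the `pick_weighted` helper, the endpoint addresses, and the 9:1 weights are hypothetical, and this is not linkerd-tcp's (Rust) implementation.

```python
import random
from collections import Counter


def pick_weighted(weights: dict[str, float], rng: random.Random) -> str:
    """Choose an endpoint with probability proportional to its weight.

    Endpoints with weight 0 are never chosen.
    """
    endpoints = list(weights)
    return rng.choices(endpoints, weights=[weights[e] for e in endpoints], k=1)[0]


weights = {
    "10.0.1.1:8080": 9.0,  # hypothetical current deployment
    "10.0.2.1:8080": 1.0,  # hypothetical new deployment receiving about 10%
}

rng = random.Random(42)
counts = Counter(pick_weighted(weights, rng) for _ in range(10_000))
print(counts)  # roughly 9,000 vs 1,000; exact counts depend on the seed
```

Changing a weight shifts traffic gradually without adding or removing endpoints, so a weight can be raised step by step as the new deployment proves healthy.
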