The service mesh: Distributed resilience for a cloud-native world

The Service Mesh Oliver Gould @olix0r, CTO, Buoyant

resilience: the property of a material that enables it to
resume its original shape after being bent, stretched, or compressed.

operational stress: variable load, hardware failure, bugs, the unexpected
resilient strategies: dynamic orchestration, load balancing, timeouts & retries, circuit breaking

2000: dedicated hardware with configuration management
2017: dynamically scheduled hybrid cloud

containers orchestrators microservices

runtime communication (diagram: service A, service B, service C)

Twitter circa 2013 (diagram: service A, service B, service C)

cloud native abstractions
Virtual machines -> Containers
Data centers -> Orchestrated envs
Hardware redundancy -> Design for failure
Servers -> Services
IP addresses, DNS -> Service discovery
Server monitoring -> Service monitoring
Monolithic applications -> Microservices
TCP/IP -> gRPC, REST

we need something more? (diagram: service A, service B, service C)

the service mesh: an infrastructure layer for managing
service-to-service communication

LAMP (diagram: Apache, PHP, MySQL)

Fat clients (diagram: Nginx, svc instances with libraries, DB)

The service mesh (diagram: ingress, svc instances, service mesh, DB)

The Linkerd service mesh (diagram: Node 1, Node 2, Node 3, each running Service A, Service B, Service C, and linkerd; legend: application HTTP, proxied HTTP, monitoring & control)

visibility security flexibility reliability

If you’re building a cloud native application, you need a
service mesh. CENSORED

linkerd

Linkers and Loaders, John R. Levine, Academic Press

datacenter: [1] physical, [2] link, [3] network, [4] transport (aws, azure, digitalocean, gce, …; canal, weave, …; kubernetes, mesos, swarm, …; linkerd-tcp)
service: [5] session, [6] presentation (json, protobuf, thrift, …; linkerd)
business: [7] application (languages, libraries)

a historical perspective: tcp/ip
1975: Internet Protocol Suite
Layer 3: /etc/hosts
Layer 4: /etc/services

a historical perspective: dns
1984: Domain Name System
/etc/hosts-as-a-service

the new world of service discovery! (diagram: service: a; one host running app: b, app: a, app: c; another host running app: a, app: b, app: a)

what would a cloud native linker do?

logical naming
applications refer to logical names
requests are bound to concrete names
delegations express routing
/svc/users
/#/io.l5d.zk/prod/users
/#/io.l5d.k8s/staging/http/users
/svc => 2 * /#/io.l5d.zk/prod & 8 * /#/io.l5d.k8s/prod/http
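The binding of logical names to concrete names can be sketched roughly as prefix rewriting with weights. This is a minimal illustration only: `delegate` and the rule list format are hypothetical names invented here, not Linkerd's API, and the assumption that later rules take precedence is a simplification of real delegation-table semantics.

```python
import random

# Hypothetical rule list: (logical prefix, [(weight, concrete prefix), ...]).
# The single rule mirrors the slide:
#   /svc => 2 * /#/io.l5d.zk/prod & 8 * /#/io.l5d.k8s/prod/http
BASE_RULES = [
    ("/svc", [(2, "/#/io.l5d.zk/prod"), (8, "/#/io.l5d.k8s/prod/http")]),
]


def delegate(name, rules, rng=random):
    """Bind a logical name to a concrete name using the given rules.

    Assumption for this sketch: rules later in the list take precedence,
    so a per-request override can simply be appended to the base rules.
    """
    for prefix, choices in reversed(rules):
        if name == prefix or name.startswith(prefix + "/"):
            weights = [w for w, _ in choices]
            targets = [t for _, t in choices]
            # Weighted pick between the concrete prefixes.
            target = rng.choices(targets, weights=weights, k=1)[0]
            return target + name[len(prefix):]
    return name  # no rule matched: leave the name unchanged
```

With these rules, `delegate("/svc/users", BASE_RULES)` yields `/#/io.l5d.k8s/prod/http/users` about 80% of the time and `/#/io.l5d.zk/prod/users` otherwise. Appending a rule such as `("/s/B", [(1, "/s/B2")])` models a per-request override of the kind carried in an `l5d-dtab` header.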

centralized control

per-request: ad hoc staging
GET / HTTP/1.1
Host: mysite.com
l5d-dtab: /s/B => /s/B2

per-request routing: debug proxy
GET / HTTP/1.1
Host: mysite.com
l5d-dtab: /s/E => /s/P/s/E

observability
counters (e.g. client/users/failures)
histograms (e.g. client/users/latency/p99)
tracing

timeouts & retries (diagram: timelines, users, web, db)
timeout=400ms retries=3
timeout=400ms retries=2
timeout=200ms retries=3
800ms! 600ms!

deadlines (diagram: timelines, users, web, db)
timeout=400ms
77ms elapsed: deadline=323ms
113ms elapsed: deadline=210ms
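The deadline arithmetic on this slide can be reproduced with a few lines. This is only a sketch of the idea: `propagate` is a hypothetical helper, and the numbers are the ones shown on the slide (a 400ms budget, with 77ms and then 113ms elapsed).

```python
def propagate(timeout_ms, elapsed_per_hop_ms):
    """Return the deadline forwarded downstream after each hop.

    Each hop subtracts the time already spent from the remaining budget,
    so no downstream service works on a request the caller has given up on.
    """
    deadlines = []
    remaining = timeout_ms
    for elapsed in elapsed_per_hop_ms:
        remaining = max(0, remaining - elapsed)  # never forward a negative budget
        deadlines.append(remaining)
    return deadlines


# Slide numbers: 400ms budget, 77ms elapsed, then 113ms elapsed.
slide_deadlines = propagate(400, [77, 113])  # [323, 210]
```

In contrast with independent per-hop timeouts and retries, where a downstream hop can keep working for longer than its caller is willing to wait (the 800ms and 600ms figures against a 400ms timeout), the forwarded deadline only ever shrinks.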

retries
typical: retries=3
worst-case: 300% more load!!!

budgets
typical: retries=3, worst-case: 300% more load!!!
better: retryBudget=20%, worst-case: 20% more load
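A retry budget can be sketched as a counter that allows retries only while they stay under a fixed percentage of original requests. The `RetryBudget` class below is a hypothetical, simplified illustration (no time window, no minimum retry rate), not the implementation used by linkerd.

```python
class RetryBudget:
    """Allow retries only while they stay within a percentage of requests."""

    def __init__(self, percent=20):
        self.percent = percent  # e.g. 20 means retries may add at most 20% load
        self.requests = 0
        self.retries = 0

    def record_request(self):
        """Count one original (non-retry) request."""
        self.requests += 1

    def try_retry(self):
        """Return True and consume budget if one more retry is affordable."""
        # Integer arithmetic avoids floating-point edge cases.
        if (self.retries + 1) * 100 <= self.percent * self.requests:
            self.retries += 1
            return True
        return False


# Worst case: 100 requests all fail and each asks for up to 3 retries.
budget = RetryBudget(percent=20)
for _ in range(100):
    budget.record_request()
    for _ in range(3):
        budget.try_retry()
# budget.retries is 20 here, versus 300 with a fixed retries=3 policy.
```

The design point is that the budget bounds the extra load for the whole client, whereas a per-request retry count multiplies load exactly when the downstream service is already struggling.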

load shedding via cancellation (diagram: timelines, users, web, db; timeout!)

backpressure (diagram: timelines, users, web, db)
1000 requests, 100 requests, 1000 requests

backpressure (diagram: timelines, users, web, db)
1000 failed, 1000 failed

backpressure (diagram: timelines, users, web, db)
100 ok, 100 ok, 100 ok + 900 failed/redirected/etc
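One simple way to get the "100 ok + 900 failed/redirected/etc" outcome is to cap in-flight requests and reject the excess immediately instead of queueing it. The `ConcurrencyLimiter` below is a hypothetical sketch of that idea, using the slide's numbers; it is not a description of how linkerd applies backpressure.

```python
class ConcurrencyLimiter:
    """Admit at most `limit` in-flight requests; reject the rest right away."""

    def __init__(self, limit):
        self.limit = limit
        self.in_flight = 0

    def try_acquire(self):
        """Return True if the request is admitted, False if it is shed."""
        if self.in_flight >= self.limit:
            return False  # caller can fail fast, redirect, or degrade
        self.in_flight += 1
        return True

    def release(self):
        """Mark one admitted request as finished."""
        if self.in_flight > 0:
            self.in_flight -= 1


# Slide scenario: 1000 requests arrive at a service that can handle 100.
limiter = ConcurrencyLimiter(limit=100)
admitted = sum(1 for _ in range(1000) if limiter.try_acquire())
rejected = 1000 - admitted
```

Rejecting early keeps the 100 admitted requests healthy, instead of letting all 1000 overwhelm the downstream service and fail together.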

request-level load balancing
lb algorithms:
• round-robin
• fewest connections
• queue depth
• exponentially-weighted moving average (ewma)
• aperture
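As an illustration of the ewma entry, the sketch below tracks a moving average of observed latency per endpoint and picks the endpoint with the lowest estimated cost. The `Endpoint` class, the `alpha` smoothing factor, and the cost formula (average latency times outstanding requests plus one) are assumptions made for this example, not linkerd's exact algorithm.

```python
class Endpoint:
    """Tracks an exponentially-weighted moving average of latency."""

    def __init__(self, name, alpha=0.3):
        self.name = name
        self.alpha = alpha      # weight given to the newest observation
        self.ewma_ms = 0.0      # starts at 0, so unobserved endpoints look cheap
        self.outstanding = 0    # requests currently in flight

    def observe(self, latency_ms):
        """Fold one latency measurement into the moving average."""
        self.ewma_ms = self.alpha * latency_ms + (1 - self.alpha) * self.ewma_ms

    def cost(self):
        """Estimated cost of sending one more request to this endpoint."""
        return self.ewma_ms * (self.outstanding + 1)


def pick(endpoints):
    """Choose the endpoint with the lowest estimated cost."""
    return min(endpoints, key=lambda e: e.cost())


fast, slow = Endpoint("fast"), Endpoint("slow")
for _ in range(5):
    fast.observe(10)    # consistently about 10ms
    slow.observe(100)   # consistently about 100ms
first_choice = pick([fast, slow]).name
```

Because the decision is made per request from observed latency, a slow or overloaded endpoint receives less traffic without any explicit health check; once the fast endpoint accumulates enough outstanding requests, its cost rises and traffic shifts back.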

demo!?

https://github.com/linkerd linkerd.io buoyant.io Buoyant is hiring! info.buoyant.io/careers
