Slide 1

Slide 1 text

The Service Mesh
Oliver Gould (@olix0r), CTO, Buoyant

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

resilience
The property of a material that enables it to resume its original shape after being bent, stretched, or compressed.

Slide 4

Slide 4 text

operational stress
 variable load
 hardware failure
 bugs
 thE uNExpeCteD

resilient strategies
 dynamic orchestration
 load balancing
 timeouts & retries
 circuit breaking


Slide 5

Slide 5 text

2000: dedicated hardware with configuration management
2017: dynamically scheduled hybrid cloud

Slide 6

Slide 6 text

containers orchestrators microservices

Slide 7

Slide 7 text

service A
service B
service C
runtime communication

Slide 8

Slide 8 text

service A
service B
service C
Twitter circa 2013

Slide 9

Slide 9 text

cloud native abstractions
 Virtual machines -> Containers
 Data centers -> Orchestrated envs
 Hardware redundancy -> Design for failure
 Servers -> Services
 IP addresses, DNS -> Service discovery
 Server monitoring -> Service monitoring
 Monolithic applications -> Microservices
 TCP/IP -> gRPC, REST

Slide 10

Slide 10 text

service A
service B
service C
we need something more?

Slide 11

Slide 11 text

the service mesh
an infrastructure layer for managing service-to-service communication

Slide 12

Slide 12 text

LAMP
[Diagram: tiers of Apache, PHP, and MySQL instances]

Slide 13

Slide 13 text

Fat clients
[Diagram: Nginx instances, many services (svc) labeled "libraries", and databases (DB)]

Slide 14

Slide 14 text

The service mesh
[Diagram: ingress, many services (svc) labeled "service mesh", and databases (DB)]

Slide 15

Slide 15 text

The Linkerd service mesh
[Diagram: Node 1, Node 2, and Node 3 each run Service A, Service B, Service C, and a linkerd instance. Legend: application HTTP, proxied HTTP, monitoring & control]

Slide 16

Slide 16 text

visibility security flexibility reliability

Slide 17

Slide 17 text

If you’re building a cloud native application,
 you need a service mesh. CENSORED

Slide 18

Slide 18 text

linkerd

Slide 19

Slide 19 text

Linkers and Loaders, John R. Levine, Academic Press

Slide 20

Slide 20 text

[1] physical
[2] link
[3] network
[4] transport
[5] session
[6] presentation
[7] application

[Diagram labels, in extracted order: datacenter; linkerd-tcp; kubernetes, mesos, swarm, …; canal, weave, …; aws, azure, digitalocean, gce, …; business; languages, libraries; service; json, protobuf, thrift, …; linkerd]

Slide 21

Slide 21 text

a historical perspective: tcp/ip
1975: Internet Protocol Suite
Layer 3: /etc/hosts
Layer 4: /etc/services

Slide 22

Slide 22 text

a historical perspective: dns
1984: Domain Name System
/etc/hosts-as-a-service

Slide 23

Slide 23 text

the new world of service discovery!
[Diagram: one host runs app: b, app: a, app: c; another host runs app: a, app: b, app: a; service: a]

Slide 24

Slide 24 text

what would a cloud native linker do?

Slide 25

Slide 25 text

logical naming
 applications refer to logical names
 requests are bound to concrete names
 delegations express routing

/svc/users
/#/io.l5d.zk/prod/users
/#/io.l5d.k8s/staging/http/users

/svc => 2 * /#/io.l5d.zk/prod & 8 * /#/io.l5d.k8s/prod/http
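
The delegation rule on this slide can be read as weighted prefix rewriting: a logical name is rewritten until it becomes a concrete name (one starting with `/#/`). The sketch below illustrates that reading only. It is not linkerd's resolver; `DTAB`, `matches`, and `bind` are hypothetical names, later rules are assumed to take precedence, and no cycle detection is attempted.

```python
# Hypothetical, simplified model of delegation: each rule maps a path prefix
# to a weighted list of replacement prefixes.
DTAB = [
    ("/svc", [(2, "/#/io.l5d.zk/prod"), (8, "/#/io.l5d.k8s/prod/http")]),
]


def matches(prefix, name):
    """True when `prefix` covers `name` on a path-segment boundary."""
    return name == prefix or name.startswith(prefix + "/")


def bind(name, dtab):
    """Rewrite a logical name into a list of (traffic share, concrete name)."""
    if name.startswith("/#/"):
        return [(1.0, name)]  # already concrete
    for prefix, targets in reversed(dtab):  # assumption: later rules win
        if matches(prefix, name):
            total = sum(weight for weight, _ in targets)
            bound = []
            for weight, target in targets:
                rewritten = target + name[len(prefix):]
                for share, concrete in bind(rewritten, dtab):
                    bound.append((weight / total * share, concrete))
            return bound
    return []  # no rule matched, so the name stays unbound
```

Under these assumptions, `bind("/svc/users", DTAB)` splits traffic 20/80 between `/#/io.l5d.zk/prod/users` and `/#/io.l5d.k8s/prod/http/users`.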

Slide 26

Slide 26 text

centralized control

Slide 27

Slide 27 text

per-request: ad hoc staging

GET / HTTP/1.1
Host: mysite.com
l5d-dtab: /s/B => /s/B2

Slide 28

Slide 28 text

per-request routing: debug proxy

GET / HTTP/1.1
Host: mysite.com
l5d-dtab: /s/E => /s/P/s/E
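
The two per-request examples can be sketched as a router that appends rules from the `l5d-dtab` header to its base rules for a single request. This is a minimal illustration, not linkerd's behavior in detail: `parse_dentry` and `route` are hypothetical helpers, only one rewrite step is applied, and the `;` separator between multiple rules is an assumption.

```python
def parse_dentry(text):
    """Parse 'prefix => target' into a (prefix, target) pair."""
    prefix, target = (part.strip() for part in text.split("=>"))
    return prefix, target


def route(name, base_dtab, headers):
    """Apply one rewrite step, letting per-request rules override base rules."""
    dtab = list(base_dtab)
    override = headers.get("l5d-dtab")
    if override:
        dtab.extend(parse_dentry(d) for d in override.split(";") if d.strip())
    for prefix, target in reversed(dtab):  # assumption: later rules win
        if name == prefix or name.startswith(prefix + "/"):
            return target + name[len(prefix):]
    return name  # no rule matched


# Ad hoc staging: send this request's calls to B to B2 instead.
staged = route("/s/B", [], {"l5d-dtab": "/s/B => /s/B2"})

# Debug proxy: send this request's calls to E through P first.
proxied = route("/s/E", [], {"l5d-dtab": "/s/E => /s/P/s/E"})
```

Requests without the header are routed by the base rules alone, so the override affects only the request that carries it.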

Slide 29

Slide 29 text

observability
 counters (e.g. client/users/failures)
 histograms (e.g. client/users/latency/p99)
 tracing

Slide 30

Slide 30 text

timeouts & retries
[Diagram: timelines, users, web, and db services; hops configured with timeout=400ms retries=3, timeout=400ms retries=2, and timeout=200ms retries=3]

Slide 31

Slide 31 text

timeouts & retries
[Diagram: timelines, users, web, and db services; hops configured with timeout=400ms retries=3, timeout=400ms retries=2, and timeout=200ms retries=3; annotations: 800ms! and 600ms!]
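
The 800ms and 600ms annotations match a simple worst-case calculation in which each hop may spend its full timeout on every attempt. The sketch below assumes `retries` counts total attempts; which hop each setting belongs to is not recoverable from the slide text, so the settings are listed in slide order only.

```python
# (timeout in ms, attempts) for the three configurations on the slide.
settings = [(400, 3), (400, 2), (200, 3)]


def worst_case_ms(timeout_ms, attempts):
    """Longest time a hop can take if every attempt runs to its timeout."""
    return timeout_ms * attempts


worst = [worst_case_ms(timeout, attempts) for timeout, attempts in settings]
# worst == [1200, 800, 600]
```

Each of these worst cases exceeds a 400ms timeout, so a caller with that timeout can give up while downstream hops are still retrying.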

Slide 32

Slide 32 text

deadlines
[Diagram: timelines, users, web, and db services; timeout=400ms at the start; 77ms elapsed, deadline=323ms; 113ms elapsed, deadline=210ms]
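
The numbers on this slide follow from subtracting elapsed time from a single request-wide budget: 400 - 77 = 323, then 323 - 113 = 210. A minimal sketch of that bookkeeping, with a hypothetical `Deadline` class and explicit millisecond timestamps rather than a real clock:

```python
class Deadline:
    """A fixed point in time by which the whole request must finish."""

    def __init__(self, timeout_ms, now_ms):
        self.expires_at_ms = now_ms + timeout_ms

    def remaining_ms(self, now_ms):
        """Budget left for downstream calls; never negative."""
        return max(0, self.expires_at_ms - now_ms)


deadline = Deadline(timeout_ms=400, now_ms=0)
after_first_hop = deadline.remaining_ms(now_ms=77)         # 323
after_second_hop = deadline.remaining_ms(now_ms=77 + 113)  # 210
```

Each hop passes the remaining budget downstream instead of starting a fresh timeout of its own.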

Slide 33

Slide 33 text

retries
typical: retries=3

Slide 34

Slide 34 text

retries
typical: retries=3
 worst-case: 300% more load!!!

Slide 35

Slide 35 text

budgets
typical: retries=3
 worst-case: 300% more load!!!
better: retryBudget=20%
 worst-case: 20% more load
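
The difference between a fixed retry count and a retry budget can be checked with a small simulation in which every attempt fails. This is a simplified sketch: `RetryBudget` and `extra_load_during_outage` are hypothetical names, and the budget counts over all requests seen so far rather than over a time window.

```python
class RetryBudget:
    """Allow retries only up to a fixed percentage of original requests."""

    def __init__(self, percent=20):
        self.percent = percent
        self.requests = 0
        self.retries = 0

    def record_request(self):
        self.requests += 1

    def try_retry(self):
        # Integer arithmetic avoids floating-point rounding at the boundary.
        if (self.retries + 1) * 100 <= self.requests * self.percent:
            self.retries += 1
            return True
        return False


def extra_load_during_outage(n_requests, max_retries, budget=None):
    """Count the retries sent when every single attempt fails."""
    sent = 0
    for _ in range(n_requests):
        if budget is not None:
            budget.record_request()
        for _ in range(max_retries):
            if budget is not None and not budget.try_retry():
                break  # budget exhausted, so fail fast instead of retrying
            sent += 1
    return sent


fixed = extra_load_during_outage(100, 3)                      # 300 retries
budgeted = extra_load_during_outage(100, 3, RetryBudget(20))  # 20 retries
```

For 100 failing requests, the fixed policy adds 300 retries (300% more load), while the budget caps the extra load at 20 retries (20%).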

Slide 36

Slide 36 text

load shedding via cancellation
[Diagram: timelines, users, web, and db services; annotation: timeout!]

Slide 37

Slide 37 text

load shedding via cancellation
[Diagram: timelines, users, web, and db services; annotation: timeout!]
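
Cancellation can be illustrated in-process with `asyncio`: when the caller's timeout fires, the pending downstream work is cancelled instead of running to completion. The service names are borrowed from the diagram, but the call chain and timings are assumptions, and in a real mesh the cancellation would cross the network rather than a task boundary.

```python
import asyncio


async def db_query(log):
    try:
        await asyncio.sleep(1.0)  # simulated slow query
        log.append("db finished")
    except asyncio.CancelledError:
        log.append("db cancelled")  # the work is abandoned, shedding load
        raise


async def users_service(log):
    await db_query(log)


async def web(log, timeout_s=0.05):
    try:
        # wait_for cancels the inner call when the timeout expires.
        await asyncio.wait_for(users_service(log), timeout=timeout_s)
    except asyncio.TimeoutError:
        log.append("web timed out")


def run_demo():
    log = []
    asyncio.run(web(log))
    return log
```

`run_demo()` never records "db finished", because the query is cancelled as soon as the caller stops waiting for it.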

Slide 38

Slide 38 text

backpressure
[Diagram: timelines, users, web, and db services; annotations: 1000 requests, 100 requests, 1000 requests]

Slide 39

Slide 39 text

backpressure
[Diagram: timelines, users, web, and db services; annotations: 1000 failed, 1000 failed]

Slide 40

Slide 40 text

backpressure
[Diagram: timelines, users, web, and db services; annotations: 100 ok, 100 ok, 100 ok + 900 failed/redirected/etc]
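
The "100 ok + 900 failed/redirected/etc" outcome can be sketched as admission control: a service with a known capacity accepts what it can handle and rejects the rest immediately, rather than failing all 1000 requests. `BoundedService` and `offer` are hypothetical names, and the capacity of 100 is taken from the slide's numbers.

```python
class BoundedService:
    """Accept work only while in-flight requests are below capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.in_flight = 0

    def try_accept(self):
        if self.in_flight >= self.capacity:
            return False  # signal backpressure to the caller
        self.in_flight += 1
        return True

    def finish(self):
        self.in_flight -= 1


def offer(service, n_requests):
    """Offer a burst of requests; return (accepted, rejected) counts."""
    accepted = sum(1 for _ in range(n_requests) if service.try_accept())
    return accepted, n_requests - accepted


accepted, rejected = offer(BoundedService(capacity=100), 1000)
# accepted == 100, rejected == 900
```

The rejected requests fail fast, which leaves the caller free to redirect or degrade them instead of waiting on an overloaded service.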

Slide 41

Slide 41 text

request-level load balancing
lb algorithms:
 • round-robin
 • fewest connections
 • queue depth
 • exponentially-weighted moving average (ewma)
 • aperture
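
As one illustration of the list above, here is a minimal sketch of a latency-aware balancer based on an exponentially-weighted moving average. It is not linkerd's implementation: `Endpoint`, `pick`, the decay factor, and the cost formula are all assumptions, and the choice between two random candidates is a common pairing added here for illustration.

```python
import random


class Endpoint:
    def __init__(self, name, decay=0.3):
        self.name = name
        self.decay = decay    # weight given to the newest latency sample
        self.ewma_ms = 0.0
        self.in_flight = 0

    def observe(self, latency_ms):
        """Fold a new latency sample into the moving average."""
        self.ewma_ms = self.decay * latency_ms + (1 - self.decay) * self.ewma_ms

    def cost(self):
        # Penalize endpoints that are both slow and busy.
        return self.ewma_ms * (self.in_flight + 1)


def pick(endpoints, rng=random):
    """Sample two distinct endpoints and return the cheaper one."""
    a, b = rng.sample(endpoints, 2)
    return a if a.cost() <= b.cost() else b


fast, slow = Endpoint("fast"), Endpoint("slow")
fast.observe(10)
slow.observe(100)
chosen = pick([fast, slow])
```

With these samples, `chosen` is the `fast` endpoint, because its averaged latency gives it the lower cost.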

Slide 42

Slide 42 text

demo!?

Slide 43

Slide 43 text

https://github.com/linkerd
linkerd.io
buoyant.io
Buoyant is hiring! info.buoyant.io/careers