Istio and the Service Mesh Architecture
DevOps BKK 2018
Slide 2
About me
● Manatsawin Hanmongkolchai
● Junior Architect at Wongnai
Slide 3
How I sold Istio to my team
Slide 4
How Wongnai monitors microservices
Slide 5
Microservice monitoring
● In-service metrics (e.g. controller time)
Slide 6
Microservice monitoring
● AWS X-Ray SDK
Slide 7
Microservice monitoring
● Sentry
Slide 8
Microservice monitoring
● ELB Error Rate
Slide 9
Microservice monitoring
These must be integrated into your service
(Image: AWS X-Ray)
Slide 10
Microservice monitoring
The problem in the microservice world
● Services can be written in many languages.
Not all tools support every language.
Slide 11
Microservice monitoring
The problem in the microservice world
● People in a rush skip implementing proper monitoring
Slide 12
Meet Istio
Slide 13
Service mesh
Istio handles inter-service connections
(Diagram: sidecar)
Slide 14
How does the Istio sidecar work?
Istio uses an admission controller to inject 2 containers into your pod
Slide 15
How does the Istio sidecar work?
1. An init container sets up the iptables rules for the transparent proxy (runs as root)
2. Envoy runs alongside your app as the transparent proxy
Slide 16
What Istio can do for you
Monitoring
● Network calls
● Tracing
Slide 17
Network monitoring
Istio provides insight into your network at layer 7
Slide 18
(Dashboard: total requests, 4xx and 5xx responses)
Slide 19
(Dashboard: request count of a service, response time)
Slide 20
Service network monitoring
● Request count
● Success rate
● Response time
● Speed (for TCP)
Metrics are measured both client side and server side.
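The numbers on these dashboards are aggregates over individual calls that the sidecars observe. As a rough illustration only, the Python below computes them from a list of call records. `CallRecord` and `summarize` are hypothetical names, the record shape is not Istio's telemetry format, and the sketch assumes that any non-5xx response counts as a success. It also answers "who calls me?" by listing the distinct callers of a service.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class CallRecord:
    """One observed call (hypothetical shape, not Istio's telemetry format)."""
    source: str
    destination: str
    status: int
    duration_ms: float


def summarize(records: Iterable[CallRecord], destination: str) -> dict:
    """Aggregate the dashboard numbers for calls made to one destination."""
    calls: List[CallRecord] = [r for r in records if r.destination == destination]
    total = len(calls)
    count_4xx = sum(1 for r in calls if 400 <= r.status < 500)
    count_5xx = sum(1 for r in calls if 500 <= r.status < 600)
    return {
        "requests": total,
        "4xx": count_4xx,
        "5xx": count_5xx,
        # Assumption: every non-5xx response counts as a success.
        "success_rate": (total - count_5xx) / total if total else None,
        "mean_ms": sum(r.duration_ms for r in calls) / total if total else None,
        # "Who calls me?": the distinct sources seen calling this destination.
        "callers": sorted({r.source for r in calls}),
    }


records = [
    CallRecord("frontend", "reviews", 200, 12.0),
    CallRecord("frontend", "reviews", 404, 8.0),
    CallRecord("admin", "reviews", 503, 40.0),
    CallRecord("frontend", "ratings", 200, 5.0),
]
print(summarize(records, "reviews"))
```

A real mesh records the same call twice, once at the caller's sidecar and once at the callee's, which is why the dashboard can show both client-side and server-side numbers.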
Slide 21
Who calls me?
Slide 22
Distributed Tracing
● All incoming and outgoing HTTP calls are traced to Jaeger
● Your app must propagate the OpenTracing headers (e.g. x-request-id and the x-b3-* headers) from each incoming call to its outgoing calls so that calls are tracked correctly
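Propagation just means copying a fixed set of headers from the request your service received onto every request it makes while handling it. Below is a minimal, framework-agnostic Python sketch with headers as plain dicts. `tracing_headers` is a hypothetical helper, and the header list is the one Istio's distributed-tracing documentation gave around the 1.0 release; verify it against the docs for your version.

```python
from typing import Dict, Mapping

# Headers that Istio's tracing docs ask applications to forward
# (Zipkin B3 headers plus x-request-id and x-ot-span-context).
TRACE_HEADERS = (
    "x-request-id",
    "x-b3-traceid",
    "x-b3-spanid",
    "x-b3-parentspanid",
    "x-b3-sampled",
    "x-b3-flags",
    "x-ot-span-context",
)


def tracing_headers(incoming: Mapping[str, str]) -> Dict[str, str]:
    """Return the trace headers of an incoming request, to attach to outgoing calls.

    HTTP header names are case-insensitive, so names are compared lower-cased.
    """
    lowered = {name.lower(): value for name, value in incoming.items()}
    return {name: lowered[name] for name in TRACE_HEADERS if name in lowered}


incoming = {
    "X-Request-Id": "7c1a",
    "X-B3-TraceId": "463ac35c9f6413ad",
    "X-B3-SpanId": "a2fb4a1d1a96d312",
    "X-B3-Sampled": "1",
    "Accept": "text/html",
}
outgoing = tracing_headers(incoming)
print(outgoing)
```

The sidecar still creates and reports the spans; the application only has to carry the IDs across so that Jaeger can join the incoming and outgoing calls into one trace.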
Slide 23
Distributed Tracing
● The easiest way is to integrate a Zipkin OpenTracing library into your app
Slide 24
Distributed Tracing
Slide 25
Distributed Tracing
Slide 26
What Istio can do for you
● Traffic Management
○ Routing
■ Traffic Shifting
■ Mirror
○ Fault Injection
○ Circuit Breaker
Slide 27
Routing
● A Kubernetes Service operates at layer 4: it picks a backend per connection, not per request
(Diagram: requests → Cluster IP → three backends)
Slide 28
Routing
● Istio operates at layer 7 and can load-balance every call individually
(Diagram: requests → Envoy → three backends)
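The difference matters most with long-lived connections such as HTTP keep-alive or gRPC over HTTP/2: a layer 4 balancer chooses a backend once per connection, so every request on that connection hits the same pod. The Python simulation below contrasts the two. Round-robin selection is an assumption made to keep the output deterministic (kube-proxy does not necessarily pick backends in round-robin order), and the backend names are hypothetical.

```python
import itertools
from collections import Counter

BACKENDS = ["backend-1", "backend-2", "backend-3"]


def per_connection(connections: int, requests_per_connection: int) -> Counter:
    """Layer 4: choose a backend once per connection; every request sent on
    that connection goes to the same backend."""
    picker = itertools.cycle(BACKENDS)
    hits: Counter = Counter()
    for _ in range(connections):
        backend = next(picker)
        hits[backend] += requests_per_connection
    return hits


def per_request(connections: int, requests_per_connection: int) -> Counter:
    """Layer 7: choose a backend for every request, whatever connection it used."""
    picker = itertools.cycle(BACKENDS)
    hits: Counter = Counter()
    for _ in range(connections * requests_per_connection):
        hits[next(picker)] += 1
    return hits


# One client with a single keep-alive connection sending 6 requests:
print(per_connection(1, 6))  # all 6 requests land on backend-1
print(per_request(1, 6))     # 2 requests on each of the 3 backends
```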
Slide 29
Split traffic
● Split traffic between service versions (e.g. send 1% to the new version)
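In Istio the split is expressed as route weights (for example 99 and 1) in a VirtualService, and the sidecar picks a destination for each request. The Python below only illustrates that per-request weighted choice; it is not Envoy's actual selection algorithm, and the `v1`/`v2` names and `choose_version` helper are hypothetical.

```python
import random
from collections import Counter
from typing import Dict


def choose_version(weights: Dict[str, int], rng: random.Random) -> str:
    """Pick a version for one request with probability proportional to its weight."""
    versions = list(weights)
    return rng.choices(versions, weights=[weights[v] for v in versions], k=1)[0]


split = {"v1": 99, "v2": 1}  # hypothetical: 1% of requests go to the new version
rng = random.Random(42)
counts = Counter(choose_version(split, rng) for _ in range(100_000))
print(counts["v1"], counts["v2"])  # roughly 99000 and 1000
```

Because the choice is made per request rather than per user, one user can hit both versions; pinning users to a version needs a header- or cookie-based route instead of a plain weight.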
Slide 30
Mirror traffic
● Test in production by cloning traffic
(Diagram: requests → Envoy → live version, with a copy sent to the test version)
Slide 31
Fault Injection
● Intentionally make a service worse
● Why? Let’s hear a story
Slide 32
Fault Injection
Site Reliability Engineering: How Google Runs Production Systems
landing.google.com/sre/book/
Slide 33
#WongnaiIsHiring
● Wongnai is looking for its first Site Reliability Engineer
● careers.wongnai.com
Slide 34
Chubby
Slide 35
Fault Injection
Over time, we found that the failures of the global instance of Chubby consistently generated service outages.
Slide 36
Fault Injection
As it turns out, true global Chubby outages are so infrequent that service owners began to add dependencies to Chubby assuming that it would never go down.
Slide 37
Fault Injection
The solution to this Chubby scenario is interesting: SRE makes sure that global Chubby meets, but does not significantly exceed, its service level objective.
Slide 38
Fault Injection
In any given quarter, if a true failure has not dropped availability below the target, a controlled outage will be synthesized by intentionally taking down the system.
Slide 39
Fault Injection
● Slow down services
○ Delay 80% of requests by 5 seconds
● Return errors
○ Return HTTP 500 for 80% of requests
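Istio configures these faults per route in a VirtualService (a `delay` and an `abort`, each applied to a percentage of requests). The Python wrapper below imitates the two examples above so the effect is easy to reason about. `with_faults` is a hypothetical helper, the sleep function is injectable so the demo runs instantly, and applying the delay before the abort check is a choice made for this sketch.

```python
import random
import time
from typing import Callable, List, Optional, Tuple

Response = Tuple[int, str]  # (HTTP status, body)


def with_faults(
    handler: Callable[[str], Response],
    *,
    delay_pct: float = 0.0,
    delay_s: float = 0.0,
    abort_pct: float = 0.0,
    abort_status: int = 500,
    rng: Optional[random.Random] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Callable[[str], Response]:
    """Wrap a handler so a percentage of calls are delayed and/or fail."""
    if rng is None:
        rng = random.Random()

    def wrapped(request: str) -> Response:
        if rng.random() < delay_pct / 100:
            sleep(delay_s)  # slow the call down
        if rng.random() < abort_pct / 100:
            return abort_status, "fault injected"  # the real handler is never called
        return handler(request)

    return wrapped


# "Delay 80% of requests by 5 seconds" and "return 500 for 80% of requests",
# with a fake sleep that only records the delay instead of waiting.
recorded_delays: List[float] = []
faulty = with_faults(
    lambda request: (200, "ok"),
    delay_pct=80, delay_s=5,
    abort_pct=80, abort_status=500,
    rng=random.Random(1),
    sleep=recorded_delays.append,
)
statuses = [faulty("GET /")[0] for _ in range(10_000)]
print(statuses.count(500) / len(statuses))   # close to 0.8
print(len(recorded_delays) / len(statuses))  # close to 0.8
```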
Slide 40
Circuit Breaker
Remove a backend from service if it returns too many errors in a row
(Diagram: frontend → backend → work queue; labels: 503, timeout, F5)
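In Istio this is configured as outlier detection in a DestinationRule (in Istio 1.0 the threshold field was `consecutiveErrors`; later releases replaced it with more specific fields). The Python sketch below shows only the core rule, ejecting a backend after N consecutive 5xx responses. `OutlierDetector` is hypothetical, and it leaves out what Envoy also does, such as returning a backend to the pool after an ejection time and capping how many backends may be ejected at once.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class BackendState:
    consecutive_errors: int = 0
    ejected: bool = False


class OutlierDetector:
    """Eject a backend after `max_consecutive_errors` 5xx responses in a row."""

    def __init__(self, backends: Iterable[str], max_consecutive_errors: int = 5) -> None:
        self.state: Dict[str, BackendState] = {name: BackendState() for name in backends}
        self.max_consecutive_errors = max_consecutive_errors

    def record(self, backend: str, status: int) -> None:
        s = self.state[backend]
        if 500 <= status < 600:
            s.consecutive_errors += 1
            if s.consecutive_errors >= self.max_consecutive_errors:
                s.ejected = True  # stop sending traffic to this backend
        else:
            s.consecutive_errors = 0  # any non-5xx response resets the streak

    def healthy(self) -> List[str]:
        return [name for name, s in self.state.items() if not s.ejected]


detector = OutlierDetector(["backend-1", "backend-2"], max_consecutive_errors=3)
for status in (503, 503, 200, 503, 503, 503):
    detector.record("backend-1", status)
print(detector.healthy())  # ['backend-2']
```

Ejecting the failing backend quickly also limits the pile-up the diagram hints at: callers get a fast answer instead of waiting on timeouts and retrying.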
Slide 41
Summary
Istio provides visibility and configurability for your network.
This is traditionally done by adding libraries, but in a microservice world you need a cross-language solution.
Slide 42
The catch
Here’s what we found while moving to Istio
● Istio requires zero code changes, but your service must already be a well-behaved cloud application
Slide 43
The catch
● Do not connect directly to pod IPs (e.g. no client-side service discovery; just use the cluster IP and avoid headless Services)
Slide 44
The catch
● Do not mix port types in the cluster (e.g. don’t run an HTTP server on port 6379 while another pod runs a TCP service on the same port)
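One way to catch this before deploying is to check Service port declarations. Istio (as of 1.0) takes a port's protocol from the Service port name, using the `<protocol>[-<suffix>]` convention such as `http-web`, and treats unnamed or unrecognized ports as plain TCP. The Python sketch below flags ports that are declared with more than one protocol across services. `port_conflicts` and `protocol_of` are hypothetical helpers, the service names are made up, and the protocol set is a deliberately small subset of what Istio recognizes.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Small subset of the protocol prefixes Istio recognises in port names (assumption).
KNOWN_PROTOCOLS = {"http", "http2", "grpc", "tcp"}


def protocol_of(port_name: str) -> str:
    """'http-web' -> 'http'; unnamed or unrecognised names fall back to 'tcp'."""
    prefix = port_name.split("-", 1)[0].lower()
    return prefix if prefix in KNOWN_PROTOCOLS else "tcp"


def port_conflicts(ports: Iterable[Tuple[str, int, str]]) -> Dict[int, List[str]]:
    """Given (service, port, port_name) tuples, return ports used with >1 protocol."""
    protocols: Dict[int, Set[str]] = defaultdict(set)
    for _service, port, name in ports:
        protocols[port].add(protocol_of(name))
    return {port: sorted(p) for port, p in protocols.items() if len(p) > 1}


declared = [
    ("api", 6379, "http-api"),     # HTTP server on 6379
    ("cache", 6379, "tcp-redis"),  # TCP service on the same port
    ("web", 80, "http-web"),
]
print(port_conflicts(declared))  # {6379: ['http', 'tcp']}
```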
Slide 45
The catch
● Set the Host header to the actual destination. Don’t connect to the gateway and set the Host header to cooking.
○ This case is really hard to debug...
Slide 46
The catch
● External services (i.e. outside Kubernetes) whose IPs fall within the captured IP range must have a ServiceEntry defined
○ ServiceEntry is cluster-wide