Slide 1

Slide 1 text

Zero Downtimes with Faulty Solutions
Health Checks for Microservices to the Rescue
by Dimitris Kapanidis

Slide 2

Slide 2 text

+

Slide 3

Slide 3 text

@spiddy
Dimitris Kapanidis
● Founder and Senior Consultant at Harbur.io
● Organizer of Docker BCN Meetup
● Organizer of Kubernetes BCN Meetup
● Member of Docker Captains
● Member of Google Developer Experts

Slide 4

Slide 4 text

Help modernize enterprise development workflows focusing on containers as first-class citizens
2 years running Docker Containers in Production

Slide 5

Slide 5 text

Have you ever had an outage?

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

“Outage is due to a cascaded bug in one of our infrastructure components”
Twitter, 21 June 2012

Slide 10

Slide 10 text

SYSTEMS CAN AND WILL FAIL
● Human Error
● 3rd-party Service Outage
● Distributed Denial of Service (DDoS)
● Introduced Bugs
● Many Many Other Reasons
(Diagram labels: timeline, outage)

Slide 11

Slide 11 text

DESIGN SYSTEMS FOR RESILIENCY
Chaos Monkey randomly terminates VM instances and containers that run inside of your production environment. Exposing engineers to failures more frequently incentivizes them to build resilient services.

Slide 12

Slide 12 text

WHAT TO DO WHEN THE SYSTEM FAILS
● Monitor
○ We need to identify failures ASAP
● React
○ We need to fix the system ASAP
○ Page Ops in the middle of the night

Slide 13

Slide 13 text

WHAT IS A HEALTH CHECK
A Health Check is a process that monitors the health of the application
- Is the web server running?
- Is the application working?
- Is the database connected?
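The three questions above can be sketched as a small aggregator. This is a minimal illustration, not code from the talk: the function name `run_health_checks`, the check names, and the simulated database failure are all assumptions made for the example.

```python
from typing import Callable, Dict, Tuple

# A check is any zero-argument callable that returns True when its target is fine.
Check = Callable[[], bool]


def run_health_checks(checks: Dict[str, Check]) -> Tuple[bool, Dict[str, bool]]:
    """Run every named check; the application is healthy only if all of them pass."""
    results: Dict[str, bool] = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A check that raises is counted as a failed check.
            results[name] = False
    return all(results.values()), results


def broken_database() -> bool:
    # Hypothetical failure, simulated for the example.
    raise ConnectionError("database unreachable")


healthy, report = run_health_checks({
    "web_server": lambda: True,
    "application": lambda: True,
    "database": broken_database,
})
print(healthy, report)
```

Keeping each check as a separate named callable means the report shows which dependency failed, not only that something did.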

Slide 14

Slide 14 text

CENTRALIZED MONITORING
(Diagram labels: App, Health Check, Production Monitoring, App B, Health Check)

Slide 15

Slide 15 text

EMBEDDED HEALTH CHECK LOGIC
(Diagram labels: App, /health, Health Check, Production Monitoring, App B, /health, Health Check)
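An embedded health check, where the application itself answers on /health, might look roughly like the sketch below. It uses only the Python standard library. The placeholder `application_is_healthy`, the JSON body, and the choice of 503 for a failing check are assumptions for illustration; the slide only names the /health path.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def application_is_healthy() -> bool:
    """Hypothetical placeholder for real checks (database, queues, ...)."""
    return True


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/health":
            self.send_error(404)
            return
        healthy = application_is_healthy()
        body = json.dumps({"status": "ok" if healthy else "failing"}).encode()
        # Assumption: 200 when healthy, 503 otherwise.
        self.send_response(200 if healthy else 503)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        pass  # keep the example quiet


def probe(port: int, path: str) -> int:
    """Return the HTTP status code an external monitor would see."""
    connection = http.client.HTTPConnection("127.0.0.1", port, timeout=2)
    try:
        connection.request("GET", path)
        return connection.getresponse().status
    finally:
        connection.close()


# Port 0 asks the OS for any free port, so the demo does not clash with other services.
server = HTTPServer(("127.0.0.1", 0), HealthHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]
health_status = probe(port, "/health")
other_status = probe(port, "/other")
server.shutdown()
server.server_close()
print(health_status, other_status)
```

With the logic embedded, the monitoring side only needs to know the URL; what "healthy" means stays with the team that owns the application.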

Slide 16

Slide 16 text

HEALTH CHECKS IN CONTAINERS
(Diagram labels: Container, App, Health Check, on startup, every 3 secs)
( starting | healthy | unhealthy )
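The three states can be modelled as a small state machine. The sketch below is a simplified model of how Docker reports container health: a container begins in `starting`, becomes `healthy` after a passing check, and becomes `unhealthy` after `retries` consecutive failures. The class name `HealthState` is hypothetical, and timing details such as the check interval are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class HealthState:
    """Simplified model of a container's health status (timing is ignored)."""

    retries: int = 3
    status: str = "starting"
    failing_streak: int = 0

    def record(self, check_passed: bool) -> str:
        """Feed in one health check result and return the resulting status."""
        if check_passed:
            self.failing_streak = 0
            self.status = "healthy"
        else:
            self.failing_streak += 1
            if self.failing_streak >= self.retries:
                self.status = "unhealthy"
        return self.status


state = HealthState(retries=3)
history = [state.record(result) for result in [False, True, False, False, False]]
print(history)
# → ['starting', 'healthy', 'healthy', 'healthy', 'unhealthy']
```

A single failed check does not flip the status; only an unbroken run of `retries` failures does, which filters out one-off glitches.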

Slide 17

Slide 17 text

HEALTH CHECKS WITH SWARM
(Diagram labels: Service, Container (unhealthy), Container (starting), Container (starting))

Slide 18

Slide 18 text

https://github.com/spiddy/coin
Icon by: https://opengameart.org/users/galangpiliang

Slide 19

Slide 19 text

spiddy/coin
- On startup, each instance flips a coin (50% chance)
- On success:
- It always responds 200 on a GET request
- On failure:
- It always responds 500 on a GET request
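The behavior described above can be simulated in a few lines. This is not the code from the spiddy/coin repository: the class `CoinInstance` and the function `replace_until_healthy` are hypothetical names, and the replacement loop is only an assumption about how an orchestrator that acts on health checks might react to faulty instances.

```python
import random


class CoinInstance:
    """Mimics the described service: one coin flip at startup decides every response."""

    def __init__(self, rng: random.Random) -> None:
        self.lucky = rng.random() < 0.5  # 50% chance, decided once per instance

    def handle_get(self) -> int:
        return 200 if self.lucky else 500


def replace_until_healthy(replicas: int, rng: random.Random) -> int:
    """Replace failing instances until all respond 200; return how many were started."""
    instances = [CoinInstance(rng) for _ in range(replicas)]
    started = replicas
    while any(instance.handle_get() != 200 for instance in instances):
        for index, instance in enumerate(instances):
            if instance.handle_get() != 200:
                instances[index] = CoinInstance(rng)  # swap out the faulty instance
                started += 1
    return started


rng = random.Random(42)  # fixed seed so the run is repeatable
instance = CoinInstance(rng)
first, second = instance.handle_get(), instance.handle_get()
total_started = replace_until_healthy(replicas=3, rng=rng)
print(first, second, total_started)
```

Because an instance never changes its answer after startup, retrying a request against it is pointless; only replacing the instance can help.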

Slide 20

Slide 20 text

@spiddy

Slide 21

Slide 21 text

HEALTHCHECK instruction
● HEALTHCHECK [OPTIONS] CMD command
Check container health by running a command inside the container
--interval=DURATION (default: 30s)
--timeout=DURATION (default: 30s)
--retries=N (default: 3)
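The command given to HEALTHCHECK reports its result through its exit status: 0 means healthy and 1 means unhealthy. A probe that such a command could run might look like the sketch below. The function name `probe_exit_code`, the rule that 2xx and 3xx responses count as healthy, and the host and port values are assumptions for illustration.

```python
import http.client
import socket


def probe_exit_code(host: str, port: int, path: str = "/", timeout: float = 5.0) -> int:
    """Return 0 if a GET on the path succeeds with a 2xx/3xx status, otherwise 1."""
    try:
        connection = http.client.HTTPConnection(host, port, timeout=timeout)
        try:
            connection.request("GET", path)
            status = connection.getresponse().status
        finally:
            connection.close()
    except (OSError, http.client.HTTPException):
        return 1  # unreachable, timed out, or not speaking HTTP
    return 0 if 200 <= status < 400 else 1


# Demo: bind a socket to get a free port, then release it so nothing is listening there.
with socket.socket() as placeholder:
    placeholder.bind(("127.0.0.1", 0))
    closed_port = placeholder.getsockname()[1]

exit_code = probe_exit_code("127.0.0.1", closed_port, timeout=1.0)
print(exit_code)

# In a real probe script the result would become the process exit status, e.g.:
#     sys.exit(probe_exit_code("localhost", 8080))
```

Saved as a script inside the image (the path `/probe.py` is hypothetical), it could be wired up with a line such as `HEALTHCHECK --interval=3s CMD python /probe.py`, provided the image includes a Python interpreter.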

Slide 22

Slide 22 text

Thank You!