Zero Downtimes with Faulty Solutions

In this presentation we'll discuss what a health check is, why health checks are crucial when running multiple Docker containers of a service, and how to use them with Docker Engine v1.13 and Docker Swarm. As an example, we'll run a faulty solution with and without health checks and see the benefits they provide.

spiddy

May 16, 2017

Transcript

  1. +

  2. @spiddy Dimitris Kapanidis
    • Founder and Senior Consultant at Harbur.io
    • Organizer of Docker BCN Meetup
    • Organizer of Kubernetes BCN Meetup
    • Member of Docker Captains
    • Member of Google Developer Experts
  3. Help modernize enterprise development workflows focusing on containers as first-class citizens
    2 years running Docker Containers in Production
  4. “Outage is due to a cascaded bug in one of our infrastructure components”
    Twitter, 21 June 2012
  5. @spiddy SYSTEMS CAN AND WILL FAIL
    • Human Error
    • 3rd-party Service Outage
    • Distributed Denial of Service (DDoS)
    • Introduced Bugs
    • Many Many Other Reasons
    [figure: outage timeline]
  6. @spiddy DESIGN SYSTEMS FOR RESILIENCY
    Chaos Monkey randomly terminates VM instances and containers that run inside of your production environment. Exposing engineers to failures more frequently incentivizes them to build resilient services.
  7. @spiddy WHAT TO DO WHEN SYSTEM FAILS
    • Monitor
      ◦ We need to identify failures ASAP
    • React
      ◦ We need to fix the system ASAP
      ◦ Page Ops during midnight
  8. @spiddy WHAT IS A HEALTH CHECK
    A Health Check is a process that monitors the health of the application
    - Is web server running?
    - Is application working?
    - Is database connected?
  9. @spiddy HEALTH CHECKS IN CONTAINERS
    [diagram labels: App, Health Check, Container]
    The health check runs on startup and every 3 secs
    Status: ( starting | healthy | unhealthy )
  10. @spiddy spiddy/coin
    - On startup, each instance flips a coin (50% chance)
    - On success:
      - It always responds with 200 to a GET request
    - On failure:
      - It always responds with 500 to a GET request
  11. @spiddy HEALTHCHECK instruction
    • HEALTHCHECK [OPTIONS] CMD command
      Check container health by running a command inside the container
      --interval=DURATION (default: 30s)
      --timeout=DURATION (default: 30s)
      --retries=N (default: 3)
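
The behaviour described in slides 9 to 11 can be approximated with a small simulation. The sketch below is only a simplified model written for illustration, not the code of spiddy/coin or of the Docker Engine: the names `CoinInstance`, `HealthTracker`, `probe` and `run_checks` are hypothetical, the interval and timeout options are ignored, and the model assumes that an instance is marked unhealthy after `retries` consecutive failed checks and that a single passing check marks it healthy.

```python
import random
from enum import Enum


class Status(Enum):
    STARTING = "starting"
    HEALTHY = "healthy"
    UNHEALTHY = "unhealthy"


class CoinInstance:
    """Hypothetical stand-in for one spiddy/coin container."""

    def __init__(self, rng=random):
        # The coin is flipped once at startup; the outcome never changes.
        self.ok = rng.random() < 0.5

    def get(self):
        # Status code the instance returns for every GET request.
        return 200 if self.ok else 500


class HealthTracker:
    """Simplified model of the starting | healthy | unhealthy status."""

    def __init__(self, retries=3):
        self.retries = retries  # mirrors --retries=N (default: 3)
        self.failures = 0       # consecutive failed checks so far
        self.status = Status.STARTING

    def record(self, passed):
        if passed:
            # Assumption: one passing check resets the failure streak.
            self.failures = 0
            self.status = Status.HEALTHY
        else:
            self.failures += 1
            if self.failures >= self.retries:
                self.status = Status.UNHEALTHY
        return self.status


def probe(instance):
    # The check passes only when the GET request returns 200.
    return instance.get() == 200


def run_checks(instance, checks, retries=3):
    # Run `checks` probes back to back and record the status after each one.
    tracker = HealthTracker(retries)
    return [tracker.record(probe(instance)).value for _ in range(checks)]


if __name__ == "__main__":
    for i in range(4):
        print(f"instance {i}:", run_checks(CoinInstance(), checks=4))
```

In this model a faulty instance stays in `starting` until the third failed check and is only then reported as `unhealthy`, which is why a lower `--retries` value or a shorter `--interval` shortens the time a bad instance goes unnoticed.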