Zero Downtimes with Faulty Solutions

In this presentation we'll discuss what a health check is, why health checks are crucial when running multiple Docker containers of a service, and how to use them with Docker Swarm in the new Docker Engine v1.13. As an example, we'll run a faulty solution with and without health checks and see the benefits they provide.

spiddy

May 16, 2017

Transcript

  1. Zero Downtimes with Faulty Solutions
    Health Checks for Microservices to the Rescue
    by Dimitris Kapanidis

  2. +

  3. @spiddy
    Dimitris Kapanidis
    ● Founder and Senior Consultant at Harbur.io
    ● Organizer of Docker BCN Meetup
    ● Organizer of Kubernetes BCN Meetup
    ● Member of Docker Captains
    ● Member of Google Developer Experts

  4. Help modernize enterprise development workflows focusing
    on containers as first-class citizens
    2 years running Docker Containers in Production

  5. Have you ever had an outage?

  6. (no transcript text)

  7. (no transcript text)

  8. (no transcript text)

  9. “Outage is due to a cascaded bug in one of our infrastructure components”
    Twitter, 21 June 2012

  10. SYSTEMS CAN AND WILL FAIL
    ● Human Error
    ● 3rd-party Service Outage
    ● Distributed Denial of Service (DDoS)
    ● Introduced Bugs
    ● Many, Many Other Reasons
    [Diagram: an outage on a timeline]

  11. DESIGN SYSTEMS FOR RESILIENCY
    Chaos Monkey randomly terminates VM instances and containers that run inside of your production environment.
    Exposing engineers to failures more frequently incentivizes them to build resilient services.

  12. WHAT TO DO WHEN A SYSTEM FAILS
    ● Monitor
      ○ We need to identify failures ASAP
    ● React
      ○ We need to fix the system ASAP
      ○ Page Ops in the middle of the night

  13. WHAT IS A HEALTH CHECK
    A health check is a process that monitors the health of the application:
    - Is the web server running?
    - Is the application working?
    - Is the database connected?

  14. CENTRALIZED MONITORING
    [Diagram: Production Monitoring runs an App Health Check and an App B Health Check]

  15. EMBEDDED HEALTH CHECK LOGIC
    [Diagram: App and App B each embed their Health Check behind a /health endpoint, which Production Monitoring queries]

  16. HEALTH CHECKS IN CONTAINERS
    [Diagram: the Health Check runs inside the Container with the App, on startup and then every 3 secs; status: ( starting | healthy | unhealthy )]
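
A container's health status follows a small state machine. This Go sketch models it as Docker's HEALTHCHECK documentation describes: the status starts as `starting`, any passing probe makes it `healthy`, and `retries` consecutive failures make it `unhealthy`. The types and names are hypothetical, and newer Engine features such as a start period are ignored.

```go
package main

import "fmt"

// Status mirrors the three health states Docker reports for a container.
type Status string

const (
	Starting  Status = "starting"
	Healthy   Status = "healthy"
	Unhealthy Status = "unhealthy"
)

// Health tracks one container's status from a stream of probe results.
type Health struct {
	Retries  int // consecutive failures needed to become unhealthy
	status   Status
	failures int // current streak of consecutive failures
}

func NewHealth(retries int) *Health {
	return &Health{Retries: retries, status: Starting}
}

// Record feeds in one probe result (one probe per interval, e.g. every 3s).
func (h *Health) Record(passed bool) Status {
	if passed {
		// Any success resets the streak and marks the container healthy.
		h.failures = 0
		h.status = Healthy
		return h.status
	}
	h.failures++
	if h.failures >= h.Retries {
		h.status = Unhealthy
	}
	return h.status
}

func main() {
	h := NewHealth(3) // 3 is Docker's default --retries
	for _, passed := range []bool{true, false, false, false, true} {
		fmt.Println(h.Record(passed))
	}
}
```

With three retries, the sample sequence prints healthy, healthy, healthy, unhealthy, healthy: isolated failures are tolerated, but three in a row flip the status.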

  17. HEALTH CHECKS WITH SWARM
    [Diagram: a Service with three Containers, one (unhealthy) and two (starting)]
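
The slide shows a service whose unhealthy container is being replaced. A toy Go model of that reconciliation step, assuming the orchestrator's only goal is to keep the desired number of tasks that are not unhealthy; `Task` and `Reconcile` are hypothetical names, this is not how Swarm is implemented internally, and it never scales down.

```go
package main

import "fmt"

// Task is a hypothetical stand-in for one container of a service.
type Task struct {
	ID     int
	Health string // "starting", "healthy" or "unhealthy"
}

// Reconcile drops unhealthy tasks and starts fresh ones (in "starting")
// until the service is back at the desired replica count. nextID is the
// last ID handed out and is advanced for each new task.
func Reconcile(tasks []Task, replicas int, nextID *int) []Task {
	kept := make([]Task, 0, replicas)
	for _, t := range tasks {
		if t.Health != "unhealthy" {
			kept = append(kept, t)
		}
	}
	for len(kept) < replicas {
		*nextID++
		kept = append(kept, Task{ID: *nextID, Health: "starting"})
	}
	return kept
}

func main() {
	next := 3
	service := []Task{{1, "unhealthy"}, {2, "starting"}, {3, "starting"}}
	service = Reconcile(service, 3, &next)
	fmt.Println(service) // [{2 starting} {3 starting} {4 starting}]
}
```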

  18. https://github.com/spiddy/coin
    Icon by: https://opengameart.org/users/galangpiliang

  19. spiddy/coin
    - On startup, each instance flips a coin (50% chance)
    - On success:
      - It always responds 200 to a GET request
    - On failure:
      - It always responds 500 to a GET request

  20. (no transcript text)

  21. HEALTHCHECK instruction
    ● HEALTHCHECK [OPTIONS] CMD command
    Check container health by running a command inside the container
    --interval=DURATION (default: 30s)
    --timeout=DURATION (default: 30s)
    --retries=N (default: 3)

  22. Thank You!