CNCF JLM Meetup - Making Peace With the Grim Reaper

Komodor
April 24, 2022

Learn all about liveness and readiness probes (done right) from Guy Menahem, Solution Architect at Komodor, the first Kubernetes-native troubleshooting platform. Guy has vast experience working with DBs, from old-timey mainframes to cloud-native systems.

Transcript

  1. Komodor <> Epsagon | May 2021
    Guy Menahem
    Solution Architect @ Komodor
    Making Peace With
    The Grim Reaper
    Liveness & Readiness Probes Done Right

  2. Cloud native | March 2021
    Who am I?
    ● A Developer turned Solution Architect
    ● Working at Komodor, a startup building the first K8s-native troubleshooting platform
    ● Love everything in infrastructure: storage, networks & security, from 70’s-era mainframes to cloud-native
    ● All about “plan well, sleep well”

  3. Investors | January 2021
    WTF Are Probes?
    Probe Types
    ● Liveness - When to restart a container
    ● Readiness - When to send requests
    ● Startup - When the initial setup is done
    Overview
    A probe is a diagnostic performed periodically by the kubelet on a container. To perform a diagnostic, the kubelet either executes code within the container or makes a network request.
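To make the three probe types concrete, here is a minimal sketch of how they might be declared on a container, written as a Python dict that mirrors the Kubernetes container spec. The field names (`startupProbe`, `livenessProbe`, `readinessProbe`, `exec`, `httpGet`, `periodSeconds`, `failureThreshold`) are the real Kubernetes ones; the container name, image, paths, port and numbers are assumptions made up for illustration, not values from the talk.

```python
import json

# Hypothetical container: name, image, paths, port and timings are illustrative only.
container = {
    "name": "web",
    "image": "example.com/web:1.0",
    # Startup: "the initial setup is done". Here the kubelet executes code in the container.
    "startupProbe": {
        "exec": {"command": ["cat", "/tmp/started"]},
        "periodSeconds": 5,
        "failureThreshold": 30,
    },
    # Liveness: "when to restart a container". Here the kubelet makes a network request.
    "livenessProbe": {
        "httpGet": {"path": "/livez", "port": 8080},
        "periodSeconds": 10,
        "failureThreshold": 3,
    },
    # Readiness: "when to send requests".
    "readinessProbe": {
        "httpGet": {"path": "/readyz", "port": 8080},
        "periodSeconds": 5,
    },
}

# Check mechanisms a probe can use: run a command, or make a network request.
MECHANISMS = ("exec", "httpGet", "tcpSocket", "grpc")


def mechanism(probe):
    """Return which check mechanism a probe dict declares."""
    found = [m for m in MECHANISMS if m in probe]
    if len(found) != 1:
        raise ValueError("a probe must declare exactly one mechanism")
    return found[0]


for key in ("startupProbe", "livenessProbe", "readinessProbe"):
    print(key, "->", mechanism(container[key]))
print(json.dumps(container["livenessProbe"], indent=2))
```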

  4. WTF Are Probes?
    Probes under the hood:
    [Diagrams: The Happy Flow, The Bad Flow]

  5. WTF Are Probes?
    Why Should You Care?
    1. Probes have a direct impact on the app’s availability
    2. Probes can be useful when troubleshooting

  6. What Could Possibly Go Wrong?
    1. Undetected Downtime - Your app is down but traffic is being sent to it anyway.
    (you forgot to call the Grim Reaper)

  7. What Could Possibly Go Wrong?
    1. Undetected Downtime - Your app is down but traffic is being sent to it anyway.
    (you forgot to call the Grim Reaper)
    2. Unwanted Downtime - Your app runs well but Kubernetes restarts it continuously or doesn’t send it traffic (you sacrifice an innocent soul to the Reaper)
    3. Unexpected Behavior - Your probe settings don’t match your app’s needs: startup, delay, failure tolerance, etc. (you don’t know when to call the Reaper)

  8. Best Practices For Kubernetes Probes

  9. 1. Plan It
    ● Liveness - Everything is loaded & runs fine
    ● Liveness - The app can recover from external failures on its own
    ● Readiness - Can serve the users
    ● Readiness - Fails when external dependencies fail
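The plan above can be read as a small decision table: an external outage should fail readiness (stop traffic), while only a problem a restart can fix should fail liveness. The sketch below is one way to express that split; the `Health` fields and the function names are hypothetical, chosen only to mirror the bullets.

```python
from dataclasses import dataclass


@dataclass
class Health:
    """Hypothetical snapshot of an app's state, mirroring the planning bullets."""
    app_loaded: bool         # everything the app needs is loaded and runs fine
    can_self_recover: bool   # the app can get back to normal without a restart
    dependencies_up: bool    # external services (DB, queues, ...) are reachable


def liveness_ok(h: Health) -> bool:
    # Fail liveness only when a restart is the remedy.
    # External dependencies are deliberately NOT part of this check.
    return h.app_loaded and h.can_self_recover


def readiness_ok(h: Health) -> bool:
    # Fail readiness whenever the app cannot serve users right now,
    # including when an external dependency is down.
    return h.app_loaded and h.dependencies_up


def kubernetes_reaction(h: Health) -> str:
    """What Kubernetes would do once the failure thresholds are exceeded."""
    if not liveness_ok(h):
        return "restart container"
    if not readiness_ok(h):
        return "remove from Service endpoints"
    return "serve traffic"


# A database outage should pause traffic, not kill a healthy process.
print(kubernetes_reaction(Health(app_loaded=True, can_self_recover=True, dependencies_up=False)))
```

Keeping dependency checks out of liveness is the design choice that avoids restarting every replica when a shared database goes down.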

  10. 2. Configure It
    ● Configure with failure thresholds
    ● Align it with your business/service needs
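As a rough worked example of how thresholds translate into service behavior, the sketch below estimates how long a failure can go unnoticed and how much time a startup probe allows. The timing field names and their default values are the Kubernetes ones; the two helper functions and their formulas are simplifying assumptions (they ignore scheduling jitter), not something stated in the talk.

```python
# Kubernetes default values for the probe timing fields.
DEFAULTS = {
    "initialDelaySeconds": 0,
    "periodSeconds": 10,
    "timeoutSeconds": 1,
    "successThreshold": 1,
    "failureThreshold": 3,
}


def with_defaults(probe):
    """Fill in any timing field the probe does not set explicitly."""
    return {**DEFAULTS, **probe}


def worst_case_detection_seconds(probe):
    """Approximate upper bound on how long a failure goes unnoticed.

    Assumes the failure starts right after a successful check, so
    failureThreshold consecutive checks must fail, and the last one may
    take up to timeoutSeconds to be declared failed.
    """
    p = with_defaults(probe)
    return p["periodSeconds"] * p["failureThreshold"] + p["timeoutSeconds"]


def startup_budget_seconds(probe):
    """Approximate time a startup probe gives the app before it is killed."""
    p = with_defaults(probe)
    return p["initialDelaySeconds"] + p["periodSeconds"] * p["failureThreshold"]


print(worst_case_detection_seconds({}))                                          # defaults
print(worst_case_detection_seconds({"periodSeconds": 2, "failureThreshold": 1}))  # aggressive
print(startup_budget_seconds({"periodSeconds": 10, "failureThreshold": 30}))      # slow starter
```

Under these assumptions the defaults leave a failure undetected for up to about 31 seconds, while the aggressive setting reacts in about 3 seconds at the cost of acting on a single slow response. Which side of that trade-off is right depends on the service's needs.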

  11. 3. Code It
    ● Specific endpoint for each probe
    ● Understand the state
    ● Liveness - Everything that’s needed is loaded
    ● Liveness - Can the app auto-recover, or does it need help?
    ● Readiness - All the external services are accessible (DB, queues, other services)
    ● Readiness - The main job can be done
    ● Log everything
    ● Avoid long startups
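A minimal sketch of the points above, using only the Python standard library: one endpoint per probe, readiness that checks external services, and a log line for every probe result. The paths `/livez` and `/readyz`, the `AppState` class and the `start_probe_server` helper are hypothetical names chosen for this example.

```python
import json
import logging
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("probes")


class AppState:
    """Hypothetical holder for what the probes need to know."""

    def __init__(self):
        self.loaded = False      # set to True once everything needed is loaded
        self.dependencies = {}   # name -> callable returning True when reachable

    def live(self):
        return self.loaded

    def ready(self):
        results = {name: bool(check()) for name, check in self.dependencies.items()}
        return self.loaded and all(results.values()), results


def make_handler(state):
    class ProbeHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path == "/livez":        # liveness has its own endpoint
                ok, detail = state.live(), {}
            elif self.path == "/readyz":     # readiness has its own endpoint
                ok, detail = state.ready()
            else:
                self.send_error(404)
                return
            status = 200 if ok else 503
            log.info("probe %s -> %d %s", self.path, status, detail)  # log everything
            body = json.dumps({"ok": ok, "checks": detail}).encode()
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, fmt, *args):
            pass  # silence the default access log; probe results are logged above

    return ProbeHandler


def start_probe_server(state, host="127.0.0.1", port=0):
    """Serve the probe endpoints in a background thread; port=0 picks a free port."""
    server = ThreadingHTTPServer((host, port), make_handler(state))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

In a pod the server would need to listen on an address the kubelet can reach (for example `host="0.0.0.0"`), and each entry in `dependencies` should be a cheap check so the probe itself does not time out.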

  12. Troubleshoot Kubernetes With Confidence

  13. Meet Komodor
    We’re a team of devs on a mission to transform K8s troubleshooting by:
    ● Reducing MTTR - A single source of truth to uncover root causes quickly.
    ● Empowering Developers - Enable every team member to troubleshoot independently.
    ● Simplifying the K8s Chaos - Gain the relevant context & insights you need to troubleshoot faster.

  14. The Process of Troubleshooting K8s Today
    ● Search through multiple tools to find the needle in the haystack
    ● Use kubectl to estimate your way to the root cause

  15. How Komodor Makes K8s Troubleshooting Simple
    ● Gain Change Intelligence - A single source of truth for all code & config changes, code diffs, deploys, alerts, displaying both real-time & historical data.
    ● Shift Troubleshooting Left - Runs automated checks and delivers remediation suggestions, empowering all responders to troubleshoot independently.
    ● Uncover K8s Dependencies - Uncover related components & understand cross-component changes to troubleshoot with the relevant context.
    ● Have 360° Visibility - Gain the relevant insights behind a symptom, from the app layer (code change) to the infra layer (node resources).

  16. Troubleshooting probes couldn’t be easier

  17. Thanks for listening!
    Want to learn how you can troubleshoot K8s issues with ease?
    Then check us out at: www.komodor.com
