
5 Best Practices to Simplify Kubernetes Troubleshooting

Kubernetes is agile, flexible and scalable. It is also complex, with a multitude of moving parts on and underneath the surface. When things go wrong, as they always (eventually) do, understanding what caused the problem can be stressful and time-consuming.

But does it have to be? Join Mickael Alliel, DevOps engineer at Komodor, as he deep-dives into:

The obvious (and not so obvious) challenges of using Kubernetes
Ways to create a troubleshooting-friendly environment
Five best practices to simplify K8s troubleshooting.

To view on-demand: https://webinars.devops.com/5-best-practices-to-simplify-kubernetes-troubleshooting?utm_campaign=$9.2.21$_DO_Webinar_Komodor&utm_source=Komodor1

Komodor

April 26, 2022

Transcript

  1. 5 Best Practices to Simplify K8s Troubleshooting

    Komodor <> Epsagon | May 2021
    Mickael Alliel, DevOps Engineer @ Komodor
  2. Who am I?

    Cloud native | March 2021
    • DevOps Engineer at Komodor, a startup building the first k8s-native troubleshooting platform
    • A self-taught developer turned DevOps
    • Passionate about automation & K8s
    • All about “working smarter, not harder”
  3. “Kubernetes has become the de facto container orchestrator, with over 88% having adopted K8s already.”

    Source: https://www.redhat.com/en/resources/kubernetes-adoption-security-market-trends-2021-overview
  4. Kubernetes is great for obvious reasons

    • Portability & Flexibility
    • Multi-Cloud Capabilities
    • Increased Developer Productivity
    • Application Reliability
    • Cost Savings
  5. ...but it brings its own, unique set of challenges

    Among the top challenges are:
    • Security
    • Logging
    • Monitoring
    • Cultural & technical changes with development
    There is a common denominator amongst Kubernetes’ challenges
  6. The common enemy: Complexity

    "Despite 6 years of progress, Kubernetes is still incredibly complex," said Drew Bradstock, product lead for Google Kubernetes Engine (GKE). "What we've seen in the past year or so is a lot of enterprises are embracing Kubernetes, but then they run headlong into the difficulty."
  7. What Makes Troubleshooting K8s Complex?

    Issues happen on a daily basis and it’s almost impossible to understand what causes them. 85% of incidents can be traced to system changes:
    • Blind spot: Changes are unaudited or hidden
    • Fragmented data: Events are scattered across hundreds of different tools
    • Butterfly effect: Distributed systems make it harder to understand the effect of a single change
  8. What Makes Troubleshooting K8s Complex?

    There are additional barriers to consider:
    • Lack of K8s knowledge: The knowledge & expertise around K8s is often held by only a few (e.g. Ops).
    • Lack of permissions/access: Developers often don’t have permission to view logs, perform rollbacks, or access critical resources & tools.
  9. Best Practice #1: Maintain Good YAML Hygiene

    Investors | January 2021
    Make sure to include important metadata and configuration, such as:
    • labels and annotations
    • environment variables
    • secrets
    • config maps that point to the proper objects and volumes
    • liveness and readiness probes
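The checklist on this slide can be partly automated. The following is a minimal sketch, not the speaker's tooling: it holds a Deployment manifest as a Python dict and flags missing labels, annotations, and probes. The field names (`metadata.labels`, `envFrom`, `livenessProbe`, `readinessProbe`) are standard Kubernetes fields; the service name, image, annotation key, and probe paths are hypothetical.

```python
"""Minimal hygiene check for a Deployment manifest held as a Python dict."""

# Hypothetical example manifest; names, image, and probe paths are illustrative.
DEPLOYMENT = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {
        "name": "checkout",
        "labels": {
            "app.kubernetes.io/name": "checkout",
            "app.kubernetes.io/version": "1.4.2",
        },
        "annotations": {"example.com/owner": "payments-team"},
    },
    "spec": {
        "selector": {"matchLabels": {"app.kubernetes.io/name": "checkout"}},
        "template": {
            "metadata": {"labels": {"app.kubernetes.io/name": "checkout"}},
            "spec": {
                "containers": [
                    {
                        "name": "checkout",
                        "image": "registry.example.com/checkout:1.4.2",
                        "env": [{"name": "LOG_LEVEL", "value": "info"}],
                        # Config maps and secrets are referenced by name.
                        "envFrom": [
                            {"configMapRef": {"name": "checkout-config"}},
                            {"secretRef": {"name": "checkout-secrets"}},
                        ],
                        "livenessProbe": {
                            "httpGet": {"path": "/healthz", "port": 8080}
                        },
                        "readinessProbe": {
                            "httpGet": {"path": "/ready", "port": 8080}
                        },
                    }
                ]
            },
        },
    },
}


def missing_hygiene_items(manifest):
    """Return human-readable findings; an empty list means nothing was flagged."""
    findings = []
    metadata = manifest.get("metadata", {})
    for key in ("labels", "annotations"):
        if not metadata.get(key):
            findings.append(f"metadata.{key} is empty")
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    for container in pod_spec.get("containers", []):
        for probe in ("livenessProbe", "readinessProbe"):
            if probe not in container:
                findings.append(
                    f"container {container.get('name')!r} has no {probe}"
                )
    return findings
```

A check like this could run in CI before manifests are applied, so gaps are caught before they complicate troubleshooting.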
  10. Best Practice #2: Stateless Apps FTW

    Stateful (e.g. MariaDB, ElasticSearch, Kafka, Prometheus)
    Stateful Application Features: Service Scaling, Upgrade, Discovery, Graceful Shutdown, Quorum, Session Management, Data Recovery, Multi-Instances, One per instance Storage
    Stateless (e.g. API Gateway, Reset API, Frontend, Auth)
    Stateless Application Features: Service Scaling, Upgrade, Graceful Shutdown, Multi-Instances
  11. Best Practice #3: Logging - Specifically for K8s

    Tag and label your logs properly, by including the:
    • Proper service name (not the pod names!)
    • Version
    • Cluster environment information
    • Business-specific data
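One way to attach these fields to every log line is a JSON formatter built on Python's standard `logging` module. This is a sketch under assumptions: the environment variable names (`SERVICE_NAME`, `SERVICE_VERSION`, `CLUSTER_NAME`, `ENVIRONMENT`), the `business` key passed through `extra`, and the `order_id` field are all hypothetical choices, not something the talk prescribes.

```python
import json
import logging
import os


class K8sJsonFormatter(logging.Formatter):
    """Render each record as one JSON object carrying service-level context."""

    def __init__(self, service, version, cluster, environment):
        super().__init__()
        self.context = {
            "service": service,          # the service name, not the pod name
            "version": version,
            "cluster": cluster,
            "environment": environment,
        }

    def format(self, record):
        payload = {"level": record.levelname, "message": record.getMessage()}
        payload.update(self.context)
        # Business-specific fields arrive per call via extra={"business": {...}}.
        payload.update(getattr(record, "business", {}))
        return json.dumps(payload, sort_keys=True)


def build_logger(name="checkout"):
    """Create a logger whose context comes from (assumed) environment variables."""
    formatter = K8sJsonFormatter(
        service=os.environ.get("SERVICE_NAME", name),
        version=os.environ.get("SERVICE_VERSION", "unknown"),
        cluster=os.environ.get("CLUSTER_NAME", "unknown"),
        environment=os.environ.get("ENVIRONMENT", "unknown"),
    )
    handler = logging.StreamHandler()
    handler.setFormatter(formatter)
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger()
    log.info("order placed", extra={"business": {"order_id": "A-1001"}})
```

Because the context is set once on the formatter, individual log calls only need to supply the message and any business-specific data.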
  12. Best Practice #4: Separate/Segregate Environments

    There are several ways to separate your K8s environments:
    Option 1: Create an environment for each stage of the development process (development, QA, staging, production)
  13. Best Practice #4: Separate/Segregate Environments

    Option 2: Create an environment according to namespaces (special K8s resource)
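As an illustration of the namespace option, the sketch below generates one `Namespace` manifest per stage. The stage names mirror the list on the previous slide; the `environment` and `team` label keys and the `platform` default are hypothetical.

```python
import json

# Stage names taken from the slide; lowercased to be valid namespace names.
ENVIRONMENTS = ("development", "qa", "staging", "production")


def namespace_manifest(environment, team="platform"):
    """Build a Namespace manifest dict for one environment."""
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": environment,
            # Label keys are illustrative; pick ones that match your conventions.
            "labels": {"environment": environment, "team": team},
        },
    }


def all_namespaces():
    """Return one Namespace manifest per environment, in declaration order."""
    return [namespace_manifest(env) for env in ENVIRONMENTS]


if __name__ == "__main__":
    for manifest in all_namespaces():
        print(json.dumps(manifest))
```

Since `kubectl` accepts JSON as well as YAML, each printed manifest could in principle be saved to a file and applied as-is.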
  14. Best Practice #5: Invest in Proper Monitoring

    Step 1: Choose the right monitoring solution for your needs.
    • Open Source Monitoring Solutions
    • Commercial Monitoring Solutions
  15. Best Practice #5: Invest in Proper Monitoring (Cont’d)

    Start monitoring the following metrics:
    • Resources: CPU / Memory Usage
    • Container Status: Up / Down / Errors / Probe Data / Restart count
    • Application Metrics: Application Performance Metrics - APMs
    Make sure to monitor these metrics in an automated way by setting up proper monitors and alerts
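To make the idea of automated monitors concrete, here is a small sketch of a threshold check over a per-container metrics snapshot. In practice a monitoring tool would evaluate rules like these; the `ContainerSnapshot` type and the threshold values are assumptions chosen for illustration, not recommendations from the talk.

```python
from dataclasses import dataclass


@dataclass
class ContainerSnapshot:
    name: str
    cpu_percent: float      # share of the CPU limit in use
    memory_percent: float   # share of the memory limit in use
    restart_count: int
    ready: bool             # result of the latest readiness probe


# Illustrative thresholds; tune them for your own workloads.
THRESHOLDS = {"cpu_percent": 90.0, "memory_percent": 85.0, "restart_count": 3}


def evaluate(snapshot, thresholds=THRESHOLDS):
    """Return alert messages for every threshold the snapshot crosses."""
    alerts = []
    if snapshot.cpu_percent > thresholds["cpu_percent"]:
        alerts.append(f"{snapshot.name}: CPU at {snapshot.cpu_percent:.0f}%")
    if snapshot.memory_percent > thresholds["memory_percent"]:
        alerts.append(f"{snapshot.name}: memory at {snapshot.memory_percent:.0f}%")
    if snapshot.restart_count >= thresholds["restart_count"]:
        alerts.append(f"{snapshot.name}: restarted {snapshot.restart_count} times")
    if not snapshot.ready:
        alerts.append(f"{snapshot.name}: readiness probe failing")
    return alerts
```

For example, `evaluate(ContainerSnapshot("checkout", 95.0, 40.0, 5, False))` flags CPU usage, the restart count, and the failing readiness probe, while a healthy snapshot returns an empty list.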
  16. Conclusion

    Ensuring the right foundations for your K8s environments from the get-go will ease the process of troubleshooting later down the line, ultimately enabling you and your team to move faster, increase ownership, and bring more value to your customers.
  17. Q&A