GenAI Meets K8s

Komodor
September 26, 2024


Transcript

  1. Meet Komodor’s Klaudia

    Why: To keep up with Kubernetes complexity, you need to be able to holistically correlate real-time and historic data and understand their ripple effect across the system.
    What: “Klaudia” identifies the root cause of issues in Kubernetes, and provides meaningful explanations, helping teams understand why issues occur and how to prevent them in the future.
    How: Klaudia leverages Komodor's comprehensive dataset of past investigation flows, historical changes, events, and metrics. This knowledge base enables precise diagnostics and actionable insights, with AI enhancing our ability to scale across the Kubernetes stack.
  2. Klaudia Internal Flow

    1. Problem Detection: Komodor identifies a Kubernetes issue
    2. Model Selection: Klaudia chooses the most suitable AI model for the specific problem type
    3. Autonomous Investigation: Independent root cause analysis agent is launched
    4. Iterative Investigation: Agent forms hypotheses and tests them:
       a. Requests relevant data from Komodor API as needed
       b. Analyzes new information and refines investigation
       c. Repeats process, narrowing down to root cause
    5. Analysis Completion: Klaudia generates precise root cause analysis, with clear, detailed explanations and actionable next steps
    6. Presentation: The findings are displayed to users with supporting evidence
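The iterative investigation in step 4 can be pictured as a loop over candidate causes. The following is a minimal sketch under stated assumptions, not Komodor's implementation: the `Hypothesis` and `Investigation` types, the `investigate` function, and the canned `FAKE_DATA` are all hypothetical names invented for illustration, and the hypotheses are a fixed ordered list rather than something an agent generates on the fly.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Hypothesis:
    """A candidate root cause and the data needed to test it (hypothetical type)."""
    cause: str
    data_source: str                 # which kind of data to request
    confirms: Callable[[str], bool]  # True if the fetched data supports the cause


@dataclass
class Investigation:
    """Running state of one investigation (hypothetical type)."""
    issue: str
    evidence: List[str] = field(default_factory=list)
    root_cause: Optional[str] = None


def investigate(issue: str,
                hypotheses: List[Hypothesis],
                fetch: Callable[[str], str]) -> Investigation:
    """Test hypotheses one at a time, fetching data only when a hypothesis needs it."""
    result = Investigation(issue=issue)
    for hypothesis in hypotheses:
        data = fetch(hypothesis.data_source)      # step 4a: request data as needed
        result.evidence.append(f"{hypothesis.data_source}: {data}")
        if hypothesis.confirms(data):             # step 4b: analyze the new data
            result.root_cause = hypothesis.cause  # step 5: analysis complete
            break
        # step 4c: not confirmed, so move on to the next hypothesis
    return result


# Canned data standing in for a real data API (purely illustrative).
FAKE_DATA: Dict[str, str] = {
    "node_events": "no warnings",
    "pod_logs": "error: cannot parse value for key LOG_LEVEL",
}

result = investigate(
    issue="pod keeps restarting",
    hypotheses=[
        Hypothesis("node memory pressure", "node_events", lambda d: "MemoryPressure" in d),
        Hypothesis("malformed ConfigMap value", "pod_logs", lambda d: "cannot parse" in d),
    ],
    fetch=lambda source: FAKE_DATA[source],
)
print(result.root_cause)
print(result.evidence)
```

With the canned data, the first hypothesis is rejected and the second is confirmed, and the collected `evidence` list is what a presentation step could show alongside the conclusion.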
  3. Klaudia: Internal Flow

    Detection: Komodor continuously monitors your entire K8s cluster fleet & detects any issues or potential risks.
  4. Klaudia: Internal Flow

    Model Selection: Based on the issue detected, Klaudia picks an appropriate model from a catalog of rule engines & LLMs.
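One way to read "a catalog of rule engines & LLMs" is as a lookup table keyed by issue type, with a general-purpose fallback. The sketch below assumes exactly that; the catalog contents, the function names, and the choice of an LLM agent as the fallback are hypothetical and are not taken from the deck. `CrashLoopBackOff` and `ImagePullBackOff` are ordinary Kubernetes pod status reasons used here as example keys.

```python
from typing import Callable, Dict

# An analyzer takes a description of the issue and returns a short plan (hypothetical shape).
Analyzer = Callable[[dict], str]


def crashloop_rules(issue: dict) -> str:
    """Deterministic rule engine for a well-understood failure mode."""
    return f"rule engine: inspect last exit code and logs of {issue['resource']}"


def image_pull_rules(issue: dict) -> str:
    """Deterministic rule engine for image pull failures."""
    return f"rule engine: check image name and pull secrets of {issue['resource']}"


def llm_agent(issue: dict) -> str:
    """Stand-in for an LLM-backed agent used when no rule engine matches."""
    return f"LLM agent: open-ended investigation of {issue['resource']}"


# Hypothetical catalog mapping issue types to the analyzer assumed to suit them best.
CATALOG: Dict[str, Analyzer] = {
    "CrashLoopBackOff": crashloop_rules,
    "ImagePullBackOff": image_pull_rules,
}


def select_model(issue_type: str) -> Analyzer:
    """Pick the catalog entry for this issue type, falling back to the LLM agent."""
    return CATALOG.get(issue_type, llm_agent)


issue = {"type": "CrashLoopBackOff", "resource": "deployment/web"}
print(select_model(issue["type"])(issue))
```

Keeping cheap, deterministic rules in front of a more general model is a common design choice; whether Klaudia orders its catalog this way is not stated in the slides.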
  5. Klaudia: Internal Flow

    Investigation: The chosen model performs an iterative investigation process until the root cause analysis is complete.
  6. Klaudia: Internal Flow

    Investigation: As the final stage, Klaudia provides a clear, detailed root cause (RC) analysis with actionable next steps to remediate the issue & prevent it in the future.
  7. Scenario from User’s POV:

    1. A pod failure occurred due to a corrupted key inside a changed ConfigMap.
    2. Although the ConfigMap was properly mounted, it contained malformed data. This was not immediately obvious and required deep analysis to diagnose.
    3. The GenAI agent took just a couple of seconds to digest logs, historical changes, and K8s events, and to flag the root cause (RC).
    4. It then provided clear instructions for remediation, including a direct link to the right ConfigMap.
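Part of what makes this scenario tractable is lining up "historical changes" against the time of the failure. As a rough illustration only, the sketch below filters a list of change records down to those made shortly before a failure; the record shape, the `changes_before` helper, the 30-minute window, and the sample resources are all assumptions made for the example.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def changes_before(failure_time: datetime,
                   changes: List[Dict],
                   window: timedelta = timedelta(minutes=30)) -> List[Dict]:
    """Return changes made within `window` before the failure, most recent first."""
    recent = [c for c in changes
              if failure_time - window <= c["time"] <= failure_time]
    return sorted(recent, key=lambda c: c["time"], reverse=True)


# Illustrative change history; names and times are made up.
failure = datetime(2024, 9, 26, 12, 0)
changes = [
    {"time": datetime(2024, 9, 26, 10, 0), "kind": "Deployment", "name": "web"},
    {"time": datetime(2024, 9, 26, 11, 55), "kind": "ConfigMap", "name": "app-config"},
]

suspects = changes_before(failure, changes)
for change in suspects:
    print(change["kind"], change["name"], change["time"].isoformat())
```

Here only the ConfigMap edit falls inside the window, which is the kind of signal that would point an investigation toward the configuration rather than the workload itself.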
  8. How Do We Compare?

    We tested the leading K8s AI troubleshooting agents on the market with a simple scenario: a deployment relying on a ConfigMap with an invalid value. Here are the results 👉
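The deck does not show the manifests used in the test, so the following is only a guess at what "a ConfigMap with an invalid value" might look like: a ConfigMap, written as a Python dict, whose `settings.json` entry is truncated JSON. The key names, the values, and the `malformed_json_keys` helper are hypothetical. The point is that Kubernetes stores `data` values as opaque strings, so a ConfigMap like this mounts without complaint and only fails when the application parses it.

```python
import json
from typing import Dict, List

# Hypothetical ConfigMap manifest as a dict; the JSON under "settings.json"
# is missing its closing brace.
config_map: Dict = {
    "apiVersion": "v1",
    "kind": "ConfigMap",
    "metadata": {"name": "app-config"},
    "data": {
        "limits.json": '{"max_connections": 100}',
        "settings.json": '{"port": 8080, "log_level": "info"',
        "feature.flag": "true",
    },
}


def malformed_json_keys(manifest: Dict) -> List[str]:
    """Return the names of *.json entries in a ConfigMap whose values do not parse."""
    bad = []
    for key, value in manifest.get("data", {}).items():
        if not key.endswith(".json"):
            continue  # only entries that claim to be JSON are checked
        try:
            json.loads(value)
        except json.JSONDecodeError:
            bad.append(key)
    return bad


print(malformed_json_keys(config_map))
```

A check like this could run in CI before the ConfigMap is applied, which addresses the "prevent in the future" half of the remediation advice.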
  9. Only Komodor’s Klaudia correctly detected the root cause (including supporting evidence) & suggested exact instructions for remediation, completing the troubleshooting cycle end to end (E2E) 👇