Slide 3
Abstract
Kubernetes infrastructure functions much like a living organism: pods spawn and terminate like a
heartbeat, nodes scale up and down like breaths, and events fire like nerve impulses.
While biological organisms are flawless, technical systems always have room for improvement. By
observing these systems closely over time, one can find ways to improve them. However, because
humans cannot process vast amounts of data manually, we need tools that analyze the raw data
and provide actionable insights for Kubernetes users.
Within the Komodor platform, data from hundreds of Kubernetes clusters flows continuously, placing us in
a unique position to analyze this data for the benefit of our customers. This led to the initiation of the
"Reliability Insights" project, which, after extensive research and experimentation, has become an integral
part of our main platform.
During our research, we identified a dozen types of reliability-related insights. While only two-thirds of
these made it into the final product, all of them were valuable, and some of the unreleased ones
were particularly cool. In this talk, we will share our observations and findings from the insights research,
explain each type of insight in more depth, and show why it is important to analyze the
life of clusters from a higher-level perspective.