Slide 1

Slide 1 text

Reliability-Driven Kubernetes Fleet Management

Slide 2

Slide 2 text

Agenda
● Intro
● Understanding the Complexity of K8s Fleets
● Cluster Groups
● Best Practices
● Fleet Management with Komodor
● Demo Time!

Slide 3

Slide 3 text

Hi, My Name Is Itiel Shwartz 👋
● The CTO and Co-Founder of Komodor
● A big believer in dev empowerment and moving fast!
● Backend Developer turned DevOps
● Worked at eBay, Forter, Rookout (first developer)
● K8s fanboy 😃

Slide 4

Slide 4 text

What is a Kubernetes Fleet?
● The number of K8s clusters per organization is growing every year
● In 2019 only 10% of organizations had 50+ clusters; in 2024, having hundreds of K8s clusters is no longer considered an outlier
● The inherent complexities of managing a K8s cluster (lifecycle, monitoring, maintenance, troubleshooting, etc.) are multiplied with each additional cluster
● The sheer scale of these multi-cluster deployments poses a new and unique set of challenges, heightened by the popularity of K8s on Edge
● To address those challenges, the concept of Fleet Management was introduced

Slide 5

Slide 5 text

What is Kubernetes Fleet Management?
“A fleet provides a way to logically group and normalize Kubernetes clusters, helping you uplevel management from individual clusters to entire groups of clusters.” - GKE Enterprise
Efficiency
Security
Standardization

Slide 6

Slide 6 text

Example: Fleet Management as a Platform
[Diagram: a PLATFORM TEAM serving BU 1, BU 2 and BU 3, each with clusters A, B and C, spread across AWS west-1 clusters, AWS east-2 clusters and on-prem clusters]

Slide 7

Slide 7 text

What's So Hard About Fleet Management?

Cluster Lifecycle
● Upgrades & maintenance
● Infrastructure resiliency

Cost & Resource Utilization
● Keeping the cost low across multi-cluster/cloud/on-prem & hybrid
● Efficient resource utilization on Edge nodes

Reliability & Resiliency
● Resolving issues across different envs & AZs
● RCA is endless
● Knowledge gaps between Dev & Ops create bottlenecks

Access Management
● RBAC for cluster access
● JiT access
● Edge locations

Governance & Standardization
● Enforcing standards across the fleet
● Policy enforcement
● Security compliance

Cross-Cluster Visibility
● Hard to correlate between issues
● Deviations in service performance

Slide 8

Slide 8 text

The Human Aspect of Fleet Management
● Every persona in the organization has a different mindset and approach
● Different requirements and KPIs for different teams
● Different permissions and access required per persona or per use case (JiT)
● Knowledge and skills gaps (K8s has a steep learning curve)

Slide 9

Slide 9 text

How to Start Thinking in Cluster Attributes?
By Region 👉 Region 1, Region 2, Region N
By Environment 👉 Production, Staging, Development
By Cloud Provider 👉 AWS, Azure, Google Cloud
By Namespace 👉 NS: frontend, NS: backend, NS: auth
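Grouping by attribute, as this slide suggests, can be sketched in a few lines of Python. This is a minimal illustration only: the `Cluster` record, its field names, and the sample fleet are all assumptions made up for the example and are not taken from Komodor or any other product. Namespaces are left out because they live inside a cluster rather than describing the cluster itself.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Hypothetical inventory record; the field names are illustrative."""
    name: str
    region: str
    environment: str
    provider: str


def group_by(clusters, attribute):
    """Return {attribute value: sorted list of cluster names}."""
    groups = defaultdict(list)
    for cluster in clusters:
        groups[getattr(cluster, attribute)].append(cluster.name)
    return {key: sorted(names) for key, names in groups.items()}


# Made-up sample fleet, for illustration only.
FLEET = [
    Cluster("payments-1", "region-1", "production", "AWS"),
    Cluster("payments-2", "region-2", "production", "Azure"),
    Cluster("sandbox-1", "region-1", "development", "Google Cloud"),
    Cluster("staging-1", "region-2", "staging", "AWS"),
]

by_environment = group_by(FLEET, "environment")
by_provider = group_by(FLEET, "provider")
```

The same inventory yields a different cluster group for each attribute, so one cluster can belong to several groups at once (for example, "production" and "AWS").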

Slide 10

Slide 10 text

Fleet Management for K8s on Edge
Chick-Fil-A Case Study
[Diagram: "Cloud-fil-a", spanning Azure, Google and AWS]

Slide 11

Slide 11 text

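Fleet Management for K8s on Edge
[Diagram: location hierarchy for edge clusters]
● HQ
● Continents: Europe, North America
● Regions: West-US, East-US, Germany, France
● Cities: Los Angeles, San Francisco, Berlin, Munich, Paris
● Sites per city: Location 1 through Location 42, Location 1 through Location 92, Location 1, Location 1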
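One way to work with a hierarchy like the one on this slide is to give every edge cluster a location path and select groups by path prefix. The sketch below assumes exactly that; the cluster names, the path layout (continent, region, city, site), and the `select` helper are hypothetical and only illustrate the idea.

```python
def select(clusters, prefix):
    """Return the names of clusters whose location path starts with prefix."""
    prefix = tuple(prefix)
    return sorted(
        name
        for name, path in clusters.items()
        if path[:len(prefix)] == prefix
    )


# Made-up edge inventory: cluster name -> (continent, region, city, site).
EDGE = {
    "la-001": ("North America", "West-US", "Los Angeles", "Location 1"),
    "la-042": ("North America", "West-US", "Los Angeles", "Location 42"),
    "sf-001": ("North America", "West-US", "San Francisco", "Location 1"),
    "ber-001": ("Europe", "Germany", "Berlin", "Location 1"),
    "par-001": ("Europe", "France", "Paris", "Location 1"),
}

europe = select(EDGE, ["Europe"])
los_angeles = select(EDGE, ["North America", "West-US", "Los Angeles"])
```

A shorter prefix selects a wider group, so the same call covers a single city, a whole continent, or (with an empty prefix) the entire fleet.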

Slide 12

Slide 12 text

Fleet Management Best Practices
1. Leverage IaC
2. Implement GitOps
3. Use best-of-breed monitoring
4. Consolidate clusters in a unified single pane of glass
5. Build or buy a dedicated Fleet Management solution

Slide 13

Slide 13 text

Golden Tip: Shift-Left Ops
● Abstract complexity: expose functionality and bubble up relevant data in the right context (i.e., simplify K8s and reduce cognitive load on non-experts)
● Automate away toil in a way that prevents human error
● Template services, deployments, etc. (i.e., enforce governance and standardization)
● Empower developers and other stakeholders to own K8s (i.e., manage their workloads on K8s without having to learn K8s)
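The templating point can be illustrated with a small sketch: a function that renders a Kubernetes Deployment manifest from a handful of developer-facing inputs and refuses to render it when required labels are missing. The required-label policy, the default resource values, and the `render_deployment` name are assumptions chosen for the example, not a standard or a Komodor feature.

```python
# Hypothetical governance policy: every workload must carry these labels.
REQUIRED_LABELS = ("team", "environment")


def render_deployment(name, image, labels, replicas=2):
    """Build a Deployment manifest (as a dict) with enforced defaults."""
    missing = [key for key in REQUIRED_LABELS if key not in labels]
    if missing:
        raise ValueError(f"missing required labels: {missing}")
    # The "app" label is set last so callers cannot break the selector.
    pod_labels = {**labels, "app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": pod_labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": pod_labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        # Illustrative defaults so no workload ships unbounded.
                        "resources": {
                            "requests": {"cpu": "100m", "memory": "128Mi"},
                            "limits": {"cpu": "500m", "memory": "256Mi"},
                        },
                    }],
                },
            },
        },
    }


manifest = render_deployment(
    "checkout",
    "registry.example.com/checkout:1.0",  # placeholder image reference
    {"team": "payments", "environment": "staging"},
)
```

Developers supply only a name, an image, and labels; the template carries the parts that need K8s expertise, which is the shift-left idea in miniature.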

Slide 14

Slide 14 text

Creating Cluster Groups With Komodor

Slide 15

Slide 15 text

Managing Cluster Groups With Komodor

Slide 16

Slide 16 text

Demo Time!