Slide 1

Slide 1 text

© 2023, Amazon Web Services, Inc. or its affiliates. Running efficient EKS clusters with Karpenter

Slide 2

Slide 2 text

AWS customers run containers at scale
• 80% of all containerized applications running in the cloud run on AWS*
• 10X Amazon EKS usage growth in the last year
• 8 billion+ container images pulled from Amazon ECR every week
• 100 million ECS Fargate tasks running every week
*https://nucleusresearch.com/research/single/guidebook-containers-and-kubernetes-on-aws/

Slide 3

Slide 3 text

AWS container services landscape
• Management (deployment, scheduling, scaling, and management of containerized applications): Amazon Elastic Container Service (ECS), Amazon Elastic Kubernetes Service (EKS), Red Hat OpenShift Service on AWS (ROSA)
• Hosting (where the containers run): Amazon EC2, AWS Fargate
• Image registry (container image repository): Amazon Elastic Container Registry

Slide 4

Slide 4 text

Kubernetes Architecture
[Diagram: a control plane (Kubernetes control node, etcd) and a data plane (three Kubernetes worker nodes) where your workloads run. Cost labels on the diagram: ¢ and $-$$$]

Slide 5

Slide 5 text

Deep savings using Amazon EC2 Spot Instances
EC2 Spot Instances: spare capacity from the same infrastructure
• Capacity: AWS can reclaim with 2-minute notice; interruptions happen when EC2 needs the spare capacity back
• Pricing: smooth, infrequent changes; more predictable
• Usage: choose different instance types, sizes, and Availability Zones in an Amazon EC2 Auto Scaling group
• EC2 Spot Instances are integrated in many AWS services, AWS Partner offerings, and open-source software

Slide 6

Slide 6 text

Kubernetes Pod Autoscaling Overview
1. Horizontal Pod Autoscaling (HPA)
2. Vertical Pod Autoscaling (VPA)
3. Autoscaler
[Diagram elements: Metrics Store, HPA, VPA, pending pods, Autoscaler]

Slide 7

Slide 7 text

Kubernetes Cluster Autoscaling Overview
1. Horizontal Pod Autoscaling (HPA)
2. Vertical Pod Autoscaling (VPA)
3. Autoscaler
[Diagram elements: Metrics Store, HPA, VPA, pending pods, Autoscaler]

Slide 8

Slide 8 text

Cluster Autoscaling
• Helps optimize resource usage and costs by automatically scaling a cluster up and down in line with demand
• EKS supports two cluster autoscaling mechanisms:
  • Cluster Autoscaler
  • Karpenter

Slide 9

Slide 9 text

Cluster Autoscaling setup
AWS Kubernetes customers tell us that it is challenging to configure cluster autoscaling to achieve:
• Instance type flexibility
• Spot capacity availability
• Faster autoscaling
• Better resource utilization

Slide 10

Slide 10 text

Cluster Autoscaling with Karpenter
Karpenter is an open-source, flexible, high-performance Kubernetes cluster autoscaler.
• Dynamically chooses the best-suited compute resources
• Automatically adds or removes compute resources as required
• Scale-tested for high performance
• AWS support in EKS clusters

Slide 11

Slide 11 text

Karpenter: Groupless provisioning and autoscaling
• Improve the efficiency and cost of running workloads
• Simplify configuration
• Kubernetes native
• Flexible compute built in
What if we remove the concept of node groups?
• Provision capacity directly with “instant” EC2 Fleets
• Choose instance types from pod resource requests
• Provision nodes using K8s scheduling constraints
• Track nodes using native Kubernetes labels

Slide 12

Slide 12 text

How Karpenter provisions nodes on AWS
[Diagram elements: Application, Scheduler/HPA, pending pods, CA (Cluster Autoscaler), ASG (Auto Scaling group), EC2 API, EC2 Fleet (Instance)]
Karpenter consolidates instance orchestration responsibilities within a single system

Slide 13

Slide 13 text

How Karpenter works
Karpenter works in tandem with the kube-scheduler and the compute provider:
• The kube-scheduler places pending pods on existing capacity
• Karpenter observes the aggregate resource requests of unschedulable pods, then computes and launches best-fit new capacity
• Karpenter terminates empty nodes
• Karpenter consolidates under-utilized nodes
[Diagram: pending pods land on existing capacity OR, as unschedulable pods, on just-in-time capacity]

Slide 14

Slide 14 text

Karpenter: Binpacking
[Figure: online binpacking; axes: vCPUs and Memory (GB)]
Well-known labels
• karpenter.sh/capacity-type=spot
• karpenter.k8s.aws/instance-family=m6i
• kubernetes.io/arch=arm64
• topology.kubernetes.io/zone=us-west-2a

Slide 15

Slide 15 text

Provisioner CRD
• Provisioner: a custom resource that provisions nodes with a set of attributes (taints, labels, requirements, TTL)
• A single provisioner can manage compute for multiple teams and workloads
• Multiple provisioners can isolate compute for different needs
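The attributes listed on this slide could be combined roughly as follows. This is a minimal sketch, assuming the `karpenter.sh/v1alpha5` API that appears later in the deck; the provisioner name, the `team` label and taint, the 30-day TTL, the CPU limit, and the `default` node template reference are all hypothetical values chosen for illustration.

```yaml
# Illustrative sketch only: every name and value below is hypothetical.
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: team-a
spec:
  # Labels applied to every node this provisioner launches.
  labels:
    team: team-a
  # Taints keep pods without a matching toleration off these nodes.
  taints:
    - key: team
      value: team-a
      effect: NoSchedule
  # Requirements constrain which capacity may be launched.
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["on-demand"]
  # TTL: expire nodes after 30 days (30 * 24 * 3600 seconds).
  ttlSecondsUntilExpired: 2592000
  # Upper bound on the total CPU this provisioner may provision.
  limits:
    resources:
      cpu: "100"
  # Points at an AWSNodeTemplate (subnets, security groups); assumed to exist.
  providerRef:
    name: default
```

A second provisioner with a different label and taint would be one way to isolate compute for another team.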

Slide 16

Slide 16 text

Karpenter: Instance/AZ flexibility
Instance type flexibility
• No list → picks from all instance types in the EC2 universe, excluding metal
• Attribute-based requirements → sizes, families, generations, CPU architectures
AZ flexibility
• Provision in any AZ
• Provision in specified AZs
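Attribute-based requirements and an AZ restriction might look like the sketch below. It assumes a Karpenter release whose v1alpha5 API exposes the `karpenter.k8s.aws/instance-category`, `instance-generation`, and `instance-size` label keys; check the keys against the version you run. The provisioner name, the chosen values, and the zones are hypothetical.

```yaml
# Illustrative sketch only: name, values, and zones are hypothetical.
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: flexible
spec:
  requirements:
    # Families: compute-, general-purpose-, and memory-optimized.
    - key: karpenter.k8s.aws/instance-category
      operator: In
      values: ["c", "m", "r"]
    # Generations: anything newer than generation 4.
    - key: karpenter.k8s.aws/instance-generation
      operator: Gt
      values: ["4"]
    # Sizes: exclude the smallest ones.
    - key: karpenter.k8s.aws/instance-size
      operator: NotIn
      values: ["nano", "micro", "small"]
    # CPU architectures: allow both x86 and Arm.
    - key: kubernetes.io/arch
      operator: In
      values: ["amd64", "arm64"]
    # Specified AZs; omit this entry to provision in any AZ.
    - key: topology.kubernetes.io/zone
      operator: In
      values: ["us-west-2a", "us-west-2b"]
  providerRef:
    name: default   # assumed AWSNodeTemplate
```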

Slide 17

Slide 17 text

Simple Pod Scheduling: Node Selector
[Diagram elements: Kubernetes deployment / replica set, nodeSelector, EC2 Fleet; capacity labeled karpenter.sh/capacity-type: on-demand and karpenter.sh/capacity-type: spot]

apiVersion: apps/v1
kind: Deployment
…
spec:
  …
  nodeSelector:
    karpenter.sh/capacity-type: spot
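For reference, a complete Deployment that uses this node selector could look like the sketch below. It uses the `karpenter.sh/capacity-type` key as listed under the well-known labels earlier in the deck. The Deployment name, replica count, and the pause image are placeholders, not part of the original slide.

```yaml
# Illustrative sketch only: name, replicas, and image are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spot-workload
spec:
  replicas: 3
  selector:
    matchLabels:
      app: spot-workload
  template:
    metadata:
      labels:
        app: spot-workload
    spec:
      # Schedule only onto nodes labeled as Spot capacity.
      nodeSelector:
        karpenter.sh/capacity-type: spot
      containers:
        - name: app
          image: registry.k8s.io/pause:3.9   # placeholder workload
          resources:
            requests:
              cpu: "1"   # the request drives how much capacity is needed
```

If no existing node carries the label, the pods stay pending until matching capacity is provisioned.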

Slide 18

Slide 18 text

EC2 Allocation strategy
Spot allocation
• Price-Capacity-Optimized
• Reduce the cost of the instances
• Reduce the frequency of Spot terminations
On-demand allocation
• Lowest-Price
• Reduce the cost of the instances
• Built-in Spot instance lifecycle management
• Support Spot to On-demand fallback
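One way to allow the Spot to On-demand fallback mentioned above is to permit both capacity types in a single provisioner. As far as I understand Karpenter's documented behavior, it then prefers Spot and falls back to On-demand when Spot capacity is unavailable; treat that as an assumption to verify for your version. The names below are hypothetical.

```yaml
# Illustrative sketch only: names are hypothetical.
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: spot-with-fallback
spec:
  requirements:
    # Allowing both values lets on-demand act as the fallback.
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot", "on-demand"]
  providerRef:
    name: default   # assumed AWSNodeTemplate
```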

Slide 19

Slide 19 text

Groupless, Flexible, Simple Node Autoscaling with a Provisioner (CRD)
[Diagram: an EKS cluster with a Karpenter provisioner across AZ 1, AZ 2, and AZ 3; instance families shown: g4, g5, P4]

Slide 20

Slide 20 text

Native to Kubernetes
• Provision nodes using Kubernetes scheduling constraints
• Track nodes using native Kubernetes labels
Pod scheduling constraints must fall within a provisioner’s constraints:
• Node selectors
• Node affinity
• Taints and tolerations
• Topology spread
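The four kinds of constraint can all appear on one pod. The sketch below uses standard Kubernetes fields only; the pod name, the `app` label, the zones, the tolerated `team=team-a` taint, and the pause image are hypothetical.

```yaml
# Illustrative sketch only: names, zones, and the taint are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: constrained-pod
  labels:
    app: web
spec:
  # Node selector: exact label match.
  nodeSelector:
    kubernetes.io/arch: amd64
  # Node affinity: set-based match on a node label.
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values: ["us-west-2a", "us-west-2b"]
  # Toleration: permits scheduling onto nodes tainted team=team-a:NoSchedule.
  tolerations:
    - key: team
      operator: Equal
      value: team-a
      effect: NoSchedule
  # Topology spread: keep replicas balanced across zones.
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app: web
  containers:
    - name: web
      image: registry.k8s.io/pause:3.9   # placeholder workload
      resources:
        requests:
          cpu: "1"
```

If any of these constraints falls outside what a provisioner allows (for example a zone it excludes), that provisioner cannot launch a node for the pod.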

Slide 21

Slide 21 text

Karpenter scale-up
• Default: all instance types, excluding metal
• OR instanceTypes: [m5.large, m5.2xlarge, …]
[Diagram elements: HPA/Application, pending pods (1 vCPU request), Karpenter, node; timing label: 10 sec]

Slide 22

Slide 22 text

Karpenter scale-up
• Default: all instance types, excluding metal
• OR instanceTypes: [m5.large, m5.2xlarge, …]
[Diagram elements: HPA/Application, pending pods (1 vCPU request), Karpenter, node, new node; timing label: 10 sec]
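In the v1alpha5 API used later in this deck, an explicit instance type list is usually expressed as a requirement on the standard `node.kubernetes.io/instance-type` label rather than a top-level `instanceTypes` field. A minimal sketch, listing only the two types named on the slide; the provisioner name and node template reference are hypothetical.

```yaml
# Illustrative sketch only: names are hypothetical.
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: pinned-types
spec:
  requirements:
    # Restrict launches to an explicit list of instance types.
    - key: node.kubernetes.io/instance-type
      operator: In
      values: ["m5.large", "m5.2xlarge"]
  providerRef:
    name: default   # assumed AWSNodeTemplate
```

Leaving this requirement out corresponds to the default on the slide: all instance types, excluding metal.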

Slide 23

Slide 23 text

Karpenter Consolidation
• Deletes a node when its pods can run on the free capacity of other nodes in the cluster
• Deletes a node when the node is empty (no need to set ttlSecondsAfterEmpty with consolidation)
• Replaces a node when its pods can run on a combination of the free capacity of other nodes in the cluster and a more efficient replacement node

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: my-provisioner
spec:
  consolidation:
    enabled: true

[Diagram: node consolidation; nodes running pods with 1 vCPU requests]
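For comparison, a provisioner that only cleans up empty nodes, without consolidation, might be written as below. To my understanding `ttlSecondsAfterEmpty` and `consolidation.enabled: true` are not meant to be set together in v1alpha5, which is why the slide says the TTL is unnecessary with consolidation. The name and the 30-second value are hypothetical.

```yaml
# Illustrative sketch only: name and TTL value are hypothetical.
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: empty-node-cleanup
spec:
  # Remove a node 30 seconds after it becomes empty.
  # Assumed alternative to consolidation, not an addition to it.
  ttlSecondsAfterEmpty: 30
  providerRef:
    name: default   # assumed AWSNodeTemplate
```

This variant never moves running pods, so it reclaims less capacity than consolidation but also causes no pod disruption.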

Slide 27

Slide 27 text

Thank You!