Slide 1

Slide 1 text

CLOUD NATIVE DAYS TOKYO 2023 AYA IGARASHI @LADICLE FINOPS! Reducing Kubernetes Costs With Karpenter

Slide 2

Slide 2 text

FinOps! 72% OF COMPANIES HAVE A DEDICATED FINOPS TEAM. SOURCE: FLEXERA 2023 STATE OF THE CLOUD REPORT

Slide 3

Slide 3 text

Source: Flexera 2023 State of the Cloud Report “WASTED AVG. 28% OF CLOUD COSTS”

Slide 4

Slide 4 text

WHAT IS COST OPTIMIZATION? IT DOES NOT JUST MEAN SAVING INFRASTRUCTURE EXPENSES
Cost optimization in FinOps does not just mean using the cheapest infrastructure.
● Reduced infrastructure costs by removing unnecessary resources, etc.
● Reduced operational costs by automation: even if the infrastructure cost can be reduced, it is a problem if more manual operations are required to maintain it.
● Reduced opportunity loss due to improved availability (≈ increased sales): simply replacing infrastructure with the cheapest option risks lost business due to service outages or quality deterioration.

Slide 5

Slide 5 text

WHAT IS COST OPTIMIZATION? IT DOES NOT JUST MEAN SAVING INFRASTRUCTURE EXPENSES
Cost optimization in FinOps does not just mean using the cheapest infrastructure.
● Reduced infrastructure costs by removing unnecessary resources, etc.
● Reduced operational costs by automation: even if the infrastructure cost can be reduced, it is a problem if more manual operations are required to maintain it.
● Reduced opportunity loss due to improved availability (≈ increased sales): simply replacing infrastructure with the cheapest option risks lost business due to service outages or quality deterioration.
STRIKING THE OPTIMAL BALANCE

Slide 6

Slide 6 text

CLOUD NATIVE DAYS TOKYO 2023 AYA IGARASHI @LADICLE FINOPS! Optimizing (not just Reducing) Kubernetes Costs With Karpenter

Slide 7

Slide 7 text

AGENDA: WHAT YOU CAN LEARN TODAY!
01 Cost Optimization: What is cost optimization?
02 Autoscalers: Benefits of autoscalers in cost optimization
03 Karpenter: What is Karpenter?
04 Optimize Cost: How does Karpenter optimize cost?
05 Optimize More!: Optimize your costs more with Karpenter

Slide 8

Slide 8 text

HOW TO OPTIMIZE K8S COST?

Slide 9

Slide 9 text

HOW TO OPTIMIZE K8S COST? Autoscaler

Slide 10

Slide 10 text

AUTOSCALERS OPTIMIZE COST: THE BENEFITS OF AUTOSCALERS
● Reduced infrastructure costs by removing unnecessary resources, etc.: resources are allocated based on current needs, which reduces the cost of wasted resources.
● Reduced operational costs by automation: the autoscaler adjusts automatically to changes in resource requirements, reducing work such as monitoring and manual adjustments.
● Reduced opportunity loss due to improved availability (≈ increased sales): scaling without human intervention reduces the risk of human error and delays.

Slide 11

Slide 11 text

AUTOSCALERS OPTIMIZE COST: THE BENEFITS OF AUTOSCALERS
● Reduced infrastructure costs by removing unnecessary resources, etc.: resources are allocated based on current needs, which reduces the cost of wasted resources.
● Reduced operational costs by automation: the autoscaler adjusts automatically to changes in resource requirements, reducing work such as monitoring and manual adjustments.
● Reduced opportunity loss due to improved availability (≈ increased sales): scaling without human intervention reduces the risk of human error and delays.
THE AUTOSCALER REDUCES EACH COST!

Slide 12

Slide 12 text

AGENDA: WHAT YOU CAN LEARN TODAY!
01 Cost Optimization: What is cost optimization?
02 Autoscalers: Benefits of autoscalers in cost optimization
03 Karpenter: What is Karpenter?
04 Optimize Cost: How does Karpenter optimize cost?
05 Optimize More!: Optimize your costs more with Karpenter

Slide 13

Slide 13 text

K8S AUTOSCALERS Quick Recap

Slide 14

Slide 14 text

KUBERNETES AUTOSCALING: THE TYPES OF AUTOSCALERS
● Workload (Pod) autoscaling
● Cluster (Node) autoscaling
[Diagram: workload autoscaling scales the Pods running on a Node; cluster autoscaling adds a new Node.]

Slide 15

Slide 15 text

K8S CLUSTER AUTOSCALER

Slide 16

Slide 16 text

K8S CLUSTER AUTOSCALER Karpenter

Slide 17

Slide 17 text

WHAT IS KARPENTER? ONE OF THE KUBERNETES CLUSTER AUTOSCALERS
01 One of the Kubernetes cluster autoscalers, developed by AWS (2021/11/26: generally available)
   https://karpenter.sh/
   https://github.com/aws/karpenter
02 Supported CSPs are AWS and Azure (2023/11/06: Azure support added)
   https://github.com/Azure/karpenter
03 Core features are maintained by the Kubernetes community (2023/11/23: migrated to SIG Autoscaling)
   https://github.com/kubernetes-sigs/karpenter

Slide 18

Slide 18 text

WHAT IS KARPENTER? ONE OF THE KUBERNETES CLUSTER AUTOSCALERS
● aws/karpenter (v0.33.0): one of the Kubernetes cluster autoscalers, developed by AWS
   https://karpenter.sh/
   https://github.com/aws/karpenter
● Azure/karpenter (v0.2.0): supported CSPs are AWS and Azure
   https://github.com/Azure/karpenter
● kubernetes-sigs/karpenter (v0.33.0): core features are maintained by the Kubernetes community
   https://github.com/kubernetes-sigs/karpenter

Slide 19

Slide 19 text

VS. CLUSTER AUTOSCALER (CAS)

Slide 20

Slide 20 text

WHAT IS DIFFERENT FROM CAS? LESS MANUAL OPERATION, FASTER SCALE (AWS)
Kubernetes Cluster Autoscaler (CAS), with one Node Group (AWS ASG):
1. A Pod is unschedulable.
2. CAS creates a new Node from the Node Group.

Slide 21

Slide 21 text

WHAT IS DIFFERENT FROM CAS? LESS MANUAL OPERATION, FASTER SCALE (AWS)
Kubernetes Cluster Autoscaler (CAS), with two Node Groups (AWS ASG Large and AWS ASG Small):
1. A Pod is unschedulable.
2. CAS creates a new Node from the small Node Group.

Slide 22

Slide 22 text

WHAT IS DIFFERENT FROM CAS? LESS MANUAL OPERATION, FASTER SCALE (AWS)
Karpenter (shown next to CAS with its Node Group / AWS ASG), using a NodePool and AWS Fleet:
1. A Pod is unschedulable.
2. Karpenter creates a new Node.

Slide 23

Slide 23 text

WHAT IS DIFFERENT FROM CAS? LESS MANUAL OPERATION, FASTER SCALE (AWS)
Karpenter (shown next to CAS with its Node Group / AWS ASG), using a NodePool and AWS Fleet:
1. Pods are unschedulable.
2. Karpenter creates a new (medium) Node.

Slide 24

Slide 24 text

WHAT IS DIFFERENT FROM CAS? LESS MANUAL OPERATION, FASTER SCALE (AWS)
Kubernetes Cluster Autoscaler (CAS)
● Nodes are added based on the NodeGroup settings of the CAS.
● Only one node's information per NodeGroup is used to simulate scaling.
● To create a Node of a new instance type, a NodeGroup must be added, and too many NodeGroups slow down scaling.
Karpenter
● Karpenter creates a new node based on the requirements configured in the NodePool.
● One NodePool can hold various types of node information, and Karpenter uses all of them in its scaling simulation.
● On AWS, Karpenter launches an instance faster than CAS because it calls the Fleet API directly without going through the EC2 ASG.

Slide 25

Slide 25 text

WHAT IS DIFFERENT FROM CAS? LESS MANUAL OPERATION, FASTER SCALE (AWS)
Kubernetes Cluster Autoscaler (CAS)
● Nodes are added based on the NodeGroup settings of the CAS.
● Only one node's information per NodeGroup is used to simulate scaling.
● To create a Node of a new instance type, a NodeGroup must be added, and too many NodeGroups slow down scaling.
Karpenter
● Karpenter creates a new node based on the requirements configured in the NodePool.
● One NodePool can hold various types of node information, and Karpenter uses all of them in its scaling simulation.
● On AWS, Karpenter launches an instance faster than CAS because it calls the Fleet API directly without going through the EC2 ASG.
LESS MANUAL WORK, FASTER SCALING!

Slide 26

Slide 26 text

AGENDA: WHAT YOU CAN LEARN TODAY!
01 Cost Optimization: What is cost optimization?
02 Autoscalers: Benefits of autoscalers in cost optimization
03 Karpenter: What is Karpenter?
04 Optimize Cost: How does Karpenter optimize cost?
05 Optimize More!: Optimize your costs more with Karpenter

Slide 27

Slide 27 text

KARPENTER OPTIMIZES COSTS! CREATE THE RIGHT SIZE INSTANCE AND DELETE THE UNNECESSARY INSTANCE
Provisioning (Karpenter, NodePool, AWS Fleet)
1. Group pending Pods and find fitting candidates.
2. Send a maximum of 60 candidates.
3. Create an optimal instance based on the strategy.
● Find a node that fits all pending pods using the first-fit-decreasing (FFD) bin-packing algorithm.
● Create the best instance according to the strategy from a maximum of 60 candidates: the fitting one or larger.
Default strategies: On-Demand: lowest-price; Spot: price-capacity-optimized
Save Costs!
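The provisioning steps on this slide can be sketched roughly as follows. This is a minimal illustration, not Karpenter's scheduler: it assumes a single resource dimension (CPU millicores), and the `InstanceType` catalog, its prices, and the helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str          # hypothetical catalog entry
    cpu_millis: int    # allocatable CPU in millicores
    hourly_usd: float  # illustrative price, not a real quote


def first_fit_decreasing(pod_cpu_millis, capacity):
    """Pack pod CPU requests into bins of `capacity` with first-fit decreasing."""
    bins = []
    for request in sorted(pod_cpu_millis, reverse=True):
        if request > capacity:
            raise ValueError(f"{request}m does not fit a {capacity}m node")
        for b in bins:
            if sum(b) + request <= capacity:
                b.append(request)  # first bin with enough room wins
                break
        else:
            bins.append([request])  # no bin had room: open a new one
    return bins


def candidates_for(pod_cpu_millis, catalog, limit=60):
    """Instance types that hold all pending pods on ONE node, cheapest first."""
    fitting = [
        it for it in catalog
        if max(pod_cpu_millis) <= it.cpu_millis
        and len(first_fit_decreasing(pod_cpu_millis, it.cpu_millis)) == 1
    ]
    # Cap the list at `limit` candidates, mirroring the "max 60" on the slide.
    return sorted(fitting, key=lambda it: it.hourly_usd)[:limit]


if __name__ == "__main__":
    pending = [500, 1500, 250, 1000, 750]  # 4000m in total
    catalog = [
        InstanceType("small", 2000, 0.025),
        InstanceType("medium", 4000, 0.05),
        InstanceType("large", 8000, 0.10),
    ]
    print(len(first_fit_decreasing(pending, 2000)))          # 2
    print([it.name for it in candidates_for(pending, catalog)])  # ['medium', 'large']
```

With one resource dimension the "fits on one node" check reduces to a sum; the FFD helper is kept because it also shows how many nodes a smaller type would need.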

Slide 28

Slide 28 text

KARPENTER OPTIMIZES COSTS! CREATE THE RIGHT SIZE INSTANCE AND DELETE THE UNNECESSARY INSTANCE
Disruption Controllers: Delete Unnecessary or Costly Nodes
● Expiration: delete a Node after a specific period.
● Consolidation: delete a Node if all of its pods can be moved to other Nodes, or replace it if a cheaper instance is available.
● Drift: remove a Node when a NodePool setting changes or the existing instance differs from the desired state.
● Interruption: replace a Node when a Spot instance is interrupted or becomes unhealthy.
Save Costs!

Slide 29

Slide 29 text

KARPENTER OPTIMIZES COSTS! CREATE THE RIGHT SIZE INSTANCE AND DELETE THE UNNECESSARY INSTANCE
Disruption Controllers
● Expiration: delete a Node after a specific period.
● Consolidation: delete a Node if all of its pods can be moved to other Nodes, or replace it if a cheaper instance is available.
● Drift: remove a Node when a NodePool setting changes or the existing instance differs from the desired state.
● Interruption: replace a Node when a Spot instance is interrupted or becomes unhealthy.
[Diagram: expiration. Node A is set to expire after 2h and is deleted after 2 hours.]

Slide 30

Slide 30 text

KARPENTER OPTIMIZES COSTS! CREATE THE RIGHT SIZE INSTANCE AND DELETE THE UNNECESSARY INSTANCE
Disruption Controllers
● Expiration: delete a Node after a specific period.
● Consolidation: delete a Node if all of its pods can be moved to other Nodes, or replace it if a cheaper instance is available.
● Drift: remove a Node when a NodePool setting changes or the existing instance differs from the desired state.
● Interruption: replace a Node when a Spot instance is interrupted or becomes unhealthy.
[Diagram: consolidation. Reallocate a Pod to another Node, or recreate a Node as a cheaper one ($0.05 → $0.02).]
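The consolidation rule can be sketched as a small decision function. This is only an illustration under simplifying assumptions (CPU is the only resource, prices are made up, and the `Node` type and `consolidation_action` helper are hypothetical); it is not how Karpenter implements consolidation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    cpu_millis: int
    hourly_usd: float
    pods: list = field(default_factory=list)  # CPU requests in millicores

    @property
    def free(self):
        return self.cpu_millis - sum(self.pods)


def consolidation_action(node, others, cheaper_types):
    """Return ("delete", None), ("replace", type_name) or ("keep", None).

    cheaper_types maps a hypothetical type name to (cpu_millis, hourly_usd).
    """
    # 1. Delete: every pod fits into the free capacity of the other nodes.
    free = {o.name: o.free for o in others}
    movable = True
    for request in sorted(node.pods, reverse=True):
        target = next((n for n, f in free.items() if f >= request), None)
        if target is None:
            movable = False
            break
        free[target] -= request
    if movable:
        return ("delete", None)

    # 2. Replace: a cheaper instance type can hold all of the node's pods.
    needed = sum(node.pods)
    options = [
        (price, name)
        for name, (cpu, price) in cheaper_types.items()
        if cpu >= needed and price < node.hourly_usd
    ]
    if options:
        return ("replace", min(options)[1])
    return ("keep", None)


if __name__ == "__main__":
    node_a = Node("A", 4000, 0.05, [500])
    node_b = Node("B", 2000, 0.03, [1800])  # only 200m free
    print(consolidation_action(node_a, [node_b], {"small": (2000, 0.02)}))
    # ('replace', 'small')
```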

Slide 31

Slide 31 text

KARPENTER OPTIMIZES COSTS! CREATE THE RIGHT SIZE INSTANCE AND DELETE THE UNNECESSARY INSTANCE
Disruption Controllers
● Expiration: delete a Node after a specific period.
● Consolidation: delete a Node if all of its pods can be moved to other Nodes, or replace it if a cheaper instance is available.
● Drift: remove a Node when a NodePool setting changes or the existing instance differs from the desired state.
● Interruption: replace a Node when a Spot instance is interrupted or becomes unhealthy.
[Diagram: drift. The desired machine image in the NodePool changed from v1 to v2, so Node A (image v1) is recreated as Node A' (image v2).]

Slide 32

Slide 32 text

KARPENTER OPTIMIZES COSTS! CREATE THE RIGHT SIZE INSTANCE AND DELETE THE UNNECESSARY INSTANCE
Disruption Controllers
● Expiration: delete a Node after a specific period.
● Consolidation: delete a Node if all of its pods can be moved to other Nodes, or replace it if a cheaper instance is available.
● Drift: remove a Node when a NodePool setting changes or the existing instance differs from the desired state.
● Interruption: replace a Node when a Spot instance is interrupted or becomes unhealthy.
[Diagram: interruption. Spot interruptions have a 2-minute notice; Spot Node A is recreated as Node A'.]

Slide 33

Slide 33 text

KARPENTER OPTIMIZES COSTS! CREATE THE RIGHT SIZE INSTANCE AND DELETE THE UNNECESSARY INSTANCE
Provisioning
● Find a node that fits all pending pods using the first-fit-decreasing (FFD) bin-packing algorithm.
● Create the best instance according to the strategy from a maximum of 60 candidates: the fitting one or larger.
Default strategies: On-Demand: lowest-price; Spot: price-capacity-optimized
Disruption Controllers
● Expiration: delete a Node after a specific period.
● Consolidation: delete a Node if all of its pods can be moved to other Nodes, or replace it if a cheaper instance is available.
● Drift: remove a Node when a NodePool setting changes or the existing instance differs from the desired state.
● Interruption: replace a Node when a Spot instance is interrupted or becomes unhealthy.

Slide 34

Slide 34 text

AGENDA: WHAT YOU CAN LEARN TODAY!
01 Cost Optimization: What is cost optimization?
02 Autoscalers: Benefits of autoscalers in cost optimization
03 Karpenter: What is Karpenter?
04 Optimize Cost: How does Karpenter optimize cost?
05 Optimize More!: Optimize your costs more with Karpenter

Slide 35

Slide 35 text

EC2 INSTANCE PURCHASE TYPES Quick Recap

Slide 36

Slide 36 text

EC2 INSTANCE PURCHASE TYPES: CHARACTERISTICS OF PURCHASE TYPES AND COST
● On-Demand Instance: the standard price for EC2 instances.
● Spot Instance: the price is determined by supply and demand, and instances may be interrupted, but large discounts are often offered.
● Savings Plans / Reserved Instance: a long-term commitment of 1 or 3 years for On-Demand services, with a large discount and no interruptions.
[Highlighted: On-Demand]

Slide 37

Slide 37 text

EC2 INSTANCE PURCHASE TYPES: CHARACTERISTICS OF PURCHASE TYPES AND COST
● On-Demand Instance: the standard price for EC2 instances.
● Spot Instance: the price is determined by supply and demand, and instances may be interrupted, but large discounts are often offered.
● Savings Plans / Reserved Instance: a long-term commitment of 1 or 3 years for On-Demand services, with a large discount and no interruptions.
[Diagram: Spot pools per instance type, e.g. m5.large, t2.medium, c4.xlarge.]

Slide 38

Slide 38 text

EC2 INSTANCE PURCHASE TYPES: CHARACTERISTICS OF PURCHASE TYPES AND COST
● On-Demand Instance: the standard price for EC2 instances.
● Spot Instance: the price is determined by supply and demand, and instances may be interrupted, but large discounts are often offered.
● Savings Plans / Reserved Instance: a long-term commitment of 1 or 3 years for On-Demand services, with a large discount and no interruptions.
[Highlighted: 1 or 3 years, max 72% off]

Slide 39

Slide 39 text

ARE WE OPTIMIZING COST ENOUGH?

Slide 40

Slide 40 text

ARE WE OPTIMIZING COST ENOUGH? Discount Plan NOT Supported https://github.com/aws/karpenter/issues/5163

Slide 41

Slide 41 text

DISCOUNT PLAN NOT SUPPORTED: KARPENTER USES PUBLIC ON-DEMAND AND SPOT INSTANCE RATES
Karpenter-Supported Purchase Types
• Karpenter only supports On-Demand and Spot as instance purchase types. It does not take discount plans into account.
• Reserved Instances and Savings Plans are popular ways to reduce costs. Unlike Spot, these instances are not terminated by AWS.
Source: Flexera 2023 State of the Cloud Report

Slide 42

Slide 42 text

EXAMPLE: COST IS NOT OPTIMIZED
IN CASE YOU HAVE A SAVINGS PLAN FOR $0.047/HR
If you sign up for the following Savings Plan:
• EC2 Instance Savings Plan, Tokyo Region, t3.large, for 3 years
• Purchase a $0.047 commitment per hour (57% off)

3-year EC2 Savings Plans                       Compute         Instance
Max savings                                    66%             72%
t3.large sample (On-Demand: $0.1088)           $0.062 (43%)    $0.047 (57%)
Locked to a single instance family & region    No              Yes
(t3.large, Tokyo Region, 3 years, No Upfront, Linux, 2023/11/15 21:00)

[Diagram: Savings Plans for K8S Nodes. The $0.047 commitment covers one t3.large for each 1-hour period.]
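The discount percentages in the table follow directly from the hourly rates. A minimal check, using only the prices quoted on this slide (the function name is made up for illustration):

```python
def discount_pct(on_demand_usd, plan_usd):
    """Discount of a plan rate relative to the On-Demand rate, in whole percent."""
    return round((1 - plan_usd / on_demand_usd) * 100)


if __name__ == "__main__":
    on_demand = 0.1088  # t3.large On-Demand rate from the slide
    print(discount_pct(on_demand, 0.062))  # Compute Savings Plan: 43
    print(discount_pct(on_demand, 0.047))  # EC2 Instance Savings Plan: 57
```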

Slide 43

Slide 43 text

EXAMPLE: COST IS NOT OPTIMIZED
IN CASE YOU HAVE A SAVINGS PLAN FOR $0.047/HR
If a c5.large On-Demand instance is created:
• Karpenter compares the c5.large and t3.large On-Demand rates.
• The c5.large On-Demand instance is cheaper than the t3.large one.
• It costs an additional $0.096 per hour on top of the $0.047 commitment.

EC2 Instance Pricing List
Instance Type    Purchase Type    Price (Discount)
c5.large         On-Demand        $0.096
t3.large         On-Demand        $0.1088
t3.large         Savings Plan     $0.047 (57%)
(t3.large, Tokyo Region, 3 years, No Upfront, Linux, 2023/11/15 21:00)

Slide 44

Slide 44 text

EXAMPLE: COST IS NOT OPTIMIZED
IN CASE YOU HAVE A SAVINGS PLAN FOR $0.047/HR
If a t3.large Spot instance is created:
• Karpenter compares the t3.large Spot and On-Demand rates.
• t3.large Spot is cheaper than t3.large On-Demand.
• It costs an additional $0.033 per hour on top of the $0.047 commitment.

EC2 Instance Pricing List
Instance Type    Purchase Type    Price (Discount)
t3.large         On-Demand        $0.1088
t3.large         Spot             $0.033 (73%)
t3.large         Savings Plan     $0.047 (57%)
(t3.large, Tokyo Region, 3 years, No Upfront, Linux, 2023/11/15 21:00)
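The point of the c5.large and Spot examples is that the commitment is billed whether or not it is used. A small sketch of the hourly totals, using the slide's prices; the function and its arguments are hypothetical, and it assumes a single node and a single $0.047 commitment:

```python
def hourly_total(commitment_usd, launch_price_usd, covered_by_plan):
    """Total hourly spend for one node when a Savings Plan is already purchased."""
    if covered_by_plan:
        # The commitment pays for the instance; nothing is billed on top.
        return commitment_usd
    # The commitment goes unused and the new instance is billed in addition.
    return commitment_usd + launch_price_usd


if __name__ == "__main__":
    commitment = 0.047
    print(f"{hourly_total(commitment, 0.1088, True):.3f}")   # t3.large On-Demand, covered: 0.047
    print(f"{hourly_total(commitment, 0.096, False):.3f}")   # c5.large On-Demand: 0.143
    print(f"{hourly_total(commitment, 0.033, False):.3f}")   # t3.large Spot: 0.080
```

So the instance that looks cheapest by public rate ends up more expensive than the t3.large On-Demand instance that the plan already pays for.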

Slide 45

Slide 45 text

EXAMPLE: COST IS NOT OPTIMIZED
IN CASE YOU HAVE A SAVINGS PLAN FOR $0.047/HR
[Chart labels: Unoptimized (Unused Savings Plans) vs. Optimized! 👍]

Slide 46

Slide 46 text

EXAMPLE: COST IS NOT OPTIMIZED
IN CASE YOU HAVE A SAVINGS PLAN FOR $0.047/HR
[Chart labels: Unoptimized (Unused Savings Plans) vs. Optimized! 👍]
Cost Optimization Requires Consideration of Savings Plan Utilization!

Slide 47

Slide 47 text

HOW DO WE IMPROVE SAVINGS PLAN UTILIZATION?

Slide 48

Slide 48 text

HOW CAN WE CALCULATE SAVINGS PLAN UTILIZATION? CHALLENGING PARTS
No API to obtain Savings Plan utilization in real time
● Only daily averages can be obtained from the GetSavingsPlansUtilization API.
● The Cost and Usage Report (AWS CUR) provides hourly utilization, but it is not real-time, as it basically contains the previous day's data.
Utilization cannot be calculated using only instance information in a K8s cluster
● We need to consider other k8s clusters within the account (or shared organization) where the Savings Plan was purchased, as well as instances used outside of k8s.
● Savings Plans are often purchased at different times, making it challenging to manage the status of each plan.

Slide 49

Slide 49 text

HOW CAN WE CALCULATE SAVINGS PLAN UTILIZATION? CHALLENGING PARTS
No API to obtain Savings Plan utilization in real time
● Only daily averages can be obtained from the GetSavingsPlansUtilization API.
● The Cost and Usage Report (AWS CUR) provides hourly utilization, but it is not real-time, as it basically contains the previous day's data.
Utilization cannot be calculated using only instance information in a K8s cluster
● We need to consider other k8s clusters within the account (or shared organization) where the Savings Plan was purchased, as well as instances used outside of k8s.
● Savings Plans are often purchased at different times, making it challenging to manage the status of each plan.
Aggregate Savings Plan Data From CUR ✕ Collect Current Instance Status → Simulate Savings Plan Utilization
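One way to picture the "simulate Savings Plan utilization" step is the sketch below. The slides do not describe the actual calculation, so everything here is an assumption: the commitment is taken as already aggregated from the CUR, `running` is a hypothetical count of On-Demand instances across the whole account, and `plan_rates` lists the Savings Plan rate for each covered instance type.

```python
def simulate_utilization(commitment_usd_per_hour, running, plan_rates):
    """Estimate the fraction of the hourly commitment that current instances use.

    running:    {instance_type: number of On-Demand instances}  (hypothetical input)
    plan_rates: {instance_type: Savings Plan hourly rate} for covered types only
    """
    if commitment_usd_per_hour <= 0:
        raise ValueError("commitment must be positive")
    # Usage that the plan can absorb, valued at the plan's discounted rate.
    eligible = sum(
        plan_rates[itype] * count
        for itype, count in running.items()
        if itype in plan_rates
    )
    # Usage beyond the commitment is billed On-Demand, so cap at 100%.
    return min(eligible, commitment_usd_per_hour) / commitment_usd_per_hour


if __name__ == "__main__":
    rates = {"t3.large": 0.047}
    # Commitment sized for two t3.large, but only one is running.
    print(f"{simulate_utilization(0.094, {'t3.large': 1, 'c5.large': 1}, rates):.2f}")  # 0.50
```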

Slide 50

Slide 50 text

PREDICT PLANS UTILIZATION: FEED BACK PREDICTED RESULTS TO KARPENTER
1. Process the CUR and predict Savings Plan utilization.
2. Send instance information and get the prediction result.
3. Update the NodePool according to the prediction.
[Diagram components: S3 (Cost and Usage Report), Server, and a Kubernetes cluster running an Agent, Karpenter, and NodePools.]
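The talk does not spell out how the NodePool is updated, so the following is only one possible policy, written as a sketch: while the predicted utilization is below a threshold, restrict the NodePool to On-Demand instances of the covered types so that new nodes consume the unused commitment; otherwise allow Spot again. The function name and threshold are hypothetical; the two requirement keys are the well-known labels `karpenter.sh/capacity-type` and `node.kubernetes.io/instance-type`.

```python
def nodepool_requirements(predicted_utilization, covered_types, threshold=1.0):
    """Build a Karpenter-style `requirements` list from a predicted utilization."""
    if predicted_utilization < threshold:
        # Unused commitment remains: steer new nodes onto covered On-Demand types.
        return [
            {"key": "karpenter.sh/capacity-type", "operator": "In",
             "values": ["on-demand"]},
            {"key": "node.kubernetes.io/instance-type", "operator": "In",
             "values": sorted(covered_types)},
        ]
    # Commitment fully used: fall back to the usual price-driven choice.
    return [
        {"key": "karpenter.sh/capacity-type", "operator": "In",
         "values": ["spot", "on-demand"]},
    ]


if __name__ == "__main__":
    for req in nodepool_requirements(0.5, {"t3.large"}):
        print(req["key"], req["values"])
```

An agent would then write the resulting list into the NodePool spec; that step is omitted here because it depends on the cluster client in use.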

Slide 51

Slide 51 text

Demo

Slide 52

Slide 52 text

AGENDA: WHAT YOU CAN LEARN TODAY!
01 Cost Optimization: What is cost optimization?
02 Autoscalers: Benefits of autoscalers in cost optimization
03 Karpenter: What is Karpenter?
04 Optimize Cost: How does Karpenter optimize cost?
05 Optimize More!: Optimize your costs more with Karpenter 🏆

Slide 53

Slide 53 text

KEY TAKEAWAYS: WHAT YOU LEARNED TODAY!
1. FinOps aims for cost optimization that balances operational efficiency and availability, not just cheapness.
2. An autoscaler is a useful tool for Kubernetes cost optimization, and Karpenter is one of the options.
3. In some cases, cost optimization cannot be achieved simply by introducing a tool; for example, instances are not created with a Savings Plan in mind.

Slide 54

Slide 54 text

THANKS! AYA IGARASHI, @LADICLE CLOUDNATIX INC.