
FinOps! Optimizing Kubernetes Costs with Karpenter

Aya (Igarashi) Ozawa

March 15, 2024

Transcript

  1. WHAT IS COST OPTIMIZATION? IT DOES NOT JUST MEAN SAVING INFRASTRUCTURE EXPENSES

    Cost optimization in FinOps does not just mean using the cheapest infrastructure.
    • Reduced infrastructure costs by removing unnecessary resources, etc.
    • Reduced operational costs by automation: even if the infrastructure cost can be reduced, it will be a problem if more manual operations are required to maintain it.
    • Reduced opportunity loss due to improved availability (≈ increased sales): simply replacing infrastructure with the cheapest option risks lost business due to service outages or quality deterioration.
  2. WHAT IS COST OPTIMIZATION? IT DOES NOT JUST MEAN SAVING INFRASTRUCTURE EXPENSES

    Cost optimization in FinOps does not just mean using the cheapest infrastructure.
    • Reduced infrastructure costs by removing unnecessary resources, etc.
    • Reduced operational costs by automation: even if the infrastructure cost can be reduced, it will be a problem if more manual operations are required to maintain it.
    • Reduced opportunity loss due to improved availability (≈ increased sales): simply replacing infrastructure with the cheapest option risks lost business due to service outages or quality deterioration.
    STRIKING THE OPTIMAL BALANCE
  3. FINOPS! Optimizing Kubernetes Costs With Karpenter

    CLOUD NATIVE DAYS TOKYO 2023
    AYA IGARASHI (@LADICLE)
  4. AGENDA: WHAT YOU CAN LEARN TODAY!

    01 Cost Optimization: What is cost optimization?
    02 Autoscalers: Benefits of autoscalers in cost optimization
    03 Karpenter: What is Karpenter?
    04 Optimize Cost: How does Karpenter optimize cost?
    05 Optimize More!: Optimize your costs more with Karpenter
  5. AUTOSCALERS OPTIMIZE COST: THE BENEFITS OF AUTOSCALERS

    • Reduced infrastructure costs by removing unnecessary resources, etc.: resources are allocated based on current needs, which helps reduce the cost of wasted resources.
    • Reduced operational costs by automation: the autoscaler automatically adjusts to changes in resource requirements, reducing work such as monitoring and manual adjustments.
    • Reduced opportunity loss due to improved availability (≈ increased sales): scaling without human intervention reduces the risk of human error and delays.
  6. AUTOSCALERS OPTIMIZE COST: THE BENEFITS OF AUTOSCALERS

    • Reduced infrastructure costs by removing unnecessary resources, etc.: resources are allocated based on current needs, which helps reduce the cost of wasted resources.
    • Reduced operational costs by automation: the autoscaler automatically adjusts to changes in resource requirements, reducing work such as monitoring and manual adjustments.
    • Reduced opportunity loss due to improved availability (≈ increased sales): scaling without human intervention reduces the risk of human error and delays.
    AUTOSCALER REDUCES EACH COST!
  8. KUBERNETES AUTOSCALING: THE TYPES OF AUTOSCALERS

    • Workload (Pod) autoscaling
    • Cluster (Node) autoscaling
    [Diagram: workload autoscaling scales the Pods on a Node; cluster autoscaling adds a new Node.]
  9. WHAT IS KARPENTER? ONE OF THE KUBERNETES CLUSTER AUTOSCALERS

    • One of the k8s cluster autoscalers, developed by AWS: https://karpenter.sh/ and https://github.com/aws/karpenter
    • Supported CSPs are AWS and Azure: https://github.com/Azure/karpenter
    • Core features are maintained by the k8s community: https://github.com/kubernetes-sigs/karpenter
    Timeline:
    01 2021/11/26: Generally available
    02 2023/11/06: Azure support
    03 2023/11/23: Migrated to SIG Autoscaling
  10. WHAT IS KARPENTER? ONE OF THE KUBERNETES CLUSTER AUTOSCALERS

    • One of the k8s cluster autoscalers, developed by AWS: https://karpenter.sh/ and https://github.com/aws/karpenter
    • Supported CSPs are AWS and Azure: https://github.com/Azure/karpenter
    • Core features are maintained by the k8s community: https://github.com/kubernetes-sigs/karpenter
    [Diagram: aws/karpenter (v0.33.0), Azure/karpenter (v0.2.0), kubernetes-sigs/karpenter (v0.33.0)]
  11. WHAT IS DIFFERENT FROM CAS? LESS MANUAL OPERATION, FASTER SCALE (AWS)

    [Diagram, Kubernetes Cluster Autoscaler (CAS): 1. A Pod is unschedulable. 2. CAS creates a new Node from the Node Group (AWS ASG).]
  12. WHAT IS DIFFERENT FROM CAS? LESS MANUAL OPERATION, FASTER SCALE (AWS)

    [Diagram, Kubernetes Cluster Autoscaler (CAS) with two Node Groups (AWS ASG), Large and Small: 1. A Pod is unschedulable. 2. CAS creates a new Node from the small Node Group.]
  13. WHAT IS DIFFERENT FROM CAS? LESS MANUAL OPERATION, FASTER SCALE (AWS)

    [Diagram, CAS (Node Group, AWS ASG) next to Karpenter (Node Pool, AWS Fleet): 1. A Pod is unschedulable. 2. Karpenter creates a new Node.]
  14. WHAT IS DIFFERENT FROM CAS? LESS MANUAL OPERATION, FASTER SCALE (AWS)

    [Diagram, CAS (Node Group, AWS ASG) next to Karpenter (Node Pool, AWS Fleet): 1. A Pod is unschedulable. 2. Karpenter creates a new (medium) Node.]
  15. WHAT IS DIFFERENT FROM CAS? LESS MANUAL OPERATION, FASTER SCALE (AWS)

    Kubernetes Cluster Autoscaler (CAS)
    • Nodes are added based on the NodeGroup settings of the CAS.
    • Only one node's information per NodeGroup is used to simulate scaling.
    • To create a Node of a new instance type, a NodeGroup must be added, and too many NodeGroups take time to scale.
    Karpenter
    • Karpenter creates a new node based on the requirements configured in a NodePool.
    • One NodePool can hold various types of node information, and Karpenter uses all of them in its scaling simulation.
    • In the case of AWS, Karpenter launches an instance faster than CAS because it calls the Fleet API directly without going through the EC2 ASG.
  16. WHAT IS DIFFERENT FROM CAS? LESS MANUAL OPERATION, FASTER SCALE (AWS)

    Kubernetes Cluster Autoscaler (CAS)
    • Nodes are added based on the NodeGroup settings of the CAS.
    • Only one node's information per NodeGroup is used to simulate scaling.
    • To create a Node of a new instance type, a NodeGroup must be added, and too many NodeGroups take time to scale.
    Karpenter
    • Karpenter creates a new node based on the requirements configured in a NodePool.
    • One NodePool can hold various types of node information, and Karpenter uses all of them in its scaling simulation.
    • In the case of AWS, Karpenter launches an instance faster than CAS because it calls the Fleet API directly without going through the EC2 ASG.
    LESS MANUAL WORK, FASTER SCALING!
  18. KARPENTER OPTIMIZES COSTS! CREATE THE RIGHT SIZE INSTANCE AND DELETE THE UNNECESSARY INSTANCE

    Provisioning
    1. Group pending Pods and find fitting candidates.
    2. Send a maximum of 60 candidates (to AWS Fleet).
    3. Create an optimal instance based on the strategy.
    • Find a node that fits all pending pods using the bin-packing FFD algorithm.
    • Create the best instance according to the strategy, with a maximum of 60 candidates larger than the fitted one.
    Default strategies:
    • On-demand: lowest-price
    • Spot: price-capacity-optimized
    Save Costs!
  19. KARPENTER OPTIMIZES COSTS! CREATE THE RIGHT SIZE INSTANCE AND DELETE THE UNNECESSARY INSTANCE

    Disruption Controllers
    • Expiration: delete a Node after a specific period.
    • Consolidation: delete a Node if all of its Pods can be moved to other Nodes, or replace it if a cheaper instance is available.
    • Drift: remove a Node when a NodePool setting changes or the existing instance differs from the desired state.
    • Interruption: replace a Node when a Spot instance is interrupted or becomes unhealthy.
    Delete unnecessary or costly Nodes. Save Costs!
  20. KARPENTER OPTIMIZES COSTS! CREATE THE RIGHT SIZE INSTANCE AND DELETE THE UNNECESSARY INSTANCE

    Disruption Controllers
    • Expiration: delete a Node after a specific period.
    • Consolidation: delete a Node if all of its Pods can be moved to other Nodes, or replace it if a cheaper instance is available.
    • Drift: remove a Node when a NodePool setting changes or the existing instance differs from the desired state.
    • Interruption: replace a Node when a Spot instance is interrupted or becomes unhealthy.
    [Diagram, Expiration: Node A is set to expire after 2h and is deleted after 2 hours.]
  21. KARPENTER OPTIMIZES COSTS! CREATE THE RIGHT SIZE INSTANCE AND DELETE THE UNNECESSARY INSTANCE

    Disruption Controllers
    • Expiration: delete a Node after a specific period.
    • Consolidation: delete a Node if all of its Pods can be moved to other Nodes, or replace it if a cheaper instance is available.
    • Drift: remove a Node when a NodePool setting changes or the existing instance differs from the desired state.
    • Interruption: replace a Node when a Spot instance is interrupted or becomes unhealthy.
    [Diagram, Consolidation: a Pod is reallocated to Node B, and Node A ($0.05) is recreated as a new node, Node A’ ($0.02).]
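The consolidation rule on this slide can be sketched as a small decision function. This is a simplified illustration, not Karpenter's implementation: the `Node` class, the `consolidation_action` function, the single CPU dimension, and the instance names in `CATALOG` are all hypothetical; only the $0.05 and $0.02 prices come from the slide.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Node:
    name: str
    capacity: int        # CPU units (hypothetical single resource dimension)
    price: float         # hourly price in USD
    pods: List[int]      # CPU request of each Pod on the node

    @property
    def used(self) -> int:
        return sum(self.pods)


# Hypothetical instance catalog: name -> (capacity, hourly price).
CATALOG: Dict[str, Tuple[int, float]] = {"small": (2, 0.02), "large": (4, 0.05)}


def consolidation_action(node: Node, others: List[Node],
                         catalog: Dict[str, Tuple[int, float]]) -> str:
    """Return 'delete', 'replace:<type>', or 'keep' for one node."""
    # 1. Delete: every Pod fits into free capacity on the other nodes.
    free = [n.capacity - n.used for n in others]
    movable = True
    for cpu in sorted(node.pods, reverse=True):
        for i, room in enumerate(free):
            if cpu <= room:
                free[i] -= cpu
                break
        else:
            movable = False
            break
    if movable:
        return "delete"
    # 2. Replace: a cheaper instance type can still hold all of the node's Pods.
    cheaper = [(price, name) for name, (cap, price) in catalog.items()
               if cap >= node.used and price < node.price]
    if cheaper:
        return f"replace:{min(cheaper)[1]}"
    return "keep"


node_a = Node("A", capacity=4, price=0.05, pods=[1])
full_b = Node("B", capacity=2, price=0.02, pods=[2])
roomy_b = Node("B", capacity=2, price=0.02, pods=[1])
print(consolidation_action(node_a, [full_b], CATALOG))   # replace:small
print(consolidation_action(node_a, [roomy_b], CATALOG))  # delete
```

The sketch checks deletion before replacement because removing a node outright saves more than swapping it for a cheaper one.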
  22. KARPENTER OPTIMIZES COSTS! CREATE THE RIGHT SIZE INSTANCE AND DELETE THE UNNECESSARY INSTANCE

    Disruption Controllers
    • Expiration: delete a Node after a specific period.
    • Consolidation: delete a Node if all of its Pods can be moved to other Nodes, or replace it if a cheaper instance is available.
    • Drift: remove a Node when a NodePool setting changes or the existing instance differs from the desired state.
    • Interruption: replace a Node when a Spot instance is interrupted or becomes unhealthy.
    [Diagram, Drift: the desired state of the machine image in the Node Pool was changed from v1 to v2, so Node A (Image v1) is recreated as Node A’ (Image v2).]
  23. KARPENTER OPTIMIZES COSTS! CREATE THE RIGHT SIZE INSTANCE AND DELETE THE UNNECESSARY INSTANCE

    Disruption Controllers
    • Expiration: delete a Node after a specific period.
    • Consolidation: delete a Node if all of its Pods can be moved to other Nodes, or replace it if a cheaper instance is available.
    • Drift: remove a Node when a NodePool setting changes or the existing instance differs from the desired state.
    • Interruption: replace a Node when a Spot instance is interrupted or becomes unhealthy.
    [Diagram, Interruption: Spot interruptions have a 2-minute notice; Node A (Spot) is recreated as Node A’.]
  24. KARPENTER OPTIMIZES COSTS! CREATE THE RIGHT SIZE INSTANCE AND DELETE THE UNNECESSARY INSTANCE

    Provisioning
    • Find a node that fits all pending pods using the bin-packing FFD algorithm.
    • Create the best instance according to the strategy, with a maximum of 60 candidates larger than the fitted one.
    Default strategies:
    • On-demand: lowest-price
    • Spot: price-capacity-optimized
    Disruption Controllers
    • Expiration: delete a Node after a specific period.
    • Consolidation: delete a Node if all of its Pods can be moved to other Nodes, or replace it if a cheaper instance is available.
    • Drift: remove a Node when a NodePool setting changes or the existing instance differs from the desired state.
    • Interruption: replace a Node when a Spot instance is interrupted or becomes unhealthy.
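The bin-packing step named under Provisioning can be illustrated with a minimal first-fit-decreasing (FFD) sketch. This is not Karpenter's code: the function name `first_fit_decreasing`, the single CPU dimension, the single node size, and the sample requests are assumptions made for illustration.

```python
from typing import List


def first_fit_decreasing(pod_cpus: List[int], node_cpu: int) -> List[List[int]]:
    """Pack Pod CPU requests onto equally sized nodes using FFD.

    Hypothetical simplification: one resource dimension and one node size.
    """
    nodes: List[List[int]] = []   # Pods placed on each node
    free: List[int] = []          # remaining CPU on each node
    for cpu in sorted(pod_cpus, reverse=True):    # largest request first
        if cpu > node_cpu:
            raise ValueError(f"a Pod requesting {cpu} CPU fits on no node")
        for i, remaining in enumerate(free):
            if cpu <= remaining:                  # first node with room wins
                nodes[i].append(cpu)
                free[i] -= cpu
                break
        else:                                     # no existing node has room
            nodes.append([cpu])
            free.append(node_cpu - cpu)
    return nodes


packed = first_fit_decreasing([1, 3, 2, 2, 1, 3], node_cpu=4)
print(packed)       # [[3, 1], [3, 1], [2, 2]]
print(len(packed))  # 3
```

Sorting the requests in decreasing order places the large Pods first, so the small ones fill the remaining gaps instead of forcing extra nodes.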
  26. EC2 INSTANCE PURCHASE TYPES: CHARACTERISTICS OF PURCHASE TYPES AND COST

    • On-Demand Instance: the standard price for EC2 instances.
    • Spot Instance: the price is determined by supply and demand, and instances may be interrupted, but large discounts are often offered.
    • Savings Plans / Reserved Instance: a long-term commitment of 1-3 years for On-Demand services, with a large discount and no interruptions.
  27. EC2 INSTANCE PURCHASE TYPES: CHARACTERISTICS OF PURCHASE TYPES AND COST

    • On-Demand Instance: the standard price for EC2 instances.
    • Spot Instance: the price is determined by supply and demand, and instances may be interrupted, but large discounts are often offered.
    • Savings Plans / Reserved Instance: a long-term commitment of 1-3 years for On-Demand services, with a large discount and no interruptions.
    [Diagram: Spot pools per instance type, for example m5.large, t2.medium, c4.xlarge.]
  28. EC2 INSTANCE PURCHASE TYPES: CHARACTERISTICS OF PURCHASE TYPES AND COST

    • On-Demand Instance: the standard price for EC2 instances.
    • Spot Instance: the price is determined by supply and demand, and instances may be interrupted, but large discounts are often offered.
    • Savings Plans / Reserved Instance: a long-term commitment of 1-3 years for On-Demand services, with a large discount and no interruptions.
    [Diagram: 1-3 years, max 72% off.]
  29. KARPENTER USES PUBLIC ON-DEMAND AND SPOT INSTANCE RATES: DISCOUNT PLANS NOT SUPPORTED

    Karpenter-Supported Purchase Types
    • Karpenter only supports On-Demand and Spot as instance purchase types. It does not take discount plans into account.
    • Reserved Instances and Savings Plans are popular ways to reduce costs. Unlike Spot, these instances are not terminated by AWS.
    [Chart source: Flexera 2023 State of the Cloud Report]
  30. EXAMPLE: COST IS NOT OPTIMIZED. IN CASE YOU HAVE A SAVINGS PLAN FOR $0.047/HR

    If you sign up for the following Savings Plan:
    • Instance Savings Plan, Tokyo Region, t3.large for 3 years
    • Purchase a $0.047 commitment per hour (57% off)

    Savings Plans for K8s Nodes (t3.large, Tokyo Region, 3 years, No Upfront, Linux, 2023/11/15 21:00)
    3-year EC2 Savings Plan                          Compute         Instance
    Max savings                                      66%             72%
    t3.large sample (On-Demand: $0.1088)             $0.062 (43%)    $0.047 (57%)
    Locked to a single instance family and region    No              Yes

    [Diagram: the $0.047 commitment covers one t3.large for 1 hour.]
  31. EXAMPLE: COST IS NOT OPTIMIZED. IN CASE YOU HAVE A SAVINGS PLAN FOR $0.047/HR

    If a c5.large On-Demand instance is created:
    • Karpenter compares the c5.large and t3.large On-Demand rates.
    • The c5.large On-Demand instance is cheaper than the t3.large.
    • It costs an additional $0.096 on top of the $0.047 commitment.

    EC2 Instance Pricing List (Tokyo Region, 3 years, No Upfront, Linux, 2023/11/15 21:00)
    Instance Type    Purchase Type    Price (Discount)
    c5.large         On-Demand        $0.096
    t3.large         On-Demand        $0.1088
    t3.large         Savings Plan     $0.047 (57%)

    [Diagram: over 1 hour, Karpenter creates a c5.large (+$0.096) while the $0.047 t3.large commitment goes unused.]
  32. EXAMPLE: COST IS NOT OPTIMIZED. IN CASE YOU HAVE A SAVINGS PLAN FOR $0.047/HR

    If a t3.large Spot instance is created:
    • Karpenter compares the t3.large Spot and On-Demand rates.
    • t3.large Spot is cheaper than t3.large On-Demand.
    • It costs an additional $0.033 on top of the $0.047 commitment.

    EC2 Instance Pricing List (Tokyo Region, 3 years, No Upfront, Linux, 2023/11/15 21:00)
    Instance Type    Purchase Type    Price (Discount)
    t3.large         On-Demand        $0.1088
    t3.large         Spot             $0.033 (73%)
    t3.large         Savings Plan     $0.047 (57%)

    [Diagram: over 1 hour, Karpenter creates a t3.large Spot instance (+$0.033) while the $0.047 t3.large commitment goes unused.]
  33. EXAMPLE: COST IS NOT OPTIMIZED. IN CASE YOU HAVE A SAVINGS PLAN FOR $0.047/HR

    [Diagram: Unoptimized (unused Savings Plans) versus Optimized! 👍]
  34. EXAMPLE: COST IS NOT OPTIMIZED. IN CASE YOU HAVE A SAVINGS PLAN FOR $0.047/HR

    [Diagram: Unoptimized (unused Savings Plans) versus Optimized! 👍]
    Cost optimization requires consideration of Savings Plan utilization!
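The arithmetic behind this example can be checked with a short sketch. The prices are the ones quoted on the slides; the function name `hourly_cost` and the simplification to a single running instance with a plan that covers exactly one t3.large are assumptions for illustration.

```python
# Hourly prices (USD) quoted on the slides: Tokyo Region, Linux, 2023/11/15.
SP_COMMITMENT = 0.047   # t3.large EC2 Instance Savings Plan, billed every hour
ON_DEMAND = {"t3.large": 0.1088, "c5.large": 0.096}
SPOT = {"t3.large": 0.033}


def hourly_cost(instance_type: str, purchase: str) -> float:
    """Commitment plus whatever the one running instance adds on top of it."""
    if purchase == "on-demand" and instance_type.startswith("t3."):
        extra = 0.0                        # usage is absorbed by the commitment
    elif purchase == "spot":
        extra = SPOT[instance_type]        # Savings Plans do not apply to Spot
    else:
        extra = ON_DEMAND[instance_type]   # outside the plan's instance family
    return round(SP_COMMITMENT + extra, 4)


discount = round((1 - SP_COMMITMENT / ON_DEMAND["t3.large"]) * 100)
print(discount)                              # 57
print(hourly_cost("t3.large", "on-demand"))  # 0.047
print(hourly_cost("c5.large", "on-demand"))  # 0.143
print(hourly_cost("t3.large", "spot"))       # 0.08
```

Both choices that look cheapest on public rates end up costing more per hour than the t3.large On-Demand instance the commitment already pays for.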
  35. HOW CAN WE CALCULATE SAVINGS PLAN UTILIZATION? CHALLENGING PARTS

    No API to obtain Savings Plan utilization in real time
    • Only daily averages can be obtained from the GetSavingsPlansUtilization API.
    • The Cost and Usage Report (AWS CUR) provides hourly utilization, but it is not real-time, as it is basically the previous day's data.
    Utilization cannot be calculated using only instance information in a K8s cluster
    • We need to consider other k8s clusters within the account (or shared organization) where the Savings Plan was purchased, as well as instances used outside of k8s.
    • Savings Plans are often purchased at different times, making it challenging to manage the status of each plan.
  36. HOW CAN WE CALCULATE SAVINGS PLAN UTILIZATION? CHALLENGING PARTS

    No API to obtain Savings Plan utilization in real time
    • Only daily averages can be obtained from the GetSavingsPlansUtilization API.
    • The Cost and Usage Report (AWS CUR) provides hourly utilization, but it is not real-time, as it is basically the previous day's data.
    Utilization cannot be calculated using only instance information in a K8s cluster
    • We need to consider other k8s clusters within the account (or shared organization) where the Savings Plan was purchased, as well as instances used outside of k8s.
    • Savings Plans are often purchased at different times, making it challenging to manage the status of each plan.
    Aggregate Savings Plan data from CUR ✕ Collect current instance status → Simulate Savings Plan utilization
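One way to picture the "simulate Savings Plan utilization" step is the sketch below. It is an assumption-laden illustration, not the speaker's implementation: `simulate_utilization`, its inputs, and the idea of a flat per-type Savings Plan rate table are hypothetical, and the commitment is taken as already aggregated from CUR.

```python
from typing import Dict, List


def simulate_utilization(commitment_per_hour: float,
                         running: List[str],
                         sp_rates: Dict[str, float]) -> float:
    """Fraction of the hourly commitment consumed by running On-Demand instances.

    commitment_per_hour: total commitment aggregated from CUR (assumed input).
    running: instance types currently running On-Demand across the account.
    sp_rates: Savings Plan hourly rate for each instance type the plan covers.
    """
    if commitment_per_hour <= 0:
        raise ValueError("commitment_per_hour must be positive")
    eligible = sum(sp_rates[t] for t in running if t in sp_rates)
    return min(eligible, commitment_per_hour) / commitment_per_hour


commitment = 2 * 0.047            # enough for two t3.large, using the slide's rate
rates = {"t3.large": 0.047}
print(simulate_utilization(commitment, ["t3.large", "c5.large"], rates))  # 0.5
print(simulate_utilization(commitment, ["t3.large"] * 3, rates))          # 1.0
```

A result below 1.0 means part of the commitment is being paid for but not used, which is the situation the preceding example slides call unoptimized.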
  37. PREDICT PLAN UTILIZATION: FEED PREDICTED RESULTS BACK TO KARPENTER

    [Diagram: a Server reads the Cost and Usage Report from S3; an Agent runs in the Kubernetes cluster alongside Karpenter and its Node Pools.]
    1. Server: process CUR and predict Savings Plan utilization.
    2. Agent: send instance information and get the prediction result.
    3. Agent: update the NodePool according to the prediction.
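Step 3 of this flow could look roughly like the following. The slides do not show how the agent rewrites the NodePool, so the function `nodepool_requirements`, the threshold rule, and the dictionary shape are hypothetical; the two label keys are the ones Karpenter's AWS provider is understood to use for capacity type and instance family, and should be checked against the Karpenter version in use.

```python
from typing import Dict, List

Requirement = Dict[str, object]


def nodepool_requirements(predicted_utilization: float,
                          plan_family: str) -> List[Requirement]:
    """Pick NodePool requirements from a predicted Savings Plan utilization.

    Hypothetical rule: while commitment is left unused, steer new nodes to
    On-Demand instances of the plan's family; otherwise allow Spot as well.
    """
    if predicted_utilization < 1.0:
        return [
            {"key": "karpenter.sh/capacity-type", "operator": "In",
             "values": ["on-demand"]},
            {"key": "karpenter.k8s.aws/instance-family", "operator": "In",
             "values": [plan_family]},
        ]
    return [
        {"key": "karpenter.sh/capacity-type", "operator": "In",
         "values": ["spot", "on-demand"]},
    ]


for requirement in nodepool_requirements(0.5, "t3"):
    print(requirement["key"], requirement["values"])
```

A real agent would also have to size the restriction to the unused commitment, since pinning every new node to one family gives up the flexibility that makes Karpenter fast and cheap in the first place.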
  38. AGENDA: WHAT YOU CAN LEARN TODAY! 🏆

    01 Cost Optimization: What is cost optimization?
    02 Autoscalers: Benefits of autoscalers in cost optimization
    03 Karpenter: What is Karpenter?
    04 Optimize Cost: How does Karpenter optimize cost?
    05 Optimize More!: Optimize your costs more with Karpenter
  39. KEY TAKEAWAYS: WHAT YOU LEARNED TODAY!

    1. FinOps aims for cost optimization that balances operational efficiency and availability, not just cheapness.
    2. An autoscaler is a useful tool for cost optimization of k8s, and Karpenter is one of the options.
    3. In some cases, cost optimization cannot be achieved simply by introducing a tool; for example, instances are not created with a Savings Plan in mind.