WHAT IS COST OPTIMIZATION? IT DOES NOT JUST MEAN SAVING INFRASTRUCTURE EXPENSES
Cost optimization in FinOps does not just mean using the cheapest infrastructure.
• Reduced infrastructure costs by removing unnecessary resources, etc.
• Reduced operational costs by automation: even if the infrastructure cost can be reduced, it is a problem if more manual operations are required to maintain it.
• Reduced opportunity loss due to improved availability (≈ increased sales): simply replacing infrastructure with the cheapest option risks lost business due to service outages or quality deterioration.
STRIKING THE OPTIMAL BALANCE
YOU CAN LEARN TODAY!
01 Cost Optimization: What is cost optimization?
02 Autoscalers: Benefits of autoscalers in cost optimization
03 Karpenter: What is Karpenter?
04 How does Karpenter optimize cost?
05 Optimize More! Optimize your costs more with Karpenter
THE BENEFITS OF AUTOSCALER: AUTOSCALER OPTIMIZES COST
• Reduced infrastructure costs by removing unnecessary resources, etc.: an autoscaler scales resources based on current needs, which helps reduce the cost of wasted resources.
• Reduced operational costs by automation: an autoscaler automatically adjusts to changes in resource requirements, reducing work such as monitoring and manual adjustments.
• Reduced opportunity loss due to improved availability (≈ increased sales): scaling without human intervention reduces the risk of human error and delays.
AUTOSCALER REDUCES EACH COST!
WHAT IS KARPENTER? ONE OF THE KUBERNETES CLUSTER AUTOSCALERS
• One of the k8s cluster autoscalers, developed by AWS: https://karpenter.sh/ https://github.com/aws/karpenter
• Supported CSPs are AWS and Azure: https://github.com/Azure/karpenter
• Core features are maintained by the k8s community: https://github.com/kubernetes-sigs/karpenter
Timeline:
01 2021/11/26: Generally available
02 2023/11/06: Azure support
03 2023/11/23: Migrated to SIG Autoscaling (kubernetes-sigs/karpenter)
[Figure: release timeline with version labels v0.2.0 and v0.33.0]
[Figure: Kubernetes Cluster Autoscaler (CAS) vs. Karpenter]
Kubernetes Cluster Autoscaler (CAS): 1. A Pod is unschedulable. 2. CAS creates a new Node from a Node Group (AWS ASG); in the figure, from the Small group rather than the Large one.
Karpenter: 1. Pods are unschedulable. 2. Karpenter creates a new (medium) Node from the NodePool via AWS Fleet, without going through a Node Group (AWS ASG).
Kubernetes Cluster Autoscaler (CAS)
• Nodes are added based on the NodeGroup settings of CAS.
• Only one node's information per NodeGroup is used to simulate scaling.
• To create a Node of a new instance type, a NodeGroup must be added, and too many NodeGroups slow down scaling.
Karpenter
• Karpenter creates a new node based on the requirements configured in a NodePool.
• One NodePool can hold various types of node information, and Karpenter uses all of them in its scaling simulation.
• In the case of AWS, Karpenter launches an instance faster than CAS because it calls the Fleet API directly without going through the EC2 ASG.
LESS MANUAL WORK, FASTER SCALING!
KARPENTER OPTIMIZES COSTS! Provisioning
[Figure: Karpenter groups pending Pods, consults the NodePool, and requests a Node through AWS Fleet]
1. Group pending Pods and find fitting candidates.
2. Send a maximum of 60 candidates.
3. Create an optimal instance based on the strategy.
• Find a node that fits all pending pods using the bin-packing FFD (first-fit decreasing) algorithm.
• Create the best instance according to the strategy, with a maximum of 60 candidates larger than the fitted one.
Default strategies:
• On-demand: lowest-price
• Spot: price-capacity-optimized
Save Costs!
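The provisioning steps above can be illustrated with a minimal sketch. It is not Karpenter's actual implementation: the instance names, sizes, and prices are hypothetical, resources are reduced to CPU and memory, and the lowest-price choice is made locally here, whereas the slides describe sending the candidates to AWS Fleet, which applies the strategy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str      # hypothetical name, not a real EC2 type
    cpu_m: int     # allocatable CPU in millicores
    mem_mib: int   # allocatable memory in MiB
    price: float   # illustrative USD per hour


def nodes_needed(pods, itype):
    """First-fit decreasing: count nodes of `itype` needed for (cpu_m, mem_mib) pods.

    Returns None when a single pod is larger than the instance type.
    """
    bins = []  # each bin holds [used_cpu_m, used_mem_mib]
    for cpu, mem in sorted(pods, reverse=True):  # largest requests first
        if cpu > itype.cpu_m or mem > itype.mem_mib:
            return None
        for b in bins:
            if b[0] + cpu <= itype.cpu_m and b[1] + mem <= itype.mem_mib:
                b[0] += cpu
                b[1] += mem
                break
        else:  # no existing bin had room, so open a new one
            bins.append([cpu, mem])
    return len(bins)


def pick_on_demand(pods, catalog, max_candidates=60):
    """Keep types that hold all pods on one node, cap the list, take the cheapest."""
    fitting = [t for t in catalog if nodes_needed(pods, t) == 1]
    candidates = sorted(fitting, key=lambda t: t.price)[:max_candidates]
    return candidates[0] if candidates else None


# Hypothetical pending pods and instance catalog.
pending = [(500, 1024), (1000, 2048), (250, 512)]
catalog = [
    InstanceType("tiny", 1000, 2048, 0.02),
    InstanceType("small", 2000, 4096, 0.05),
    InstanceType("large", 4000, 8192, 0.10),
]
choice = pick_on_demand(pending, catalog)
```

With this catalog, "tiny" would need two nodes, so only "small" and "large" become candidates, and the lowest-price rule selects "small".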
KARPENTER OPTIMIZES COSTS! Disruption Controllers
Delete unnecessary or costly Nodes.
• Expiration: delete a Node after a specific period. [Figure: a Node set to expire after 2h is deleted after 2 hours]
• Consolidation: delete a Node if all of its Pods can be moved to other Nodes, or replace it if a cheaper instance is available. [Figure: a Pod is reallocated to another Node, or a $0.05 Node A is recreated as a $0.02 Node A']
• Drift: replace a Node when a NodePool setting changes or the existing instance differs from the desired state. [Figure: the desired machine image changes from v1 to v2, so Node A is recreated as Node A']
• Interruption: replace a Node when a Spot instance is interrupted or becomes unhealthy. Spot interruptions come with a 2-minute notice.
Save Costs!
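The consolidation rule can be sketched as a small decision function. This is a simplified illustration under stated assumptions, not Karpenter's code: resources are a single abstract unit, the `Node` class, the catalog tuples, and all prices are hypothetical, and scheduling constraints such as affinities or disruption budgets are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    capacity: int                              # abstract resource units
    price: float                               # illustrative USD per hour
    pods: list = field(default_factory=list)   # pod sizes in the same units

    @property
    def used(self):
        return sum(self.pods)

    @property
    def free(self):
        return self.capacity - self.used


def can_delete(node, others):
    """True if every pod on `node` fits into free space on the other nodes."""
    free = [o.free for o in others]
    for pod in sorted(node.pods, reverse=True):  # place the largest pods first
        for i, room in enumerate(free):
            if pod <= room:
                free[i] -= pod
                break
        else:
            return False
    return True


def cheaper_replacement(node, catalog):
    """Cheapest (name, capacity, price) entry that holds all pods and costs less."""
    options = [c for c in catalog if c[1] >= node.used and c[2] < node.price]
    return min(options, key=lambda c: c[2], default=None)


def consolidate(node, others, catalog):
    """Return ("delete" | "replace" | "keep", replacement_name_or_None)."""
    if can_delete(node, others):
        return ("delete", None)
    replacement = cheaper_replacement(node, catalog)
    if replacement is not None:
        return ("replace", replacement[0])
    return ("keep", None)


# Hypothetical cluster: Node A costs $0.05 and is mostly empty, Node B is full.
catalog = [("small", 2, 0.02), ("large", 4, 0.05)]
node_a = Node("A", capacity=4, price=0.05, pods=[1])
node_b_full = Node("B", capacity=2, price=0.03, pods=[2])
node_b_roomy = Node("B", capacity=4, price=0.03, pods=[2])
```

With a full Node B the Pod has nowhere to go, so Node A is replaced by the cheaper "small" type; with a roomy Node B the Pod is moved and Node A is deleted.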
KARPENTER OPTIMIZES COSTS! CREATE THE RIGHT SIZE INSTANCE AND DELETE THE UNNECESSARY INSTANCE
• Provisioning: find a node that fits all pending pods using the bin-packing FFD algorithm, then create the best instance according to the strategy from a maximum of 60 candidates (On-demand: lowest-price; Spot: price-capacity-optimized).
• Disruption Controllers: Expiration, Consolidation, Drift, and Interruption delete or replace unnecessary or costly Nodes.
• On-Demand Instance: the standard price for EC2 instances.
• Spot Instance: the price is determined by supply and demand, and instances may be interrupted, but large discounts are often offered. [Figure: separate spot pools per instance type, e.g. m5.large, t2.medium, c4.xlarge]
• Savings Plans / Reserved Instance: a long-term commitment of 1 or 3 years for On-Demand usage, with a large discount (max 72% off) and no interruptions.
NOT SUPPORTED
Karpenter-Supported Purchase Types
• Karpenter only supports On-Demand and Spot as instance purchase types. It does not take discount plans into account.
• Reserved Instances and Savings Plans are popular ways to reduce costs (Flexera 2023 State of the Cloud Report). Unlike Spot instances, these instances are not terminated by AWS.
COST IS NOT OPTIMIZED
If you sign up for the following Savings Plans:

3-year Savings Plans                          Compute        EC2 Instance
Max savings                                   66%            72%
t3.large sample (On-Demand: $0.1088)          $0.062 (43%)   $0.047 (57%)
Locked to a single instance family & region   No             Yes

(t3.large, Tokyo Region, 3 years, No Upfront, Linux, as of 2023/11/15 21:00)

Savings Plans for K8S Nodes
• EC2 Instance Savings Plan, Tokyo Region, t3.large for 3 years
• Purchase a commitment of $0.047 per hour (57% off)
[Figure: the $0.047 commitment covers t3.large usage for each 1-hour period]
COST IS NOT OPTIMIZED
If a c5.large On-Demand instance is created:

EC2 Instance Pricing List
Instance Type   Purchase Type   Price (Discount)
c5.large        On-demand       $0.096
t3.large        On-demand       $0.1088
t3.large        Savings Plan    $0.047 (57%)

(t3.large, Tokyo Region, 3 years, No Upfront, Linux, as of 2023/11/15 21:00)

• Karpenter compares the c5.large and t3.large on-demand rates, and the c5.large on-demand instance is cheaper than t3.large.
• It costs an additional $0.096 per hour on top of the $0.047 commitment, which is charged whether or not it is used.
$0.047/HR EXAMPLE: COST IS NOT OPTIMIZED
If a t3.large Spot instance is created:

EC2 Instance Pricing List
Instance Type   Purchase Type   Price (Discount)
t3.large        On-demand       $0.1088
t3.large        Spot            $0.033 (73%)
t3.large        Savings Plan    $0.047 (57%)

(t3.large, Tokyo Region, 3 years, No Upfront, Linux, as of 2023/11/15 21:00)

• Karpenter compares the t3.large spot and on-demand rates, and t3.large spot is cheaper than t3.large on-demand.
• It costs an additional $0.033 per hour on top of the $0.047 commitment, because Savings Plans do not apply to Spot usage.
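The arithmetic behind these examples can be checked with a short sketch. The prices are the ones quoted on the slides; the function name and the simplification that one t3.large exactly consumes the hourly commitment are assumptions made for illustration.

```python
# Prices quoted on the slides (Tokyo Region, 2023/11/15), in USD per hour.
COMMITMENT = 0.047
ON_DEMAND = {"t3.large": 0.1088, "c5.large": 0.096}
SPOT = {"t3.large": 0.033}


def hourly_bill(instance_type, purchase_type, covered_by_plan):
    """Total hourly bill for one node plus the Savings Plan commitment.

    The commitment is always charged. Usage that the plan does not cover
    (another instance family, or any Spot usage) is billed on top of it.
    """
    if covered_by_plan:
        return COMMITMENT
    rate = SPOT[instance_type] if purchase_type == "spot" else ON_DEMAND[instance_type]
    return round(COMMITMENT + rate, 4)


covered = hourly_bill("t3.large", "on-demand", covered_by_plan=True)
c5_on_demand = hourly_bill("c5.large", "on-demand", covered_by_plan=False)
t3_spot = hourly_bill("t3.large", "spot", covered_by_plan=False)
```

Under these assumptions the covered t3.large costs $0.047 per hour in total, while the "cheaper" c5.large on-demand choice costs $0.143 and the t3.large Spot choice costs $0.080, since the unused commitment is still paid.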
HOW CAN WE CALCULATE SAVINGS PLAN UTILIZATION? CHALLENGING PARTS
No API to obtain Savings Plan utilization in real time
• Only daily averages can be obtained from the GetSavingsPlansUtilization API.
• The Cost and Usage Report (AWS CUR) provides hourly utilization, but it is not real-time, as it is basically the previous day's data.
Utilization cannot be calculated using only instance information in a K8s cluster
• We need to consider other k8s clusters within the account (or shared organization) where the Savings Plan was purchased, as well as instances used outside of k8s.
• Savings Plans are often purchased at different times, making it challenging to manage the status of each plan.
Aggregate Savings Plan Data From CUR ✕ Collect Current Instance Status → Simulate Savings Plan Utilization
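A minimal sketch of the simulation idea follows. It is an assumption about how such a calculation could look, not the speaker's tool: the function name, the input shapes, and the example commitment of $0.188 per hour (four times the $0.047 rate) are hypothetical, and real Savings Plan application order is more involved than a simple sum.

```python
def simulate_utilization(commitment_per_hour, running, sp_rates):
    """Estimate how much of an hourly commitment the running instances consume.

    commitment_per_hour: total committed USD per hour, aggregated from CUR data
    running: {instance_type: count} of currently running on-demand instances,
             across every cluster and account that shares the plan
    sp_rates: {instance_type: Savings Plan rate in USD per hour}; instance
              types missing from this mapping are treated as not eligible
    Returns (utilization between 0 and 1, unused commitment in USD per hour).
    """
    eligible_spend = sum(
        sp_rates[itype] * count
        for itype, count in running.items()
        if itype in sp_rates
    )
    used = min(eligible_spend, commitment_per_hour)
    utilization = used / commitment_per_hour if commitment_per_hour else 0.0
    return utilization, commitment_per_hour - used


# Hypothetical state: three covered t3.large nodes and two uncovered c5.large.
utilization, unused = simulate_utilization(
    commitment_per_hour=0.188,
    running={"t3.large": 3, "c5.large": 2},
    sp_rates={"t3.large": 0.047},
)
```

In this example roughly three quarters of the commitment is in use and about $0.047 per hour is left over, which is enough to cover one more t3.large.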
[Figure: architecture]
Server: 1. Process the Cost and Usage Report (CUR) and predict Savings Plan utilization.
Agent (in the Kubernetes cluster, alongside Karpenter and its NodePools): 2. Send instance information to the server and get the prediction result. 3. Update the NodePools according to the prediction.
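Step 3 could look something like the sketch below, which turns a prediction into NodePool requirements. The slides do not show how the agent actually updates a NodePool, so the function, its inputs, and the decision rule are assumptions; only the requirement keys `karpenter.sh/capacity-type` and `node.kubernetes.io/instance-type` are well-known labels used in Karpenter NodePool requirements.

```python
def nodepool_requirements(unused_commitment, sp_rates, fallback_types):
    """Build a NodePool `requirements` list from a utilization prediction.

    unused_commitment: predicted unused Savings Plan commitment in USD per hour
    sp_rates: {instance_type: Savings Plan rate in USD per hour}
    fallback_types: instance types to allow once the commitment is fully used
    """
    # Instance types whose Savings Plan rate still fits in the unused commitment.
    covered = sorted(t for t, rate in sp_rates.items() if rate <= unused_commitment)
    if covered:
        # Steer new nodes toward on-demand instances that the plan will cover.
        return [
            {"key": "karpenter.sh/capacity-type", "operator": "In",
             "values": ["on-demand"]},
            {"key": "node.kubernetes.io/instance-type", "operator": "In",
             "values": covered},
        ]
    # Commitment fully used: let Karpenter choose freely, including Spot.
    return [
        {"key": "karpenter.sh/capacity-type", "operator": "In",
         "values": ["spot", "on-demand"]},
        {"key": "node.kubernetes.io/instance-type", "operator": "In",
         "values": sorted(fallback_types)},
    ]


with_headroom = nodepool_requirements(0.047, {"t3.large": 0.047}, ["c5.large", "t3.large"])
fully_used = nodepool_requirements(0.0, {"t3.large": 0.047}, ["c5.large", "t3.large"])
```

The design choice here is to restrict the NodePool only while unused commitment remains, so that Karpenter's normal price-based selection resumes as soon as the plan is fully utilized.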
KEY TAKEAWAYS
1. Cost optimization is about striking a balance that covers operational efficiency and availability, not just cheapness.
2. An autoscaler is a useful tool for cost optimization of k8s, and Karpenter is one of the options.
3. In some cases, cost optimization cannot be achieved simply by introducing a tool. For example, Karpenter does not take a Savings Plan into account when creating instances.