Slide 1

Slide 1 text

Karpenter @ Lightspeed

Slide 2

Slide 2 text

Charles Guertin
Site Reliability Expert @ Lightspeed
Responsible for the reliability of three different Lightspeed Hospitality products

Slide 3

Slide 3 text

Agenda
1. Deploying Karpenter
2. Karpenter Provisioning
3. Node Consolidation
4. Cluster Overprovisioning
5. Q&A

Slide 4

Slide 4 text

Deploying Karpenter

Slide 5

Slide 5 text

Deploying Karpenter

Slide 6

Slide 6 text

Karpenter Provisioning

Slide 7

Slide 7 text

Karpenter Provisioning
● Different provisioners for each environment
● Settings based on per-environment needs, reliability requirements, capacity, and costs
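As an illustration of per-environment provisioners, the sketch below builds one Provisioner manifest per environment from a small settings table. It assumes the karpenter.sh/v1alpha5 Provisioner API; the environment names, capacity types, CPU limits, and the providerRef name are hypothetical values, not the settings used at Lightspeed.

```python
import json

# Hypothetical per-environment settings; the values are illustrative only.
ENVIRONMENTS = {
    "dev": {"capacity_types": ["spot"], "cpu_limit": "200", "consolidation": True},
    "prod": {"capacity_types": ["on-demand"], "cpu_limit": "1000", "consolidation": False},
}


def build_provisioner(env: str) -> dict:
    """Return a karpenter.sh/v1alpha5 Provisioner manifest for one environment."""
    settings = ENVIRONMENTS[env]
    return {
        "apiVersion": "karpenter.sh/v1alpha5",
        "kind": "Provisioner",
        "metadata": {"name": env},
        "spec": {
            # Restrict the capacity type (spot vs. on-demand) per environment.
            "requirements": [
                {
                    "key": "karpenter.sh/capacity-type",
                    "operator": "In",
                    "values": settings["capacity_types"],
                }
            ],
            # Cap the total CPU this provisioner may launch, to bound costs.
            "limits": {"resources": {"cpu": settings["cpu_limit"]}},
            "consolidation": {"enabled": settings["consolidation"]},
            # Assumed name; must reference an existing node template.
            "providerRef": {"name": "default"},
        },
    }


if __name__ == "__main__":
    for env in ENVIRONMENTS:
        print(json.dumps(build_provisioner(env), indent=2))
```

Keeping the differences in one table makes it easy to see which knobs vary between environments (here: capacity type, CPU limit, consolidation) and which stay the same.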

Slide 8

Slide 8 text

Node Consolidation

Slide 9

Slide 9 text

Node Consolidation
● Allows rebalancing of your cluster nodes
  ○ When enabled, Karpenter will optimize capacity so that you don’t have under-utilized nodes
● Cost savings
  ○ Lower costs due to consolidation / packing
  ○ Fewer nodes = less money spent
● Requirements
  ○ Resource requests must be set on workloads so that Karpenter can properly resize the cluster
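To see why resource requests matter for packing, here is a toy model of consolidation. It uses a simple first-fit-decreasing heuristic, which is an assumption chosen for illustration and not Karpenter's actual algorithm; the node size and pod requests are made-up numbers.

```python
def pack_first_fit_decreasing(cpu_requests, node_cpu):
    """Pack pod CPU requests (millicores) onto nodes of size node_cpu.

    Returns a list of nodes, each a list of the requests placed on it.
    """
    nodes = []
    for request in sorted(cpu_requests, reverse=True):
        if request > node_cpu:
            raise ValueError(f"request {request}m exceeds node size {node_cpu}m")
        for node in nodes:
            # Place the pod on the first node with enough free CPU.
            if sum(node) + request <= node_cpu:
                node.append(request)
                break
        else:
            # No existing node fits, so a new node is needed.
            nodes.append([request])
    return nodes


# Six pods spread over four under-utilized 4000m nodes before consolidation.
before = [[1500], [1000, 500], [2000], [500, 500]]
requests = [r for node in before for r in node]
after = pack_first_fit_decreasing(requests, node_cpu=4000)
print(len(before), "nodes before,", len(after), "nodes after")
# prints: 4 nodes before, 2 nodes after
```

Without requests there is nothing to sum per node, so a packer (toy or real) cannot tell whether a pod fits on another node.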

Slide 10

Slide 10 text

Node Consolidation
● What about services that can’t tolerate rescheduling?
  ○ Use the do-not-evict=true annotation (karpenter.sh/do-not-evict: "true" on the pod)
  ○ Karpenter will not rebalance nodes running pods marked with this annotation
● Scheduled node consolidation in the dev environment
  ○ Enable consolidation at night
  ○ Reduce rebalancing in busy clusters during the day, so that devs aren’t affected by it
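One possible way to schedule consolidation is a small job that toggles spec.consolidation.enabled on the Provisioner depending on the time of day. The sketch below only computes the JSON merge patch; the 20:00 to 06:00 window is an assumed value, and how the patch is applied (for example from a CronJob) is left open, since the slides do not describe the mechanism used.

```python
import json
from datetime import datetime, time

# Assumed window: consolidation allowed from 20:00 to 06:00 local time.
NIGHT_START = time(20, 0)
NIGHT_END = time(6, 0)


def consolidation_allowed(now: datetime) -> bool:
    """True when `now` falls inside the overnight window (wraps past midnight)."""
    current = now.time()
    return current >= NIGHT_START or current < NIGHT_END


def consolidation_patch(now: datetime) -> str:
    """JSON merge patch that sets spec.consolidation.enabled on a Provisioner."""
    enabled = consolidation_allowed(now)
    return json.dumps({"spec": {"consolidation": {"enabled": enabled}}})


if __name__ == "__main__":
    print(consolidation_patch(datetime.now()))
```

The resulting string could be passed to something like kubectl patch provisioner <name> --type merge -p '<patch>', run on a schedule at the start and end of the window.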

Slide 11

Slide 11 text

Cluster Overprovisioning

Slide 12

Slide 12 text

Cluster Overprovisioning
● Complementary to Karpenter, deployed as a separate Deployment
  ○ image: registry.k8s.io/pause
  ○ Each replica of this Deployment fills a single node
● Extra empty nodes on stand-by
  ○ These nodes are already running before a Karpenter scale-up is needed, so new workloads can be scheduled on them right away
  ○ This removes the wait for new nodes to be created when workloads scale up
● TL;DR: Faster workload upscaling
  ○ Karpenter upscaling time = 2 minutes
  ○ Overprovisioning upscaling time = 0 minutes
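A minimal sketch of such a placeholder Deployment, built as plain Python dicts. The pause image comes from the slide (no tag is given there, so none is added here). The negative-priority PriorityClass is an assumption: it is the common way to let real workloads preempt placeholder pods, but the slides do not say how eviction is handled. The resource names, replica count, and request sizes are hypothetical and would need to match the node size so that one replica fills one node.

```python
import json


def build_overprovisioning(replicas: int, cpu: str, memory: str) -> list:
    """Return [PriorityClass, Deployment] manifests for stand-by capacity."""
    priority_class = {
        "apiVersion": "scheduling.k8s.io/v1",
        "kind": "PriorityClass",
        "metadata": {"name": "overprovisioning"},
        # Below the default priority of 0, so regular pods can preempt these.
        "value": -1,
        "globalDefault": False,
        "description": "Placeholder pods that real workloads may preempt.",
    }
    labels = {"app": "overprovisioning"}
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "overprovisioning"},
        "spec": {
            "replicas": replicas,  # one stand-by node per replica
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "priorityClassName": "overprovisioning",
                    "containers": [
                        {
                            "name": "pause",
                            "image": "registry.k8s.io/pause",
                            # Requests sized close to a node's allocatable
                            # capacity so each replica occupies a whole node.
                            "resources": {"requests": {"cpu": cpu, "memory": memory}},
                        }
                    ],
                },
            },
        },
    }
    return [priority_class, deployment]


if __name__ == "__main__":
    # Hypothetical sizing: two stand-by nodes of roughly 4 vCPU / 16 GiB.
    for manifest in build_overprovisioning(2, "3500m", "12Gi"):
        print(json.dumps(manifest, indent=2))
```

When a real pod preempts a placeholder, the placeholder goes Pending and Karpenter provisions a replacement node in the background, which restores the stand-by buffer.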

Slide 13

Slide 13 text

Cluster Overprovisioning - how it works
[Diagram: Node 1, Node 2, Node 3, Node 4, plus Stand-by 1 and Stand-by 2]

Slide 14

Slide 14 text

Q&A

Slide 15

Slide 15 text

Thank you :)