
Kubernetes on Spot Instances

How Delivery Hero uses Spot Instances in production. The caveats of running on spots can teach you a great deal about the resiliency of your applications.

Vojtěch Vondra

February 27, 2019

Transcript

  1. Logistics Tech in Delivery Hero: own delivery fleets in 40 countries, delivering over a million orders a week. Workloads across 4 continents in 3 AWS regions. Typically a few hundred Kubernetes nodes running for web and worker workloads.
  2. Logistics Tech in Delivery Hero: Rails and Spring for web workloads, Scala & Akka for real-time apps, Python, R and various solvers for batch tasks. Mostly Amazon RDS for PostgreSQL and Amazon DynamoDB for persistence. Kubernetes deployments currently transitioning from kops to Amazon EKS, usually 1-2 minor versions behind.
  3. Dealing with peaks and spikiness: cloud has been an obvious choice for us. Sample country highlighted: we open the floodgates at 11.30am.
  4. Focusing on big-impact savings first. Where are the largest marginal improvements? → with the biggest cost contributor: EC2.
  5. Why not Reserved Instances? We could only reserve our base load without peaks, or be less elastic. The business is too volatile for capacity predictions. Our workloads change over time unpredictably (memory vs. CPU intensive) … We do use them for Kubernetes masters (at least until we've migrated to EKS).
  6. Refresher on spot fleets: instead of a fixed list price, there is a floating market price at which instance types are available. This price is typically less than 50% of the list price. The catch? 1. AWS can take the instance from you at any time with a 2-minute warning if it's needed elsewhere. 2. Some of your preferred instance types can be sold out.
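
     The floating market price per pool can be checked directly. A minimal sketch with the AWS CLI (the instance type and OS filter below are just example values):

        aws ec2 describe-spot-price-history \
          --instance-types m5.large \
          --product-descriptions "Linux/UNIX" \
          --start-time "$(date -u +%Y-%m-%dT%H:%M:%S)" \
          --query 'SpotPriceHistory[*].[AvailabilityZone,SpotPrice]' \
          --output table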
  7. Refresher on spot fleets: spot instance markets are defined by an AZ and instance type. If you choose 2 instance types in 3 AZs, you are bidding in 6 different instance spot markets (pools). The more pools you specify, the lower the chance of being unable to maintain the fleet's target capacity.
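
     As an illustration of pools, a hedged sketch of a diversified spot fleet request covering 2 instance types across 3 subnets (one per AZ), i.e. 6 pools; the role ARN, AMI, subnet IDs and capacity are placeholders (file: spot-fleet-config.json):

        {
          "IamFleetRole": "arn:aws:iam::123456789012:role/aws-ec2-spot-fleet-tagging-role",
          "AllocationStrategy": "diversified",
          "TargetCapacity": 10,
          "LaunchSpecifications": [
            {"ImageId": "ami-0123456789abcdef0", "InstanceType": "m5.large",
             "SubnetId": "subnet-aaa,subnet-bbb,subnet-ccc"},
            {"ImageId": "ami-0123456789abcdef0", "InstanceType": "m5.xlarge",
             "SubnetId": "subnet-aaa,subnet-bbb,subnet-ccc"}
          ]
        }

        # Submit the request:
        aws ec2 request-spot-fleet --spot-fleet-request-config file://spot-fleet-config.json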
  8. Termination handling: close all connections, finish any long polling, stop in-progress worker jobs, terminate your pods, remove the node from the Load Balancer, re-scale to target capacity. Check for the notice: curl http://169.254.169.254/latest/meta-data/spot/termination-time
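
     For illustration only (the deck's actual solution is the termination notice handler DaemonSet a few slides later), a minimal sketch of the polling loop such a handler runs; NODE_NAME is assumed to be injected into the pod, e.g. via the Downward API:

        #!/usr/bin/env bash
        # The endpoint returns 404 until a termination notice exists,
        # so curl -f fails quietly while the instance is safe.
        while true; do
          if curl -fs http://169.254.169.254/latest/meta-data/spot/termination-time >/dev/null; then
            echo "Spot termination notice received, draining ${NODE_NAME}"
            kubectl drain "${NODE_NAME}" --ignore-daemonsets --delete-local-data --force
            break
          fi
          sleep 5
        done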
  9. Termination handling. - That’s like Chaos Monkey, right? - Yes, but it happens 24/7, not just business hours, and several instances might disappear at the same time. - My CTO’s never going to risk that!
  10. Actual issues arising from the volatility: applications not terminating gracefully → abruptly terminated connections, stuck jobs. Too much of target capacity being collocated on terminated nodes → too many pods of a deployment being affected. New capacity not starting fast enough → a lot of apps starting at the same time can cause CPU starvation.
  11. Case study: Spring Boot behavior on boot. We spent some time making Java pods with Spring Boot boot up more efficiently. They have an ugly pattern of using 100% CPU until all classes are loaded and then idling under load. Some help: -XX:TieredStopAtLevel=1 and removing bytecode-instrumenting APM monitoring.
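
     A minimal sketch of wiring the flag in, assuming the JVM picks it up from JAVA_TOOL_OPTIONS (the jar name is a placeholder):

        # Stop tiered compilation at C1 so startup doesn't burn CPU on C2 compilation.
        export JAVA_TOOL_OPTIONS="-XX:TieredStopAtLevel=1"
        java -jar app.jar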
  12. Spot Termination Notice Handler DaemonSet: runs on all nodes and drains the node immediately upon seeing the notice. Optional Slack notification to give you a log to correlate monitoring noise / job disruptions with terminations. github.com/helm/charts/tree/master/incubator/kube-spot-termination-notice-handler (don’t worry, you’ll find all the links on the last slide)
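
     A hedged sketch of installing that chart with Helm 2 as it looked at the time; the incubator repo URL has since been deprecated, and the slackUrl value name should be verified against the chart's README:

        helm repo add incubator https://kubernetes-charts-incubator.storage.googleapis.com
        helm install incubator/kube-spot-termination-notice-handler \
          --name spot-termination-handler \
          --namespace kube-system \
          --set slackUrl=https://hooks.slack.com/services/REPLACE/ME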
  13. Descheduler: spot instances can stick around for a long time (~1 year, no problem). Pods will pile up on those nodes; Kubernetes won’t reschedule by itself.
  14. Descheduler: prevents pods of the same deployment running on the same node. Targets node CPU/memory utilization → redistributes pods to less-utilized nodes. github.com/kubernetes-incubator/descheduler
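
     A minimal sketch of a descheduler policy (v1alpha1 format) enabling the two behaviors above; the threshold numbers are placeholders, and the file is normally mounted into the descheduler job via a ConfigMap:

        apiVersion: "descheduler/v1alpha1"
        kind: "DeschedulerPolicy"
        strategies:
          "RemoveDuplicates":
            enabled: true
          "LowNodeUtilization":
            enabled: true
            params:
              nodeResourceUtilizationThresholds:
                thresholds:
                  "cpu": 20
                  "memory": 20
                  "pods": 20
                targetThresholds:
                  "cpu": 50
                  "memory": 50
                  "pods": 50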
  15. Auto-scaling strategy. Goal: always have enough capacity to launch new pods. Multiple strategies in Delivery Hero: 1. Scaling the spot fleet based on CPU/RAM reservations, not usage. 2. Overprovisioning using PodPriority. github.com/helm/charts/tree/master/stable/cluster-overprovisioner
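
     A minimal sketch of the PodPriority overprovisioning idea behind that chart, assuming scheduling.k8s.io/v1 is available; names, replica count and resource sizes are placeholders. Low-priority pause pods reserve headroom; real pods preempt them, and the evicted placeholders push the fleet to scale up:

        apiVersion: scheduling.k8s.io/v1
        kind: PriorityClass
        metadata:
          name: overprovisioning
        value: -10
        globalDefault: false
        description: "Placeholder pods that any real workload may preempt"
        ---
        apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: overprovisioning-placeholder
        spec:
          replicas: 3
          selector:
            matchLabels:
              app: overprovisioning-placeholder
          template:
            metadata:
              labels:
                app: overprovisioning-placeholder
            spec:
              priorityClassName: overprovisioning
              containers:
                - name: pause
                  image: k8s.gcr.io/pause:3.1
                  resources:
                    requests:
                      cpu: "1"
                      memory: 1Gi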
  16. Auto-scaling the spot fleet based on custom metrics. [Diagram: a DaemonSet publishes % CPU/RAM reserved per node to AWS CloudWatch; a Spot Fleet autoscaling policy acts on it, with thresholds at 80% and 50% CPU reserved.]
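
     A minimal sketch of what such a DaemonSet publishes, assuming it has already computed the node's reserved-CPU percentage (sum of pod CPU requests divided by allocatable CPU); the namespace, metric and dimension names are assumptions. The spot fleet's scaling policy then acts on this metric, e.g. scaling up above 80% reserved and down below 50%:

        RESERVED_CPU_PERCENT=80   # computed by the DaemonSet; hardcoded here for the example
        aws cloudwatch put-metric-data \
          --namespace "Kubernetes/SpotFleet" \
          --metric-name CPUReservedPercent \
          --dimensions Node="${NODE_NAME}" \
          --value "${RESERVED_CPU_PERCENT}"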
  17. Case study: beyond stateless apps, resiliency with stateful components. Our dispatching algo runs on Akka, a stateful, actor-based framework for distributed and concurrent apps. The volatility of spots forced us to fix very broken cluster formation and split-brain situations.
  18. Take it step by step: move your most stateless, fastest-to-boot pods over first, then continue one by one and monitor for noise. We migrated from on-demand over to spot within 6 months. No need to rush it. [Chart: our current fleet composition]