Slide 1

Scaling Kubernetes for the Bose Cloud Platform
Boston DevOps Meetup, April 24th, 2019

Slide 2

Let’s go explore the Mariana Trench, shall we?

Slide 3

Hi, I’m Myles. @masteinhauser
• I help businesses scale by narrating their strategy and the distributed systems they need to succeed.
• Senior Cloud Engineer @ Bose
• I help build and operate an IoT platform for cloud-connected speakers
• Team of 1 = Gen 1 – Kube 1.5
• Team of 3 = Gen 2 – Kube 1.8
• Team of 6 = Gen 3 – Kube 1.15 (next)

Slide 4

“Everything that can be invented has been invented.”
Charles H. Duell, Commissioner, U.S. Patent Office, 1899

Slide 5

Why does Bose even need a cloud anyway?

Slide 6


Slide 7

The Bose Kubernetes Platform
• Custom install, through and through
  • Terraform
  • Ansible
  • Helm (+ static manifests)
• Custom Kubernetes binaries
  • Kube 1.8.15 + custom CVE backports, Calico v2.6.8
• 3 Kube masters (API server, Controller Manager, Scheduler)
• etcd-main is colocated on the masters (BAD!) - m4.4xlarge
• etcd-events: 6 nodes, over 3 AZs - m4.4xlarge
• Running on AWS EC2
  • us-east-1 (Virginia)

Slide 8


Slide 9

#DEVOPS

Slide 10

Current Cluster Stats
• 465 nodes, across many instance types, each in "placement groups" for workload scheduling and segmentation
• > 6,000 CPU cores
• > 26 TB of RAM
• > 6,000 pods (about 13 pods/node, deeply skewed toward the low end)
• ~ 2,000 Services
• ~ 2,000 Namespaces

Yes, we run databases on Kubernetes. Dozens of them! (See the toleration sketch below.)
https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
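
Since the slide leans on taints and tolerations for that segmentation, here is a minimal, hedged sketch of the pattern. The `workload=databases` taint, the node label, and all names are hypothetical illustrations, not values from the Bose cluster.

```yaml
# Hypothetical sketch: reserve a node group for database workloads.
# First, taint the nodes so ordinary pods are kept off them:
#   kubectl taint nodes db-node-1 workload=databases:NoSchedule
# A database pod then carries a matching toleration, plus a nodeSelector
# so it is also *attracted* to those nodes (a taint only repels others).
apiVersion: v1
kind: Pod
metadata:
  name: example-postgres            # hypothetical name
spec:
  tolerations:
  - key: "workload"
    operator: "Equal"
    value: "databases"
    effect: "NoSchedule"
  nodeSelector:
    workload: databases             # assumes the nodes carry this label as well
  containers:
  - name: postgres
    image: postgres:11
```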

Slide 11

Scaling Kubernetes
• I CANNOT POSSIBLY COVER EVERYTHING. Let's all chat afterwards, instead.
• Surprisingly possible!
• The defaults will get you much farther than you think
• Design with the defaults in mind (more, smaller clusters are "safer")

Slide 12

Technical Solutions won't fix your Organizational Problems
• The worst problem you will have is being successful... on a single cluster.
• We took on strategic technical debt so the business could hit its deliverable (late summer 2018).
• Technical debt + organizational fear is what we missed when we (as engineers) made those decisions.
• Technical debt isn’t bad, but it is important for engineers to understand the greater environment around their daily decision making!
• "Take on mortgages, not payday loans."

Slide 13

Technical Solutions won't fix your Organizational Problems
• If you don't plan for multi-cluster early on (perhaps not Day 1, but Day 30?) you might never get there.
• Updates and upgrades, of anything in the system, should be commonplace and accepted.
• Determine what level of failure is acceptable to the business, because something is always raining in the cloud.

Slide 14

A messy Cluster is a noisy Pager
• Rate-of-churn is the most important factor for the stability of the cluster:
  • New Deployments
  • Pods in CrashLoopBackOff
  • Nodes thrashing due to the OOMKiller or shared disk and network I/O
• Namespaces per-service, cluster per-environment per-region is likely an acceptable trade-off.
• Programmatic RBAC and PodSecurityPolicy management is crucial to securing clusters (this has saved us from multiple CVEs).
  • https://github.com/reactiveops/rbac-manager
• Every app that integrates with the Kube API gets a separate ServiceAccount, and disabling them by default is a big win (see the sketch after this list).
• We have not yet, but are thinking of deploying a proxy in front of the Kube API for rate-limiting, etc.
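
As a hedged illustration of that ServiceAccount hygiene (rbac-manager can generate the bindings; plain RBAC objects are shown here, and every name and namespace below is hypothetical): token automounting is switched off on the namespace's default ServiceAccount, and an API-integrating app gets its own account bound to a narrow Role.

```yaml
# Hypothetical sketch: no API token for ordinary pods, and one narrowly
# scoped ServiceAccount per app that actually talks to the Kube API.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: example-service            # hypothetical namespace
automountServiceAccountToken: false     # pods get no API token unless they opt in
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: example-app                     # one ServiceAccount per integrating app
  namespace: example-service
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: example-app-reader
  namespace: example-service
rules:
- apiGroups: [""]
  resources: ["pods", "endpoints"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: example-app-reader
  namespace: example-service
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: example-app-reader
subjects:
- kind: ServiceAccount
  name: example-app
  namespace: example-service
```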

Slide 15

Let's talk about "Defaults"

At v1.14, Kubernetes supports clusters with up to 5000 nodes. More specifically, we support configurations that meet all of the following criteria:
• No more than 5000 nodes
• No more than 150000 total pods
• No more than 300000 total containers
• No more than 100 pods per node

You cannot do all of these all at once; it's a multi-dimensional constraint optimization problem. (A kubelet-level sketch of the pods-per-node knob follows below.)

*** None of this includes or talks about the Rate-of-Churn problem! ***

https://kubernetes.io/docs/setup/cluster-large/
http://www.tshirtvortex.net/charles-darwin-riding-a-tortoise/
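
The pods-per-node ceiling is the one limit enforced directly by the kubelet. A minimal, hedged sketch, assuming a kubelet recent enough to read a KubeletConfiguration file (the value is illustrative, not Bose's setting; the same knob exists as the --max-pods flag, whose upstream default is 110):

```yaml
# Hypothetical sketch: cap pods per node via the kubelet config file,
# staying inside the documented 100-pods-per-node envelope.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
maxPods: 100
```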

Slide 16

Kubernetes Scalability Continuum

"Kubernetes Scalability: A Multi-Dimensional Analysis" - Maciek Różacki, Google & Shyam Jeedigunta
https://kccna18.sched.com/event/GrXy
https://youtu.be/t_Ww6ELKl4

Slide 17

Let's talk about Bottlenecks

Welcome to the Rodeo. Trust me, you’re going to need all the metrics and dashboards you can get. This is when all that Anomaly Detection comes in handy.

https://applatix.com/making-kubernetes-production-ready-part-2/
https://openai.com/blog/scaling-kubernetes-to-2500-nodes/

Slide 18

Let's talk about Bottlenecks

etcd:
• Writes are full-quorum latency bound: slowest node = (network + disk)
• Gets hit for every write /and/ every read with a full quorum
• Multiple Watches get reduced down to 1 if they have the same arguments
• Higher redundancy explicitly reduces performance!

https://applatix.com/making-kubernetes-production-ready-part-2/
https://openai.com/blog/scaling-kubernetes-to-2500-nodes/

Slide 19

Let's talk about Bottlenecks

kube-apiserver:
• Easy to scale out, especially independently of the etcd servers when they're not colocated!
• --target-ram-mb
  • The biggest lever available to you
• --max-requests-inflight (see the static-manifest sketch below)
• If you have a significant amount of Kube API integration, dedicated API servers for the Kube control plane and for your integrations are very helpful.

kube-controller-manager:
• Oh so many flags that can be tuned.

https://applatix.com/making-kubernetes-production-ready-part-2/
https://openai.com/blog/scaling-kubernetes-to-2500-nodes/
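
A hedged sketch of where those two kube-apiserver flags live in a custom, static-manifest install like the one described earlier. Everything here is illustrative: the values, the etcd endpoint, and the image tag are placeholders, and most required flags (certificates, service CIDR, admission plugins, ...) are omitted.

```yaml
# Hypothetical static-pod sketch for kube-apiserver, showing only the
# two scaling flags called out above.
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: kube-apiserver
    image: k8s.gcr.io/kube-apiserver-amd64:v1.8.15      # illustrative tag
    command:
    - kube-apiserver
    - --etcd-servers=https://etcd.example.internal:2379 # hypothetical endpoint
    - --target-ram-mb=16384        # sizes the watch/deserialization caches to the host
    - --max-requests-inflight=800  # cap on concurrent non-mutating requests (default 400)
```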

Slide 20


Slide 21

Let's talk about Bottlenecks

kube-proxy: "It couldn’t possibly be a DNS problem."
• Transparent load balancer running on every Kube worker to assist in Service Discovery and east-west request handling.
• Helpful until it breaks in dark and mysterious ways (see the conntrack sketch below):
  • conntrack table failures
  • Delayed updates
  • Uncoordinated updates
  • iptables locking with the CNI plugin

https://applatix.com/making-kubernetes-production-ready-part-2/
https://openai.com/blog/scaling-kubernetes-to-2500-nodes/
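
Some of that conntrack pain can at least be made explicit in configuration. A hedged sketch using the upstream KubeProxyConfiguration type; the numbers are placeholders, not Bose's settings.

```yaml
# Hypothetical sketch: pin kube-proxy's conntrack sizing and timeouts
# instead of relying on per-node defaults, and rate-limit iptables rewrites.
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: iptables
conntrack:
  maxPerCore: 65536            # conntrack entries allowed per CPU core
  min: 524288                  # floor for the table size regardless of core count
  tcpEstablishedTimeout: 24h   # how long idle established connections stay tracked
  tcpCloseWaitTimeout: 1h
iptables:
  minSyncPeriod: 10s           # do not rewrite the rules more often than this
  syncPeriod: 30s
```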

Slide 22

At Bose, we’re obsessed with performance on what matters most: the little details that make a big difference and the big details that astonish.

Slide 23

BUTWHATABOUT

Kubernetes overall:
• is largely no different from most traditional application stacks and data planes
• load balancers like HAProxy or Traefik still have their own, non-Kube performance knowledge to apply
• applications within Pods scale and behave largely the same as on a traditional system
  • except for Java… sometimes

Slide 24

Takeaways
• PLEASE, whatever you decide, don't build it yourself in 2018 and beyond.
  • kubeadm has come a drastically long way
  • Use a vendor when you can; GKE is what I recommend personally (even though we don't use it)
  • Shout-out to Sarah @ ReactiveOps
  • Many, many vendors in this space right now
• Kubernetes *will* scale better than you might expect... or want it to.
• Remember to focus on the business differentiators: does operating and scaling Kubernetes actually make sense for your BUSINESS?

Slide 25
