Slide 1

@

Slide 2

Introduction

Slide 3

Me @NU.nl

Slide 4

NU.nl: About
• First Dutch digital news platform
• Unique visitors:
  • 7 mln / month
  • 2.1 mln / day
• Page hits: ~12 mln / day
• API: ~150k rpm (~2,500 rps)

Slide 5

NU.nl: Sanoma
• Part of Sanoma
• NL: NU.nl, Viva, Libelle, Scoopy
• FI: Helsingin Sanomat
• Reaching ~9.8 mln Dutch people / month

Slide 6

IT organization: Teams
• NU.nl teams:
  • Web 1 (application / front-end-ish)
  • Web 2 (application / back-end-ish / infra)
  • Feature 1 & 2 (cross-discipline)
  • iOS
  • Android
• Sanoma teams:
  • DevSupport, Mediatool, Content Aggregation

Slide 7

NU.nl: Growing number of teams
• Increased number of parallel workflows:
  • Testing
  • Releasing
  • Roadmaps
• Knowing about everything is no longer possible
• Aligning 'procedures by agreement' is increasingly hard

Slide 8

Why Kubernetes?

Slide 9

Current infrastructure: AWS accounts & VPCs
[Diagram: VPC "sanoma" (RDS, ElastiCache, ALBs, EC2, CloudFront) hosting API, CMS, WWW, XYZ; VPC "nu-test" (FOO, K8S); VPC "nu-prod" (BAR, K8S)]

Slide 10

Infrastructure provisioning: Terrible (Terraform + Ansible)
• terrible plan
• terrible apply
• terrible ansible

Slide 11

Development workflow: From code to release
• Code
• Automated tests
• Code review
• Manually initiated deploy to test
• Feature test
• Manually initiated deploy to staging
• Exploratory test
• Manually initiated deploy to production

Slide 12

DevOps practices: Solid foundation
• All infra in code
  • Terraform
• Terrible providing mechanisms:
  • Authorization
  • Managing TF state files

Slide 13

DevOps practices: But…
• Setting up additional test environments is slow
• Slow feedback loop:
  • Terraform plan vs. apply (surprise surprise, it didn't work)
  • Ansible (~20 minutes)
  • Vagrant? (but not fully representative of EC2)
• Config drift:
  • Hard to nail down every system package version
  • EC2 instances having different lifecycles

Slide 14

DevOps practices: But… (part 2)
• No scaling infra*
  • Heavily invested in Ansible
• Config & secrets management problematic:
  • GUIs time-consuming
  • No change history, or highly detached from code history
  • No context
  • Not overly secret
*Yes, we know it's 2019

Slide 15

DevOps practices: But… (part 3)
• Current deployment system assumes a fixed set of servers
• Possible alternatives include:
  • ASG rolling updates (can get slow)
  • Pulling current application code on start-up (even slower)
  • Baking an AMI
  • Periodically polling for the application version to be deployed
    • Works quite well…
    • …as long as new code combined with config doesn't break
• So a certain level of orchestration would be needed

Slide 16

Where to start? Everything’s connected

Slide 17

Timing: What direction to move?
• DevOps challenges
• Desire to improve the delivery process, having true artifacts
• Early 2018:
  • Containers are a well-established way of 'packaging' an application
  • Kubernetes getting out of the early-adopter phase
  • NU.nl (re-)launching a new product: NUjij

Slide 18

Improvement layers: A journey or a destination?
1: Containers as artifacts
• Versatile
• Forces us to do certain things right:
  • 12factor
  • Centralized logging
• Easily moved through a pipeline
• Lots of tooling

Slide 19

Improvement layers: A journey or a destination?
2: A flexible platform to deploy and run containerized applications on
• Tackling challenges at platform level instead of per application:
  • Scaling
  • Security updates
  • Observability
  • Deployment & configuration process

Slide 20

Improvement layers: A journey or a destination?
2: A flexible platform to deploy and run containerized applications on
• Kubernetes:
  • Rapidly increasing adoption
  • Short feedback loop
  • Ability to run locally (unlike, say, ECS)
  • Easily stamp out deployments for:
    • Feature testing / demoing
    • E2e tests

Slide 21

Narrowing the scope: Let's not get carried away
The goal is not:
• To chop up all of our applications into nano-/micro-services
  • They're not that monolithic anyway
• To put everything in Kubernetes
  • Managed AWS services where possible (Redis, RDS)
Focus on agility and efficiency of what we change most frequently: code.

Slide 22

Initial cluster setup: The journey begins

Slide 23

Multiple clusters: By criticality
3 AWS accounts, 3 clusters:
• osc-nu-prod: production
• osc-nu-test: test, staging
• osc-nu-dev: proofing infra changes

Slide 24

Kops: Why Kops?
• Manages cluster upgrades:
  • Rolling upgrades
  • Draining nodes
• EKS not yet available
  • Let alone in eu-west-1

Slide 25

Kops: Gluing together cluster setup and kube-system setup

Slide 26

Kops: Upgrading a cluster

Slide 27

Kops: Upgrading a cluster

Slide 28

Kops: Templating Terraform and custom vars
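
The original slide is an image; below is a minimal sketch of what such a template can look like, assuming the standard `kops toolbox template` workflow with made-up variable names (clusterName, kubernetesVersion, networkCIDR):

```yaml
# cluster.tmpl.yaml - hypothetical kops template (variable names are made up).
# Rendered with:  kops toolbox template --template cluster.tmpl.yaml --values values.yaml
# Applied with:   kops update cluster --target=terraform
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  name: "{{ .clusterName }}"
spec:
  kubernetesVersion: "{{ .kubernetesVersion }}"
  cloudProvider: aws
  networkCIDR: "{{ .networkCIDR }}"
```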

Slide 29

Components: kube-system
• Networking:
  • Calico
• EFS:
  • previousnext/k8s-aws-efs
  • No AZ restrictions when re-scheduling pods
  • Creates a new EFS filesystem for each PersistentVolumeClaim (example below)
    • Security & reliability (isolated IOPS budgets)
  • Slow on initial deploy
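
A hedged sketch of such a claim; the storage class name "efs" and the sizes are assumptions for illustration:

```yaml
# Hypothetical PersistentVolumeClaim; the provisioner answers it by creating a
# dedicated EFS filesystem for this claim alone.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
spec:
  storageClassName: efs      # assumed storage class name
  accessModes:
    - ReadWriteMany          # EFS volumes can be mounted across nodes and AZs
  resources:
    requests:
      storage: 5Gi           # EFS is elastic, so the size is largely nominal
```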

Slide 30

Components: kube-system
• AWS IAM Authenticator
• The 'Zalando suite':
  • Skipper (DaemonSet)
  • kube-ingress-aws-controller (Deployment)
• ExternalDNS
  • Configures PowerDNS (& others) based on the ingress host

Slide 31

Components: Zalando Skipper
• Skipper DaemonSet
  • Feature-rich (metrics, shadow traffic, blue/green)
• kube-ingress-aws-controller Deployment
  • https://github.com/zalando-incubator/kube-ingress-aws-controller
  • Sets up & manages the ALB
  • Finds the appropriate ACM certificate
  • Supports multiple ACM certificates per ALB
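
A minimal sketch of how these pieces cooperate around a single Ingress; the hostname and service are made up:

```yaml
# kube-ingress-aws-controller provisions an ALB and attaches a matching ACM
# certificate, ExternalDNS creates the DNS record for the host, and Skipper
# routes the request to the Service inside the cluster.
apiVersion: extensions/v1beta1   # networking.k8s.io/v1beta1 on newer clusters
kind: Ingress
metadata:
  name: www
spec:
  rules:
    - host: www.example.org
      http:
        paths:
          - backend:
              serviceName: www
              servicePort: 80
```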

Slide 32

Components: Autoscaling
• Horizontal Pod Autoscaler
  • Scales the number of pods based on (CPU) utilization (example below)
• Cluster autoscaler
  • Running on master nodes
  • Scales the ASG out when pods are pending
  • Scales the ASG in when nodes are underutilized
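
A plain CPU-based HorizontalPodAutoscaler, as referenced above; the names and numbers are illustrative:

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: www
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: www
  minReplicas: 3
  maxReplicas: 20
  targetCPUUtilizationPercentage: 70   # add pods above this, remove below it
```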

Slide 33

Components: Logging & metrics
• ELK
• Prometheus / Grafana

Slide 34

Jenkins: Build & deploy pipeline

Slide 35

Jenkins: Temporary deployment for running tests
• Deploy to a temporary namespace
  • Jenkins-SU
• Run tests against the deployment
• Deploy to test/staging/production
  • By bumping the image version
  • Production: Jenkins-SU
• Clean up the temporary namespace
  • Jenkins-SU

Slide 36

Jenkins: Jenkins-SU
• Sets up a namespace
  • Adding RBAC for Jenkins
  • Only if the namespace name matches the pattern 'Jenkins-*'
• Deletes a namespace
  • Only if the namespace name matches the pattern 'Jenkins-*'
• Avoids the need for Jenkins to be able to delete every namespace

curl -X POST --user "${JENKINS_SU_AUTH}" --data "{\"name\": \"${K8S_BUILD_NS}\"}" http://su.jenkins-su/ns/
curl -X DELETE --user "${JENKINS_SU_AUTH}" --data "{\"name\": \"${K8S_BUILD_NS}\"}" http://su.jenkins-su/ns/

Slide 37

Kubernetes in action

Slide 38

Kubernetes in action: Questions
• Will it be stable?
• Will we be able to operate it?
• Should we wait for EKS?
• Do we actually want EKS? What will EKS be like?

Slide 39

Learning from failure

Slide 40

1: No memory limits

Slide 41

Incident 1: Accidentally trying to load a 90 GB Elasticsearch index
• Misconfigured ElastAlert (trying to read the entire index)
• No memory limit configured

Slide 42

Incident 1: Accidentally trying to load a 90 GB Elasticsearch index
• Required manual intervention: yes
• Stopping the bleeding:
  • Remove ElastAlert
• Permanent fixes:
  • Don't load the entire index
  • Apply limits (see the sketch below)
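
A minimal sketch of the "apply limits" fix: with a memory limit on the container, a runaway process gets OOMKilled instead of taking the node down. The image and numbers are made up:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: elastalert
spec:
  containers:
    - name: elastalert
      image: example/elastalert:latest   # placeholder image
      resources:
        requests:
          memory: 512Mi    # what the scheduler reserves on a node
        limits:
          memory: 1Gi      # a runaway process now OOMKills this pod only,
                           # instead of starving the whole node
```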

Slide 43

2: No CPU limits

Slide 44

Incident 2: Rapid traffic increase affecting core components
• 2019-03-18 Utrecht shooting
  • 11:11 First article published
  • 11:56 Breaking news push
• CPU-burstable pods driving node CPU to 100%
• Core components (kubelet, ingress) suffering

Slide 45

Incident 2: Rapid traffic increase affecting core components

Slide 46

Incident 2: Rapid traffic increase affecting core components

Slide 47

Incident 2: Rapid traffic increase affecting core components

Slide 48

Incident 2: Rapid traffic increase affecting core components

Slide 49

Incident 2: Rapid traffic increase affecting core components
[Diagram: two nodes, each running pods (0.4 CPU request, 0.8 CPU limit) next to the kubelet and skipper; at 80% CPU utilization everything is fine, at 120% utilization the core components run into problems]

Slide 50

Incident 2: Rapid traffic increase affecting core components
• Required manual intervention: no
• Fixes:
  • Reduce the amount of burstable CPU for pods
  • Increase the resource requests of skipper
  • Mind QoS classes: Guaranteed, Burstable, Best Effort
  • Reserve CPU & memory for the kubelet (sketch below):
    • --kube-reserved
    • --system-reserved
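
What reserving resources for the kubelet and system daemons can look like in a kops cluster spec (the equivalent of those kubelet flags); a hedged sketch, the amounts are illustrative assumptions:

```yaml
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
spec:
  kubelet:
    kubeReserved:          # capacity set aside for the kubelet / runtime
      cpu: "100m"
      memory: "256Mi"
    systemReserved:        # capacity set aside for OS system daemons
      cpu: "100m"
      memory: "256Mi"
```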

Slide 51

3: Memory limits

Slide 52

OOMkiller

Slide 53

Incident 3: Application update increasing memory footprint
• Upgrade included moving from MongoDB 3 to MongoDB 4
• HorizontalPodAutoscaler based on CPU
  • Scaling based on CPU not kicking in
• New, increased memory footprint causing pods to be OOMKilled

Slide 54

Incident 3: Application update increasing memory footprint

Slide 55

Incident 3: Application update increasing memory footprint
• Required manual intervention: yes
• Stopping the bleeding:
  • Increase the memory limit of the Talk pods
• Permanent fixes:
  • Adjust CPU request/limit & HPA thresholds
  • Scale on both CPU and memory (sketch below)
    • Note: not all applications 'give back' memory
  • Set the memory limit higher than the request to prevent a 'snowball effect'
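
A sketch of scaling on both CPU and memory with the autoscaling/v2beta2 API (available in clusters of that era); the HPA then acts on whichever metric demands more replicas. The name "talk" comes from the incident; the thresholds are assumptions:

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: talk
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: talk
  minReplicas: 3
  maxReplicas: 30
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```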

Slide 56

Incident 3: OOMKilled snowball effect
[Diagram, steps 1-4: one pod is OOMKilled; while it restarts, its share of the load pushes the remaining pods over their memory limits, and they get killed in turn]

Slide 57

3: Memory limits!? (obligatory this-is-fine meme)

Slide 58

That's not fine. Is it?
• On the positive side:
  • All are the result of (lack of) resource limit configuration
  • This can be learned
• On the negative side:
  • This needs to be learned
• Note: 'availability bias'

Slide 59

Improving

Slide 60

Automation: Improving the pipeline
• Automating setting the image version is not enough:
  • Rolling out Kubernetes manifests is still a manual task
  • Updating configuration & secrets is still a manual task
• Duplication in manifests between stages:
  • Not easily seen which parts are different
  • Differences intentional or accidental?
• This actually slows us down
• Does git represent the current state?

kubectl -n talk get secrets env -o json | jq -r '.data | map_values(@base64d) | to_entries | .[] | .key + "=\"" + .value + "\""'

Slide 61

Helm: The package manager for Kubernetes
• Charts
  • Configured via values
  • It's like Terraform modules, or Ansible group_vars
• Leveraging community knowledge and efforts
  • E.g. prometheus-operator
  • No need to copy charts; able to reference them
• Helm v3

Slide 62

SOPS (Secrets OPerationS): Secrets management stinks, use some sops!
• By Mozilla
• Manage AWS API access, not keys (example below)
• Versatile:
  • YAML, JSON, ENV, INI, binary (plain text)
  • Not limited to Kubernetes
• Meaningful diffs
• Alternatives considered:
  • Kamus
  • Bitnami SealedSecrets
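
A hypothetical .sops.yaml creation rule: anyone with AWS API access to the KMS key can encrypt and decrypt, so no secret keys need distributing. The path regex and key ARN are placeholders:

```yaml
# .sops.yaml - encrypt any file under a secrets/ directory with this KMS key
creation_rules:
  - path_regex: .*/secrets/.*\.ya?ml$
    kms: arn:aws:kms:eu-west-1:123456789012:key/00000000-0000-0000-0000-000000000000
```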

Slide 63

Helmfile: Wiring it together
• Charts
  • Referenced from online chart sources, or local
• Environments
  • Test, staging, production
  • Referencing values and secrets
• Releases
  • Release name
  • Reference to a chart
  • Values (can be a templated file, using vars and secrets from the environment)

Slide 64

Helmfile: Wiring it together
[Diagram: the Helmfile pulls environment values, SOPS-encrypted secrets, and ENV vars together, and feeds per-release values into releases X, Y, and Z]

Slide 65

Helmfile: Wiring it together
• Advantages:
  • Meaningful git diffs
  • Easily manage multiple releases in a single pipeline, e.g.:
    • Everything related to monitoring and logging
    • kube-system
  • Declarative definition
    • Of what would otherwise be numerous helm args and steps in a CI/CD pipeline

Slide 66

Helmfile: Wiring it together
• Advantages (continued):
  • Ability to pass in ENV vars
    • E.g. build-result image tags
  • Ability to reference complex charts created by the community
  • Charts as a building block allow re-use. Example:
    • Instead of plain YAML you write a chart
    • If it fits the workflow, the chart can be a published artifact
    • The chart can be re-used, e.g. in e2e tests

Slide 67

Helmfile: Wiring it together
• Disadvantages:
  • Two levels of templating:
    • The chart itself (only if writing your own charts)
    • Environment & release values into Helm values
  • Template error messages not overly clear
    • Or even misleading
    • At least it breaks

Slide 68

Helmfile: Example
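
The example slides are screenshots; below is a hypothetical helmfile.yaml shaped like the structure described above. Names, paths, and the chart version are made up:

```yaml
# helmfile.yaml - environments wire values and SOPS secrets into releases
environments:
  test:
    values:
      - environments/test/values.yaml
    secrets:
      - environments/test/secrets.yaml        # SOPS-encrypted
  production:
    values:
      - environments/production/values.yaml
    secrets:
      - environments/production/secrets.yaml

releases:
  - name: www
    namespace: www
    chart: ./charts/www                       # local chart
    values:
      - www/values.yaml.gotmpl                # templated with env values/secrets
  - name: prometheus-operator
    namespace: monitor
    chart: stable/prometheus-operator         # community chart, referenced not copied
    version: "6.11.0"                         # placeholder version
```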

Slide 69

Helmfile: Example

Slide 70

Helmfile: Example

Slide 71

Helmfile: Jenkins

Slide 72

Slide 73

Helmfile: Final words. But Tiller?
• Helm as a templating engine
• Option: using Helm 2 'Tillerless'
  • Tiller outside of the cluster, not bypassing RBAC
• Start using Helm as a package manager when Helm 3 settles down:
  • Easy removal of temporary per-feature deploys
  • Diffs

Slide 74

Challenge: Auto-scaling

Slide 75

scale fast… scale far…

Slide 76

Auto-scaling: Breaking news push

Slide 77

Auto-scaling: Types of scaling
• Reactive
  • Breaking news
  • K8S cluster-autoscaler
    • Can't schedule a pod? Add nodes.
• Predictive
  • Ticket sale start
  • Black Friday

Slide 78

Auto-scaling: Types of scaling
• From within the cluster
  • K8S cluster-autoscaler
• From outside of the cluster
  • ASG scaling policies

Slide 79

Auto-scaling: Scaling speed
[Chart: node count over time; scaling is triggered at 70% utilization, and each step lags by the node spin-up duration]

Slide 80

Auto-scaling: Times 5 within 5 minutes?

Slide 81

Cluster-autoscaler: Bag of tricks
• Mix predictive and reactive:
  • Add ASG instances without telling cluster-autoscaler
  • Traffic is expected to have arrived by the time cluster-autoscaler starts to scale in, leaving plenty of resources as needed
• Pause pods (sketch below):
  • Lower-priority pods that can safely be evicted
  • Effectively 'creating headroom' in the cluster
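
A sketch of the pause-pod trick as commonly implemented alongside the cluster-autoscaler: a low-priority deployment reserves capacity; when real pods need room, the scheduler evicts these, and the now-pending pause pods make the autoscaler add nodes. Names, sizes, and counts are assumptions:

```yaml
apiVersion: scheduling.k8s.io/v1      # v1beta1 on older clusters
kind: PriorityClass
metadata:
  name: overprovisioning
value: -1                             # below every real workload
globalDefault: false
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning
spec:
  replicas: 5
  selector:
    matchLabels:
      app: overprovisioning
  template:
    metadata:
      labels:
        app: overprovisioning
    spec:
      priorityClassName: overprovisioning
      containers:
        - name: pause
          image: k8s.gcr.io/pause:3.1
          resources:
            requests:
              cpu: "1"                # the headroom each pause pod reserves
              memory: 1Gi
```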

Slide 82

Considerations: When engaging 'ludicrous mode'™
Can the control plane handle the scale?
• Kops:
  • Size master nodes for the max. cluster size
  • Overhead cost
• EKS:
  • What's behind the abstraction?
  • ELB 503s exist after all
• Plan: proofs of concept

Slide 83

Pending: Not the pods…

Slide 84

Consider EKS: Managed control plane

EKS:
• Managed control plane
• Easier: EKS IAM roles for pods
  • Launched 2019-09-04 (yesterday)*
• Probably cheaper (2/3 of 3x m4.large)

Kops:
• Total control over setup
• Smooth rolling upgrade process
• No VPC CNI pod density limitations

* https://aws.amazon.com/blogs/opensource/introducing-fine-grained-iam-roles-service-accounts/

Slide 85

EKS IAM roles for pods: Also possible on DIY clusters, officially launched yesterday
• OIDC federated access (OpenID Connect identity provider)
• Assume role via the Security Token Service (STS)
• Projected service account tokens (JWT) in the pod
• STS can validate JWT tokens against the OIDC provider
• Boils down to:
  • Enable/set up prerequisites in the cluster
  • Add a ServiceAccount with an IAM role annotation to the pod (sketch below)
  • Use a recent AWS SDK
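
A minimal sketch of the ServiceAccount wiring described above; the role ARN and image are placeholders:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: www
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/www-pod-role
---
apiVersion: v1
kind: Pod
metadata:
  name: www
spec:
  serviceAccountName: www    # a recent AWS SDK inside picks up the projected JWT
  containers:
    - name: www
      image: example/www:latest
```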

Slide 86

Multiple clusters per AWS account: Don't lock ourselves in a corner
[Diagram: per-cluster api.<cluster> endpoints spread over Route53 zone 1 and zone 2, delegated via NS records]

Slide 87

CI/CD to a separate cluster: Similar flows
• No more taints and tolerations
• Similar authorization mechanism for all deploy targets
  • Possibly IAM
• No need for Jenkins-SU
• Clusters should be cattle anyway

Slide 88

Pipelines: GitOps
• Manage namespaces via the pipeline:
  • kube-system
  • monitor
• Creation of application namespaces including RBAC (sketch below)
• Helmfile
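
A hypothetical example of a namespace-plus-RBAC manifest such a pipeline could apply; the names are illustrative, and the use of the built-in `edit` ClusterRole is an assumption:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: www
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: deployers
  namespace: www
subjects:
  - kind: User
    name: jenkins                    # as mapped by aws-iam-authenticator
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: edit                         # built-in namespaced edit rights
  apiGroup: rbac.authorization.k8s.io
```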

Slide 89

System applications: Small improvements
• prometheus-operator:
  • PrometheusRule resource type
  • Default dashboards
• EFS:
  • https://github.com/previousnext/k8s-aws-efs
    • Current. Works well, but not a lot of active development (2 contributors, 46 stars)
  • https://github.com/kubernetes-incubator/external-storage
    • De facto EFS provisioner (146 contributors, 1630 stars)
    • Bonus: no more time-consuming initial volume set-up

Slide 90

Expand: Increase return on investment
• Add more applications
• Facilitate parallel testing & development workflows:
  • Feature testing
  • Mobile app development
  • E2e tests

Slide 91

Links: Further reading
• Scaling & spot instances: https://itnext.io/the-definitive-guide-to-running-ec2-spot-instances-as-kubernetes-worker-nodes-68ef2095e767
• EKS: https://medium.com/glia-tech/productionproofing-eks-ed52951ffd6c
• QoS: https://www.replex.io/blog/everything-you-need-to-know-about-kubernetes-quality-of-service-qos-classes
• Failure stories: https://k8s.af/

Slide 92

Summary

Slide 93

• Know your limits
• Automate all the things
• Everything code
• Kubernetes is a journey, not a destination
• All should be cattle. No pets allowed!

Slide 94

?