Creating a Kubernetes distribution (DevOpsLisbon)

Slide 1

Slide 1 text

Creating a Kubernetes distribution

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

By the end of this presentation you will know one more way to manage Kubernetes application setups in a standardized, repeatable and scalable way.

Slide 4

Slide 4 text

QR-CODE

Slide 5

Slide 5 text

Agenda
● $ whoami
● Acknowledgements
● Vocabulary
● The problem
● Our Kubernetes distro
● Links and references

Slide 6

Slide 6 text

Whoami?
Daniel Requena
● Dad and husband
● Bachelor of Comp. Science / Master's in Comp. Engineering
● 20+ years in IT (Sysadmin/DevOps/SRE/…)
● Occasional speaker and podcast guest
● Currently: SRE @ iFood * 🤔
● Focus: Kubernetes / Service Mesh
● Living in Lisbon: 2.5 years

Slide 7

Slide 7 text

iFood?
● Food tech company, based in Brazil
● Employees: ~5300
● Tech team: ~2000
● Orders: 70,000,000+ per month
● Requests: 250k rps
● Services: 3000+
● Deployments: 300+ per day
● Kubernetes:
  ○ 98%+ of services
  ○ 53+ clusters

Slide 8

Slide 8 text

Acknowledgment
● Rodrigo Watanabe
● Gabriel Tiossi
● João Marques
● Henrique Dalssaso
● Carlos Motta
● Thiago de Francisco (Nullock)
● Thales Lopes
● Henrique Duran

Slide 9

Slide 9 text

Vocabulary
Kubernetes distribution:
● EKS, GKE, AKS, kubeadm, Cluster API, kOps, etc.
Distribution (in this context):
● A way to standardize the setup and customization of packages. A tested set of software packages that, when installed, provides all (or almost all) of the functionality you need.

Slide 10

Slide 10 text

The Problem
Kubernetes project - 2018
● 4 clusters (1 dev, 3 BUs)
  ○ Few apps, few tools
Kubernetes project - 2019
● 11 clusters (1 dev, 10 BUs)
  ○ ~20% of apps, dozens of tools
Diagram: DEV, BU-1, BU-2, BU-3

Slide 11

Slide 11 text

The Problem
● Logs
● Scaling (horizontal and vertical)
● Monitoring + Grafana + Alerting
● Ingress
● CNI
● Security
● Policies
● Chaos
● Consul agent
● External DNS
● Kiam
● Secrets Manager
● Backup
● Auth
● Cost management
● CertManager
● External Controller
● Service Mesh
● OpenTelemetry
● Custom controllers/Operators

Slide 12

Slide 12 text

The Problem (how did we use to set up machinery apps?)
tools
…
01-nginx.sh
02-prometheus.sh
03-grafana.sh
04-certmanager.sh
05-accounts.sh
So many problems:
● Values updates
● Reconciliation
● Same package, multiple installs
● Lack of customization
● Not testable (k8s upgrades and package upgrades)
● Slow (manual)
● Centralized (k8s team use only)
● Scalability
● …

Slide 13

Slide 13 text

The Problem
● BU-1: Sandbox account (Sandbox cluster), Production account (Production cluster)
● BU-2: Sandbox account (Sandbox cluster), Production account (Production cluster)
● BU-3: Sandbox account (Sandbox cluster), Production account (Production cluster)
● BU-4: Sandbox account (Sandbox cluster), Production account (Production cluster)
● …

Slide 14

Slide 14 text

The Problem
tools
…
01-nginx.sh
02-prometheus.sh
03-grafana.sh
04-certmanager.sh
05-accounts.sh

Slide 15

Slide 15 text

The Solution
Our requirements:
● 100% based on Helm
● Standardized
● Simple (preferably 1 command)
● Flexible
● Extensible
● Testable
● Git flow oriented
● Scalable

Slide 16

Slide 16 text

A source of inspiration
Linux Distros!

Slide 17

Slide 17 text

A source of inspiration
Linux distro strengths:
● Package management * (Helm)
● Stable (tests)
● Standardized (interface)
● Life cycle / Releases
● Extensible (3rd party packages)
● Community oriented
Revolves around a “central point”:
● Linux kernel / (Kubernetes)

Slide 18

Slide 18 text

Schematics
Distribution
Default Values
● Version: 1.0, Mem: 1G, Cpu: 100m, Labels: A=1
● Version: 1.3, Mem: 2G, Hpa: min 3
● Version: 2.0, PVC: 10G, NodeSelector: Infra

Slide 19

Slide 19 text

Schematics
Distribution
Default Values
● Version: 1.0, Mem: 1G, Cpu: 100m, Labels: A=1
● Version: 1.3, Mem: 2G, Hpa: min 3
● Version: 2.0, PVC: 10G, NodeSelector: Infra

Slide 20

Slide 20 text

Schematics
Distribution
Default Values
● Version: 1.0, Mem: 1G, Cpu: 100m, Labels: A=1
● Version: 1.3, Mem: 2G, Hpa: min 3
● Version: 2.0, PVC: 10G, NodeSelector: Infra
● PVC: 15G, Mem: 3G
● PVC: 15G, Mem: 3G
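The slide appears to show package defaults (PVC: 10G) being overridden by other values (PVC: 15G, Mem: 3G). A minimal Python sketch of that layering idea follows. It assumes map-by-map merging where the later layer wins and lists are replaced wholesale; the key names and the two layers are hypothetical, and this is not Helmfile's own code.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where values in `override` win over `base`.

    Nested dicts are merged key by key; any other value (including lists)
    is replaced wholesale. Neither input is modified.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical defaults shipped by the distribution for one package.
distro_defaults = {"version": "2.0", "pvc": "10G", "nodeSelector": "Infra"}

# Hypothetical overrides set further down (for example, by one cluster).
cluster_overrides = {"pvc": "15G", "mem": "3G"}

effective = deep_merge(distro_defaults, cluster_overrides)
print(effective)
```

The defaults stay untouched, so the same distribution layer can be reused for every cluster while each cluster only states what differs.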

Slide 21

Slide 21 text

Schematics
Distribution
Default Values
● Version: 2.0, PVC: 10G, NodeSelector: Infra
Core Team Packages
● Version: 1.0, Mem: 1G, Cpu: 100m, Labels: A=1
Sec Team Packages
Ingress Team Packages
…

Slide 22

Slide 22 text

Schematics
Distribution Stable
Default Values
● Version: 2.0, PVC: 10G, NodeSelector: Infra
Core Team Packages
Sec Team Packages
Ingress Team Packages
● Version: 1.0, Mem: 1G, Cpu: 100m, Labels: A=1
…

Slide 23

Slide 23 text

Schematics
Distribution Edge
Default Values
● Version: 3.0, PVC: 10G, NodeSelector: Infra
Core Team Packages
Sec Team Packages
Ingress Team Packages
● Version: 1.5, Mem: 1G, Cpu: 100m, Labels: A=1
…

Slide 24

Slide 24 text

Building the distro
Helmfile to the rescue…
● What is Helmfile?
  ○ TL;DR - A “wrapper” around Helm on steroids, or a chart of charts.
● Some characteristics
  ○ Composable files
  ○ Extended templating - Sprig (including in the VALUES file)
  ○ DOESN’T create a state file of its own.
  ○ Separation between logic and values
  ○ Multiple ways to reference sources (s3, git, local, OCI…)
  ○ Turns almost everything into a Helm release
  ○ Defines dependencies between releases ⚠
● JSON patches applied after manifests are rendered
● Multiple environments
● Secrets backend integration
● Hooks
● Extra metadata / selectors

Slide 25

Slide 25 text

Building the distro
$ cat helmfile.yaml
repositories:
  - name: bitnami
    url: https://charts.bitnami.com/bitnami
  - name: custom
    url: git+https://github.com/reactiveops/polaris@deploy/helm?ref=master
releases:
  - name: external-dns
    namespace: machinery
    chart: bitnami/external-dns
    version: 3.2.6
    values:
      - values/default.yaml
  - name: reactiveops
    chart: custom/reactiveops
    values:
      - image:
          tag: 1.4
      - scheme: {{ env "SCHEME" | default "https" }}
$ helmfile apply

Slide 26

Slide 26 text

Building the distro
$ cat meta-helmfile.yaml
helmfiles:
  - path: git::https://github.com/drequena/grafana-package.git@/helmfile.yaml?ref=main
    values:
      - grafana:
          enabled: true
          resources:
            request:
              cpu: "100m"
  - path: git::https://github.com/drequena/prometheus-package.git@/helmfile.yaml?ref=1.0.1
    values:
      - prometheus:
          enabled: true
          labels:
            - owner: "secteam"
...

Slide 27

Slide 27 text

Building the distro
Layers: Cluster, Distribution, Packages

Slide 28

Slide 28 text

Building the distro (Cluster)
$ cat myclusters/sales.yaml
helmfiles:
  - path: git::https://github.com/drequena/distribution.git@/helmfile.yaml?ref=1.0
    values:
      - prometheus:
          enabled: false
      - grafana:
          url: "graphs.company.net"
  - path: git::https://github.com/drequena/cert-manager-package.git@/helmfile.yaml?ref=1.3
    values:
      - cert-manager:
          resources:
            request:
              cpu: 250m

Slide 29

Slide 29 text

Building the distro (Distro)
$ cat distribution/helmfile.yaml
helmfiles:
  - path: git::https://github.com/drequena/grafana-package.git@/helmfile.yaml?ref=main
    values:
      - {{ .Values | get "grafana" dict | toYaml | indent 6 | trim}}
  - path: git::https://github.com/drequena/prometheus-package.git@/helmfile.yaml?ref=1.0.1
    values:
      - {{ .Values | get "prometheus" dict | toYaml | indent 6 | trim}}
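The template expression `.Values | get "grafana" dict` appears to pick the grafana sub-tree out of the values handed down by the cluster, falling back to an empty dict when the cluster says nothing about that package. A rough Python equivalent of that lookup, with a hypothetical helper name and sample values borrowed from the cluster slide:

```python
def values_for(package: str, cluster_values: dict) -> dict:
    """Return the sub-tree for one package, or an empty dict if it is absent.

    Hypothetical helper: it mirrors the fallback-to-empty behaviour the
    template expression seems to rely on, nothing more.
    """
    return cluster_values.get(package, {})


# Values a cluster file might hand down to the distribution.
cluster_values = {
    "prometheus": {"enabled": False},
    "grafana": {"url": "graphs.company.net"},
}

print(values_for("grafana", cluster_values))
print(values_for("cert-manager", cluster_values))
```

With the empty-dict fallback, a package the cluster never mentions simply receives no overrides and keeps its own defaults.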

Slide 30

Slide 30 text

Building the distro (Package)
$ cat packages/prometheus/helmfile.yaml
repositories:
  - name: prometheus
    url: https://charts.prometheus.org/prometheus
releases:
  - name: prometheus
    condition: prometheus.enabled
    needs:
      - prometheus-operator
    namespace: monitoring
    chart: prometheus/prometheus
    version: 2.49.1
    values:
      - values/default.yaml
      - {{ .Values | get "prometheus" dict | toYaml | indent 6 | trim}}
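The package above combines two ideas: `condition` switches a release on or off, and `needs` states what must be installed first. The sketch below illustrates the ordering idea only, using Python's standard `graphlib`. The release table is hypothetical, and dropping a dependency on a disabled release is an assumption of this sketch, not a statement about how Helmfile itself behaves.

```python
from graphlib import TopologicalSorter

# Hypothetical releases: name -> enabled flag and the releases it needs.
releases = {
    "prometheus-operator": {"enabled": True, "needs": []},
    "prometheus": {"enabled": True, "needs": ["prometheus-operator"]},
    "grafana": {"enabled": False, "needs": ["prometheus"]},
}


def install_order(releases: dict) -> list:
    """Return enabled releases ordered so dependencies come first."""
    enabled = {name: r for name, r in releases.items() if r["enabled"]}
    # Map each release to its enabled dependencies (assumption: a
    # dependency on a disabled release is simply ignored here).
    graph = {
        name: [dep for dep in r["needs"] if dep in enabled]
        for name, r in enabled.items()
    }
    return list(TopologicalSorter(graph).static_order())


print(install_order(releases))
```

A cycle in `needs` would make `static_order()` raise `graphlib.CycleError`, which is a convenient place to fail early in a test.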

Slide 31

Slide 31 text

Building the distro
Putting it all together

Slide 32

Slide 32 text

Cluster workflow
Cluster (sandbox)
commit
values:
  - prometheus:
      resources:
        request:
          mem: 2G
$ helmfile diff
[INFO ] + status:
[INFO ] + prometheus::
[INFO ] + resources:
[INFO ] + request:
[INFO ] - memory: 1G
[INFO ] + memory: 2G
[WARN ] at least one change was identified
$ helmfile apply
[INFO ] + status:
[INFO ] + prometheus::
[INFO ] + resources:
[INFO ] + request:
[INFO ] - memory: 1G
[INFO ] + memory: 2G
[INFO ] New release applied

Slide 33

Slide 33 text

Distro workflow
Distribution (Edge)
commit
helmfiles:
  - path: git::https://…
    values:
      - metricserver:
          custom: "newvalue"
Clusters
$ helmfile diff
[INFO ] + status:
[INFO ] + metricserver::
[INFO ] + labels:
[INFO ] + custom: "newvalue"
[WARN ] at least one change was identified
$ helmfile apply
[INFO ] + status:
[INFO ] + metricserver::
[INFO ] + labels:
[INFO ] + custom: "newvalue"
[INFO ] New release applied

Slide 34

Slide 34 text

Package workflow
Package
commit
releases:
  - name: prometheus
    chart: prometheus/prometheus
    version: 2.38.0
Tag: 1.1.3
Distribution
commit
helmfiles:
  - path: git::https://…?ref=1.1.3
Tag: 2.0.1
Cluster
commit
helmfiles:
  - path: git::https://…?ref=2.0.1
$ helmfile diff
$ helmfile apply

Slide 35

Slide 35 text

Current workflow
Clusters: Sales (cluster.yaml), Logistics (cluster.yaml)
Distribution: helmfile.yaml
Teams: Core, Sec, L7
Packages: nginx-ingress, certmanager, kubecost, prometheus

Slide 36

Slide 36 text

How about tests?
Our distro installs, upgrades and reconciles packages, and “that's all”.
● Helm packages are the real CORE
  ○ One repo per package (with A LOT of governance)
    ■ Divided by: impact level
    ■ vCluster (test against all supported k8s versions)
    ■ Pluto (check k8s API deprecations)
    ■ Terratest (unit tests)
    ■ Golang (integration tests)
    ■ GitLab CI
    ■ Semantic Release
    ■ Renovate Bot
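To make the "check k8s API deprecations" step concrete, here is a small Python sketch of the kind of check such a tool performs on rendered manifests. It is not Pluto's implementation: the table is a hand-written subset of API versions removed from Kubernetes, and the function name and manifest shape (already-parsed dicts) are assumptions made for illustration.

```python
# Illustrative subset of removed APIs: (apiVersion, kind) -> (major, minor)
# of the Kubernetes release that stopped serving them.
REMOVED_IN = {
    ("extensions/v1beta1", "Ingress"): (1, 22),
    ("batch/v1beta1", "CronJob"): (1, 25),
    ("policy/v1beta1", "PodSecurityPolicy"): (1, 25),
}


def removed_apis(manifests: list, target: tuple) -> list:
    """List manifests whose apiVersion is no longer served at `target`.

    `manifests` are parsed documents (dicts); `target` is a
    (major, minor) Kubernetes version such as (1, 25).
    """
    findings = []
    for manifest in manifests:
        key = (manifest.get("apiVersion"), manifest.get("kind"))
        removed = REMOVED_IN.get(key)
        if removed is not None and target >= removed:
            name = manifest.get("metadata", {}).get("name", "<unnamed>")
            findings.append(
                f"{key[1]} {name}: {key[0]} removed in {removed[0]}.{removed[1]}"
            )
    return findings


# Hypothetical rendered output of a package.
rendered = [
    {"apiVersion": "batch/v1beta1", "kind": "CronJob", "metadata": {"name": "backup"}},
    {"apiVersion": "apps/v1", "kind": "Deployment", "metadata": {"name": "web"}},
]

print(removed_apis(rendered, (1, 25)))
print(removed_apis(rendered, (1, 24)))
```

Running the same check once per supported Kubernetes version gives a simple upgrade gate for each package repository.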

Slide 37

Slide 37 text

Conclusions
The Good
● Helm ✅
● Flexible ✅
● Scalable ✅
● Extensible ✅
● Standardized ✅
● The concept is modular and reusable ✅
The Bad and the Ugly
● Dependency checks between split helmfiles ⚠ ✅
● Simple? Tracing values can be hard ⚠
● Low parallelism depending on the repo/distro organization ⚠

Slide 38

Slide 38 text

Links and References
● Blueprint repos: https://github.com/drequena/clusters
● Helm: https://helm.sh/
● Helmfile Docs: https://helmfile.readthedocs.io/en/latest/
● Helmfile git: https://github.com/helmfile/helmfile
● Terratest: https://terratest.gruntwork.io/
● Renovatebot: https://github.com/renovatebot/renovate
● Helm-Unit-tests: https://github.com/anikin-aa/helm-unittest
● vCluster: https://www.vcluster.com/
● Pluto: https://github.com/FairwindsOps/pluto
● Semantic Release: https://github.com/semantic-release/semantic-release

Slide 39

Slide 39 text

Thanks! Questions?

Slide 40

Slide 40 text

Where to find me?
https://www.linkedin.com/in/danielrequena/
https://bolha.us/@requena
https://github.com/drequena/
https://speakerdeck.com/drequena
https://twitter.com/Daniel_Requena