Slide 1

Slide 1 text

Learning From 2 Years Of Kubernetes In Production

Slide 2

Slide 2 text

Learning From 2 3 Years Of Kubernetes In Production

Slide 3

Slide 3 text

I’m Vaidik
Tech Consultant, Fractional CTO/VPE
Past Life:
● 6.5 Years at BlinkIt (formerly Grofers)
● 2 Years at Wingify (VWO.com)

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

You build it, you run it

Slide 6

Slide 6 text

Then came Kubernetes

Slide 7

Slide 7 text

State of Kubernetes at Grofers - 2022
PRODUCTION
● 90% of targeted workload migrated to Kubernetes
● Stateful services and 10% of targeted workload will not be migrated
DEVELOPMENT
● Development in the cloud
● Kubernetes as extension of laptop for development
● On-demand dev environments under 10 minutes

Slide 8

Slide 8 text

For Developers
TOOLING
● Ansible-based tooling for stateful workloads in production
● Kubernetes for everything else in production and non-production
YOU BUILD IT, YOU RUN IT
● Developers get better abstractions that are provided by platform teams
● Platform teams work towards reducing ops overhead at scale. Things like cost, security and reliability become “less” of a concern for developers.
● More ownership

Slide 9

Slide 9 text

What it took us to get here
● Time taken: 2 years
● Infrastructure team: 9 engineers
● AWS spend: $500K
● Product engineering teams: ~45 dev months

Slide 10

Slide 10 text

Early 2018: Illusion of Agility

Slide 11

Slide 11 text

Proliferation of Microservices

Slide 12

Slide 12 text

Distributed Monolith

Slide 13

Slide 13 text

Lesson: A DevOps strategy requires good to ensure accountability so that teams can truly benefit from autonomy.

Slide 14

Slide 14 text

Developers were...
● Unhappy about the drag caused by poor development experience
● Releasing software that was not tested enough
● Firefighting because of the high number of bugs and incidents in production
● Stressed because of sleepless nights and busy weekends

Slide 15

Slide 15 text

March 2018: Project ShipIt

Slide 16

Slide 16 text

Journey of Project ShipIt
● FEB 2018: Started with Docker and docker-compose
● MAR 2018: Ran first set of end-to-end tests on a fully orchestrated container-based environment
● JUN 2018: Still optimizing docker-compose and our orchestration tooling
● AUG 2018: Frustrated with dev/prod disparity
● SEP 2018: Kubernetes decided as the future
● JAN 2019: Kubernetes in production with a few services to prove scale
● MAY 2019: We were running tests

Slide 17

Slide 17 text

What are your reasons?
● More agility - consistent developer and DevOps experience, internal DevOps platforms, CI/CD, abstractions for declarative infrastructure management.
● Better reliability - better scalability, features for resilience, primitives for streamlined operations.
● Cost - reduced hosting cost and support cost, but most likely a high migration cost.
● Portability - deploy across varied environments (multi cloud, hybrid, etc.).

Slide 18

Slide 18 text

Is Kubernetes the only way to achieve all of this?

Slide 19

Slide 19 text

Building more incrementally vs rebuilding everything

Slide 20

Slide 20 text

Was migrating to Kubernetes worth the time and money we spent?

Slide 21

Slide 21 text

Was migrating to Kubernetes worth the time and money we spent? Was the process of adopting Kubernetes right for us?

Slide 22

Slide 22 text

An Alternate Way
SHORT TERM TRACK
● Keep using EC2 VMs, Ansible and Consul
● Strengthen service discovery with Consul
● Speed up Ansible further
● Spin up new EC2 VMs
LONG TERM TRACK
● Build a solid Kubernetes platform for organic adoption
● Migrate complex infrastructure first to streamline operation
● It’s fine to take more time
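The short-term track leans on Consul for service discovery. As a rough illustration of what that looks like from a client's side, the sketch below builds the URL for Consul's health endpoint (`/v1/health/service/<name>` with the `passing` filter) and extracts address and port pairs from a response body. The `cart` service name, the IP addresses and the `localhost:8500` agent address are made-up placeholders, and the sample payload is trimmed to the fields used here. This is not the tooling used at Grofers.

```python
import json
from urllib.parse import quote


def health_url(base_url: str, service: str) -> str:
    """Build the Consul health endpoint URL that lists only passing instances."""
    return f"{base_url.rstrip('/')}/v1/health/service/{quote(service)}?passing"


def healthy_endpoints(payload: str) -> list[tuple[str, int]]:
    """Extract (address, port) pairs from a /v1/health/service response body."""
    endpoints = []
    for entry in json.loads(payload):
        service = entry["Service"]
        # Service.Address may be empty; fall back to the node's address then.
        address = service.get("Address") or entry["Node"]["Address"]
        endpoints.append((address, service["Port"]))
    return endpoints


# Hypothetical response for a made-up "cart" service (assumption, not real data).
SAMPLE = json.dumps([
    {"Node": {"Address": "10.0.1.12"}, "Service": {"Address": "", "Port": 8080}},
    {"Node": {"Address": "10.0.1.13"}, "Service": {"Address": "10.0.2.7", "Port": 8081}},
])

if __name__ == "__main__":
    print(health_url("http://localhost:8500", "cart"))
    print(healthy_endpoints(SAMPLE))
```

In a real setup the payload would come from an HTTP GET against a Consul agent. Parsing is kept separate from fetching so the logic can be tested without a running cluster.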

Slide 23

Slide 23 text

Do you really need Kubernetes and if yes, do you need it now?

Slide 24

Slide 24 text

Kubernetes is a new paradigm. Lift-and-shift is not a good idea.

Slide 25

Slide 25 text

There is a steep learning curve for your development and operations teams.

Slide 26

Slide 26 text

Out-of-the-box Kubernetes is almost never enough for anyone

Slide 27

Slide 27 text

It is not a PaaS solution but a building block to build your own PaaS solution

Slide 28

Slide 28 text

Your Kubernetes platform will shape according to your context
REFLECTION OF YOUR:
● Product & Business Context
● Engineering Team
● Existing applications and their infrastructure
● Ability to spend money to migrate to a new way of working

Slide 29

Slide 29 text

PaaS with Kubernetes
● Infrastructure
● Developer Experience
● Configuration & Secret Management
● Local Development
● Packaging, Deployment
● Continuous Integration
● Continuous Delivery (+ GitOps)
● Metrics
● Logs
● Distributed Tracing
● Governance
● Cost
● Learning & Development

Slide 30

Slide 30 text

PaaS with Kubernetes
● Infrastructure: Kops, migrated to EKS
● Developer Experience: Built in-house on OSS, tutorials
● Configuration & Secret Management: Consul, Vault, consul-template
● Local Development: Skaffold, Telepresence.io
● Packaging, Deployment: Skaffold, Kustomize, Helm
● Continuous Integration: Tekton
● Continuous Delivery (+ GitOps): Tekton
● Metrics: Prometheus, Grafana, NewRelic
● Logs: Loki, Grafana
● Distributed Tracing: Not even solved yet
● Governance: OPA and Kyverno
● Cost: Self-managed spot instances
● Training: Katacoda, Internally built
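To make the governance row concrete, here is a minimal sketch of the kind of rule a policy engine enforces at admission time: every container must declare CPU and memory limits. It is written as plain Python over a Pod manifest dict purely for illustration. It is not OPA's Rego or Kyverno's policy syntax, and the `missing_limits` function and the sample Pod are hypothetical.

```python
def missing_limits(pod: dict) -> list[str]:
    """Return names of containers that lack a CPU or a memory limit."""
    offenders = []
    for container in pod.get("spec", {}).get("containers", []):
        limits = container.get("resources", {}).get("limits", {})
        if "cpu" not in limits or "memory" not in limits:
            offenders.append(container["name"])
    return offenders


# Hypothetical Pod manifest (assumption): "sidecar" declares no limits at all.
POD = {
    "metadata": {"name": "web"},
    "spec": {
        "containers": [
            {"name": "app", "resources": {"limits": {"cpu": "500m", "memory": "256Mi"}}},
            {"name": "sidecar"},
        ]
    },
}

if __name__ == "__main__":
    print(missing_limits(POD))
```

A policy engine would reject or flag the Pod when the list is non-empty. The value of OPA or Kyverno is that such rules live as declarative policies in the cluster rather than as code inside each team's pipeline.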

Slide 31

Slide 31 text

Lesson: You are limited by your legacy applications.

Slide 32

Slide 32 text

What about adoption and migration?

Slide 33

Slide 33 text

Not enough incentive to migrate and take the risk

Slide 34

Slide 34 text

Lesson: A better platform is the best incentive to migrate

Slide 35

Slide 35 text

Lack of proactive migration support made the journey more painful

Slide 36

Slide 36 text

Lesson: Running parallel stacks is hard

Slide 37

Slide 37 text

Lesson: Operating a Kubernetes cluster is hard.

Slide 38

Slide 38 text

CRDs, Operators, Controllers, Mutating Webhooks
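As one example of the extension points listed above, the sketch below shows the core of a mutating admission webhook: it takes an `AdmissionReview` request and returns a response carrying a base64-encoded JSON Patch that adds a default label. The `team` label, its `unassigned` value and the `mutate` function are assumptions made for illustration. A real webhook also needs an HTTPS server, TLS certificates and a `MutatingWebhookConfiguration`, all omitted here.

```python
import base64
import json


def mutate(review: dict, key: str = "team", value: str = "unassigned") -> dict:
    """Build an AdmissionReview response that adds a default label if missing."""
    request = review["request"]
    labels = request["object"].get("metadata", {}).get("labels")

    # Note: keys containing "/" or "~" would need JSON Pointer escaping.
    if labels is None:
        # No labels map yet, so the patch must create it.
        patch = [{"op": "add", "path": "/metadata/labels", "value": {key: value}}]
    elif key not in labels:
        patch = [{"op": "add", "path": f"/metadata/labels/{key}", "value": value}]
    else:
        patch = []  # Label already set; admit the object unchanged.

    response = {"uid": request["uid"], "allowed": True}
    if patch:
        response["patchType"] = "JSONPatch"
        response["patch"] = base64.b64encode(json.dumps(patch).encode()).decode()

    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }


if __name__ == "__main__":
    # Hypothetical request, trimmed to the fields this sketch reads.
    sample = {"request": {"uid": "abc-123", "object": {"metadata": {"name": "web"}}}}
    print(json.dumps(mutate(sample), indent=2))
```

Even this small piece hints at the operational load: the webhook sits in the path of every matching API request, so its availability and latency become part of the cluster's reliability.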

Slide 39

Slide 39 text

Reminder - do you need to do this now?

Slide 40

Slide 40 text

But if you must…

Slide 41

Slide 41 text

But if you must… DON’T do it yourself

Slide 42

Slide 42 text

PaaS with Kubernetes Today
Big decisions to make, and awesome solutions available today (not exhaustive):
● Infrastructure: EKS, AKS, GKE, Civo
● Developer Experience: Devtron, Porter, Shipa
● Configuration & Secret Management: HashiCorp Vault, Doppler
● Environments, Local Development: Devtron, Signadot, Ambassador
● Continuous Integration: Devtron, GitLab, Harness, CircleCI
● Continuous Delivery (+ GitOps): Devtron, Harness, CircleCI, LaunchDarkly
● Observability (logs, metrics, tracing): Honeycomb, Grafana Cloud, Datadog
● Governance: Devtron, OpsLevel
● Cost: Cast.ai, Kubecost, Spot.io
● Training & Labs: CloudYuga

Slide 43

Slide 43 text

Thank You!
Vaidik Kapoor
Technology Consultant
vaidik.in
@vaidikkapoor
Questions?