
Increased velocity of code deployments with Kubernetes

Presentation about Vinted's achievements using Kubernetes, GitOps and Canary Deployments.

Edgaras Apšega

April 12, 2024

Transcript

  1. Build Stuff 2023, November 16, Vilnius
    Increased velocity of code deployments with Kubernetes
    Edgaras Apšega, Site Reliability Engineer
  2. Vinted infrastructure
    [Chart: peak load transactions between users]
    Kubernetes production stats:
    • All services running on Kubernetes*
    • 20k+ running pods
    • 700+ physical nodes (80k CPU cores; 330 TB memory)
    * stateless services
  3. Capistrano
    • Gathers the physical server list from Consul service discovery
    • Does a rolling update, one by one (%)
    pre-Kubernetes · Kubernetes · GitOps · Canary deployments
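The Capistrano flow on this slide can be sketched as a batched rolling update. This is a minimal Python sketch, not Vinted's deploy code. The server names and the `deploy` callback are hypothetical. The server list is hard-coded here, whereas the original setup gathered it from Consul (for example through Consul's HTTP catalog endpoint `/v1/catalog/service/<name>`).

```python
from typing import Callable, List


def rolling_batches(servers: List[str], percent: float) -> List[List[str]]:
    """Split servers into consecutive batches of roughly `percent` % each.

    A batch never has fewer than one server, so a small percentage on a
    small fleet degrades to a one-by-one rollout.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    size = max(1, int(len(servers) * percent / 100))
    return [servers[i:i + size] for i in range(0, len(servers), size)]


def rolling_update(servers: List[str], percent: float,
                   deploy: Callable[[str], bool]) -> List[str]:
    """Deploy batch by batch and stop at the first failed server.

    `deploy` is a hypothetical callback that returns True on success.
    Returns the servers that were updated.
    """
    updated: List[str] = []
    for batch in rolling_batches(servers, percent):
        if not all(deploy(server) for server in batch):
            raise RuntimeError(f"deploy failed in batch {batch}")
        updated.extend(batch)
    return updated


# Hypothetical fleet; the original setup read this list from Consul.
servers = [f"app-{i}" for i in range(1, 11)]
print([len(b) for b in rolling_batches(servers, 30)])  # [3, 3, 3, 1]
```

Stopping at the first failure keeps a bad release from reaching the rest of the fleet. That protection is coarse, and the later slides improve on it.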
  4. Pain points
    • Service isolation issues
    • Physical servers with different CPUs
    • Ruby dependency management
    • Custom in-house integrations
  5. Kubernetes architecture
    • Physical Kubernetes nodes (AMD CPUs, 128 cores each)
    • One cluster stretched across 3 data centers that are close to each other, with low latency between them
    • Separate etcd cluster for the /events endpoint
    [Diagram: one Kubernetes cluster spanning data centers #1, #2 and #3; each data center hosts a control plane VM, an etcd VM, an etcd /events VM and Kubernetes physical nodes]
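The slide does not say how the separate /events etcd cluster is wired up. One common way is the kube-apiserver flag `--etcd-servers-overrides`. The sketch below assumes one etcd member per data center and uses hypothetical URLs. It checks the quorum arithmetic that makes a three-data-center stretch tolerate the loss of one site, and it builds the override flag string.

```python
from typing import Dict, List


def quorum(members: int) -> int:
    """Smallest majority of an etcd (Raft) cluster; writes need this many votes."""
    return members // 2 + 1


def survives_dc_loss(members_per_dc: Dict[str, int], lost_dc: str) -> bool:
    """True if the members left after losing one data center still form a quorum."""
    total = sum(members_per_dc.values())
    remaining = total - members_per_dc.get(lost_dc, 0)
    return remaining >= quorum(total)


def events_override(urls: List[str]) -> str:
    """Build a kube-apiserver flag that stores Event objects in a separate etcd.

    Format: group/resource#servers, with servers separated by ';'.
    Events belong to the core API group, whose name is empty, hence '/events'.
    """
    return "--etcd-servers-overrides=/events#" + ";".join(urls)


layout = {"dc1": 1, "dc2": 1, "dc3": 1}  # assumed: one etcd member per data center
print(all(survives_dc_loss(layout, dc) for dc in layout))  # True
```

With two data centers, the site holding the majority of members is a single point of failure. A third site removes that weakness, which is one reason to stretch the cluster across three.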
  6. Kubernetes integration with the data center's network: BGP for the win!
    On the last Pi Day, Reddit had a 5-hour outage because its route reflectors became unavailable after a Kubernetes cluster upgrade.
    Our (simplified) architecture:
    • iBGP from each server to its Top-of-Rack switch
    • Data center network deployed as a BGP fabric for route exchange
    • Every Kubernetes node (server) has a /24 network for its pods
    • Service IPs advertised as anycast inside the data center
    [Diagram: leaf switches 10.1.1.254/24 and 10.1.15.254/24 connected through two spine switches.
     Node 1: eth0 10.1.1.12/24, br0 10.100.23.254/24, pods 10.100.23.15/24 and 10.100.23.123/24; advertises 10.100.23.0/24 and 10.200.1.0/24.
     Node 2: eth0 10.1.15.12/24, br0 10.100.48.254/24, pods 10.100.48.7/24 and 10.100.48.203/24; advertises 10.100.48.0/24 and 10.200.1.0/24.]
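Here is a minimal Python sketch of the route lookup a switch performs. It uses the prefixes from the slide's diagram to show why per-node /24 pod networks combined with an anycast service range work. The node names and the service IP 10.200.1.10 are hypothetical. Real BGP best-path selection considers more attributes than prefix length, so this models only longest-prefix match with equal-cost ties.

```python
import ipaddress
from typing import Dict, List

# Prefixes advertised by each node in the slide's example topology.
ADVERTISED: Dict[str, List[str]] = {
    "node-1": ["10.100.23.0/24", "10.200.1.0/24"],  # eth0 10.1.1.12
    "node-2": ["10.100.48.0/24", "10.200.1.0/24"],  # eth0 10.1.15.12
}


def next_hops(dst: str, advertised: Dict[str, List[str]] = ADVERTISED) -> List[str]:
    """Return the nodes a router would pick for `dst` by longest-prefix match.

    Nodes that advertise the same most specific prefix (the anycast
    service range) are all returned as equal-cost candidates.
    """
    addr = ipaddress.ip_address(dst)
    best_len, hops = -1, []
    for node, prefixes in advertised.items():
        for prefix in prefixes:
            net = ipaddress.ip_network(prefix)
            if addr not in net:
                continue
            if net.prefixlen > best_len:
                best_len, hops = net.prefixlen, [node]
            elif net.prefixlen == best_len and node not in hops:
                hops.append(node)
    return sorted(hops)


print(next_hops("10.100.23.15"))  # ['node-1']
print(next_hops("10.200.1.10"))   # ['node-1', 'node-2']
```

Pod traffic has exactly one destination node. Service traffic can go to any node advertising the anycast range. This means no single route reflector or node is on the critical path for service IPs.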
  7. Continuous Integration: Build → Test → Docker push → Git clone config repo → Update manifests → Git commit and push
    Continuous Deployment: Git clone config repo → Discover manifests → kubectl apply
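The "Update manifests" step of the CI pipeline rewrites the image tag in the config repo so that the CD side picks up the change. The following is a hypothetical Python helper for that step. It works on a manifest already parsed into a dict. YAML parsing and writing, as well as the git commit and push, are left out. The registry, container name and tags are made up.

```python
import copy
from typing import Any, Dict


def set_image_tag(manifest: Dict[str, Any], container: str, tag: str) -> Dict[str, Any]:
    """Return a copy of a Deployment manifest with one container's image tag replaced."""
    updated = copy.deepcopy(manifest)
    for c in updated["spec"]["template"]["spec"]["containers"]:
        if c["name"] == container:
            image = c["image"].split("@", 1)[0]  # drop a digest, if any
            # A ':' after the last '/' starts the tag; one before it is a registry port.
            colon, slash = image.rfind(":"), image.rfind("/")
            repo = image[:colon] if colon > slash else image
            c["image"] = f"{repo}:{tag}"
            return updated
    raise KeyError(f"container {container!r} not found")


deployment = {  # hypothetical manifest, as parsed from the config repo
    "kind": "Deployment",
    "spec": {"template": {"spec": {"containers": [
        {"name": "app", "image": "registry.example.com:5000/core/app:abc123"},
    ]}}},
}
bumped = set_image_tag(deployment, "app", "def456")
print(bumped["spec"]["template"]["spec"]["containers"][0]["image"])
# registry.example.com:5000/core/app:def456
```

Returning a copy keeps the function side-effect free, which makes it easy to diff the old and new manifests before committing.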
  8. Deployment velocity
    • Average deployment duration is 20 minutes
    • Core code is released to production every 20-30 minutes
  9. What about rollbacks?
    • Disable auto-sync for the ArgoCD application
    • Apply a known good image tag
    • Enable auto-sync for the ArgoCD application
    ChatOps to the rescue!
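A ChatOps bot could run the three rollback steps as plain CLI calls. This is a hypothetical Python sketch with made-up application, deployment, container and image names. The slide does not say how the known good tag is applied, so using `kubectl set image` for that step is an assumption. `argocd app set <app> --sync-policy none|automated` is the Argo CD CLI way to turn automated sync off and on.

```python
import subprocess
from typing import List


def rollback_commands(app: str, deployment: str, container: str,
                      image: str, good_tag: str) -> List[List[str]]:
    """The three rollback steps from the slide, as CLI invocations."""
    return [
        # 1. Stop Argo CD from syncing the application automatically.
        ["argocd", "app", "set", app, "--sync-policy", "none"],
        # 2. Point the workload back at a known good image tag (assumed mechanism).
        ["kubectl", "set", "image", f"deployment/{deployment}",
         f"{container}={image}:{good_tag}"],
        # 3. Turn automated sync back on.
        ["argocd", "app", "set", app, "--sync-policy", "automated"],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands, or execute them in order and stop on the first error."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


run(rollback_commands("core", "core-web", "web", "registry.example.com/core", "abc123"))
```

By default, `run` only prints the commands, which suits a bot that asks a human to confirm before executing. The sketch mirrors the slide's steps only. Whether the restored tag survives re-enabling auto-sync depends on what the config repo holds at that point.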
  10. Canary deployments with analysis: Argo Rollouts
    • Plays well with ArgoCD
    • Deployment strategies: Blue/Green, Canary
    • Analysis based on Prometheus, Datadog, etc. metrics
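Argo Rollouts configures canary analysis declaratively, through canary steps plus an analysis whose metrics carry fields such as `successCondition` and `failureLimit`. The following is a small Python model of the decision logic only, not how Argo Rollouts is implemented. The traffic weights, the 99% success threshold and the measurements are hypothetical.

```python
from typing import Callable, Sequence


def analysis_passed(measurements: Sequence[float],
                    success: Callable[[float], bool],
                    failure_limit: int = 0) -> bool:
    """A metric fails once its failed measurements exceed failure_limit."""
    failures = sum(1 for m in measurements if not success(m))
    return failures <= failure_limit


def canary(weights: Sequence[int],
           measure: Callable[[int], Sequence[float]],
           success: Callable[[float], bool],
           failure_limit: int = 0) -> str:
    """Shift traffic step by step; abort and roll back on a failed analysis."""
    for weight in weights:
        if not analysis_passed(measure(weight), success, failure_limit):
            return f"aborted at {weight}%: rolling back to stable"
    return "promoted"


def success(rate: float) -> bool:
    return rate >= 0.99  # hypothetical: at least 99% of requests succeed


def measure(weight: int) -> Sequence[float]:
    # Hypothetical success rates: the new version degrades under more traffic.
    return [0.999, 0.95] if weight >= 50 else [0.999, 0.998]


print(canary([10, 50, 100], measure, success))  # aborted at 50%: rolling back to stable
```

Raising `failure_limit` to 1 lets the same rollout be promoted. That parameter trades tolerance for noisy metrics against the risk of promoting a bad release.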
  11. Canary deployments in action (2)
    1. Rollback started
    2. Previous healthy version re-deployed
    3. New healthy version is live
  12. Core monolith application deployments
    • Deployments to production per day: 30
    • Average duration of deployment: 20 minutes
    • Pods running during non-peak hours: 1k
    • Requests per second at peak time: 170k
  13. All application deployments
    • Deployments to production per day: 2,000
    • Average duration of deployment: 30 seconds
    • Pods running during non-peak hours: 20k
    • Kubernetes services: 650
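Here is a quick calculation with the figures from the two preceding slides. The results are summed deployment time, not wall-clock time. The fleet-wide total is spread across many services, which presumably deploy in parallel.

```python
def deploy_hours_per_day(deploys_per_day: int, avg_minutes: float) -> float:
    """Total deployment time per day in hours, summed over all deployments."""
    return deploys_per_day * avg_minutes / 60


core = deploy_hours_per_day(30, 20)       # core monolith: 30 x 20 min
fleet = deploy_hours_per_day(2000, 0.5)   # all applications: 2,000 x 30 s
print(core, round(fleet, 1))  # 10.0 16.7
```

For the monolith, about 10 hours of deployments a day is roughly consistent with a release every 20-30 minutes through the working day.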
  14. Comparison: then and now
    Good old days:
    • Difficult to manage services running on physical servers (resource isolation, dependency management)
    • Semi-automatic deployments using ChatOps
    Current solution:
    • Kubernetes centralizes and automates application isolation, handling resources and dependencies efficiently across physical servers
    • Fully automated deployments using a GitOps workflow
    • Automated rollbacks using canary rollouts