DevOpsDaysPortugal 2019 - João Vale, André Ferreira - Going 100% Kubernetes in a Production component

The creation of the base cluster infrastructure, which sits outside the deployment pipelines, was automated with Terraform. The Kubernetes deployments themselves were generated using Rudder, a simple internal templating engine (instead of Helm), and deployed using the same base processes and CD pipelines used in the on-premises infrastructure. Connectivity between datacenters also needed attention: the component both uses and is used by other components, so a VPN was necessary. We will walk the audience through our journey, expanding on the problems we faced and how we overcame them.

DevOpsDaysPortugal

June 04, 2019

Transcript

  1. Goal
     "Run a complex application, in production, on Kubernetes in GCP, securely, with proper logging, monitoring and alerting"
     • 8 weeks
     • Serve external traffic directly from GCP
     • Application should be platform agnostic
     • GCP rollout should be seamless for internal or external consumers
     When starting make sure you have:
     • Access to the cloud console (SSO)
     • Network connectivity (VPN)
     • Billing
  2. Current infrastructure overview
     • On-premise OpenStack instances
     • 2 active-active DCs
     • UltraDNS to balance external traffic
     • Citrix Netscaler to balance internal traffic
     • Nuage SDN
     • Immutable deployments with VMs (cattle)
     • Chef or Ansible config management
     • Currently supporting: ~600 applications and ~12k virtual machines
  3. Application overview
     • Java "micro-service"
     • 48GB of RAM
     • 12 production instances
     • Handles ~7K reqs/sec per node on a busy day
     • Currently bundled into an RPM using Maven
     • 6 active developers
     • Minimize developer impact by keeping:
       • Same app codebase
       • Same CD/CI tooling
       • Same bundling process
     Config management is tricky to port!! Chef cookbooks were converted to k8s config maps
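The conversion from Chef attributes to config maps mentioned on this slide can be pictured with a minimal sketch. The slides do not show the actual conversion, so everything here is an assumption for illustration: the attribute names, the `example-app-config` name and the `to_config_map` helper are hypothetical. The output is a JSON manifest, which `kubectl apply -f` accepts alongside YAML.

```python
import json


def to_config_map(name, namespace, attributes):
    """Build a Kubernetes ConfigMap manifest (as a dict) from flat key/value attributes."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name, "namespace": namespace},
        # ConfigMap data values must be strings, so every value is converted.
        "data": {key: str(value) for key, value in sorted(attributes.items())},
    }


# Hypothetical attributes standing in for values previously set by a Chef cookbook.
chef_attributes = {"http_port": 8080, "jvm_heap": "40g", "log_level": "INFO"}

manifest = to_config_map("example-app-config", "example", chef_attributes)
print(json.dumps(manifest, indent=2))
```

Keeping the values in a config map rather than running Chef inside the container matches the later advice not to run config management in containers.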
  4. Open problem: How to structure K8s clusters
     • What is the right size of a cluster?
     • How to distribute applications across multiple clusters?
     • How to implement inter-cluster communication?
     • How to minimize inter-cluster communication?
  5. Initial design
     • We do need multiple clusters
     • We follow the same hierarchy we have in our private cloud: a cluster for a business unit
     • We want to have separate staging and production
     • BTW: Folders are cool :)
  6. Shared VPC
     • Share one VPN connection between multiple projects (k8s clusters)
     • Centralize firewall management in a single project
  7. GCP bootstrapping with Terraform
     • Ansible support for GCP is (was?) very immature
     • Provider maintained by Google engineers
     • Beta features clearly separated in google-beta provider
     • Google Cloud Storage bucket can be used to store tf state
     • Works like a charm :)
     • Kubernetes provider would be a nice companion, but very immature at the time :(
  8. Overview of a deployment pipeline
     [Diagram; labels only: Rudder templates, Docker image, Config Maps, Deployment manifest, Rudder, K8s Cluster, Jenkins Build, GoCD, Jenkins Tests]
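Rudder is an internal tool and its template syntax is not shown in the slides, so the following is not Rudder. It is only a rough sketch of the general idea of rendering a Deployment manifest from a template, using Python's standard `string.Template`. The application name, image, replica count and config map name are all hypothetical.

```python
from string import Template

# A minimal Deployment manifest with $placeholders filled in at render time.
DEPLOYMENT_TEMPLATE = Template("""\
apiVersion: apps/v1
kind: Deployment
metadata:
  name: $name
spec:
  replicas: $replicas
  selector:
    matchLabels:
      app: $name
  template:
    metadata:
      labels:
        app: $name
    spec:
      containers:
        - name: $name
          image: $image
          envFrom:
            - configMapRef:
                name: $config_map
""")


def render_deployment(name, image, replicas, config_map):
    """Render the manifest; substitute() raises KeyError if a placeholder is left unset."""
    return DEPLOYMENT_TEMPLATE.substitute(
        name=name, image=image, replicas=replicas, config_map=config_map
    )


# Hypothetical values; a CD pipeline would supply these per environment.
rendered = render_deployment(
    "example-app", "registry.example.com/example-app:1.0.0", 3, "example-app-config"
)
print(rendered)
```

A plain templating step like this keeps the rendered manifest as an ordinary build artifact that an existing pipeline can apply to the cluster.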
  9. Getting traffic into the clusters
     • Latency times through the VPN were lower than expected: P99 increase of 83ms
     • Low-traffic (~5%) runs to validate rollback strategy
     • Traffic throttled up gradually with Dev and Ops teams monitoring
     • 100% production traffic served from GCP for 1 hour (0% on-premise) - 1,143,036 requests
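The gradual ramp with a rollback path can be sketched as a simple loop. The slides only mention the ~5% runs and the final 100%, and do not say how traffic was shifted or how health was judged, so the intermediate steps, the `ramp_traffic` helper and the health check below are assumptions.

```python
def ramp_traffic(steps, is_healthy):
    """Apply each traffic percentage in order; roll back to 0% on the first unhealthy step.

    Returns the list of percentages that were applied, including the rollback.
    """
    applied = []
    for percent in steps:
        applied.append(percent)
        if not is_healthy(percent):
            applied.append(0)  # rollback: all traffic returns to on-premise
            break
    return applied


# Hypothetical schedule; only 5% and 100% appear in the slides.
steps = [5, 25, 50, 100]

healthy_run = ramp_traffic(steps, lambda percent: True)
failed_run = ramp_traffic(steps, lambda percent: percent < 50)
print(healthy_run)  # [5, 25, 50, 100]
print(failed_run)   # [5, 25, 50, 0]
```

In practice the health check would be the Dev and Ops teams watching the dashboards, as the slide describes, not a function call.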
  10. tl;dr
     • Ensure all pre-requisites are in place before getting your hands dirty
     • Use Terraform GCP provider to set up VPC, VPN, test labs, dev and prod envs
     • Use Terraform GKE provider to avoid managing k8s clusters by hand
     • Use affinity/anti-affinity k8s rules to ensure high availability
     • Ensure rollback strategies
     • Use JIB to put Java apps into containers
     • Big apps have feelings too, take them to the cloud with the others
     • Don't run config management in containers!!
     • Infra should serve the devs, not the other way around
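As an illustration of the anti-affinity advice above, this sketch builds the `affinity` fragment of a pod spec that keeps replicas carrying the same `app` label off the same node. The slides do not show the rules the team used; the label value and the `pod_anti_affinity` helper are hypothetical, while the field names follow the Kubernetes pod spec.

```python
import json


def pod_anti_affinity(app_label, topology_key="kubernetes.io/hostname"):
    """Return a pod-spec fragment forbidding two pods with the same app label
    from being scheduled into the same topology domain (a node, by default)."""
    return {
        "affinity": {
            "podAntiAffinity": {
                "requiredDuringSchedulingIgnoredDuringExecution": [
                    {
                        "labelSelector": {"matchLabels": {"app": app_label}},
                        "topologyKey": topology_key,
                    }
                ]
            }
        }
    }


# Hypothetical application label.
fragment = pod_anti_affinity("example-app")
print(json.dumps(fragment, indent=2))
```

The fragment would be merged into the pod template's `spec`. Switching `topology_key` to a zone label spreads replicas across zones instead of nodes.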