
Meetup Camptocamp: Exoscale SKS

Pierre-Yves Ritschard

April 14, 2021

Transcript

  1. Scalable Kubernetes Service 03-2021

  2. Outline • Intro • Software at Exoscale • Kubernetes at Exoscale • Challenges met • SKS: Scalable Kubernetes Service

  3. Intro and context

  4. Exoscale in a nutshell • Infrastructure as a Service, 6 zones throughout Europe • Now part of A1 group • Public cloud in Geneva since 2013

  5. The product

  6. Software at Exoscale

  7. What’s in a cloud provider? • Datacenter & Network Operations • Security • Automation • DBA • Software development

  8. The software we write • Object Storage controller • Internal SDN • Compute orchestrator • Load-balancer orchestrator • Kubernetes orchestrator • Web portal • Customer management • Usage metering • Billing • Integration tooling (CLI, Terraform provider, …) • Command and control, automation support

  9. Things that didn’t exist in 2012 • Ansible • Terraform • Docker • Kubernetes • Wifi • Television

  10. Initial stack • Puppet for configuration management, in-house command and control • 5 large external-facing services, databases, a number of batch processing tools • VM profiles per role, horizontal scaling where possible

  11. Why container orchestration then? • Puppet becomes a hot spot of activity ◦ Hard to convey the entire infrastructure needs of an application in one place ◦ Configuration scattered across different places (load-balancing, firewalling, software, monitoring) • Always making allocation decisions: “on what class of machines should this run?” • Overall low utilization, but contention during peaks! • Large MTTR for failed nodes

  12. Kubernetes at Exoscale

  13. Initial exploration • Strong interest in Apache Mesos (not tied to Docker, a distributed-systems building toolbox) • Witnessed Kubernetes’ fast adoption • Swarm and Nomad didn’t fit the bill for a number of reasons

  14. Going for Kubernetes • Traction • The kicker was the open-ended abstractions: Service, Ingress, CRI, CNI, CSI ◦ These allow providers to step in and provide a best-in-class implementation of the abstraction ◦ The abstraction allows for a much better shot at expressing infrastructure independently of the location • We decided to start with our API gateway ◦ One of the most active projects at the time ◦ Extremely sensitive to disruption

  15. Challenges met

  16. Keeping our promises in a containerized world • Config management ◦ Now next to the application: huge progress ◦ Added internal tooling to generate manifests • Deployments ◦ Registries vs. Debian repositories ◦ ArgoCD for managing deployments • Security ◦ Network and security policies ◦ OPA (WIP)

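The ArgoCD-managed deployment model mentioned above can be sketched with a minimal Argo CD `Application` resource; the repository URL and path here are hypothetical, and the actual internal tooling that generates manifests is not shown:

```yaml
# Minimal Argo CD Application: keeps the cluster in sync with manifests in Git.
# Repository URL, path, and namespaces are illustrative assumptions.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: api-gateway
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/manifests.git  # hypothetical repo
    targetRevision: main
    path: api-gateway
  destination:
    server: https://kubernetes.default.svc
    namespace: api-gateway
  syncPolicy:
    automated:
      prune: true      # delete resources removed from Git
      selfHeal: true   # revert manual drift back to the Git state
```

With `automated` sync, Argo CD continuously reconciles the live cluster against the Git repository, which is what makes deployments declarative rather than push-based.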
  17. Container networking • Networking used to be boring ◦ A public IP per VM ◦ Security groups to provide isolation • Exoscale private networks not ready for CNI • Performance analysis led to the use of Calico

  18. SKS: Scalable Kubernetes Service

  19. Redux of what we learnt • Network ◦ Calico • Security ◦ Several certificate authorities per cluster ◦ Encryption key for secrets, per cluster ◦ WireGuard available on the template ◦ Cluster access using certificates (support for users, groups, TTL) • Exoscale Cloud Controller Manager ◦ To validate worker nodes ◦ Network Load Balancer integration

  20. Full integration in the Exoscale stack • Network Load Balancer ◦ “LoadBalancer” Kubernetes services ◦ Configuration using annotations • Instance Pools ◦ We rely on instance pools for the nodepools ◦ Same properties (node cycling…) • Security groups (per nodepool) • Anti-affinity groups (per nodepool) • API and tooling ◦ CLI, Terraform

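The annotation-driven NLB integration can be sketched as a plain `LoadBalancer` Service; the Exoscale Cloud Controller Manager then provisions a Network Load Balancer for it. The annotation key below is illustrative, not guaranteed — consult the Exoscale CCM documentation for the exact keys it supports:

```yaml
# A Service of type LoadBalancer, which the Exoscale CCM backs with an NLB.
# The annotation name is an assumption for illustration only.
apiVersion: v1
kind: Service
metadata:
  name: web
  annotations:
    service.beta.kubernetes.io/exoscale-loadbalancer-name: "web-nlb"  # hypothetical key
spec:
  type: LoadBalancer
  selector:
    app: web
  ports:
    - port: 80        # port exposed on the NLB
      targetPort: 8080  # port the app pods listen on
```

Once reconciled, the NLB's public IP appears in the Service's `status.loadBalancer` field (visible via `kubectl get service web`).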
  21. Product objectives • Speed ◦ Create clusters in ~100 seconds ◦ New nodes in the cluster in ~120 seconds (available in “kubectl get nodes”) ◦ Should be faster in the future • Seamless start • CNCF compliance • Reliability: two offerings ◦ starter: no SLA, non-HA control plane, free ◦ pro: SLA, HA control plane

  22. Demo!

  23. [Diagram: outside world → Exoscale Load Balancer → Kubernetes “LoadBalancer” Service → Kubernetes dashboard, inside the Kubernetes cluster]

  24. [Diagram: the same topology extended with an Nginx ingress controller behind its own Exoscale Load Balancer and “LoadBalancer” Service, routing to two apps]

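The second demo topology — apps reached through the Nginx ingress controller rather than one NLB per Service — can be sketched with an Ingress resource; the hostname and backend Service name here are hypothetical:

```yaml
# Routes HTTP traffic arriving at the Nginx ingress controller (itself exposed
# via a "LoadBalancer" Service and thus an Exoscale NLB) to an in-cluster app.
# Hostname and service name are illustrative assumptions.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app
spec:
  ingressClassName: nginx   # handled by the nginx ingress controller
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: app   # ClusterIP Service in front of the app pods
                port:
                  number: 80
```

This is why the diagram shows only one public load balancer in front of many apps: the Ingress fans traffic out by hostname and path behind a single entry point.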
  25. Additional notes and future work

  26. Advanced use cases • Cluster lifecycle management ◦ Cluster upgrades (next patch, next minor) • Certificate management ◦ You can retrieve the various CA certificates in order to configure some components • Multiple nodepools ◦ Each nodepool is independent ◦ Can have different disk sizes, offerings, anti-affinity groups, networking rules… ◦ Can be scaled independently

  27. Ongoing work • Cluster autoscaler (short-term) ◦ Automatically scale nodepools based on Kubernetes metrics • Web portal (short-term) • Blueprints (short-term) ◦ Example manifests for common needs • GPU nodepools • More add-ons: dashboard, ingress, metrics-server ◦ metrics-server should arrive soon • Persistent volumes: specific add-on • Automatic security group management • Managed container registry • Advanced IAM integration