
Meetup Camptocamp: Exoscale SKS

Pierre-Yves Ritschard

April 14, 2021


  1. Outline
     • Intro
     • Software at Exoscale
     • Kubernetes at Exoscale
     • Challenges met
     • SKS: Scalable Kubernetes Service
  2. Exoscale in a nutshell
     • Infrastructure as a Service, 6 zones throughout Europe
     • Now part of the A1 Group
     • Public cloud in Geneva since 2013
  3. What’s in a cloud provider?
     • Datacenter & Network Operations
     • Security
     • Automation
     • DBA
     • Software development
  4. The software we write
     • Object Storage controller
     • Internal SDN
     • Compute orchestrator
     • Load-balancer orchestrator
     • Kubernetes orchestrator
     • Web portal
     • Customer management
     • Usage metering
     • Billing
     • Integration tooling (CLI, Terraform provider, …)
     • Command and control, automation support
  5. Things that didn’t exist in 2012
     • Ansible
     • Terraform
     • Docker
     • Kubernetes
     • Wifi
     • Television
  6. Initial stack
     • Puppet for configuration management, in-house command and control
     • 5 large external-facing services, databases, a number of batch-processing tools
     • VM profiles per role, horizontal scaling where possible
  7. Why container orchestration then?
     • Puppet becomes a hot spot of activity
       ◦ Hard to convey the entire infrastructure needs of an application in one place
       ◦ Configuration scattered across different places (load-balancing, firewalling, software, monitoring)
     • Always making allocation decisions: “on what class of machines should this run?”
     • Overall low utilization, but contention during peaks!
     • Large MTTR for failed nodes
  8. Initial exploration
     • Strong interest in Apache Mesos (not tied to Docker, a toolbox for building distributed systems)
     • Witnessed Kubernetes’ fast adoption
     • Swarm and Nomad didn’t fit the bill for a number of reasons
  9. Going for Kubernetes
     • Traction
     • The kicker was the open-ended abstractions: Service, Ingress, CRI, CNI, CSI
       ◦ These allow providers to step in and offer a best-in-class implementation of the abstraction
       ◦ The abstractions give a much better shot at expressing infrastructure independently of location
     • We decided to start with our API gateway
       ◦ One of the most active projects at the time
       ◦ Extremely sensitive to disruption
  10. Keeping our promises in a containerized world
     • Config management
       ◦ Now next to the application: huge progress
       ◦ Added internal tooling to generate manifests
     • Deployments
       ◦ Registries vs. Debian repositories
       ◦ ArgoCD for managing deployments
     • Security
       ◦ Network and security policies
       ◦ OPA (work in progress)
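The “network and security policies” point above can be illustrated with a plain Kubernetes NetworkPolicy — a minimal sketch, not Exoscale-specific; the namespace and policy names are hypothetical:

```yaml
# Baseline policy: deny all ingress to pods in the "api" namespace
# except traffic coming from the ingress controller's namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-controller-only   # hypothetical name
  namespace: api                        # hypothetical namespace
spec:
  podSelector: {}                       # applies to every pod in the namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx
```

Note that a NetworkPolicy only takes effect when the cluster runs a CNI plugin that enforces it, such as the Calico deployment mentioned on the next slides.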
  11. Container networking
     • Network used to be boring
       ◦ A public IP per VM
       ◦ Security groups to provide isolation
     • Exoscale private networks not ready for CNI
     • Performance analysis led to the use of Calico
  12. Redux of what we learnt
     • Network
       ◦ Calico
     • Security
       ◦ Several certificate authorities per cluster
       ◦ Per-cluster encryption key for secrets
       ◦ WireGuard available on the template
       ◦ Cluster access using certificates (support for users, groups, TTL)
     • Exoscale Cloud Controller Manager
       ◦ To validate worker nodes
       ◦ Network Load Balancer integration
  13. Full integration in the Exoscale stack
     • Network Load Balancer
       ◦ “LoadBalancer” Kubernetes services
       ◦ Configuration using annotations
     • Instance Pools
       ◦ We rely on instance pools for the nodepools
       ◦ Same properties (node cycling, …)
     • Security groups (per nodepool)
     • Anti-affinity groups (per nodepool)
     • API and tooling
       ◦ CLI, Terraform
  14. Product objectives
     • Speed
       ◦ Create clusters in ~100 seconds
       ◦ New nodes in the cluster in ~120 seconds (visible in “kubectl get nodes”)
       ◦ Should get faster in the future
     • Seamless start
     • CNCF conformance
     • Reliability: two offerings
       ◦ starter: no SLA, non-HA control plane, free
       ◦ pro: SLA, HA control plane
  15. [Architecture diagrams: the Kubernetes dashboard exposed via a “LoadBalancer” Service backed by an Exoscale Load Balancer; and the outside world reaching the apps through an Exoscale Load Balancer, a “LoadBalancer” Service, and an Nginx ingress controller inside the Kubernetes cluster]
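The second diagram corresponds to a common pattern: a single “LoadBalancer” Service exposes the Nginx ingress controller, and Ingress resources then route to the apps by host and path. A minimal sketch — the hostname, names, and backend Service are hypothetical:

```yaml
# Routes traffic arriving through the ingress controller (itself behind
# one Exoscale NLB) to an application's ClusterIP Service.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress            # hypothetical name
spec:
  ingressClassName: nginx
  rules:
    - host: app.example.com    # hypothetical hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: app      # hypothetical ClusterIP Service
                port:
                  number: 80
```

This keeps a single load balancer in front of many applications, instead of one NLB per Service.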
  16. Advanced use cases
     • Cluster lifecycle management
       ◦ Cluster upgrades (next patch, next minor)
     • Certificate management
       ◦ You can retrieve the various CA certificates in order to configure some components
     • Multiple nodepools
       ◦ Each nodepool is independent
       ◦ Can have different disk sizes, offerings, anti-affinity groups, networking rules, …
       ◦ Can be scaled independently
  17. Ongoing work
     • Cluster autoscaler (short-term)
       ◦ Automatically scale nodepools based on Kubernetes metrics
     • Web portal (short-term)
     • Blueprints (short-term)
       ◦ Example manifests for common needs
     • GPU nodepools
     • More add-ons: dashboard, ingress, metrics-server
       ◦ metrics-server should arrive soon
     • Persistent volumes: specific add-on
     • Automatic security group management
     • Managed container registry
     • Advanced IAM integration