
Meetup Camptocamp: Exoscale SKS

Pierre-Yves Ritschard

April 14, 2021

Transcript

  1. Scalable
    Kubernetes Service
    03-2021

  2. Outline
    ● Intro
    ● Software at Exoscale
    ● Kubernetes at Exoscale
    ● Challenges met
    ● SKS: Scalable Kubernetes Service

  3. Intro and context

  4. Exoscale in a nutshell
    ● Infrastructure as a Service, 6 zones throughout
    Europe
    ● Now part of A1 group
    ● Public cloud in Geneva since 2013

  5. The product

  6. Software at
    Exoscale

  7. What’s in a cloud provider?
    ● Datacenter & Network Operations
    ● Security
    ● Automation
    ● DBA
    ● Software development

  8. The software we write
    ● Object Storage controller
    ● Internal SDN
    ● Compute orchestrator
    ● Load-balancer orchestrator
    ● Kubernetes orchestrator
    ● Web portal
    ● Customer Management
    ● Usage Metering
    ● Billing
    ● Integration tooling (CLI, terraform provider, …)
    ● Command and control, automation support

  9. Things that didn’t exist in 2012
    ● Ansible
    ● Terraform
    ● Docker
    ● Kubernetes
    ● Wifi
    ● Television

  10. Initial stack
    ● Puppet for configuration management, in-house
    command and control
    ● 5 large external facing services, databases, a
    number of batch processing tools
    ● VM profiles per role, horizontal scaling where
    possible

  11. Why container orchestration then?
    ● Puppet becomes a hot spot of activity
    ○ Hard to convey the entire infrastructure
    needs of an application in one place
    ○ Configuration scattered across different
    places (load-balancing, firewalling,
    software, monitoring)
    ● Constant allocation decisions: “what class
    of machines should this run on?”
    ● Overall low utilization, but contention during peaks!
    ● Large MTTR for failed nodes

  12. Kubernetes at
    Exoscale

  13. Initial exploration
    ● Strong interest in Apache Mesos (not tied to
    Docker, a toolbox for building distributed systems)
    ● Witnessed Kubernetes’ fast adoption
    ● Swarm and Nomad didn’t fit the bill for a number
    of reasons

  14. Going for Kubernetes
    ● Traction
    ● The kicker was the open-ended abstractions:
    Service, Ingress, CRI, CNI, CSI
    ○ These allow providers to step in and provide a
    best-in-class implementation of the abstraction
    ○ The abstractions allow for a much better shot at
    expressing infrastructure independently of its
    location
    ● We decided to start with our API gateway
    ○ One of the most active projects at the time
    ○ Extremely sensitive to disruption

  15. Challenges met

  16. Keeping our promises in a containerized world
    ● Config management
    ○ Now next to the application: huge progress
    ○ Added internal tooling to generate manifests
    ● Deployments
    ○ Registries vs. Debian repositories
    ○ ArgoCD for managing deployments
    ● Security
    ○ Network and security policies
    ○ OPA (work in progress)
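The manifest-generation tooling mentioned above could look roughly like this minimal Python sketch; the function name, image, and field choices are hypothetical, not Exoscale’s actual internal tooling. kubectl accepts JSON as well as YAML, so a plain dict serialized with json.dumps is already a valid manifest:

```python
import json

def deployment_manifest(name, image, replicas=2, port=8080):
    """Build a minimal Kubernetes Deployment manifest as a dict."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector must match the pod template's labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        "ports": [{"containerPort": port}],
                    }],
                },
            },
        },
    }

manifest = deployment_manifest("api-gateway", "registry.example.com/api-gateway:1.0")
print(json.dumps(manifest, indent=2))
```

Keeping the generator next to the application code is what makes “config management next to the application” possible: one commit changes both the software and its deployment shape.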

  17. Container networking
    ● Network used to be boring
    ○ A public IP per VM
    ○ Security groups to provide isolation
    ● Exoscale private networks not ready for CNI
    ● Performance analysis led to the use of Calico

  18. SKS: Scalable
    Kubernetes Service

  19. Recap of what we learned
    ● Network
    ○ Calico
    ● Security
    ○ Several certificate authorities per cluster
    ○ Encryption key for secrets, per cluster
    ○ Wireguard available on the template
    ○ Cluster access using certificates (support
    for users, groups, TTL)
    ● Exoscale Cloud Controller Manager
    ○ To validate worker nodes
    ○ Network Load Balancer integration
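Certificate-based cluster access with users, groups, and TTLs works because Kubernetes maps a client certificate’s CN to the user name and its O fields to groups. A sketch of the kubeconfig such a credential could produce (all values below are placeholders, not real keys or endpoints):

```python
import base64
import json

# Placeholder PEM material; a real SKS credential would carry actual
# certificates with the user in CN and groups in O, plus an expiry (TTL).
fake_cert = base64.b64encode(b"-----BEGIN CERTIFICATE-----...").decode()
fake_key = base64.b64encode(b"-----BEGIN PRIVATE KEY-----...").decode()

kubeconfig = {
    "apiVersion": "v1",
    "kind": "Config",
    "clusters": [{"name": "sks-demo", "cluster": {
        "server": "https://203.0.113.10:443",
        "certificate-authority-data": fake_cert,
    }}],
    "users": [{"name": "alice", "user": {
        "client-certificate-data": fake_cert,
        "client-key-data": fake_key,
    }}],
    "contexts": [{"name": "sks-demo", "context": {
        "cluster": "sks-demo", "user": "alice",
    }}],
    "current-context": "sks-demo",
}
# JSON is valid YAML, so kubectl can consume this file directly.
print(json.dumps(kubeconfig, indent=2))
```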

  20. Full integration in the Exoscale stack
    ● Network Load Balancer
    ○ “LoadBalancer” Kubernetes services
    ○ Configuration using annotations
    ● Instance Pools
    ○ We rely on instance pools for the nodepools
    ○ Same properties (nodes cycling…)
    ● Security groups (per nodepool)
    ● Anti-affinity groups (per nodepool)
    ● API and tooling
    ○ CLI, Terraform
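The “LoadBalancer Service configured using annotations” pattern looks like the sketch below. The annotation key shown is a placeholder illustrating the general shape; the actual keys are defined by the Exoscale Cloud Controller Manager’s documentation:

```python
import json

# A Service of type LoadBalancer: the cloud controller manager watches
# these and provisions an Exoscale Network Load Balancer accordingly.
service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {
        "name": "web",
        "annotations": {
            # Placeholder annotation key, for illustration only.
            "service.example/loadbalancer-name": "web-nlb",
        },
    },
    "spec": {
        "type": "LoadBalancer",
        "selector": {"app": "web"},
        "ports": [{"port": 80, "targetPort": 8080}],
    },
}
print(json.dumps(service, indent=2))
```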

  21. Product objectives
    ● Speed
    ○ Create clusters in ~100 seconds
    ○ New nodes in the cluster in ~120 seconds
    (available in “kubectl get nodes”)
    ○ Should be faster in the future
    ● Seamless start
    ● CNCF compliance
    ● Reliability: two offerings
    ○ starter: no SLA, non-HA control plane, free
    ○ pro: SLA, HA control plane

  22. Demo!

  23. [Diagram: a Kubernetes cluster running the Kubernetes
    dashboard behind a “LoadBalancer” Service; an Exoscale
    Load Balancer exposes it to the outside world.]

  24. [Diagram: the same cluster, extended. One Exoscale Load
    Balancer still fronts the Kubernetes dashboard’s
    “LoadBalancer” Service; a second one fronts another
    “LoadBalancer” Service backing an Nginx ingress
    controller, which routes outside traffic to two apps.]
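The fan-out in the second diagram, where one ingress controller routes to several apps, corresponds to an Ingress resource along these lines (host names and service names are hypothetical):

```python
import json

def ingress_rule(host, service, port):
    """One host-based routing rule pointing at a backend Service."""
    return {
        "host": host,
        "http": {"paths": [{
            "path": "/",
            "pathType": "Prefix",
            "backend": {"service": {"name": service,
                                    "port": {"number": port}}},
        }]},
    }

# The nginx ingress controller sits behind a single "LoadBalancer"
# Service, so many apps share one Exoscale Load Balancer.
ingress = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "Ingress",
    "metadata": {"name": "apps"},
    "spec": {
        "ingressClassName": "nginx",
        "rules": [
            ingress_rule("app1.example.com", "app1", 80),
            ingress_rule("app2.example.com", "app2", 80),
        ],
    },
}
print(json.dumps(ingress, indent=2))
```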

  25. Additional notes and
    future work

  26. Advanced use cases
    ● Cluster lifecycle management
    ○ Cluster upgrades (next patch, next minor)
    ● Certificate management
    ○ You can retrieve various CA certificates in
    order to configure some components
    ● Multiple nodepools
    ○ Each nodepool is independent
    ○ Can have different disk sizes, offerings,
    anti-affinity groups, networking rules…
    ○ Can be scaled independently

  27. Ongoing work
    ● Cluster autoscaler (short-term)
    ○ Automatically scale nodepools based on
    Kubernetes metrics
    ● Web portal (short-term)
    ● Blueprints (short-term)
    ○ Example manifests for common tasks
    ● GPU nodepools
    ● More add-ons: dashboard, ingress, metrics-server
    ○ metrics-server should arrive soon
    ● Persistent volumes: specific add-on
    ● Automatic security group management
    ● Managed container registry
    ● Advanced IAM integration
