The Story of Managing Common Add-ons on 1000+ Kubernetes Clusters

Tech-Verse2022

November 17, 2022

Transcript

  1. The Story of Managing Common Add-ons on 1000+ Kubernetes Clusters

    Shinya Uemura / Z Lab
  2. Shinya Uemura / @uesyn

    - Z Lab Corporation
    - Software engineer
    - Customer Success Team
    - Organizing communities
      - "Kubernetes Meetup Tokyo"
      - "Kubernetes 変更内容共有会" (a meetup for sharing changes in Kubernetes), etc.
  3. Z Lab Corporation

    - A wholly owned subsidiary of Yahoo Japan Corporation, established in 2015
    - Research and development of infrastructure technologies
    - Development of managed Kubernetes services/PaaS, etc. for Yahoo Japan Corporation
    - https://zlab.co.jp
  4. Agenda

    - Yahoo! JAPAN's Private Kubernetes as a Service
    - Management of add-ons on clusters
    - Monitoring of the add-on system
  5. Yahoo! JAPAN's Kubernetes as a Service

    - Provided in-house as a managed Kubernetes as a Service
    - Creates and manages clusters on on-premises OpenStack
    - Commonly known internally as ZCP
    - Developed by Z Lab Corporation and operated by Yahoo Japan Corporation
    - One of the largest deployments in Japan by number of clusters and nodes

    Growth in the number of clusters:

    - 2017/7: 5 clusters
    - 2018/7: 20 clusters
    - 2019/7: 400+ clusters
    - 2020/5: 680+ clusters
    - 2021/7: 1000+ clusters
    - 2022/9: 1200+ clusters

    As of 9/13/2022:

    - Number of clusters: 1207
    - Number of nodes (VMs): 41735
    - Number of containers: 748522
  6. Yahoo! JAPAN's Kubernetes as a Service

    [Diagram: Kubernetes as a Service creates clusters, each made up of a control plane, etcd, and workers, which users then use]
  7. Characteristics of Yahoo! JAPAN's Private KaaS (1/2)

    - Secure by default
      - Users are not allowed to create privileged containers
    - Applications with Ingress work out of the box
      - A DNS domain available for Ingress is allocated to each cluster

    [Diagram labels: Control Plane, ingress, worker, xxx.example.co.jp, HTTP Request, Privileged Container]
  8. Characteristics of Yahoo! JAPAN's Private KaaS (2/2)

    - Managed
      - Repairs failed or faulty nodes (self-healing)
      - Upgrades clusters safely with zero downtime
      - Integrates with monitoring services
      - Automatically backs up and restores data stores (etcd), etc.
    - Scalable
      - Users can freely change the cluster size within a defined quota

    Frees operators from complex Kubernetes operations
  9. Cluster management in Private KaaS

    - 40,000+ servers need to be operated
      - Automation is all but essential to manage them
    - Operations are automated with Kubernetes controllers
      - Clusters can be managed through declarative configuration
      - "controllers are control loops that watch the state of your cluster, then make or request changes where needed."
      - ref: https://kubernetes.io/docs/concepts/architecture/controller/
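
The control-loop idea quoted on this slide can be sketched in a few lines. This is an illustration only, not Z Lab's implementation: `Cluster`, `reconcile`, and `run_until_converged` are hypothetical names, and the "cluster" is an in-memory object rather than anything reached through the Kubernetes API.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical in-memory stand-in for a managed cluster."""
    desired_nodes: int
    actual_nodes: int = 0
    log: list = field(default_factory=list)


def reconcile(cluster: Cluster) -> bool:
    """Run one pass of the control loop.

    Returns True when the actual state already matches the desired state;
    otherwise moves the actual state one step closer and returns False.
    """
    if cluster.actual_nodes == cluster.desired_nodes:
        return True
    if cluster.actual_nodes < cluster.desired_nodes:
        cluster.actual_nodes += 1
        cluster.log.append("add node")
    else:
        cluster.actual_nodes -= 1
        cluster.log.append("remove node")
    return False


def run_until_converged(cluster: Cluster, max_passes: int = 100) -> int:
    """Call reconcile repeatedly; return how many passes made a change."""
    passes = 0
    while passes < max_passes and not reconcile(cluster):
        passes += 1
    return passes
```

The operator only declares `desired_nodes`; the loop works out the individual steps, which is what makes the configuration declarative.
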
  10. Reference information on Yahoo! JAPAN Private KaaS

    - There are many other features and refinements
    - Reference links
      - https://www.slideshare.net/techblogyahoo/yahoo-japan-kubernetesasaservice
      - https://www.docswell.com/s/ydnjp/KJ1G3Z-2019-01-26-152927
  11. Providing add-ons

    - Some Kubernetes features require external dependencies
      - Resolving DNS names in the Kubernetes cluster
      - Receiving L7 traffic with an Ingress Controller
      - Using Horizontal Pod Autoscaler, etc.
    - Some applications are not essential to the operation of the cluster, but are desired on many clusters
    - Some applications are needed by KaaS to manage user clusters
      - KaaS-managed applications also run on user clusters
    - These are called add-ons
      - Ex: Ingress Controller, CoreDNS, Prometheus, etc.
  12. Why do we offer add-ons?

    - Improved convenience
      - Allow users to focus on application development
    - Security
      - Enforce our company's security policy
    - Monitoring
      - KaaS monitors users' clusters
      - Users monitor applications on clusters
  13. Add-ons running on user clusters

    - Some add-ons are mandatory; the others can be enabled at the user's discretion
    - Mandatory add-ons
      - Related to cluster monitoring and security
      - External dependencies of Kubernetes
    - Add-ons that users can enable as needed
      - Ones that improve users' development and operational efficiency
      - Ones that are not required for all users, but are useful
    - Add-ons run on the user's Kubernetes cluster just like the user's applications
      - They run in different namespaces from the user's applications

    [Diagram labels: User Kubernetes; add-ons in admin namespaces; user apps in user namespaces]
  14. Maintenance of add-ons

    - Add-ons need to be updated as needed, not just deployed
      - Add new features, fix bugs, address vulnerabilities, etc.
    - Adding and removing add-ons
      - Adding add-ons that are strongly requested by users
      - Removing add-ons that are no longer needed or impossible to maintain
    - What is supported differs by Kubernetes version
      - We need to deploy add-ons that support the cluster's Kubernetes version

    We must maintain add-ons running on a huge number of user clusters
  15. Our requirements for an add-on management system

    - To reduce the operational costs that come with a large number of clusters
      - Update add-ons automatically
      - Be resilient to temporary deployment errors
      - Support adding and removing add-ons
    - To be convenient for users and administrators
      - Enable and disable add-ons
      - Customize add-ons to fit each cluster
  16. Kubernetes official addon-manager

    - A controller that continuously applies pre-defined add-on manifests
      - https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/addon-manager
    - Simple implementation: a loop around the kubectl apply --prune command
      - Written in Bash
    - Applies all manifests in a specific directory
      - Deploys add-ons if they do not exist
      - When an add-on changes, the deployed resources are updated to match the changed manifests
    - Runs on each cluster
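
The apply-and-prune behaviour described on this slide can be modelled without a cluster. The sketch below is an assumption-laden illustration: `apply_with_prune` is a hypothetical function, manifests are reduced to plain strings, and the label selection that the real `kubectl apply --prune` uses to decide which resources are prune candidates is left out.

```python
def apply_with_prune(desired: dict, live: dict) -> list:
    """Make `live` match `desired` and return the actions taken.

    Both arguments map a resource name to its manifest (any comparable
    value). `live` is modified in place, mimicking one apply-and-prune pass.
    """
    actions = []
    for name, manifest in sorted(desired.items()):
        if name not in live:
            actions.append(("create", name))
        elif live[name] != manifest:
            actions.append(("update", name))
        else:
            continue  # already up to date
        live[name] = manifest
    # Anything live that is no longer in the manifests gets pruned.
    for name in sorted(set(live) - set(desired)):
        actions.append(("prune", name))
        del live[name]
    return actions
```

Running the function again on an unchanged input returns no actions, which is why it is safe to call it in a loop.
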
  17. Different environment for each cluster (1/2)

    - Cluster size, number of nodes, etc. vary by cluster
      - Add-ons that monitor nodes have more to monitor and consume more resources as the number of nodes grows
    - Some add-ons demand more resources depending on usage
      - The more metrics an app exposes, the more resources metric-monitoring add-ons consume
      - Add-ons that record cluster events consume more resources as the number of cluster operations grows, etc.
    - Some add-ons are needed by only some users

    [Diagram labels: Prometheus, app, usage]
  18. Different environment for each cluster (2/2)

    - Some values differ for each cluster, such as the domain used by Ingress
      - Some add-ons expose a web UI, etc. outside the cluster through Ingress
      - Each cluster has a different DNS domain, so the Ingress resource must change accordingly

    Depending on usage, we need either a mechanism that lets part of the configuration be changed or one that automatically generates manifests suited to each cluster

    [Diagram: three Kubernetes clusters with the domains shopping.example.co.jp, fashion.example.co.jp, and hobby.example.co.jp]
  19. zcp-addon-manager

    - We developed zcp-addon-manager (abbreviated as ZAM) to meet the requirements of private KaaS
      - The main mechanism is the same as that of the official addon-manager
    - Differences from the official addon-manager
      - Implemented as a Kubernetes controller in Go
      - Ability to enable/disable add-ons
        - Cluster users can control only certain permitted add-ons
      - Templating of add-on manifests
        - Automatically obtains values that differ per cluster, such as a domain, and inserts them into the template
        - Provides a mechanism for users to securely modify part of a manifest
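
Manifest templating with per-cluster values and a restricted set of user overrides might look roughly like this. ZAM itself is written in Go and its template format is not shown in the slides, so everything here is an assumption: the `render_manifest` function, the placeholder names, the allow-list approach, and the abbreviated Ingress manifest are all hypothetical.

```python
from string import Template

# Hypothetical, abbreviated Ingress template; placeholders are filled per cluster.
INGRESS_TEMPLATE = Template("""\
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ${addon}
  namespace: ${namespace}
spec:
  ingressClassName: ${ingress_class}
  rules:
  - host: ${addon}.${cluster_domain}
""")


def render_manifest(template: Template, cluster_values: dict,
                    user_overrides: dict, allowed_overrides: frozenset) -> str:
    """Render a manifest from cluster-derived values plus vetted user overrides."""
    rejected = set(user_overrides) - allowed_overrides
    if rejected:
        raise ValueError(f"overrides not allowed: {sorted(rejected)}")
    values = {**cluster_values, **user_overrides}
    # substitute() raises KeyError if any placeholder is left unfilled.
    return template.substitute(values)
```

Restricting overrides to an allow-list is one simple way to let users change part of a manifest without letting them touch values such as the cluster domain.
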
  20. Release of zcp-addon-manager (1/3)

    - ZAM and add-on manifests are released as a set
      - Deployed as a single container image
      - Treated as immutable to simplify updates

    [Diagram: ZAM v1.24.0 bundles coredns v1.10.0, node-exporter 1.4.0, and Prometheus v2.39.0; ZAM v1.24.1 bundles coredns v1.10.0, node-exporter 1.4.0, and Prometheus v2.39.1 (updated)]
  21. Release of zcp-addon-manager (2/3)

    - A release branch exists for each minor version of Kubernetes
      - Kubernetes supports different APIs in each minor version
      - Add-ons and ZAM include operations that depend on the Kubernetes API, so the released contents must change for each version

    [Diagram: release branches off main. For Kubernetes v1.24: v1.24.0, v1.24.1, v1.24.2, v1.24.3. For Kubernetes v1.25: v1.25.0, v1.25.1, v1.25.2]
  22. Release of zcp-addon-manager (3/3)

    - Breaking changes are released only in minor version updates wherever possible
      - Changes that alter how things are operated, decommissioning of add-ons, changes requiring user action, etc.
      - To avoid impacting the cluster usage experience
      - So that patch versions can be updated automatically

    [Diagram: release branches off main (for Kubernetes v1.24: v1.24.0 to v1.24.3; for Kubernetes v1.25: v1.25.0 to v1.25.2), with a breaking change marked]
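
Because breaking changes are kept out of patch releases, a patch-level update can be applied without asking the user. A minimal check for "same minor line, newer patch" could look like the following; `parse_version` and `is_automatic_update` are hypothetical helpers, and pre-release tags such as `-rc.1` are deliberately not handled.

```python
def parse_version(tag: str) -> tuple:
    """Parse 'v1.24.3' into (1, 24, 3). Pre-release suffixes are not handled."""
    major, minor, patch = tag.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def is_automatic_update(current: str, candidate: str) -> bool:
    """True when candidate is a newer patch release on the same minor line."""
    cur, cand = parse_version(current), parse_version(candidate)
    return cur[:2] == cand[:2] and cand[2] > cur[2]
```
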
  23. Deployment management of ZAM

    - The ZAM that matches the cluster's Kubernetes minor version needs to be deployed
      - Users can upgrade Kubernetes at any time
      - ZAM needs to be released in conjunction with that operation
    - A suitable ZAM needs to be deployed to each of the approximately 1200 clusters
      - Sometimes only the ZAM for a specific Kubernetes minor version is released
      - In a particular cluster, for testing purposes, it may be desirable to always apply the latest ZAM
      - In some clusters, it may be desirable to temporarily stop updating ZAM

    [Diagram: on a cluster upgrade, a Kubernetes v1.24 cluster running add-ons and ZAM v1.24.x becomes a Kubernetes v1.25 cluster running new add-ons and ZAM v1.25.y]
  24. Automated Deployment of ZAM

    - We developed zam-updater, a controller specialized for ZAM deployment
      - It detects the Kubernetes version of the cluster where it is running and deploys a suitable ZAM
      - Unlike ZAM, the same zam-updater version is deployed across all clusters

    [Diagram: on a cluster upgrade from Kubernetes v1.24 to v1.25, zam-updater replaces ZAM v1.24.x with ZAM v1.25.y, which deploys the new add-ons]
  25. Configuration of zam-updater (1/3)

    - zam-updater has 2 configs
    - Version list: the channels and the ZAM version applied for each
      - It must exist
    - zam-updater config: configures the zam-updater behavior
      - Channel to be used
      - Whether zam-updater updates are enabled or disabled, etc.
      - If it does not exist, the default configuration is used
    - zam-updater releases the version of the channel written in the config file

    [Diagram labels: Version List (Kubernetes Minor Version, Channel, ZAM Version); zam-updater config]
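
How the two configs could combine into a single decision is sketched below. The real file formats are not shown in the slides, so the data shapes, the channel names `stable` and `rc`, the default config, and the function names are all assumptions made for illustration.

```python
from typing import Optional

# Hypothetical defaults used when no zam-updater config exists.
DEFAULT_CONFIG = {"channel": "stable", "updates_enabled": True}


def minor_of(kubernetes_version: str) -> str:
    """Reduce 'v1.24.6' to its minor line, '1.24'."""
    major, minor = kubernetes_version.lstrip("v").split(".")[:2]
    return f"{major}.{minor}"


def resolve_zam_version(version_list: dict, kubernetes_version: str,
                        config: Optional[dict] = None,
                        current_zam: Optional[str] = None) -> Optional[str]:
    """Pick the ZAM version to deploy, or keep the current one if updates are off.

    `version_list` maps a Kubernetes minor version to {channel: ZAM version}.
    """
    effective = {**DEFAULT_CONFIG, **(config or {})}
    if not effective["updates_enabled"]:
        return current_zam
    channels = version_list[minor_of(kubernetes_version)]  # KeyError: unsupported minor
    return channels[effective["channel"]]
```

With this shape, pointing a test cluster at a pre-release is a config change on that cluster only, while rolling out a new ZAM everywhere is a change to the shared version list.
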
  26. Configuration of zam-updater (2/3)

    [Diagram: when the Version List is updated, zam-updater on a Kubernetes v1.24 cluster replaces ZAM v1.24.1 with ZAM v1.24.2, which deploys the new add-ons]
  27. Configuration of zam-updater (3/3)

    [Diagram: two Kubernetes v1.24 clusters receive the same Version List; depending on each cluster's zam-updater config, one runs ZAM v1.24.1 and the other runs ZAM v1.24.3-rc.1 with new add-ons]
  28. Automated deployment of zam-updater

    - We developed a mechanism that continuously applies a common manifest to all clusters
      - It applies the same manifest regardless of each cluster's configuration or version
      - zam-updater and the version list are released through it
    - Updating ZAM is completed simply by updating the distributed version list

    [Diagram: Default Manifest Deployer in Kubernetes as a Service uses Cluster Discovery to deliver zam-updater and the Version List to each cluster (Kubernetes v1.24 with ZAM v1.24.x, Kubernetes v1.25 with ZAM v1.25.x)]
  29. Monitoring of add-ons and add-on systems

    - Add-ons, ZAM, and zam-updater run in each cluster
      - They are not centrally controlled
    - As KaaS developers/operators, we need to monitor whether they are running properly
  30. Monitoring of clusters

    - Prometheus runs as an add-on in each cluster
      - Prometheus monitors add-ons and user applications on the same cluster
    - KaaS Prometheus scrapes critical metrics/alerts from each User Prometheus

    [Diagram: KaaS Prometheus in Kubernetes as a Service finds User Kubernetes Clusters through Cluster Discovery and collects from the User Prometheus in each via Federation]
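
Prometheus federation works by scraping the `/federate` endpoint of another Prometheus with one or more `match[]` selectors that limit which series are returned. The helper below builds such a URL; the function name, host name, and selectors are hypothetical, and the slides do not say which series KaaS Prometheus actually pulls.

```python
from urllib.parse import urlencode


def federate_url(base_url: str, selectors: list) -> str:
    """Build a /federate URL that returns only series matching the selectors."""
    if not selectors:
        raise ValueError("at least one match[] selector is required")
    # Repeating the match[] key yields one query parameter per selector.
    query = urlencode([("match[]", s) for s in selectors])
    return f"{base_url.rstrip('/')}/federate?{query}"
```

Selecting only a few critical series keeps the central Prometheus small even when it collects from more than a thousand clusters.
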
  31. Visualization of usage in zcp-addon-manager

    - Usage of add-ons in all clusters can be checked
      - Add-ons actually applied by ZAM
      - Which and how many add-ons are applied to each cluster
      - Applied zam-updater versions, etc.
    - They are not centrally controlled, but are monitored to check that they are healthy
  32. Summary

    - Yahoo Japan Corporation and Z Lab Corporation are developing and operating Private KaaS
      - Approximately 1200 clusters run on Private KaaS
    - Private KaaS maintains not only Kubernetes, but also common add-ons which run on all clusters
    - Management of add-ons is automated with Kubernetes controllers
      - ZAM: deploys and updates add-ons in a cluster
      - zam-updater: deploys and updates ZAM in a cluster
      - Default Manifest Deployer: deploys zam-updater and its configs in each cluster
    - We monitor add-ons to check that they are healthy, not just distribute them

    These efforts allow us to manage add-ons on 1000+ clusters