Japan Corporations, established in 2015
› Research and development of infrastructure technologies
› Development of managed Kubernetes services/PaaS, etc. for Yahoo Japan Corporation
› https://zlab.co.jp
Managed Kubernetes as a Service
- Creates and manages clusters on on-premises OpenStack
- Commonly known internally as ZCP
- Developed by Z Lab Corp. and operated by Yahoo Japan Corporation
- One of the largest deployments in Japan by number of clusters and nodes

Number of clusters over time:
  2017/7   5
  2018/7   20
  2019/7   400+
  2020/5   680+
  2021/7   1000+
  2022/9   1200+

As of 9/13/2022:
✓ Number of clusters: 1207
✓ Number of nodes (VMs): 41735
✓ Number of containers: 748522
default
- Do not allow users to create privileged containers
- Applications with Ingress work out of the box
  - A DNS domain available for Ingress is allocated to each cluster
[Figure: control plane, ingress, and worker nodes; an HTTP request to xxx.example.co.jp; a privileged container]
Characteristics of Yahoo! JAPAN's Private KaaS (2/2)
Can upgrade clusters safely with zero downtime
- Coordinates with monitoring services
- Automatically backs up and restores data stores (etcd), etc.
- Scalable
  - The cluster size can be changed freely within a defined quota, as requested by the user
Frees operators from complex Kubernetes operations
need to be operated
- Automation is almost essential to manage them
- Automation of operations with a Kubernetes Controller
  - Clusters can be managed with a declarative configuration
  - "controllers are control loops that watch the state of your cluster, then make or request changes where needed."
    - ref: https://kubernetes.io/docs/concepts/architecture/controller/
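The control-loop idea quoted above can be sketched in a few lines. This is a minimal illustration, not the KaaS implementation: `ClusterState` and `reconcile` are hypothetical names, and plain dictionaries stand in for the Kubernetes API.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterState:
    # Declarative configuration: add-on name -> version that should be running.
    desired: dict
    # Observed state: add-on name -> version that is actually running.
    actual: dict = field(default_factory=dict)


def reconcile(state):
    """Run one pass of the control loop and return the actions it took."""
    actions = []
    # Create or update whatever differs from the desired state.
    for name, version in state.desired.items():
        if state.actual.get(name) != version:
            verb = "update" if name in state.actual else "create"
            state.actual[name] = version
            actions.append(f"{verb} {name} {version}")
    # Remove whatever is running but no longer declared.
    for name in list(state.actual):
        if name not in state.desired:
            del state.actual[name]
            actions.append(f"delete {name}")
    return actions


state = ClusterState(desired={"coredns": "v1.10.0"}, actual={"legacy-addon": "1.0.0"})
print(reconcile(state))  # first pass converges the observed state
print(reconcile(state))  # second pass finds nothing left to change
```

Because each pass only compares desired and observed state, running it repeatedly is harmless, which is what makes a declarative configuration practical to operate at scale.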
many other features and refinements
- Reference links
  - https://www.slideshare.net/techblogyahoo/yahoo-japan-kubernetesasaservice
  - https://www.docswell.com/s/ydnjp/KJ1G3Z-2019-01-26-152927
Resolve DNS domain in the Kubernetes cluster
- Receive L7 traffic with an Ingress Controller
- Use the Horizontal Pod Autoscaler, etc.
- Some applications are not essential to the operation of the cluster, but are desirable to run on many clusters
- Some applications are needed by KaaS to manage user clusters
- KaaS-managed applications also run on user clusters
  - These are called add-ons
  - Ex: Ingress Controller, CoreDNS, Prometheus, etc.
be enabled at the user's discretion
- Add-ons that are forcibly enabled
  - Related to cluster monitoring and security
  - External dependencies of Kubernetes
- Add-ons that users can enable as needed
  - Ones that improve development and operational efficiency for users
  - Ones that are not required by all users, but are useful
- Add-ons run on the user's Kubernetes cluster just like the user's applications
  - They run in different namespaces from the user's applications
[Figure: Add-ons running on user cluster. A user Kubernetes cluster with add-ons in admin namespaces and user apps in user namespaces]
needed, not just deployed
- Add new features, fix bugs, address vulnerabilities, etc.
- Adding and removing add-ons
  - Adding add-ons that are strongly requested by users
  - Removing add-ons that are no longer needed or are impossible to maintain
- Each version of Kubernetes differs in what it supports
  - Need to deploy add-ons that support the cluster's Kubernetes version
We must maintain add-ons running on a huge number of user clusters
In order to reduce operational costs due to the large number of clusters…
- Resilient to temporary deployment errors
- Support for adding and removing add-ons
In order to provide high convenience for users/administrators…
- Enable and disable add-ons
- Customize add-ons to fit each cluster
pre-defined add-on manifests
- https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/addon-manager
- Simple implementation: a loop around the kubectl apply --prune command
  - Written in Bash
- Applies all manifests in a specific directory
  - Deploys add-ons if they do not exist
  - When add-on manifests change, the deployed add-ons are modified to match them
- Runs on each cluster
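The apply-and-prune loop described above can be sketched as follows. This is a Python paraphrase of the idea, not the official Bash script: `build_apply_command` and `apply_loop` are hypothetical helper names, and the directory and label selector in the usage note are assumptions. `kubectl apply --prune` needs a selector (or `--all`) so that it knows which previously applied objects it may delete.

```python
import subprocess
import time


def build_apply_command(manifest_dir, selector):
    """Build the kubectl invocation that applies a directory and prunes leftovers."""
    return [
        "kubectl", "apply",
        "-f", manifest_dir, "--recursive",
        "--prune", "-l", selector,
    ]


def apply_loop(manifest_dir, selector, iterations, interval_sec=60,
               runner=subprocess.run, sleep=time.sleep):
    """Re-apply the manifests `iterations` times; return how many passes failed."""
    failures = 0
    for _ in range(iterations):
        # check=False: a failed pass is only counted, and the next pass retries it.
        result = runner(build_apply_command(manifest_dir, selector), check=False)
        if result.returncode != 0:
            failures += 1
        sleep(interval_sec)
    return failures
```

A call such as `apply_loop("/etc/kubernetes/addons", "addonmanager.kubernetes.io/mode=Reconcile", iterations=10)` would re-apply the directory ten times, one minute apart. Since every pass applies the full directory again, a temporary failure in one pass is corrected by a later one.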
of nodes, etc. vary by cluster
- Add-ons that monitor nodes have more items to monitor and consume more resources as the number of nodes grows
- Some add-ons demand more resources depending on usage
  - The more metrics an app exposes, the more resources metric-monitoring add-ons consume
  - Add-ons that record cluster events consume more resources the more cluster operations there are, etc.
- Some add-ons are needed by only some users
[Figure: Prometheus usage in clusters with different numbers of apps]
exists for each cluster, such as a domain used by Ingress
- Some add-ons expose a Web UI, etc. outside the cluster through Ingress
- Each cluster has a different DNS domain, so the values in the Ingress resource must change accordingly
Depending on usage, we need either a mechanism that allows part of the configuration to be changed or a mechanism that automatically generates manifests suited to each cluster
[Figure: three Kubernetes clusters with the domains shopping.example.co.jp, fashion.example.co.jp, and hobby.example.co.jp]
the requirements of private KaaS
- The main mechanism is the same as that of the official addon-manager
- Differences from the official addon-manager
  - Implemented as a Kubernetes Controller in Go
  - Ability to enable/disable add-ons
    - Cluster users can control only the add-ons they are allowed to
  - Templating of add-on manifests
    - Automatically acquires values that differ per cluster, such as a domain, and inserts them into the template
    - Provides a mechanism by which users can securely modify part of the manifest
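The templating idea can be illustrated with a small sketch. It is not ZAM's actual template format (ZAM is written in Go, and its syntax is not shown in the slides): the template text, `USER_EDITABLE`, `render_ingress`, and the `addons` namespace are all assumptions made for illustration. The per-cluster domain is supplied by the platform, and an allowlist restricts what a cluster user may override.

```python
from string import Template

# Hypothetical add-on manifest template; ${...} marks per-cluster values.
INGRESS_TEMPLATE = Template("""\
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ${name}
  namespace: ${namespace}
spec:
  rules:
  - host: ${name}.${cluster_domain}
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: ${name}
            port:
              number: ${port}
""")

# Only these fields may be overridden by cluster users (assumed policy).
USER_EDITABLE = {"port"}


def render_ingress(cluster_domain, user_overrides=None):
    """Render the manifest for one cluster, applying only allowed overrides."""
    values = {"name": "prometheus", "namespace": "addons", "port": 9090}
    for key, value in (user_overrides or {}).items():
        if key not in USER_EDITABLE:
            raise ValueError(f"field {key!r} cannot be changed by cluster users")
        # The only editable field is numeric; int() rejects injected YAML text.
        values[key] = int(value)
    # The domain is discovered by the platform, never taken from user input.
    values["cluster_domain"] = cluster_domain
    return INGRESS_TEMPLATE.substitute(values)


print(render_ingress("shopping.example.co.jp"))
```

Rendering the same template with `fashion.example.co.jp` or `hobby.example.co.jp` yields a manifest that differs only in the host, so one released template can serve every cluster.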
released as a set
- Deployed as a single container image
  - Treated as immutable in order to simplify updates
[Figure: contents of each ZAM release]
  ZAM v1.24.0: coredns v1.10.0, node-exporter 1.4.0, Prometheus v2.39.0
  ZAM v1.24.1: coredns v1.10.0, node-exporter 1.4.0, Prometheus v2.39.1 (updated!)
minor versions of Kubernetes
- Kubernetes supports different APIs in each minor version
- Add-ons and ZAM include operations that depend on the Kubernetes API, so the released contents must change for each version
[Figure: release branches from main]
  For Kubernetes v1.24: v1.24.0, v1.24.1, v1.24.2, v1.24.3
  For Kubernetes v1.25: v1.25.0, v1.25.1, v1.25.2
in minor version updates wherever possible
- Changes that alter how the cluster is operated, decommissioning of add-ons, changes requiring user action, etc.
- In order to avoid impacting the cluster usage experience
- In order to update the patch version automatically
[Figure: release branches from main, with a breaking change marked]
  For Kubernetes v1.24: v1.24.0, v1.24.1, v1.24.2, v1.24.3
  For Kubernetes v1.25: v1.25.0, v1.25.1, v1.25.2
version of Kubernetes needs to be deployed
- Kubernetes can be upgraded at any time by users
  - ZAM needs to be released in conjunction with that operation
- A ZAM suited to each of the approximately 1200 clusters needs to be deployed
- There are cases where ZAM is released only for a specific Kubernetes minor version
- In a particular cluster, for testing purposes, it may be desirable to always apply the latest ZAM
- In some clusters, it may be desirable to temporarily stop updating ZAM
[Figure: a cluster upgrade from Kubernetes v1.24 (ZAM v1.24.x) to Kubernetes v1.25 (ZAM v1.25.y), with a new add-on]
specialized for ZAM deployment
- It detects the Kubernetes version of the cluster where it is running and deploys a suitable ZAM
- Unlike ZAM, the same zam-updater version is deployed across all clusters
[Figure: the same cluster upgrade from Kubernetes v1.24 (ZAM v1.24.x) to Kubernetes v1.25 (ZAM v1.25.y), with zam-updater running in the cluster before and after]
and applied ZAM version list
- It must exist
zam-updater config
- Configures the zam-updater behavior
  - Channel to be used
  - Whether zam-updater updates are enabled or disabled, etc.
- If it does not exist, the default configuration is used
- The version for the channel written in the config file is released
[Figure: Version List (columns: Kubernetes Minor Version, Channel, ZAM Version) and zam-updater config]
[Figure: a Kubernetes v1.24 cluster in which zam-updater, the version list, and the zam-updater config determine the ZAM that deploys the add-ons, including a new add-on; versions shown: v1.24.1 and ZAM v1.24.3-rc.1]
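The selection that zam-updater performs can be sketched as a lookup keyed by Kubernetes minor version and channel. The data layout, the channel names `stable` and `rc`, the config keys, and the function names below are assumptions for illustration; the slides do not show the real file formats.

```python
import re

# Hypothetical version list: Kubernetes minor version -> channel -> ZAM version.
VERSION_LIST = {
    "1.24": {"stable": "v1.24.1", "rc": "v1.24.3-rc.1"},
    "1.25": {"stable": "v1.25.2"},
}

# Used when a cluster has no zam-updater config (assumed defaults).
DEFAULT_CONFIG = {"channel": "stable", "update_enabled": True}


def minor_of(kubernetes_version):
    """Extract 'MAJOR.MINOR' from a version string such as 'v1.24.6'."""
    match = re.match(r"v?(\d+)\.(\d+)", kubernetes_version)
    if match is None:
        raise ValueError(f"unrecognized Kubernetes version: {kubernetes_version!r}")
    return f"{match.group(1)}.{match.group(2)}"


def select_zam_version(kubernetes_version, version_list, config=None):
    """Return the ZAM version to deploy, or None when updates are disabled."""
    effective = {**DEFAULT_CONFIG, **(config or {})}
    if not effective["update_enabled"]:
        return None
    # Raises KeyError if the version list has no entry for this minor or channel.
    return version_list[minor_of(kubernetes_version)][effective["channel"]]


print(select_zam_version("v1.24.6", VERSION_LIST))
print(select_zam_version("v1.24.6", VERSION_LIST, {"channel": "rc"}))
```

Keeping the mapping in data means that releasing a new ZAM only requires distributing an updated version list, while a per-cluster config can pin a test cluster to a pre-release channel or pause updates.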
to apply the common manifest to all clusters
- Applies the same manifest regardless of the configuration or version of each cluster
- zam-updater and the version list are released with it
  - Updating ZAM is completed simply by updating the version list to be distributed
[Figure: within Kubernetes as a Service, the Default Manifest Deployer and Cluster Discovery; the version list and zam-updater are delivered to the Kubernetes v1.24 cluster (ZAM v1.24.x) and the Kubernetes v1.25 cluster (ZAM v1.25.x)]
zam-updater are running in each cluster
- They are not centrally controlled
- As KaaS developers/operators, we need to monitor whether they are running properly
in each cluster
- Prometheus monitors add-ons and user applications on the same cluster
- KaaS Prometheus scrapes critical metrics/alerts from each User Prometheus
[Figure: in Kubernetes as a Service, KaaS Prometheus and Cluster Discovery; Federation from the User Prometheus in each user Kubernetes cluster, which monitors the add-ons]
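Prometheus federation works by scraping another server's `/federate` endpoint with one or more `match[]` selectors that choose which series to pull. The sketch below only builds such a URL; the host name and the selectors are assumptions, and the slides do not state which series KaaS Prometheus actually collects.

```python
from urllib.parse import urlencode


def federate_url(user_prometheus_base, selectors):
    """Build the /federate URL that pulls only the selected series."""
    # Each selector becomes its own match[] query parameter.
    query = urlencode([("match[]", selector) for selector in selectors])
    return f"{user_prometheus_base.rstrip('/')}/federate?{query}"


url = federate_url(
    "http://prometheus.shopping.example.co.jp",    # hypothetical User Prometheus
    ['{job="zam"}', '{__name__="ALERTS"}'],        # hypothetical selectors
)
print(url)
```

Restricting the selectors to critical series keeps the central server small even when it collects from more than a thousand clusters.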
Visualization of usage in zcp-addon-manager
- Add-ons actually applied by ZAM
  - Which add-ons, and how many, are applied to each cluster
- Applied zam-updater versions, etc.
- They are not centrally controlled, but are monitored to check that they are operating normally
developing and operating Private KaaS
- Approximately 1200 clusters run on Private KaaS
- Private KaaS maintains not only Kubernetes, but also the common add-ons which run on all clusters
- Management of add-ons is automated with Kubernetes Controllers
  - ZAM: Deploys and updates add-ons in a cluster
  - zam-updater: Deploys and updates ZAM in a cluster
  - Default Manifest Deployer: Deploys zam-updater and its configs in each cluster
- We monitor add-ons to check that they are operating normally, not just distribute them
These efforts allow us to manage add-ons on 1000+ clusters