High-Level Architecture of LINE Private Cloud
- IaaS services (per region, Region 1-3): Identity, Image, DNS, L4LB, L7LB, Block Storage, Object Storage, Baremetal, VM, Kubernetes
- PaaS services: Redis, MySQL, ElasticSearch
- FaaS: Function as a Service
- Multiple regions are supported
- Scale keeps growing: 1,500 hypervisors, 8,000 baremetal servers
- Provides different levels of abstraction (IaaS / PaaS / FaaS)
Today’s Topic: Kubernetes
Out of all the services above, this talk focuses on the managed Kubernetes service running inside this multi-region private cloud.
Mission of Our Managed Kubernetes Service
Managed Kubernetes for more than 2,200 developers (100+ clusters). The team plays two roles:
- Kubernetes Operator: make an effort to keep every Kubernetes cluster stable (deployment / update, high availability, cluster performance, collaboration with the rest of the private cloud)
- Kubernetes Solution Architect: keep thinking about how existing applications can be migrated to Kubernetes
Where we focus now: the Kubernetes Operator side, keeping clusters stable (deployment / update and high availability).
1. Deployment: Fully Managed etcd and Controllers
Our default deployment, built from OpenStack VMs (or imported servers):
- etcd × 5 (basic)
- Controller × 3: kube-apiserver, kube-controller-manager, kube-scheduler, kubelet, kube-proxy
- Worker × N: kubelet, kube-proxy
Depending on the load on the Kubernetes API, we either scale out (increase the number of API servers) or scale up (rebuild the 5 etcd nodes with more disk and CPU).
1. Deployment: Support for Imported Workers
- From the GUI or CLI, users can increase the number of workers themselves.
- For some specific users, we also allow importing their own servers as workers.
- Toleration limit: with 5 etcd nodes, the cluster stays available with up to 2 failed etcd members.
1. Deployment: Why Do We Support "Import"? (1/3)
Each team (Web Application Dev Team A, Machine Learning Team A, Web Application Dev Team B) gets its own Kubernetes cluster: we create OpenStack VMs and install Kubernetes on them, so etcd, the control plane, and the workers all run on OpenStack VMs.
1. Deployment: Why Do We Support "Import"? (2/3)
Some teams came to us and said: "We want to use our own GPU servers, which are not maintained by the private cloud."
1. Deployment: Why Do We Support "Import"? (3/3)
Our answer: allow such servers to be imported, but only as workers. The team registers the GPU server as a worker, Kubernetes is installed on it, and it joins the cluster; etcd and the control plane stay on OpenStack VMs that we manage.
1. Deployment: Summary
- Network: Flannel + VXLAN. Although there is encapsulation overhead, we chose this simple implementation for easy operation. NetworkPolicy: each project gets its own Kubernetes cluster, which is why we do not provide in-cluster network security (NetworkPolicy).
- etcd: OpenStack VMs (5 nodes by default). OpenStack VMs make it easy to scale up and replace nodes; fully managed by the private cloud operator.
- Controllers: OpenStack VMs (5 nodes by default). OpenStack VMs make it easy to scale up, scale out, and replace nodes; fully managed by the private cloud operator.
- Workers: OpenStack VMs or imported servers (3 nodes by default). OpenStack VMs: scale-out is performed by the end user, but healing and replacement are done by the operator. Imported servers: must be managed by the end user.
2. Easy Auto-Healing: Key Concept (NodePool)
Every node (etcd, controller, worker) belongs to a NodePool. A NodePool defines:
- Role: etcd, controller, or worker
- Number of nodes: an integer (e.g. 10)
- Node spec: a spec ID referencing an OpenStack image ID, flavor ID, and network ID
- Node labels: populated into Kubernetes as node labels
- dockerd options: populated into the dockerd options on all nodes
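As a rough illustration (not the service's actual schema), a NodePool could be modeled like the Go struct below; every identifier is ours for this sketch only.

```go
package nodepool

// Role is the role every node in a pool plays in the cluster.
type Role string

const (
	RoleEtcd       Role = "etcd"
	RoleController Role = "controller"
	RoleWorker     Role = "worker"
)

// NodeSpec identifies how a node in the pool is created on OpenStack.
type NodeSpec struct {
	SpecID    string // logical ID grouping the OpenStack parameters below
	ImageID   string // OpenStack image ID
	FlavorID  string // OpenStack flavor ID
	NetworkID string // OpenStack network ID
}

// NodePool declares the desired state for one group of nodes.
type NodePool struct {
	Role        Role              // etcd, controller, or worker
	Count       int               // desired number of nodes, e.g. 10
	Spec        NodeSpec          // how to build each node
	NodeLabels  map[string]string // populated as Kubernetes node labels
	DockerdOpts []string          // populated into dockerd options on every node
}
```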
2. Easy Auto-Healing: Key Concept (NodePool)
For each NodePool (etcd, controller, worker), the system periodically checks:
- whether the number of nodes is sufficient, and
- whether any node is in an error state.
If needed, it creates a VM with the pool's spec and adds it to the Kubernetes cluster; see the sketch below.
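Building on the NodePool sketch above, a minimal version of that periodic check loop could look like this; ListNodes, CreateNode, and DeleteNode are hypothetical hooks into OpenStack and the Kubernetes API, not our real functions.

```go
package nodepool

import (
	"context"
	"log"
	"time"
)

// Node is a minimal view of a pool member, assumed for this sketch.
type Node struct {
	ID      string
	Healthy bool
}

// Reconciler periodically drives a NodePool toward its desired state.
type Reconciler struct {
	Pool       NodePool
	ListNodes  func(ctx context.Context) ([]Node, error)
	CreateNode func(ctx context.Context, spec NodeSpec) error
	DeleteNode func(ctx context.Context, id string) error
}

// Run checks the pool on every tick: replace error-state nodes and add
// nodes until the desired count is reached.
func (r *Reconciler) Run(ctx context.Context, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if err := r.reconcileOnce(ctx); err != nil {
				log.Printf("nodepool reconcile failed: %v", err)
			}
		}
	}
}

func (r *Reconciler) reconcileOnce(ctx context.Context) error {
	nodes, err := r.ListNodes(ctx)
	if err != nil {
		return err
	}
	healthy := 0
	for _, n := range nodes {
		if n.Healthy {
			healthy++
			continue
		}
		// Node is in an error state: remove it so it can be replaced.
		if err := r.DeleteNode(ctx, n.ID); err != nil {
			return err
		}
	}
	// Create VMs with the pool's spec until the desired count is met.
	for i := healthy; i < r.Pool.Count; i++ {
		if err := r.CreateNode(ctx, r.Pool.Spec); err != nil {
			return err
		}
	}
	return nil
}
```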
3. Addon Management
What is a Kubernetes addon? To make a Kubernetes cluster production ready, we have to run many extra pieces of software; these are called Kubernetes addons:
- Cluster monitoring
- Log aggregation
- Persistent volume provider
- etcd backup software
These are the Verda officially supported addons running in each user cluster.
What is a Verda Kubernetes certified addon? For common use cases and middleware, the private cloud team actively maintains, on behalf of private cloud users:
- the addon software implementation itself, and
- its configuration best practices.
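As a rough sketch, an addon definition might carry something like the fields below; the field names are illustrative only, not the actual format the controller consumes.

```go
package addon

// Definition describes one certified addon as an addon controller might
// consume it.
type Definition struct {
	Name      string   // e.g. "cluster-monitoring", "log-aggregation"
	Version   string   // addon release the team has certified
	Manifests []string // Kubernetes manifests (YAML) rendered for the cluster
	// Values holds the configuration best practices the private cloud
	// team maintains on behalf of users.
	Values map[string]string
}
```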
3. Addon Management: User Cluster Monitoring (1/3)
The user cluster monitoring addon watches the Kubernetes control plane (etcd, controllers) of each cluster. Users can see a monitoring dashboard, and when something goes wrong an alert is triggered and the developers/operators of the managed Kubernetes service are notified.
Why We Need a Simple API Server in Front of Rancher
What
● A simple, stateless API server.
Responsibility in our managed Kubernetes
● Provide the interface of the managed Kubernetes service for users.
○ This is the only place users interact with.
Why
● Don't expose the Rancher API/GUI directly to users.
○ Avoid depending too strongly on Rancher.
○ Restrict a few pieces of Rancher functionality.
● Easy to add extra business logic.
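A minimal sketch of such a stateless layer, assuming a hypothetical /v1/clusters route, a RANCHER_URL environment variable, and an illustrative forwarding path; none of these names come from the actual service.

```go
package main

import (
	"io"
	"log"
	"net/http"
	"os"
)

// rancherURL points at the internal Rancher API; it is never exposed to users.
var rancherURL = os.Getenv("RANCHER_URL")

// createCluster is the only kind of endpoint users see. It can add extra
// business logic (validation, quotas, audit) before forwarding a restricted
// call to Rancher.
func createCluster(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	// ... extra business logic goes here ...

	resp, err := http.Post(rancherURL+"/v3/clusters", "application/json", r.Body)
	if err != nil {
		http.Error(w, "upstream error", http.StatusBadGateway)
		return
	}
	defer resp.Body.Close()
	w.WriteHeader(resp.StatusCode)
	io.Copy(w, resp.Body)
}

func main() {
	// Only a small, curated set of operations is exposed; everything else
	// Rancher offers stays internal.
	http.HandleFunc("/v1/clusters", createCluster)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```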
Rancher Is Our Core Management Functionality
What is Rancher?
● An OSS tool developed by Rancher Labs.
● Implemented on top of Kubernetes (uses CRDs and client-go heavily).
● Provides multi-cluster management functionality.
Responsibility in our managed Kubernetes
● Provisioning
● Updates
● Keeping Kubernetes clusters available and healthy:
○ Monitoring
○ Healing
Addon Manager (Addon Controller Starter and Addon Controller)
What
● Provides the addon management functionality.
● The Addon Controller Starter runs alongside Rancher.
● An Addon Controller runs on each Kubernetes cluster.
Responsibility in our managed Kubernetes
● Addon Controller Starter
○ Runs an Addon Controller for each cluster (see the sketch below).
● Addon Controller
○ Deploys addons based on the addon definitions.
○ Updates addons based on the addon definitions.
○ Monitors addons.
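Building on the addon Definition sketch above, the starter pattern could look roughly like this; ListClusters and RunController are hypothetical hooks, and cleanup of removed clusters is omitted.

```go
package addon

import (
	"context"
	"log"
	"time"
)

// Cluster is a minimal view of a managed cluster, assumed for this sketch.
type Cluster struct {
	ID         string
	Kubeconfig []byte
}

// Starter runs next to Rancher and makes sure an addon controller is
// running for every cluster.
type Starter struct {
	ListClusters  func(ctx context.Context) ([]Cluster, error)
	RunController func(ctx context.Context, c Cluster, defs []Definition)
	Definitions   []Definition
	running       map[string]context.CancelFunc
}

// Run periodically reconciles the set of per-cluster controllers with the
// set of clusters Rancher knows about.
func (s *Starter) Run(ctx context.Context, interval time.Duration) {
	s.running = map[string]context.CancelFunc{}
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			clusters, err := s.ListClusters(ctx)
			if err != nil {
				log.Printf("list clusters: %v", err)
				continue
			}
			for _, c := range clusters {
				if _, ok := s.running[c.ID]; ok {
					continue // controller already running for this cluster
				}
				cctx, cancel := context.WithCancel(ctx)
				s.running[c.ID] = cancel
				// Each addon controller deploys, updates and monitors
				// addons in its own cluster based on the definitions.
				go s.RunController(cctx, c, s.Definitions)
			}
		}
	}
}
```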
Rancher Automates a Lot of Our Operations
Events and how we respond:
- A node breaks → 1 API call to replace the broken node.
- The entire cluster goes down → 1 API call to restore etcd from a snapshot.
- The cluster needs an update → 1 API call to update Kubernetes.
- Certificates need rotating → 1 API call to rotate the certificates.
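For illustration only, this is how one such call could look when sent through our API layer; the host and path are invented for this sketch and are not Rancher's actual endpoints.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

// apiBase points at our managed Kubernetes API layer; the path below is
// illustrative, not a real endpoint.
const apiBase = "https://managed-k8s.example.internal/v1"

func main() {
	// One API call replaces a broken node; restoring etcd from a snapshot,
	// updating Kubernetes, and rotating certificates follow the same
	// one-call pattern against their own (hypothetical) endpoints.
	resp, err := http.Post(apiBase+"/clusters/demo/nodes/node-3/replace",
		"application/json", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	fmt.Println("replace requested:", resp.Status)
}
```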
2 Types of Monitoring: User Cluster Monitoring
Simple health check
1. The Kubernetes API is working.
2. TCP connectivity with the agents in the user's Kubernetes cluster:
a. a cluster agent per cluster,
b. a node agent per node.
3. Sync the kubelet status of each node.
4. Sync the componentstatus API results.
Advanced health check
1. Node resource usage (with node-exporter).
2. etcd /metrics API results.
3. kube-* component /metrics API results.
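A minimal sketch of the simple health check side, assuming a reachable kube-apiserver and a service-account token; /healthz and /api/v1/componentstatuses are standard Kubernetes endpoints, while the address, token path, and skipped TLS verification are stand-ins for this sketch.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// apiServer and tokenPath are illustrative; in practice they come from the
// per-cluster record our monitoring system keeps.
const apiServer = "https://10.0.0.10:6443"
const tokenPath = "/var/run/secrets/kubernetes.io/serviceaccount/token"

func main() {
	token, err := os.ReadFile(tokenPath)
	if err != nil {
		fmt.Println("read token:", err)
		return
	}
	client := &http.Client{
		Timeout: 5 * time.Second,
		// The CA bundle is omitted in this sketch; verify it in real code.
		Transport: &http.Transport{TLSClientConfig: &tls.Config{InsecureSkipVerify: true}},
	}
	// 1. Is the Kubernetes API working at all?
	// 2. ComponentStatus results (scheduler, controller-manager, etcd).
	for _, path := range []string{"/healthz", "/api/v1/componentstatuses"} {
		req, _ := http.NewRequest(http.MethodGet, apiServer+path, nil)
		req.Header.Set("Authorization", "Bearer "+string(token))
		resp, err := client.Do(req)
		if err != nil {
			fmt.Println(path, "unreachable:", err)
			continue
		}
		body, _ := io.ReadAll(resp.Body)
		resp.Body.Close()
		fmt.Printf("%s -> %s (%d bytes)\n", path, resp.Status, len(body))
	}
}
```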
We Can Notice Something Wrong Quickly and Easily
Even with more than 90 clusters, we notice every node's and cluster's status change, for example:
- long node provisioning,
- long cluster provisioning,
- unhealthy component status,
- unhealthy node condition,
- unhealthy cluster condition.
Advanced Health Check Overview
(Same picture as the user cluster monitoring slide earlier: monitor the control plane of each cluster, trigger alerts, notify the developers/operators of the managed Kubernetes service, and give users a dashboard.)
Automating the Operation of Multiple Clusters
The most important piece is the core management plane in front of the clusters (the API layer, Rancher, control-plane monitoring, and the Addon Controller Starter): it holds many internal states needed to manage multiple Kubernetes clusters, each of which runs its own Addon Controller and addons.