
Managed Kubernetes in Private Cloud using Rancher with more than 1000 nodes scale

LINE Developers

July 24, 2019

Transcript

  1. Managed Kubernetes in Private Cloud using Rancher with more than

    1000 nodes scale ~ Part 1: How we are using Rancher ~ LINE Corporation Yuki Nishiwaki
  2. High Level Architecture of LINE Private Cloud IaaS Region 1

    Region 2 Region 3 Identity Image DNS L4LB L7LB Block Storage Baremetal Object Storage Kubernetes VM Redis Mysql PaaS ElasticSearch FaaS Function as a Service Multiple Regions Support Scale keeps growing - 1500 HVs - 8000 Baremetals Provide different levels of abstraction PaaS
  3. Today’s Topic IaaS Region 1 Region 2 Region 3 Identity

    Image DNS L4LB L7LB Block Storage Baremetal Object Storage VM Redis Mysql PaaS ElasticSearch FaaS Function as a Service Multiple Regions Support Scale keeps growing - 1500 HVs - 8000 Baremetals Provide different levels of abstraction Kubernetes
  4. Kubernetes Cluster Performance Deployment / Update Private Cloud Collaboration Managed

    Kubernetes Mission of Our Managed Kubernetes Service For more than 2200 developers (100+ clusters) Kubernetes Operator Kubernetes Solution Architect High Availability Make an effort to keep Kubernetes Cluster stable Keep thinking How we can migrate existing application to Kubernetes
  5. Kubernetes Cluster Performance Deployment / Update Private Cloud Collaboration Managed

    Kubernetes Mission of Our Managed Kubernetes Service For more than 2200 developers (100+ clusters) Kubernetes Operator Kubernetes Solution Architect High Availability Make an effort to keep Kubernetes Cluster stable Keep thinking How we can migrate existing application to Kubernetes Where we focus on Now Where we focus on Now
  6. Architecture of Managed Kubernetes Service Kubernetes Cluster Kubernetes Cluster Kubernetes

    Cluster API Automate Operating Multiple Cluster Cluster Operation - Cluster Create - Cluster Update - Add Worker Manage Cluster - Deploy - Update - Monitor Use Cluster - Deploy application - Scale application
  7. Why we need a simple API server in front of Rancher

    Responsibility 1. Hide the Rancher API/GUI from users 2. Aggregate the Rancher API e.g. one Cluster Create API call will internally make the following Rancher API calls - POST /v3/clusters - POST /v3/nodepools (multiple times) 3. Support multiple Rancher deployments Why 1. Avoid depending strongly on Rancher 2. By limiting features and fixing the shape of user cluster deployments, reduce the risk of users configuring/using clusters in the wrong way 3. As a last resort for scale, we can support multiple Rancher deployments by putting an extra API in front of them API
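As a rough sketch (not our exact implementation), one cluster-create request to our API fans out into Rancher v3 API calls along these lines; the endpoint, token, IDs and payload fields below are placeholders based on the Rancher v3 API:

# Sketch of the fan-out behind one "Cluster Create" call (all values are placeholders)
RANCHER=https://rancher.example.com
TOKEN='token-xxxxx:secret'          # Rancher API token (placeholder)

# 1) Create the cluster object
curl -sk -u "$TOKEN" -H 'Content-Type: application/json' \
  -X POST "$RANCHER/v3/clusters" \
  -d '{"name":"team-a","rancherKubernetesEngineConfig":{}}'

# 2) Create one node pool per role (called multiple times: etcd / controlplane / worker)
curl -sk -u "$TOKEN" -H 'Content-Type: application/json' \
  -X POST "$RANCHER/v3/nodepools" \
  -d '{"clusterId":"c-xxxxx","nodeTemplateId":"nt-xxxxx","hostnamePrefix":"team-a-worker","quantity":4,"worker":true}'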
  8. Architecture of Managed Kubernetes Service Kubernetes Cluster Kubernetes Cluster Kubernetes

    Cluster Automate Operating Multiple Cluster Cluster Operation - Cluster Create - Cluster Update - Add Worker Manage Cluster - Deploy - Update - Monitor Use Cluster - Deploy application - Scale application API
  9. Rancher is our core management functionality What is Rancher? • OSS

    tool developed by Rancher Labs • Implemented based on Kubernetes (uses CRDs and client-go heavily) • Provides multi-cluster management functionality Responsibility in our Managed Kubernetes • Provision • Update • Keep Kubernetes Cluster Available/Healthy ◦ Monitoring ◦ Log Collecting ◦ Periodic Etcd Backup...
  10. Rancher 2.X architecture API Controller Kubernetes Cluster Kubernetes Cluster Cluster

    Agent Node Agent Node Agent Node Agent Node Agent Kubernetes Cluster Cluster Agent Node Agent Node Agent Node Agent Node Agent Rancher Server needs to run on Kubernetes Rancher Server can be divided into an “API part” and a “Controller part” Each Kubernetes Cluster managed by Rancher needs to run a “Cluster Agent” and “Node Agents” Websocket Websocket 1 2 3 4
  11. Rancher 2.X made use of Kubernetes Ecosystem 1. Use Kubernetes

    CRD as a Data Store 2. Implement logic as a controller by using informer, workqueue from client-go 3. Use ConfigMap based leader election from client-go 4. Use endpoints resource for Rancher Server Discovery 5. Use Kubernetes rolebinding, role for API Authorization
  12. Rancher 2.X made use of Kubernetes Ecosystem 1. Use Kubernetes

    CRD as a Data Store 2. Implement logic as a controller by using informer, workqueue from client-go 3. Use ConfigMap based leader election from client-go 4. Use endpoints resource for Rancher Server Discovery 5. Use Kubernetes rolebinding, role for API Authorization
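A quick way to see this from the cluster Rancher itself runs on (a sketch; the leader-election ConfigMap name is an assumption and may differ between Rancher versions):

# CRDs Rancher registers as its data store
kubectl get crd | grep cattle.io

# Custom resources backing clusters and nodes
kubectl get clusters.management.cattle.io
kubectl get nodes.management.cattle.io --all-namespaces

# client-go ConfigMap-based leader election lock (ConfigMap name assumed)
kubectl -n kube-system get configmap cattle-controllers \
  -o jsonpath='{.metadata.annotations.control-plane\.alpha\.kubernetes\.io/leader}'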
  13. Implement all logic as Kubernetes controllers API Controller ClusterA

    Watch Kubernetes Cluster Cluster Agent Node Agent NodeA NodeB CRD ・・・ Use CRDs (Custom Resource Definitions) to store Cluster, Node and User information…. The Rancher API just creates Kubernetes custom resources (a kind of API proxy) When the controller detects a new cluster resource, it does the provisioning
  14. Custom Resource Definition(CRD) in Kubernetes? Kubernetes Native Resource Type Custom

    Resource Type CustomResourceDefinition ConfigMap Pod Nginx App A Nginx Config Cluster Node Cluster Node Cluster A Cluster B Node A Node B Kubernetes allows users to create custom resource types in addition to the natively supported resources.
  15. Example of CRD for Rancher Resource: Cluster

    CRD for Cluster:
    > kubectl get crd clusters.management.cattle.io -o yaml
    apiVersion: apiextensions.k8s.io/v1beta1
    kind: CustomResourceDefinition
    metadata:
      creationTimestamp: 2018-10-26T13:49:37Z
      generation: 1
      name: clusters.management.cattle.io
      resourceVersion: "1278"
      selfLink: /apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions/clusters.management.cattle.io
      uid: fa628204-d925-11e8-b840-fa163e305e2c
    spec:
      group: management.cattle.io
      names:
        kind: Cluster
        listKind: ClusterList
        plural: clusters
        singular: cluster
      scope: Cluster
      version: v3

    Cluster Resource:
    > kubectl get cluster
    NAME      AGE
    local     1d
  16. How we use Rancher? 1. Deploy Kubernetes with OpenStack 2.

    Easy to heal by replacing 3. Etcd Periodic backup 4. Basic Monitoring for Cluster
  17. 4 types of methods to deploy k8s from Rancher (1/2)

    1.Allocate Server 2.Install Docker 3.Register Node 4.Build Etcd, K8s 5.Deploy Rancher Agent Rancher Scope Rancher Scope Rancher Scope Rancher Scope Out of Rancher Scope Out of Rancher Scope Import Existing Cluster In a hosted Kubernetes provider Out of Rancher Scope From my own existing nodes From nodes in an infrastructure provider
  18. 4 types of methods to deploy k8s from Rancher (2/2)

    1.Allocate Server 2.Install Docker 3.Register Node 4.Build Etcd, K8s 5.Deploy Rancher Agent Rancher Scope Rancher Scope Rancher Scope Rancher Scope Out of Rancher Scope Out of Rancher Scope Import Existing Cluster In a hosted Kubernetes provider Out of Rancher Scope From my own existing nodes From nodes in an infrastructure provider Use 2 different ways in LINE Import Driver (From my own existing nodes) OpenStack Driver (From nodes in an infrastructure provider)
  19. Use OpenStack Node Driver for most cases Automate Operating

    Multiple Cluster Web Application Dev Team A Machine Learning Team A Web Application Dev Team B Kubernetes Cluster OpenStack VMs Kubernetes Cluster OpenStack VMs Kubernetes Cluster OpenStack VMs Etcd Controlplane Etcd Controlplane Worker 1. Create VM by using OpenStack Driver 2. Run Rancher Agent GPU Server GPU Server GPU Server Worker docker-machine Cluster Agent Node Agent 1 2
  20. Use OpenStack Node Driver for most cases Automate Operating

    Multiple Cluster Web Application Dev Team A Machine Learning Team A Web Application Dev Team B Kubernetes Cluster OpenStack VMs Kubernetes Cluster OpenStack VMs Kubernetes Cluster OpenStack VMs We wanted to use our own GPU servers, which are not maintained by the Private Cloud Etcd Controlplane Etcd Controlplane Worker 1. Create VM by using OpenStack Driver 2. Run Rancher Agent GPU Server GPU Server GPU Server Worker docker-machine Cluster Agent Node Agent 1 2
  21. Use Import Driver for users who have special servers Automate

    Operating Multiple Cluster Web Application Dev Team A Machine Learning Team A Web Application Dev Team B Kubernetes Cluster OpenStack VMs Kubernetes Cluster OpenStack VMs Kubernetes Cluster OpenStack VMs GPU Server GPU Server GPU Server Worker Worker Worker Etcd Controlplane Etcd Controlplane Worker Cluster Agent Node Agent 1. Create VM by using OpenStack Driver 2. Run Rancher Agent Node Agent 2 Allow to import only as a worker sudo docker run -d --privileged --restart=unless-stopped --net=host \ -v /etc/kubernetes:/etc/kubernetes -v /var/run:/var/run rancher/rancher-agent:v2.1.5 \ --server <Rancher Server> --token <Token> --ca-checksum <CA Checksum> \ --worker
  22. Use Import Driver for users who have special servers Automate

    Operating Multiple Cluster Web Application Dev Team A Machine Learning Team A Web Application Dev Team B Kubernetes Cluster OpenStack VMs Kubernetes Cluster OpenStack VMs Kubernetes Cluster OpenStack VMs GPU Server GPU Server GPU Server Worker Worker Worker Etcd Controlplane Etcd Controlplane Worker Cluster Agent Node Agent 1. Create VM by using OpenStack Driver 2. Run Rancher Agent Node Agent 2 Allow importing only as a worker Note: In the Rancher GUI we cannot mix 2 different methodologies (cloud providers) to install k8s, like the OpenStack and Import drivers, or the AWS and OpenStack drivers… But the Rancher Server implementation doesn’t restrict users from doing it. That’s why, if you call the API or perform the expected procedure yourself, you can mix them.
  23. Summary: How we deployed Kubernetes with Rancher

    Diagram of our deployment: dedicated etcd nodes, controller nodes running kube-apiserver, kube-controller-manager and kube-scheduler, and N worker nodes running kubelet and kube-proxy, built either from the OpenStack driver or via Import.
  24. How we use Rancher? 1. Deploy Kubernetes with OpenStack 2.

    Easy to heal by replacing 3. Etcd Periodic backup 4. Basic Monitoring for Cluster
  25. As the number of nodes increased, more nodes got broken... Kubernetes Cluster

    OpenStack VMs VM1 VM2 VM3 VM4 Hypervisor Failure Dockerd Bug... VM1 VM2 VM3 VM4 VM A VM B VM C
  26. As the number of nodes increased, more nodes got broken... Kubernetes Cluster

    OpenStack VMs VM1 VM2 VM3 VM4 Hypervisor Failure Dockerd Bug... VM1 VM2 VM3 VM4 VM A VM B VM C Let’s replace broken nodes with healthy nodes to keep providing enough compute resources
  27. Delete Node on GUI when a Node gets broken Kubernetes Cluster

    OpenStack VMs VM1 VM1 VM2 VM3 VM4 VM2 VM3 VM4 Automate Operating Multiple Cluster Delete VM3 docker-machine docker-machine rm VM3 Node Controller NodePool Controller Kind: Node VM3 Kind: NodePool NodePool1 Delete Node VM3 Detect Node VM3 Deleted Doing cleanup and remove finalizer
  28. NodePool Controller in Rancher will re-create Node Kubernetes Cluster OpenStack

    VMs VM1 VM1 VM2 VM3 VM4 VM2 VM3 VM4 Automate Operating Multiple Cluster docker-machine Node Controller NodePool Controller Compare The number of Node = 3 VS Quantity of NodePool = 4 Kind: Node VM3 Kind: NodePool NodePool1 Kind: Node VM3 Re-Create VM3 Node ... Spec: nodeTemplateName: XX quantity: 4 ...
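For illustration, the same reconciliation can be inspected or triggered with kubectl against the management cluster; the namespace (the cluster ID) and resource names below are placeholders, so treat this as a sketch rather than a documented workflow:

# NodePool resources live in the cluster's namespace in Rancher's management cluster
kubectl -n c-xxxxx get nodepools.management.cattle.io

# Raising spec.quantity makes the NodePool controller create nodes until the number
# of Node resources matches, just as deleting a broken node does
kubectl -n c-xxxxx patch nodepools.management.cattle.io np-xxxxx \
  --type=merge -p '{"spec":{"quantity":5}}'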
  29. Node Controller provisions new Node Kubernetes Cluster OpenStack VMs VM1

    VM1 VM2 VM3 VM4 VM2 VM4 Automate Operating Multiple Cluster docker-machine docker-machine create VM3 Node Controller NodePool Controller Kind: Node VM3 Kind: NodePool NodePool1 Detect new Node VM3 VM3 VM3 New
  30. Finally, install k8s and make the new node join the cluster Kubernetes

    Cluster OpenStack VMs VM1 VM1 VM2 VM3 VM4 VM2 VM4 Automate Operating Multiple Cluster Node Controller NodePool Controller Kind: Node VM3 Kind: NodePool NodePool1 After Node Provisioning Finished (Check Condition Fields), Run RKE VM3 VM3 New Cluster Provisioner Install/Update Kubernetes
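Conceptually, the Cluster Provisioner step is equivalent to running RKE against the cluster's node list (Rancher embeds RKE as a library rather than shelling out; this is only a sketch):

# Standalone RKE equivalent of the "Install/Update Kubernetes" step
rke up --config cluster.yml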
  31. How we use Rancher? 1. Deploy Kubernetes with OpenStack 2.

    Easy to heal by replacing 3. Etcd Periodic backup 4. Basic Monitoring for Cluster
  32. HA deployment (multiple controlplane) is not always... etcd etcd etcd

    14:30 14:40 etcd etcd etcd × × × There is always a risk of etcd data being lost - Crash of multiple etcd nodes - Accidentally deleted data (Human Error) Kubernetes Cluster 1 Kubernetes Cluster 1
  33. Enable Periodic Backup for all clusters

    Create a bucket for each cluster:
    s3_backup_config:
      access_key: "<access key>"
      bucket_name: "cluster1-bucket"
      endpoint: "<S3 API Endpoint>"
      region: "us-east-1"
  34. How Periodic Backup works (1/2) etcd etcd etcd 14:30 Kubernetes

    Cluster 1 Object Storage (Ceph) cluster1-bucket cluster2-bucket etcd Automate Operating Multiple Cluster Kind: EtcdBackup EtcdBackup1430 Kind: Cluster Kubernetes Cluster 1 Kind: Cluster Kubernetes Cluster 1 Kind: Cluster Kubernetes Cluster 1 Kind: EtcdBackup EtcdBackup1430 Kind: EtcdBackup EtcdBackup1430 EtcdBackup Controller Check Backup Periodically Main Logic goroutine Check all clusters every 5 min - Whether backup is enabled or not - When the last backup was taken Create EtcdBackup Resource Represents an actual snapshot
  35. How Periodic Backup works (2/2) etcd etcd etcd 14:30 Kubernetes

    Cluster 1 Object Storage (Ceph) cluster1-bucket cluster2-bucket etcd rke-etcd-backup Automate Operating Multiple Cluster Kind: EtcdBackup EtcdBackup1430 Kind: Cluster Kubernetes Cluster 1 Kind: Cluster Kubernetes Cluster 1 Kind: Cluster Kubernetes Cluster 1 Kind: EtcdBackup EtcdBackup1430 Kind: EtcdBackup EtcdBackup1430 EtcdBackup Controller Check Backup Periodically Main Logic goroutine Detect New EtcdBackup Run one-shot container To take snapshot Upload to Object Storage
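The EtcdBackup objects the controller creates can be listed from the management cluster; the namespace (the cluster ID) below is a placeholder:

# Each EtcdBackup custom resource represents one snapshot in the bucket
kubectl -n c-xxxxx get etcdbackups.management.cattle.io

# Inspect where a particular snapshot was uploaded
kubectl -n c-xxxxx get etcdbackups.management.cattle.io EtcdBackup1430 -o yaml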
  36. If we enable Periodic Backup, we can restore ! etcd

    etcd etcd 14:30 14:40 etcd etcd etcd × × × Kubernetes Cluster 1 Kubernetes Cluster 1 etcd etcd etcd Kubernetes Cluster 1 15:00 Object Storage (Ceph) cluster1-bucket cluster2-bucket Restore from snapshot Download
  37. Restore operation will be finished by 1 API Call Automate

    Operating Multiple Cluster Kind: Cluster Kubernetes Cluster 1 Kind: EtcdBackup EtcdBackup1430 Cluster Provisioner ... Spec: rancherKubernetesEngineConfig: restore: true snapshotName: EtcdBackup1430 ... Object Storage (Ceph) cluster1-bucket cluster2-bucket Restore from EtcdBackup1430 for Cluster 1 etcd etcd etcd Kubernetes Cluster 1 15:00 rke-etcd-backup Detect Cluster Change Run one-shot container to download snapshot Rancher API Update Cluster
  38. Restore operation will be finished by 1 API Call Automate

    Operating Multiple Cluster Kind: Cluster Kubernetes Cluster 1 Kind: EtcdBackup EtcdBackup1430 Cluster Provisioner ... Spec: rancherKubernetesEngineConfig: restore: true snapshotName: EtcdBackup1430 ... Object Storage (Ceph) cluster1-bucket cluster2-bucket etcd etcd etcd Kubernetes Cluster 1 15:00 etcdctl snapshot restore Run one-shot container to restore with snapshot
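A hedged sketch of that single API call; in practice the full cluster object would be fetched first and written back with only the restore fields changed, and the field names are based on the Rancher v3 API rather than our exact code:

# Trigger a restore by updating the cluster spec (placeholders throughout)
curl -sk -u "$TOKEN" -H 'Content-Type: application/json' \
  -X PUT "$RANCHER/v3/clusters/c-xxxxx" \
  -d '{"rancherKubernetesEngineConfig":{"restore":{"restore":true,"snapshotName":"EtcdBackup1430"}}}'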
  39. How we use Rancher? 1. Deploy Kubernetes with OpenStack 2.

    Easy to heal by replacing 3. Etcd Periodic backup 4. Basic Monitoring for Cluster
  40. Rancher provides 2 different kinds of monitoring Kubernetes Cluster Kubernetes Cluster Kubernetes

    Cluster 1. Basic Monitoring based on Kubernetes Standard Features • Kubernetes Component Status • Kubernetes Node Condition 2. Advanced Monitoring by deploying Grafana, Prometheus • kube-apiserver, kube-scheduler, kube-XXXX metrics API • coredns, kube-dns metrics API • node-exporter • kube-state-metrics Automate Operating Multiple Cluster
  41. 1. Basic Monitoring based on Kubernetes Features Kubernetes Cluster Kubernetes

    Cluster Kubernetes Cluster Periodically Call Component Status API (/api/v1/componentstatuses) Automate Operating Multiple Cluster HealthSyncer Monitoring Kind: Cluster Cluster1 Kind: Node Node1 NodeSyncer … omit ... Status: componentStatuses: …. omit ... Update based on API Response … omit ... Status: internalNodeStatus: …. omit ... Periodically Call Node API (/api/v1/nodes)
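These are ordinary Kubernetes APIs, so the same data the HealthSyncer and NodeSyncer poll can also be checked by hand against a user cluster:

# Component health as reported by the kube-apiserver
kubectl get --raw /api/v1/componentstatuses

# Node conditions (MemoryPressure, DiskPressure, Ready, ...)
kubectl get --raw /api/v1/nodes
kubectl get nodes -o wide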
  42. 2. Advanced Monitoring with Grafana, Prometheus Kubernetes Cluster Kubernetes Cluster

    Kubernetes Cluster Check whether the cluster has Extra Monitoring enabled or not Automate Operating Multiple Cluster HealthSyncer Monitoring Kind: Cluster Cluster1 NodeSyncer Detect Cluster Change Deploy Grafana, Prometheus by Helm Chart https://github.com/rancher/system-charts
  43. Our Choice: Use Only Basic Monitoring Kubernetes Cluster Kubernetes Cluster

    Kubernetes Cluster 1. Basic Monitoring based on Kubernetes Standard Features • Kubernetes Component Status • Kubernetes Node Condition 2. Advanced Monitoring by deploying Grafana, Prometheus • kube-apiserver, kube-scheduler, kube-XXXX metrics API • coredns, kube-dns metrics API • node-exporter • kube-state-metrics Automate Operating Multiple Cluster • Use only Basic Monitoring ◦ Set alerts for Node Status, Cluster Status on the CRDs • We don’t enable Rancher’s Advanced Monitoring ◦ We have/use our own Prometheus, Grafana configuration
  44. Set Alert Resource Status updated by Basic Monitoring Automate Operating

    Multiple Cluster HealthSyncer Monitoring Kind: Cluster Cluster1 Kind: Node Node1 NodeSyncer … omit ... Status: componentStatuses: …. omit … condition: …. omit … … omit ... Status: internalNodeStatus: …. omit … condition: …. omit … Rancher State Metrics rancher_cluster_not_true_condition {cluster="c-2mf2d",condition="<Condition Name>"} {cluster="c-2mf2d",condition="NoMemoryPressure"} rancher_cluster_component_not_true_status {cluster="c-25f4n",exported_component="<Component Name>"} {cluster="c-25f4n",exported_component="controller-manager"} rancher_node_not_true_internal_condition {cluster="c-k87fq",condition="PIDPressure",node="m-2tch7"} rancher_node_not_true_codition {cluster="c-2f6gk",condition="Provisioned",node="m-kkrz6"} metrics
  45. Set Alert Resource Status updated by Basic Monitoring Automate Operating

    Multiple Cluster HealthSyncer Monitoring Kind: Cluster Cluster1 Kind: Node Node1 NodeSyncer … omit ... Status: componentStatuses: …. omit … condition: …. omit … … omit ... Status: internalNodeStatus: …. omit … condition: …. omit … Rancher State Metrics Long Node Provisioning Long Cluster Provisioning Unhealthy Component Status Unhealthy Node Condition Unhealthy Cluster Condition rancher_cluster_not_true_condition {cluster="c-2mf2d",condition="<Condition Name>"} {cluster="c-2mf2d",condition="NoMemoryPressure"} rancher_cluster_component_not_true_status {cluster="c-25f4n",exported_component="<Component Name>"} {cluster="c-25f4n",exported_component="controller-manager"} rancher_node_not_true_internal_condition {cluster="c-k87fq",condition="PIDPressure",node="m-2tch7"} rancher_node_not_true_codition {cluster="c-2f6gk",condition="Provisioned",node="m-kkrz6"} metrics Alerts Configure Alerts
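As an example of how these metrics can be wired to alerts, a Prometheus rule file might look like the following; the metric names come from the slide above, while the thresholds, durations and labels are placeholders:

# Sketch of Prometheus alerting rules over the Rancher state metrics
cat > rancher-state-alerts.yml <<'EOF'
groups:
- name: rancher-state
  rules:
  - alert: UnhealthyClusterComponent
    expr: rancher_cluster_component_not_true_status > 0
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.exported_component }} unhealthy in cluster {{ $labels.cluster }}"
  - alert: UnhealthyNodeCondition
    expr: rancher_node_not_true_internal_condition > 0
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Node {{ $labels.node }} in {{ $labels.cluster }} has condition {{ $labels.condition }}"
EOF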
  46. We can easily notice something wrong quickly Even if the

    number of clusters is more than 90 Long Node Provisioning Long Cluster Provisioning Unhealthy Component Status Unhealthy Node Condition Unhealthy Cluster Condition We can notice each node’s and cluster’s status changes
  47. Does using Rancher solve everything? Automate Operating Multiple Cluster “Yes as long

    as it keeps working well” Is it easy? (・・;)。 。 。
  48. Problems We faced and “solved”

    1. NodeSelector values should always be strings (failed to deploy ingress-nginx, kube-dns, coredns when specifying "XXX.com: true" in nodeSelector; see the sketch below)
    2. Rancher Cluster Agent and Node Agent might hang when something goes wrong in the middle of the WebSocket session handshake
    3. Rancher overrides/deletes the node annotation that flannel internally uses to set up the VTEP on the host
    4. Allow configuring additional tolerations for the cluster-agent and node-agent Rancher will deploy
    5. Clusters with the RKE driver always have an error in the "transitioning" field while provisioning (master, v2.0.8)
    6. deployAgent in node-controller always succeeds even if it failed to run the container (rancher/rancher-agent)
    7. panic: "assignment to entry in nil map" when trying to create a node by calling POST /v3/nodes
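The first problem boils down to nodeSelector being a map of strings in Kubernetes, so boolean-looking label values have to be quoted; a minimal illustration (the label key and image are placeholders):

# Fails: YAML parses the unquoted value as a boolean, but nodeSelector needs strings
#   nodeSelector:
#     example.com/gpu: true
# Works: quote the value so it stays a string
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: nodeselector-example
spec:
  nodeSelector:
    example.com/gpu: "true"
  containers:
  - name: app
    image: nginx
EOF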
  49. Problems We faced and “solved”

    1. NodeSelector values should always be strings (failed to deploy ingress-nginx, kube-dns, coredns when specifying "XXX.com: true" in nodeSelector)
    2. Rancher Cluster Agent and Node Agent might hang when something goes wrong in the middle of the WebSocket session handshake
    3. Rancher overrides/deletes the node annotation that flannel internally uses to set up the VTEP on the host
    4. Allow configuring additional tolerations for the cluster-agent and node-agent Rancher will deploy
    5. Clusters with the RKE driver always have an error in the "transitioning" field while provisioning (master, v2.0.8)
    6. deployAgent in node-controller always succeeds even if it failed to run the container (rancher/rancher-agent)
    7. panic: "assignment to entry in nil map" when trying to create a node by calling POST /v3/nodes
    How can we detect a problem before it becomes a serious outage? How do we troubleshoot? Where would the bottlenecks be at large scale? Where should we pay attention? How do we extend Rancher?
  50. Reached Time Limit Today

    1. NodeSelector values should always be strings (failed to deploy ingress-nginx, kube-dns, coredns when specifying "XXX.com: true" in nodeSelector)
    2. Rancher Cluster Agent and Node Agent might hang when something goes wrong in the middle of the WebSocket session handshake
    3. Rancher overrides/deletes the node annotation that flannel internally uses to set up the VTEP on the host
    4. Allow configuring additional tolerations for the cluster-agent and node-agent Rancher will deploy
    5. Clusters with the RKE driver always have an error in the "transitioning" field while provisioning (master, v2.0.8)
    6. deployAgent in node-controller always succeeds even if it failed to run the container (rancher/rancher-agent)
    7. panic: "assignment to entry in nil map" when trying to create a node by calling POST /v3/nodes
    How can we detect the problems? How do we troubleshoot? How do we grasp what is going on? (covered today) We submitted 1 CFP for KubeCon North America 2019. “If it’s accepted”, let us talk about our story in more detail. (hopefully at KubeCon)
  51. Where we are heading to Kubernetes Cluster Performance Deployment /

    Update Private Cloud Collaboration Managed Kubernetes Kubernetes Operator Kubernetes Solution Architect High Availability Make an effort to keep Kubernetes Cluster stable Keep thinking How we can migrate existing application to Kubernetes Where we focus on Next Where we focus on Next
  52. Quota for each Project & Logging Management 1. Cluster, Node

    Quota 2. Logging Management Automate Operating Multiple Cluster Project A Project B Can create only 1 cluster with 100 nodes Can create only 2 clusters with 200 nodes Database Quota Check Kubernetes Cluster Elasticsearch Maintained by another team Container Log Kubernetes Log Etcd Log Log Rotate Send Logs to Elasticsearch /etc/docker/daemon.json:
    "log-driver": "json-file",
    "log-opts": {
      "max-size": "20m",
      "max-file": "2"
    }
    API
  53. Addon Manager for Addons running on User Cluster 3. Addon

    Manager Kubernetes Cluster Addon Manager DNS Block Storage L4LB L7LB Redis Cinder CSI Provider Plugin LINE Ingress Controller LINE Type LB Implementation LINE Service Operator Verda Private Cloud Family Kubernetes Addons - Deploy Addons - Update Addons - Monitoring Addons