How We build Kubernetes service by Rancher in LINE

LINE Developer Meetup in Taiwan@7 / SDN x Cloud Native Meetup #14
https://cntug.kktix.cc/events/cntug-14

LINE Developers

March 28, 2019

Transcript

  1. None
  2. How We Build k8s Service By Rancher In LINE

    LINE Corporation, Verda team, Feixiang Li
  3. About me

    • Name: Luke
    • 2011 ~ 2015: DevOps engineer at Rakuten
    • 2016/01 ~: Cloud engineer at LINE
      • Baremetal
      • k8s
  4. LINE Private Cloud

    • IaaS
      • VM
      • Baremetal
    • Managed Service
      • Database as a service
      • Redis as a service
      • Etc.
    • Multiple Regions (Tokyo, Osaka, Singapore)
      • 20000 VM, 8000 Baremetal
  5. Why Do We Need KaaS

    • Many teams are running their own k8s on LINE cloud
    • K8s isn’t easy to use
      • Deploy: 1. create servers 2. docker 3. kubeadm (or xxx) 4. etc.
      • Management: logging, monitoring, upgrade etc.
      • Recovery: etcd backup, restore etc.
      • Etc.
    • LINE cloud/infrastructure policy
      • DB ACL
      • Global IP
      • Use other cloud services
      • Etc.
  6. Goal

    • Provide stable k8s service
      • Release users from deployment, upgrade, management etc.
    • Make k8s easier to use
      • Reduce learning cost
      • Provide service templates/guides (e.g. how to build a database service on k8s)
      • Consult/support etc.
    • Integration with LINE cloud services
      • Take care of dependencies, configuration etc.
      • Make it easy to use other cloud services like Redis, object storage etc.

    LINE GKE
  7. How To Do

    • From scratch: k8s, upgrade, auto provision, etcd backup, high availability, recovery, ……
    • Use OSS (diagram labels: OSS, controller, OSS)
  8. Ideal vs Reality

    • Ideal: Web Application Engineer: 3? Kubernetes Engineer: 10? Etcd Engineer: 3?
    • Reality: Engineers: 2
  9. Rancher

    • OpenStack support
    • No dependency on specific software
    • Already used by some teams in production
    • Active community
    • Architecture (k8s operator pattern)
  10. Rancher 2.X

    • OSS to create and manage multiple k8s clusters
    • Uses the k8s operator pattern for implementation
    • Reconcile loop (diagram labels: API, Controller, ClusterA, Watch, Reconcile)
      1. Get the latest information from kube-apiserver
      2. Check whether there is any difference between the desired and actual states
      3. Do something to make the actual state match the desired state
    • Diagram labels: Kubernetes Cluster, Cluster Agent, Node Agent, Create
  11. How We Do

    API Server
    • Works as a proxy
    • Integrates with the private cloud
    • Limits Rancher functions
    • Supports multiple Ranchers

    K8s provider
    • Supports multiple providers

    User k8s cluster
    • VM is created by OpenStack
    • K8s cluster is deployed by Rancher
  12. LINE KaaS

    • 2018/06 ~ 2018/10: Development by 2 developers
    • 2018/11: Released on development environment
    • 47 clusters, 472 nodes
  13. None
  14. None
  15. None
  16. Monitoring System Got Alert

  17. What’s happening?

    (Diagram: the API Server holds a WebSocket session to the Cluster Agent in each Kubernetes Cluster)

    One Cluster Agent failed to establish its WebSocket session
  18. Check log of cluster-agent

        $ kubectl logs -f cattle-cluster-agent-df7f69b68-s7mqg -n cattle-system
        INFO: Environment: CATTLE_ADDRESS=172.18.6.6 CATTLE_CA_CHECKSUM=8b791af7a1dd5f28ca19f8dd689bb816d399ed02753f2472cf25d1eea5c20be1 CATTLE_CLUSTER=true CATTLE_INTERNAL_ADDRESS= CATTLE_K8S_MANAGED=true CATTLE_NODE_NAME=cattle-cluster-agent-df7f69b68-s7mqg CATTLE_SERVER=https://rancher.com
        INFO: Using resolv.conf: nameserver 172.19.0.10 search cattle-system.svc.cluster.local svc.cluster.local cluster.local
        ERROR: https://rancher.com/ping is not accessible (Could not resolve host: rancher.com)

    Somehow this container seems to have failed to resolve the domain name.

        $ kubectl exec -it cattle-cluster-agent-df7f69b68-s7mqg -n cattle-system bash
        root@cattle-cluster-agent-df7f69b68-s7mqg:/# cat /etc/resolv.conf
        nameserver 172.19.0.10 #=> IP from Kubernetes Network
        search cattle-system.svc.cluster.local svc.cluster.local cluster.local

        $ kubectl get svc -n kube-system
        NAME      TYPE       CLUSTER-IP   EXTERNAL-IP  PORT(S)
        kube-dns  ClusterIP  172.19.0.10  <none>       53/UDP,53/TCP
  19. Kube DNS problem?

        $ kubectl logs -f cattle-node-agent-c5ddm -n cattle-system
        INFO: https://rancher.com/ping is accessible
        INFO: Value from https://rancher.com/v3/settings/cacerts is an x509 certificate
        time="2019-02-26T21:30:21Z" level=info msg="Rancher agent version a9164915-dirty is starting"
        time="2019-02-26T21:30:21Z" level=info msg="Listening on /tmp/log.sock"
        time="2019-02-26T21:30:21Z" level=info msg="Option customConfig=map[address:XXXX internalAddress: roles:[] label:map[]]"
        time="2019-02-26T21:30:21Z" level=info msg="Option etcd=false"
        time="2019-02-26T21:30:21Z" level=info msg="Option controlPlane=false"
        time="2019-02-26T21:30:21Z" level=info msg="Option worker=false"
        time="2019-02-26T21:30:21Z" level=info msg="Option requestedHostname=yuki-testw1"
        time="2019-02-26T21:30:21Z" level=info msg="Connecting to wss://rancher.com/v3/connect with token"
        time="2019-02-26T21:30:21Z" level=info msg="Connecting to proxy" url="wss://rancher.com/v3/connect"
        time="2019-02-26T21:30:21Z" level=info msg="Starting plan monitor"

    The server is reachable from a container which isn’t using kube-dns.

        $ kubectl exec -it cattle-node-agent-c5ddm -n cattle-system bash
        root@yuki-testc3:/# cat /etc/resolv.conf
        nameserver 8.8.8.8

    • Something wrong with kube-dns
    • Something wrong with that container
  20. Check kube-dns

    kube-dns seems to have no problem
      ◦ No error in the kube-dns log
      ◦ Another container which is using kube-dns can resolve DNS

        $ kubectl logs -l k8s-app=kube-dns -c kubedns -n kube-system | grep '^E'
        $ # no error detected
        $ kubectl run -it busybox --image busybox -- sh
        / # nslookup google.com
        Server: 172.19.0.10
        Address: 172.19.0.10:53
        Non-authoritative answer:
        Name: google.com
        Address: 216.58.196.238
  21. Container with problem

    • Container itself
      ◦ container image
      ◦ container network policy etc.
    • Network
      ◦ nodes
      ◦ container network

    (Diagram: Node1 and Node2, each with its own container network, connected via eth0)
  22. Deploy the container on another node

        $ kubectl logs -f cattle-cluster-agent-5d859cbb48-xb77r -n cattle-system
        INFO: Environment: CATTLE_ADDRESS=172.18.6.6 CATTLE_CA_CHECKSUM=8b791af7a1dd5f28ca19f8dd689bb816d399ed02753f2472cf25d1eea5c20be1 CATTLE_CLUSTER=true CATTLE_INTERNAL_ADDRESS= CATTLE_K8S_MANAGED=true CATTLE_NODE_NAME=cattle-cluster-agent-5d859cbb48-xb77r CATTLE_SERVER=https://rancher.com
        INFO: Using resolv.conf: nameserver 172.19.0.10 search cattle-system.svc.cluster.local svc.cluster.local cluster.local
        INFO: https://rancher.com/ping is accessible

    (Diagram: Kubernetes cluster with the Cluster Agent moved from node1 to node2)

    The container itself seems to have no problem.
  23. Check the node

    (Diagram: Node1 and Node2, each running a busybox container)

        $ kubectl exec -it busybox2 sh
        / # ping 172.18.9.3 -w1
        2 packets transmitted, 0 packets received, 100% packet loss

        $ tcpdump -i eth0 port 8472 and src host <node 1>
        … # nothing output

    The container network has a problem, while the nodes themselves can reach each other:

        $ ping <node2>
        PING <> (<ip>): 56 data bytes
        64 bytes from <ip>: icmp_seq=0 ttl=58 time=3.571 ms
  24. Who is responsible for the container network

    • Building the container network is just a precondition of Kubernetes
    • It is supposed to be done outside of Kubernetes
  25. Look into container network

    What we use
    • Flannel, which is used to connect Linux containers
    • Flannel supports multiple backends like vxlan, ipip, ...

    Responsibility of Flannel in our case
    • Configure the Linux kernel to create the termination device of the overlay network
    • Configure the Linux kernel's routing, bridging, and related subsystems so that containers can reach other containers
    • Flannel does not forward the actual packets; it only does configuration
  26. How flannel works

    • Ping Pod B from Pod A
    • Pod subnet: 172.xx
    • Host subnet: 10.xx

    Host of Pod A
      Pod A eth0: 172.17.1.2/24
      cni0: 172.17.1.1/24
      flannel.1: 172.17.1.0/32
      eth0: 10.0.0.1/24

    Host of Pod B
      Pod B eth0: 172.17.2.2/24
      cni0: 172.17.2.1/24
      flannel.1: 172.17.2.0/32, mac: cc:cc:cc:cc:cc:cc
      eth0: 10.0.0.2/24

    1 route, 1 arp entry, 1 fdb entry per host (shown on the host of Pod A):

        $ ip r
        172.17.2.0/24 via 172.17.2.0 dev flannel.1 onlink
        $ ip n
        172.17.2.0 flannel.1 lladdr cc:cc:cc:cc:cc:cc
        $ bridge fdb show dev flannel.1
        cc:cc:cc:cc:cc:cc dev flannel.1 dst 10.0.0.2
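
To make the "1 route, 1 arp entry, 1 fdb entry per host" rule concrete, here is a minimal Go sketch that derives the three entries for one remote host. It is not flannel's implementation: the `Peer` type and `entriesFor` function are hypothetical, and the output strings only mirror the notation used on the slide rather than the exact output of `ip` or `bridge`.

```go
package main

import (
	"fmt"
	"net"
)

// Peer is a hypothetical record of what a host must know about one
// remote host in order to reach the pods running on it.
type Peer struct {
	PodSubnet string // pod subnet of the remote host, e.g. "172.17.2.0/24"
	VtepMAC   string // MAC address of the remote flannel.1 device
	PublicIP  string // host-network IP of the remote host
}

// entriesFor returns the route, ARP, and FDB entries for a peer,
// written in the slide's notation.
func entriesFor(p Peer) (route, arp, fdb string, err error) {
	_, subnet, err := net.ParseCIDR(p.PodSubnet)
	if err != nil {
		return "", "", "", err
	}
	// The remote flannel.1 device uses the network address of the subnet.
	vtepIP := subnet.IP.String()

	route = fmt.Sprintf("%s via %s dev flannel.1 onlink", subnet.String(), vtepIP)
	arp = fmt.Sprintf("%s flannel.1 lladdr %s", vtepIP, p.VtepMAC)
	fdb = fmt.Sprintf("%s dev flannel.1 dst %s", p.VtepMAC, p.PublicIP)
	return route, arp, fdb, nil
}

func main() {
	peerB := Peer{
		PodSubnet: "172.17.2.0/24",
		VtepMAC:   "cc:cc:cc:cc:cc:cc",
		PublicIP:  "10.0.0.2",
	}
	route, arp, fdb, err := entriesFor(peerB)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("route:", route)
	fmt.Println("arp:  ", arp)
	fmt.Println("fdb:  ", fdb)
}
```

Read together, the three entries chain the lookup: the route sends the pod subnet to the remote flannel.1 address, the ARP entry maps that address to the VTEP MAC, and the FDB entry maps the MAC to the host IP that receives the VXLAN packet.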
  27. Check network configuration

    Node1 (172.17.1.0/24)
      Routing table
        172.17.1.0/24 dev cni
        172.17.2.0/24 via 172.17.2.0 dev flannel.1
        172.17.3.0/24 via 172.17.3.0 dev flannel.1
      ARP cache
        172.17.2.0 flannel.1 lladdr bb:bb:bb:bb:bb:bb
        172.17.3.0 flannel.1 lladdr cc:cc:cc:cc:cc:cc
      FDB
        bb:bb:bb:bb:bb:bb dev flannel.1 dst 10.0.0.2
        cc:cc:cc:cc:cc:cc dev flannel.1 dst 10.0.0.3

    Node2 (172.17.2.0/24)
      Routing table
        172.17.1.0/24 via 172.17.1.0 dev flannel.1
        172.17.2.0/24 dev cni
        172.17.3.0/24 via 172.17.3.0 dev flannel.1
      ARP cache
        172.17.1.0 flannel.1 lladdr aa:aa:aa:aa:aa:aa
        172.17.3.0 flannel.1 lladdr cc:cc:cc:cc:cc:cc
      FDB
        aa:aa:aa:aa:aa:aa dev flannel.1 dst 10.0.0.1
        cc:cc:cc:cc:cc:cc dev flannel.1 dst 10.0.0.3

    Node3 (172.17.3.0/24)
      Routing table
        172.17.2.0/24 via 172.17.2.0 dev flannel.1
        172.17.3.0/24 dev cni
      ARP cache
        172.17.2.0 flannel.1 lladdr bb:bb:bb:bb:bb:bb
      FDB
        bb:bb:bb:bb:bb:bb dev flannel.1 dst 10.0.0.2

    Node1 related information is missing on Node3
  28. Flannel problem?

    Checked the flannel agent log of the target node and found nothing

        $ kubectl logs kube-flannel-knwd7 -n kube-system kube-flannel | grep -v '^E'
  29. Look into flannel

        $ kubectl get node yuki-testc1 -o yaml
        apiVersion: v1
        kind: Node
        metadata:
          annotations:
            flannel.alpha.coreos.com/backend-data: '{"VtepMAC":"9a:4f:ef:9c:2e:2f"}'
            flannel.alpha.coreos.com/backend-type: vxlan
            flannel.alpha.coreos.com/kube-subnet-manager: "true"
            flannel.alpha.coreos.com/public-ip: 10.0.0.1

    The flannel agent will
    • save node-specific metadata into the k8s node annotations when flannel starts
    • set up route, fdb, and arp cache entries for each node that has flannel annotations

    All nodes should have these annotations
  30. Check k8s node annotation

    Node2

        apiVersion: v1
        kind: Node
        metadata:
          annotations:
            flannel.alpha.coreos.com/backend-data: '{"VtepMAC":"bb:bb:bb:bb:bb:bb"}'
            flannel.alpha.coreos.com/backend-type: vxlan
            flannel.alpha.coreos.com/kube-subnet-manager: "true"
            flannel.alpha.coreos.com/public-ip: 10.0.0.2
            rke.cattle.io/external-ip: 10.0.0.2

    Node3

        apiVersion: v1
        kind: Node
        metadata:
          annotations:
            flannel.alpha.coreos.com/backend-data: '{"VtepMAC":"cc:cc:cc:cc:cc:cc"}'
            flannel.alpha.coreos.com/backend-type: vxlan
            flannel.alpha.coreos.com/kube-subnet-manager: "true"
            flannel.alpha.coreos.com/public-ip: 10.0.0.3
            rke.cattle.io/external-ip: 10.0.0.3

    Node1 (missing flannel-related annotations)

        apiVersion: v1
        kind: Node
        metadata:
          annotations:
            rke.cattle.io/external-ip: 10.0.0.1

    Flannel running on node3 could not configure the network for node1 because there is no annotation
    ⇒ Why doesn’t node1 have the flannel-related annotations?
    ⇒ Why does node2 have node1’s network information?
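
The manual comparison on this slide can be expressed as a small check. The following Go sketch works on plain maps rather than on the Kubernetes API; `nodesMissingFlannel` is a hypothetical helper, and treating `backend-data`, `backend-type`, and `public-ip` as the required keys is an assumption based only on the annotations shown above.

```go
package main

import (
	"fmt"
	"sort"
)

const flannelPrefix = "flannel.alpha.coreos.com/"

// requiredKeys is an assumed set of annotations a peer needs in order to
// be configured; it is taken from the slide, not from flannel's source.
var requiredKeys = []string{"backend-data", "backend-type", "public-ip"}

// nodesMissingFlannel returns the sorted names of nodes that lack at
// least one of the required flannel annotations.
func nodesMissingFlannel(nodes map[string]map[string]string) []string {
	var missing []string
	for name, annotations := range nodes {
		for _, key := range requiredKeys {
			if _, ok := annotations[flannelPrefix+key]; !ok {
				missing = append(missing, name)
				break
			}
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Annotation data copied from the slide.
	nodes := map[string]map[string]string{
		"node1": {
			"rke.cattle.io/external-ip": "10.0.0.1",
		},
		"node2": {
			flannelPrefix + "backend-data": `{"VtepMAC":"bb:bb:bb:bb:bb:bb"}`,
			flannelPrefix + "backend-type": "vxlan",
			flannelPrefix + "public-ip":    "10.0.0.2",
			"rke.cattle.io/external-ip":    "10.0.0.2",
		},
		"node3": {
			flannelPrefix + "backend-data": `{"VtepMAC":"cc:cc:cc:cc:cc:cc"}`,
			flannelPrefix + "backend-type": "vxlan",
			flannelPrefix + "public-ip":    "10.0.0.3",
			"rke.cattle.io/external-ip":    "10.0.0.3",
		},
	}
	fmt.Println("nodes missing flannel annotations:", nodesMissingFlannel(nodes))
}
```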
  31. Annotation is changed by someone else

        apiVersion: v1
        kind: Node
        metadata:
          annotations:
            flannel.alpha.coreos.com/backend-data: '{"VtepMAC":"bb:bb:bb:bb:bb:bb"}'
            flannel.alpha.coreos.com/backend-type: vxlan
            flannel.alpha.coreos.com/kube-subnet-manager: "true"
            flannel.alpha.coreos.com/public-ip: 10.0.0.2
            rke.cattle.io/external-ip: 10.0.0.2

        apiVersion: v1
        kind: Node
        metadata:
          annotations:
            flannel.alpha.coreos.com/backend-data: '{"VtepMAC":"cc:cc:cc:cc:cc:cc"}'
            flannel.alpha.coreos.com/backend-type: vxlan
            flannel.alpha.coreos.com/kube-subnet-manager: "true"
            flannel.alpha.coreos.com/public-ip: 10.0.0.3
            rke.cattle.io/external-ip: 10.0.0.3

        apiVersion: v1
        kind: Node
        metadata:
          annotations:
            rke.cattle.io/external-ip: 10.0.0.1

    Rancher also updates the annotations
  32. How rancher works

    When Rancher builds Kubernetes nodes:
      1. Get current node annotation
      2. Build desired annotation
      3. Get node resource
      4. Replace annotation with desired one
      5. Update node with desired annotation

    (Diagram labels: Function A, Function B)

    This operation is NOT atomic. It ignores optimistic locking.
  33. Look back on what happened

    • Node1 is up
    • Flannel on node1 updates the node annotations
    • Flannel on node2 adds the configuration for node1
    • Get node1 annotations
    • Node3 is up
    • Override node1 annotations
  34. Wrote a patch and confirmed it works

        diff --git a/pkg/controllers/user/nodesyncer/nodessyncer.go b/pkg/controllers/user/nodesyncer/nodessyncer.go
        index 11cc9c4e..64526ccf 100644
        --- a/pkg/controllers/user/nodesyncer/nodessyncer.go
        +++ b/pkg/controllers/user/nodesyncer/nodessyncer.go
        @@ -143,7 +143,19 @@ func (m *NodesSyncer) syncLabels(key string, obj *v3.Node) error {
             toUpdate.Labels = obj.Spec.DesiredNodeLabels
           }
           if updateAnnotations {
        -    toUpdate.Annotations = obj.Spec.DesiredNodeAnnotations
        +    // NOTE: This is just a workaround.
        +    // There are multiple solutions to the problem of https://github.com/rancher/rancher/issues/13644
        +    // and this problem is a kind of design bug, so solving the root cause of the problem needs design decisions.
        +    // That's why for now we solved the problem with a solution which doesn't have to change many places, because
        +    // we don't want to create/maintain a large change which has a high possibility of not being merged upstream.
        +    // The Rancher community tends to hesitate to merge big changes created by engineers not from Rancher Labs.
        +
        +    // The solution is to change NodeSyncer so as not to replace annotations with desiredAnnotations but just update the annotations which
        +    // are specified in desiredAnnotations. This change has the side effect that users can no longer delete an existing annotation
        +    // via desiredAnnotations, but we believe this case is not so common, so we chose this solution.
        +    for k, v := range obj.Spec.DesiredNodeAnnotations {
        +      toUpdate.Annotations[k] = v
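
The difference between replacing and merging annotations can be shown in isolation. The Go sketch below is not the patch itself: `mergeAnnotations` is a hypothetical helper that copies into a new map, which also avoids writing into a nil map if the node has no annotations yet.

```go
package main

import "fmt"

// mergeAnnotations returns a new map holding every current annotation,
// with the desired annotations added or overwritten on top.
// Keys that exist only in current are kept, so annotations written by
// other components (such as flannel) survive the update.
func mergeAnnotations(current, desired map[string]string) map[string]string {
	merged := make(map[string]string, len(current)+len(desired))
	for k, v := range current {
		merged[k] = v
	}
	for k, v := range desired {
		merged[k] = v
	}
	return merged
}

func main() {
	current := map[string]string{
		"flannel.alpha.coreos.com/public-ip": "10.0.0.1",
	}
	desired := map[string]string{
		"rke.cattle.io/external-ip": "10.0.0.1",
	}

	// Replacing would leave only the desired key and drop flannel's.
	replaced := desired
	merged := mergeAnnotations(current, desired)

	fmt.Println("replaced keeps flannel key:", replaced["flannel.alpha.coreos.com/public-ip"] != "")
	fmt.Println("merged keeps flannel key:  ", merged["flannel.alpha.coreos.com/public-ip"] != "")
}
```

As the patch comment notes, the trade-off is that removing a key from the desired annotations no longer removes it from the node.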
  35. Reporting/Proposing to OSS Community

  36. Summary of troubleshooting

    One of the agents failed to connect to the Rancher server
    • Rancher Agent itself?
    • Kube-dns?
    • Kubernetes?
    • Rancher Server? => Need to read code
    • Flannel? => Need to read code

    Don’t stop investigating/diving into a problem until you understand the root cause
    => There were chances to stop digging and just apply a workaround for now
    • Kube DNS problem?: As some information on the internet describes, if we had deployed kube-dns on all nodes, this problem would have been hidden
    • Flannel problem?: If we had just assumed that the flannel annotations disappeared for some trivial reason and fixed them manually, this problem would have been hidden
  37. Essence of Operation for complicated systems

    • Be aware of the responsibility of each piece of software
    • Where a problem appears is not always where the root cause is
    • Don’t stop investigating until you find the root cause
  38. Community Contribution

    Pull Requests
    • Rancher (8 PRs)
      ◦ https://github.com/rancher/norman/pull/201
      ◦ https://github.com/rancher/norman/pull/202
      ◦ https://github.com/rancher/norman/pull/203
      ◦ https://github.com/rancher/machine/pull/12
      ◦ https://github.com/rancher/types/pull/525
      ◦ https://github.com/rancher/rancher/pull/16044
      ◦ https://github.com/rancher/rancher/pull/15991
      ◦ https://github.com/rancher/rancher/pull/15909
    • Ingress-Nginx (1 PR)
      ◦ https://github.com/kubernetes/ingress-nginx/pull/3270

    Rancher Source Code Deep Dive
    https://www.slideshare.net/linecorp/lets-unbox-rancher-20-v200
  39. We Are Hiring

    • 2 offices, 4 members
      ◦ Tokyo: 3
      ◦ Kyoto: 1
      ◦ Taiwan: waiting for you
    • English
  40. None