How We build Kubernetes service by Rancher in LINE

How We Build k8s Service By Rancher In LINE
LINE Corporation, Verda team
Feixiang Li

About me
• Name: Luke
• 2011 ~ 2015: DevOps engineer at Rakuten
• 2016/01 ~: Cloud engineer at LINE
• Baremetal
• k8s

LINE Private Cloud
• IaaS
• VM
• Baremetal
• Managed Service
• Database as service
• Redis as service
• Etc.
• Multiple Regions
• 20000 VM, 8000 Baremetal
[Map labels: Tokyo, Osaka, Singapore]

Why Do We Need KaaS
• Many teams are using their own k8s on LINE cloud
• K8s isn’t easy to use
• Deploy: 1. create servers 2. docker 3. kubeadm (or xxx) 4. etc.
• Management: logging, monitoring, upgrade etc.
• Recovery: etcd backup, restore etc.
• Etc.
• LINE cloud/infrastructure policy
• DB ACL
• Global IP
• Use other cloud services
• Etc.

Goal
• Provide stable k8s service
• Release users from deployment, upgrade, management etc.
• Make k8s easier to use
• Reduce learning cost
• Provide service template/guide (like how to build database service on k8s)
• Consult/support etc.
• Integration with LINE cloud services
• Take care of dependency, configuration etc.
• Make it easy to use other cloud services like Redis, object storage etc.
LINE GKE

How To Do
• From scratch: k8s upgrade, auto provision, etcd backup, high availability, recovery, ……
• Use OSS
[Diagram labels: OSS, controller, OSS]

Ideal vs Reality
• Ideal
◦ Web Application Engineer: 3?
◦ Kubernetes Engineer: 10?
◦ Etcd Engineer: 3?
• Reality
◦ Engineers: 2

Rancher
• OpenStack support
• No dependency on specific software
• Already used by some teams in production
• Active community
• Architecture (k8s operator pattern)

Rancher 2.X
• OSS to create and manage multiple k8s clusters
• Uses the k8s operator pattern for implementation
Reconcile loop (the controller watches ClusterA through the API):
1. Get the latest information from kube-apiserver
2. Check whether there is any difference between the desired and actual states
3. Do something to make the actual state match the desired state
[Diagram: the controller creates the Kubernetes cluster, which runs a Cluster Agent and Node Agents]
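The reconcile loop can be sketched in a few lines of Go. This is only an illustration of the pattern, not Rancher's controller code: `ClusterState`, `diff`, and `reconcile` are hypothetical names, and the "cluster" is just a map of node names to roles.

```go
package main

import "fmt"

// ClusterState is a hypothetical, simplified cluster view: node name -> role.
type ClusterState map[string]string

// diff returns the node names that must be created and removed
// to turn the actual state into the desired state.
func diff(desired, actual ClusterState) (create, remove []string) {
	for name := range desired {
		if _, ok := actual[name]; !ok {
			create = append(create, name)
		}
	}
	for name := range actual {
		if _, ok := desired[name]; !ok {
			remove = append(remove, name)
		}
	}
	return create, remove
}

// reconcile runs one pass of the loop: compare both states, then act on
// the difference. It reports whether it had to change anything.
func reconcile(desired, actual ClusterState) bool {
	create, remove := diff(desired, actual)
	for _, name := range create {
		actual[name] = desired[name]
	}
	for _, name := range remove {
		delete(actual, name)
	}
	return len(create)+len(remove) > 0
}

func main() {
	desired := ClusterState{"node1": "etcd", "node2": "worker"}
	actual := ClusterState{"node2": "worker", "stale": "worker"}
	// Keep reconciling until a pass finds nothing left to do.
	for reconcile(desired, actual) {
		fmt.Println("reconciled, checking again")
	}
	fmt.Println(len(actual), "nodes in actual state")
}
```

A real controller is triggered by watch events rather than a busy loop, and this sketch ignores role changes on existing nodes.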

How We Do
API Server
• Works as proxy
• Integrates with private cloud
• Limits Rancher functions
• Supports multiple Ranchers
K8s provider
• Supports multiple providers
User k8s cluster
• VM is created by OpenStack
• K8s cluster is deployed by Rancher

LINE KaaS
• 2018/06 ~ 2018/10: Development by 2 developers
• 2018/11: Released on development environment
• 47 clusters, 472 nodes

Monitoring System Got Alert

What’s happening?
[Diagram: the API Server is connected over WebSocket to the Cluster Agent in each Kubernetes cluster]
• Failed to establish WebSocket session

Check log of cluster-agent
$ kubectl logs -f cattle-cluster-agent-df7f69b68-s7mqg -n cattle-system
INFO: Environment: CATTLE_ADDRESS=172.18.6.6 CATTLE_CA_CHECKSUM=8b791af7a1dd5f28ca19f8dd689bb816d399ed02753f2472cf25d1eea5c20be1 CATTLE_CLUSTER=true CATTLE_INTERNAL_ADDRESS= CATTLE_K8S_MANAGED=true CATTLE_NODE_NAME=cattle-cluster-agent-df7f69b68-s7mqg CATTLE_SERVER=https://rancher.com
INFO: Using resolv.conf: nameserver 172.19.0.10 search cattle-system.svc.cluster.local svc.cluster.local cluster.local
ERROR: https://rancher.com/ping is not accessible (Could not resolve host: rancher.com)
Somehow this container fails to resolve the domain name.
$ kubectl exec -it cattle-cluster-agent-df7f69b68-s7mqg -n cattle-system bash
root@cattle-cluster-agent-df7f69b68-s7mqg:/# cat /etc/resolv.conf
nameserver 172.19.0.10 #=> IP from Kubernetes Network
search cattle-system.svc.cluster.local svc.cluster.local cluster.local
$ kubectl get svc -n kube-system
NAME       TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)
kube-dns   ClusterIP   172.19.0.10   <none>        53/UDP,53/TCP

Kube DNS problem?
$ kubectl logs -f cattle-node-agent-c5ddm -n cattle-system
INFO: https://rancher.com/ping is accessible
INFO: Value from https://rancher.com/v3/settings/cacerts is an x509 certificate
time="2019-02-26T21:30:21Z" level=info msg="Rancher agent version a9164915-dirty is starting"
time="2019-02-26T21:30:21Z" level=info msg="Listening on /tmp/log.sock"
time="2019-02-26T21:30:21Z" level=info msg="Option customConfig=map[address:XXXX internalAddress: roles:[] label:map[]]"
time="2019-02-26T21:30:21Z" level=info msg="Option etcd=false"
time="2019-02-26T21:30:21Z" level=info msg="Option controlPlane=false"
time="2019-02-26T21:30:21Z" level=info msg="Option worker=false"
time="2019-02-26T21:30:21Z" level=info msg="Option requestedHostname=yuki-testw1"
time="2019-02-26T21:30:21Z" level=info msg="Connecting to wss://rancher.com/v3/connect with token"
time="2019-02-26T21:30:21Z" level=info msg="Connecting to proxy" url="wss://rancher.com/v3/connect"
time="2019-02-26T21:30:21Z" level=info msg="Starting plan monitor"
A container that isn’t using kube-dns can reach the Rancher server.
$ kubectl exec -it cattle-node-agent-c5ddm -n cattle-system bash
root@yuki-testc3:/# cat /etc/resolv.conf
nameserver 8.8.8.8
Possible causes:
• Something wrong with kube-dns
• Something wrong with that container
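The comparison made on these two slides (the failing pod resolves through the kube-dns ClusterIP, the working pod uses another resolver) can be automated. The Go sketch below is an assumption-laden illustration: `nameservers` and `usesClusterDNS` are hypothetical helpers, and the resolv.conf contents are copied from the output shown in the talk.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// nameservers extracts the nameserver addresses from resolv.conf content.
func nameservers(resolvConf string) []string {
	var out []string
	sc := bufio.NewScanner(strings.NewReader(resolvConf))
	for sc.Scan() {
		line := sc.Text()
		// Drop trailing comments such as "#=> IP from Kubernetes Network".
		if i := strings.IndexAny(line, "#;"); i >= 0 {
			line = line[:i]
		}
		fields := strings.Fields(line)
		if len(fields) >= 2 && fields[0] == "nameserver" {
			out = append(out, fields[1])
		}
	}
	return out
}

// usesClusterDNS reports whether any nameserver equals the kube-dns ClusterIP.
func usesClusterDNS(resolvConf, clusterDNSIP string) bool {
	for _, ns := range nameservers(resolvConf) {
		if ns == clusterDNSIP {
			return true
		}
	}
	return false
}

func main() {
	const kubeDNS = "172.19.0.10" // CLUSTER-IP of the kube-dns service above
	clusterAgent := "nameserver 172.19.0.10 #=> IP from Kubernetes Network\n" +
		"search cattle-system.svc.cluster.local svc.cluster.local cluster.local\n"
	nodeAgent := "nameserver 8.8.8.8\n"
	fmt.Println("cluster agent uses kube-dns:", usesClusterDNS(clusterAgent, kubeDNS))
	fmt.Println("node agent uses kube-dns:", usesClusterDNS(nodeAgent, kubeDNS))
}
```

Knowing which resolver each pod uses narrows the search to the path between the failing pod and kube-dns.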

Check kube-dns
kube-dns seems to have no problem
◦ No error in the kube-dns log
◦ Another container which is using kube-dns can resolve DNS
$ kubectl logs -l k8s-app=kube-dns -c kubedns -n kube-system | grep '^E'
$ # no error detected
$ kubectl run -it busybox --image busybox -- sh
/ # nslookup google.com
Server: 172.19.0.10
Address: 172.19.0.10:53
Non-authoritative answer:
Name: google.com
Address: 216.58.196.238

Container with problem
• Container itself
◦ container image
◦ container network policy etc.
• Network
◦ nodes
◦ container network
[Diagram: Node1 and Node2, each with its own container network, connected through eth0]

Deploy the container on another node
$ kubectl logs -f cattle-cluster-agent-5d859cbb48-xb77r -n cattle-system
INFO: Environment: CATTLE_ADDRESS=172.18.6.6 CATTLE_CA_CHECKSUM=8b791af7a1dd5f28ca19f8dd689bb816d399ed02753f2472cf25d1eea5c20be1 CATTLE_CLUSTER=true CATTLE_INTERNAL_ADDRESS= CATTLE_K8S_MANAGED=true CATTLE_NODE_NAME=cattle-cluster-agent-5d859cbb48-xb77r CATTLE_SERVER=https://rancher.com
INFO: Using resolv.conf: nameserver 172.19.0.10 search cattle-system.svc.cluster.local svc.cluster.local cluster.local
INFO: https://rancher.com/ping is accessible
[Diagram: Kubernetes cluster with the Cluster Agent moved from node1 to node2]
The container itself seems to have no problem.

Check the node
[Diagram: a busybox pod on each of Node1 and Node2]
$ kubectl exec -it busybox2 sh
/ # ping 172.18.9.3 -w1
2 packets transmitted, 0 packets received, 100% packet loss
$ tcpdump -i eth0 port 8472 and src host <node 1>
… # nothing output
$ ping <node2>
PING <> (<ip>): 56 data bytes
64 bytes from <ip>: icmp_seq=0 ttl=58 time=3.571 ms
Node-to-node ping works, but pod-to-pod packets never arrive: the container network has a problem.

Who is responsible for container network
• Building the container network is just a precondition of Kubernetes
• It is supposed to be done outside of Kubernetes

Look into container network
What we use
• Flannel, which is used to connect Linux containers
• Flannel supports multiple backends like vxlan, ipip….
Responsibility of Flannel in our case
• Configure the Linux kernel to create the termination device of the overlay network
• Configure the Linux kernel's routing, bridging, and related subsystems so that a container can reach other containers
• Flannel does not forward the actual packets; it only applies configuration

How flannel works
• Ping Pod B from Pod A
• Pod subnet: 172.xx
• Host subnet: 10.xx
Host of Pod A (eth0: 10.0.0.1/24)
  Pod A eth0: 172.17.1.2/24
  cni0: 172.17.1.1/24
  flannel.1: 172.17.1.0/32
  $ ip r
  172.17.2.0/24 via 172.17.2.0 dev flannel.1 onlink
  $ ip n
  172.17.2.0 flannel.1 lladdr cc:cc:cc:cc:cc:cc
  $ bridge fdb show dev flannel.1
  cc:cc:cc:cc:cc:cc dev flannel.1 dst 10.0.0.2
Host of Pod B (eth0: 10.0.0.2/24)
  Pod B eth0: 172.17.2.2/24
  cni0: 172.17.2.1/24
  flannel.1: 172.17.2.0/32, mac: cc:cc:cc:cc:cc:cc
1 route, 1 arp entry, 1 fdb entry per host
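The "1 route, 1 arp entry, 1 fdb entry per host" rule can be made concrete with a small Go sketch that derives the three entries for one remote host. It only renders text in the same shape as the output on the slide; `Peer` and `entries` are hypothetical names, and this is not flannel's implementation, which programs the kernel directly.

```go
package main

import (
	"fmt"
	"net"
)

// Peer is a hypothetical description of a remote host in the overlay.
type Peer struct {
	PodSubnet string // e.g. "172.17.2.0/24"
	VtepMAC   string // MAC address of the peer's flannel.1 device
	PublicIP  string // host address of the peer, e.g. "10.0.0.2"
}

// entries renders the route, ARP, and FDB entries needed to reach one peer.
func entries(p Peer) (route, arp, fdb string, err error) {
	_, subnet, err := net.ParseCIDR(p.PodSubnet)
	if err != nil {
		return "", "", "", err
	}
	if _, err := net.ParseMAC(p.VtepMAC); err != nil {
		return "", "", "", err
	}
	if net.ParseIP(p.PublicIP) == nil {
		return "", "", "", fmt.Errorf("invalid public IP %q", p.PublicIP)
	}
	// As on the slide, the peer's flannel.1 address is the subnet's network address.
	gw := subnet.IP.String()
	route = fmt.Sprintf("%s via %s dev flannel.1 onlink", subnet, gw)
	arp = fmt.Sprintf("%s flannel.1 lladdr %s", gw, p.VtepMAC)
	fdb = fmt.Sprintf("%s dev flannel.1 dst %s", p.VtepMAC, p.PublicIP)
	return route, arp, fdb, nil
}

func main() {
	// Values for the host of Pod B, taken from the slide.
	route, arp, fdb, err := entries(Peer{
		PodSubnet: "172.17.2.0/24",
		VtepMAC:   "cc:cc:cc:cc:cc:cc",
		PublicIP:  "10.0.0.2",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("route:", route)
	fmt.Println("arp:  ", arp)
	fmt.Println("fdb:  ", fdb)
}
```

If any of the three inputs is unknown for a peer, none of the entries can be built, so that peer's pods are unreachable from this host.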

Check network configuration
Node1 (172.17.1.0/24)
  Routing table
    172.17.1.0/24 dev cni
    172.17.2.0/24 via 172.17.2.0 dev flannel.1
    172.17.3.0/24 via 172.17.3.0 dev flannel.1
  ARP cache
    172.17.2.0 flannel.1 lladdr bb:bb:bb:bb:bb:bb
    172.17.3.0 flannel.1 lladdr cc:cc:cc:cc:cc:cc
  FDB
    bb:bb:bb:bb:bb:bb dev flannel.1 dst 10.0.0.2
    cc:cc:cc:cc:cc:cc dev flannel.1 dst 10.0.0.3
Node2 (172.17.2.0/24)
  Routing table
    172.17.1.0/24 via 172.17.1.0 dev flannel.1
    172.17.2.0/24 dev cni
    172.17.3.0/24 via 172.17.3.0 dev flannel.1
  ARP cache
    172.17.1.0 flannel.1 lladdr aa:aa:aa:aa:aa:aa
    172.17.3.0 flannel.1 lladdr cc:cc:cc:cc:cc:cc
  FDB
    aa:aa:aa:aa:aa:aa dev flannel.1 dst 10.0.0.1
    cc:cc:cc:cc:cc:cc dev flannel.1 dst 10.0.0.3
Node3 (172.17.3.0/24)
  Routing table
    172.17.2.0/24 via 172.17.2.0 dev flannel.1
    172.17.3.0/24 dev cni
  ARP cache
    172.17.2.0 flannel.1 lladdr bb:bb:bb:bb:bb:bb
  FDB
    bb:bb:bb:bb:bb:bb dev flannel.1 dst 10.0.0.2
=> Node1 related information is missing on Node3

Flannel problem?
Checked the flannel agent log of the target node and found nothing
$ kubectl logs kube-flannel-knwd7 -n kube-system kube-flannel | grep '^E'

Look into flannel
$ kubectl get node yuki-testc1 -o yaml
apiVersion: v1
kind: Node
metadata:
  annotations:
    flannel.alpha.coreos.com/backend-data: '{"VtepMAC":"9a:4f:ef:9c:2e:2f"}'
    flannel.alpha.coreos.com/backend-type: vxlan
    flannel.alpha.coreos.com/kube-subnet-manager: "true"
    flannel.alpha.coreos.com/public-ip: 10.0.0.1
The flannel agent will
• save node-specific metadata into the k8s node annotations when flannel starts
• set up route, fdb, and arp cache entries if there’s a node with flannel annotations
All nodes should have these annotations
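To show what a peer can learn from these annotations, here is a hedged Go sketch that parses them. The annotation keys and the `VtepMAC` JSON field are the ones shown on the slide; `FlannelInfo` and `parseFlannel` are hypothetical names, not flannel's own types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Annotation keys as they appear in the node YAML above.
const (
	backendDataKey = "flannel.alpha.coreos.com/backend-data"
	backendTypeKey = "flannel.alpha.coreos.com/backend-type"
	publicIPKey    = "flannel.alpha.coreos.com/public-ip"
)

// FlannelInfo is a hypothetical struct with what a peer needs to know about a node.
type FlannelInfo struct {
	BackendType string
	PublicIP    string
	VtepMAC     string
}

// parseFlannel extracts flannel's data from node annotations and fails
// when any of the three keys is absent.
func parseFlannel(annotations map[string]string) (FlannelInfo, error) {
	for _, key := range []string{backendDataKey, backendTypeKey, publicIPKey} {
		if _, ok := annotations[key]; !ok {
			return FlannelInfo{}, fmt.Errorf("missing annotation %s", key)
		}
	}
	var data struct {
		VtepMAC string `json:"VtepMAC"`
	}
	if err := json.Unmarshal([]byte(annotations[backendDataKey]), &data); err != nil {
		return FlannelInfo{}, fmt.Errorf("invalid %s: %w", backendDataKey, err)
	}
	return FlannelInfo{
		BackendType: annotations[backendTypeKey],
		PublicIP:    annotations[publicIPKey],
		VtepMAC:     data.VtepMAC,
	}, nil
}

func main() {
	node := map[string]string{
		backendDataKey: `{"VtepMAC":"9a:4f:ef:9c:2e:2f"}`,
		backendTypeKey: "vxlan",
		publicIPKey:    "10.0.0.1",
	}
	info, err := parseFlannel(node)
	fmt.Println(info, err)
}
```

A node whose annotations fail this parse gives its peers nothing to build route, ARP, or FDB entries from.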

Check k8s node annotation
Node2
apiVersion: v1
kind: Node
metadata:
  annotations:
    flannel.alpha.coreos.com/backend-data: '{"VtepMAC":"bb:bb:bb:bb:bb:bb"}'
    flannel.alpha.coreos.com/backend-type: vxlan
    flannel.alpha.coreos.com/kube-subnet-manager: "true"
    flannel.alpha.coreos.com/public-ip: 10.0.0.2
    rke.cattle.io/external-ip: 10.0.0.2
Node3
apiVersion: v1
kind: Node
metadata:
  annotations:
    flannel.alpha.coreos.com/backend-data: '{"VtepMAC":"cc:cc:cc:cc:cc:cc"}'
    flannel.alpha.coreos.com/backend-type: vxlan
    flannel.alpha.coreos.com/kube-subnet-manager: "true"
    flannel.alpha.coreos.com/public-ip: 10.0.0.3
    rke.cattle.io/external-ip: 10.0.0.3
Node1
apiVersion: v1
kind: Node
metadata:
  annotations:
    rke.cattle.io/external-ip: 10.0.0.1
Node1 is missing the flannel-related annotations
• flannel running on node3 could not configure node1 because there’s no annotation
⇒ Why doesn’t node1 have flannel-related annotations?
⇒ Why does node2 have node1’s network information?
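The manual comparison above can be turned into a check that flags nodes without any flannel annotation. This Go sketch is an assumption: the input is a plain map built by hand from the three nodes on the slide, and `missingFlannel` is a hypothetical helper rather than an existing tool.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// flannelPrefix is the common prefix of the annotation keys shown above.
const flannelPrefix = "flannel.alpha.coreos.com/"

// missingFlannel returns the sorted names of nodes that carry no
// annotation starting with flannelPrefix.
func missingFlannel(nodes map[string]map[string]string) []string {
	var missing []string
	for name, annotations := range nodes {
		found := false
		for key := range annotations {
			if strings.HasPrefix(key, flannelPrefix) {
				found = true
				break
			}
		}
		if !found {
			missing = append(missing, name)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	nodes := map[string]map[string]string{
		"node1": {"rke.cattle.io/external-ip": "10.0.0.1"},
		"node2": {
			"flannel.alpha.coreos.com/public-ip": "10.0.0.2",
			"rke.cattle.io/external-ip":          "10.0.0.2",
		},
		"node3": {
			"flannel.alpha.coreos.com/public-ip": "10.0.0.3",
			"rke.cattle.io/external-ip":          "10.0.0.3",
		},
	}
	fmt.Println("nodes without flannel annotations:", missingFlannel(nodes))
}
```

Such a check only detects the symptom; it does not explain who removed the annotations.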

Annotation is changed by someone else
Node2
apiVersion: v1
kind: Node
metadata:
  annotations:
    flannel.alpha.coreos.com/backend-data: '{"VtepMAC":"bb:bb:bb:bb:bb:bb"}'
    flannel.alpha.coreos.com/backend-type: vxlan
    flannel.alpha.coreos.com/kube-subnet-manager: "true"
    flannel.alpha.coreos.com/public-ip: 10.0.0.2
    rke.cattle.io/external-ip: 10.0.0.2
Node3
apiVersion: v1
kind: Node
metadata:
  annotations:
    flannel.alpha.coreos.com/backend-data: '{"VtepMAC":"cc:cc:cc:cc:cc:cc"}'
    flannel.alpha.coreos.com/backend-type: vxlan
    flannel.alpha.coreos.com/kube-subnet-manager: "true"
    flannel.alpha.coreos.com/public-ip: 10.0.0.3
    rke.cattle.io/external-ip: 10.0.0.3
Node1
apiVersion: v1
kind: Node
metadata:
  annotations:
    rke.cattle.io/external-ip: 10.0.0.1
Rancher also updates annotations

How rancher works
When Rancher builds Kubernetes nodes:
1. Get the current node annotations
2. Build the desired annotations
3. Get the node resource
4. Replace the annotations with the desired ones
5. Update the node with the desired annotations
[Diagram labels: the steps are split between Function A and Function B]
This operation is NOT atomic. It ignores optimistic locking.
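Optimistic locking in Kubernetes relies on each object's `resourceVersion`: an update that carries a stale version is rejected with a conflict, and the writer is expected to re-read and retry. The Go sketch below imitates that behavior with an in-memory store. `Store`, `Node`, and `ErrConflict` are hypothetical stand-ins, not the Kubernetes client API, and the integer version is a simplification.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrConflict is returned when an update is based on a stale version.
var ErrConflict = errors.New("conflict: object was modified")

// Node is a hypothetical, minimal node object.
type Node struct {
	ResourceVersion int
	Annotations     map[string]string
}

// Store holds a single node and enforces optimistic locking on updates.
type Store struct{ node Node }

func copyMap(m map[string]string) map[string]string {
	out := make(map[string]string, len(m))
	for k, v := range m {
		out[k] = v
	}
	return out
}

// Get returns a snapshot that the caller may modify freely.
func (s *Store) Get() Node {
	return Node{ResourceVersion: s.node.ResourceVersion, Annotations: copyMap(s.node.Annotations)}
}

// Update accepts n only if it is based on the currently stored version.
func (s *Store) Update(n Node) error {
	if n.ResourceVersion != s.node.ResourceVersion {
		return ErrConflict
	}
	s.node = Node{ResourceVersion: n.ResourceVersion + 1, Annotations: copyMap(n.Annotations)}
	return nil
}

func main() {
	s := &Store{node: Node{ResourceVersion: 1, Annotations: map[string]string{
		"rke.cattle.io/external-ip": "10.0.0.1",
	}}}

	stale := s.Get() // first writer takes its snapshot

	// A second writer (flannel, in the talk) adds its annotation in between.
	fresh := s.Get()
	fresh.Annotations["flannel.alpha.coreos.com/public-ip"] = "10.0.0.1"
	fmt.Println("second writer:", s.Update(fresh))

	// Writing back the stale snapshot is rejected instead of silently
	// dropping the annotation added by the second writer.
	fmt.Println("first writer, stale snapshot:", s.Update(stale))

	// Retry: re-read, re-apply the change to the fresh object, write again.
	retry := s.Get()
	retry.Annotations["example.com/owner"] = "rancher" // hypothetical key
	fmt.Println("first writer, after retry:", s.Update(retry))
	fmt.Println(len(s.Get().Annotations), "annotations kept")
}
```

The lock only protects a writer that modifies and writes back the same snapshot it read. If the desired annotations are computed from one read and then pasted over a node fetched later, the update carries a fresh version and passes the check while still discarding other writers' changes.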

Look back on what happened
• Node1 is up
• Flannel on node1 updates the node annotations
• Flannel on node2 adds the configuration for node1
• Rancher gets node1 annotations
• Node3 is up
• Rancher overrides node1 annotations

Write a patch and confirm it works
diff --git a/pkg/controllers/user/nodesyncer/nodessyncer.go b/pkg/controllers/user/nodesyncer/nodessyncer.go
index 11cc9c4e..64526ccf 100644
--- a/pkg/controllers/user/nodesyncer/nodessyncer.go
+++ b/pkg/controllers/user/nodesyncer/nodessyncer.go
@@ -143,7 +143,19 @@ func (m *NodesSyncer) syncLabels(key string, obj *v3.Node) error {
 		toUpdate.Labels = obj.Spec.DesiredNodeLabels
 	}
 	if updateAnnotations {
-		toUpdate.Annotations = obj.Spec.DesiredNodeAnnotations
+		// NOTE: This is just workaround.
+		// There are multiple solutions to solve the problem of https://github.com/rancher/rancher/issues/13644
+		// and this problem is kind of desigin bugs. So solving the root cause of problem need design decistions.
+		// That's why for now we solved the problem by the soltion which don't have to change many places. Because
+		// We don't wanna create/maintain large change which have high possibility not to be merged in upstream
+		// Rancher Community tend to hesitate to merge big change created by engineer not from Rancher Lab
+
+		// The solution is to change NodeSyncer so as not to replace annotation with desiredAnnotations but just update annotations which
+		// is specified in desiredAnnotation. This change have side-effects that disable for user to delete exisiting annotation
+		// via desiredAnnotation. but we belived this case is not so famous so we chose this solution
+		for k, v := range obj.Spec.DesiredNodeAnnotations {
+			toUpdate.Annotations[k] = v
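The behavioral difference introduced by the patch (replace the whole annotation map versus update only the desired keys) can be seen in isolation. This Go sketch uses plain maps; `replaceAnnotations` and `mergeAnnotations` are hypothetical names that mirror the removed and added lines of the diff, not functions from Rancher.

```go
package main

import "fmt"

// replaceAnnotations mirrors the removed line: the desired set replaces
// whatever is on the node, so keys owned by other components are lost.
func replaceAnnotations(current, desired map[string]string) map[string]string {
	out := make(map[string]string, len(desired))
	for k, v := range desired {
		out[k] = v
	}
	return out
}

// mergeAnnotations mirrors the added loop: only keys present in desired
// are written, and every other key on the node is kept.
func mergeAnnotations(current, desired map[string]string) map[string]string {
	out := make(map[string]string, len(current)+len(desired))
	for k, v := range current {
		out[k] = v
	}
	for k, v := range desired {
		out[k] = v
	}
	return out
}

func main() {
	current := map[string]string{
		"flannel.alpha.coreos.com/public-ip": "10.0.0.1",
		"rke.cattle.io/external-ip":          "10.0.0.1",
	}
	desired := map[string]string{"rke.cattle.io/external-ip": "10.0.0.1"}

	fmt.Println("replace keeps", len(replaceAnnotations(current, desired)), "annotation(s)")
	fmt.Println("merge keeps  ", len(mergeAnnotations(current, desired)), "annotation(s)")
}
```

The sketch builds a new map instead of writing into the existing one. Assigning into a nil map panics in Go, and whether `toUpdate.Annotations` can be nil at that point is not visible from the quoted hunk. As the patch comment notes, the merge approach also means an annotation can no longer be deleted by omitting it from the desired set.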

Reporting/Proposing to OSS Community

Summary of troubleshooting
One of the agents failed to connect to the Rancher server
• Rancher Agent itself?
• Kube-dns?
• Kubernetes?
• Rancher Server? => Need to read code
• Flannel? => Need to read code
Don’t stop investigating a problem until you understand the root cause
=> There were chances to stop digging and just apply a workaround for the time being
• Kube DNS problem?: As some information on the internet suggests, if we had deployed kube-dns on all nodes, this problem would have been hidden
• Flannel problem?: If we had assumed the flannel annotations disappeared for some trivial reason and fixed them manually, this problem would have been hidden

Essence of Operation for complicated system
• Be aware of the responsibility of each piece of software
• Where the problem appears is not always where the root cause is
• Don’t stop investigating until you find the root cause

Community Contribution
Pull Requests
• Rancher (8 PRs)
◦ https://github.com/rancher/norman/pull/201
◦ https://github.com/rancher/norman/pull/202
◦ https://github.com/rancher/norman/pull/203
◦ https://github.com/rancher/machine/pull/12
◦ https://github.com/rancher/types/pull/525
◦ https://github.com/rancher/rancher/pull/16044
◦ https://github.com/rancher/rancher/pull/15991
◦ https://github.com/rancher/rancher/pull/15909
• Ingress-Nginx (1 PR)
◦ https://github.com/kubernetes/ingress-nginx/pull/3270
Rancher Source Code Deep Dive
https://www.slideshare.net/linecorp/lets-unbox-rancher-20-v200

We Are Hiring
• 2 offices, 4 members
◦ Tokyo: 3
◦ Kyoto: 1
◦ Taiwan: waiting for you
• English
