Teams used to build and run their own k8s on LINE cloud, but:
• K8s isn't easy to use
  • Deploy: 1. create servers 2. Docker 3. kubeadm (or similar) 4. etc.
  • Management: logging, monitoring, upgrade etc.
  • Recovery: etcd backup, restore etc.
  • Etc.
• LINE cloud/infrastructure policy must be handled
  • DB ACL
  • Global IP
  • Use other cloud services
  • Etc.
The managed service takes care of deployment, upgrade, management etc.
• Make k8s easier to use
  • Reduce learning cost
  • Provide service templates/guides (e.g. how to build a database service on k8s)
  • Consulting/support etc.
• Integration with LINE cloud services
  • Take care of dependencies, configuration etc.
  • Make it easy to use other cloud services like Redis, object storage etc.
In short: a "LINE GKE"
Creating and managing user k8s clusters
• Use the k8s operator pattern for the implementation
• Reconcile loop: Watch → get the latest information from kube-apiserver → check if there is any difference between the desired and actual states → do something to make the actual state match the desired state
(Diagram: API, Controller, Cluster A, Cluster Agent, Node Agent, Kubernetes Cluster, Create)
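As a rough illustration of the reconcile loop described above, here is a minimal sketch in Go. The Cluster type and the getDesiredState/getActualState/converge functions are hypothetical placeholders standing in for "read the spec", "observe the real world" and "act on the difference"; this is not the actual controller code.

```go
package main

import (
	"log"
	"reflect"
	"time"
)

// Cluster is a hypothetical representation of desired/actual state.
type Cluster struct {
	Nodes   int
	Version string
}

// Placeholders for "read spec from kube-apiserver", "observe the real world",
// and "create/update/delete resources to close the gap".
func getDesiredState() Cluster { return Cluster{Nodes: 3, Version: "v1.13"} }
func getActualState() Cluster  { return Cluster{Nodes: 2, Version: "v1.13"} }
func converge(desired, actual Cluster) error {
	log.Printf("converging actual %+v -> desired %+v", actual, desired)
	return nil
}

func main() {
	// Reconcile loop: compare desired vs. actual state and act on any difference.
	for {
		desired := getDesiredState()
		actual := getActualState()
		if !reflect.DeepEqual(desired, actual) {
			if err := converge(desired, actual); err != nil {
				log.Printf("reconcile failed: %v", err)
			}
		}
		time.Sleep(10 * time.Second)
	}
}
```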
• Integrate with the private cloud
• Limit Rancher functions
• Support multiple Ranchers
• K8s provider: support multiple providers
• User k8s cluster:
  • VMs are created by OpenStack
  • The k8s cluster is deployed by Rancher
Rancher agent log:

INFO: https://rancher.com/ping is accessible
INFO: Value from https://rancher.com/v3/settings/cacerts is an x509 certificate
time="2019-02-26T21:30:21Z" level=info msg="Rancher agent version a9164915-dirty is starting"
time="2019-02-26T21:30:21Z" level=info msg="Listening on /tmp/log.sock"
time="2019-02-26T21:30:21Z" level=info msg="Option customConfig=map[address:XXXX internalAddress: roles:[] label:map[]]"
time="2019-02-26T21:30:21Z" level=info msg="Option etcd=false"
time="2019-02-26T21:30:21Z" level=info msg="Option controlPlane=false"
time="2019-02-26T21:30:21Z" level=info msg="Option worker=false"
time="2019-02-26T21:30:21Z" level=info msg="Option requestedHostname=yuki-testw1"
time="2019-02-26T21:30:21Z" level=info msg="Connecting to wss://rancher.com/v3/connect with token"
time="2019-02-26T21:30:21Z" level=info msg="Connecting to proxy" url="wss://rancher.com/v3/connect"
time="2019-02-26T21:30:21Z" level=info msg="Starting plan monitor"

The Rancher server is reachable from a container that isn't using kube-dns (the node agent uses the host's resolver):

kubectl exec -it cattle-node-agent-c5ddm -n cattle-system bash
root@yuki-testc3:/# cat /etc/resolv.conf
nameserver 8.8.8.8

So either:
• Something is wrong with kube-dns
• Something is wrong with that container
• No error in the kube-dns log
• Another container which is using kube-dns can resolve DNS

$ kubectl logs -l k8s-app=kube-dns -c kubedns -n kube-system | grep '^E'
$ # no error detected

$ kubectl run -it busybox --image busybox -- sh
/ # nslookup google.com
Server:    172.19.0.10
Address:   172.19.0.10:53

Non-authoritative answer:
Name:      google.com
Address:   216.58.196.238
Flannel is used to connect Linux containers
• Flannel supports multiple backends like vxlan, ipip, …

Responsibility of flannel in our case:
• Configure the Linux kernel to create the termination device (VTEP) of the overlay network
• Configure the Linux kernel's routing, bridging and related sub-systems so that a container can reach containers on other hosts
• Flannel does not forward the actual packets itself; it only does configuration (see the sketch after the routing example below)
• Ping Pod B from Pod A
• Pod subnet: 172.xx, host subnet: 10.xx
• 1 route, 1 ARP entry, 1 FDB entry per host

Host A: eth0 10.0.0.1/24, flannel.1 172.17.1.0/32
$ ip r
172.17.2.0/24 via 172.17.2.0 dev flannel.1 onlink
$ ip n
172.17.2.0 dev flannel.1 lladdr cc:cc:cc:cc:cc:cc
$ bridge fdb show dev flannel.1
cc:cc:cc:cc:cc:cc dev flannel.1 dst 10.0.0.2

Host B: eth0 10.0.0.2/24, cni0 172.17.2.1/24, flannel.1 172.17.2.0/32 (MAC cc:cc:cc:cc:cc:cc)
Pod B: eth0 172.17.2.2/24
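To make "flannel only configures the kernel" concrete, here is a hedged sketch of programming the per-remote-host route, ARP entry and FDB entry shown above, using the github.com/vishvananda/netlink library that flannel itself builds on. The addresses, MAC and interface name are taken from the example; the code is Linux-only, needs NET_ADMIN, and is illustrative rather than flannel's actual implementation.

```go
package main

import (
	"log"
	"net"
	"syscall"

	"github.com/vishvananda/netlink"
)

func main() {
	// Assumed values from the example: remote host 10.0.0.2 owns pod subnet
	// 172.17.2.0/24 and its flannel.1 VTEP has this MAC address.
	vtepMAC, _ := net.ParseMAC("cc:cc:cc:cc:cc:cc")
	vtepIP := net.ParseIP("172.17.2.0")
	remoteHostIP := net.ParseIP("10.0.0.2")
	_, podSubnet, _ := net.ParseCIDR("172.17.2.0/24")

	link, err := netlink.LinkByName("flannel.1")
	if err != nil {
		log.Fatalf("flannel.1 not found: %v", err)
	}
	idx := link.Attrs().Index

	// 1 route per host: 172.17.2.0/24 via 172.17.2.0 dev flannel.1 onlink
	route := &netlink.Route{
		LinkIndex: idx,
		Scope:     netlink.SCOPE_UNIVERSE,
		Dst:       podSubnet,
		Gw:        vtepIP,
	}
	route.SetFlag(netlink.FLAG_ONLINK) // gateway is reachable on this link even though it is off-subnet
	if err := netlink.RouteReplace(route); err != nil {
		log.Fatalf("route: %v", err)
	}

	// 1 ARP entry per host: 172.17.2.0 dev flannel.1 lladdr cc:cc:cc:cc:cc:cc
	if err := netlink.NeighSet(&netlink.Neigh{
		LinkIndex:    idx,
		State:        netlink.NUD_PERMANENT,
		IP:           vtepIP,
		HardwareAddr: vtepMAC,
	}); err != nil {
		log.Fatalf("arp: %v", err)
	}

	// 1 FDB entry per host: cc:cc:cc:cc:cc:cc dev flannel.1 dst 10.0.0.2
	if err := netlink.NeighSet(&netlink.Neigh{
		LinkIndex:    idx,
		Family:       syscall.AF_BRIDGE,
		State:        netlink.NUD_PERMANENT,
		Flags:        netlink.NTF_SELF,
		IP:           remoteHostIP,
		HardwareAddr: vtepMAC,
	}); err != nil {
		log.Fatalf("fdb: %v", err)
	}
}
```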
apiVersion: v1
kind: Node
metadata:
  annotations:
    flannel.alpha.coreos.com/backend-data: '{"VtepMAC":"9a:4f:ef:9c:2e:2f"}'
    flannel.alpha.coreos.com/backend-type: vxlan
    flannel.alpha.coreos.com/kube-subnet-manager: "true"
    flannel.alpha.coreos.com/public-ip: 10.0.0.1

The flannel agent will:
• save node-specific metadata into the k8s Node annotations when flannel starts
• set up the route, FDB entry and ARP cache entry for every node that has the flannel annotations

All nodes should have these annotations.
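A quick way to verify the "all nodes should have these annotations" condition is to list the nodes and look for the flannel keys. A minimal client-go sketch follows; the kubeconfig handling is an assumption for illustration and is not part of the talk.

```go
package main

import (
	"context"
	"fmt"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumes the default kubeconfig location; adjust for in-cluster use.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	cs, err := kubernetes.NewForConfig(config)
	if err != nil {
		log.Fatal(err)
	}

	nodes, err := cs.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}
	// Every node is expected to carry the flannel VTEP metadata; report the ones that don't.
	for _, n := range nodes.Items {
		if _, ok := n.Annotations["flannel.alpha.coreos.com/backend-data"]; !ok {
			fmt.Printf("node %s is missing flannel annotations\n", n.Name)
		}
	}
}
```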
How Rancher syncs node annotations:
1. Get current node annotations
2. Build desired annotations
3. Get the Node resource
4. Replace its annotations with the desired ones
5. Update the Node with the desired annotations
(steps split across Function A and Function B)

This operation is NOT atomic and it ignores optimistic locking, so annotations written concurrently by other components (such as flannel) can be overwritten and lost.
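For contrast, here is a minimal client-go sketch of a clobber-free alternative: send a JSON merge patch that only touches the keys you own, so the server applies the change atomically and annotations added concurrently by other controllers (e.g. flannel's) are preserved. The node name and annotation key are hypothetical, and this is not Rancher's actual code.

```go
package main

import (
	"context"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// patchNodeAnnotation sets a single annotation on a Node via a JSON merge patch.
// Unlike "get, replace annotations map, update", this never removes annotations
// added concurrently by other controllers.
func patchNodeAnnotation(cs kubernetes.Interface, node, key, value string) error {
	patch := []byte(`{"metadata":{"annotations":{"` + key + `":"` + value + `"}}}`)
	_, err := cs.CoreV1().Nodes().Patch(
		context.TODO(), node, types.MergePatchType, patch, metav1.PatchOptions{})
	return err
}

func main() {
	// Assumes the default kubeconfig location; purely illustrative.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	cs, err := kubernetes.NewForConfig(config)
	if err != nil {
		log.Fatal(err)
	}
	// Hypothetical node name and annotation key, for illustration only.
	if err := patchNodeAnnotation(cs, "yuki-testw1", "example.com/desired", "v1"); err != nil {
		log.Fatal(err)
	}
}
```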
index 11cc9c4e..64526ccf 100644
--- a/pkg/controllers/user/nodesyncer/nodessyncer.go
+++ b/pkg/controllers/user/nodesyncer/nodessyncer.go
@@ -143,7 +143,19 @@ func (m *NodesSyncer) syncLabels(key string, obj *v3.Node) error {
 		toUpdate.Labels = obj.Spec.DesiredNodeLabels
 	}
 	if updateAnnotations {
-		toUpdate.Annotations = obj.Spec.DesiredNodeAnnotations
+		// NOTE: This is just a workaround.
+		// There are multiple solutions to the problem of https://github.com/rancher/rancher/issues/13644,
+		// and the problem is essentially a design bug, so fixing the root cause needs design decisions.
+		// That's why, for now, we chose the solution that doesn't have to change many places: we don't want
+		// to create and maintain a large change that has a high chance of not being merged upstream, and the
+		// Rancher community tends to hesitate to merge big changes from engineers outside Rancher Labs.
+
+		// The solution is to change NodeSyncer so that it does not replace the annotations with desiredAnnotations
+		// but only updates the annotations specified in desiredAnnotations. The side effect is that users can no
+		// longer delete existing annotations via desiredAnnotations, but we believe this case is rare, so we chose it.
+		for k, v := range obj.Spec.DesiredNodeAnnotations {
+			toUpdate.Annotations[k] = v
+		}
Which component caused the problem reaching the Rancher server? The Rancher agent itself? Kube-dns? Kubernetes? Rancher Server? => need to read code. Flannel? => need to read code.

Don't stop investigating and diving into the problem until you understand the root cause. There were chances to stop digging and just apply a workaround:
• Kube-dns problem? As some information on the internet describes, deploying kube-dns on every node would have hidden this problem.
• Flannel problem? If we had assumed the flannel annotations disappeared for some trivial reason and had manually fixed the annotations, this problem would also have been hidden.
• Understand the responsibility of each piece of software
• Where the problem happened is not always where the root cause is
• Don't stop investigating until you find the root cause