Private Cloud Platform Team Lead at LINE
• Experience:
◦ OSS Contribution:
▪ rancher/rancher, kubernetes/ingress-nginx, coreos/etcd-operator, openstack/neutron
◦ Presentations:
▪ Japan Container Days v18.12 Keynote (Future of LINE CaaS Platform)
▪ Japan Container Days v18.12 (How we can develop managed k8s service with Rancher)
▪ OpenStack Summit 2018 Vancouver (Excitingly simple multi-path OpenStack Network)
▪ OpenStack Summit 2016 Austin (Swift Private Endpoint)
▪ OpenStack Summit 2015 Tokyo (Automate Deployment & Benchmark)
▪ …
VM creation flow (components involved include neutron-dhcp-agent, glance-api, Libvirt, dnsmasq, and the VM itself):
1. VM Create API request
2. Ask to do VM creation
3. Decide which host to use
4. Ask to create VM
5. Download image
6. Create port
7. Update DHCP
8. Configure DHCP
9. Detect tap device
10. Get port detail
11. Configure bridge, tap
12. Ask to create VM
13. Create VM
14. DHCP provide
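Step 1 of this flow is just an API call; everything from step 2 onward happens asynchronously inside OpenStack. A minimal sketch, assuming the gophercloud SDK and OS_* auth environment variables; the image/flavor IDs are placeholders:

package main

import (
	"fmt"

	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack"
	"github.com/gophercloud/gophercloud/openstack/compute/v2/servers"
)

func main() {
	// Authenticate against keystone from OS_* environment variables.
	opts, err := openstack.AuthOptionsFromEnv()
	if err != nil {
		panic(err)
	}
	provider, err := openstack.AuthenticatedClient(opts)
	if err != nil {
		panic(err)
	}
	compute, err := openstack.NewComputeV2(provider, gophercloud.EndpointOpts{})
	if err != nil {
		panic(err)
	}
	// Step 1: the "VM Create" API request. Steps 2-14 (scheduling, image
	// download, port/DHCP setup, libvirt) run inside OpenStack afterwards.
	server, err := servers.Create(compute, servers.CreateOpts{
		Name:      "test-vm",
		ImageRef:  "IMAGE_UUID",  // placeholder
		FlavorRef: "FLAVOR_UUID", // placeholder
	}).Extract()
	if err != nil {
		panic(err)
	}
	fmt.Println("VM creation requested, id =", server.ID)
}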
What we have to do ourselves to run Docker nodes behind a load balancer (skip if the audience knows about Kubernetes):
1. Understand which container is running on which node
2. Update container images one by one to reduce downtime
3. Configure a load balancer to distribute traffic across multiple nodes
4. If a container dies, re-create it
5. If traffic increases dramatically, add new Docker nodes and containers
6. To share secret data with containers, set up NFS or distribute it to all nodes
Kubernetes automates most of this list; see the sketch below.
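For contrast, a hedged sketch of how Kubernetes absorbs items 1, 2, 4 and 5 with a single Deployment object (a Service would cover 3, a Secret would cover 6). This assumes client-go from roughly the k8s 1.13 era, where Create takes the object directly; the kubeconfig path, names, and image are placeholders:

package main

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func int32Ptr(i int32) *int32 { return &i }

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig") // placeholder
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}
	labels := map[string]string{"app": "web"}
	deploy := &appsv1.Deployment{
		ObjectMeta: metav1.ObjectMeta{Name: "web"},
		Spec: appsv1.DeploymentSpec{
			Replicas: int32Ptr(3), // (5) scale by changing one number
			Selector: &metav1.LabelSelector{MatchLabels: labels},
			Strategy: appsv1.DeploymentStrategy{
				// (2) image updates roll out one by one; (4) dead pods are re-created
				Type: appsv1.RollingUpdateDeploymentStrategyType,
			},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: labels},
				Spec: corev1.PodSpec{
					// (1) the scheduler decides which node runs each container
					Containers: []corev1.Container{{Name: "web", Image: "nginx:1.15"}},
				},
			},
		},
	}
	if _, err := client.AppsV1().Deployments("default").Create(deploy); err != nil {
		panic(err)
	}
}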
1. One deployment includes “Networking”, “Virtualization”, “Storage”...
2. Composed of multiple processes / nodes
3. Many dependent OSS
OpenStack
• Nova (VM Controller): more than 8 different processes
• Neutron (Networking Controller): more than 4 different processes
• Glance (Image Service): more than 2 different processes
• Designate (DNS Controller): more than 4 processes
• Dnsmasq (DHCP Service)
• Libvirt (VM Monitor)
• RabbitMQ (Messaging Bus)
• Qemu (Hardware Emulator)

Managed Kubernetes
• Kubernetes: more than 5 different processes
• Flannel (Networking Controller)
• Rancher (Managing Kubernetes Software): more than 3 different processes
• Docker
• Docker Machine
• Etcd (KVS)

Each component brings its own code base, from +0.01M up to +3-4M lines each, and all of it may "Require Reading Code".
1. Read code until you understand it; don't trust just documentation or bug reports.
2. Grasp the internal state of the running software/process.
3. Understand that where the problem shows up is not always where the root cause is.
4. Understand which software has what responsibility.
5. Don't stop investigating/diving into a problem until you understand the root cause.
1. Read code until you understand it; don't trust just documentation
Architecture/design understanding:
- Documentation does not keep up with the code
- To know more about operational risks
- To improve the OSS (it is not always perfect for us)
Bugs or weird behaviour:
- Don't rely on just Google; make the effort to solve it ourselves
- Discussion/communication inside the team is based on code
1. Read code until you understand it; don't trust just documentation
We are users of OSS, but at the same time we are developers of OSS.
• OpenStack related:
◦ openstack/neutron (1)
• Kubernetes related:
◦ coreos/etcd-operator (1)
◦ kubernetes/ingress-nginx (1)
◦ rancher/norman (3)
◦ rancher/types (1)
◦ docker/machine (1)
Code reading documents:
• https://github.com/ukinau/rancher-analyse
• https://www.slideshare.net/linecorp/lets-unbox-rancher-20-v200
2. Grasp the internal state of the running software/process
Software has internal state, for example:
• Dnsmasq-dhcp (DHCP server)
◦ The number of DHCP entries
◦ How many times a DHCP NACK has been issued
◦ ...
• Nginx (web server)
◦ Average request processing time
◦ Average number of requests
◦ How many requests are pending
◦ ...
2. Grasp the internal state of the running software/process
Dnsmasq-dhcp
• 3 DHCP entries
• 1000 DHCP NACKs issued
=> Some client seems to be sending unexpected DHCP REQUESTs; we need to review the configuration / troubleshoot.
Nginx
• 30 msec average request processing time
• 10k requests per minute
• 200 requests pending (backlog queue)
=> The number of workers is not enough for the actual request volume; we need to run more nginx instances or increase the workers.
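A hedged sketch of scraping one such piece of internal state, nginx's stub_status counters; this assumes nginx was built with the stub_status module and exposes it at a /nginx_status location (the URL here is a placeholder):

package main

import (
	"fmt"
	"io/ioutil"
	"net/http"
)

func main() {
	resp, err := http.Get("http://127.0.0.1/nginx_status") // placeholder URL
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	// Typical stub_status output:
	//   Active connections: 291
	//   server accepts handled requests
	//    16630948 16630948 31070465
	//   Reading: 6 Writing: 179 Waiting: 106
	var active int
	if _, err := fmt.Sscanf(string(body), "Active connections: %d", &active); err != nil {
		panic(err)
	}
	fmt.Println("active connections:", active)
}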
4. Understand which software has what responsibility
Client -> Haproxy (Layer 7 LB) -> Application Servers -> Database
ERROR: Failed to establish TCP connection
Haproxy's responsibility:
- Establish the TCP connection with the client
- Distribute HTTP requests across the different application servers
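Knowing that TCP establishment is haproxy's responsibility means the first check can be a direct dial against each tier instead of "check everything"; a minimal sketch (all addresses are hypothetical):

package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	for _, addr := range []string{
		"lb.example.com:443",    // haproxy: owns TCP establishment with the client
		"app1.example.com:8080", // app servers: own request handling
		"app2.example.com:8080",
		"db.example.com:5432", // database
	} {
		conn, err := net.DialTimeout("tcp", addr, 3*time.Second)
		if err != nil {
			fmt.Printf("%-25s TCP connect FAILED: %v\n", addr, err)
			continue
		}
		conn.Close()
		fmt.Printf("%-25s TCP connect ok\n", addr)
	}
}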
ERROR: Failed to establish TCP connection
DevOps team: "Oh, something is happening. Check everything!!!!" ...which takes 1 day... 2 days... 3 days...
4. Understand which software has what responsibility
4. Understand which software has what responsibility
What if this process goes down? How does this process affect the others?
Understanding responsibility here is even more important than in a usual web system.
(Kubernetes cluster: Cluster Agent, Kube DNS, Busybox)

Check the log of Kube DNS:
$ … | grep '^E'
$ # no error detected

Check if another container can resolve DNS:
$ kubectl run -it busybox --image busybox -- sh
/ # nslookup google.com
Server:    172.19.0.10
Address:   172.19.0.10:53

Non-authoritative answer:
Name:      google.com
Address:   216.58.196.238
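The same resolution check can be scripted against the cluster DNS directly with Go's net.Resolver, pointed at the Kube DNS service IP from the nslookup output above; a sketch:

package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

func main() {
	// Query 172.19.0.10 directly, bypassing /etc/resolv.conf, to confirm
	// Kube DNS itself answers.
	r := &net.Resolver{
		PreferGo: true,
		Dial: func(ctx context.Context, network, address string) (net.Conn, error) {
			d := net.Dialer{Timeout: 2 * time.Second}
			return d.DialContext(ctx, network, "172.19.0.10:53")
		},
	}
	addrs, err := r.LookupHost(context.Background(), "google.com")
	if err != nil {
		panic(err)
	}
	fmt.Println("resolved:", addrs)
}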
Flannel: network software used to connect Linux containers
• Flannel supports multiple backends like vxlan, ipip...
Responsibility of Flannel in our case:
• Configure the Linux kernel to create the termination device of the overlay network
• Configure the Linux kernel's routing, bridging, and related sub-systems
Per-node state programmed by Flannel (vxlan backend):

Node1 (10.0.0.1)
Routing table:
172.17.1.0/24 dev cni
172.17.2.0/24 via 172.17.2.0 dev flannel.1
172.17.3.0/24 via 172.17.3.0 dev flannel.1
ARP cache:
172.17.2.0 flannel.1 lladdr bb:bb:bb:bb:bb:bb
172.17.3.0 flannel.1 lladdr cc:cc:cc:cc:cc:cc
FDB:
bb.bb.bb.bb.bb.bb dev flannel.1 dst 10.0.0.2
cc.cc.cc.cc.cc.cc dev flannel.1 dst 10.0.0.3

Node2 (10.0.0.2)
Routing table:
172.17.1.0/24 via 172.17.1.0 dev flannel.1
172.17.2.0/24 dev cni
172.17.3.0/24 via 172.17.3.0 dev flannel.1
ARP cache:
172.17.1.0 flannel.1 lladdr aa:aa:aa:aa:aa:aa
172.17.3.0 flannel.1 lladdr cc:cc:cc:cc:cc:cc
FDB:
aa.aa.aa.aa.aa.aa dev flannel.1 dst 10.0.0.1
cc.cc.cc.cc.cc.cc dev flannel.1 dst 10.0.0.3

Node3 (10.0.0.3)
Routing table:
172.17.2.0/24 via 172.17.2.0 dev flannel.1
172.17.3.0/24 dev cni
ARP cache:
172.17.2.0 flannel.1 lladdr bb:bb:bb:bb:bb:bb
FDB:
bb.bb.bb.bb.bb.bb dev flannel.1 dst 10.0.0.2

On Node3, the Node1-related information is missing:
• 172.17.1.0/24 via 172.17.1.0 dev flannel.1
• 172.17.1.0 flannel.1 lladdr aa:aa:aa:aa:aa:aa
• aa.aa.aa.aa.aa.aa dev flannel.1 dst 10.0.0.1
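What "Node1-related information" means concretely is three kernel entries: a route, an ARP entry, and an FDB entry. A hedged sketch of programming them with the github.com/vishvananda/netlink library (the same kind of calls flannel's vxlan backend makes; all values are taken from the figure above):

package main

import (
	"net"
	"syscall"

	"github.com/vishvananda/netlink"
)

func main() {
	link, err := netlink.LinkByName("flannel.1")
	if err != nil {
		panic(err)
	}
	idx := link.Attrs().Index
	vtepMAC, _ := net.ParseMAC("aa:aa:aa:aa:aa:aa")
	_, subnet, _ := net.ParseCIDR("172.17.1.0/24")

	// Route: 172.17.1.0/24 via 172.17.1.0 dev flannel.1 (onlink)
	route := &netlink.Route{LinkIndex: idx, Dst: subnet, Gw: net.ParseIP("172.17.1.0")}
	route.SetFlag(netlink.FLAG_ONLINK) // the gateway is reachable only through flannel.1
	if err := netlink.RouteAdd(route); err != nil {
		panic(err)
	}
	// ARP cache: 172.17.1.0 flannel.1 lladdr aa:aa:aa:aa:aa:aa
	if err := netlink.NeighSet(&netlink.Neigh{
		LinkIndex:    idx,
		State:        netlink.NUD_PERMANENT,
		IP:           net.ParseIP("172.17.1.0"),
		HardwareAddr: vtepMAC,
	}); err != nil {
		panic(err)
	}
	// FDB: aa:aa:aa:aa:aa:aa dev flannel.1 dst 10.0.0.1 (VXLAN endpoint)
	if err := netlink.NeighSet(&netlink.Neigh{
		LinkIndex:    idx,
		Family:       syscall.AF_BRIDGE,
		Flags:        netlink.NTF_SELF,
		State:        netlink.NUD_PERMANENT,
		IP:           net.ParseIP("10.0.0.1"),
		HardwareAddr: vtepMAC,
	}); err != nil {
		panic(err)
	}
}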
The Flannel agent will:
• store node-specific metadata in the k8s node annotations when flannel starts
• set up the route, FDB, and ARP cache entries when it sees a node with the flannel annotations
$ kubectl get node yuki-testc1 -o yaml
apiVersion: v1
kind: Node
metadata:
  annotations:
    flannel.alpha.coreos.com/backend-data: '{"VtepMAC":"9a:4f:ef:9c:2e:2f"}'
    flannel.alpha.coreos.com/backend-type: vxlan
    flannel.alpha.coreos.com/kube-subnet-manager: "true"
    flannel.alpha.coreos.com/public-ip: 10.0.0.1
All nodes should have these annotations.
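One quick way to verify the invariant "all nodes should have these annotations" is to list the nodes and check for the flannel keys. A sketch with client-go (again assuming the pre-context-argument API of that era; the kubeconfig path is a placeholder):

package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig") // placeholder
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}
	nodes, err := client.CoreV1().Nodes().List(metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, node := range nodes.Items {
		// Every node should carry the flannel metadata; a missing key means
		// something wiped the annotations (as in the bug described next).
		if _, ok := node.Annotations["flannel.alpha.coreos.com/backend-data"]; !ok {
			fmt.Printf("node %s is missing its flannel annotations\n", node.Name)
		}
	}
}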
Sequence between the nodes and the Rancher server:
1. Node1 is booting; Rancher server: Create Node 1
2. Node1 is up; Flannel updates the annotations (node now has Flannel's annotation)
3. Rancher server: Update Annotation (node now has Rancher's annotation)
4. Node2 is booting; Rancher server: Create Node 2
5. Node2 is up; Flannel updates the annotations
6. Rancher server: Update Annotation => drops the existing Flannel annotations because of its sync logic
index 11cc9c4e..64526ccf 100644
--- a/pkg/controllers/user/nodesyncer/nodessyncer.go
+++ b/pkg/controllers/user/nodesyncer/nodessyncer.go
@@ -143,7 +143,19 @@ func (m *NodesSyncer) syncLabels(key string, obj *v3.Node) error {
 		toUpdate.Labels = obj.Spec.DesiredNodeLabels
 	}
 	if updateAnnotations {
-		toUpdate.Annotations = obj.Spec.DesiredNodeAnnotations
+		// NOTE: This is just a workaround.
+		// There are multiple solutions to the problem of https://github.com/rancher/rancher/issues/13644,
+		// and it is a kind of design bug, so fixing the root cause needs design decisions.
+		// For now we chose the solution that touches the fewest places, because we don't want to
+		// create/maintain a large change with a high chance of not being merged upstream; the Rancher
+		// community tends to hesitate to merge big changes from engineers outside Rancher Labs.
+
+		// The solution: change NodeSyncer so it does not replace the annotations with desiredAnnotations
+		// but only updates the keys specified there. Side-effect: users can no longer delete an existing
+		// annotation via desiredAnnotation, but we believe that case is rare, so we chose this solution.
+		for k, v := range obj.Spec.DesiredNodeAnnotations {
+			toUpdate.Annotations[k] = v
+		}
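The crux of the bug is map semantics: assigning DesiredNodeAnnotations replaces the whole annotation map, destroying keys owned by other controllers such as flannel, while the loop in the patch merges. A standalone illustration (the annotation keys here are only examples):

package main

import "fmt"

func main() {
	annotations := map[string]string{
		"flannel.alpha.coreos.com/public-ip": "10.0.0.1", // owned by flannel
	}
	desired := map[string]string{"rancher.example/state": "active"} // hypothetical rancher key

	// Old behaviour: wholesale replacement. The flannel key is gone.
	replaced := desired

	// Patched behaviour: merge only the desired keys. The flannel key survives.
	merged := map[string]string{}
	for k, v := range annotations {
		merged[k] = v
	}
	for k, v := range desired {
		merged[k] = v
	}

	fmt.Println("replaced:", replaced) // map[rancher.example/state:active]
	fmt.Println("merged:  ", merged)   // both keys present
}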
What we observed: the cluster agent could not connect to the Rancher server
• First suspect: Rancher Agent itself?
• Second suspect: Kube-dns?
• Third suspect: Kubernetes?
• Fourth suspect: Flannel? => Need to read code
• Fifth suspect: Rancher Server? => Need to read code

3. Understand that where the problem showed up is not always where the root cause is
4. Understand which software has what responsibility
5. Don't stop investigating/diving into a problem until you understand the root cause
=> There were chances to stop digging and settle for a workaround:
• Second suspect (Kube DNS): as some information on the internet described, deploying kube-dns on every node would seem to hide this problem
• Fourth suspect (Flannel): if we had assumed the flannel annotation disappeared for some trivial reason and manually fixed it, the problem would also seem to be hidden
Remind:
1. Read code until you understand it; don't trust just documentation or bug reports.
2. Grasp the internal state of the running software/process.
3. Understand that where the problem shows up is not always where the root cause is.
4. Understand which software has what responsibility.
5. Don't stop investigating/diving into a problem until you understand the root cause.
• A broad stack of technology
◦ Microservice/distributed-system operation knowledge
◦ Reading large amounts of code
◦ Many dependent OSS, e.g. OpenStack, Kubernetes, Rancher, Docker...
◦ Networking, e.g. OS networking, overlay networks
◦ Virtualization, e.g. containers, Libvirt/KVM
• Strong problem-solving skills
◦ Troubleshooting tends to get complicated, like a cascading disaster
• The mindset to communicate with and contribute to the OSS community
◦ As much as possible, we want to follow upstream development to reduce costs
• Continuously learning new tech deeply
◦ Don't stop at just playing with new tech! There are many chances to improve ourselves
• The platform keeps getting bigger
◦ More regions (current: 3 regions)
◦ More hypervisors (current: 1000+ HVs)
◦ More clusters for specific use cases
• Enhance operation (making operation easy)
◦ To be able to operate a large-scale cloud with a small team
• Make cloud-native components production ready
◦ Managed Kubernetes Service is not production ready yet...