Slide 1

Slide 1 text

The ins and outs of networking in Google Container Engine
Michael Rubin, Tim Hockin

Slide 2

Slide 2 text

Kubernetes is about clusters
Because of that, networking is pretty important
Most of Kubernetes centers on network concepts
Our job is to make sure your applications can communicate:
● With each other
● With the world outside your cluster
● Only where you want

Slide 3

Slide 3 text

It’s easy to get overwhelmed
Many people are comfortable with TCP/IP, but containers bring new concepts:
● Namespaces
● Virtual interfaces
● IP forwarding
● Underlays
● Overlays
● iptables
● NAT
It’s enough to make your head spin

Slide 4

Slide 4 text

Background: API server
Kubernetes is a very API-centric system - everything communicates through the API
● No private APIs
● No “system only” calls
REST: Defined in terms of “resources” (nouns, aka “objects”) and methods (verbs)

Slide 5

Slide 5 text

Background: Pods
A small group of tightly-coupled containers & volumes, composed together
The atom of Kubernetes
Shared lifecycle and fate
Shared networking - a shared “real” IP, containers see each other as localhost

Slide 6

Slide 6 text

Background: Controllers
A piece of code that watches the Kubernetes API and reacts
The defining pattern of Kubernetes, used everywhere
Self-healing, aka rectification
Examples: ReplicaSet, Services, DNS, Kubelet

Slide 7

Slide 7 text

Background: Labels
Metadata (key-value) which can be attached to any API resource
Labels: identification
● Allow users to define how to group resources
● Examples: app name, tier (frontend/backend), stage (dev/test/prod)
Annotations: data that “rides along” with objects
● Third-party or internal state that isn’t part of an object’s schema
[Diagram: an object labeled app: store, role: fe, stage: prod]

Slide 8

Slide 8 text

Background: Selectors
Expresses which objects to act upon
● Think “select ... where”
Provides very loose coupling
Users can manage groups however they need
Examples: services, deployments

Slide 9

Slide 9 text

Background: Selectors
[Diagram: four pods, each labeled app: store]
● role: fe, stage: test
● role: be, stage: test
● role: fe, stage: prod
● role: be, stage: prod

Slide 10

Slide 10 text

Background: Selectors
[Diagram: four pods, each labeled app: store]
● role: fe, stage: test
● role: be, stage: test
● role: fe, stage: prod
● role: be, stage: prod
Selector: app=store, role=fe

Slide 11

Slide 11 text

Background: Selectors
[Diagram: four pods, each labeled app: store]
● role: fe, stage: test
● role: be, stage: test
● role: fe, stage: prod
● role: be, stage: prod
Selector: app=store, stage=test

Slide 12

Slide 12 text

Background: Selectors
[Diagram: four pods, each labeled app: store]
● role: fe, stage: test
● role: be, stage: test
● role: fe, stage: prod
● role: be, stage: prod
Selector: app=store
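The three selectors on the preceding slides can be sketched as a simple equality match over label maps. This is only an illustration of the “select ... where” idea: the pod names (pod-a to pod-d) and the helper functions are hypothetical, and real Kubernetes selectors also support set-based expressions that are not modeled here.

```python
# Hypothetical pod records mirroring the labels shown on the slides.
PODS = {
    "pod-a": {"app": "store", "role": "fe", "stage": "test"},
    "pod-b": {"app": "store", "role": "be", "stage": "test"},
    "pod-c": {"app": "store", "role": "fe", "stage": "prod"},
    "pod-d": {"app": "store", "role": "be", "stage": "prod"},
}


def matches(labels, selector):
    """Return True when every key=value pair of the selector appears in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def select(pods, selector):
    """Return the sorted names of all pods whose labels satisfy the selector."""
    return sorted(name for name, labels in pods.items() if matches(labels, selector))


print(select(PODS, {"app": "store", "role": "fe"}))
print(select(PODS, {"app": "store", "stage": "test"}))
print(select(PODS, {"app": "store"}))
```

A shorter selector matches more objects, which is why the same labels can serve several groupings at once.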

Slide 13

Slide 13 text

The IP-per-pod model

Slide 14

Slide 14 text

Every pod has a real IP address
This is different from the out-of-the-box model Docker offers
● No machine-private IPs
● No port-mapping
Pod IPs are accessible from other pods, regardless of which VM they are on
Linux “network namespaces” (aka “netns”) and virtual interfaces

Slide 15

Slide 15 text

VM Network namespaces eth0

Slide 16

Slide 16 text

VM Network namespaces root netns eth0

Slide 17

Slide 17 text

VM Network namespaces root netns eth0 pod1 netns

Slide 18

Slide 18 text

VM Network namespaces root netns eth0 pod1 netns vethxy vethxx

Slide 19

Slide 19 text

VM Network namespaces root netns eth0 pod1 netns eth0 vethxx

Slide 20

Slide 20 text

VM Network namespaces root netns eth0 pod1 netns pod2 netns eth0 eth0 vethxx vethyy

Slide 21

Slide 21 text

VM Network namespaces root netns eth0 pod1 netns pod2 netns eth0 eth0 vethxx vethyy cbr0

Slide 22

Slide 22 text

VM Life of a packet: pod-to-pod, same node root netns eth0 vethxx vethyy cbr0 pod1 netns pod2 netns eth0 eth0

Slide 23

Slide 23 text

VM Life of a packet: pod-to-pod, same node root netns pod1 netns pod2 netns eth0 eth0 vethxx vethyy cbr0 src: pod1 dst: pod2 eth0

Slide 24

Slide 24 text

VM Life of a packet: pod-to-pod, same node root netns eth0 ctr1 netns ctr2 netns eth0 eth0 vethxx vethyy cbr0 src: pod1 dst: pod2 pod1 netns pod2 netns eth0 eth0

Slide 27

Slide 27 text

Flat network space
Pods must be reachable across VMs, too
Kubernetes doesn’t care HOW, but this is a requirement
● L2, L3, or overlay
Assign a CIDR (IP block) to each VM
GCP: Teach the network how to route packets
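A minimal sketch of the “assign a CIDR to each VM” step. It assumes a hypothetical cluster range of 10.11.0.0/16 carved into one /24 per VM; the range, the prefix length, the VM names, and the function name are illustrative and do not describe the allocator GKE actually uses.

```python
import ipaddress


def allocate_node_cidrs(cluster_cidr, node_names, node_prefix=24):
    """Give each node the next unused /node_prefix block from cluster_cidr."""
    subnets = ipaddress.ip_network(cluster_cidr).subnets(new_prefix=node_prefix)
    allocation = {}
    for name in node_names:
        try:
            allocation[name] = next(subnets)
        except StopIteration:
            raise ValueError("cluster CIDR has no free block left") from None
    return allocation


# Hypothetical cluster range and VM names.
blocks = allocate_node_cidrs("10.11.0.0/16", ["vm1", "vm2", "vm3"])
for name, block in blocks.items():
    print(name, block)
```

Because every pod on a VM draws its IP from that VM’s block, the network only needs one route per VM rather than one per pod.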

Slide 28

Slide 28 text

VM1 Life of a packet: pod-to-pod, across nodes root eth0 vethxx vethyy cbr0 VM2 root eth0 vethxx vethyy cbr0 pod1 eth0 pod2 eth0 pod3 pod4 eth0 eth0

Slide 29

Slide 29 text

VM1 Life of a packet: pod-to-pod, across nodes ctr1 eth0 ctr2 eth0 root eth0 vethxx vethyy cbr0 VM2 root eth0 ctr3 ctr4 eth0 eth0 vethxx vethyy cbr0 pod1 eth0 pod2 eth0 pod3 pod4 eth0 eth0 src: pod1 dst: pod4

Slide 30

Slide 30 text

VM1 Life of a packet: pod-to-pod, across nodes ctr1 eth0 ctr2 eth0 root eth0 vethxx vethyy cbr0 VM2 root eth0 ctr3 ctr4 eth0 eth0 vethxx vethyy cbr0 src: pod1 dst: pod4 pod1 eth0 pod2 eth0 pod3 pod4 eth0 eth0

Slide 33

Slide 33 text

VM1 Life of a packet: pod-to-pod, across nodes ctr1 eth0 ctr2 eth0 root eth0 vethxx vethyy cbr0 VM2 root eth0 ctr3 ctr4 eth0 eth0 vethxx vethyy cbr0 Anti-spoofing: only allow known source IPs (i.e. VMs) pod1 eth0 pod2 eth0 pod3 pod4 eth0 eth0

Slide 34

Slide 34 text

Programming GCP’s network
GKE automatically sets up routing for you using every trick it needs
All VMs are created as “routers”
● --can-ip-forward
● Disable anti-spoof protection for this VM
Add one GCP static route for each VM
● gcloud compute routes create vm2 --destination-range=x.y.z.0/24 --next-hop-instance=vm2
The GCP network does the rest
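What “the GCP network does the rest” amounts to can be sketched as a route lookup: each static route maps a destination range to a next-hop VM, and the most specific matching range wins. The route table below is hypothetical (the slide elides the real range as x.y.z.0/24), and the lookup is a generic longest-prefix match, not a description of GCP’s internals.

```python
import ipaddress

# Hypothetical static routes: (destination range, next-hop instance).
ROUTES = [
    (ipaddress.ip_network("10.11.1.0/24"), "vm1"),
    (ipaddress.ip_network("10.11.2.0/24"), "vm2"),
]


def next_hop(dst_ip, routes=ROUTES):
    """Return the next-hop instance for dst_ip, or None if no route matches."""
    dst = ipaddress.ip_address(dst_ip)
    candidates = [(net, hop) for net, hop in routes if dst in net]
    if not candidates:
        return None
    # Longest prefix (most specific range) wins.
    return max(candidates, key=lambda item: item[0].prefixlen)[1]


print(next_hop("10.11.2.7"))
print(next_hop("192.168.0.1"))
```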

Slide 35

Slide 35 text

VM1 Life of a packet: pod-to-pod, across nodes ctr1 eth0 ctr2 eth0 root eth0 vethxx vethyy cbr0 VM2 root eth0 ctr3 ctr4 eth0 eth0 vethxx vethyy cbr0 src: pod1 dst: pod4 pod1 eth0 pod2 eth0 pod3 pod4 eth0 eth0

Slide 39

Slide 39 text

VM1 Life of a packet: pod-to-pod, across nodes ctr1 eth0 ctr2 eth0 root eth0 vethxx vethyy cbr0 VM2 root eth0 ctr3 ctr4 eth0 eth0 vethxx vethyy cbr0 pod1 eth0 pod2 eth0 pod3 pod4 eth0 eth0 src: pod1 dst: pod4

Slide 40

Slide 40 text

Dealing with change
A real cluster changes over time:
● Scale-up and scale-down events
● Rolling updates
● Pods crash or hang
● VMs reboot
The pod addresses you want to talk to can change without warning
You need something more durable than a pod IP

Slide 41

Slide 41 text

Services

Slide 42

Slide 42 text

The service abstraction
A service is a group of endpoints (usually pods)
Services provide a stable VIP
VIP automatically routes to backend pods
● Implementations can vary
● We will examine the default implementation
The set of pods “behind” a service can change
Clients only need the VIP, which doesn’t change

Slide 43

Slide 43 text

Service
What you submit is simple
● Other fields will be defaulted or assigned

kind: Service
apiVersion: v1
metadata:
  name: store-be
spec:
  selector:
    app: store
    role: be
  ports:
  - name: http
    port: 80

Slide 44

Slide 44 text

Service
What you submit is simple
● Other fields will be defaulted or assigned
The ‘selector’ field chooses which pods to balance across

kind: Service
apiVersion: v1
metadata:
  name: store-be
spec:
  selector:
    app: store
    role: be
  ports:
  - name: http
    port: 80

Slide 45

Slide 45 text

Service
What you get back has more information
Automatically creates a distributed load balancer

kind: Service
apiVersion: v1
metadata:
  name: store-be
  namespace: default
  creationTimestamp: 2016-05-06T19:16:56Z
  resourceVersion: "7"
  selfLink: /api/v1/namespaces/default/services/store-be
  uid: 196d5751-13bf-11e6-9353-42010a800fe3
spec:
  type: ClusterIP
  selector:
    app: store
    role: be
  clusterIP: 10.9.3.76
  ports:
  - name: http
    protocol: TCP
    port: 80
    targetPort: 80
  sessionAffinity: None

Slide 46

Slide 46 text

Service
What you get back has more information
Automatically creates a distributed load balancer
The default is to allocate an in-cluster IP

kind: Service
apiVersion: v1
metadata:
  name: store-be
  namespace: default
  creationTimestamp: 2016-05-06T19:16:56Z
  resourceVersion: "7"
  selfLink: /api/v1/namespaces/default/services/store-be
  uid: 196d5751-13bf-11e6-9353-42010a800fe3
spec:
  type: ClusterIP
  selector:
    app: store
    role: be
  clusterIP: 10.9.3.76
  ports:
  - name: http
    protocol: TCP
    port: 80
    targetPort: 80
  sessionAffinity: None

Slide 47

Slide 47 text

Endpoints
selector: app: store, role: be
[Diagram: pods with their labels and IPs]
● app: store, role: be, 10.11.8.67
● app: store, role: be, 10.11.5.3
● app: store, role: be, 10.11.0.9
● app: db, role: be, 10.7.1.18
● app: store, role: fe, 10.11.8.67
● app: db, role: be, 10.4.1.11

Slide 49

Slide 49 text

Endpoints
When you create a service, a controller wakes up

kind: Endpoints
apiVersion: v1
metadata:
  name: store-be
  namespace: default
subsets:
- addresses:
  - ip: 10.11.8.67
  - ip: 10.11.5.3
  - ip: 10.11.0.9
  ports:
  - name: http
    port: 80
    protocol: TCP

Slide 50

Slide 50 text

Endpoints
When you create a service, a controller wakes up
Holds the IPs of the pod backends

kind: Endpoints
apiVersion: v1
metadata:
  name: store-be
  namespace: default
subsets:
- addresses:
  - ip: 10.11.8.67
  - ip: 10.11.5.3
  - ip: 10.11.0.9
  ports:
  - name: http
    port: 80
    protocol: TCP
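A rough sketch of what the endpoints controller computes: it applies the service’s selector to the known pods and records the IPs of the matches. The dictionaries below are simplified stand-ins for the API objects, and compute_endpoints is a hypothetical helper, not the controller’s real code; the pod list is a subset of the pods on the earlier diagram.

```python
# Simplified stand-ins for API objects; field names follow the YAML on the slides.
SERVICE = {
    "metadata": {"name": "store-be", "namespace": "default"},
    "spec": {
        "selector": {"app": "store", "role": "be"},
        "ports": [{"name": "http", "port": 80, "protocol": "TCP"}],
    },
}

PODS = [
    {"labels": {"app": "store", "role": "be"}, "ip": "10.11.8.67"},
    {"labels": {"app": "store", "role": "be"}, "ip": "10.11.5.3"},
    {"labels": {"app": "store", "role": "be"}, "ip": "10.11.0.9"},
    {"labels": {"app": "db", "role": "be"}, "ip": "10.7.1.18"},
    {"labels": {"app": "db", "role": "be"}, "ip": "10.4.1.11"},
]


def compute_endpoints(service, pods):
    """Build an Endpoints-like object listing the IPs of pods the selector matches."""
    selector = service["spec"]["selector"]
    addresses = [
        {"ip": pod["ip"]}
        for pod in pods
        if all(pod["labels"].get(key) == value for key, value in selector.items())
    ]
    return {
        "kind": "Endpoints",
        "metadata": dict(service["metadata"]),
        "subsets": [{"addresses": addresses, "ports": service["spec"]["ports"]}],
    }


endpoints = compute_endpoints(SERVICE, PODS)
print([address["ip"] for address in endpoints["subsets"][0]["addresses"]])
```

A real controller would rerun this whenever a pod or service changes, which is how the set of backends stays current while the VIP stays fixed.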

Slide 51

Slide 51 text

Life of a packet: pod-to-service root netns eth0 pod1 netns eth0 vethxx cbr0

Slide 52

Slide 52 text

Life of a packet: pod-to-service root netns eth0 ctr1 netns eth0 vethxx cbr0 src: pod1 dst: svc1 pod1 netns eth0

Slide 54

Slide 54 text

Life of a packet: pod-to-service root netns eth0 ctr1 netns eth0 vethxx cbr0 src: pod1 dst: svc1 iptables pod1 netns eth0

Slide 55

Slide 55 text

Life of a packet: pod-to-service root netns eth0 ctr1 netns eth0 vethxx cbr0 iptables src: pod1 dst: svc1 dst: pod99 DNAT, conntrack pod1 netns eth0

Slide 56

Slide 56 text

Conntrack
Linux kernel connection-tracking
Remembers address translations
● Based on the 5-tuple
Does a lot more, but not very relevant here
Reversed on the return path

{
  protocol = TCP
  src_ip = pod1
  src_port = 1234
  dst_ip = svc1
  dst_port = 80
}
=>
{
  protocol = TCP
  src_ip = pod1
  src_port = 1234
  dst_ip = pod99
  dst_port = 80
}
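The translation and its reversal can be modeled with a small table keyed by the 5-tuple of the expected reply. This is a toy model for intuition only; the ConnTrack class and its method names are hypothetical and say nothing about how the kernel stores its entries.

```python
from collections import namedtuple

FiveTuple = namedtuple("FiveTuple", "protocol src_ip src_port dst_ip dst_port")


class ConnTrack:
    """Toy connection tracker: remembers DNAT rewrites and undoes them on replies."""

    def __init__(self):
        self._reverse = {}

    def dnat(self, packet, backend_ip, backend_port):
        """Rewrite the destination and remember how to restore it for the reply."""
        translated = packet._replace(dst_ip=backend_ip, dst_port=backend_port)
        expected_reply = FiveTuple(
            packet.protocol, backend_ip, backend_port, packet.src_ip, packet.src_port
        )
        self._reverse[expected_reply] = (packet.dst_ip, packet.dst_port)
        return translated

    def un_dnat(self, reply):
        """Restore the original (service) source on a reply, if it is tracked."""
        original = self._reverse.get(reply)
        if original is None:
            return reply
        return reply._replace(src_ip=original[0], src_port=original[1])


tracker = ConnTrack()
outbound = FiveTuple("TCP", "pod1", 1234, "svc1", 80)
forwarded = tracker.dnat(outbound, "pod99", 80)
reply = FiveTuple("TCP", "pod99", 80, "pod1", 1234)
restored = tracker.un_dnat(reply)
print(forwarded)
print(restored)
```

From the client pod’s point of view the reply comes from svc1, so it never learns which backend served it.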

Slide 57

Slide 57 text

Life of a packet: pod-to-service root netns eth0 ctr1 netns eth0 vethxx cbr0 iptables src: pod1 dst: pod99 pod1 netns eth0

Slide 58

Slide 58 text

Life of a packet: pod-to-service root netns eth0 ctr1 netns eth0 vethxx cbr0 iptables src: pod99 dst: pod1 pod1 netns eth0

Slide 59

Slide 59 text

Life of a packet: pod-to-service root netns eth0 ctr1 netns eth0 vethxx cbr0 iptables src: pod99 src: svc1 dst: pod1 un-DNAT pod1 netns eth0

Slide 60

Slide 60 text

Life of a packet: pod-to-service root netns eth0 ctr1 netns eth0 vethxx cbr0 iptables src: svc1 dst: pod1 pod1 netns eth0

Slide 62

Slide 62 text

A bit more on iptables
The iptables rules look scary, but are actually simple:

if dest.ip == svc1.ip && dest.port == svc1.port {
  pick one of the backends at random
  rewrite destination IP
}

Configured by ‘kube-proxy’ - a pod running on each VM
● Not actually a proxy
● Not in the data path
Kube-proxy is a controller - it watches the API for services
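The pseudocode rule above can be written out as a small runnable function. It is a sketch of the decision only, under the assumption that the service table is a plain dictionary from (IP, port) to backend IPs; the table contents reuse addresses from earlier slides, and none of this is how kube-proxy or iptables is actually implemented.

```python
import random

# Hypothetical service table: (service IP, service port) -> backend pod IPs.
SERVICES = {
    ("10.9.3.76", 80): ["10.11.8.67", "10.11.5.3", "10.11.0.9"],
}


def rewrite_destination(dst_ip, dst_port, services=SERVICES, rng=random):
    """Return the destination IP after applying the service rule, if any."""
    backends = services.get((dst_ip, dst_port))
    if not backends:
        # Not a service VIP (or no backends): leave the packet alone.
        return dst_ip
    # Pick one of the backends at random and rewrite the destination IP.
    return rng.choice(backends)


print(rewrite_destination("10.9.3.76", 80))
print(rewrite_destination("8.8.8.8", 53))
```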

Slide 63

Slide 63 text

DNS
Even easier: services are added to an in-cluster DNS server
You would never hardcode an IP, but you might hardcode a hostname and port
Serves “A” and “SRV” records
DNS itself runs as pods and a service
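A small sketch of how a client might form the names it looks up. The slides do not spell out the naming scheme, so the layout used here (service.namespace.svc.cluster-domain, with cluster.local as the domain) is stated as an assumption based on the common Kubernetes convention; the helper names are hypothetical, and resolve_service is only expected to succeed from inside a cluster.

```python
import socket


def service_fqdn(service, namespace="default", cluster_domain="cluster.local"):
    """Build the assumed in-cluster DNS name for a service's A record."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def srv_record_name(port_name, protocol, service, namespace="default",
                    cluster_domain="cluster.local"):
    """Build the assumed SRV record name for a named service port."""
    base = service_fqdn(service, namespace, cluster_domain)
    return f"_{port_name}._{protocol.lower()}.{base}"


def resolve_service(service, namespace="default"):
    """Look up the service VIP; needs the in-cluster DNS server to be reachable."""
    return socket.gethostbyname(service_fqdn(service, namespace))


print(service_fqdn("store-be"))
print(srv_record_name("http", "TCP", "store-be"))
```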

Slide 64

Slide 64 text

DNS Service
Requests a particular cluster IP
Pods are auto-scaled with the cluster size
Service VIP is stable

kind: Service
apiVersion: v1
metadata:
  name: kube-dns
  namespace: kube-system
spec:
  clusterIP: 10.0.0.10
  selector:
    k8s-app: kube-dns
  ports:
  - name: dns
    port: 53
    protocol: UDP
  - name: dns-tcp
    port: 53
    protocol: TCP

Slide 66

Slide 66 text

Simple and powerful
Can use any port you want, no conflicts
Can request a particular ‘clusterIP’
Can remap ports

Slide 67

Slide 67 text

That’s all there is to it
Services are an abstraction - the API is a VIP
No running process or intercepting the data-path
All a client needs to do is hit the service IP:port

Slide 68

Slide 68 text

Sending external traffic
Services are within a cluster
What happens if you want your pod to reach google.com?

Slide 69

Slide 69 text

Egress

Slide 70

Slide 70 text

Leaving the GCP project
VMs get private IPs (in 10.0.0.0/8)
VMs can have public IPs, too
GCP: Public IPs are provided by 1-to-1 NAT

Slide 71

Slide 71 text

GCP Project VM Life of a packet: VM-to-internet root netns eth0 1:1 NAT

Slide 72

Slide 72 text

GCP Project VM Life of a packet: VM-to-internet root netns eth0 1:1 NAT src: VM-internal dst: 8.8.8.8

Slide 73

Slide 73 text

GCP Project VM Life of a packet: VM-to-internet root netns eth0 1:1 NAT src: VM-internal src: VM-external dst: 8.8.8.8

Slide 74

Slide 74 text

GCP Project VM Life of a packet: VM-to-internet root netns eth0 1:1 NAT src: VM-external dst: 8.8.8.8

Slide 75

Slide 75 text

GCP Project VM Life of a packet: VM-to-internet root netns eth0 1:1 NAT src: 8.8.8.8 dst: VM-external

Slide 76

Slide 76 text

GCP Project VM Life of a packet: VM-to-internet root netns eth0 1:1 NAT src: 8.8.8.8 dst: VM-external dst: VM-internal

Slide 77

Slide 77 text

GCP Project VM Life of a packet: VM-to-internet root netns eth0 1:1 NAT src: 8.8.8.8 dst: VM-internal

Slide 78

Slide 78 text

GCP Project VM Life of a packet: pod-to-internet pod1 netns eth0 root netns eth0 vethxx cbr0 1:1 NAT

Slide 79

Slide 79 text

GCP Project VM Life of a packet: pod-to-internet pod1 netns eth0 root netns eth0 vethxx cbr0 1:1 NAT src: pod1 dst: 8.8.8.8

Slide 82

Slide 82 text

GCP Project VM Life of a packet: pod-to-internet pod1 netns eth0 root netns eth0 vethxx cbr0 1:1 NAT dropped!

Slide 83

Slide 83 text

What went wrong?
The 1:1 NAT only understands VM IPs
● Anything else gets dropped
Pod IPs != VM IPs
When in doubt, add some more iptables
● MASQUERADE, aka SNAT
Applies to any packet with a destination *outside* of 10.0.0.0/8
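The MASQUERADE decision described above can be sketched as a single check on the destination address. The addresses are made up for illustration and the function is a hypothetical model of the rule’s effect, not an iptables rule.

```python
import ipaddress

# Range treated as "inside" by the rule on the slide.
INTERNAL = ipaddress.ip_network("10.0.0.0/8")


def masquerade(src_ip, dst_ip, vm_internal_ip):
    """Return the source IP the packet carries when it leaves the VM."""
    if ipaddress.ip_address(dst_ip) in INTERNAL:
        # In-cluster traffic keeps the pod IP as its source.
        return src_ip
    # Traffic leaving 10.0.0.0/8 is rewritten so the 1:1 NAT sees a VM IP.
    return vm_internal_ip


# Hypothetical pod and VM addresses.
print(masquerade("10.11.1.5", "8.8.8.8", "10.240.0.3"))
print(masquerade("10.11.1.5", "10.11.2.9", "10.240.0.3"))
```

The reply path relies on connection tracking to map the VM IP back to the pod IP.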

Slide 84

Slide 84 text

GCP Project VM Life of a packet: pod-to-internet pod1 netns eth0 root netns eth0 vethxx cbr0 iptables 1:1 NAT src: pod1 src: VM-internal dst: 8.8.8.8 MASQUERADE

Slide 85

Slide 85 text

GCP Project VM Life of a packet: pod-to-internet pod1 netns eth0 root netns eth0 vethxx cbr0 iptables 1:1 NAT src: VM-internal dst: 8.8.8.8

Slide 86

Slide 86 text

GCP Project VM Life of a packet: pod-to-internet pod1 netns eth0 root netns eth0 vethxx cbr0 iptables 1:1 NAT src: VM-internal src: VM-external dst: 8.8.8.8

Slide 87

Slide 87 text

GCP Project VM Life of a packet: pod-to-internet pod1 netns eth0 root netns eth0 vethxx cbr0 iptables 1:1 NAT src: VM-external dst: 8.8.8.8

Slide 88

Slide 88 text

GCP Project VM Life of a packet: pod-to-internet pod1 netns eth0 root netns eth0 vethxx cbr0 iptables 1:1 NAT src: 8.8.8.8 dst: VM-external

Slide 89

Slide 89 text

GCP Project VM Life of a packet: pod-to-internet pod1 netns eth0 root netns eth0 vethxx cbr0 iptables 1:1 NAT src: 8.8.8.8 dst: VM-external dst: VM-internal

Slide 90

Slide 90 text

GCP Project VM Life of a packet: pod-to-internet pod1 netns eth0 root netns eth0 vethxx cbr0 iptables 1:1 NAT src: 8.8.8.8 dst: VM-internal

Slide 91

Slide 91 text

GCP Project VM Life of a packet: pod-to-internet pod1 netns eth0 root netns eth0 vethxx cbr0 iptables 1:1 NAT src: 8.8.8.8 dst: VM-internal dst: pod1

Slide 92

Slide 92 text

GCP Project VM Life of a packet: pod-to-internet pod1 netns eth0 root netns eth0 vethxx cbr0 iptables 1:1 NAT src: 8.8.8.8 dst: pod1

Slide 94

Slide 94 text

Receiving external traffic
GCP offers multiple products here
Kubernetes builds on two:
● Network Load Balancer (L4)
● HTTP/S Load Balancer (L7)
These map to Kubernetes APIs:
● Service type=LoadBalancer
● Ingress

Slide 95

Slide 95 text

L4: Service + LoadBalancer

Slide 96

Slide 96 text

Service
Change the type of your service
Implemented by the cloud provider controller

kind: Service
apiVersion: v1
metadata:
  name: store-be
spec:
  type: LoadBalancer
  selector:
    app: store
    role: be
  ports:
  - name: https
    port: 443

Slide 97

Slide 97 text

Service
The LB info is populated when ready

kind: Service
apiVersion: v1
metadata:
  name: store-be
  # ...
spec:
  type: LoadBalancer
  selector:
    app: store
    role: be
  clusterIP: 10.9.3.76
  ports:
  # ...
  sessionAffinity: None
status:
  loadBalancer:
    ingress:
    - ip: 86.75.30.9

Slide 98

Slide 98 text

GCP Project VM1 Life of a packet: external-to-service VM2 VM3 pod1 pod2 pod3

Slide 99

Slide 99 text

GCP Project VM1 Life of a packet: external-to-service Net LB VM2 VM3 pod1 pod2 pod3

Slide 101

Slide 101 text

GCP Project VM1 Life of a packet: external-to-service Net LB VM2 VM3 src: client dst: LB pod1 pod2 pod3

Slide 102

Slide 102 text

GCP Project VM1 VM1 VM1 VM1 Life of a packet: external-to-service Net LB VM2 VM3 src: client dst: LB Choose a VM pod1 pod2 pod3

Slide 103

Slide 103 text

GCP Project VM1 VM1 Life of a packet: external-to-service Net LB VM2 VM3 src: client dst: LB Choose a VM pod1 pod2 pod3

Slide 104

Slide 104 text

Life of a packet: external-to-service
Rejected by firewall
GKE runs: gcloud compute firewall-rules create ...
[Diagram: GCP Project; Net LB; VM1, VM2, VM3; pod1, pod2, pod3]

Slide 105

Slide 105 text

GCP Project VM1 VM1 Life of a packet: external-to-service Net LB VM2 VM3 src: client dst: LB pod1 pod2 pod3

Slide 107

Slide 107 text

Balancing to VMs
The LB only knows about VMs
VMs do not map 1:1 with pods
[Diagram: VM1, VM2, VM3]

Slide 108

Slide 108 text

The imbalance problem
The LB only knows about VMs
Assume the LB only hits VMs with pods
[Diagram: GCP Project; Net LB; VM1, VM2, VM3; pod1, pod2, pod3]

Slide 109

Slide 109 text

GCP Project VM1 The imbalance problem Net LB VM2 VM3 pod1 pod2 pod3 50% 50%

Slide 110

Slide 110 text

GCP Project VM1 50% The imbalance problem Net LB VM2 VM3 50% 50% 25% 25% pod1 pod2 pod3

Slide 111

Slide 111 text

Balancing to VMs
The LB only knows about VMs
VMs do not map 1:1 with pods
How do we avoid imbalance?
iptables, of course
[Diagram: VM1, VM2, VM3]

Slide 112

Slide 112 text

GCP Project VM1 Life of a packet: external-to-service Net LB VM2 VM3 iptables src: client dst: LB Choose a pod pod1 pod2 pod3

Slide 114

Slide 114 text

GCP Project VM1 Life of a packet: external-to-service Net LB VM2 VM3 iptables src: client dst: LB dst: pod2 NAT pod1 pod2 pod3

Slide 115

Slide 115 text

GCP Project VM1 Life of a packet: external-to-service Net LB VM2 VM3 iptables src: client dst: pod2 pod1 pod2 pod3

Slide 119

Slide 119 text

GCP Project VM1 Life of a packet: external-to-service Net LB VM2 VM3 iptables src: pod2 dst: client pod1 pod2 pod3

Slide 121

Slide 121 text

GCP Project VM1 Life of a packet: external-to-service Net LB VM2 VM3 iptables src: pod2 dst: client INVALID pod1 pod2 pod3

Slide 122

Slide 122 text

GCP Project VM1 Life of a packet: external-to-service Net LB VM2 VM3 iptables src: client src: VM1 dst: LB dst: pod2 NAT pod1 pod2 pod3

Slide 123

Slide 123 text

GCP Project VM1 Life of a packet: external-to-service Net LB VM2 VM3 iptables src: VM1 dst: pod2 pod1 pod2 pod3

Slide 127

Slide 127 text

GCP Project VM1 Life of a packet: external-to-service Net LB VM2 VM3 iptables src: pod2 dst: VM1 pod1 pod2 pod3

Slide 130

Slide 130 text

GCP Project VM1 Life of a packet: external-to-service Net LB VM2 VM3 iptables src: pod2 src: LB dst: VM1 dst: client pod1 pod2 pod3

Slide 131

Slide 131 text

GCP Project VM1 Life of a packet: external-to-service Net LB VM2 VM3 src: LB dst: client pod1 pod2 pod3

Slide 133

Slide 133 text

Explain the complexity
To avoid imbalance, we re-balance inside Kubernetes
A backend is chosen randomly from all pods
Good:
● Well balanced, in practice
Bad:
● Can cause an extra network hop
● Hides the client IP from the user’s backend
Users wanted to make the trade-off themselves
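The trade-off can be made concrete with a little arithmetic. The sketch below compares each pod’s expected share of traffic when every VM picks among its own pods only versus when every VM picks among all pods. The pod placement (one pod on vm1, two on vm2, none on vm3) is an assumed layout chosen to reproduce the 50% / 25% / 25% figures from the imbalance slides, and it assumes the LB splits evenly across VMs that host pods.

```python
from fractions import Fraction

# Assumed placement that reproduces the percentages on the imbalance slides.
PODS_BY_VM = {"vm1": ["pod1"], "vm2": ["pod2", "pod3"], "vm3": []}


def share_node_local(pods_by_vm):
    """Expected share per pod when each VM only balances across its own pods."""
    serving = {vm: pods for vm, pods in pods_by_vm.items() if pods}
    shares = {}
    for pods in serving.values():
        for pod in pods:
            # The LB gives each serving VM an equal slice, split among local pods.
            shares[pod] = Fraction(1, len(serving)) / len(pods)
    return shares


def share_cluster_wide(pods_by_vm):
    """Expected share per pod when each VM picks randomly from all pods."""
    all_pods = [pod for pods in pods_by_vm.values() for pod in pods]
    return {pod: Fraction(1, len(all_pods)) for pod in all_pods}


print(share_node_local(PODS_BY_VM))
print(share_cluster_wide(PODS_BY_VM))
```

Re-balancing across all pods evens out the load, at the price of the extra hop and the hidden client IP listed above.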

Slide 134

Slide 134 text

OnlyLocal
Specify an external-traffic policy
iptables will always choose a pod on the same node
Preserves client IP
Risks imbalance

kind: Service
apiVersion: v1
metadata:
  name: store-be
  annotations:
    service.beta.kubernetes.io/external-traffic: OnlyLocal
spec:
  type: LoadBalancer
  selector:
    app: store
    role: be
  ports:
  - name: https
    port: 443

Slide 135

Slide 135 text

Opt-in to the imbalance problem
In practice Kubernetes spreads pods across nodes
If pods >> nodes: OK
If nodes >> pods: OK
If pods ~= nodes: risk
[Diagram: GCP Project; Net LB; VM1, VM2, VM3 with iptables; pod1, pod2, pod3; traffic split 50% / 50% across VMs and 50% / 25% / 25% across pods]

Slide 136

Slide 136 text

GCP Project VM1 Life of a packet: external-to-service Net LB VM2 VM3 Not considered Health-check fails if no backends pod1 pod2 pod3

Slide 137

Slide 137 text

GCP Project VM1 Life of a packet: external-to-service Net LB VM2 VM3 src: client dst: LB pod1 pod2 pod3

Slide 138

Slide 138 text

GCP Project VM1 VM1 VM1 Life of a packet: external-to-service Net LB VM2 VM3 src: client dst: LB Choose a VM pod1 pod2 pod3

Slide 139

Slide 139 text

GCP Project VM1 Life of a packet: external-to-service Net LB VM2 VM1 VM3 src: client dst: LB pod1 pod2 pod3

Slide 140

Slide 140 text

GCP Project VM1 VM1 Life of a packet: external-to-service Net LB VM2 VM3 src: client dst: LB pod1 pod2 pod3

Slide 141

Slide 141 text

GCP Project VM1 Life of a packet: external-to-service Net LB VM2 VM3 iptables src: client dst: LB Choose a pod pod1 pod2 pod3

Slide 142

Slide 142 text

GCP Project VM1 Life of a packet: external-to-service Net LB VM2 VM3 iptables src: client dst: LB dst: pod2 DNAT pod1 pod2 pod3

Slide 143

Slide 143 text

GCP Project VM1 Life of a packet: external-to-service Net LB VM2 VM3 iptables src: client dst: pod2 pod1 pod2 pod3

Slide 144

Slide 144 text

GCP Project VM1 Life of a packet: external-to-service Net LB VM2 VM3 iptables src: pod2 src: LB dst: client pod1 pod2 pod3

Slide 145

Slide 145 text

GCP Project VM1 Life of a packet: external-to-service Net LB VM2 VM3 iptables src: LB dst: client pod1 pod2 pod3

Slide 146

Slide 146 text

GCP Project VM1 Life of a packet: external-to-service Net LB VM2 VM3 src: LB dst: client pod1 pod2 pod3

Slide 148

Slide 148 text

L7: Ingress

Slide 149

Slide 149 text

Service
Change the type of your service
Allocates and forwards a port on every VM to the service port
Exactly the same data path as the LB case

kind: Service
apiVersion: v1
metadata:
  name: store-be
spec:
  type: NodePort
  selector:
    app: store
    role: be
  ports:
  - name: https
    port: 443

Slide 150

Slide 150 text

Ingress
A different API resource
Maps HTTP to services
Implemented by the cloud provider controller

kind: Ingress
apiVersion: extensions/v1beta1
metadata:
  name: store-ing
spec:
  rules:
  - http:
      paths:
      - path: /customers
        backend:
          serviceName: customers-be
      - path: /products
        backend:
          serviceName: products-be

Slide 152

Slide 152 text

Ingress
The LB info is populated when ready

kind: Ingress
apiVersion: extensions/v1beta1
metadata:
  name: store-ing
spec:
  rules:
  - http:
      paths:
      - path: /customers
        backend:
          serviceName: customers-be
      - path: /products
        backend:
          serviceName: products-be
status:
  loadBalancer:
    ingress:
    - ip: 86.73.50.9
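How an L7 load balancer might use these rules can be sketched as a path lookup. The matching semantics here (exact match or match on a path-segment prefix, longest rule wins) are an assumption made for illustration; the slides do not state how the cloud load balancer matches paths, and the route function is hypothetical.

```python
# Rules taken from the Ingress above: (path, backend service name).
RULES = [
    ("/customers", "customers-be"),
    ("/products", "products-be"),
]


def route(path, rules=RULES):
    """Return the backend service for a request path, or None if nothing matches."""
    best = None
    for prefix, service in rules:
        # Assumed semantics: match the rule path itself or anything beneath it.
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, service)
    return best[1] if best else None


print(route("/products/42"))
print(route("/customers"))
print(route("/checkout"))
```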

Slide 153

Slide 153 text

GCP Project VM1 Life of a packet: external-to-ingress VM2 VM3 pod1 pod4 pod5 pod2 pod3

Slide 154

Slide 154 text

GCP Project VM1 Life of a packet: external-to-ingress GCLB VM2 VM3 pod1 pod4 pod5 pod2 pod3

Slide 156

Slide 156 text

GCP Project VM1 Life of a packet: external-to-ingress GCLB VM2 VM3 src: client dst: LB path: /products pod1 pod4 pod5 pod2 pod3

Slide 157

Slide 157 text

GCP Project VM1 VM1 VM1 VM1 Life of a packet: external-to-ingress GCLB VM2 VM3 src: client dst: LB path: /products Choose a VM pod1 pod4 pod5 pod2 pod3

Slide 158

Slide 158 text

GCP Project VM1 VM1 Life of a packet: external-to-ingress GCLB VM2 VM3 src: client dst: LB path: /products Choose a VM pod1 pod4 pod5 pod2 pod3

Slide 159

Slide 159 text

GCP Project VM1 VM1 Life of a packet: external-to-ingress GCLB VM2 VM3 src: GCLB dst: VM3 pod1 pod4 pod5 pod2 pod3

Slide 161

Slide 161 text

GCP Project VM1 Life of a packet: external-to-ingress GCLB VM2 VM3 iptables src: GCLB dst: VM3 Choose a pod pod1 pod4 pod5 pod2 pod3

Slide 163

Slide 163 text

GCP Project VM1 Life of a packet: external-to-ingress GCLB VM2 VM3 iptables src: GCLB src: VM3 dst: VM3 dst: pod3 pod1 pod4 pod5 pod2 pod3

Slide 164

Slide 164 text

GCP Project VM1 Life of a packet: external-to-ingress GCLB VM2 VM3 iptables src: VM3 dst: pod3 pod1 pod4 pod5 pod2 pod3

Slide 168

Slide 168 text

GCP Project VM1 Life of a packet: external-to-ingress GCLB VM2 VM3 iptables src: pod3 dst: VM3 pod1 pod4 pod5 pod2 pod3

Slide 172

Slide 172 text

GCP Project VM1 Life of a packet: external-to-ingress GCLB VM2 VM3 iptables src: pod3 src: VM3 dst: VM3 dst: GCLB pod1 pod4 pod5 pod2 pod3

Slide 173

Slide 173 text

GCP Project VM1 Life of a packet: external-to-ingress GCLB VM2 VM3 iptables src: VM3 dst: GCLB pod1 pod4 pod5 pod2 pod3

Slide 174

Slide 174 text

GCP Project VM1 VM1 Life of a packet: external-to-ingress GCLB VM2 VM3 src: VM3 dst: GCLB pod1 pod4 pod5 pod2 pod3

Slide 175

Slide 175 text

GCP Project VM1 Life of a packet: external-to-ingress GCLB VM2 VM3 src: VM3 dst: client pod1 pod4 pod5 pod2 pod3

Slide 176

Slide 176 text

GCP Project VM1 Life of a packet: external-to-ingress GCLB VM2 VM3 src: LB dst: client pod1 pod4 pod5 pod2 pod3

Slide 177

Slide 177 text

OnlyLocal
The same annotation as before
Configured per-service
iptables will always choose a pod on the same node
Risks imbalance
Removes 2nd hop

kind: Service
apiVersion: v1
metadata:
  name: store-be
  annotations:
    service.beta.kubernetes.io/external-traffic: OnlyLocal
spec:
  type: NodePort
  selector:
    app: store
    role: be
  ports:
  - name: https
    port: 443

Slide 178

Slide 178 text

But wait, there’s more!
Things we didn’t really cover:
● Pod liveness probes
● Graceful termination
● Cloud health checks
● Firewalls
● Headless services
● IPAM
● SSL
● ...

Slide 179

Slide 179 text

Google Container Engine is a moving target
The efforts of Open Source developers and Google Engineers continue to improve and simplify the system
Google NEXT ’18 will have more “ins” and “outs” for network traffic
Watch this space

Slide 180

Slide 180 text

https://kubernetes.io
Code: github.com/kubernetes/kubernetes
Chat: slack.k8s.io
Twitter: @kubernetesio

Slide 181

Slide 181 text

Thank you