The ins and outs of networking
in Google Container Engine
Michael Rubin
Tim Hockin
Slide 2
Slide 2 text
Kubernetes is about clusters
Because of that, networking is pretty important
Most of Kubernetes centers on network concepts
Our job is to make sure your applications can
communicate:
● With each other
● With the world outside your cluster
● Only where you want
Slide 3
Slide 3 text
It’s easy to get overwhelmed
Many people are comfortable with TCP/IP, but
containers bring new concepts:
● Namespaces
● Virtual interfaces
● IP forwarding
● Underlays
● Overlays
● iptables
● NAT
It’s enough to make your head spin
Slide 4
Slide 4 text
Kubernetes is a very API-centric system - everything
communicates through the API
● No private APIs
● No “system only” calls
REST: Defined in terms of “resources” (nouns, aka
“objects”) and methods (verbs)
Background: API server
Slide 5
Slide 5 text
A small group of tightly-coupled containers & volumes,
composed together
The atom of Kubernetes
Shared lifecycle and fate
Shared networking - a shared “real” IP, containers see
each other as localhost
Background: Pods
Slide 6
Slide 6 text
A piece of code that watches the Kubernetes API and
reacts
The defining pattern of Kubernetes, used everywhere
Self-healing, aka rectification
Examples: ReplicaSet, Services, DNS, Kubelet
Background: Controllers
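The watch-and-react pattern can be sketched as a single reconcile step. This is only an illustration of the idea: the function name, the action tuples, and the choice of which surplus pods to delete are assumptions, not the behavior of any real Kubernetes controller.

```python
def reconcile(desired_replicas, running_pods):
    """Return the actions that move the observed state toward the desired state.

    Hypothetical helper: real controllers watch the API server and issue API
    calls; here the "actions" are plain tuples so the logic is easy to follow.
    """
    diff = desired_replicas - len(running_pods)
    if diff > 0:
        # Too few pods: ask for the missing ones to be created.
        return [("create", None)] * diff
    if diff < 0:
        # Too many pods: delete the surplus (arbitrarily, the last ones listed).
        return [("delete", name) for name in running_pods[diff:]]
    # Observed state already matches desired state: nothing to do.
    return []
```

Calling the function repeatedly with the latest observed state is what gives the self-healing behavior: a crashed pod shows up as a shorter list, and the next pass asks for a replacement.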
Slide 7
Slide 7 text
Background: Labels
Metadata (key-value) which can be attached to any API
resource
Labels: identification
● Allow users to define how to group resources
● Examples: app name, tier (frontend/backend),
stage (dev/test/prod)
Annotations: data that “rides along” with objects
● Third-party or internal state that isn’t part of an
object’s schema
role: fe
stage: prod
app: store
Slide 8
Slide 8 text
Background: Selectors
Expresses which objects to act upon
● Think “select ... where”
Provides very loose coupling
Users can manage groups however they need
Examples: services, deployments
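The "select ... where" idea can be sketched in a few lines. The pod names below are made up for illustration; the labels mirror the four example pods used in the selector diagrams, and the matching rule shown (every selector key must equal the pod's label) is the simple equality form only.

```python
def matches(selector, labels):
    """True if every key/value pair in the selector is present in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())

# Hypothetical pod names; the labels follow the four pods in the diagrams.
PODS = {
    "pod-a": {"app": "store", "role": "fe", "stage": "test"},
    "pod-b": {"app": "store", "role": "be", "stage": "test"},
    "pod-c": {"app": "store", "role": "fe", "stage": "prod"},
    "pod-d": {"app": "store", "role": "be", "stage": "prod"},
}

def select(selector, pods=PODS):
    """Return the sorted names of all pods whose labels satisfy the selector."""
    return sorted(name for name, labels in pods.items() if matches(selector, labels))
```

For example, `select({"app": "store", "role": "fe"})` picks the two frontend pods, while `select({"app": "store"})` picks all four.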
Slide 9
Slide 9 text
Background: Selectors
role: fe
stage: test
role: be
stage: test
role: fe
stage: prod
role: be
stage: prod
app: store
app: store
app: store
app: store
Slide 10
Slide 10 text
Background: Selectors
role: fe
stage: test
role: be
stage: test
role: fe
stage: prod
role: be
stage: prod
app: store
app: store
app: store
app: store
app=store, role=fe
Slide 11
Slide 11 text
Background: Selectors
role: fe
stage: test
role: be
stage: test
role: fe
stage: prod
role: be
stage: prod
app: store
app: store
app: store
app: store
app=store, stage=test
Slide 12
Slide 12 text
Background: Selectors
role: fe
stage: test
role: be
stage: test
role: fe
stage: prod
role: be
stage: prod
app: store
app: store
app: store
app: store
app=store
Slide 13
Slide 13 text
The IP-per-pod model
Slide 14
Slide 14 text
Every pod has a real IP address
This is different from the out-of-the-box model Docker
offers
● No machine-private IPs
● No port-mapping
Pod IPs are accessible from other pods, regardless of
which VM they are on
Linux “network namespaces” (aka “netns”) and virtual
interfaces
Slide 15
Slide 15 text
VM
Network namespaces
eth0
Slide 16
Slide 16 text
VM
Network namespaces
root netns
eth0
Slide 17
Slide 17 text
VM
Network namespaces
root netns
eth0
pod1 netns
Slide 18
Slide 18 text
VM
Network namespaces
root netns
eth0
pod1 netns
vethxy
vethxx
Slide 19
Slide 19 text
VM
Network namespaces
root netns
eth0
pod1 netns
eth0
vethxx
VM
Life of a packet: pod-to-pod, same node
root
netns
eth0
vethxx vethyy
cbr0
pod1 netns pod2 netns
eth0 eth0
Slide 23
Slide 23 text
VM
Life of a packet: pod-to-pod, same node
root
netns
pod1 netns pod2 netns
eth0
eth0
vethxx vethyy
cbr0
src: pod1
dst: pod2
eth0
Slide 24
Slide 24 text
VM
Life of a packet: pod-to-pod, same node
root
netns
eth0
ctr1 netns ctr2 netns
eth0 eth0
vethxx vethyy
cbr0
src: pod1
dst: pod2
pod1 netns pod2 netns
eth0 eth0
Slide 27
Slide 27 text
Flat network space
Pods must be reachable across VMs, too
Kubernetes doesn’t care HOW, but this is a requirement
● L2, L3, or overlay
Assign a CIDR (IP block) to each VM
GCP: Teach the network how to route packets
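One way to picture "a CIDR per VM" is to carve equal-sized blocks out of a cluster-wide range and then ask which VM owns a given pod IP. The cluster range, the /24 block size, and the VM names below are assumptions chosen for illustration, not values GKE is known to use.

```python
import ipaddress

def assign_node_cidrs(cluster_cidr, node_names, prefix_len=24):
    """Give each node the next free block of the cluster range (hypothetical scheme)."""
    subnets = ipaddress.ip_network(cluster_cidr).subnets(new_prefix=prefix_len)
    return dict(zip(node_names, subnets))

def node_for_pod_ip(node_cidrs, pod_ip):
    """Return the node whose block contains pod_ip, or None if no block matches."""
    addr = ipaddress.ip_address(pod_ip)
    for name, cidr in node_cidrs.items():
        if addr in cidr:
            return name
    return None
```

With `assign_node_cidrs("10.11.0.0/16", ["vm1", "vm2", "vm3"])`, a packet addressed to `10.11.2.7` belongs to `vm3`; that lookup is the routing decision the network has to be taught.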
Slide 28
Slide 28 text
VM1
Life of a packet: pod-to-pod, across nodes
root
eth0
vethxx vethyy
cbr0
VM2
root
eth0
vethxx vethyy
cbr0
pod1
eth0
pod2
eth0
pod3 pod4
eth0 eth0
Slide 29
Slide 29 text
VM1
Life of a packet: pod-to-pod, across nodes
ctr1
eth0
ctr2
eth0
root
eth0
vethxx vethyy
cbr0
VM2
root
eth0
ctr3 ctr4
eth0 eth0
vethxx vethyy
cbr0
pod1
eth0
pod2
eth0
pod3 pod4
eth0 eth0
src: pod1
dst: pod4
Slide 30
Slide 30 text
VM1
Life of a packet: pod-to-pod, across nodes
ctr1
eth0
ctr2
eth0
root
eth0
vethxx vethyy
cbr0
VM2
root
eth0
ctr3 ctr4
eth0 eth0
vethxx vethyy
cbr0
src: pod1
dst: pod4
pod1
eth0
pod2
eth0
pod3 pod4
eth0 eth0
Slide 33
Slide 33 text
VM1
Life of a packet: pod-to-pod, across nodes
ctr1
eth0
ctr2
eth0
root
eth0
vethxx vethyy
cbr0
VM2
root
eth0
ctr3 ctr4
eth0 eth0
vethxx vethyy
cbr0
Anti-spoofing: only allow known source IPs (i.e. VMs)
pod1
eth0
pod2
eth0
pod3 pod4
eth0 eth0
Slide 34
Slide 34 text
Programming GCP’s network
GKE automatically sets up routing for you using every
trick it needs
All VMs are created as “routers”
● --can-ip-forward
● Disable anti-spoof protection for this VM
Add one GCP static route for each VM
● gcloud compute routes create vm2 --destination-range=x.y.z.0/24 --next-hop-instance=vm2
The GCP network does the rest
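The "one static route per VM" step can be sketched as a loop that prints the command from this slide for each node. The VM names and CIDR blocks are placeholders, the function only builds strings and runs nothing, and a real invocation may need additional flags (for example the instance's zone) that are not shown on the slide.

```python
def route_commands(node_cidrs):
    """Build one 'gcloud compute routes create' command line per VM.

    node_cidrs maps a VM name to its pod CIDR; both are hypothetical inputs.
    The flags are the two shown on the slide and nothing more.
    """
    return [
        f"gcloud compute routes create {vm} "
        f"--destination-range={cidr} --next-hop-instance={vm}"
        for vm, cidr in sorted(node_cidrs.items())
    ]
```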
Slide 35
Slide 35 text
VM1
Life of a packet: pod-to-pod, across nodes
ctr1
eth0
ctr2
eth0
root
eth0
vethxx vethyy
cbr0
VM2
root
eth0
ctr3 ctr4
eth0 eth0
vethxx vethyy
cbr0
src: pod1
dst: pod4
pod1
eth0
pod2
eth0
pod3 pod4
eth0 eth0
Slide 39
Slide 39 text
VM1
Life of a packet: pod-to-pod, across nodes
ctr1
eth0
ctr2
eth0
root
eth0
vethxx vethyy
cbr0
VM2
root
eth0
ctr3 ctr4
eth0 eth0
vethxx vethyy
cbr0
pod1
eth0
pod2
eth0
pod3 pod4
eth0 eth0
src: pod1
dst: pod4
Slide 40
Slide 40 text
Dealing with change
You need something more durable than a pod IP
A real cluster changes over time:
● Scale-up and scale-down events
● Rolling updates
● Pods crash or hang
● VMs reboot
The pod addresses you want to talk to can change
without warning
Slide 41
Slide 41 text
Services
Slide 42
Slide 42 text
The service abstraction
A service is a group of endpoints (usually pods)
Services provide a stable VIP
VIP automatically routes to backend pods
● Implementations can vary
● We will examine the default implementation
The set of pods “behind” a service can change
Clients only need the VIP, which doesn’t change
Slide 43
Slide 43 text
Service
What you submit is simple
● Other fields will be
defaulted or assigned
kind: Service
apiVersion: v1
metadata:
  name: store-be
spec:
  selector:
    app: store
    role: be
  ports:
  - name: http
    port: 80
Slide 44
Slide 44 text
Service
What you submit is simple
● Other fields will be
defaulted or assigned
The ‘selector’ field chooses
which pods to balance across
kind: Service
apiVersion: v1
metadata:
  name: store-be
spec:
  selector:
    app: store
    role: be
  ports:
  - name: http
    port: 80
Slide 45
Slide 45 text
Service
What you get back has more
information
Automatically creates a
distributed load balancer
kind: Service
apiVersion: v1
metadata:
  name: store-be
  namespace: default
  creationTimestamp: 2016-05-06T19:16:56Z
  resourceVersion: "7"
  selfLink: /api/v1/namespaces/default/services/store-be
  uid: 196d5751-13bf-11e6-9353-42010a800fe3
spec:
  type: ClusterIP
  selector:
    app: store
    role: be
  clusterIP: 10.9.3.76
  ports:
  - name: http
    protocol: TCP
    port: 80
    targetPort: 80
  sessionAffinity: None
Slide 46
Slide 46 text
Service
What you get back has more
information
Automatically creates a
distributed load balancer
The default is to allocate an
in-cluster IP
kind: Service
apiVersion: v1
metadata:
  name: store-be
  namespace: default
  creationTimestamp: 2016-05-06T19:16:56Z
  resourceVersion: "7"
  selfLink: /api/v1/namespaces/default/services/store-be
  uid: 196d5751-13bf-11e6-9353-42010a800fe3
spec:
  type: ClusterIP
  selector:
    app: store
    role: be
  clusterIP: 10.9.3.76
  ports:
  - name: http
    protocol: TCP
    port: 80
    targetPort: 80
  sessionAffinity: None
Slide 47
Slide 47 text
Endpoints
selector:
app: store
role: be
app: store
role: be
10.11.8.67
app: store
role: be
10.11.5.3
app: store
role: be
10.11.0.9
app: db
role: be
10.7.1.18
app: store
role: fe
10.11.8.67
app: db
role: be
10.4.1.11
Slide 49
Slide 49 text
Endpoints
When you create a service, a
controller wakes up
kind: Endpoints
apiVersion: v1
metadata:
  name: store-be
  namespace: default
subsets:
- addresses:
  - ip: 10.11.8.67
  - ip: 10.11.5.3
  - ip: 10.11.0.9
  ports:
  - name: http
    port: 80
    protocol: TCP
Slide 50
Slide 50 text
Endpoints
When you create a service, a
controller wakes up
Holds the IPs of the pod
backends
kind: Endpoints
apiVersion: v1
metadata:
  name: store-be
  namespace: default
subsets:
- addresses:
  - ip: 10.11.8.67
  - ip: 10.11.5.3
  - ip: 10.11.0.9
  ports:
  - name: http
    port: 80
    protocol: TCP
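What the endpoints controller does when it "wakes up" can be sketched as a filter over pods. This is a simplified model under stated assumptions: the dictionaries are hypothetical stand-ins for API objects, the frontend pod's IP is a made-up placeholder, and the service port is copied straight into the result (a real controller also accounts for the target port and pod readiness).

```python
# Hypothetical stand-ins for API objects; the backend IPs follow the slides.
SERVICE = {
    "metadata": {"name": "store-be"},
    "spec": {
        "selector": {"app": "store", "role": "be"},
        "ports": [{"name": "http", "port": 80, "protocol": "TCP"}],
    },
}

PODS = [
    {"labels": {"app": "store", "role": "be"}, "ip": "10.11.8.67"},
    {"labels": {"app": "store", "role": "be"}, "ip": "10.11.5.3"},
    {"labels": {"app": "store", "role": "be"}, "ip": "10.11.0.9"},
    {"labels": {"app": "db", "role": "be"}, "ip": "10.7.1.18"},
    {"labels": {"app": "store", "role": "fe"}, "ip": "10.11.2.4"},  # placeholder IP
    {"labels": {"app": "db", "role": "be"}, "ip": "10.4.1.11"},
]

def compute_endpoints(service, pods):
    """Collect the IPs of all pods matched by the service's selector."""
    selector = service["spec"]["selector"]
    ips = [
        pod["ip"]
        for pod in pods
        if all(pod["labels"].get(k) == v for k, v in selector.items())
    ]
    return {
        "kind": "Endpoints",
        "metadata": {"name": service["metadata"]["name"]},
        "subsets": [
            {"addresses": [{"ip": ip} for ip in ips], "ports": service["spec"]["ports"]}
        ],
    }
```

Re-running `compute_endpoints` whenever pods come and go is what keeps the set "behind" the service current while the service itself stays unchanged.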
Slide 51
Slide 51 text
Life of a packet: pod-to-service
root
netns
eth0
pod1 netns
eth0
vethxx
cbr0
Slide 52
Slide 52 text
Life of a packet: pod-to-service
root
netns
eth0
ctr1 netns
eth0
vethxx
cbr0
src: pod1
dst: svc1
pod1 netns
eth0
Slide 54
Slide 54 text
Life of a packet: pod-to-service
root
netns
eth0
ctr1 netns
eth0
vethxx
cbr0
src: pod1
dst: svc1 iptables
pod1 netns
eth0
Slide 55
Slide 55 text
Life of a packet: pod-to-service
root
netns
eth0
ctr1 netns
eth0
vethxx
cbr0
iptables
src: pod1
dst: svc1
dst: pod99
DNAT, conntrack
pod1 netns
eth0
Slide 56
Slide 56 text
Conntrack
Linux kernel connection-tracking
Remembers address translations
● Based on the 5-tuple
Does a lot more, but not very
relevant here
Reversed on the return path
{
protocol = TCP
src_ip = pod1
src_port = 1234
dst_ip = svc1
dst_port = 80
} => {
protocol = TCP
src_ip = pod1
src_port = 1234
dst_ip = pod99
dst_port = 80
}
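The translation above, and its reversal on the return path, can be modelled with a small table keyed on the reply's 5-tuple. This is a toy model, not how the kernel stores its state; the class and method names are invented for the sketch, and the symbolic addresses (pod1, svc1, pod99) are the ones used on the slide.

```python
class Conntrack:
    """Toy connection-tracking table: remembers DNAT decisions per 5-tuple."""

    def __init__(self):
        # Maps the 5-tuple a reply will carry to the original destination IP.
        self._reverse = {}

    def dnat(self, packet, backend_ip):
        """Rewrite the destination IP and remember how to undo it."""
        reply_key = (
            packet["protocol"],
            backend_ip, packet["dst_port"],        # reply source
            packet["src_ip"], packet["src_port"],  # reply destination
        )
        self._reverse[reply_key] = packet["dst_ip"]
        return dict(packet, dst_ip=backend_ip)

    def un_dnat(self, reply):
        """Restore the service IP as the source of a tracked reply."""
        key = (
            reply["protocol"],
            reply["src_ip"], reply["src_port"],
            reply["dst_ip"], reply["dst_port"],
        )
        original_dst = self._reverse.get(key)
        if original_dst is None:
            return reply  # Not a tracked connection: leave the packet alone.
        return dict(reply, src_ip=original_dst)
```

From the client's point of view the reply then appears to come from svc1, which is why the pod never needs to know that pod99 exists.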
Slide 57
Slide 57 text
Life of a packet: pod-to-service
root
netns
eth0
ctr1 netns
eth0
vethxx
cbr0
iptables
src: pod1
dst: pod99
pod1 netns
eth0
Slide 58
Slide 58 text
Life of a packet: pod-to-service
root
netns
eth0
ctr1 netns
eth0
vethxx
cbr0
iptables
src: pod99
dst: pod1
pod1 netns
eth0
Slide 59
Slide 59 text
Life of a packet: pod-to-service
root
netns
eth0
ctr1 netns
eth0
vethxx
cbr0
iptables
src: pod99
src: svc1
dst: pod1
un-DNAT
pod1 netns
eth0
Slide 60
Slide 60 text
Life of a packet: pod-to-service
root
netns
eth0
ctr1 netns
eth0
vethxx
cbr0
iptables
src: svc1
dst: pod1
pod1 netns
eth0
Slide 62
Slide 62 text
The iptables rules look scary, but are actually simple:
if dest.ip == svc1.ip && dest.port == svc1.port {
    pick one of the backends at random
    rewrite destination IP
}
Configured by ‘kube-proxy’ - a pod running on each VM
● Not actually a proxy
● Not in the data path
Kube-proxy is a controller - it watches the API for services
A bit more on iptables
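The match-and-rewrite rule on this slide can be written out as runnable code. This is a sketch of the logic only: the real rules are iptables entries evaluated in the kernel, and the `services` table layout and function name here are assumptions made for the example.

```python
import random

def apply_service_rules(packet, services, rng=random):
    """Rewrite the destination IP if the packet targets a known service IP:port.

    services maps (service_ip, service_port) to a list of backend pod IPs;
    this layout is hypothetical and stands in for the installed rules.
    """
    backends = services.get((packet["dst_ip"], packet["dst_port"]))
    if not backends:
        # No matching service (or no backends): the packet passes unchanged.
        return packet
    # Pick one of the backends at random and rewrite the destination IP.
    return dict(packet, dst_ip=rng.choice(backends))
```

Because the choice is made per new connection on the sending node, there is no central proxy process in the data path.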
Slide 63
Slide 63 text
DNS
Even easier: services are added to an in-cluster DNS
server
You would never hardcode an IP, but you might
hardcode a hostname and port
Serves “A” and “SRV” records
DNS itself runs as pods and a service
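The names behind those records can be sketched as simple string construction. The slides do not spell out the naming scheme, so treat the layout below (service, namespace, `svc`, then a cluster domain defaulting to `cluster.local`) as an assumption based on the commonly documented Kubernetes convention; the function name is made up.

```python
def service_dns_names(name, namespace, ports, cluster_domain="cluster.local"):
    """Build the A-record name and one SRV-record name per named port.

    Assumed convention: <service>.<namespace>.svc.<cluster-domain> for A records
    and _<port-name>._<protocol>.<that name> for SRV records.
    """
    fqdn = f"{name}.{namespace}.svc.{cluster_domain}"
    srv = [f"_{p['name']}._{p['protocol'].lower()}.{fqdn}" for p in ports]
    return {"A": fqdn, "SRV": srv}
```

A client would then hardcode a name such as the one returned for `store-be` rather than the service's cluster IP.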
Slide 64
Slide 64 text
DNS Service
Requests a particular cluster IP
Pods are auto-scaled with the
cluster size
Service VIP is stable
kind: Service
apiVersion: v1
metadata:
  name: kube-dns
  namespace: kube-system
spec:
  clusterIP: 10.0.0.10
  selector:
    k8s-app: kube-dns
  ports:
  - name: dns
    port: 53
    protocol: UDP
  - name: dns-tcp
    port: 53
    protocol: TCP
Slide 66
Slide 66 text
Simple and powerful
Can use any port you want, no conflicts
Can request a particular ‘clusterIP’
Can remap ports
Slide 67
Slide 67 text
That’s all there is to it
Services are an abstraction - the API is a VIP
No running process or intercepting the data-path
All a client needs to do is hit the service IP:port
Slide 68
Slide 68 text
Sending external traffic
Services are within a cluster
What happens if you want your pod to reach google.com?
Slide 69
Slide 69 text
Egress
Slide 70
Slide 70 text
Leaving the GCP project
VMs get private IPs (in 10.0.0.0/8)
VMs can have public IPs, too
GCP: Public IPs are provided by 1-to-1 NAT
Slide 71
Slide 71 text
GCP Project
VM
Life of a packet: VM-to-internet
root
netns
eth0
1:1 NAT
Slide 72
Slide 72 text
GCP Project
VM
Life of a packet: VM-to-internet
root
netns
eth0
1:1 NAT
src: VM-internal
dst: 8.8.8.8
Slide 73
Slide 73 text
GCP Project
VM
Life of a packet: VM-to-internet
root
netns
eth0
1:1 NAT
src: VM-internal
src: VM-external
dst: 8.8.8.8
Slide 74
Slide 74 text
GCP Project
VM
Life of a packet: VM-to-internet
root
netns
eth0
1:1 NAT
src: VM-external
dst: 8.8.8.8
Slide 75
Slide 75 text
GCP Project
VM
Life of a packet: VM-to-internet
root
netns
eth0
1:1 NAT
src: 8.8.8.8
dst: VM-external
Slide 76
Slide 76 text
GCP Project
VM
Life of a packet: VM-to-internet
root
netns
eth0
1:1 NAT
src: 8.8.8.8
dst: VM-external
dst: VM-internal
Slide 77
Slide 77 text
GCP Project
VM
Life of a packet: VM-to-internet
root
netns
eth0
1:1 NAT
src: 8.8.8.8
dst: VM-internal
Slide 78
Slide 78 text
GCP Project
VM
Life of a packet: pod-to-internet
pod1 netns
eth0
root
netns
eth0
vethxx
cbr0
1:1 NAT
Slide 79
Slide 79 text
GCP Project
VM
Life of a packet: pod-to-internet
pod1 netns
eth0
root
netns
eth0
vethxx
cbr0
1:1 NAT
src: pod1
dst: 8.8.8.8
Slide 82
Slide 82 text
GCP Project
VM
Life of a packet: pod-to-internet
pod1 netns
eth0
root
netns
eth0
vethxx
cbr0
1:1 NAT
dropped!
Slide 83
Slide 83 text
What went wrong?
The 1:1 NAT only understands VM IPs
● Anything else gets dropped
Pod IPs != VM IPs
When in doubt, add some more iptables
● MASQUERADE, aka SNAT
Applies to any packet with a destination *outside* of 10.0.0.0/8
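The MASQUERADE condition can be expressed as a small check on the destination address. This is a sketch of the decision only, not an iptables rule; the function name and the example VM address are assumptions.

```python
import ipaddress

# The internal range named on the slide.
INTERNAL = ipaddress.ip_network("10.0.0.0/8")

def masquerade(packet, vm_internal_ip):
    """Rewrite the source to the VM's internal IP for destinations outside 10.0.0.0/8."""
    if ipaddress.ip_address(packet["dst_ip"]) in INTERNAL:
        # Destination is inside the internal range: keep the pod IP as source.
        return packet
    # Destination is external: the 1:1 NAT only understands VM IPs, so SNAT.
    return dict(packet, src_ip=vm_internal_ip)
```

Pod-to-pod traffic therefore keeps its real pod IPs, and only traffic leaving the internal range takes on the VM's address.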
Slide 84
Slide 84 text
GCP Project
VM
Life of a packet: pod-to-internet
pod1 netns
eth0
root
netns
eth0
vethxx
cbr0
iptables
1:1 NAT
src: pod1
src: VM-internal
dst: 8.8.8.8
MASQUERADE
Slide 85
Slide 85 text
GCP Project
VM
Life of a packet: pod-to-internet
pod1 netns
eth0
root
netns
eth0
vethxx
cbr0
iptables
1:1 NAT
src: VM-internal
dst: 8.8.8.8
Slide 86
Slide 86 text
GCP Project
VM
Life of a packet: pod-to-internet
pod1 netns
eth0
root
netns
eth0
vethxx
cbr0
iptables
1:1 NAT
src: VM-internal
src: VM-external
dst: 8.8.8.8
Slide 87
Slide 87 text
GCP Project
VM
Life of a packet: pod-to-internet
pod1 netns
eth0
root
netns
eth0
vethxx
cbr0
iptables
1:1 NAT
src: VM-external
dst: 8.8.8.8
Slide 88
Slide 88 text
GCP Project
VM
Life of a packet: pod-to-internet
pod1 netns
eth0
root
netns
eth0
vethxx
cbr0
iptables
1:1 NAT
src: 8.8.8.8
dst: VM-external
Slide 89
Slide 89 text
GCP Project
VM
Life of a packet: pod-to-internet
pod1 netns
eth0
root
netns
eth0
vethxx
cbr0
iptables
1:1 NAT
src: 8.8.8.8
dst: VM-external
dst: VM-internal
Slide 90
Slide 90 text
GCP Project
VM
Life of a packet: pod-to-internet
pod1 netns
eth0
root
netns
eth0
vethxx
cbr0
iptables
1:1 NAT
src: 8.8.8.8
dst: VM-internal
Slide 91
Slide 91 text
GCP Project
VM
Life of a packet: pod-to-internet
pod1 netns
eth0
root
netns
eth0
vethxx
cbr0
iptables
1:1 NAT
src: 8.8.8.8
dst: VM-internal
dst: pod1
Slide 92
Slide 92 text
GCP Project
VM
Life of a packet: pod-to-internet
pod1 netns
eth0
root
netns
eth0
vethxx
cbr0
iptables
1:1 NAT
src: 8.8.8.8
dst: pod1
Slide 94
Slide 94 text
Receiving external traffic
GCP offers multiple products here
Kubernetes builds on two:
● Network Load Balancer (L4)
● HTTP/S Load Balancer (L7)
These map to Kubernetes APIs:
● Service type=LoadBalancer
● Ingress
Slide 95
Slide 95 text
L4: Service + LoadBalancer
Slide 96
Slide 96 text
Service
Change the type of your service
Implemented by the cloud
provider controller
kind: Service
apiVersion: v1
metadata:
  name: store-be
spec:
  type: LoadBalancer
  selector:
    app: store
    role: be
  ports:
  - name: https
    port: 443
Slide 97
Slide 97 text
Service
The LB info is populated when
ready
kind: Service
apiVersion: v1
metadata:
  name: store-be
  # ...
spec:
  type: LoadBalancer
  selector:
    app: store
    role: be
  clusterIP: 10.9.3.76
  ports:
    # ...
  sessionAffinity: None
status:
  loadBalancer:
    ingress:
    - ip: 86.75.30.9
Slide 98
Slide 98 text
GCP Project
VM1
Life of a packet: external-to-service
VM2 VM3
pod1 pod2 pod3
Slide 99
Slide 99 text
GCP Project
VM1
Life of a packet: external-to-service
Net LB
VM2 VM3
pod1 pod2 pod3
Slide 101
Slide 101 text
GCP Project
VM1
Life of a packet: external-to-service
Net LB
VM2 VM3
src: client
dst: LB
pod1 pod2 pod3
Slide 102
Slide 102 text
GCP Project
VM1 VM1 VM1
VM1
Life of a packet: external-to-service
Net LB
VM2 VM3
src: client
dst: LB
Choose a VM
pod1 pod2 pod3
Slide 103
Slide 103 text
GCP Project
VM1
VM1
Life of a packet: external-to-service
Net LB
VM2 VM3
src: client
dst: LB
Choose a VM
pod1 pod2 pod3
Slide 104
Slide 104 text
GCP Project
VM1
VM1
Life of a packet: external-to-service
Net LB
VM2 VM3
Rejected by firewall
GKE runs: gcloud compute firewall-rules create ...
pod1 pod2 pod3
Slide 105
Slide 105 text
GCP Project
VM1
VM1
Life of a packet: external-to-service
Net LB
VM2 VM3
src: client
dst: LB
pod1 pod2 pod3
Slide 107
Slide 107 text
Balancing to VMs
The LB only knows about VMs
VMs do not map 1:1 with pods
VM1 VM2 VM3
Slide 108
Slide 108 text
GCP Project
VM1
The imbalance problem
Net LB
VM2 VM3
pod1 pod2 pod3
Assume the LB only hits VMs with pods
The LB only knows about VMs
Slide 109
Slide 109 text
GCP Project
VM1
The imbalance problem
Net LB
VM2 VM3
pod1 pod2 pod3
50% 50%
Slide 110
Slide 110 text
GCP Project
VM1
50%
The imbalance problem
Net LB
VM2 VM3
50% 50%
25% 25%
pod1 pod2 pod3
Slide 111
Slide 111 text
Balancing to VMs
The LB only knows about VMs
VMs do not map 1:1 with pods
How do we avoid imbalance?
iptables, of course
VM1 VM2 VM3
Slide 112
Slide 112 text
GCP Project
VM1
Life of a packet: external-to-service
Net LB
VM2 VM3
iptables
src: client
dst: LB
Choose a pod
pod1 pod2 pod3
Slide 114
Slide 114 text
GCP Project
VM1
Life of a packet: external-to-service
Net LB
VM2 VM3
iptables
src: client
dst: LB
dst: pod2
NAT
pod1 pod2 pod3
Slide 115
Slide 115 text
GCP Project
VM1
Life of a packet: external-to-service
Net LB
VM2 VM3
iptables
src: client
dst: pod2
pod1 pod2 pod3
Slide 119
Slide 119 text
GCP Project
VM1
Life of a packet: external-to-service
Net LB
VM2 VM3
iptables
src: pod2
dst: client
pod1 pod2 pod3
Slide 121
Slide 121 text
GCP Project
VM1
Life of a packet: external-to-service
Net LB
VM2 VM3
iptables
src: pod2
dst: client
INVALID
pod1 pod2 pod3
Slide 122
Slide 122 text
GCP Project
VM1
Life of a packet: external-to-service
Net LB
VM2 VM3
iptables
src: client
src: VM1
dst: LB
dst: pod2
NAT
pod1 pod2 pod3
Slide 123
Slide 123 text
GCP Project
VM1
Life of a packet: external-to-service
Net LB
VM2 VM3
iptables
src: VM1
dst: pod2
pod1 pod2 pod3
Slide 127
Slide 127 text
GCP Project
VM1
Life of a packet: external-to-service
Net LB
VM2 VM3
iptables
src: pod2
dst: VM1
pod1 pod2 pod3
Slide 130
Slide 130 text
GCP Project
VM1
Life of a packet: external-to-service
Net LB
VM2 VM3
iptables
src: pod2
src: LB
dst: VM1
dst: client
pod1 pod2 pod3
Slide 131
Slide 131 text
GCP Project
VM1
Life of a packet: external-to-service
Net LB
VM2 VM3
src: LB
dst: client
pod1 pod2 pod3
Slide 133
Slide 133 text
Explain the complexity
To avoid imbalance, we re-balance inside Kubernetes
A backend is chosen randomly from all pods
Good:
● Well balanced, in practice
Bad:
● Can cause an extra network hop
● Hides the client IP from the user’s backend
Users wanted to make the trade-off themselves
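The trade-off can be made concrete by computing each pod's share of traffic under both strategies. The pod placement below (one pod on one VM, two on another, none on the third) is an assumed reading of the diagrams, and the model is idealized: the load balancer splits evenly across VMs that host pods, and each VM then splits evenly across its candidate pods.

```python
from fractions import Fraction

def pod_shares(pods_by_node, local_only):
    """Return each pod's fraction of traffic.

    local_only=True: a VM forwards only to its own pods (no re-balancing).
    local_only=False: a VM picks uniformly among all pods in the cluster.
    """
    nodes = [n for n, pods in pods_by_node.items() if pods]
    all_pods = [p for n in nodes for p in pods_by_node[n]]
    shares = {p: Fraction(0) for p in all_pods}
    per_node = Fraction(1, len(nodes))  # the LB splits evenly across these VMs
    for n in nodes:
        targets = pods_by_node[n] if local_only else all_pods
        for p in targets:
            shares[p] += per_node / len(targets)
    return shares
```

With `{"vm1": ["pod1"], "vm2": ["pod2", "pod3"], "vm3": []}`, forwarding only to local pods gives pod1 half of the traffic and the other two a quarter each, while choosing among all pods gives each pod one third at the cost of a possible extra hop.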
Slide 134
Slide 134 text
OnlyLocal
Specify an external-traffic policy
iptables will always choose a
pod on the same node
Preserves client IP
Risks imbalance
kind: Service
apiVersion: v1
metadata:
  name: store-be
  annotations:
    service.beta.kubernetes.io/external-traffic: OnlyLocal
spec:
  type: LoadBalancer
  selector:
    app: store
    role: be
  ports:
  - name: https
    port: 443
Slide 135
Slide 135 text
GCP Project
VM1
50%
Opt-in to the imbalance problem
Net LB
VM2 VM3
25% 25%
iptables
iptables
In practice Kubernetes spreads
pods across nodes
If pods >> nodes: OK
If nodes >> pods: OK
If pods ~= nodes: risk
pod1 pod2 pod3
50% 50%
Slide 136
Slide 136 text
GCP Project
VM1
Life of a packet: external-to-service
Net LB
VM2 VM3
Not considered
Health-check fails if no backends
pod1 pod2 pod3
Slide 137
Slide 137 text
GCP Project
VM1
Life of a packet: external-to-service
Net LB
VM2 VM3
src: client
dst: LB
pod1 pod2 pod3
Slide 138
Slide 138 text
GCP Project
VM1 VM1
VM1
Life of a packet: external-to-service
Net LB
VM2 VM3
src: client
dst: LB
Choose a VM
pod1 pod2 pod3
Slide 139
Slide 139 text
GCP Project
VM1
Life of a packet: external-to-service
Net LB
VM2 VM1
VM3
src: client
dst: LB
pod1 pod2 pod3
Slide 140
Slide 140 text
GCP Project
VM1
VM1
Life of a packet: external-to-service
Net LB
VM2 VM3
src: client
dst: LB
pod1 pod2 pod3
Slide 141
Slide 141 text
GCP Project
VM1
Life of a packet: external-to-service
Net LB
VM2 VM3
iptables
src: client
dst: LB
Choose a pod
pod1 pod2 pod3
Slide 142
Slide 142 text
GCP Project
VM1
Life of a packet: external-to-service
Net LB
VM2 VM3
iptables
src: client
dst: LB
dst: pod2
DNAT
pod1 pod2 pod3
Slide 143
Slide 143 text
GCP Project
VM1
Life of a packet: external-to-service
Net LB
VM2 VM3
iptables
src: client
dst: pod2
pod1 pod2 pod3
Slide 144
Slide 144 text
GCP Project
VM1
Life of a packet: external-to-service
Net LB
VM2 VM3
iptables
src: pod2
src: LB
dst: client
pod1 pod2 pod3
Slide 145
Slide 145 text
GCP Project
VM1
Life of a packet: external-to-service
Net LB
VM2 VM3
iptables
src: LB
dst: client
pod1 pod2 pod3
Slide 146
Slide 146 text
GCP Project
VM1
Life of a packet: external-to-service
Net LB
VM2 VM3
src: LB
dst: client
pod1 pod2 pod3
Slide 148
Slide 148 text
L7: Ingress
Slide 149
Slide 149 text
Service
Change the type of your service
Allocates and forwards a port on
every VM to the service port
Exactly the same data path as
the LB case
kind: Service
apiVersion: v1
metadata:
  name: store-be
spec:
  type: NodePort
  selector:
    app: store
    role: be
  ports:
  - name: https
    port: 443
Slide 150
Slide 150 text
Ingress
A different API resource
Maps HTTP to services
Implemented by the cloud
provider controller
kind: Ingress
apiVersion: extensions/v1beta1
metadata:
  name: store-ing
spec:
  rules:
  - http:
      paths:
      - path: /customers
        backend:
          serviceName: customers-be
      - path: /products
        backend:
          serviceName: products-be
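How an Ingress "maps HTTP to services" can be sketched as a lookup from request path to service name. The matching rule used here (exact match or a sub-path under the listed prefix) is an assumption for illustration; actual controllers define their own path-matching semantics, which may differ.

```python
# The two rules from the Ingress spec above: (path, service name).
RULES = [("/customers", "customers-be"), ("/products", "products-be")]

def route(path, rules=RULES):
    """Return the service for a request path, or None if no rule applies.

    Assumed semantics: a rule matches its exact path and anything below it.
    """
    for prefix, service in rules:
        if path == prefix or path.startswith(prefix + "/"):
            return service
    return None
```

Under these assumed semantics, `route("/products")` returns `"products-be"` and a path with no rule, such as `/cart`, returns `None`.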
Slide 152
Slide 152 text
Ingress
The LB info is populated when
ready
kind: Ingress
apiVersion: extensions/v1beta1
metadata:
  name: store-ing
spec:
  rules:
  - http:
      paths:
      - path: /customers
        backend:
          serviceName: customers-be
      - path: /products
        backend:
          serviceName: products-be
status:
  loadBalancer:
    ingress:
    - ip: 86.73.50.9
Slide 153
Slide 153 text
GCP Project
VM1
Life of a packet: external-to-ingress
VM2 VM3
pod1 pod4 pod5
pod2 pod3
Slide 154
Slide 154 text
GCP Project
VM1
Life of a packet: external-to-ingress
GCLB
VM2 VM3
pod1 pod4 pod5
pod2 pod3
Slide 156
Slide 156 text
GCP Project
VM1
Life of a packet: external-to-ingress
GCLB
VM2 VM3
src: client
dst: LB
path: /products
pod1 pod4 pod5
pod2 pod3
Slide 157
Slide 157 text
GCP Project
VM1 VM1 VM1
VM1
Life of a packet: external-to-ingress
GCLB
VM2 VM3
src: client
dst: LB
path: /products
Choose a VM
pod1 pod4 pod5
pod2 pod3
Slide 158
Slide 158 text
GCP Project
VM1
VM1
Life of a packet: external-to-ingress
GCLB
VM2 VM3
src: client
dst: LB
path: /products
Choose a VM
pod1 pod4 pod5
pod2 pod3
Slide 159
Slide 159 text
GCP Project
VM1
VM1
Life of a packet: external-to-ingress
GCLB
VM2 VM3
src: GCLB
dst: VM3
pod1 pod4 pod5
pod2 pod3
Slide 161
Slide 161 text
GCP Project
VM1
Life of a packet: external-to-ingress
GCLB
VM2 VM3
iptables
src: GCLB
dst: VM3
Choose a pod
pod1 pod4 pod5
pod2 pod3
Slide 163
Slide 163 text
GCP Project
VM1
Life of a packet: external-to-ingress
GCLB
VM2 VM3
iptables
src: GCLB
src: VM3
dst: VM3
dst: pod3
pod1 pod4 pod5
pod2 pod3
Slide 164
Slide 164 text
GCP Project
VM1
Life of a packet: external-to-ingress
GCLB
VM2 VM3
iptables
src: VM3
dst: pod3
pod1 pod4 pod5
pod2 pod3
Slide 168
Slide 168 text
GCP Project
VM1
Life of a packet: external-to-ingress
GCLB
VM2 VM3
iptables
src: pod3
dst: VM3
pod1 pod4 pod5
pod2 pod3
Slide 172
Slide 172 text
GCP Project
VM1
Life of a packet: external-to-ingress
GCLB
VM2 VM3
iptables
src: pod3
src: VM3
dst: VM3
dst: GCLB
pod1 pod4 pod5
pod2 pod3
Slide 173
Slide 173 text
GCP Project
VM1
Life of a packet: external-to-ingress
GCLB
VM2 VM3
iptables
src: VM3
dst: GCLB
pod1 pod4 pod5
pod2 pod3
Slide 174
Slide 174 text
GCP Project
VM1
VM1
Life of a packet: external-to-ingress
GCLB
VM2 VM3
src: VM3
dst: GCLB
pod1 pod4 pod5
pod2 pod3
Slide 175
Slide 175 text
GCP Project
VM1
Life of a packet: external-to-ingress
GCLB
VM2 VM3
src: VM3
dst: client
pod1 pod4 pod5
pod2 pod3
Slide 176
Slide 176 text
GCP Project
VM1
Life of a packet: external-to-ingress
GCLB
VM2 VM3
src: LB
dst: client
pod1 pod4 pod5
pod2 pod3
Slide 177
Slide 177 text
OnlyLocal
The same annotation as before
Configured per-service
iptables will always choose a
pod on the same node
Risks imbalance
Removes 2nd hop
kind: Service
apiVersion: v1
metadata:
  name: store-be
  annotations:
    service.beta.kubernetes.io/external-traffic: OnlyLocal
spec:
  type: NodePort
  selector:
    app: store
    role: be
  ports:
  - name: https
    port: 443
Slide 178
Slide 178 text
But wait, there’s more!
Things we didn’t really cover:
● Pod liveness probes
● Graceful termination
● Cloud health checks
● Firewalls
● Headless services
● IPAM
● SSL
● ...
Slide 179
Slide 179 text
Google Container Engine is a moving target
The efforts of Open Source developers and Google
Engineers continue to improve and simplify the system
Google NEXT ‘18 will have more “ins” and “outs” for
network traffic
Watch this space