The Ins and Outs of Networking in Google Container Engine

Tim Hockin

March 10, 2017

Transcript

  1. The ins and outs of networking in Google Container Engine

    Michael Rubin Tim Hockin
  2. Kubernetes is about clusters

    Because of that, networking is pretty important
    Most of Kubernetes centers on network concepts
    Our job is to make sure your applications can communicate:
    • With each other
    • With the world outside your cluster
    • Only where you want
  3. It’s easy to get overwhelmed

    Many people are comfortable with TCP/IP, but containers bring new concepts:
    • Namespaces
    • Virtual interfaces
    • IP forwarding
    • Underlays
    • Overlays
    • iptables
    • NAT
    It’s enough to make your head spin
  4. Background: API server

    Kubernetes is a very API-centric system - everything communicates through the API
    • No private APIs
    • No “system only” calls
    REST: Defined in terms of “resources” (nouns, aka “objects”) and methods (verbs)
  5. Background: Pods

    A small group of tightly-coupled containers & volumes, composed together
    The atom of Kubernetes
    Shared lifecycle and fate
    Shared networking - a shared “real” IP, containers see each other as localhost
  6. Background: Controllers

    A piece of code that watches the Kubernetes API and reacts
    The defining pattern of Kubernetes, used everywhere
    Self-healing, aka rectification
    Examples: ReplicaSet, Services, DNS, Kubelet
  7. Background: Labels

    Metadata (key-value) which can be attached to any API resource
    Labels: identification
    • Allow users to define how to group resources
    • Examples: app name, tier (frontend/backend), stage (dev/test/prod)
    Annotations: data that “rides along” with objects
    • Third-party or internal state that isn’t part of an object’s schema
    [Diagram: an object labeled app: store, role: fe, stage: prod]
  8. Background: Selectors

    Expresses which objects to act upon
    • Think “select ... where”
    Provides very loose coupling
    Users can manage groups however they need
    Examples: services, deployments
  9-12. Background: Selectors

    [Diagram, built up over four slides: four pods, all labeled app: store, with (role: fe, stage: test), (role: be, stage: test), (role: fe, stage: prod) and (role: be, stage: prod). Successive slides highlight the pods matched by each selector:]
    • app=store, role=fe
    • app=store, stage=test
    • app=store
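The selector diagram above can be sketched in a few lines. This is only an illustration of equality-based matching, not the API server's implementation; the pod names (`fe-test` and so on) are hypothetical and exist only to make the output readable.

```python
def matches(selector: dict, labels: dict) -> bool:
    """True when every key/value pair in the selector appears in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


# The four pods from the diagram, keyed by hypothetical names.
pods = {
    "fe-test": {"app": "store", "role": "fe", "stage": "test"},
    "be-test": {"app": "store", "role": "be", "stage": "test"},
    "fe-prod": {"app": "store", "role": "fe", "stage": "prod"},
    "be-prod": {"app": "store", "role": "be", "stage": "prod"},
}


def select(selector: dict) -> list:
    """Return the sorted names of all pods whose labels satisfy the selector."""
    return sorted(name for name, labels in pods.items() if matches(selector, labels))


print(select({"app": "store", "role": "fe"}))
print(select({"app": "store", "stage": "test"}))
print(select({"app": "store"}))
```

A selector with fewer keys matches more pods, which is what makes the coupling loose: the selector never names a pod directly.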
  13. The IP-per-pod model

  14. Every pod has a real IP address

    This is different from the out-of-the-box model Docker offers
    • No machine-private IPs
    • No port-mapping
    Pod IPs are accessible from other pods, regardless of which VM they are on
    Linux “network namespaces” (aka “netns”) and virtual interfaces
  15-21. Network namespaces

    [Diagram, built up over seven slides: a VM whose root netns owns eth0. A pod1 netns is added; a veth pair (vethxy, vethxx) is created, and the end placed inside the pod appears there as eth0 while vethxx stays in the root netns. A pod2 netns is added the same way with vethyy. Finally the bridge cbr0 joins vethxx and vethyy in the root netns.]
  22-26. Life of a packet: pod-to-pod, same node

    [Diagram, animated over five slides: a packet with src: pod1, dst: pod2 leaves eth0 in the pod1 netns, crosses vethxx to the bridge cbr0 in the root netns, and is delivered through vethyy to eth0 in the pod2 netns. It never leaves the VM.]
  27. Flat network space

    Pods must be reachable across VMs, too
    Kubernetes doesn’t care HOW, but this is a requirement
    • L2, L3, or overlay
    Assign a CIDR (IP block) to each VM
    GCP: Teach the network how to route packets
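As a rough sketch of "assign a CIDR to each VM", the snippet below carves one block per node out of a cluster-wide range. The cluster range `10.8.0.0/14` and the `/24` per-node size are assumptions for illustration (the slides only show `x.y.z.0/24`), and the function name is hypothetical.

```python
import ipaddress


def assign_node_cidrs(cluster_cidr: str, node_names: list, prefix_len: int = 24) -> dict:
    """Give each node its own non-overlapping block from the cluster range."""
    subnets = list(ipaddress.ip_network(cluster_cidr).subnets(new_prefix=prefix_len))
    if len(node_names) > len(subnets):
        raise ValueError("cluster range is too small for this many nodes")
    # zip pairs nodes with blocks in order: first node gets the first block.
    return dict(zip(node_names, subnets))


# Assumed values: a /14 cluster range split into /24 blocks, one per VM.
cidrs = assign_node_cidrs("10.8.0.0/14", ["vm1", "vm2", "vm3"])
for name, block in cidrs.items():
    print(name, block)
```

Because the blocks never overlap, a pod IP identifies exactly one VM, which is what lets the network route to pods without NAT.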
  28-33. Life of a packet: pod-to-pod, across nodes

    [Diagram, animated over six slides: VM1 hosts pod1 and pod2, VM2 hosts pod3 and pod4; each VM has a root netns with eth0, vethxx, vethyy and cbr0. A packet with src: pod1, dst: pod4 travels from pod1 through cbr0 and out of VM1's eth0 toward VM2, where it is stopped.]
    Anti-spoofing: only allow known source IPs (i.e. VMs)
  34. Programming GCP’s network

    GKE automatically sets up routing for you using every trick it needs
    All VMs are created as “routers”
    • --can-ip-forward
    • Disable anti-spoof protection for this VM
    Add one GCP static route for each VM
    • gcloud compute routes create vm2 --destination-range=x.y.z.0/24 --next-hop-instance=vm2
    The GCP network does the rest
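The effect of one static route per VM can be pictured as a lookup from destination pod IP to next-hop VM. This is a simplified model, not how the GCP network is implemented; the concrete ranges stand in for the `x.y.z.0/24` placeholders and are assumptions.

```python
import ipaddress

# One route per VM: (destination range, next-hop instance). Ranges are assumed.
routes = [
    (ipaddress.ip_network("10.8.0.0/24"), "vm1"),
    (ipaddress.ip_network("10.8.1.0/24"), "vm2"),
]


def next_hop(dst_ip: str, routes: list):
    """Return the VM that should receive a packet for dst_ip, or None."""
    addr = ipaddress.ip_address(dst_ip)
    candidates = [(net.prefixlen, vm) for net, vm in routes if addr in net]
    if not candidates:
        return None
    # Longest-prefix match: the most specific route wins.
    return max(candidates)[1]


print(next_hop("10.8.1.7", routes))
print(next_hop("8.8.8.8", routes))
```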
  35-39. Life of a packet: pod-to-pod, across nodes

    [Diagram, animated over five slides: with routing programmed, the packet with src: pod1, dst: pod4 leaves VM1 through eth0, is routed by the GCP network to VM2, and is delivered through VM2's cbr0 to pod4.]
  40. Dealing with change

    A real cluster changes over time:
    • Scale-up and scale-down events
    • Rolling updates
    • Pods crash or hang
    • VMs reboot
    The pod addresses you want to talk to can change without warning
    You need something more durable than a pod IP
  41. Services

  42. The service abstraction

    A service is a group of endpoints (usually pods)
    Services provide a stable VIP
    VIP automatically routes to backend pods
    • Implementations can vary
    • We will examine the default implementation
    The set of pods “behind” a service can change
    Clients only need the VIP, which doesn’t change
  43-44. Service

    What you submit is simple
    • Other fields will be defaulted or assigned
    The ‘selector’ field chooses which pods to balance across

    kind: Service
    apiVersion: v1
    metadata:
      name: store-be
    spec:
      selector:
        app: store
        role: be
      ports:
      - name: http
        port: 80
  45-46. Service

    What you get back has more information
    Automatically creates a distributed load balancer
    The default is to allocate an in-cluster IP

    kind: Service
    apiVersion: v1
    metadata:
      name: store-be
      namespace: default
      creationTimestamp: 2016-05-06T19:16:56Z
      resourceVersion: "7"
      selfLink: /api/v1/namespaces/default/services/store-be
      uid: 196d5751-13bf-11e6-9353-42010a800fe3
    spec:
      type: ClusterIP
      selector:
        app: store
        role: be
      clusterIP: 10.9.3.76
      ports:
      - name: http
        protocol: TCP
        port: 80
        targetPort: 80
      sessionAffinity: None
  47-48. Endpoints

    [Diagram, over two slides: the selector app: store, role: be is applied to six pods, and the three matching pods are highlighted. Pods as shown:]
    • app: store, role: be, 10.11.8.67
    • app: store, role: be, 10.11.5.3
    • app: store, role: be, 10.11.0.9
    • app: db, role: be, 10.7.1.18
    • app: store, role: fe, 10.11.8.67
    • app: db, role: be, 10.4.1.11
  49-50. Endpoints

    When you create a service, a controller wakes up
    Holds the IPs of the pod backends

    kind: Endpoints
    apiVersion: v1
    metadata:
      name: store-be
      namespace: default
    subsets:
    - addresses:
      - ip: 10.11.8.67
      - ip: 10.11.5.3
      - ip: 10.11.0.9
      ports:
      - name: http
        port: 80
        protocol: TCP
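What the endpoints controller computes can be approximated as "filter pods by the service's selector, collect their IPs". The sketch below uses the pod labels and IPs from the slides; the function name and dictionary shapes are hypothetical, and a real controller also accounts for details such as pod readiness and target ports that are ignored here.

```python
def build_endpoints(service: dict, pods: list) -> dict:
    """Build an Endpoints-like object from the pods matching the selector."""
    selector = service["spec"]["selector"]
    addresses = [
        {"ip": pod["ip"]}
        for pod in pods
        if all(pod["labels"].get(key) == value for key, value in selector.items())
    ]
    return {
        "kind": "Endpoints",
        "apiVersion": "v1",
        "metadata": {"name": service["metadata"]["name"]},
        "subsets": [{"addresses": addresses, "ports": service["spec"]["ports"]}],
    }


service = {
    "metadata": {"name": "store-be"},
    "spec": {
        "selector": {"app": "store", "role": "be"},
        "ports": [{"name": "http", "port": 80}],
    },
}

# Labels and IPs as printed on the slides.
pods = [
    {"labels": {"app": "store", "role": "be"}, "ip": "10.11.8.67"},
    {"labels": {"app": "store", "role": "be"}, "ip": "10.11.5.3"},
    {"labels": {"app": "store", "role": "be"}, "ip": "10.11.0.9"},
    {"labels": {"app": "db", "role": "be"}, "ip": "10.7.1.18"},
    {"labels": {"app": "db", "role": "be"}, "ip": "10.4.1.11"},
]

endpoints = build_endpoints(service, pods)
print([address["ip"] for address in endpoints["subsets"][0]["addresses"]])
```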
  51-55. Life of a packet: pod-to-service

    [Diagram, animated over five slides: a packet with src: pod1, dst: svc1 leaves eth0 in the pod1 netns and crosses vethxx to cbr0 in the root netns. There iptables rewrites the destination from svc1 to pod99 (DNAT), and conntrack records the translation.]
  56. Conntrack

    Linux kernel connection-tracking
    Remembers address translations
    • Based on the 5-tuple
    Reversed on the return path
    Does a lot more, but not very relevant here

    { protocol = TCP, src_ip = pod1, src_port = 1234, dst_ip = svc1, dst_port = 80 }
    =>
    { protocol = TCP, src_ip = pod1, src_port = 1234, dst_ip = pod99, dst_port = 80 }
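A toy model of the translation table may help make "remembered, then reversed on the return path" concrete. This is a sketch of the idea only, not the kernel's conntrack; the class and method names are hypothetical.

```python
from typing import NamedTuple


class FiveTuple(NamedTuple):
    protocol: str
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class Conntrack:
    """Remembers DNAT decisions per connection and undoes them for replies."""

    def __init__(self):
        self._forward = {}  # original packet tuple -> translated tuple
        self._reverse = {}  # expected reply tuple -> original packet tuple

    def dnat(self, packet: FiveTuple, backend_ip: str, backend_port: int) -> FiveTuple:
        # Only the first packet of a connection picks a backend; later
        # packets reuse the remembered translation.
        if packet not in self._forward:
            translated = packet._replace(dst_ip=backend_ip, dst_port=backend_port)
            reply = FiveTuple(translated.protocol, translated.dst_ip,
                              translated.dst_port, translated.src_ip,
                              translated.src_port)
            self._forward[packet] = translated
            self._reverse[reply] = packet
        return self._forward[packet]

    def un_dnat(self, reply: FiveTuple) -> FiveTuple:
        # Rewrite the reply's source back to the address the client dialed.
        original = self._reverse[reply]
        return reply._replace(src_ip=original.dst_ip, src_port=original.dst_port)


conntrack = Conntrack()
request = FiveTuple("TCP", "pod1", 1234, "svc1", 80)
forwarded = conntrack.dnat(request, "pod99", 80)
response = conntrack.un_dnat(FiveTuple("TCP", "pod99", 80, "pod1", 1234))
print(forwarded)
print(response)
```

From pod1's point of view the reply comes from svc1, so the client never learns which backend served it.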
  57-61. Life of a packet: pod-to-service

    [Diagram, animated over five slides: the translated packet with src: pod1, dst: pod99 continues toward pod99. The reply arrives with src: pod99, dst: pod1; iptables un-DNATs it so the source becomes svc1, and the packet with src: svc1, dst: pod1 is delivered through cbr0 and vethxx to pod1.]
  62. A bit more on iptables

    The iptables rules look scary, but are actually simple:

    if dest.ip == svc1.ip && dest.port == svc1.port {
      pick one of the backends at random
      rewrite destination IP
    }

    Configured by ‘kube-proxy’ - a pod running on each VM
    • Not actually a proxy
    • Not in the data path
    Kube-proxy is a controller - it watches the API for services
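The pseudocode on this slide translates almost directly into Python. The sketch below only mirrors the slide's logic; the dictionary shape for the service and the `rng` parameter are assumptions, and real iptables rules are expressed quite differently.

```python
import random


def apply_service_rule(dst_ip: str, dst_port: int, service: dict, backends: list, rng):
    """Rewrite the destination IP if the packet is addressed to the service."""
    if dst_ip == service["ip"] and dst_port == service["port"]:
        # Pick one of the backends at random and rewrite the destination IP.
        return rng.choice(backends), dst_port
    # Packets for anything else pass through untouched.
    return dst_ip, dst_port


service = {"ip": "10.9.3.76", "port": 80}
backends = ["10.11.8.67", "10.11.5.3", "10.11.0.9"]
rng = random.Random(0)  # seeded only to make the example repeatable

print(apply_service_rule("10.9.3.76", 80, service, backends, rng))
print(apply_service_rule("10.9.3.76", 443, service, backends, rng))
```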
  63. DNS

    Even easier: services are added to an in-cluster DNS server
    You would never hardcode an IP, but you might hardcode a hostname and port
    Serves “A” and “SRV” records
    DNS itself runs as pods and a service
  64-65. DNS Service

    Requests a particular cluster IP
    Pods are auto-scaled with the cluster size
    Service VIP is stable

    kind: Service
    apiVersion: v1
    metadata:
      name: kube-dns
      namespace: kube-system
    spec:
      clusterIP: 10.0.0.10
      selector:
        k8s-app: kube-dns
      ports:
      - name: dns
        port: 53
        protocol: UDP
      - name: dns-tcp
        port: 53
        protocol: TCP
  66. Simple and powerful

    Can use any port you want, no conflicts
    Can request a particular ‘clusterIP’
    Can remap ports
  67. That’s all there is to it

    Services are an abstraction - the API is a VIP
    No running process or intercepting the data-path
    All a client needs to do is hit the service IP:port
  68. Sending external traffic

    Services are within a cluster
    What happens if you want your pod to reach google.com?
  69. Egress

  70. Leaving the GCP project

    VMs get private IPs (in 10.0.0.0/8)
    VMs can have public IPs, too
    GCP: Public IPs are provided by 1-to-1 NAT
  71-77. Life of a packet: VM-to-internet

    [Diagram, animated over seven slides: a VM (root netns, eth0) inside the GCP project, with a 1:1 NAT at the project edge. Outbound, a packet with src: VM-internal, dst: 8.8.8.8 has its source rewritten to VM-external by the NAT. Inbound, the reply with src: 8.8.8.8, dst: VM-external has its destination rewritten to VM-internal and is delivered to the VM.]
  78-82. Life of a packet: pod-to-internet

    [Diagram, animated over five slides: a packet with src: pod1, dst: 8.8.8.8 leaves the pod1 netns, crosses vethxx and cbr0, and exits the VM's eth0 with the pod IP still as its source. At the 1:1 NAT it is dropped.]
  83. What went wrong?

    The 1:1 NAT only understands VM IPs
    • Anything else gets dropped
    Pod IPs != VM IPs
    When in doubt, add some more iptables
    • MASQUERADE, aka SNAT
    Applies to any packet with a destination *outside* of 10.0.0.0/8
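The masquerade decision boils down to one membership test on the destination address. The sketch below models only that decision, not iptables itself; the function name and the example addresses are assumptions.

```python
import ipaddress

# Destinations inside this range are left alone, per the slide.
INTERNAL_RANGE = ipaddress.ip_network("10.0.0.0/8")


def source_after_masquerade(src_ip: str, dst_ip: str, vm_internal_ip: str) -> str:
    """Return the source address the packet carries when it leaves the VM."""
    if ipaddress.ip_address(dst_ip) in INTERNAL_RANGE:
        # In-cluster traffic keeps the real pod IP.
        return src_ip
    # Anything else is SNATed to the VM's own IP, which the 1:1 NAT understands.
    return vm_internal_ip


# Assumed addresses: pod 10.8.1.7 running on a VM whose internal IP is 10.240.0.3.
print(source_after_masquerade("10.8.1.7", "8.8.8.8", "10.240.0.3"))
print(source_after_masquerade("10.8.1.7", "10.8.0.5", "10.240.0.3"))
```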
  84-93. Life of a packet: pod-to-internet

    [Diagram, animated over ten slides: the same path with iptables added in the root netns. Outbound, iptables MASQUERADEs the packet (src: pod1 becomes src: VM-internal), then the 1:1 NAT rewrites src: VM-internal to src: VM-external, and the packet reaches 8.8.8.8. Inbound, the reply with src: 8.8.8.8, dst: VM-external is rewritten by the NAT to dst: VM-internal, then by iptables to dst: pod1, and is delivered through cbr0 and vethxx to pod1.]
  94. Receiving external traffic

    GCP offers multiple products here
    Kubernetes builds on two:
    • Network Load Balancer (L4)
    • HTTP/S Load Balancer (L7)
    These map to Kubernetes APIs:
    • Service type=LoadBalancer
    • Ingress
  95. L4: Service + LoadBalancer

  96. Service

    Change the type of your service
    Implemented by the cloud provider controller

    kind: Service
    apiVersion: v1
    metadata:
      name: store-be
    spec:
      type: LoadBalancer
      selector:
        app: store
        role: be
      ports:
      - name: https
        port: 443
  97. Service

    The LB info is populated when ready

    kind: Service
    apiVersion: v1
    metadata:
      name: store-be
      # ...
    spec:
      type: LoadBalancer
      selector:
        app: store
        role: be
      clusterIP: 10.9.3.76
      ports:
        # ...
      sessionAffinity: None
    status:
      loadBalancer:
        ingress:
        - ip: 86.75.30.9
  98-106. Life of a packet: external-to-service

    [Diagram, animated over nine slides: a GCP project with VM1, VM2 and VM3 hosting pod1, pod2 and pod3, fronted by a Net LB. A packet with src: client, dst: LB arrives at the Net LB, which chooses a VM. The packet is first rejected by the firewall, so GKE runs: gcloud firewalls create ... The packet then reaches the chosen VM still addressed as src: client, dst: LB.]
  107. Balancing to VMs

    The LB only knows about VMs
    VMs do not map 1:1 with pods
    [Diagram: VM1, VM2, VM3]
  108-110. The imbalance problem

    The LB only knows about VMs
    Assume the LB only hits VMs with pods
    [Diagram, built up over three slides: the Net LB splits traffic 50% / 50% between the two VMs that host pods. The pod that is alone on its VM receives 50% of the traffic, while the two pods sharing the other VM receive 25% each.]
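The percentages in the imbalance diagram follow from two even splits: first across VMs, then across the pods on each VM. The arithmetic can be checked with a short sketch. The pod placement used here (pod1 alone on VM1, pod2 and pod3 together on VM2, VM3 empty) is an assumption, since the flattened diagram does not show which pod sits on which VM.

```python
from fractions import Fraction


def per_pod_share(pods_per_vm: dict) -> dict:
    """Traffic share per pod when the LB balances evenly across VMs with pods."""
    serving = {vm: pods for vm, pods in pods_per_vm.items() if pods}
    vm_share = Fraction(1, len(serving))
    # Each VM's share is then split evenly among its local pods.
    return {pod: vm_share / len(pods) for pods in serving.values() for pod in pods}


# Assumed placement; only the resulting percentages come from the slides.
shares = per_pod_share({"VM1": ["pod1"], "VM2": ["pod2", "pod3"], "VM3": []})
for pod, share in shares.items():
    print(pod, f"{float(share):.0%}")
```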
  111. Balancing to VMs

    The LB only knows about VMs
    VMs do not map 1:1 with pods
    How do we avoid imbalance?
    iptables, of course
    [Diagram: VM1, VM2, VM3]
  112-121. Life of a packet: external-to-service

    [Diagram, animated over ten slides: the packet with src: client, dst: LB reaches iptables on the VM chosen by the Net LB. iptables chooses a pod (pod2, on a different VM) and NATs the destination from LB to pod2. The packet with src: client, dst: pod2 is forwarded to pod2. pod2 replies directly with src: pod2, dst: client, which is INVALID: the client expects a reply from the LB address.]

  122-132. Life of a packet: external-to-service

    [Diagram, animated over eleven slides: the fix is to NAT both addresses on the first VM: src: client becomes src: VM1 and dst: LB becomes dst: pod2. The packet with src: VM1, dst: pod2 reaches pod2, whose reply with src: pod2, dst: VM1 returns to VM1. There iptables reverses the translation to src: LB, dst: client, and the reply leaves the project toward the client.]
  133. Explain the complexity

    To avoid imbalance, we re-balance inside Kubernetes
    A backend is chosen randomly from all pods
    Good:
    • Well balanced, in practice
    Bad:
    • Can cause an extra network hop
    • Hides the client IP from the user’s backend
    Users wanted to make the trade-off themselves
  134. OnlyLocal

    Specify an external-traffic policy
    iptables will always choose a pod on the same node
    Preserves client IP
    Risks imbalance

    kind: Service
    apiVersion: v1
    metadata:
      name: store-be
      annotations:
        service.beta.kubernetes.io/external-traffic: OnlyLocal
    spec:
      type: LoadBalancer
      selector:
        app: store
        role: be
      ports:
      - name: https
        port: 443
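The trade-off between the default behavior and OnlyLocal can be summarized as a choice of candidate set plus a side effect on the source address. The sketch below is a simplified model of the two paths described in the slides, not kube-proxy's logic; the function name, return shape, and pod placement are all hypothetical.

```python
import random


def route_external(node: str, pods_by_node: dict, only_local: bool, rng):
    """Model which pod receives a packet that the LB delivered to `node`."""
    if only_local:
        candidates = pods_by_node.get(node, [])
        if not candidates:
            # The slides note that such a node fails its health check,
            # so the LB should not have sent traffic here at all.
            return None
        # No second hop and no source NAT: the pod sees the client IP.
        return {"pod": rng.choice(candidates), "source_seen_by_pod": "client"}
    all_pods = [pod for pods in pods_by_node.values() for pod in pods]
    if not all_pods:
        return None
    # Default: any pod in the cluster; the source is rewritten to the node
    # so that the reply comes back through it.
    return {"pod": rng.choice(all_pods), "source_seen_by_pod": node}


placement = {"VM1": ["pod1"], "VM2": ["pod2", "pod3"], "VM3": []}  # assumed placement
rng = random.Random(1)  # seeded only to make the example repeatable
print(route_external("VM1", placement, only_local=True, rng=rng))
print(route_external("VM3", placement, only_local=True, rng=rng))
print(route_external("VM3", placement, only_local=False, rng=rng))
```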
  135. Opt-in to the imbalance problem

    In practice Kubernetes spreads pods across nodes
    If pods >> nodes: OK
    If nodes >> pods: OK
    If pods ~= nodes: risk
    [Diagram: the Net LB splits traffic 50% / 50% between the two VMs with pods; iptables on each VM chooses only local pods, so one pod receives 50% and the other two receive 25% each.]
  136-147. Life of a packet: external-to-service

    [Diagram, animated over twelve slides, with OnlyLocal: a VM without backends is not considered, because its health-check fails if no backends. A packet with src: client, dst: LB arrives at the Net LB, which chooses a VM that has a local pod. iptables on that VM chooses a local pod and DNATs only the destination (dst: LB becomes dst: pod2), so the packet arrives as src: client, dst: pod2 and the client IP is preserved. The reply has its source rewritten from pod2 to LB and leaves as src: LB, dst: client.]
  148. L7: Ingress

  149. Service

    Change the type of your service
    Allocates and forwards a port on every VM to the service port
    Exactly the same data path as the LB case

    kind: Service
    apiVersion: v1
    metadata:
      name: store-be
    spec:
      type: NodePort
      selector:
        app: store
        role: be
      ports:
      - name: https
        port: 443
  150-151. Ingress

    A different API resource
    Maps HTTP to services
    Implemented by the cloud provider controller

    kind: Ingress
    apiVersion: extensions/v1beta1
    metadata:
      name: store-ing
    spec:
      rules:
      - http:
          paths:
          - path: /customers
            backend:
              serviceName: customers-be
          - path: /products
            backend:
              serviceName: products-be
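"Maps HTTP to services" amounts to a lookup from request path to backend service. The sketch below uses the two paths from the manifest above. The matching rule (exact match or a prefix followed by `/`, longest prefix wins) is an assumption made for illustration; actual path semantics depend on the Ingress controller in use.

```python
# (path, serviceName) pairs taken from the Ingress manifest.
rules = [
    ("/customers", "customers-be"),
    ("/products", "products-be"),
]


def backend_for(path: str, rules: list):
    """Return the service for a request path, or None if no rule applies."""
    matches = [
        (len(prefix), service)
        for prefix, service in rules
        if path == prefix or path.startswith(prefix + "/")
    ]
    # Prefer the longest (most specific) matching prefix.
    return max(matches)[1] if matches else None


print(backend_for("/products/42", rules))
print(backend_for("/customers", rules))
print(backend_for("/checkout", rules))
```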
  152. Ingress

    The LB info is populated when ready

    kind: Ingress
    apiVersion: extensions/v1beta1
    metadata:
      name: store-ing
    spec:
      rules:
      - http:
          paths:
          - path: /customers
            backend:
              serviceName: customers-be
          - path: /products
            backend:
              serviceName: products-be
    status:
      loadBalancer:
        ingress:
        - ip: 86.73.50.9
  153-176. Life of a packet: external-to-ingress

    [Diagram, animated over twenty-four slides: a GCP project with VM1, VM2 and VM3 hosting pod1 through pod5, fronted by GCLB. A request with src: client, dst: LB, path: /products arrives at GCLB, which chooses a VM and sends a new packet with src: GCLB, dst: VM3. iptables on VM3 chooses a pod and NATs both addresses: src: GCLB becomes src: VM3 and dst: VM3 becomes dst: pod3. The packet with src: VM3, dst: pod3 reaches pod3, whose reply with src: pod3, dst: VM3 returns to VM3. iptables reverses the translation to src: VM3, dst: GCLB, and GCLB answers the client with src: LB, dst: client.]
  177. OnlyLocal

    The same annotation as before
    Configured per-service
    iptables will always choose a pod on the same node
    Risks imbalance
    Removes 2nd hop

    kind: Service
    apiVersion: v1
    metadata:
      name: store-be
      annotations:
        service.beta.kubernetes.io/external-traffic: OnlyLocal
    spec:
      type: NodePort
      selector:
        app: store
        role: be
      ports:
      - name: https
        port: 443
  178. But wait, there’s more!

    Things we didn’t really cover:
    • Pod liveness probes
    • Graceful termination
    • Cloud health checks
    • Firewalls
    • Headless services
    • IPAM
    • SSL
    • ...
  179. Google Container Engine is a moving target

    The efforts of Open Source developers and Google Engineers continue to improve and simplify the system
    Google NEXT ’18 will have more “ins” and “outs” for network traffic
    Watch this space
  180. https://kubernetes.io Code: github.com/kubernetes/kubernetes Chat: slack.k8s.io Twitter: @kubernetesio

  181. Thank you