Bringing Traffic Into Your Kubernetes Cluster

A look at various models for receiving traffic from outside of your cluster

Tim Hockin

July 11, 2020

Transcript

  1. Bringing traffic into your Kubernetes cluster. It seems like this should be easy. Tim Hockin (@thockin), v1

  2. Start with a “normal” cluster

  3. [Diagram: Cluster: 10.0.0.0/16]

  4. [Diagram: Cluster 10.0.0.0/16 with Node1 (IP 10.240.0.1) and Node2 (IP 10.240.0.2)]

  5. [Diagram: as above; Node1 gets pod range 10.0.1.0/24 and Node2 gets pod range 10.0.2.0/24]

  6. [Diagram: as above; Pod-a (10.0.1.1) and Pod-b (10.0.1.2) run on Node1, Pod-c (10.0.2.1) and Pod-d (10.0.2.2) on Node2]

  7. Kubernetes demands that pods can reach each other

  8. [Diagram: the same cluster, illustrating pod-to-pod reachability]

  9. Kubernetes says very little about how traffic gets INTO the cluster

  10. [Diagram: the same cluster, plus a Client (1.2.3.4) outside it and a question mark: how does its traffic get in?]

  11. That client might be from the internet or from elsewhere on your internal network

  12. Kubernetes offers 4 main APIs to bring traffic into your cluster

  13. 1) Pod IP

  14. 2) Service NodePort

  15. 3) Service LoadBalancer

  16. 4) Ingress

  17. Let’s look at these a bit more

  18. 1) Pod IP

  19. [Diagram: the client sends a packet directly to a pod (Src: client, Dst: pod:pod-port)]

  20. Requires a fully integrated network (flat IP space)

  21. Doesn’t work well for internet traffic

  22. Requires smart clients and service discovery (pod IPs change when pods move); a headless-Service sketch appears in the examples after the transcript

  23. Included for completeness, but not what most people are here to read about

  24. 2) Service NodePort

  25. A port on each node will forward traffic to your service. We know which service by which port. (A NodePort manifest appears in the examples after the transcript.)

  26. [Diagram: node ports :30093 and :30076 are open on every node; the client sends to Node1 (Src: client, Dst: node1:node-port)]

  27. [Diagram: Node1 forwards the traffic on to a pod (Src: node1, Dst: pod:pod-port)]

  28. Hold up, why did the source IP change?

  29. By default, a NodePort can forward to any pod, so this is possible:

  30. [Diagram: Node1 has no backing pod for this service, so it forwards to a pod on Node2 (Src: node1, Dst: pod:pod-port)]

  31. In that case, the traffic MUST return through node1, so we have to SNAT

  32. Pro:
      - No external infrastructure needed
      Con:
      - Can't use arbitrary ports
      - Clients have to pick a node (nodes can be added and removed over time)
      - SNAT loses client IP
      - Two hops

  33. Option: externalTrafficPolicy = Local

  34. If you set this on your service, nodes will only choose “local” pods

  35. Eliminates the need for SNAT

  36. Client must choose nodes which actually have pods, or else:

  37. [Diagram: the client picked Node1, which has no backing pod; with Local the traffic has nowhere to go (Dst: ???, failure)]

  38. Also risk imbalance if clients assume equal weight on nodes:

  39. [Diagram: Node1 has one pod and Node2 has six, but the client splits traffic 50%/50% between the two nodes]

  40. Pro:
      - No external infrastructure needed
      - Client IP is available
      Con:
      - Can't use arbitrary ports
      - Clients have to pick a node with pods
      - Two hops (but less impactful)

  41. 3) Service LoadBalancer

  42. Someone (e.g. a cloud provider) allocates a load-balancer for your service (a LoadBalancer manifest appears in the examples after the transcript)

  43. This is an API with very loose requirements

  44. There are a few ways this has been implemented (non-exhaustive)

  45. 3a) VIP-like, 2-hops (e.g. GCP NetworkLB)

  46. The node knows which service by which destination IP (VIP)

  47. How VIPs are propagated and managed is a broad topic, and not considered here

  48. [Diagram: a VIP fronts the cluster; the client's packet is addressed to it (Src: client, Dst: VIP:service-port)]

  49. [Diagram: the same packet, still addressed to VIP:service-port, reaches Node1]

  50. [Diagram: Node1 forwards the traffic on to a pod (Src: node1, Dst: pod:pod-port)]

  51. Why did the source IP change, again?

  52. Like a NodePort, a VIP can forward to any pod, so this is possible:

  53. [Diagram: Node1 has no backing pod for this service, so it forwards to a pod on Node2 (Src: node1, Dst: pod:pod-port)]

  54. Again, the traffic MUST return through node1, so we have to SNAT

  55. Pro:
      - Stable VIP
      - Can use any port you want
      Con:
      - Requires programmable infrastructure
      - SNAT loses client IP
      - Two hops

  56. Option: externalTrafficPolicy = Local

  57. If you set this on your service, nodes will only choose “local” pods

  58. Eliminates the need for SNAT

  59. LBs must choose nodes which actually have pods

  60. Pro:
      - Stable VIP
      - Can use any port you want
      - Client IP is available
      Con:
      - Requires programmable infrastructure
      - Two hops (but less impactful)

  61. 3b) VIP-like, 1-hop (no known examples)

  62. As far as I know, nobody has implemented this

  63. 3c) Proxy-like, 2-hops (e.g. AWS ElasticLB)

  64. [Diagram: a Proxy sits in front of the node ports :30093 and :30076; the client sends to it (Src: client, Dst: proxy:service-port)]

  65. [Diagram: the proxy forwards to a node port (Src: proxy, Dst: node1:node-port)]

  66. [Diagram: Node1 forwards the traffic on to a pod (Src: node1, Dst: pod:pod-port)]

  67. Again with the SNAT?

  68. Yes, this is basically the same as NodePort, but with a nicer front door

  69. Note that the node which receives the traffic has no idea what the original client IP was

  70. Pro:
      - Stable IP
      - Can use any port you want
      - Proxy can prevent some classes of attacks
      - Proxy can add value (e.g. TLS)
      Con:
      - Requires programmable infrastructure
      - Two hops
      - Loss of client IP (has to move in-band)

  71. Option: externalTrafficPolicy = Local

  72. If you set this on your service, nodes will only choose “local” pods

  73. Eliminates the need for SNAT

  74. LBs must choose nodes which actually have pods

  75. Pro:
      - Stable IP
      - Can use any port you want
      - Proxy can prevent some classes of attacks
      - Proxy can add value (e.g. TLS)
      Con:
      - Requires programmable infrastructure
      - Two hops
      - Loss of client IP (has to move in-band)

  76. 3d) Proxy-like, 1-hop (e.g. GCP HTTP LB)

  77. [Diagram: a Proxy fronts the cluster; the client sends to it (Src: client, Dst: proxy:service-port)]

  78. [Diagram: the proxy forwards directly to a pod (Src: proxy, Dst: pod:pod-port)]

  79. No need for the node to do anything

  80. LB needs to know the pod IPs and be kept in sync (an Endpoints sketch appears in the examples after the transcript)

  81. Pro:
      - Stable IP
      - Can use any port you want
      - Proxy can prevent some classes of attacks
      - Proxy can add value (e.g. TLS)
      - One hop
      Con:
      - Requires programmable infrastructure
      - Loss of client IP (has to move in-band)

  82. 4) Ingress (HTTP only)

  83. Someone (e.g. a cloud provider) allocates an HTTP load-balancer for your service (an Ingress manifest appears in the examples after the transcript)

  84. This is an API with very loose requirements

  85. There are a couple ways this has been implemented (non-exhaustive)

  86. 4a) External, 2-hops (e.g. GCP without VPC Native)

  87. [Diagram: a Proxy sits in front of the node ports :30093 and :30076; the client sends to it (Src: client, Dst: proxy:service-port)]

  88. [Diagram: the proxy forwards to a node port (Src: proxy, Dst: node1:node-port)]

  89. [Diagram: Node1 forwards the traffic on to a pod (Src: node1, Dst: pod:pod-port)]

  90. Same as 3c

  91. HTTP Proxy can save client IP in X-Forwarded-For header

  92. Pro:
      - Stable IP
      - Proxy can prevent some classes of attacks
      - Proxy can add value (e.g. TLS)
      - Can offer HTTP semantics (e.g. URL maps)
      Con:
      - Requires programmable infrastructure
      - Two hops

  93. Option: externalTrafficPolicy = Local

  94. Same as before

  95. 4b) External, 1-hop (e.g. GCP with VPC Native)

  96. [Diagram: a Proxy fronts the cluster; the client sends to it (Src: client, Dst: proxy:service-port)]

  97. [Diagram: the proxy forwards directly to a pod (Src: proxy, Dst: pod:pod-port)]

  98. Same as 3d

  99. HTTP Proxy can save client IP in X-Forwarded-For header

  100. Pro:
      - Stable IP
      - Proxy can prevent some classes of attacks
      - Proxy can add value (e.g. TLS)
      - Can offer HTTP semantics (e.g. URL maps)
      - One hop
      Con:
      - Requires programmable infrastructure

  101. 4c) Internal, shared (e.g. nginx)

  102. Use a service LoadBalancer (see 3a-d) to bring traffic into pods which are HTTP proxies; those in-cluster proxies route to the final pods

  103. Pro:
      - Stable IP
      - Proxy can add value (e.g. TLS)
      - Flexible
      - Low-cost
      Con:
      - You manage and scale the proxies
      - Multiple hops
      - Conflicts can arise between Ingress resources (e.g. claiming the same hostname)

  104. 4d) Internal, dedicated (no known examples)

  105. The idea is that you would spin up the equivalent of 4c for each Ingress instance, or maybe per-namespace

  106. As far as I know, nobody has implemented this
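
Example manifests

To make the four models concrete, here are a few sketches. First, the pod-IP model (slides 18-23) needs client-side service discovery; one common in-cluster form of it is a headless Service, which publishes the pod IPs themselves as DNS A records instead of a virtual IP. All names, labels, and ports below are illustrative, not from the talk:

    apiVersion: v1
    kind: Service
    metadata:
      name: my-app             # hypothetical name
    spec:
      clusterIP: None          # "headless": DNS returns the pod IPs directly
      selector:
        app: my-app            # assumes the pods carry the label app=my-app
      ports:
      - port: 8080             # the pod port smart clients should dial

A client on the flat network resolves my-app.<namespace>.svc.cluster.local, gets the current pod IPs back, and connects directly; it must re-resolve when pods move.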
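
For NodePort (slides 24-32), a minimal Service sketch; kube-proxy on every node then forwards the node port to the Service's backing pods. The port 30093 echoes the diagrams:

    apiVersion: v1
    kind: Service
    metadata:
      name: my-service
    spec:
      type: NodePort
      selector:
        app: my-app
      ports:
      - port: 80               # the Service's own (cluster-internal) port
        targetPort: 8080       # the pod port behind it
        nodePort: 30093        # must fall in the node-port range (default 30000-32767)

Omitting nodePort lets Kubernetes pick a free port from the range, which is usually what you want.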
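
For LoadBalancer (slides 41-60), the same Service with the type switched, plus the externalTrafficPolicy: Local option from slides 33 and 56:

    apiVersion: v1
    kind: Service
    metadata:
      name: my-service
    spec:
      type: LoadBalancer
      externalTrafficPolicy: Local   # only deliver to pods on the receiving node; avoids SNAT
      selector:
        app: my-app
      ports:
      - port: 443
        targetPort: 8443

Once the cloud controller provisions the balancer, its address appears in the Service's status.loadBalancer.ingress. With Local, Kubernetes also allocates a health-check node port so the balancer can steer around nodes that have no local pods (slide 59).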
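
Slide 80 says a 1-hop balancer has to know the pod IPs and be kept in sync. Kubernetes already maintains that bookkeeping: the endpoints controller publishes the ready pod IPs behind each Service as an Endpoints object (newer clusters also get EndpointSlices), which a load-balancer controller can watch. For the example cluster it would look roughly like this:

    apiVersion: v1
    kind: Endpoints
    metadata:
      name: my-service         # always matches the Service name
    subsets:
    - addresses:
      - ip: 10.0.1.1           # Pod-a
      - ip: 10.0.2.1           # Pod-c
      ports:
      - port: 8080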
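
Finally, for Ingress (slides 82-103), a minimal sketch; the host, path, and backend are illustrative. Whether it is realized by an external proxy (4a/4b) or by in-cluster proxies (4c) depends on which ingress controller claims it:

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: my-ingress
    spec:
      ingressClassName: nginx          # assumes an nginx ingress controller, as in 4c
      rules:
      - host: app.example.com
        http:
          paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-service       # the Service from the sketches above
                port:
                  number: 80

Clusters from the era of this talk would write the same object as networking.k8s.io/v1beta1, with a slightly different backend syntax.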