Slide 1

Slide 1 text

Google Cloud Platform
Why Service is the worst API in Kubernetes, and what we can do about it
KubeCon, Chicago, Nov 6, 2023
Tim Hockin (@thockin)

Slide 2

Slide 2 text

“Service” is one of the oldest APIs in Kubernetes

Slide 3

Slide 3 text

“Service” is one of the oldest APIs in Kubernetes

$ git blame --ignore-rev bd7643c03339 pkg/apis/core/types.go | grep "type Service struct"
^2c4b3a562ce pkg/api/types.go (Joe Beda 2014-06-06 16:40:48 -0700 4358) type Service struct {

Slide 4

Slide 4 text

“Service” is also one of the most widely used APIs in Kubernetes

Slide 5

Slide 5 text

In-cluster virtual services

Slide 6

Slide 6 text

● In-cluster virtual services
● IP allocation

Slide 7

Slide 7 text

● In-cluster virtual services
● IP allocation
● Out-of-cluster loadbalancers

Slide 8

Slide 8 text

● In-cluster virtual services
● IP allocation
● Out-of-cluster loadbalancers
● Node-ports

Slide 9

Slide 9 text

● In-cluster virtual services
● IP allocation
● Out-of-cluster loadbalancers
● Node-ports
● DNS names & SRV

Slide 10

Slide 10 text

● In-cluster virtual services
● IP allocation
● Out-of-cluster loadbalancers
● Node-ports
● DNS names & SRV
● Name aliases

Slide 11

Slide 11 text

● In-cluster virtual services
● IP allocation
● Out-of-cluster loadbalancers
● Node-ports
● DNS names & SRV
● Name aliases
● LB health-checks

Slide 12

Slide 12 text

● In-cluster virtual services
● IP allocation
● Out-of-cluster loadbalancers
● Node-ports
● DNS names & SRV
● Name aliases
● LB health-checks
● Routing policy

Slide 13

Slide 13 text

● In-cluster virtual services
● IP allocation
● Out-of-cluster loadbalancers
● Node-ports
● DNS names & SRV
● Name aliases
● LB health-checks
● Routing policy
● Automatic endpoint management

Slide 14

Slide 14 text

● In-cluster virtual services
● IP allocation
● Out-of-cluster loadbalancers
● Node-ports
● DNS names & SRV
● Name aliases
● LB health-checks
● Routing policy
● Automatic endpoint management
● Manual endpoints

Slide 15

Slide 15 text

● In-cluster virtual services
● IP allocation
● Out-of-cluster loadbalancers
● Node-ports
● DNS names & SRV
● Name aliases
● LB health-checks
● Routing policy
● Automatic endpoint management
● Manual endpoints
● Session affinity

Slide 16

Slide 16 text

● In-cluster virtual services
● IP allocation
● Out-of-cluster loadbalancers
● Node-ports
● DNS names & SRV
● Name aliases
● LB health-checks
● Routing policy
● Automatic endpoint management
● Manual endpoints
● Session affinity
● Node implementation hints

Slide 17

Slide 17 text

● In-cluster virtual services
● IP allocation
● Out-of-cluster loadbalancers
● Node-ports
● DNS & SRV names
● Name aliases
● LB health-checks
● Routing policy
● Automatic endpoint management
● Manual endpoints
● Session affinity
● Node implementation hints
● Port mappings

Slide 18

Slide 18 text

● In-cluster virtual services
● IP allocation
● Out-of-cluster loadbalancers
● Node-ports
● DNS & SRV names
● Name aliases
● LB health-checks
● Routing policy
● Automatic endpoint management
● Manual endpoints
● Session affinity
● Node implementation hints
● Port mappings
● Simple firewall

Slide 19

Slide 19 text

“Service” was designed to be simple!
[Diagram labels: LoadBalancer, NodePort, ClusterIP]
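
The three Service types named on this slide are commonly described as layers: by default, a NodePort Service also gets a cluster IP, and a LoadBalancer Service also gets node ports. The sketch below only illustrates that layering; the `LAYERS`, `ADDS`, and `capabilities` names are hypothetical and not part of any Kubernetes API, and the descriptions are simplified.

```python
# Simplified, illustrative model of the default layering of Service types.
# These names are made up for this sketch; they are not Kubernetes identifiers.
LAYERS = ["ClusterIP", "NodePort", "LoadBalancer"]

# What each layer adds on top of the layer before it (by default).
ADDS = {
    "ClusterIP": "a virtual IP reachable inside the cluster",
    "NodePort": "a port opened on every node",
    "LoadBalancer": "an out-of-cluster load balancer",
}


def capabilities(service_type: str) -> list[str]:
    """Return everything a Service of this type provides, innermost layer first."""
    depth = LAYERS.index(service_type)
    return [ADDS[name] for name in LAYERS[: depth + 1]]


for service_type in LAYERS:
    print(service_type, "->", capabilities(service_type))
```

The model is deliberately naive: later additions to the API let a LoadBalancer Service opt out of node-port allocation, which is exactly the kind of knob that breaks the clean nesting.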

Slide 20

Slide 20 text

As Kubernetes expanded and matured, we accumulated functionality:
• Internal and external traffic policies
• Dual-stack support
• Topology awareness
• LB options
• Don’t allocate NodePorts for LBs
• Don’t allocate HCNPs (health-check node ports)
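
To make the accumulation concrete, here is a rough sketch of a single Service manifest, written as a Python dict, that touches most of the items listed above. The metadata name, selector, and `loadBalancerClass` value are hypothetical. Mapping "LB options" to `loadBalancerClass` is an assumption made for this sketch, and the health-check node-port item is not shown.

```python
# A Service manifest as a plain dict. Field names follow the core v1 Service
# API; the name, selector, and loadBalancerClass value are hypothetical.
service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {
        "name": "example",
        # Topology awareness is requested through an annotation.
        "annotations": {"service.kubernetes.io/topology-mode": "Auto"},
    },
    "spec": {
        "type": "LoadBalancer",
        "selector": {"app": "example"},
        "ports": [
            {"name": "http", "protocol": "TCP", "port": 80, "targetPort": 8080}
        ],
        # Internal and external traffic policies.
        "internalTrafficPolicy": "Cluster",
        "externalTrafficPolicy": "Local",
        # Dual-stack support.
        "ipFamilyPolicy": "PreferDualStack",
        "ipFamilies": ["IPv4", "IPv6"],
        # One LB option (assumed mapping; value is hypothetical).
        "loadBalancerClass": "example.com/internal",
        # Don't allocate NodePorts for LBs.
        "allocateLoadBalancerNodePorts": False,
    },
}

# "basic_fields" is this sketch's own grouping, not an official distinction.
basic_fields = {"type", "selector", "ports"}
accumulated = sorted(set(service["spec"]) - basic_fields)
print(accumulated)
```

Even in this small example, two thirds of the spec fields are knobs layered on top of the basic type, selector, and ports.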

Slide 21

Slide 21 text

The API that we laid out almost 10 years ago is starting to limit how we can evolve.

But we have a strong commitment to compatibility, which includes under-specified semantics!

Slide 22

Slide 22 text

Is session affinity per-service or per-port?
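
The two readings of this question can be sketched as two different affinity keys. This is only an illustration: the backend names are hypothetical, and hashing stands in for whatever mechanism a real implementation uses to remember a client's backend.

```python
import hashlib

BACKENDS = ["pod-a", "pod-b", "pod-c"]  # hypothetical endpoints


def pick(key: str) -> str:
    """Deterministically map an affinity key to one backend (stand-in for real state)."""
    digest = hashlib.sha256(key.encode()).digest()
    return BACKENDS[int.from_bytes(digest[:8], "big") % len(BACKENDS)]


def per_service(client_ip: str, port: int) -> str:
    # Reading 1: affinity is keyed on the client alone, so every port of the
    # Service sends this client to the same backend.
    return pick(client_ip)


def per_port(client_ip: str, port: int) -> str:
    # Reading 2: affinity is keyed on client and port, so the same client may
    # land on different backends for different ports.
    return pick(f"{client_ip}:{port}")


client = "10.0.0.7"
print("per-service:", per_service(client, 80), per_service(client, 443))
print("per-port:   ", per_port(client, 80), per_port(client, 443))
```

Under the first reading the two ports always agree; under the second they may or may not. An application that keeps per-client state on one pod behaves differently depending on which reading the implementation chose.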

Slide 23

Slide 23 text

Are implementations required to consider port-protocol or just port number when routing?
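
A small sketch of why the answer matters, using a DNS-style Service that exposes port 53 over both UDP and TCP. The port entries are hypothetical; the point is only how the two lookup keys behave.

```python
# Two ports with the same number but different protocols (hypothetical entries).
ports = [
    {"name": "dns-udp", "protocol": "UDP", "port": 53},
    {"name": "dns-tcp", "protocol": "TCP", "port": 53},
]

# An implementation that routes on port number alone cannot tell them apart:
# the second entry silently overwrites the first.
by_number = {p["port"]: p["name"] for p in ports}

# Keying on (protocol, port) keeps both entries distinct.
by_protocol_and_number = {(p["protocol"], p["port"]): p["name"] for p in ports}

print(len(by_number), len(by_protocol_and_number))  # prints "1 2"
```

Note that the two entries also need distinct names, since port names must be unique within a Service.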

Slide 24

Slide 24 text

Is a Service immutable, or can it be updated?

Slide 25

Slide 25 text

Service API does too many things for too many use-cases

Service API is different from other APIs in too many subtle ways
● Example: synchronous IP and node-port allocation
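
As a rough illustration of what "synchronous" means here: the address is chosen inside the create call itself, so the object that comes back already carries its IP, and creation fails outright if the range is exhausted. The `ClusterIPAllocator` class and `create_service` function are hypothetical names for this sketch, not Kubernetes code, and the CIDR is an arbitrary example.

```python
import ipaddress


class ClusterIPAllocator:
    """Toy allocator that hands out addresses from a fixed range, in order."""

    def __init__(self, cidr: str):
        self._free = list(ipaddress.ip_network(cidr).hosts())

    def allocate(self) -> str:
        if not self._free:
            raise RuntimeError("service IP range exhausted")
        return str(self._free.pop(0))


def create_service(allocator: ClusterIPAllocator, name: str) -> dict:
    # Synchronous: allocation happens as part of the create request, so a
    # failure to allocate is a failure to create.
    return {"name": name, "clusterIP": allocator.allocate()}


allocator = ClusterIPAllocator("10.96.0.0/30")  # only two usable addresses
print(create_service(allocator, "a"))
print(create_service(allocator, "b"))
try:
    create_service(allocator, "c")
except RuntimeError as err:
    print("create failed:", err)
```

Many other Kubernetes APIs accept the object first and let a controller fill in results afterwards, which is one of the subtle differences this slide points at.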

Slide 26

Slide 26 text

Result: A complex API to use and maintain
● Lots of inter-related fields
● Hard to validate and test
● Hard to document

Slide 27

Slide 27 text

Result: Hard to extend
● “All ports” is basically impossible
● Port naming across protocols is clunky
● Implementations need more and more knobs
● Adding different types of LBs is challenging

Slide 28

Slide 28 text

So...what are we going to do about it?

Slide 29

Slide 29 text

Gateway API

Slide 30

Slide 30 text

[Diagram] Roles: Application Operator, Infrastructure Provider, Cluster Operator

Slide 31

Slide 31 text

[Diagram] Resources: Gateway (front door)
Roles: Application Operator, Infrastructure Provider, Cluster Operator

Slide 32

Slide 32 text

[Diagram] Resources: Gateway (front door), Gateway Class (which impl)
Roles: Application Operator, Infrastructure Provider, Cluster Operator

Slide 33

Slide 33 text

[Diagram] Resources: Gateway (front door), Gateway Class (which impl), *Route (x3)
Roles: Application Operator, Infrastructure Provider, Cluster Operator

Slide 34

Slide 34 text

[Diagram] Resources: Gateway (front door), Gateway Class (which impl), *Route (x5), Service
Roles: Application Operator, Infrastructure Provider, Cluster Operator
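
The resources in this diagram can be sketched as minimal manifests, written here as Python dicts. The `apiVersion`, kinds, and field names follow the Gateway API; every object name, the controller name, and the port numbers are hypothetical. HTTPRoute is used as one example of a *Route kind.

```python
# GatewayClass: selects which implementation handles Gateways of this class.
gateway_class = {
    "apiVersion": "gateway.networking.k8s.io/v1",
    "kind": "GatewayClass",
    "metadata": {"name": "example-class"},  # hypothetical
    "spec": {"controllerName": "example.com/gateway-controller"},  # hypothetical
}

# Gateway: the "front door", bound to a class by name.
gateway = {
    "apiVersion": "gateway.networking.k8s.io/v1",
    "kind": "Gateway",
    "metadata": {"name": "front-door"},  # hypothetical
    "spec": {
        "gatewayClassName": gateway_class["metadata"]["name"],
        "listeners": [{"name": "http", "protocol": "HTTP", "port": 80}],
    },
}

# HTTPRoute: attaches to a Gateway and forwards to a Service backend.
http_route = {
    "apiVersion": "gateway.networking.k8s.io/v1",
    "kind": "HTTPRoute",
    "metadata": {"name": "app-route"},  # hypothetical
    "spec": {
        "parentRefs": [{"name": gateway["metadata"]["name"]}],
        "rules": [{"backendRefs": [{"name": "app", "port": 8080}]}],
    },
}

# Follow the references: route -> gateway -> class, and route -> Service.
print(http_route["spec"]["parentRefs"][0]["name"], "->",
      gateway["spec"]["gatewayClassName"])
print("backend Service:", http_route["spec"]["rules"][0]["backendRefs"][0]["name"])
```

Splitting the objects this way lets each role from the diagram own a different resource, rather than everyone editing one Service.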

Slide 35

Slide 35 text

Gateway class=ClusterIP

Slide 36

Slide 36 text

Gateway class=LoadBalancer

Slide 37

Slide 37 text

Gateway class=LoadBalancer
Gateway class=ClusterIP
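
One way to picture this idea is two Gateways that differ only in their class. This is a purely speculative sketch: the class names "ClusterIP" and "LoadBalancer" are taken from the slide and are hypothetical as GatewayClass names, as are the `make_gateway` helper, the object names, and the listener.

```python
def make_gateway(name: str, class_name: str) -> dict:
    """Build a minimal Gateway manifest (as a dict) for the given class name."""
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "Gateway",
        "metadata": {"name": name},  # hypothetical name
        "spec": {
            # Hypothetical class names, mirroring the slide text.
            "gatewayClassName": class_name,
            "listeners": [{"name": "web", "protocol": "TCP", "port": 80}],
        },
    }


# The same application exposed two ways; only the class differs.
gateways = [
    make_gateway("app-internal", "ClusterIP"),
    make_gateway("app-external", "LoadBalancer"),
]

for gw in gateways:
    print(gw["metadata"]["name"], "uses class", gw["spec"]["gatewayClassName"])
```

In this picture, how traffic is exposed becomes a property of which class is chosen rather than a field on Service.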

Slide 38

Slide 38 text

Legacy model, evolved
[Diagram labels: GW LoadBalancer class=internal; Pod Selector; Cluster IP; Service Name; GW ClusterIP; Cluster IP; Service Name; GW LoadBalancer class=external]

Slide 39

Slide 39 text

This is not a commitment!
● Several of these pieces are already in progress
● Some are barely sketched out
● Gateway API is hitting 1.0 imminently
  ○ That doesn’t include ClusterIP support, yet

I am seeking feedback on the idea!