
Why Service is the worst API in Kubernetes, and what we can do about it

Tim Hockin
November 06, 2023

This was my KubeCon NA 2023 (Chicago) lightning talk.

Transcript

  1. Google Cloud Platform: Why Service is the worst API in Kubernetes, and what we can do about it. KubeCon, Chicago, Nov 6, 2023. Tim Hockin <[email protected]> @thockin
  2. “Service” is one of the oldest APIs in Kubernetes

    $ git blame --ignore-rev bd7643c03339 pkg/apis/core/types.go | grep "type Service struct"
    ^2c4b3a562ce pkg/api/types.go (Joe Beda 2014-06-06 16:40:48 -0700 4358) type Service struct {
  3. In-cluster virtual services, IP allocation, Out-of-cluster loadbalancers, Node-ports, DNS names & SRV, Name aliases, LB health-checks, Routing policy
  4. In-cluster virtual services, IP allocation, Out-of-cluster loadbalancers, Node-ports, DNS names & SRV, Name aliases, LB health-checks, Routing policy, Automatic endpoint management
  5. In-cluster virtual services, IP allocation, Out-of-cluster loadbalancers, Node-ports, DNS names & SRV, Name aliases, LB health-checks, Routing policy, Automatic endpoint management, Manual endpoints
  6. In-cluster virtual services, IP allocation, Out-of-cluster loadbalancers, Node-ports, DNS names & SRV, Name aliases, LB health-checks, Routing policy, Automatic endpoint management, Manual endpoints, Session affinity
  7. In-cluster virtual services, IP allocation, Out-of-cluster loadbalancers, Node-ports, DNS names & SRV, Name aliases, LB health-checks, Routing policy, Automatic endpoint management, Manual endpoints, Session affinity, Node implementation hints
  8. In-cluster virtual services, IP allocation, Out-of-cluster loadbalancers, Node-ports, DNS & SRV names, Name aliases, LB health-checks, Routing policy, Automatic endpoint management, Manual endpoints, Session affinity, Node implementation hints, Port mappings
  9. In-cluster virtual services, IP allocation, Out-of-cluster loadbalancers, Node-ports, DNS & SRV names, Name aliases, LB health-checks, Routing policy, Automatic endpoint management, Manual endpoints, Session affinity, Node implementation hints, Port mappings, Simple firewall
  10. As Kubernetes expanded and matured, we accumulated functionality:

    • Internal and external traffic policies
    • Dual-stack support
    • Topology awareness
    • LB options
    • Don’t allocate NodePorts for LBs
    • Don’t allocate HCNPs (health-check node ports)
  11. The API that we laid out almost 10 years ago is starting to limit how we can evolve. But we have a strong commitment to compat, which includes under-specified semantics!
  12. Service API does too many things for too many use-cases. Service API is different from other APIs in too many subtle ways.

    • Example: synchronous IP and node-port allocation
  13. Result: A complex API to use and maintain

    • Lots of inter-related fields
    • Hard to validate and test
    • Hard to document
  14. Result: Hard to extend

    • “All ports” is basically impossible
    • Port naming across protocols is clunky
    • Implementations need more and more knobs
    • Adding different types of LBs is challenging
  15. [Diagram] Gateway (front door); Gateway Class (which impl). Roles: Application Operator, Infrastructure Provider, Cluster Operator
  16. [Diagram] Gateway (front door); Gateway Class (which impl). Roles: Application Operator, Infrastructure Provider, Cluster Operator. Adds: *Route, *Route, *Route
  17. [Diagram] Gateway (front door); Gateway Class (which impl). Roles: Application Operator, Infrastructure Provider, Cluster Operator. Adds: *Route, *Route, *Route, *Route, *Route, Service
  18. Legacy model, evolved. [Diagram labels, in slide order] GW LoadBalancer class=internal, Pod Selector, Cluster IP, Service Name, GW ClusterIP, Cluster IP, Service Name, GW LoadBalancer class=external
  19. This is not a commitment!

    • Several of these pieces are already in progress
    • Some are barely sketched out
    • Gateway API is hitting 1.0 imminently
      ◦ That doesn’t include ClusterIP support, yet

    I am seeking feedback on the idea!
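
Slide 14's point that port naming across protocols is clunky can be made concrete with a small sketch. It is not Kubernetes code: `ServicePort` and `validate_ports` are hypothetical stand-ins written for illustration, and the two rules they encode (a name is required once a Service has more than one port, and names must be unique within the Service) are stated here as assumptions of the sketch. Under those rules, exposing port 53 on both UDP and TCP forces the protocol into the port name by convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServicePort:
    """Hypothetical, trimmed stand-in for one entry in a Service's port list."""
    name: str
    protocol: str  # "TCP", "UDP", or "SCTP"
    port: int


def validate_ports(ports):
    """Return a list of error strings; an empty list means the ports are accepted.

    Assumed rules for this sketch: every port needs a name once there is
    more than one port, and names must be unique within the Service.
    """
    errors = []
    seen = set()
    for p in ports:
        if len(ports) > 1 and not p.name:
            errors.append(f"port {p.port}/{p.protocol}: name is required")
        elif p.name in seen:
            errors.append(f"port {p.port}/{p.protocol}: duplicate name {p.name!r}")
        seen.add(p.name)
    return errors


# The same logical port on two protocols cannot share one name ...
clash = [ServicePort("dns", "UDP", 53), ServicePort("dns", "TCP", 53)]
# ... so the name ends up carrying the protocol by convention.
workaround = [ServicePort("dns", "UDP", 53), ServicePort("dns-tcp", "TCP", 53)]

clash_errors = validate_ports(clash)
workaround_errors = validate_ports(workaround)
```

Here `clash_errors` holds one duplicate-name error and `workaround_errors` is empty. The workaround is accepted, but the link between the two entries then lives only in a naming convention that nothing in the API enforces.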