Why Service is the worst API in Kubernetes, and what we can do about it

Tim Hockin
November 06, 2023

This was my KubeCon NA 2023 (Chicago) lightning talk.

Transcript

  1. Google Cloud Platform
    Why Service is the worst API in Kubernetes, and what we can do about it
    KubeCon, Chicago
    Nov 6, 2023
    Tim Hockin
    @thockin

  2. “Service” is one of the oldest APIs in Kubernetes

  3. “Service” is one of the oldest APIs in Kubernetes
    $ git blame --ignore-rev bd7643c03339 pkg/apis/core/types.go | grep "type Service struct"
    ^2c4b3a562ce pkg/api/types.go (Joe Beda 2014-06-06 16:40:48 -0700 4358) type Service struct {

  4. “Service” is also one of the most widely used APIs in Kubernetes

  5. In-cluster virtual services

  6. In-cluster virtual services
    IP allocation

  7. In-cluster virtual services
    IP allocation
    Out-of-cluster loadbalancers

  8. In-cluster virtual services
    IP allocation
    Out-of-cluster loadbalancers
    Node-ports

  9. In-cluster virtual services
    IP allocation
    Out-of-cluster loadbalancers
    Node-ports
    DNS names & SRV

  10. In-cluster virtual services
    IP allocation
    Out-of-cluster loadbalancers
    Node-ports
    DNS names & SRV
    Name aliases

  11. In-cluster virtual services
    IP allocation
    Out-of-cluster loadbalancers
    Node-ports
    DNS names & SRV
    Name aliases
    LB health-checks

  12. In-cluster virtual services
    IP allocation
    Out-of-cluster loadbalancers
    Node-ports
    DNS names & SRV
    Name aliases
    LB health-checks
    Routing policy

  13. In-cluster virtual services
    IP allocation
    Out-of-cluster loadbalancers
    Node-ports
    DNS names & SRV
    Name aliases
    LB health-checks
    Routing policy
    Automatic endpoint management

  14. In-cluster virtual services
    IP allocation
    Out-of-cluster loadbalancers
    Node-ports
    DNS names & SRV
    Name aliases
    LB health-checks
    Routing policy
    Automatic endpoint management
    Manual endpoints

  15. In-cluster virtual services
    IP allocation
    Out-of-cluster loadbalancers
    Node-ports
    DNS names & SRV
    Name aliases
    LB health-checks
    Routing policy
    Automatic endpoint management
    Manual endpoints
    Session affinity

  16. In-cluster virtual services
    IP allocation
    Out-of-cluster loadbalancers
    Node-ports
    DNS names & SRV
    Name aliases
    LB health-checks
    Routing policy
    Automatic endpoint management
    Manual endpoints
    Session affinity
    Node implementation hints

  17. In-cluster virtual services
    IP allocation
    Out-of-cluster loadbalancers
    Node-ports
    DNS & SRV names
    Name aliases
    LB health-checks
    Routing policy
    Automatic endpoint management
    Manual endpoints
    Session affinity
    Node implementation hints
    Port mappings

  18. In-cluster virtual services
    IP allocation
    Out-of-cluster loadbalancers
    Node-ports
    DNS & SRV names
    Name aliases
    LB health-checks
    Routing policy
    Automatic endpoint management
    Manual endpoints
    Session affinity
    Node implementation hints
    Port mappings
    Simple firewall

  19. “Service” was designed to be simple!
    LoadBalancer
    NodePort
    ClusterIP
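
The three types on this slide nest: a NodePort Service also gets a cluster IP, and a LoadBalancer Service by default also gets node ports. A minimal Python sketch of that layering follows. The `allocations_for` helper and the labels in `ADDS` are hypothetical names made up for illustration, not part of any Kubernetes API.

```python
# Hypothetical model of how the three classic Service types build on each other.
LAYERS = ["ClusterIP", "NodePort", "LoadBalancer"]

# What each layer adds on top of the one below it (illustrative labels only).
ADDS = {
    "ClusterIP": "cluster IP",
    "NodePort": "node port",
    "LoadBalancer": "external load balancer",
}


def allocations_for(service_type: str) -> list[str]:
    """Return everything a Service of this type ends up with by default."""
    depth = LAYERS.index(service_type)  # raises ValueError for unknown types
    return [ADDS[layer] for layer in LAYERS[: depth + 1]]


print(allocations_for("LoadBalancer"))
```

The later additions on the next slide (such as not allocating node ports for load balancers) are exactly the cases where this simple nesting stops holding.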

  20. As Kubernetes expanded and matured, we accumulated functionality:
    • Internal and external traffic policies
    • Dual-stack support
    • Topology awareness
    • LB options
    • Don’t allocate NodePorts for LBs
    • Don’t allocate HCNPs (health-check node ports)

  21. The API that we laid out almost 10 years ago is starting to limit how we can evolve
    But, we have a strong commitment to compat, which includes under-specified semantics!

  22. Is session affinity per-service or per-port?
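
To make the question concrete, here is a small Python sketch of the two readings. It assumes a toy router that hands out backends round-robin and remembers the choice in an affinity table; `make_router`, the pod names, and the client IP are all hypothetical and do not describe how any real proxy implements affinity.

```python
from itertools import cycle


def make_router(backends, per_port):
    """Return a route(client_ip, port) function with sticky assignments."""
    next_backend = cycle(backends)  # round-robin for keys seen for the first time
    sticky = {}                     # affinity table: key -> chosen backend

    def route(client_ip, port):
        # The only difference between the two readings is the affinity key.
        key = (client_ip, port) if per_port else (client_ip,)
        if key not in sticky:
            sticky[key] = next(next_backend)
        return sticky[key]

    return route


per_service = make_router(["pod-a", "pod-b"], per_port=False)
per_port = make_router(["pod-a", "pod-b"], per_port=True)

# Same client, two ports of the same Service.
print(per_service("10.0.0.7", 80), per_service("10.0.0.7", 443))
print(per_port("10.0.0.7", 80), per_port("10.0.0.7", 443))
```

With per-service affinity the client lands on the same pod for both ports; with per-port affinity it can land on different pods. Both behaviours are plausible readings of the same field, which is the point of the slide.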

  23. Are implementations required to consider port-protocol or just port number when routing?
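
A short Python sketch of why this matters, assuming a Service that exposes port 53 over both UDP and TCP. The field names mirror the Service port fields (`name`, `port`, `protocol`, `targetPort`); the port values and the two lookup tables are hypothetical.

```python
# A Service may list the same port number under two protocols (e.g. DNS on 53).
ports = [
    {"name": "dns-udp", "port": 53, "protocol": "UDP", "targetPort": 5353},
    {"name": "dns-tcp", "port": 53, "protocol": "TCP", "targetPort": 5354},
]

# Routing table keyed by port number only: the second entry overwrites the first.
by_number = {p["port"]: p["targetPort"] for p in ports}

# Routing table keyed by (port, protocol): both entries survive.
by_port_and_protocol = {
    (p["port"], p["protocol"]): p["targetPort"] for p in ports
}

print(len(by_number), len(by_port_and_protocol))
```

An implementation that only looks at the port number cannot send UDP and TCP traffic to different target ports, so the two interpretations are observably different.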

  24. Is a Service immutable, or can it be updated?

  25. Service API does too many things for too many use-cases
    Service API is different from other APIs in too many subtle ways
    ● Example: synchronous IP and node-port allocation

  26. Result: A complex API to use and maintain
    ● Lots of inter-related fields
    ● Hard to validate and test
    ● Hard to document

  27. Result: Hard to extend
    ● “All ports” is basically impossible
    ● Port naming across protocols is clunky
    ● Implementations need more and more knobs
    ● Adding different types of LBs is challenging

  28. So...what are we going to do about it?

  29. Gateway API

  30. Application Operator
    Infrastructure Provider
    Cluster Operator

  31. Gateway (front door)
    Application Operator
    Infrastructure Provider
    Cluster Operator

  32. Gateway (front door)
    GatewayClass (which impl)
    Application Operator
    Infrastructure Provider
    Cluster Operator

  33. Gateway (front door)
    GatewayClass (which impl)
    Application Operator
    Infrastructure Provider
    Cluster Operator
    *Route (×3)

  34. Gateway (front door)
    GatewayClass (which impl)
    Application Operator
    Infrastructure Provider
    Cluster Operator
    *Route (×5)
    Service

  35. Gateway
    class=ClusterIP

  36. Gateway
    class=LoadBalancer

  37. Gateway
    class=LoadBalancer
    Gateway
    class=ClusterIP
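
The idea on these slides can be sketched as a tiny data model: the GatewayClass says which implementation handles a Gateway, Routes attach to a Gateway, and Routes point at a Service. The Python below is a deliberately simplified, hypothetical model, not the Gateway API schema. The class names `ClusterIP` and `LoadBalancer` come from the slides; the controller strings, object names, and the `services_behind` helper are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GatewayClass:
    name: str
    controller: str  # which implementation handles Gateways of this class


@dataclass(frozen=True)
class Gateway:
    name: str
    gateway_class_name: str  # the "class=" shown on the slides


@dataclass(frozen=True)
class Route:
    name: str
    parent_gateway: str   # the Gateway this Route attaches to
    backend_service: str  # the Service that finally receives traffic


def services_behind(gateway, routes):
    """List the Services reachable through one Gateway."""
    return sorted(
        {r.backend_service for r in routes if r.parent_gateway == gateway.name}
    )


classes = [
    GatewayClass("ClusterIP", "example.com/in-cluster"),      # hypothetical controller
    GatewayClass("LoadBalancer", "example.com/external-lb"),  # hypothetical controller
]
inside = Gateway("web-internal", "ClusterIP")
outside = Gateway("web-external", "LoadBalancer")
routes = [
    Route("web-in", "web-internal", "web"),
    Route("web-out", "web-external", "web"),
]

print(services_behind(inside, routes), services_behind(outside, routes))
```

In this sketch the same backend sits behind two front doors of different classes, so exposure is chosen by the Gateway rather than by a `type` field on the Service.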

  38. Legacy model, evolved
    GW LoadBalancer class=internal
    Pod Selector
    Cluster IP
    Service Name
    GW ClusterIP
    Cluster IP
    Service Name
    GW LoadBalancer class=external

  39. This is not a commitment!
    ● Several of these pieces are already in progress
    ● Some are barely sketched out
    ● Gateway API is hitting 1.0 imminently
    ○ That doesn’t include ClusterIP support, yet
    I am seeking feedback on the idea!