We've Made Quite a Mesh

Tim Hockin
November 16, 2019

Kubernetes has evolved many service-mesh-like properties.

Transcript

  1. Google Cloud Platform
    We’ve Made Quite A Mesh
    Rejekts 2019
    Nov, 2019
    Tim Hockin
    @thockin
    (c) Google LLC

  3. What is a mesh?

  4. What is a mesh?
    Service virtualization
    • VIPs, names

  5. What is a mesh?
    Service virtualization
    • VIPs, names
    Endpoint management
    • Constituency, health-checking

  6. What is a mesh?
    Service virtualization
    • VIPs, names
    Endpoint management
    • Constituency, health-checking
    Client-side load-balancing
    • Distinct from the app itself

  7. What is a mesh?
    Service virtualization
    • VIPs, names
    Endpoint management
    • Constituency, health-checking
    Client-side load-balancing
    • Distinct from the app itself
    Traffic management
    • Routing, splitting, fault injection

  8. What is a mesh?
    Service virtualization
    • VIPs, names
    Endpoint management
    • Constituency, health-checking
    Client-side load-balancing
    • Distinct from the app itself
    Traffic management
    • Routing, splitting, fault injection
    Identity
    • Who is sending/receiving

  9. What is a mesh?
    Service virtualization
    • VIPs, names
    Endpoint management
    • Constituency, health-checking
    Client-side load-balancing
    • Distinct from the app itself
    Traffic management
    • Routing, splitting, fault injection
    Identity
    • Who is sending/receiving
    Access control
    • Who may send/receive

  10. What is a mesh?
    Service virtualization
    • VIPs, names
    Endpoint management
    • Constituency, health-checking
    Client-side load-balancing
    • Distinct from the app itself
    Traffic management
    • Routing, splitting, fault injection
    Identity
    • Who is sending/receiving
    Access control
    • Who may send/receive
    Telemetry
    • What happened

  11. What is a mesh?
    Service virtualization
    • VIPs, names
    Endpoint management
    • Constituency, health-checking
    Client-side load-balancing
    • Distinct from the app itself
    Traffic management
    • Routing, splitting, fault injection
    Identity
    • Who is sending/receiving
    Access control
    • Who may send/receive
    Telemetry
    • What happened
    Encryption & zero-trust
    • Why do I believe it

  12. The mesh market today
    Many mesh offerings
    Implemented in terms of proxies
    • Sidecars, middle-proxies, gateways
    Work best at L7
    • e.g. HTTP, but not exclusively
    Lots of thinking, experimenting, and development happening here!
    https://www.flickr.com/photos/foxypar4/2124673642

  13. ASSERTION:
    Kubernetes is already a service mesh

  14. ASSERTION:
    Kubernetes is already a primitive service mesh

  15. Service virtualization
    Service was one of the very first APIs in Kubernetes
    Generally described by VIPs
    Service type=LoadBalancer often has an “internal” option
    Plugs into DNS automatically
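
As a small illustration of "plugs into DNS automatically", the sketch below builds the conventional in-cluster DNS name for a Service. The Service name `web` and namespace `shop` are hypothetical, and `cluster.local` is assumed as the cluster domain (a common default that clusters can override).

```python
def service_dns_name(service: str, namespace: str = "default",
                     cluster_domain: str = "cluster.local") -> str:
    """Return the conventional cluster DNS name for a Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


# Hypothetical Service "web" in namespace "shop".
print(service_dns_name("web", "shop"))  # web.shop.svc.cluster.local
```

Clients resolve this name to the Service's VIP, so they never need to know which Pods sit behind it.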

  16. Endpoint management
    Services select Pods (by labels) into dynamic sets of Endpoints
    Automatically managed by Kubernetes
    Unhealthy Pods are automatically removed from endpoint sets
    EndpointSlice is a new, more flexible API
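
The selection described on this slide can be modeled roughly as follows. This is a toy sketch, not the Kubernetes controller: the `Pod` class, its fields, and the Pod names and IPs are hypothetical, and only equality-based selectors are handled.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    ip: str
    labels: dict = field(default_factory=dict)
    ready: bool = True


def matches(selector: dict, labels: dict) -> bool:
    """Equality-based selector: every selector key must match the Pod's label."""
    return all(labels.get(k) == v for k, v in selector.items())


def endpoints_for(selector: dict, pods: list) -> list:
    """IPs of ready Pods whose labels satisfy the selector."""
    return [p.ip for p in pods if p.ready and matches(selector, p.labels)]


pods = [
    Pod("web-1", "10.1.0.4", {"app": "web"}),
    Pod("web-2", "10.1.0.5", {"app": "web"}, ready=False),  # failing readiness
    Pod("db-1", "10.1.0.6", {"app": "db"}),
]
print(endpoints_for({"app": "web"}, pods))  # ['10.1.0.4']
```

Because the set is recomputed from labels and readiness, Pods join and leave it without any client-side configuration change.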

  17. Client-side load-balancing
    “Kube-proxy isn’t really a proxy” -- me, frequently
    Well, ACTUALLY...
    The Node’s kernel is the proxy
    • The root netns considers every packet
    • Policy is applied (iptables, ipvs, eBPF, etc.)
    • Routing decisions are made
    • Connections are tracked
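
The bullets above (a decision per new connection, then connection tracking) can be sketched as a toy model. The `NodeProxy` class and all addresses are hypothetical; the real data path lives in the kernel, and random choice is assumed here only as one plausible balancing policy.

```python
import random


class NodeProxy:
    """Toy model of the per-node data path: pick a backend once, then stick to it."""

    def __init__(self, backends, rng=None):
        self.backends = list(backends)
        self.rng = rng or random.Random()
        self.conntrack = {}  # connection tuple -> chosen backend

    def route(self, conn):
        # First packet of a connection: make a load-balancing decision.
        if conn not in self.conntrack:
            self.conntrack[conn] = self.rng.choice(self.backends)
        # Later packets: reuse the tracked decision.
        return self.conntrack[conn]


proxy = NodeProxy(["10.1.0.4:8080", "10.1.0.5:8080"], rng=random.Random(0))
conn = ("10.1.0.9", 40312, "10.0.0.10", 80)  # src IP, src port, VIP, VIP port
first = proxy.route(conn)
print(first, proxy.route(conn) == first)
```

The balancing happens on the client's node, separate from the application, which is what makes it "client-side".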

  18. Traffic management
    Services allow a little bit of this at L4
    • Traffic split by labels & selectors
    Very coarse - barely useful, really
    Ingress is slightly more capable
    • Hostnames and paths
    • Non-portable annotations
    • Only targets Kubernetes Services
    • More extensible for v1 GA
    • Not really what Ingress was designed for
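
To make "very coarse" concrete, the sketch below computes the traffic share when one Service selects Pods of two versions, and then shows a simplified host-and-path lookup of the kind Ingress offers. Both functions, the version names, hostnames, and Service names are hypothetical, balancing is assumed to be even across Pods, and the plain string-prefix match is simpler than what real Ingress controllers do.

```python
from fractions import Fraction


def split_by_replicas(pods_per_version: dict) -> dict:
    """Share of traffic per version when one Service selects all the Pods."""
    total = sum(pods_per_version.values())
    return {v: Fraction(n, total) for v, n in pods_per_version.items()}


def ingress_backend(rules: list, host: str, path: str):
    """Service for the longest matching path prefix on a host, or None."""
    candidates = [
        (len(prefix), service)
        for rule_host, prefix, service in rules
        if rule_host == host and path.startswith(prefix)
    ]
    return max(candidates)[1] if candidates else None


# The split can only move in steps of one Pod: 3 + 1 replicas gives 75% / 25%.
print(split_by_replicas({"v1": 3, "v2": 1}))

rules = [
    ("shop.example.com", "/", "web"),
    ("shop.example.com", "/api", "api"),
]
print(ingress_backend(rules, "shop.example.com", "/api/orders"))  # api
```

Sending 1% of traffic to a new version this way would need 99 Pods of the old one, which is why the L4 split is of limited use.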

  19. Identity
    ServiceAccounts give identity to Pods
    Central root of trust - cluster CA
    Does not automatically use identity on-the-wire
    • Some implementations offer this
    Does not associate identity with NetworkPolicy

  20. Access control
    NetworkPolicy defines which client Pods can talk to which server Pods
    Describes a DAG of “allow”
    Described in terms of selectors
    • Not deployments
    Does not consider identity by default
    • Some implementations offer this
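
A rough model of the selector-based "allow" rules on this slide is sketched below. It is a simplification under stated assumptions: ingress only, a single namespace, no ports, and a hypothetical policy shape (`pod_selector` and `from` keys) rather than the real NetworkPolicy schema.

```python
def matches(selector: dict, labels: dict) -> bool:
    """Equality-based selector match."""
    return all(labels.get(k) == v for k, v in selector.items())


def allowed(policies: list, client_labels: dict, server_labels: dict) -> bool:
    """Ingress check: may the client Pod connect to the server Pod?"""
    selecting = [p for p in policies if matches(p["pod_selector"], server_labels)]
    if not selecting:
        return True  # Pods selected by no policy accept all traffic
    # Policies are additive: any matching "from" selector allows the connection.
    return any(
        matches(src, client_labels)
        for p in selecting
        for src in p["from"]
    )


policies = [{"pod_selector": {"app": "db"}, "from": [{"app": "web"}]}]
print(allowed(policies, {"app": "web"}, {"app": "db"}))    # True
print(allowed(policies, {"app": "batch"}, {"app": "db"}))  # False
```

The decision uses only Pod labels, so anything that can carry the label `app=web` is treated as the web client; no identity is checked.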

  21. Telemetry
    AKA “observability”
    Kubernetes has very little here by default
    Several 3rd party implementations
    One of the most frequently asked questions, IMO

  22. Encryption & zero-trust
    Kubernetes has very little here by default
    No encryption on-the-wire by default
    • Apps must do their own TLS
    • Can use Kubernetes API server for some, but not automatic
    Several 3rd party implementations

  23. Keeping score
    Property Kubernetes
    Service virtualization
    Endpoint management
    Client-side load-balancing
    Traffic management
    Identity
    Access control
    Telemetry
    Encryption & zero-trust

  24. DON’T PANIC

  25. What next?
    Let’s not be shy about it - we have a cluster mesh!
    This does not invalidate “real” meshes
    If anything it validates them
    It proves that mesh is a good idea
    Lots of R&D on mesh - we should pay attention

  26. How far do we want to go?
    How general-purpose do we want to be?

  27. Tim’s opinions: 1/2
    Moving from Kubernetes to a “real” service mesh should be easy and incremental
    Kubernetes should be general-purpose enough to handle most workloads
    We should start to break down the walls around clusters: there are other things we want to integrate with

  28. Tim’s opinions: 2/2
    We should not build or bundle a full-featured, opinionated, L7 service mesh at this time
    We should embrace our inner mesh and learn from the experiences of others
    We should lean on the extensibility and pluggability that has made Kubernetes successful

  29. Concretely...

  30. Non-Pod endpoints
    Allow “external” workloads in Services and Ingress
    • Details are fuzzy, demand is not
    Services kind of allow this already
    • A bit clunky
    EndpointSlice is better - is that good enough?
    Out-of-core CRDs+controllers ==> EndpointSlice
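
One way to picture a non-Pod endpoint is an EndpointSlice whose addresses are not Pod IPs, as an out-of-core controller might write it. The sketch below builds such a manifest as a plain Python dict. The helper name, the Service name `legacy-db`, and the address are hypothetical, and the `discovery.k8s.io/v1` version shown is the later GA form of an API that was still pre-GA when this talk was given.

```python
import json


def external_endpoint_slice(service: str, addresses: list, port: int) -> dict:
    """Build an EndpointSlice manifest whose endpoints are not Pods."""
    return {
        "apiVersion": "discovery.k8s.io/v1",
        "kind": "EndpointSlice",
        "metadata": {
            "name": f"{service}-external",
            # This label ties the slice to the Service of the same name.
            "labels": {"kubernetes.io/service-name": service},
        },
        "addressType": "IPv4",
        "ports": [{"protocol": "TCP", "port": port}],
        "endpoints": [{"addresses": [a]} for a in addresses],
    }


# Hypothetical database running on a VM outside the cluster.
manifest = external_endpoint_slice("legacy-db", ["192.0.2.10"], 5432)
print(json.dumps(manifest, indent=2))
```

The Service itself would be created without a selector, so Kubernetes does not try to manage its endpoints.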

  31. Replace Ingress
    More capable & expressive L7 processing
    • Better API and data model
    Maybe better L4 at the same time
    • A grand unified model
    • Incremental API
    Focus on the “right” extension points
    Requires plug-in implementations
    KEP coming soon!

  32. Treat kube-proxy as a plugin
    Gets too much first-class treatment today
    People assume it is required
    • It’s not - it’s just one implementation of the Service API
    We should encourage alternate implementations
    • No more new proxy modes
    Looking for creative new options

  33. Use ServiceAccount in NetworkPolicy
    More opaque than Pod labels (good)
    • Especially when crossing Namespaces
    More general purpose
    • Remember “external” workloads?
    Pods can join and leave a Service, but are born into ServiceAccounts
    BUT: Identity is not carried on the wire
    • Well, it is in some implementations!

  34. Multi-cluster
    Find ways to extend beyond the walls
    • There is a world beyond our borders
    Cross-cluster identity
    Cross-cluster Services
    • EndpointSlice enables some of this?
    Cross-cluster NetworkPolicy
    • See discussion of identity

  35. Better health checks
    Kubernetes health checks (readiness probes) are weak
    • Not required
    • Don’t have to be on network (exec)
    • Can be TCP or HTTP
    Hard to build or integrate higher-level mesh control on that
    Logically belongs with Service
    • Hard to retrofit, but probably worthwhile
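
To show how little a TCP-style check establishes, here is a minimal sketch of one. The `tcp_ready` function is hypothetical and is not how the kubelet implements probes; it only reports whether a connection can be opened.

```python
import socket


def tcp_ready(host: str, port: int, timeout: float = 1.0) -> bool:
    """TCP-style check: ready if a connection can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Demo against a throwaway local listener on an OS-assigned port.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
print(tcp_ready("127.0.0.1", listener.getsockname()[1]))
listener.close()
```

A passing check says only that something accepts connections on the port. It says nothing about whether the application can serve requests, which is part of why higher-level mesh control is hard to build on it.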

  36. Kubernetes is a mesh
    But not a very good one -- we can do better
