
We've Made Quite a Mesh

Tim Hockin
November 16, 2019

Kubernetes has evolved many service-mesh-like properties.



Transcript

  1. Google Cloud Platform: What is a mesh?
    Service virtualization
      • VIPs, names
    Endpoint management
      • Constituency, health-checking
  2. What is a mesh?
    Service virtualization
      • VIPs, names
    Endpoint management
      • Constituency, health-checking
    Client-side load-balancing
      • Distinct from the app itself
  3. What is a mesh?
    Service virtualization
      • VIPs, names
    Endpoint management
      • Constituency, health-checking
    Client-side load-balancing
      • Distinct from the app itself
    Traffic management
      • Routing, splitting, fault injection
  4. What is a mesh?
    Service virtualization
      • VIPs, names
    Endpoint management
      • Constituency, health-checking
    Client-side load-balancing
      • Distinct from the app itself
    Traffic management
      • Routing, splitting, fault injection
    Identity
      • Who is sending/receiving
  5. What is a mesh?
    Service virtualization
      • VIPs, names
    Endpoint management
      • Constituency, health-checking
    Client-side load-balancing
      • Distinct from the app itself
    Traffic management
      • Routing, splitting, fault injection
    Identity
      • Who is sending/receiving
    Access control
      • Who may send/receive
  6. What is a mesh?
    Service virtualization
      • VIPs, names
    Endpoint management
      • Constituency, health-checking
    Client-side load-balancing
      • Distinct from the app itself
    Traffic management
      • Routing, splitting, fault injection
    Identity
      • Who is sending/receiving
    Access control
      • Who may send/receive
    Telemetry
      • What happened
  7. What is a mesh?
    Service virtualization
      • VIPs, names
    Endpoint management
      • Constituency, health-checking
    Client-side load-balancing
      • Distinct from the app itself
    Traffic management
      • Routing, splitting, fault injection
    Identity
      • Who is sending/receiving
    Access control
      • Who may send/receive
    Telemetry
      • What happened
    Encryption & zero-trust
      • Why do I believe it
  8. The mesh market today
    Many mesh offerings
    Implemented in terms of proxies
      • Sidecars, middle-proxies, gateways
    Work best at L7
      • e.g. HTTP, but not exclusively
    Lots of thinking, experimenting, and development happening here!
    https://www.flickr.com/photos/foxypar4/2124673642
  9. Service virtualization
    Service was one of the very first APIs in Kubernetes
    Generally described by VIPs
    Service type=LoadBalancer often has an “internal” option
    Plugs into DNS automatically
  10. Endpoint management
    Services select Pods (by labels) into dynamic sets of Endpoints
    Automatically managed by Kubernetes
    Unhealthy Pods are automatically removed from endpoint sets
    EndpointSlice is a new, more flexible API
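The selection described on this slide can be sketched in a few lines. This is a toy model, not Kubernetes code: the `Pod` class, its `ready` flag, and `select_endpoints` are hypothetical names, and readiness stands in for "healthy". It assumes a non-empty selector.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    """Hypothetical, simplified stand-in for a Pod."""
    name: str
    ip: str
    labels: dict
    ready: bool = True  # stand-in for "healthy"


def select_endpoints(selector, pods):
    """Return the IPs of ready Pods whose labels contain every selector pair."""
    return [
        pod.ip
        for pod in pods
        if pod.ready
        and all(pod.labels.get(key) == value for key, value in selector.items())
    ]


pods = [
    Pod("web-1", "10.0.0.1", {"app": "web"}),
    Pod("web-2", "10.0.0.2", {"app": "web"}, ready=False),
    Pod("db-1", "10.0.0.3", {"app": "db"}),
]
endpoints = select_endpoints({"app": "web"}, pods)
```

Re-running the selection whenever a Pod's labels or readiness change is what makes the endpoint set dynamic: here `web-2` drops out because it is not ready.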
  11. Client-side load-balancing
    “Kube-proxy isn’t really a proxy” -- me, frequently
    Well, ACTUALLY...
    The Node’s kernel is the proxy
      • The root netns considers every packet
      • Policy is applied (iptables, ipvs, eBPF, etc)
      • Routing decisions are made
      • Connections are tracked
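A rough way to picture "the kernel is the proxy" is a lookup table plus connection tracking. The sketch below is a toy model under stated assumptions, not how iptables, ipvs, or eBPF actually work: `NodeProxy`, `route`, and the random backend choice are all hypothetical.

```python
import random


class NodeProxy:
    """Toy model of a per-node datapath: Service VIP -> backend, with conntrack."""

    def __init__(self, services, rng=None):
        # services: {(vip, port): [(backend_ip, backend_port), ...]}
        self.services = services
        # conntrack: {(src_ip, src_port, dst_ip, dst_port): chosen backend}
        self.conntrack = {}
        self.rng = rng or random.Random()

    def route(self, src_ip, src_port, dst_ip, dst_port):
        """Return the (ip, port) a packet of this flow is delivered to."""
        flow = (src_ip, src_port, dst_ip, dst_port)
        if flow in self.conntrack:
            # Known connection: reuse the earlier decision.
            return self.conntrack[flow]
        backends = self.services.get((dst_ip, dst_port))
        if not backends:
            # Not a Service VIP: pass through unchanged.
            return (dst_ip, dst_port)
        # New connection to a VIP: pick a backend and remember it.
        backend = self.rng.choice(backends)
        self.conntrack[flow] = backend
        return backend
```

The point of the tracking table is that the load-balancing decision is made once per connection, so every later packet of that connection reaches the same backend.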
  12. Traffic management
    Services allow a little bit of this at L4
      • Traffic split by labels & selectors
    Very coarse - barely useful, really
    Ingress is slightly more capable
      • Hostnames and paths
      • Non-portable annotations
      • Only targets Kubernetes Services
      • More extensible for v1 GA
      • Not really what Ingress was designed for
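To see why splitting by labels and selectors is coarse, consider a Service whose selector matches Pods of two versions. Assuming connections are spread evenly across matching endpoints (an assumption of this sketch), the split is fixed by Pod counts. `traffic_share` and the label names are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def traffic_share(pod_labels, selector, split_label):
    """Expected share of connections per value of split_label.

    Assumes every endpoint matching the selector is equally likely to be chosen.
    """
    matching = [
        labels
        for labels in pod_labels
        if all(labels.get(key) == value for key, value in selector.items())
    ]
    counts = Counter(labels.get(split_label) for labels in matching)
    total = sum(counts.values())
    return {value: Fraction(count, total) for value, count in counts.items()}


# Nine v1 Pods and one v2 Pod behind a Service that selects app=web.
pod_labels = [{"app": "web", "version": "v1"}] * 9 + [{"app": "web", "version": "v2"}]
share = traffic_share(pod_labels, {"app": "web"}, "version")
```

In this model the only knob is the replica count: a 1% canary needs 100 Pods behind the Service, which is the coarseness the slide complains about.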
  13. Identity
    ServiceAccounts give identity to Pods
    Central root of trust - cluster CA
    Does not automatically use identity on-the-wire
      • Some implementations offer this
    Does not associate identity with NetworkPolicy
  14. Access control
    NetworkPolicy defines which client Pods can talk to which server Pods
    Describes a DAG of “allow”
    Described in terms of selectors
      • Not deployments
    Does not consider identity by default
      • Some implementations offer this
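The allow-only, selector-based model on this slide can be sketched as a small evaluation function. This is a deliberate simplification, assuming ingress rules only and ignoring namespaces, ports, egress, and IP blocks; the dict layout and the names `matches` and `ingress_allowed` are hypothetical.

```python
def matches(selector, labels):
    """True if labels contain every key/value pair in selector."""
    return all(labels.get(key) == value for key, value in selector.items())


def ingress_allowed(policies, client_labels, server_labels):
    """Decide whether a client Pod may connect to a server Pod.

    A server that no policy selects is not isolated and accepts everything.
    Once selected, it accepts only clients matched by some "allow" entry;
    there is no "deny" rule to express.
    """
    selecting = [p for p in policies if matches(p["pod_selector"], server_labels)]
    if not selecting:
        return True
    return any(
        matches(source, client_labels)
        for policy in selecting
        for source in policy["allow_from"]
    )


# Only Pods labelled app=api may reach Pods labelled app=db.
policies = [{"pod_selector": {"app": "db"}, "allow_from": [{"app": "api"}]}]
```

Both sides of the decision are label sets, so any Pod that carries the `app=api` label is allowed, whatever identity it actually runs as.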
  15. Telemetry
    AKA “observability”
    Kubernetes has very little here by default
    Several 3rd party implementations
    One of the most frequently asked questions, IMO
  16. Encryption & zero-trust
    Kubernetes has very little here by default
    No encryption on-the-wire by default
      • Apps must do their own TLS
      • Can use Kubernetes API server for some, but not automatic
    Several 3rd party implementations
  17. Keeping score
    Property | Kubernetes
    Service virtualization |
    Endpoint management |
    Client-side load-balancing |
    Traffic management |
    Identity |
    Access control |
    Telemetry |
    Encryption & zero-trust |
  18. What next?
    Let’s not be shy about it - we have a cluster mesh!
    This does not invalidate “real” meshes
    If anything it validates them
    It proves that mesh is a good idea
    Lots of R&D on mesh - we should pay attention
  19. How far do we want to go?
    How general-purpose do we want to be?
  20. Tim’s opinions: 1/2
    Moving from Kubernetes to a “real” service mesh should be easy and incremental
    Kubernetes should be general-purpose enough to handle most workloads
    We should start to break down the walls around clusters: there are other things we want to integrate with
  21. Tim’s opinions: 2/2
    We should not build or bundle a full-featured, opinionated, L7 service mesh at this time
    We should embrace our inner mesh and learn from the experiences of others
    We should lean on the extensibility and pluggability that has made Kubernetes successful
  22. Non-Pod endpoints
    Allow “external” workloads in Services and Ingress
      • Details are fuzzy, demand is not
    Services kind of allow this already
      • A bit clunky
    EndpointSlice is better - is that good enough?
    Out-of-core CRDs+controllers ==> EndpointSlice
  23. Replace Ingress
    More capable & expressive L7 processing
      • Better API and data model
    Maybe better L4 at the same time
      • A grand unified model
      • Incremental API
    Focus on the “right” extension points
    Requires plug-in implementations
    KEP coming soon!
  24. Treat kube-proxy as a plugin
    Gets too much first-class treatment today
    People assume it is required
      • It’s not - it’s just one implementation of the Service API
    We should encourage alternate implementations
      • No more new proxy modes
    Looking for creative new options
  25. Use ServiceAccount in NetworkPolicy
    More opaque than Pod labels (good)
      • Especially when crossing Namespaces
    More general purpose
      • Remember “external” workloads?
    Pods can join and leave a Service, but are born into ServiceAccounts
    BUT: Identity is not carried on the wire
      • Well, it is in some implementations!
  26. Multi-cluster
    Find ways to extend beyond the walls
      • There is a world beyond our borders
    Cross-cluster identity
    Cross-cluster Services
      • EndpointSlice enables some of this?
    Cross-cluster NetworkPolicy
      • See discussion of identity
  27. Better health checks
    Kubernetes health checks (readiness probes) are weak
      • Not required
      • Don’t have to be on network (exec)
      • Can be TCP or HTTP
    Hard to build or integrate higher-level mesh control on that
    Logically belongs with Service
      • Hard to retrofit, but probably worthwhile
  28. Kubernetes is a mesh
    But not a very good one -- we can do better