Google Cloud Platform

What is a mesh?
Service virtualization
• VIPs, names
Endpoint management
• Constituency, health-checking
Client-side load-balancing
• Distinct from the app itself
Traffic management
• Routing, splitting, fault injection
Identity
• Who is sending/receiving
Access control
• Who may send/receive
Telemetry
• What happened
Encryption & zero-trust
• Why do I believe it

The mesh market today
Many mesh offerings
Implemented in terms of proxies
• Sidecars, middle-proxies, gateways
Work best at L7
• e.g. HTTP, but not exclusively
Lots of thinking, experimenting, and development happening here!
(Photo: https://www.flickr.com/photos/foxypar4/2124673642)

Service virtualization
Service was one of the very first APIs in Kubernetes
Generally described by VIPs
Service type=LoadBalancer often has an “internal” option
Plugs into DNS automatically
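To make "plugs into DNS automatically" concrete, here is a minimal sketch of how a Service's DNS name is formed and how a Pod's resolver expands a short name. It assumes the default `cluster.local` cluster domain and the default search list a Pod gets with `dnsPolicy: ClusterFirst`; the function names are hypothetical, and real resolver behavior (`ndots`, absolute names with a trailing dot) is simplified away.

```python
def service_fqdn(service: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """Fully qualified DNS name for a Service, e.g. web.shop.svc.cluster.local."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def search_candidates(name: str, client_namespace: str,
                      cluster_domain: str = "cluster.local") -> list:
    """Names a Pod's resolver would try, in order, for a relative `name`.

    Simplifying assumption: the Pod uses the default ClusterFirst search list
    and `name` has fewer dots than the resolver's ndots setting, so every
    search suffix is tried before the name is treated as absolute.
    """
    search_list = [
        f"{client_namespace}.svc.{cluster_domain}",  # Services in the client's namespace
        f"svc.{cluster_domain}",                     # "<service>.<namespace>" form
        cluster_domain,
    ]
    return [f"{name}.{suffix}" for suffix in search_list] + [name]


if __name__ == "__main__":
    print(service_fqdn("web", "shop"))
    # A bare name resolves within the client's own namespace first...
    print(search_candidates("web", "shop")[0])
    # ...while "<service>.<namespace>" reaches another namespace via the second suffix.
    print(search_candidates("web.billing", "shop")[1])
```

Clients only ever see the stable name and the VIP behind it; the Pods backing the Service can come and go without the client noticing, which is the "virtualization" part.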

Endpoint management
Services select Pods (by labels) into dynamic sets of Endpoints
Automatically managed by Kubernetes
Unhealthy Pods are automatically removed from endpoint sets
EndpointSlice is a new, more flexible API
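The selection rule above (match by labels, drop unhealthy Pods) can be modeled in a few lines. This is a simplified sketch, not the endpoints controller: `Pod`, `selector_matches`, and `ready_endpoints` are hypothetical names, only equality-based selectors are handled, and the real EndpointSlice controller keeps not-ready Pods in the slice with `conditions.ready: false` rather than dropping them, which this sketch collapses into a filtered list of IPs.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    ip: str
    labels: dict[str, str] = field(default_factory=dict)
    ready: bool = True  # stands in for the Pod's readiness condition


def selector_matches(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """Equality-based selector: every key/value pair must appear in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def ready_endpoints(selector: dict[str, str], pods: list[Pod]) -> list[str]:
    """IPs of Pods that match the selector and are ready to receive traffic."""
    if not selector:
        # A Service without a selector gets no automatically managed endpoints.
        return []
    return sorted(p.ip for p in pods if p.ready and selector_matches(selector, p.labels))


pods = [
    Pod("web-1", "10.0.0.1", {"app": "web"}),
    Pod("web-2", "10.0.0.2", {"app": "web"}, ready=False),  # failing its readiness probe
    Pod("db-1", "10.0.0.3", {"app": "db"}),
]

if __name__ == "__main__":
    print(ready_endpoints({"app": "web"}, pods))
```

Because membership is recomputed from labels and readiness, the endpoint set changes whenever a Pod is created, relabeled, or changes readiness, with no action from the client.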

Client-side load-balancing
“Kube-proxy isn’t really a proxy” -- me, frequently
Well, ACTUALLY...
The Node’s kernel is the proxy
• The root netns considers every packet
• Policy is applied (iptables, IPVS, eBPF, etc.)
• Routing decisions are made
• Connections are tracked
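As an example of "the kernel is the proxy": kube-proxy's iptables mode spreads new connections with a chain of rules using the `statistic` match in random mode, where rule i (of n) matches with probability 1/(n − i) and the last rule always matches. The sketch below models that chain and checks that it yields an even split; it is an illustration of the idea, not kube-proxy code, and the function names are hypothetical. IPVS and eBPF data planes pick endpoints differently.

```python
import random
from fractions import Fraction


def rule_probabilities(n: int) -> list:
    """Match probability of each rule in a chain of n endpoint rules.

    Rule i (0-based) matches with probability 1/(n - i), so the last rule
    always matches.
    """
    return [Fraction(1, n - i) for i in range(n)]


def effective_shares(n: int) -> list:
    """Overall probability that each endpoint is chosen when rules are tried in order."""
    shares = []
    not_yet_matched = Fraction(1)
    for p in rule_probabilities(n):
        shares.append(not_yet_matched * p)
        not_yet_matched *= 1 - p
    return shares


def pick_endpoint(endpoints: list, rng: random.Random) -> str:
    """Walk the chain once, as happens for the first packet of a new connection."""
    if not endpoints:
        raise ValueError("no endpoints")
    n = len(endpoints)
    for i, endpoint in enumerate(endpoints):
        if rng.random() < 1 / (n - i):
            return endpoint
    return endpoints[-1]  # not reached: the last rule's probability is 1


if __name__ == "__main__":
    print(effective_shares(4))  # every endpoint gets the same share
    rng = random.Random(0)
    counts = {ep: 0 for ep in ["a", "b", "c"]}
    for _ in range(30_000):
        counts[pick_endpoint(["a", "b", "c"], rng)] += 1
    print(counts)
```

The choice is made once per connection; connection tracking then keeps later packets of that connection on the same endpoint, which is why "connections are tracked" matters for load-balancing.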

Traffic management
Services allow a little bit of this at L4
• Traffic split by labels & selectors
Very coarse - barely useful, really
Ingress is slightly more capable
• Hostnames and paths
• Non-portable annotations
• Only targets Kubernetes Services
• More extensible for v1 GA
• Not really what Ingress was designed for
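Why is a label-based split "very coarse"? A common pattern is one Service selecting `app: web` across two Deployments distinguished by another label (the `track` key here is illustrative). Assuming each ready endpoint gets an equal share of connections, the split can only follow replica counts. A minimal sketch with hypothetical helper names:

```python
from collections import Counter
from fractions import Fraction


def split_by_label(pod_labels: list, selector: dict, key: str) -> dict:
    """Traffic share per value of `key` among Pods matched by `selector`.

    Assumes every matched endpoint receives an equal share of connections,
    so the split can only follow replica counts.
    """
    matched = [
        labels for labels in pod_labels
        if all(labels.get(k) == v for k, v in selector.items())
    ]
    counts = Counter(labels.get(key) for labels in matched)
    total = sum(counts.values())
    return {value: Fraction(count, total) for value, count in counts.items()}


def replicas_for_split(canary_percent: int) -> tuple:
    """Smallest (stable, canary) replica counts that give exactly canary_percent%."""
    share = Fraction(canary_percent, 100)
    return share.denominator - share.numerator, share.numerator


if __name__ == "__main__":
    # One Service selecting app=web spans two Deployments, told apart by "track".
    pods = [{"app": "web", "track": "stable"}] * 9 + [{"app": "web", "track": "canary"}]
    print(split_by_label(pods, {"app": "web"}, "track"))
    print(replicas_for_split(10))
    print(replicas_for_split(1))  # a 1% canary needs 100 Pods in total
```

Any finer-grained or request-level split (by header, by weight independent of replicas) needs an L7 layer, which is where Ingress and meshes come in.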

Identity
ServiceAccounts give identity to Pods
Central root of trust - cluster CA
Does not automatically use identity on-the-wire
• Some implementations offer this
Does not associate identity with NetworkPolicy

Access control
NetworkPolicy defines which client Pods can talk to which server Pods
Describes a directed graph of “allow” rules
Described in terms of selectors
• Not deployments
Does not consider identity by default
• Some implementations offer this
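The "allow" semantics can be sketched as a small decision function. This is a simplified model, assuming a single namespace and ingress rules written only as Pod selectors: it ignores `namespaceSelector`, `ipBlock`, ports, egress, and `policyTypes`, and it does not model an ingress rule with no `from` clause (which allows all sources). `IngressPolicy` and `ingress_allowed` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


def matches(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """Equality-based label selector; an empty selector matches every Pod."""
    return all(labels.get(k) == v for k, v in selector.items())


@dataclass
class IngressPolicy:
    pod_selector: dict[str, str]      # server Pods this policy applies to
    allow_from: list[dict[str, str]]  # client Pod selectors; empty list = allow nothing


def ingress_allowed(client: dict[str, str], server: dict[str, str],
                    policies: list[IngressPolicy]) -> bool:
    """Decide whether a client Pod may open a connection to a server Pod."""
    selecting = [p for p in policies if matches(p.pod_selector, server)]
    if not selecting:
        # No policy selects the server, so it is not isolated: everything is allowed.
        return True
    # Policies are additive: any rule in any selecting policy can allow the client.
    return any(matches(src, client) for p in selecting for src in p.allow_from)


if __name__ == "__main__":
    policies = [IngressPolicy(pod_selector={"app": "db"}, allow_from=[{"app": "web"}])]
    print(ingress_allowed({"app": "web"}, {"app": "db"}, policies))     # allowed
    print(ingress_allowed({"app": "batch"}, {"app": "db"}, policies))   # denied
    print(ingress_allowed({"app": "batch"}, {"app": "web"}, policies))  # web is not isolated
```

Note that the inputs are nothing but labels: no ServiceAccount or other identity appears anywhere in the decision, which is the gap the slide points out.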

Telemetry
AKA “observability”
Kubernetes has very little here by default
Several 3rd party implementations
One of the most frequently asked questions, IMO

Encryption & zero-trust
Kubernetes has very little here by default
No encryption on-the-wire by default
• Apps must do their own TLS
• Can use the Kubernetes API server for some of this (e.g. issuing certificates), but it is not automatic
Several 3rd party implementations

What next?
Let’s not be shy about it - we have a cluster mesh!
This does not invalidate “real” meshes
If anything, it validates them
It proves that mesh is a good idea
Lots of R&D on mesh - we should pay attention

Tim’s opinions: 1/2
Moving from Kubernetes to a “real” service mesh should be easy and incremental
Kubernetes should be general-purpose enough to handle most workloads
We should start to break down the walls around clusters: there are other things we want to integrate with

Tim’s opinions: 2/2
We should not build or bundle a full-featured, opinionated, L7 service mesh at this time
We should embrace our inner mesh and learn from the experiences of others
We should lean on the extensibility and pluggability that have made Kubernetes successful

Non-Pod endpoints
Allow “external” workloads in Services and Ingress
• Details are fuzzy, demand is not
Services kind of allow this already
• A bit clunky
EndpointSlice is better - is that good enough?
Out-of-core CRDs + controllers ==> EndpointSlice
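The "out-of-core CRDs + controllers ==> EndpointSlice" path could look roughly like this: a controller outside core Kubernetes turns a list of non-Pod addresses into `discovery.k8s.io/v1` EndpointSlice objects attached to a Service that has no selector. The sketch only builds the manifests as dicts; the function name, the `managed_by` value, the slice naming scheme, and the per-slice limit used for chunking are assumptions, and a real controller would also reconcile, update, and delete slices.

```python
def build_endpoint_slices(service_name: str, addresses: list, port: int,
                          namespace: str = "default", port_name: str = "http",
                          max_per_slice: int = 100,
                          managed_by: str = "external-endpoints.example.com") -> list:
    """Build EndpointSlice manifests (as dicts) for non-Pod IPv4 addresses.

    Hypothetical helper: `managed_by`, the naming scheme, and `max_per_slice`
    are illustrative choices, not values required by Kubernetes.
    """
    slices = []
    for index, start in enumerate(range(0, len(addresses), max_per_slice)):
        chunk = addresses[start:start + max_per_slice]
        slices.append({
            "apiVersion": "discovery.k8s.io/v1",
            "kind": "EndpointSlice",
            "metadata": {
                "name": f"{service_name}-external-{index}",
                "namespace": namespace,  # must match the Service's namespace
                "labels": {
                    # Links the slice to the (selector-less) Service.
                    "kubernetes.io/service-name": service_name,
                    # Identifies which controller owns this slice.
                    "endpointslice.kubernetes.io/managed-by": managed_by,
                },
            },
            "addressType": "IPv4",
            # The port name should match the corresponding port name on the Service.
            "ports": [{"name": port_name, "protocol": "TCP", "port": port}],
            "endpoints": [
                {"addresses": [ip], "conditions": {"ready": True}} for ip in chunk
            ],
        })
    return slices


if __name__ == "__main__":
    vm_ips = [f"192.168.1.{i}" for i in range(1, 151)]
    slices = build_endpoint_slices("legacy-db", vm_ips, 5432, port_name="postgres")
    print(len(slices), [len(s["endpoints"]) for s in slices])
```

Because consumers such as kube-proxy read EndpointSlices rather than Pods, anything that can write these objects can put "external" workloads behind a Service.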

Replace Ingress
More capable & expressive L7 processing
• Better API and data model
Maybe better L4 at the same time
• A grand unified model
• Incremental API
Focus on the “right” extension points
Requires plug-in implementations
KEP coming soon!

Treat kube-proxy as a plugin
Gets too much first-class treatment today
People assume it is required
• It’s not - it’s just one implementation of the Service API
We should encourage alternate implementations
• No more new proxy modes
Looking for creative new options

Use ServiceAccount in NetworkPolicy
More opaque than Pod labels (good)
• Especially when crossing Namespaces
More general purpose
• Remember “external” workloads?
Pods can join and leave a Service, but are born into ServiceAccounts
BUT: Identity is not carried on the wire
• Well, it is in some implementations!
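The difference between "joining a Service" and "being born into a ServiceAccount" can be sketched in a few lines. Pod labels can be edited on a running Pod, while a Pod's ServiceAccount is fixed when the Pod is created, so an identity-based check is harder to satisfy by accident. This is a hypothetical model of the slide's proposal, not an existing NetworkPolicy API: `ServiceAccountRef`, `allowed_by_labels`, and `allowed_by_identity` are illustrative names; only the `system:serviceaccount:<namespace>:<name>` username format is taken from Kubernetes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceAccountRef:
    namespace: str
    name: str

    def username(self) -> str:
        # The username Kubernetes uses when authenticating a ServiceAccount.
        return f"system:serviceaccount:{self.namespace}:{self.name}"


@dataclass
class Pod:
    namespace: str
    labels: dict[str, str]
    service_account: str  # set at creation; labels may be edited later

    @property
    def identity(self) -> ServiceAccountRef:
        return ServiceAccountRef(self.namespace, self.service_account)


def allowed_by_labels(client: Pod, selector: dict[str, str]) -> bool:
    return all(client.labels.get(k) == v for k, v in selector.items())


def allowed_by_identity(client: Pod, allowed: set[ServiceAccountRef]) -> bool:
    # Namespace is part of the identity, so crossing Namespaces is explicit.
    return client.identity in allowed


if __name__ == "__main__":
    frontend = ServiceAccountRef("shop", "frontend")
    batch = Pod("jobs", {"app": "batch"}, service_account="runner")
    batch.labels["app"] = "web"  # relabeling is enough to satisfy a label selector...
    print(allowed_by_labels(batch, {"app": "web"}))  # True
    print(allowed_by_identity(batch, {frontend}))    # False: ...but not an identity check
    print(batch.identity.username())
```

The remaining gap is the one the slide names: unless the data plane carries identity on the wire (for example via mutual TLS), the receiving side cannot verify which ServiceAccount actually sent a packet.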

Multi-cluster
Find ways to extend beyond the walls
• There is a world beyond our borders
Cross-cluster identity
Cross-cluster Services
• EndpointSlice enables some of this?
Cross-cluster NetworkPolicy
• See discussion of identity

Better health checks
Kubernetes health checks (readiness probes) are weak
• Not required
• Don’t have to be on the network (exec)
• Can be TCP or HTTP
Hard to build or integrate higher-level mesh control on that
Logically belongs with Service
• Hard to retrofit, but probably worthwhile
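To see how little a mesh gets from readiness, here is a simplified model of how probe results become the ready/not-ready bit: a run of `failureThreshold` consecutive failures marks a ready container not ready, and a run of `successThreshold` consecutive successes marks it ready again (Kubernetes defaults: 3 and 1). `ReadinessTracker` is a hypothetical name and the model ignores timing (`periodSeconds`, `initialDelaySeconds`, timeouts).

```python
class ReadinessTracker:
    """Turns a stream of probe results into the single ready/not-ready bit.

    Simplified model of failureThreshold/successThreshold (defaults 3 and 1);
    how the probe runs (exec, TCP, HTTP) makes no difference here.
    """

    def __init__(self, failure_threshold: int = 3, success_threshold: int = 1):
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.ready = False       # not ready until the probe has succeeded
        self._run_result = None  # result shared by the current run of identical results
        self._run_length = 0

    def observe(self, success: bool) -> bool:
        """Record one probe result and return the resulting readiness."""
        if success == self._run_result:
            self._run_length += 1
        else:
            self._run_result, self._run_length = success, 1
        threshold = self.success_threshold if success else self.failure_threshold
        if self._run_length >= threshold:
            self.ready = success
        return self.ready


if __name__ == "__main__":
    tracker = ReadinessTracker()
    results = [True, False, False, False, True]
    print([tracker.observe(r) for r in results])
```

Everything a load-balancer learns from this is one bit per Pod, computed by the kubelet on the Pod's own node: nothing about latency, error rates, or what any particular client observed, which is why it is hard to build higher-level mesh control on top of it.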