Slide 1

Slide 1 text

Google Cloud Platform
We’ve Made Quite A Mesh
Rejekts 2019
Nov, 2019
Tim Hockin
@thockin
(c) Google LLC

Slide 2

Slide 2 text

Google Cloud Platform

Slide 3

Slide 3 text

What is a mesh?

Slide 4

Slide 4 text

What is a mesh?
Service virtualization
  • VIPs, names

Slide 5

Slide 5 text

What is a mesh?
Service virtualization
  • VIPs, names
Endpoint management
  • Constituency, health-checking

Slide 6

Slide 6 text

What is a mesh?
Service virtualization
  • VIPs, names
Endpoint management
  • Constituency, health-checking
Client-side load-balancing
  • Distinct from the app itself

Slide 7

Slide 7 text

What is a mesh?
Service virtualization
  • VIPs, names
Endpoint management
  • Constituency, health-checking
Client-side load-balancing
  • Distinct from the app itself
Traffic management
  • Routing, splitting, fault injection

Slide 8

Slide 8 text

What is a mesh?
Service virtualization
  • VIPs, names
Endpoint management
  • Constituency, health-checking
Client-side load-balancing
  • Distinct from the app itself
Traffic management
  • Routing, splitting, fault injection
Identity
  • Who is sending/receiving

Slide 9

Slide 9 text

What is a mesh?
Service virtualization
  • VIPs, names
Endpoint management
  • Constituency, health-checking
Client-side load-balancing
  • Distinct from the app itself
Traffic management
  • Routing, splitting, fault injection
Identity
  • Who is sending/receiving
Access control
  • Who may send/receive

Slide 10

Slide 10 text

What is a mesh?
Service virtualization
  • VIPs, names
Endpoint management
  • Constituency, health-checking
Client-side load-balancing
  • Distinct from the app itself
Traffic management
  • Routing, splitting, fault injection
Identity
  • Who is sending/receiving
Access control
  • Who may send/receive
Telemetry
  • What happened

Slide 11

Slide 11 text

What is a mesh?
Service virtualization
  • VIPs, names
Endpoint management
  • Constituency, health-checking
Client-side load-balancing
  • Distinct from the app itself
Traffic management
  • Routing, splitting, fault injection
Identity
  • Who is sending/receiving
Access control
  • Who may send/receive
Telemetry
  • What happened
Encryption & zero-trust
  • Why do I believe it

Slide 12

Slide 12 text

The mesh market today
Many mesh offerings
Implemented in terms of proxies
  • Sidecars, middle-proxies, gateways
Work best at L7
  • e.g. HTTP, but not exclusively
Lots of thinking, experimenting, and development happening here!
Photo: https://www.flickr.com/photos/foxypar4/2124673642

Slide 13

Slide 13 text

ASSERTION: Kubernetes is already a service mesh

Slide 14

Slide 14 text

ASSERTION: Kubernetes is already a primitive service mesh

Slide 15

Slide 15 text

Service virtualization
Service was one of the very first APIs in Kubernetes
Generally described by VIPs
Service type=LoadBalancer often has an “internal” option
Plugs into DNS automatically

Slide 16

Slide 16 text

Endpoint management
Services select Pods (by labels) into dynamic sets of Endpoints
Automatically managed by Kubernetes
Unhealthy Pods are automatically removed from endpoint sets
EndpointSlice is a new, more flexible API
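Speaker-note sketch: the label-based selection on this slide can be modelled in a few lines. This is only an illustration, not how the Kubernetes endpoints controller is implemented; the `Pod` class and `select_endpoints` function are hypothetical names, and only equality-based selectors are handled.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    # Hypothetical, simplified stand-in for a Kubernetes Pod.
    name: str
    ip: str
    labels: dict
    ready: bool = True


def select_endpoints(selector, pods):
    """Return the IPs of ready Pods whose labels match every selector pair."""
    return [
        pod.ip
        for pod in pods
        if pod.ready
        and all(pod.labels.get(key) == value for key, value in selector.items())
    ]


pods = [
    Pod("web-1", "10.0.0.1", {"app": "web", "tier": "frontend"}),
    Pod("web-2", "10.0.0.2", {"app": "web", "tier": "frontend"}, ready=False),
    Pod("db-1", "10.0.0.3", {"app": "db"}),
]

# web-2 matches the selector but is not ready, so it is left out.
endpoints = select_endpoints({"app": "web"}, pods)
```

Re-running the selection whenever a Pod changes is what makes the endpoint set "dynamic": a Pod that turns unready simply drops out of the next result.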

Slide 17

Slide 17 text

Client-side load-balancing
“Kube-proxy isn’t really a proxy” -- me, frequently
Well, ACTUALLY...
The Node’s kernel is the proxy
  • The root netns considers every packet
  • Policy is applied (iptables, ipvs, eBPF, etc)
  • Routing decisions are made
  • Connections are tracked
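Speaker-note sketch: to make "the kernel is the proxy" concrete, here is a simulation of how a chain of random-match rules can pick a backend uniformly, in the style of kube-proxy's iptables mode (rule i of n matches with probability 1/(n-i), and the last rule always matches). This is an assumption-laden model of one mode only; ipvs and eBPF implementations work differently, and the function names are made up for illustration.

```python
import random
from collections import Counter


def cascade_probabilities(n):
    """Per-rule match probabilities for n backends: 1/n, 1/(n-1), ..., 1."""
    return [1.0 / (n - i) for i in range(n)]


def pick_backend(backends, rng):
    """Walk the rules in order and return the first backend whose rule matches."""
    for backend, probability in zip(backends, cascade_probabilities(len(backends))):
        if rng.random() < probability:
            return backend
    # Not reached in practice: the last probability is 1.0 and random() < 1.0.
    return backends[-1]


rng = random.Random(2019)  # fixed seed so the simulation is repeatable
backends = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
counts = Counter(pick_backend(backends, rng) for _ in range(30000))
```

Although each rule uses a different probability, every backend ends up with roughly one third of the new connections. Connection tracking then pins each flow to the backend chosen for its first packet.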

Slide 18

Slide 18 text

Traffic management
Services allow a little bit of this at L4
  • Traffic split by labels & selectors
Very coarse - barely useful, really
Ingress is slightly more capable
  • Hostnames and paths
  • Non-portable annotations
  • Only targets Kubernetes Services
  • More extensible for v1 GA
  • Not really what Ingress was designed for
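Speaker-note sketch: why splitting by labels and selectors is so coarse. If one Service selects the Pods of two versions and connections are spread evenly across endpoints (an assumption), each version's share of traffic is just its share of the replica count. The function names below are hypothetical.

```python
from fractions import Fraction


def traffic_share(replicas):
    """Map each version to its fraction of endpoints behind one Service."""
    total = sum(replicas.values())
    if total == 0:
        return {}
    return {name: Fraction(count, total) for name, count in replicas.items()}


def achievable_canary_shares(total_pods):
    """Every canary share that is reachable with a fixed total number of Pods."""
    return [Fraction(k, total_pods) for k in range(total_pods + 1)]


# A 10% canary needs ten Pods in total: nine stable and one canary.
share = traffic_share({"stable": 9, "canary": 1})

# With only three Pods the choices are 0, 1/3, 2/3 or all of the traffic.
options = achievable_canary_shares(3)
```

The split can only be changed by scaling Pods up or down, which is why a 1% canary would need on the order of a hundred replicas under this model.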

Slide 19

Slide 19 text

Identity
ServiceAccounts give identity to Pods
Central root of trust - cluster CA
Does not automatically use identity on-the-wire
  • Some implementations offer this
Does not associate identity with NetworkPolicy

Slide 20

Slide 20 text

Access control
NetworkPolicy defines which client Pods can talk to which server Pods
Describes a DAG of “allow”
Described in terms of selectors
  • Not deployments
Does not consider identity by default
  • Some implementations offer this
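Speaker-note sketch: the allow-only, selector-based model on this slide can be illustrated roughly as follows. It covers ingress only and ignores namespaces, ports, IP blocks and egress; `Policy` and `ingress_allowed` are hypothetical names, not the Kubernetes API.

```python
from dataclasses import dataclass


def matches(selector, labels):
    """True when every key/value in the selector is present in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


@dataclass
class Policy:
    # Hypothetical, simplified stand-in for a NetworkPolicy.
    pod_selector: dict  # which server Pods this policy applies to
    allow_from: list    # label selectors for the clients that are allowed


def ingress_allowed(policies, client_labels, server_labels):
    """Decide whether a client Pod may connect to a server Pod."""
    applicable = [p for p in policies if matches(p.pod_selector, server_labels)]
    if not applicable:
        # No policy selects the server, so it is not isolated.
        return True
    # Policies only ever add "allow" rules; any single match is enough.
    return any(
        matches(selector, client_labels)
        for policy in applicable
        for selector in policy.allow_from
    )


policies = [Policy(pod_selector={"app": "db"}, allow_from=[{"app": "api"}])]

api_to_db = ingress_allowed(policies, {"app": "api"}, {"app": "db"})
web_to_db = ingress_allowed(policies, {"app": "web"}, {"app": "db"})
api_to_web = ingress_allowed(policies, {"app": "api"}, {"app": "web"})
```

Note that the decision is made purely from labels. Nothing here checks who the client actually is, which is the identity gap the slide points out.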

Slide 21

Slide 21 text

Telemetry
AKA “observability”
Kubernetes has very little here by default
Several 3rd party implementations
One of the most frequently asked questions, IMO

Slide 22

Slide 22 text

Encryption & zero-trust
Kubernetes has very little here by default
No encryption on-the-wire by default
  • Apps must do their own TLS
  • Can use Kubernetes API server for some, but not automatic
Several 3rd party implementations

Slide 23

Slide 23 text

Keeping score
Property | Kubernetes
Service virtualization |
Endpoint management |
Client-side load-balancing |
Traffic management |
Identity |
Access control |
Telemetry |
Encryption & zero-trust |

Slide 24

Slide 24 text

DON’T PANIC

Slide 25

Slide 25 text

What next?
Let’s not be shy about it - we have a cluster mesh!
This does not invalidate “real” meshes
If anything it validates them
It proves that mesh is a good idea
Lots of R&D on mesh - we should pay attention

Slide 26

Slide 26 text

How far do we want to go?
How general-purpose do we want to be?

Slide 27

Slide 27 text

Tim’s opinions: 1/2
Moving from Kubernetes to a “real” service mesh should be easy and incremental
Kubernetes should be general-purpose enough to handle most workloads
We should start to break down the walls around clusters: there are other things we want to integrate with

Slide 28

Slide 28 text

Tim’s opinions: 2/2
We should not build or bundle a full-featured, opinionated, L7 service mesh at this time
We should embrace our inner mesh and learn from the experiences of others
We should lean on the extensibility and pluggability that has made Kubernetes successful

Slide 29

Slide 29 text

Concretely...

Slide 30

Slide 30 text

Non-Pod endpoints
Allow “external” workloads in Services and Ingress
  • Details are fuzzy, demand is not
Services kind of allow this already
  • A bit clunky
EndpointSlice is better - is that good enough?
Out-of-core CRDs+controllers ==> EndpointSlice
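Speaker-note sketch: what an out-of-core controller might emit for an "external" (non-Pod) workload. The field names follow the `discovery.k8s.io/v1` EndpointSlice schema, which reached GA after this talk, so the API version is an assumption beyond the slides. The helper name, the Service name `legacy-db` and the addresses are all made up for illustration.

```python
import ipaddress
import json


def external_endpoint_slice(service, name, addresses, port):
    """Build an EndpointSlice-shaped dict for addresses that are not Pods."""
    versions = {ipaddress.ip_address(address).version for address in addresses}
    if len(versions) != 1:
        # A slice carries a single address type, so families must not be mixed.
        raise ValueError("need one or more addresses of exactly one IP family")
    return {
        "apiVersion": "discovery.k8s.io/v1",
        "kind": "EndpointSlice",
        "metadata": {
            "name": name,
            # This label ties the slice to the Service it belongs to.
            "labels": {"kubernetes.io/service-name": service},
        },
        "addressType": "IPv4" if versions == {4} else "IPv6",
        "endpoints": [
            {"addresses": [address], "conditions": {"ready": True}}
            for address in addresses
        ],
        "ports": [{"name": "main", "protocol": "TCP", "port": port}],
    }


# Hypothetical VM-hosted database exposed through a Service named "legacy-db".
manifest = external_endpoint_slice(
    "legacy-db", "legacy-db-external", ["192.0.2.10", "192.0.2.11"], 5432
)
rendered = json.dumps(manifest, indent=2)
```

Because nothing in the slice refers to a Pod, the same Service machinery can front VMs or other clusters, provided some controller keeps the addresses and readiness up to date.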

Slide 31

Slide 31 text

Replace Ingress
More capable & expressive L7 processing
  • Better API and data model
Maybe better L4 at the same time
  • A grand unified model
  • Incremental API
Focus on the “right” extension points
Requires plug-in implementations
KEP coming soon!

Slide 32

Slide 32 text

Treat kube-proxy as a plugin
Gets too much first-class treatment today
People assume it is required
  • It’s not - it’s just one implementation of the Service API
We should encourage alternate implementations
  • No more new proxy modes
Looking for creative new options

Slide 33

Slide 33 text

Use ServiceAccount in NetworkPolicy
More opaque than Pod labels (good)
  • Especially when crossing Namespaces
More general purpose
  • Remember “external” workloads?
Pods can join and leave a Service, but are born into ServiceAccounts
BUT: Identity is not carried on the wire
  • Well, it is in some implementations!

Slide 34

Slide 34 text

Multi-cluster
Find ways to extend beyond the walls
  • There is a world beyond our borders
Cross-cluster identity
Cross-cluster Services
  • EndpointSlice enables some of this?
Cross-cluster NetworkPolicy
  • See discussion of identity

Slide 35

Slide 35 text

Better health checks
Kubernetes health checks (readiness probes) are weak
  • Not required
  • Don’t have to be on network (exec)
  • Can be TCP or HTTP
Hard to build or integrate higher-level mesh control on that
Logically belongs with Service
  • Hard to retrofit, but probably worthwhile
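Speaker-note sketch: a TCP-style readiness check boils down to "can I open a connection?", which is part of why the slide calls these probes weak. The snippet below is a stand-alone illustration, not kubelet code; `tcp_probe` is a hypothetical helper and the listener on localhost stands in for a workload.

```python
import socket


def tcp_probe(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Stand-in workload: a listener on an ephemeral local port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]

# The probe passes as soon as the port accepts connections, even though
# this "workload" never reads a request or sends a response.
up = tcp_probe("127.0.0.1", port)

listener.close()
down = tcp_probe("127.0.0.1", port)
```

A passing probe here says nothing about whether the application behind the port can actually serve traffic, and the check is defined on the Pod rather than on the Service that consumers use.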

Slide 36

Slide 36 text

Kubernetes is a mesh
But not a very good one -- we can do better

Slide 37

Slide 37 text

Google Cloud Platform