Slide 1

Slide 1 text

Multi-Cluster Kubernetes and Service Mesh Patterns
Christian Posta
Field CTO – Solo.io

Slide 2

Slide 2 text

2 | Copyright © 2020 CHRISTIAN POSTA Global Field CTO, Solo.io @christianposta christian@solo.io https://blog.christianposta.com https://slideshare.net/ceposta

Slide 3

Slide 3 text

Challenges
• Improve velocity of teams building and delivering code
• Decentralized implementations vs centralized operations
• Connect and include existing systems and investments
• Improve security posture
• Stay within regulations and compliance

Slide 4

Slide 4 text

More, smaller clusters
• High availability
• Compliance
• Isolation / Autonomy
• Scale
• Data locality, cost
• Public/DMZ/Private networks

Slide 5

Slide 5 text

Multiple clusters
• Exact replicas of each other, same fleet?
• Separate, non-uniform deployments?
• Single operational/administrative control
• Segmented by network? Segmented by team?
• Independent administration?

Slide 6

Slide 6 text

Cluster federation
• Autonomous clusters
• Different organizational/network/administrative boundaries
• Share pieces of configuration
• For those shared pieces, treat union as a single unit
• Uses an orchestrator to stitch together policies for federation

Slide 7

Slide 7 text

Example: Kubefed
[Diagram: the Kubefed CP in Cluster 0 watches Federated Resources and federates them to Cluster 1 and Cluster 2]
https://github.com/kubernetes-sigs/kubefed

Slide 8

Slide 8 text

Example: Kubefed
apiVersion: types.kubefed.io/v1beta1
kind: FederatedService
metadata:
  name: echo-server
spec:
  placement:
    clusterSelector:
      matchLabels: {}
  template:
    metadata:
      labels:
        app: echo-server
    spec:
      ports:
      - name: http
        port: 8080
      selector:
        app: echo-server

Slide 9

Slide 9 text

Demo
Simple Kubernetes federation

Slide 10

Slide 10 text

Services need to communicate with each other

Slide 11

Slide 11 text

Pattern: flat network across pods
[Diagram: Account, User, Products, and History services across Cluster 1 and Cluster 2]

Slide 12

Slide 12 text

Pattern: Different network, expose all services
[Diagram: Account, User, Products, and History services across Cluster 1 and Cluster 2]

Slide 13

Slide 13 text

Pattern: Different network, controlled gateway
[Diagram: Account, User, Products, and History services across Cluster 1 and Cluster 2]

Slide 14

Slide 14 text

Forces to balance
• Security (authz/authn/encryption/identity)
• Service discovery
• Failover / traffic shifting / transparent routing
• Observability
• Separate networks
• Well-defined fault domains
• Building for scale

Slide 15

Slide 15 text

Could you build these patterns just using Kubernetes?

Slide 16

Slide 16 text

Service Mesh can help

Slide 17

Slide 17 text

Envoy is the magic behind service mesh
http://envoyproxy.io

Slide 18

Slide 18 text

Envoy implements:
• zone aware, priority/locality load balancing
• circuit breaking, outlier detection
• timeouts, retries, retry budgets
• traffic shadowing
• request racing
• rate limiting
• RBAC, TLS origination/termination
• access logging, statistics collection

Slide 19

Slide 19 text

Envoy to do application networking heavy lifting
[Diagram: Account and workloads communicating over mTLS]
• Transparent client-side routing decisions
• TLS origination/termination
• Circuit breaking
• Stats collection

Slide 20

Slide 20 text

Envoy as backbone for multi-cluster communication federation
[Diagram: Account, User, Products, and History services across Cluster 1 and Cluster 2, with User present in both clusters]

Slide 21

Slide 21 text

Other key Envoy proxying features
• Request hedging
• Retry budgets
• Load balancing priorities
• Locality weighted load balancing
• Zone aware routing
• Degraded endpoints (fallback)
• Aggregated clusters

Slide 22

Slide 22 text

Exploring Envoy failover routing capabilities: Request racing
[Diagram: Account calls http://products.service/, with workloads in us-west-1 and us-west-2. Labels: Timeout, Race request]
First to return is the response to the caller
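The racing behavior on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not Envoy's implementation: `hedged_call`, the two fake upstreams, and all timings are hypothetical names and values chosen for the example.

```python
import asyncio


async def hedged_call(primary, fallback, per_try_timeout):
    """Start `primary`; if it has not answered within `per_try_timeout`
    seconds, start `fallback` too and return whichever finishes first."""
    first = asyncio.create_task(primary())
    done, _ = await asyncio.wait({first}, timeout=per_try_timeout)
    if done:
        return first.result()

    # The first attempt is slow: keep it running and race a second request.
    second = asyncio.create_task(fallback())
    done, pending = await asyncio.wait(
        {first, second}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # the loser's response is no longer needed
    return done.pop().result()


# Hypothetical upstreams standing in for the two zones on the slide.
async def slow_west_1():
    await asyncio.sleep(0.5)
    return "us-west-1"


async def fast_west_2():
    await asyncio.sleep(0.01)
    return "us-west-2"


def demo():
    return asyncio.run(hedged_call(slow_west_1, fast_west_2, per_try_timeout=0.05))
```

Calling `demo()` returns the response from the faster upstream, because the first attempt is still in flight when the timeout fires and the raced request completes before it.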

Slide 23

Slide 23 text

Exploring Envoy failover routing capabilities: Zone aware routing (Envoy decides)
[Diagram: Account calls http://products.service/, with workloads in us-west-1 and us-west-2]
Not enough healthy hosts in same zone
Spill over to another zone

Slide 24

Slide 24 text

Exploring Envoy failover routing capabilities: Locality aware (Control plane decides)
[Diagram: Account calls http://products.service/, with workloads in us-west-1 and us-west-2. Endpoint weights shown: W=1, W=1, W=1, W=5, W=5]
Not enough healthy hosts in same zone
Spill over to another zone
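A rough model of how control-plane-assigned locality weights and endpoint health might combine into a traffic split is sketched below. It assumes the formula described in Envoy's locality-weighted load balancing documentation (effective weight = configured weight × min(100, 1.4 × healthy%)), with 1.4 being Envoy's default overprovisioning factor. The function name `locality_load` and the assignment of W=5 and W=1 to specific zones are assumptions for illustration; the slide text does not preserve which zone carries which weight.

```python
def locality_load(localities, overprovisioning_pct=140):
    """Estimate the share of traffic each locality receives.

    `localities` maps a locality name to (weight, healthy_hosts, total_hosts).
    Returns a dict of locality name -> fraction of traffic (0.0 to 1.0).
    """
    effective = {}
    for name, (weight, healthy, total) in localities.items():
        # Availability is capped at 100, so a locality can lose some hosts
        # before it starts shedding traffic.
        availability = (
            min(100.0, overprovisioning_pct * healthy / total) if total else 0.0
        )
        effective[name] = weight * availability

    total_weight = sum(effective.values())
    if total_weight == 0:
        # Nothing healthy anywhere; Envoy's panic handling is out of scope here.
        return {name: 0.0 for name in localities}
    return {name: w / total_weight for name, w in effective.items()}


# Hypothetical setup: the local zone is preferred 5:1 over the remote zone.
all_healthy = locality_load({"us-west-1": (5, 3, 3), "us-west-2": (1, 2, 2)})
degraded = locality_load({"us-west-1": (5, 1, 3), "us-west-2": (1, 2, 2)})
```

In this model the preferred zone keeps its full share until it loses enough hosts that 1.4 × healthy% drops below 100, after which traffic gradually spills over to the other zone.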

Slide 25

Slide 25 text

Exploring Envoy failover routing capabilities: Aggregate Cluster (for routing to gateways)
[Diagram: Account calls http://products.service/, with an Edge gw between us-west-1 and us-west-2. Cluster types shown: EDS, Strict DNS]
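An aggregate cluster lets Envoy treat several clusters as an ordered failover chain, for example an EDS cluster of local endpoints followed by a Strict DNS cluster pointing at a remote edge gateway. The sketch below is a simplified model of the priority spillover described in Envoy's load balancing documentation, assuming one priority level per member cluster and the default 1.4 overprovisioning factor. The function name `priority_load` and the health figures are hypothetical.

```python
def priority_load(health_pcts, overprovisioning_pct=140):
    """Split traffic across an ordered list of priority levels.

    `health_pcts` lists the healthy-host percentage (0 to 100) of each level,
    highest priority first. Returns the percentage of traffic per level.
    """
    availability = [min(100.0, h * overprovisioning_pct / 100) for h in health_pcts]
    normalized_total = min(100.0, sum(availability))
    if normalized_total == 0:
        # No healthy hosts at any level; panic handling is not modeled.
        return [0.0] * len(health_pcts)

    loads = []
    remaining = 100.0
    for level_availability in availability:
        # Each level takes what its health allows; the rest spills downward.
        share = min(remaining, level_availability * 100.0 / normalized_total)
        loads.append(share)
        remaining -= share
    return loads


# Index 0: local endpoints (EDS); index 1: remote edge gateway (Strict DNS).
healthy_local = priority_load([100, 100])
half_local = priority_load([50, 100])
no_local = priority_load([0, 100])
```

With fully healthy local endpoints nothing reaches the gateway; as local health drops, a growing share of requests is sent through the gateway to the other cluster.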

Slide 26

Slide 26 text

Multi-cluster examples
Service mesh examples using Envoy Proxy

Slide 27

Slide 27 text

Istio shared control plane, flat network
[Diagram: a single Istiod serving Account, User, Products, and History services across Cluster 1 and Cluster 2, with User present in both clusters]

Slide 28

Slide 28 text

Thoughts about shared control plane/flat network
• Simplest setup for Istio multi-cluster
• No special Envoy routing (though may use zone-aware)
• Shared control plane increases the failure domain to multiple clusters
• Use flat networking if possible (simpler) but may not have/want that option
• No special considerations for identity (identity domain is shared)
• Still need to federate telemetry collection

Slide 29

Slide 29 text

Istio shared control plane, separate networks
[Diagram: a single Istiod serving Account, User, Products, and History services across Cluster 1 and Cluster 2, with User present in both clusters]

Slide 30

Slide 30 text

Thoughts about shared control plane/separate network
• Uses a gateway to allow communication between networks
• Uses Envoy Locality Weighted LB (for the gateway endpoints). Istio calls this “split horizon EDS”.
• Shares same failure domain across all clusters
• Use the gateways to facilitate communication AND control plane
• Slight increase in burden on operator to label networks and gateway endpoints correctly so Istio has that information

Slide 31

Slide 31 text

Istio separate control planes, separate networks
[Diagram: Account, User, Products, and History services across Cluster 1 and Cluster 2, with User present in both clusters and two Istiod instances]

Slide 32

Slide 32 text

Thoughts about separate control plane/separate network
• Uses a gateway to allow communication between networks
• Uses Istio’s ServiceEntry mechanism to enable cross-network discovery
• Independent control planes
• Separate, independent failure domains
• Doesn’t solve where trust domains MUST be separate (with federation at the boundaries)
• Increase burden on operator to maintain service discovery, identity federation, and multi-cluster configuration across meshes

Slide 33

Slide 33 text

Example multi-cluster routing with ServiceEntry
[Diagram: Cluster 1 (Account, User, Istiod) and Cluster 2 (User, Istiod). Addresses shown: http://users.default.svc.cluster.local and http://users.default.cluster-2. ServiceEntry: users.default.cluster-2]

Slide 34

Slide 34 text

ServiceEntry for service discovery
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: users-cluster2
spec:
  hosts:
  - users.default.cluster2
  location: MESH_INTERNAL
  ports:
  - name: http1
    number: 8000
    protocol: http
  resolution: DNS
  addresses:
  - 240.0.0.2
  endpoints:
  - address: 10.0.2.5
    ports:
      http1: 15443

Slide 35

Slide 35 text

Forces to balance
• Security (authz/authn/encryption/identity)
• Service discovery
• Failover / traffic shifting / transparent routing
• Observability
• Separate networks
• Well-defined fault domains
• Building for scale

Slide 36

Slide 36 text

What to do about the added burden for the operator?

Slide 37

Slide 37 text

[Diagram: Service Mesh Hub as the Management Plane over Cluster 1 and Cluster 2, each with Istiod, an Ingress Gateway, and workloads]

Slide 38

Slide 38 text

Demo
Service Mesh Hub

Slide 39

Slide 39 text

[Diagram: Service Mesh Hub in the Management Plane and a Remote Cluster, each side with Istiod, an Ingress Gateway, and workloads]

Slide 40

Slide 40 text

[Diagram: Service Mesh Hub in the Management Plane and a Remote Cluster, each side with Istiod, an Ingress Gateway, workloads, and a CSR agent. Labels: Create cert/key and CSR, Sign cert w/ shared root, Shared root]

Slide 41

Slide 41 text

[Diagram: Service Mesh Hub in the Management Plane and a Remote Cluster, each side with Istiod, an Ingress Gateway, workloads, and a CSR agent. Labels: Shared root, Chain with same root]

Slide 42

Slide 42 text

About Us: solo.io
WebAssembly Hub: webassemblyhub.io
Service Mesh Hub: servicemeshhub.io
Questions? Join the Community: slack.solo.io