Slide 1

Slide 1 text

Enabling canary deployments in k8s

Slide 2

Slide 2 text

Canary? Shaping the traffic so that a percentage of it is directed to the new pods, then promoting that same deployment to a full scale-out while gradually phasing out the older release.

Slide 3

Slide 3 text

Why Canary?
● Testing on staging doesn’t weed out every possible cause of failure; doing the final testing of a feature on some part of the traffic is not unheard of.
● Canary is a precursor to enabling full blue-green deployments.
  ○ Why?
    ■ There are no feature flags for the respective microservices.
    ■ Hence canary testing becomes paramount for testing out features.

Slide 4

Slide 4 text

Approaches to enable Canary on k8s?

Slide 5

Slide 5 text

Using bare bones k8s deployments?
How? Create two deployments, v1 and v2 (see figure 1). Each is a separate deployment object with its own label selector. Both deployments are exposed via the same svc object, whose selector matches the pods of both.

Slide 6

Slide 6 text

Using bare bones k8s deployments?
Advantages
● Plain and simple.
● Can be done without any plugins or extras in the vanilla k8s we get in GKE.
Disadvantages
● The traffic split is a function of the replica counts and cannot be customized independently. For example, if traffic is split between two deployments with v1 having 3 replicas and v2 having 1 replica, the canary gets 1 pod out of 4, i.e. 25% of the traffic.
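The replica arithmetic behind this disadvantage can be sketched in a few lines of Python. This is a minimal sketch, assuming the svc spreads requests evenly across all ready pods; the function names are hypothetical and not part of any k8s tooling.

```python
def canary_share(stable_replicas: int, canary_replicas: int) -> float:
    """Fraction of requests expected to reach the canary pods.

    Assumption: the Service balances evenly across every ready pod.
    """
    total = stable_replicas + canary_replicas
    if stable_replicas < 0 or canary_replicas < 0 or total == 0:
        raise ValueError("replica counts must be non-negative and not both zero")
    return canary_replicas / total


def stable_replicas_needed(canary_replicas: int, canary_percent: int) -> int:
    """Stable replicas required so the canary gets at most canary_percent of traffic."""
    if not 0 < canary_percent <= 100:
        raise ValueError("canary_percent must be in (0, 100]")
    numerator = canary_replicas * (100 - canary_percent)
    # Integer ceiling division, so the canary share never exceeds the target.
    return -(-numerator // canary_percent)


if __name__ == "__main__":
    print(canary_share(3, 1))            # the 3 + 1 replica example from the slide
    print(stable_replicas_needed(1, 1))  # stable pods needed for a 1% canary
```

The second function shows why this approach gets expensive: a small canary percentage forces a large number of stable replicas, regardless of actual load.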

Slide 7

Slide 7 text

Using Istio?
● In an Istio-enabled cluster, we need to set routing rules to configure the traffic distribution.
● Similar to the approach above, we have two deployments and svc objects for the same service, called v1 and v2.
● The rule will look something like the example in the Istio reference on the next slide.

Slide 8

Slide 8 text

Using Istio
Ref: https://istio.io/blog/2017/0.1-canary/
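As a rough illustration of a weighted routing rule, the Python sketch below builds a manifest shaped like an Istio `VirtualService` and prints it as JSON. It assumes the `networking.istio.io/v1alpha3` API, which is newer than the rule format used in the 2017 blog post referenced here; the service names `helloworld`, `helloworld-v1` and `helloworld-v2` are hypothetical placeholders, and the function is illustrative rather than part of Istio.

```python
import json


def weighted_virtual_service(host: str, stable_host: str, canary_host: str,
                             canary_percent: int) -> dict:
    """Build a VirtualService-shaped dict that splits traffic by weight.

    Assumptions: networking.istio.io/v1alpha3, and one svc object per
    version (stable_host for v1, canary_host for v2).
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return {
        "apiVersion": "networking.istio.io/v1alpha3",
        "kind": "VirtualService",
        "metadata": {"name": host},
        "spec": {
            "hosts": [host],
            "http": [{
                "route": [
                    {"destination": {"host": stable_host},
                     "weight": 100 - canary_percent},
                    {"destination": {"host": canary_host},
                     "weight": canary_percent},
                ],
            }],
        },
    }


if __name__ == "__main__":
    manifest = weighted_virtual_service(
        "helloworld", "helloworld-v1", "helloworld-v2", canary_percent=10)
    print(json.dumps(manifest, indent=2))
```

Because the weights live in the routing rule, changing the split means editing one number rather than scaling pods.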

Slide 9

Slide 9 text

Advantages
● Has been out in the wild for quite some time, i.e. battle tested.
● Flagger can be added for automated canary promotion.
● GKE has an add-on that can be used to install Istio in our clusters.
● Traffic routing and replica deployment are two completely independent (orthogonal) functions.
● Focused canary testing: instead of exposing the canary to an arbitrary set of users, you can send only the users from some-company-name.com to the canary version and leave the other users unaffected, using a rule that matches on request headers (here, the cookie).
Disadvantages
● Another add-on to manage inside the cluster if we go the route of installing Istio.
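The focused canary case can be sketched as a header match that sits in front of the default route. The Python below is a sketch under stated assumptions: the cookie is assumed to carry an `email=` field (hypothetical), the regex is written for illustration, and `pick_destination` only approximates locally what the mesh would decide. The dict layout follows the `match` / `headers` / `regex` fields of an Istio `VirtualService` HTTP route.

```python
import re

# Hypothetical cookie layout: "...; email=user@some-company-name.com; ..."
COOKIE_PATTERN = r"^(.*;\s*)?email=[^;]*@some-company-name\.com(;.*)?$"


def focused_canary_routes(stable_host: str, canary_host: str) -> list:
    """HTTP routes: matching cookies go to the canary, everything else to stable."""
    return [
        {
            "match": [{"headers": {"cookie": {"regex": COOKIE_PATTERN}}}],
            "route": [{"destination": {"host": canary_host}}],
        },
        # Default route, evaluated when the match above does not apply.
        {"route": [{"destination": {"host": stable_host}}]},
    ]


def pick_destination(cookie_header: str, stable_host: str, canary_host: str) -> str:
    """Local approximation of the routing decision, handy for testing the regex."""
    if re.match(COOKIE_PATTERN, cookie_header):
        return canary_host
    return stable_host


if __name__ == "__main__":
    print(pick_destination("session=abc; email=jane@some-company-name.com",
                           "helloworld-v1", "helloworld-v2"))
    print(pick_destination("email=bob@other.com",
                           "helloworld-v1", "helloworld-v2"))
```

Checking the pattern locally first is cheap insurance, since a regex that never matches silently sends everyone to the stable version.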

Slide 10

Slide 10 text

Using Linkerd
● Linkerd, via Flagger, has a canary CRD that defines how a rollout should occur.
● It automatically creates two sets of deployments for a deployment named `podinfo`.

Slide 11

Slide 11 text

Using Linkerd
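To make the idea of an automated rollout concrete, here is a small Python simulation of stepwise promotion and rollback. It is a sketch of the general technique, not Flagger's or Linkerd's actual implementation; the parameter names `step_weight`, `max_weight` and `failure_threshold` and the `check_healthy` callback are assumptions chosen for illustration.

```python
from typing import Callable, List


def run_canary(check_healthy: Callable[[int], bool],
               step_weight: int = 10,
               max_weight: int = 50,
               failure_threshold: int = 3) -> List[int]:
    """Simulate a rollout and return every canary weight that was applied.

    The list ends in 100 when the canary is promoted and in 0 when it is
    rolled back after failure_threshold failed checks.
    """
    if step_weight <= 0 or failure_threshold <= 0:
        raise ValueError("step_weight and failure_threshold must be positive")
    weight = 0       # last weight that passed its health check
    failures = 0     # failed checks accumulated over the whole rollout
    history: List[int] = []
    while True:
        if failures >= failure_threshold:
            history.append(0)      # roll back: all traffic to the old release
            return history
        if weight >= max_weight:
            history.append(100)    # promote: all traffic to the new release
            return history
        candidate = min(weight + step_weight, max_weight)
        history.append(candidate)
        if check_healthy(candidate):
            weight = candidate     # advance only after a passing check
        else:
            failures += 1          # hold at the same weight and retry


if __name__ == "__main__":
    print(run_canary(lambda w: True))     # healthy all the way
    print(run_canary(lambda w: w < 30))   # starts failing at 30% traffic
```

In a real setup the health check would come from metrics such as success rate or latency rather than a lambda.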

Slide 12

Slide 12 text

Using Linkerd
Advantages
● Flagger can be integrated for automated canary promotion/demotion.
● Battle tested, having been in use for a long time by virtue of being the first service mesh.
Disadvantages
● Another component inside the k8s cluster to be maintained.

Slide 13

Slide 13 text

Using Traefik
● It requires the same setup of two deployments and svc objects for the service that needs canary enabled.
● It makes use of the ingress object in k8s to define the traffic split between the services.

Slide 14

Slide 14 text

Using Traefik
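A traffic split defined on the ingress object could look roughly like the manifest built by the Python sketch below. It assumes the Traefik 1.x `traefik.ingress.kubernetes.io/service-weights` annotation described in the traffic-splitting user guide listed in the references, plus the older `extensions/v1beta1` Ingress API from that era; the names `my-app`, `my-app-canary` and `example.com` are hypothetical.

```python
import json


def traefik_weighted_ingress(name: str, host: str, stable_service: str,
                             canary_service: str, canary_percent: int,
                             port: int = 80) -> dict:
    """Build an Ingress-shaped dict whose annotation carries the traffic split.

    Assumptions: Traefik 1.x service-weights annotation and the
    extensions/v1beta1 Ingress API.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    weights = (f"{stable_service}: {100 - canary_percent}%\n"
               f"{canary_service}: {canary_percent}%\n")
    # Both services are listed as backends for the same host and path.
    paths = [
        {"path": "/", "backend": {"serviceName": svc, "servicePort": port}}
        for svc in (stable_service, canary_service)
    ]
    return {
        "apiVersion": "extensions/v1beta1",
        "kind": "Ingress",
        "metadata": {
            "name": name,
            "annotations": {
                "kubernetes.io/ingress.class": "traefik",
                "traefik.ingress.kubernetes.io/service-weights": weights,
            },
        },
        "spec": {"rules": [{"host": host, "http": {"paths": paths}}]},
    }


if __name__ == "__main__":
    ingress = traefik_weighted_ingress(
        "my-app", "example.com", "my-app", "my-app-canary", canary_percent=1)
    print(json.dumps(ingress, indent=2))
```

Shifting traffic then means re-applying the ingress with a different `canary_percent`, which is a manual step in this setup.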

Slide 15

Slide 15 text

Using Traefik
Advantages
● Easy to set up with no frills, as it’s just an ingress controller in the k8s cluster (like contour/nginx-ingress-controller).
● Doesn’t need the pods to be scaled to change the traffic split.
● Support for tracing included.
Disadvantages
● No inbuilt process to shift weights from v1 to v2 or to revert traffic in case of increased error rates, i.e. no clear-cut way to integrate with Flagger for automated canary promotion/demotion.

Slide 16

Slide 16 text

References
● https://martinfowler.com/bliki/CanaryRelease.html
● https://martinfowler.com/bliki/FeatureToggle.html
● Canary using istio
  ○ https://istio.io/blog/2017/0.1-canary/
● Bare bones canary on k8s
  ○ https://kubernetes.io/docs/concepts/cluster-administration/manage-deployment/#canary-deployments
● Canary using traefik
  ○ https://blog.containo.us/canary-releases-with-traefik-on-gke-at-holidaycheck-d3c0928f1e02?gi=c58435e35526
  ○ https://tasdikrahman.me/2018/10/25/canary-deployments-on-AWS-and-kubernetes-using-traefik/
  ○ https://docs.traefik.io/user-guide/kubernetes/#traffic-splitting
● Canary using linkerd
  ○ https://linkerd.io/2/tasks/canary-release/
  ○ https://linkerd.io/2/features/traffic-split/
    ■ Done using https://flagger.app/
  ○ https://www.tarunpothulapati.com/posts/traffic-splitting-linkerd/

Slide 17

Slide 17 text

@tasdikrahman tasdikrahman.me