Slide 1

Slide 1 text

Introduction to Flagger: A Progressive Delivery Operator for Kubernetes
@mathetake, Kubernetes Meetup Tokyo #32

Slide 2

Slide 2 text

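Today's talk
An introduction to get you ready to start evaluating Flagger tomorrow. I will cover the following:
• Background: what Progressive Delivery is
• Introduction to Flagger
• Flagger's internal behavior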

Slide 3

Slide 3 text

1. Background
Note: this talk assumes GitOps, but the same ideas apply to other deployment camps as well.

Slide 4

Slide 4 text

Continuous X
• Everyone loves Continuous X
• The standard in the Cloud Native community
• You cannot survive without it
• A rich set of software
• Spinnaker, Flux, ArgoCD, …
• Goal: Agility & Reliability
• Automated testing, deployment
https://www.weave.works/blog/automate-kubernetes-with-gitops

Slide 5

Slide 5 text

Continuous X
• Continuous X allows us to test/deploy tons of applications in a day
• Merging a PR completes the deployment, with no manual operations
• Rollback = reverting the PR
• Easy!
• Hooray for the reconciliation loop
• Nothing to be afraid of

Slide 6

Slide 6 text

Continuous X
• Continuous X allows us to test/deploy tons of applications in a day
• Merging a PR completes the deployment, with no manual operations
• Rollback = reverting the PR
• Easy!
• Hooray for the reconciliation loop
• Nothing to be afraid of
No Free Lunch

Slide 7

Slide 7 text

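Continuous Delivery is hard
• What about reliability after a CD-driven deploy?
• Right after a deploy, you have no choice but to watch the logs and metrics
• If something looks bad, revert!
• In the end, there is still a human in the loop
• The rollback operation itself can cause an incident
• We humans are all fallible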

Slide 8

Slide 8 text

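Continuous Delivery is hard
• What about reliability after a CD-driven deploy?
• Right after a deploy, you have no choice but to watch the logs and metrics
• If something looks bad, revert!
• In the end, there is still a human in the loop
• The rollback operation itself can cause an incident
• We humans are all fallible
We want to do this in a nicer, slicker way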

Slide 9

Slide 9 text

What is Progressive Delivery https://carlossg.github.io/presentations/2019-06_cdsummit/#/2/1

Slide 10

Slide 10 text

What is Progressive Delivery https://carlossg.github.io/presentations/2019-06_cdsummit/#/2/1

Slide 11

Slide 11 text

What is Progressive Delivery
• Naive CD means all-or-nothing deployment
• The new version replaces all of the old one
• Rollback is the same thing in reverse
• *Progressive Delivery = CD++
• It makes ordinary CD slicker
• It deploys gradually instead of all-or-nothing
• It automates the gradual deployment
• It reduces the risk of production deployments
* https://qiita.com/mumoshu/items/63b29bca6a052d8c7087

Slide 12

Slide 12 text

Test in production meets P.D.
• Progressive Delivery = deploying to production gradually ≒ trying out (testing) the new version in the production environment
• Close to the 'Test in production' way of thinking
• Staging cannot be production
• Faster iteration, improved agility
All-or-nothing deployment + Test in production = high risk
Progressive Delivery + Test in production = low risk
https://www.getambassador.io/docs/latest/topics/concepts/progressive-delivery/

Slide 13

Slide 13 text

Implementation
• Spinnaker
• K8s native
  • Argo-rollouts
    • Based on native Services*
  • weaveworks/flagger
    • Based on a service mesh or ingress controller
• Basic operation: roll out a little ⇔ check the metrics
* Older versions could not do traffic shifting in combination with a service mesh

Slide 14

Slide 14 text

2. Introduction to Flagger

Slide 15

Slide 15 text

What is Flagger
• OSS for Progressive Delivery
• github.com/weaveworks/flagger
• Developed by Stefan of Weaveworks
• Reached GA on June 17, 2020
• Originally developed with Istio as a prerequisite → later added support for multiple platforms
Give developers confidence in automating the production releases

Slide 16

Slide 16 text

Flagger's Features
• Service mesh native
• Istio, SMI (linkerd, crossover), Appmesh + ingress controllers
• Fine tuned traffic shifting
• Gitops native
• multiple deployment strategies
• Custom metrics
• Manual gating (approve, pause, resume), Webhooks
• Alerting

Slide 17

Slide 17 text

Deployment Strategy
1. Canary Release (progressive traffic shifting)
  • For HTTP and gRPC applications
2. A/B Testing (HTTP headers and cookies traffic routing)
  • For applications that need session affinity
3. Blue/Green (traffic switching)
  • Any workload
4. Blue/Green Mirroring (traffic shadowing)
  • For idempotent applications (the safest option)
  • Mirroring can also be used as a stage before a Canary Release

Slide 18

Slide 18 text

Deployment Strategy - Canary Release

Slide 19

Slide 19 text

Deployment Strategy - A/B testing

Slide 20

Slide 20 text

Deployment Strategy - Blue Green

Slide 21

Slide 21 text

Control loop / phase Analysis Promotion • Deployment strategy • Traffic Shifting • webhook • metrics check • Manual gating Rollback • Alert provider Stable Initialize

Slide 22

Slide 22 text

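Flagger CRD
1. Canary
  • The main resource
  • Enough on its own if you only need the minimum functionality
2. MetricTemplate
  • The query for the metrics to analyze
  • Specifies the metric provider
3. AlertProvider
  • Specifies where delivery notifications are sent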

Slide 23

Slide 23 text

Flagger CRD: Canary

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  provider: istio
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  progressDeadlineSeconds: 60
  autoscalerRef:
    apiVersion: autoscaling/v2beta1
    kind: HorizontalPodAutoscaler
    name: podinfo
  service:
    name: podinfo
    port: 9898
    targetPort: 9898
    portName: http
    portDiscovery: true
    match:
      - uri:
          prefix: /
    timeout: 5s
  analysis:
    interval: 1m
    threshold: 10
    maxWeight: 50
    stepWeight: 5
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99
        interval: 1m
      - name: "database connections"
        templateRef:
          name: db-connections
        thresholdRange:
          min: 2
          max: 100
        interval: 1m
    webhooks:
      - name: "load test"
        type: rollout
        url: http://flagger-loadtester.test/
        metadata:
          cmd: "…"

Slide 24

Slide 24 text

Flagger CRD: Canary (manifest identical to Slide 23)
Specifying the application (targetRef)
• Deployment
• DaemonSet

Slide 25

Slide 25 text

Flagger CRD: Canary (manifest identical to Slide 23)
Specifying the HPA (autoscalerRef, optional)

Slide 26

Slide 26 text

Flagger CRD: Canary (manifest identical to Slide 23)
Defining the service (service)

Slide 27

Slide 27 text

Flagger CRD: Canary (manifest identical to Slide 23)
Defining the canary analysis (analysis)
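To make the `analysis` numbers in the Canary manifest concrete, here is a minimal Python sketch that lists the traffic weights the canary would pass through and a lower bound on the analysis time. It assumes that the weight grows by `stepWeight` once per `interval` until it reaches `maxWeight`, and that every check passes; the helper names are hypothetical and this is not Flagger's own code.

```python
def parse_minutes(interval: str) -> int:
    """Parse a duration such as '1m' into whole minutes (only the 'm' suffix is handled)."""
    if not interval.endswith("m"):
        raise ValueError(f"unsupported interval: {interval!r}")
    return int(interval[:-1])


def weight_schedule(step_weight: int, max_weight: int) -> list:
    """Canary traffic percentages visited while stepping up to max_weight."""
    return list(range(step_weight, max_weight + 1, step_weight))


# Values copied from the `analysis` block of the Canary manifest.
analysis = {"interval": "1m", "maxWeight": 50, "stepWeight": 5}

schedule = weight_schedule(analysis["stepWeight"], analysis["maxWeight"])
# Assumption: one step per interval, so this is a lower bound in minutes.
min_minutes = parse_minutes(analysis["interval"]) * len(schedule)

print(schedule)
print(min_minutes)
```

Under these assumptions, a smaller `stepWeight` gives a finer-grained rollout at the cost of a longer analysis.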

Slide 28

Slide 28 text

Flagger CRD: MetricTemplate

apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: not-found-percentage
  namespace: istio-system
spec:
  provider:
    type: prometheus
    address: http://prometheus.istio-system:9090
  query: |
    100 - sum(
        rate(
            istio_requests_total{
              reporter="destination",
              destination_workload_namespace="{{ namespace }}",
              destination_workload="{{ target }}",
              response_code!="404"
            }[{{ interval }}]
        )
    )
    /
    sum(
        rate(
            istio_requests_total{
              reporter="destination",
              destination_workload_namespace="{{ namespace }}",
              destination_workload="{{ target }}"
            }[{{ interval }}]
        )
    ) * 100

Slide 29

Slide 29 text

Flagger CRD: MetricTemplate (manifest identical to Slide 28)
Specifying the metric provider (provider)
• Prometheus
• Datadog
• CloudWatch

Slide 30

Slide 30 text

Flagger CRD: MetricTemplate (manifest identical to Slide 28)
The query template (query)
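The `{{ namespace }}`, `{{ target }}`, and `{{ interval }}` placeholders in the query are filled in before the query is sent to the metric provider. The Python sketch below illustrates that substitution on a shortened version of the query, together with the arithmetic the full query performs. The regex-based `render` function is a stand-in written for this example, not Flagger's templating code, and the values `test`, `podinfo`, and `1m` are taken from the Canary manifest for illustration.

```python
import re

# Shortened selector from the slide's query, kept on one line for readability.
QUERY_TEMPLATE = (
    'istio_requests_total{destination_workload_namespace="{{ namespace }}", '
    'destination_workload="{{ target }}"}[{{ interval }}]'
)


def render(template: str, variables: dict) -> str:
    """Replace each '{{ name }}' placeholder with variables[name]."""
    def substitute(match):
        return variables[match.group(1)]
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


def not_found_percentage(non_404_rate: float, total_rate: float) -> float:
    """Same arithmetic as the slide's query: 100 - non_404 / total * 100."""
    return 100 - non_404_rate / total_rate * 100


rendered = render(
    QUERY_TEMPLATE,
    {"namespace": "test", "target": "podinfo", "interval": "1m"},
)
print(rendered)
# 3 of every 4 requests are not 404, so a quarter of the requests are 404.
print(not_found_percentage(3.0, 4.0))
```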

Slide 31

Slide 31 text

Flagger CRD: Alert Provider

apiVersion: flagger.app/v1beta1
kind: AlertProvider
metadata:
  name: on-call
  namespace: flagger
spec:
  type: slack
  channel: on-call-alerts
  username: flagger
  # address: https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK
  secretRef:
    name: on-call-url
---
apiVersion: v1
kind: Secret
metadata:
  name: on-call-url
  namespace: flagger
data:
  address:

Slide 32

Slide 32 text

3. How Flagger Works

Slide 33

Slide 33 text

Overview https://github.com/stefanprodan/gitops-progressive-delivery

Slide 34

Slide 34 text

Canary Initialization
• When a Canary CR is created, Flagger creates several resources
• Application-related resources: essentially copies of the original definitions
  • Deployment/DaemonSet: -primary
    • The label "app: foo" is rewritten to "app: foo-primary"
  • HPA: -primary
  • Mounted ConfigMaps and Secrets: -primary

Slide 35

Slide 35 text

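Canary Initialization
• When a Canary CR is created, Flagger creates several resources
• Application-related resources: essentially copies of the original definitions
  • Deployment/DaemonSet: -primary
    • The label "app: foo" is rewritten to "app: foo-primary"
  • HPA: -primary
  • Mounted ConfigMaps and Secrets: -primary
Your existing Deployment and HPA can be used as-is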

Slide 36

Slide 36 text

Canary Initialization
• When a Canary CR is created, Flagger creates several resources
• It creates the following three Services (plus a VirtualService or similar, depending on the mesh)
  • ..svc.cluster.local
    • selector: app=-primary
  • -primary..svc.cluster.local
    • selector: app=-primary
  • -canary..svc.cluster.local
    • selector: app=
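The three Services and their selectors can be illustrated with a small Python sketch. The name and namespace parts of the DNS names are missing from the slide text, so the sketch assumes the standard Kubernetes form of service name, then namespace, then `svc.cluster.local`, and uses the name `foo` from the diagram slides and the namespace `test` from the Canary manifest as example values. The function name is hypothetical.

```python
def flagger_services(name: str, namespace: str) -> dict:
    """Map each Service's cluster DNS name to its pod selector.

    Assumption: DNS names follow <service>.<namespace>.svc.cluster.local.
    """
    suffix = f"{namespace}.svc.cluster.local"
    return {
        # The apex Service and the primary Service both select primary pods.
        f"{name}.{suffix}": {"app": f"{name}-primary"},
        f"{name}-primary.{suffix}": {"app": f"{name}-primary"},
        # The canary Service selects the pods of the user-managed workload.
        f"{name}-canary.{suffix}": {"app": name},
    }


services = flagger_services("foo", "test")
for dns_name, selector in services.items():
    print(dns_name, selector)
```

The point to notice is that the Service clients already use keeps its name but selects the Flagger-managed primary pods.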

Slide 37

Slide 37 text

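Canary Initialization
• The original, user-managed Deployment is called the canary
• The copied, Flagger-managed Deployment is called the primary
• Once everything has been created, the user-managed Deployment is set to replicas = 0
  • (This assumes replicas was not set in the original manifest)
  • The canary ends up with zero pods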

Slide 38

Slide 38 text

deployment foo-primary service foo-primary Canary Initialization deployment foo HPA foo secret foo configmap foo HPA foo-primary secret foo-primary configmap foo-primary service foo-canary service foo Managed by Users

Slide 39

Slide 39 text

deployment foo-primary service foo-primary Canary Initialization deployment foo HPA foo secret foo configmap foo HPA foo-primary secret foo-primary configmap foo-primary service foo-canary service foo Managed by Flagger

Slide 40

Slide 40 text

deployment foo-primary service foo-primary Canary Initialization deployment foo replicas: 0 HPA foo secret foo configmap foo HPA foo-primary secret foo-primary configmap foo-primary service foo-canary service foo Set ‘replicas = 0’ on canary

Slide 41

Slide 41 text

deployment foo-primary service foo-primary Canary Initialization deployment foo replicas: 0 HPA foo secret foo configmap foo HPA foo-primary secret foo-primary configmap foo-primary service foo-canary service foo Traffic

Slide 42

Slide 42 text

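Change Detection
• After initialization, Flagger waits for a change to the target
  • Target = Deployment or DaemonSet
  • A change in the hash of spec.template
• Changes to ConfigMaps and Secrets mounted by the target are also detected
• At the moment of detection, nothing is applied to the primary yet (naturally)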
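Hash-based change detection can be sketched in a few lines of Python. The slide only says that the hash of `spec.template` is compared, so the choice of SHA-256 over canonical JSON is an assumption made for this example, as are the function names and the `podinfo:1.0` and `podinfo:1.1` image tags.

```python
import hashlib
import json


def template_hash(pod_template: dict) -> str:
    """Hash a pod template in a way that ignores dictionary key order."""
    canonical = json.dumps(pod_template, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def has_changed(last_applied_hash: str, pod_template: dict) -> bool:
    """True when the template no longer matches the recorded hash."""
    return template_hash(pod_template) != last_applied_hash


# Hypothetical pod templates: only the image tag differs.
v1 = {"spec": {"containers": [{"name": "podinfo", "image": "podinfo:1.0"}]}}
v2 = {"spec": {"containers": [{"name": "podinfo", "image": "podinfo:1.1"}]}}

recorded = template_hash(v1)
print(has_changed(recorded, v1))
print(has_changed(recorded, v2))
```

Comparing a stored hash rather than the full object keeps the check cheap and makes it easy to record which revision was last analyzed.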

Slide 43

Slide 43 text

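Promotion Process (this is only one example)
• Roll out the canary: remove replicas = 0
  • Roll back if it does not become healthy
• Repeat the following until 100% of the traffic flows to the canary
  • Shift a few percent of the traffic to the canary
  • Check the metrics: are they within the specified range?
    • If not, roll back
• Copy the Deployment, ConfigMap, Secret, and HPA from the canary to the primary
  • That is, copy the user-managed resources to the Flagger-managed resources
• Return the traffic to the primary
  • It can also be returned in stages (progressive promotion)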
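The shift-and-check loop described above can be sketched as a small Python simulation. It follows the slide's simplified rule and rolls back on the first out-of-range metric value; the `threshold` field of the Canary manifest is not modeled. The function `run_canary` and the `read_metric` callback are hypothetical names introduced for this example.

```python
def run_canary(step_weight, max_weight, read_metric, min_ok, max_ok=None):
    """Simulate the analysis loop.

    Returns ('promoted' or 'rolled back', list of canary weights reached).
    read_metric(weight) is a caller-supplied function returning the metric
    value observed while the canary receives `weight` percent of traffic.
    """
    weights = []
    weight = 0
    while weight < max_weight:
        # Shift a few more percent of the traffic to the canary.
        weight = min(weight + step_weight, max_weight)
        weights.append(weight)
        # Check that the metric is inside the allowed range.
        value = read_metric(weight)
        in_range = value >= min_ok and (max_ok is None or value <= max_ok)
        if not in_range:
            return "rolled back", weights
    return "promoted", weights


# A canary whose success rate stays at 99.5% at every weight.
healthy = run_canary(5, 50, lambda w: 99.5, min_ok=99)
# A canary whose success rate drops to 90% once it gets 20% of traffic.
degraded = run_canary(5, 50, lambda w: 99.5 if w < 20 else 90.0, min_ok=99)

print(healthy)
print(degraded)
```

In the degraded case only a fraction of the traffic ever reached the faulty version, which is the risk reduction that Progressive Delivery aims for.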

Slide 44

Slide 44 text

Promotion Process https://github.com/weaveworks/flagger/pull/593 Scale up canary Start analysis Finish analysis Update primary Start progressive promotion Scale down canary Promotion finish

Slide 45

Slide 45 text

Rollback
• Rollback = bringing the pods of the original Deployment or DaemonSet down to zero
  • For a Deployment: set 'replicas: 0'
  • For a DaemonSet: set a nodeSelector with a label that does not exist
• The manifest itself is not rolled back
  • GitOps friendly
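The two rollback cases can be written down as the patch each workload kind would receive. This Python sketch only builds the patch bodies as dictionaries; it does not call the Kubernetes API. The nodeSelector key `example.invalid/no-such-node` is a hypothetical label chosen so that no node matches it; the slide does not say which label Flagger uses.

```python
def rollback_patch(kind: str) -> dict:
    """Build a patch that brings the canary workload down to zero pods."""
    if kind == "Deployment":
        # A Deployment is scaled to zero directly.
        return {"spec": {"replicas": 0}}
    if kind == "DaemonSet":
        # A DaemonSet has no replicas field, so it is given a node selector
        # that matches no node (hypothetical label key).
        return {
            "spec": {
                "template": {
                    "spec": {
                        "nodeSelector": {"example.invalid/no-such-node": "true"}
                    }
                }
            }
        }
    raise ValueError(f"unsupported kind: {kind}")


print(rollback_patch("Deployment"))
print(rollback_patch("DaemonSet"))
```

Neither patch touches the image or configuration in the manifest, which is why the Git repository stays the source of truth after a rollback.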

Slide 46

Slide 46 text

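Webhooks
• A webhook can be configured for each phase (a non-2xx response triggers a rollback)
  • confirm-rollout … before the canary is rolled out
  • pre-rollout … before analysis starts
  • rollout … on every iteration of the analysis loop
  • confirm-promotion … before promotion to the primary
  • post-rollout … after promotion to the primary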
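The ordering of the hook types and the 2xx rule can be captured in a short Python sketch. It applies the slide's rule uniformly, treating any non-2xx response as a failure, and does not model how each hook type reacts to a failure. The function names are hypothetical.

```python
# Hook types in the order the slide lists them.
HOOK_PHASES = [
    "confirm-rollout",
    "pre-rollout",
    "rollout",
    "confirm-promotion",
    "post-rollout",
]


def hook_passed(status_code: int) -> bool:
    """A webhook passes only when it answers with a 2xx status."""
    return 200 <= status_code < 300


def first_failed_hook(responses: dict):
    """Return the first hook type, in phase order, whose response is not 2xx.

    `responses` maps a hook type to the HTTP status it returned;
    hook types that are not configured are simply absent.
    """
    for phase in HOOK_PHASES:
        if phase in responses and not hook_passed(responses[phase]):
            return phase
    return None


print(first_failed_hook({"pre-rollout": 200, "rollout": 204}))
print(first_failed_hook({"pre-rollout": 200, "rollout": 500, "post-rollout": 503}))
```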

Slide 47

Slide 47 text

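Considerations / Limitations
• The number of pods simply doubles
• Services are created by Flagger
  • Existing Services cannot be used as-is (they can, but that is a no-go from a GitOps standpoint)
• Canary release using only native Services is not possible (the Argo-rollouts approach)
  • Blue/Green is possible, but a service mesh or ingress controller is basically assumed
• Changes to the HPA are not detected
• It is not fully automated (and not absolutely safe)
  • There can be bugs and errors that metrics cannot reveal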

Slide 48

Slide 48 text

Conclusion
• Progressive Delivery is CD++
  • It reduces the risk of production releases
  • It achieves both agility and reliability
• Flagger brings Progressive Delivery to k8s (service mesh)
  • Multiple deployment strategies and metrics providers
  • Change detection for ConfigMaps and Secrets
• We welcome all contributions!
  • Stackdriver support, StatefulSet support, HPA change detection, etc.
  • Feel free to ask questions on the Weaveworks community Slack: https://slack.weave.works/ #flagger

Slide 49

Slide 49 text

References
• https://github.com/weaveworks/flagger
• https://docs.flagger.app/
• https://www.weave.works/blog/announcing-flagger-1-0
• https://medium.com/google-cloud-jp/gke-istio-flagger%E3%81%AB%E3%82%88%E3%82%8Bprogressive-delivery-5f1ea9b627c1
• https://www.slideshare.net/weaveworks/whats-new-in-flagger-10-with-stefan-prodan
• https://medium.com/google-cloud/automated-canary-deployments-with-flagger-and-istio-ac747827f9d1
• https://medium.com/@dlorenc/pitfalls-of-progressive-delivery-114c6e3f9dbb
• https://carlossg.github.io/presentations/2019-06_cdsummit
• https://medium.com/@copyconstruct/testing-in-production-the-safe-way-18ca102d0ef1
• https://www.infoq.com/presentations/progressive-delivery
• https://qiita.com/mumoshu/items/63b29bca6a052d8c7087
• https://www.getambassador.io/docs/latest/topics/concepts/progressive-delivery/