Introduction to Flagger

Kubernetes Meetup Tokyo #32

mathetake

July 28, 2020

Transcript

  1. Introduction to Flagger: A Progressive Delivery Operator for Kubernetes
    • @mathetake, Kubernetes Meetup Tokyo #32
  2. Continuous X
    • Everyone's favorite: Continuous X
    • The de facto standard in the cloud-native community
    • We could not live without it
    • A rich software ecosystem: Spinnaker, Flux, ArgoCD, …
    • Goal: Agility & Reliability
    • Automated testing and deployment
    • Source: https://www.weave.works/blog/automate-kubernetes-with-gitops
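  3. Continuous X
    • Continuous X lets us test and deploy a large number of applications in a single day
    • Merging a PR completes the deployment, with no manual operations
    • Rollback = reverting the PR
    • Easy!
    • Hooray for the reconciliation loop
    • Nothing to be afraid of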
  4. Continuous X (same bullets as slide 3, with an added caption): No Free Lunch
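  5. Continuous Delivery is hard
    • What about reliability after a CD-driven deploy?
    • Right after a deploy, someone still has to watch the logs and metrics
    • If something looks bad, revert!
    • In the end there is still a human in the loop
    • The rollback operation itself can cause an incident
    • We humans are all fallible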
  6. Continuous Delivery is hard (same bullets as slide 5, with an added caption): We want to do this in a smarter, slicker way
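  7. What is Progressive Delivery
    • Naive CD means all-or-nothing deployment
      • The new version replaces all of the old one
      • Rollback is the same thing in reverse
    • Progressive Delivery = CD++ (*)
      • It makes ordinary CD slicker
      • It deploys gradually instead of all-or-nothing
      • It automates that gradual deployment
      • It reduces the risk of production deployments
    • (*) https://qiita.com/mumoshu/items/63b29bca6a052d8c7087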
  8. Test in production meets P.D.
    • Progressive Delivery = deploying to production gradually ≒ trying out (testing) the new version in the production environment
    • Close to the idea of "Test in production"
      • Staging cannot be production
      • Faster iteration, better agility
    • All-or-nothing deployment + Test in production = high risk
    • Progressive Delivery + Test in production = low risk
    • Source: https://www.getambassador.io/docs/latest/topics/concepts/progressive-delivery/
  9. Implementation
    • Spinnaker
    • Kubernetes-native
      • Argo Rollouts: based on native Services (*)
      • weaveworks/flagger: based on a service mesh or ingress controller
    • Basic behavior: roll out a little ⇔ check the metrics
    • (*) Older versions could not do traffic shifting in combination with a service mesh
  10. What is Flagger
    • Open-source software for Progressive Delivery
      • github.com/weaveworks/flagger
      • Developed by Stefan at Weaveworks
      • Reached GA on June 17, 2020
    • Originally developed with Istio as a prerequisite, later extended to support multiple platforms
    • "Give developers confidence in automating the production releases"
  11. Flagger's Features
    • Service mesh native
      • Istio, SMI (Linkerd, crossover), App Mesh + ingress controllers
      • Fine-tuned traffic shifting
    • GitOps native
    • Multiple deployment strategies
    • Custom metrics
    • Manual gating (approve, pause, resume), webhooks
    • Alerting
  12. Deployment Strategy
    1. Canary Release (progressive traffic shifting)
      • For HTTP and gRPC applications
    2. A/B Testing (HTTP headers and cookies traffic routing)
      • For applications that require session affinity
    3. Blue/Green (traffic switching)
      • Any workload
    4. Blue/Green Mirroring (traffic shadowing)
      • For idempotent applications (the safest option)
      • Mirroring can also be used as a stage before a Canary Release
  13. Control loop / phase (diagram)
    • Phases shown: Initialize, Stable, Analysis, Promotion, Rollback
    • Labels next to Analysis / Promotion: deployment strategy, traffic shifting, webhook, metrics check, manual gating
    • Label next to Rollback: alert provider
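  14. Flagger CRD
    1. Canary
      • The main resource
      • For minimal functionality, this alone is enough
    2. MetricTemplate
      • The query for the metrics to analyze
      • Specifies the metric provider
    3. AlertProvider
      • Specifies where delivery notifications are sent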
  15. Flagger CRD: Canary

        apiVersion: flagger.app/v1beta1
        kind: Canary
        metadata:
          name: podinfo
          namespace: test
        spec:
          provider: istio
          targetRef:
            apiVersion: apps/v1
            kind: Deployment
            name: podinfo
          progressDeadlineSeconds: 60
          autoscalerRef:
            apiVersion: autoscaling/v2beta1
            kind: HorizontalPodAutoscaler
            name: podinfo
          service:
            name: podinfo
            port: 9898
            targetPort: 9898
            portName: http
            portDiscovery: true
            match:
              - uri:
                  prefix: /
            timeout: 5s
          analysis:
            interval: 1m
            threshold: 10
            maxWeight: 50
            stepWeight: 5
            metrics:
              - name: request-success-rate
                thresholdRange:
                  min: 99
                interval: 1m
              - name: "database connections"
                templateRef:
                  name: db-connections
                thresholdRange:
                  min: 2
                  max: 100
                interval: 1m
            webhooks:
              - name: "load test"
                type: rollout
                url: http://flagger-loadtester.test/
                metadata:
                  cmd: "…"
  16. Flagger CRD: Canary (same manifest as slide 15). Annotation on targetRef: specifies the application (Deployment or DaemonSet)
  17. Flagger CRD: Canary (same manifest as slide 15). Annotation on autoscalerRef: specifies the HPA (optional)
  18. Flagger CRD: Canary (same manifest as slide 15). Annotation on service: the Service definition
  19. Flagger CRD: Canary (same manifest as slide 15). Annotation on analysis: the canary analysis definition
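The `analysis` settings in the Canary manifest (`interval: 1m`, `stepWeight: 5`, `maxWeight: 50`) imply a traffic-shifting schedule. The sketch below works that arithmetic out. It assumes the canary weight grows by `stepWeight` once per `interval` until it reaches `maxWeight` and that no check fails along the way; `canary_weight_schedule` is a hypothetical helper, not part of Flagger.

```python
def canary_weight_schedule(step_weight: int, max_weight: int) -> list[int]:
    """Return the canary traffic weights (in percent) visited during analysis.

    Assumption: the weight grows by `step_weight` per iteration up to `max_weight`.
    """
    if step_weight <= 0 or max_weight <= 0:
        raise ValueError("step_weight and max_weight must be positive")
    return list(range(step_weight, max_weight + 1, step_weight))


# Values taken from the Canary manifest on slide 15.
weights = canary_weight_schedule(step_weight=5, max_weight=50)
interval_minutes = 1

print(weights)                          # one entry per analysis iteration
print(len(weights) * interval_minutes)  # lower bound, in minutes, for the shifting phase
```

Under these assumptions the canary passes through ten steps, so the shifting phase takes at least ten minutes. Failed checks, webhooks, and manual gates would lengthen it.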
  20. Flagger CRD: MetricTemplate

        apiVersion: flagger.app/v1beta1
        kind: MetricTemplate
        metadata:
          name: not-found-percentage
          namespace: istio-system
        spec:
          provider:
            type: prometheus
            address: http://prometheus.istio-system:9090
          query: |
            100 - sum(
                rate(
                    istio_requests_total{
                      reporter="destination",
                      destination_workload_namespace="{{ namespace }}",
                      destination_workload="{{ target }}",
                      response_code!="404"
                    }[{{ interval }}]
                )
            )
            /
            sum(
                rate(
                    istio_requests_total{
                      reporter="destination",
                      destination_workload_namespace="{{ namespace }}",
                      destination_workload="{{ target }}"
                    }[{{ interval }}]
                )
            ) * 100
  21. Flagger CRD: MetricTemplate (same manifest as slide 20). Annotation on provider: specifies the metric provider (Prometheus, Datadog, CloudWatch)
  22. Flagger CRD: MetricTemplate (same manifest as slide 20). Annotation on query: the query template
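The query in the MetricTemplate contains `{{ namespace }}`, `{{ target }}`, and `{{ interval }}` placeholders. The slides do not show how Flagger fills them in, so the following is only an illustrative stand-in using a regular expression. `render_query` is a hypothetical helper, and the values `test`, `podinfo`, and `1m` are borrowed from the Canary manifest on slide 15.

```python
import re


def render_query(template: str, variables: dict[str, str]) -> str:
    """Replace every `{{ name }}` placeholder with its value from `variables`."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"no value for template variable {name!r}")
        return variables[name]

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


# A shortened selector from the query on slide 20 (assumed values below).
selector = (
    "istio_requests_total{"
    'destination_workload_namespace="{{ namespace }}",'
    'destination_workload="{{ target }}"'
    "}[{{ interval }}]"
)
rendered = render_query(
    selector, {"namespace": "test", "target": "podinfo", "interval": "1m"}
)
print(rendered)
```

Raising on an unknown placeholder is a design choice of this sketch: a query with a leftover `{{ ... }}` would otherwise fail later and less legibly at the metrics provider.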
  23. Flagger CRD: Alert Provider

        apiVersion: flagger.app/v1beta1
        kind: AlertProvider
        metadata:
          name: on-call
          namespace: flagger
        spec:
          type: slack
          channel: on-call-alerts
          username: flagger
          # address: https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK
          secretRef:
            name: on-call-url
        ---
        apiVersion: v1
        kind: Secret
        metadata:
          name: on-call-url
          namespace: flagger
        data:
          address: <encoded-url>
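Values under `data` in a Kubernetes Secret are base64-encoded, which is what `<encoded-url>` stands for in the manifest. A minimal sketch of producing such a value follows. The URL is the placeholder from the commented-out `address` line, not a real webhook, and `encode_secret_value` is a hypothetical helper.

```python
import base64


def encode_secret_value(value: str) -> str:
    """Base64-encode a string the way a Secret's `data` field expects it."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


# Placeholder URL copied from the AlertProvider manifest; replace with a real one.
webhook_url = "https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK"
encoded = encode_secret_value(webhook_url)
print(encoded)
```

Base64 is an encoding, not encryption, so the manifest containing the encoded URL still needs to be handled as a secret.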
  24. Canary Initialization
    • As soon as the Canary CR is created, Flagger creates several resources
    • Application-related resources: essentially copies of the original definitions
      • Deployment/DaemonSet: <canary.spec.targetRef.name>-primary
        • The label is rewritten from "app: foo" to "app: foo-primary"
      • HPA: <canary.spec.autoscalerRef.name>-primary
      • Mounted ConfigMaps and Secrets: <name>-primary
  25. Canary Initialization (same bullets as slide 24, with an added caption): existing Deployments and HPAs can be used as-is
  26. Canary Initialization
    • As soon as the Canary CR is created, Flagger creates several resources
    • It creates the following three Services (plus a VirtualService or similar, depending on the mesh)
      • <service.name>.<namespace>.svc.cluster.local
        • selector: app=<name>-primary
      • <service.name>-primary.<namespace>.svc.cluster.local
        • selector: app=<name>-primary
      • <service.name>-canary.<namespace>.svc.cluster.local
        • selector: app=<name>
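The naming rules for the three Services can be written down as a small function. This is a sketch of the convention as the slide states it, not Flagger's code. `generated_services` and `GeneratedService` are hypothetical names, and `<name>` in the selectors is assumed to be the target Deployment's name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneratedService:
    host: str      # cluster-internal DNS name
    selector: str  # pod label selector the Service routes to


def generated_services(
    service_name: str, target_name: str, namespace: str
) -> list[GeneratedService]:
    """List the three Services described on slide 26 for one Canary."""
    suffix = f"{namespace}.svc.cluster.local"
    return [
        # The apex Service and the -primary Service both select the primary pods.
        GeneratedService(f"{service_name}.{suffix}", f"app={target_name}-primary"),
        GeneratedService(f"{service_name}-primary.{suffix}", f"app={target_name}-primary"),
        # The -canary Service selects the original, user-managed pods.
        GeneratedService(f"{service_name}-canary.{suffix}", f"app={target_name}"),
    ]


# Names taken from the Canary manifest on slide 15.
services = generated_services("podinfo", "podinfo", "test")
for service in services:
    print(service.host, "->", service.selector)
```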
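  27. Canary Initialization
    • The original, user-managed Deployment is called the canary
    • The copied, Flagger-managed Deployment is called the primary
    • Once everything has been created, the user-managed Deployment is set to replicas = 0
      • (This assumes replicas was not set in the original manifest)
      • The canary ends up with zero pods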
  28. Canary Initialization (diagram)
    • Resources shown: deployment foo, HPA foo, secret foo, configmap foo; deployment foo-primary, HPA foo-primary, secret foo-primary, configmap foo-primary; service foo, service foo-primary, service foo-canary
    • Caption: Managed by Users
  29. Canary Initialization (same diagram). Caption: Managed by Flagger
  30. Canary Initialization (same diagram, with deployment foo at replicas: 0). Caption: Set 'replicas = 0' on canary
  31. Canary Initialization (same diagram, with deployment foo at replicas: 0). Caption: Traffic
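  32. Change Detection
    • After initialization, Flagger waits for a change to the target
      • Target = Deployment or DaemonSet
      • It watches for a change in the hash of spec.template
      • It also detects changes to the ConfigMaps and Secrets that the target mounts
    • At the moment of detection, the change is not yet applied to the primary (naturally)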
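Detecting a change by comparing hashes of `spec.template` can be illustrated as follows. The slides do not say which hash Flagger uses, so SHA-256 over canonical JSON is an assumption made for this sketch. `template_fingerprint` and `has_changed` are hypothetical helpers, and the image names are made up.

```python
import hashlib
import json


def template_fingerprint(pod_template: dict) -> str:
    """Hash a pod template deterministically (key order does not matter)."""
    canonical = json.dumps(pod_template, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def has_changed(last_seen: dict, current: dict) -> bool:
    """Report whether the pod template differs from the last one seen."""
    return template_fingerprint(last_seen) != template_fingerprint(current)


# Hypothetical pod templates: only the image tag differs.
v1 = {"spec": {"containers": [{"name": "app", "image": "example/app:1.0"}]}}
v2 = {"spec": {"containers": [{"name": "app", "image": "example/app:1.1"}]}}

print(has_changed(v1, v1))  # same template
print(has_changed(v1, v2))  # new image tag
```

Storing only the fingerprint of the last seen template is enough to decide whether a new analysis should start. The same idea would extend to mounted ConfigMaps and Secrets by hashing their contents.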
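  33. Promotion Process (this is only one example)
    • Roll out the canary: remove replicas = 0
      • Roll back if it does not become healthy
    • Repeat the following until 100% of the traffic flows to the canary
      • Shift a few percent of the traffic to the canary
      • Check the metrics: are they within the specified range?
      • If not, roll back
    • Copy the Deployment, ConfigMap, Secret, and HPA from canary to primary
      • That is, copy the user-managed resources to the Flagger-managed resources
    • Route the traffic back to the primary
      • It can also be moved back in stages (progressive promotion)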
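The shift-then-check loop above can be simulated in a few lines. This is a simplified model rather than Flagger's controller logic: it rolls back on the first out-of-range metric, ignores the `threshold` setting from the Canary manifest, and takes the metric from a caller-supplied function. `run_canary_analysis` and `read_metric` are hypothetical names.

```python
from typing import Callable


def run_canary_analysis(
    step_weight: int,
    max_weight: int,
    min_ok: float,
    max_ok: float,
    read_metric: Callable[[int], float],
) -> str:
    """Shift traffic step by step and check one metric after each shift."""
    if step_weight <= 0:
        raise ValueError("step_weight must be positive")
    weight = 0
    while weight < max_weight:
        weight = min(weight + step_weight, max_weight)  # shift a few percent
        value = read_metric(weight)                     # check the metric
        if not (min_ok <= value <= max_ok):
            return f"rollback at {weight}%"             # out of range
    return "promote"


# A success-rate metric that stays healthy, and one that degrades from 20% traffic.
healthy = run_canary_analysis(5, 50, 99.0, 100.0, lambda weight: 99.5)
degraded = run_canary_analysis(
    5, 50, 99.0, 100.0, lambda weight: 99.5 if weight < 20 else 95.0
)
print(healthy)
print(degraded)
```

Passing the metric source as a function keeps the loop testable without a metrics backend; in a real setup it would be a query against the configured provider.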
  34. Promotion Process (diagram, https://github.com/weaveworks/flagger/pull/593)
    • Steps shown: Scale up canary → Start analysis → Finish analysis → Update primary → Start progressive promotion → Scale down canary → Promotion finish
  35. Rollback
    • Rollback = scaling the pods of the original Deployment or DaemonSet down to zero
      • For a Deployment: set 'replicas: 0'
      • For a DaemonSet: set a nodeSelector with a label that does not exist
    • The manifest itself is not rolled back
      • This makes it GitOps-friendly
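The two ways of reaching zero pods can be expressed as patch bodies for the workload. The sketch below only builds the dictionaries and does not talk to a cluster. `rollback_patch` is a hypothetical helper, and the label key `example.invalid/no-such-node` is an invented placeholder for "a label that no node has".

```python
def rollback_patch(kind: str) -> dict:
    """Build a patch that leaves the given workload with zero pods."""
    if kind == "Deployment":
        # A Deployment is scaled down directly.
        return {"spec": {"replicas": 0}}
    if kind == "DaemonSet":
        # A DaemonSet has no replica count, so it is pinned to nodes that do not exist.
        return {
            "spec": {
                "template": {
                    "spec": {"nodeSelector": {"example.invalid/no-such-node": "true"}}
                }
            }
        }
    raise ValueError(f"unsupported kind: {kind}")


print(rollback_patch("Deployment"))
print(rollback_patch("DaemonSet"))
```

Neither patch touches the container image or the rest of the pod template, which matches the point on the slide that the manifest itself is not reverted.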
  36. Webhooks
    • A webhook can be configured for each phase (a non-2xx response triggers a rollback)
      • confirm-rollout: before the canary rollout
      • pre-rollout: before the analysis starts
      • rollout: on every iteration of the analysis loop
      • confirm-promotion: before promotion to the primary
      • post-rollout: after promotion to the primary
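The rule "anything other than 2xx means rollback" can be sketched as a small gate function. It follows the slide's simplified wording and treats every webhook type the same way; the Flagger documentation describes per-type differences, so check it before relying on this. `gate` and `is_success` are hypothetical helpers, and the status codes are supplied by the caller rather than fetched over HTTP.

```python
# Webhook types listed on slide 36.
WEBHOOK_PHASES = (
    "confirm-rollout",
    "pre-rollout",
    "rollout",
    "confirm-promotion",
    "post-rollout",
)


def is_success(status_code: int) -> bool:
    """Treat any 2xx HTTP status as success."""
    return 200 <= status_code < 300


def gate(phase: str, status_codes: list[int]) -> str:
    """Decide how to proceed after calling all webhooks of one phase."""
    if phase not in WEBHOOK_PHASES:
        raise ValueError(f"unknown webhook phase: {phase}")
    return "continue" if all(is_success(code) for code in status_codes) else "rollback"


print(gate("rollout", [200, 204]))      # every hook answered 2xx
print(gate("pre-rollout", [200, 500]))  # one hook failed
```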
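  37. Considerations / Limitations
    • The number of pods simply doubles
    • Services are created by Flagger
      • Existing Services cannot be used as-is (they can, but that breaks GitOps practice)
      • Canary releases that use only native Services (the Argo Rollouts approach) are not possible
      • Blue/Green is possible, but Flagger basically assumes a service mesh or ingress controller
    • Changes to the HPA are not detected
    • It is not fully automated (and not absolutely safe)
      • Bugs and errors that metrics cannot reveal can still occur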
  38. Conclusion
    • Progressive Delivery is CD++
      • It reduces the risk of production releases
      • It achieves both agility and reliability
    • Flagger implements Progressive Delivery on Kubernetes (service mesh)
      • Multiple deployment strategies and metrics providers
      • Change detection for ConfigMaps and Secrets
    • We welcome all contributions!
      • Stackdriver support, StatefulSet support, HPA change detection, etc.
    • Feel free to ask questions in the Weaveworks community Slack: https://slack.weave.works/ #flagger
  39. References
    • https://github.com/weaveworks/flagger
    • https://docs.flagger.app/
    • https://www.weave.works/blog/announcing-flagger-1-0
    • https://medium.com/google-cloud-jp/gke-istio-flagger%E3%81%AB%E3%82%88%E3%82%8Bprogressive-delivery-5f1ea9b627c1
    • https://www.slideshare.net/weaveworks/whats-new-in-flagger-10-with-stefan-prodan
    • https://medium.com/google-cloud/automated-canary-deployments-with-flagger-and-istio-ac747827f9d1
    • https://medium.com/@dlorenc/pitfalls-of-progressive-delivery-114c6e3f9dbb
    • https://carlossg.github.io/presentations/2019-06_cdsummit
    • https://medium.com/@copyconstruct/testing-in-production-the-safe-way-18ca102d0ef1
    • https://www.infoq.com/presentations/progressive-delivery
    • https://qiita.com/mumoshu/items/63b29bca6a052d8c7087
    • https://www.getambassador.io/docs/latest/topics/concepts/progressive-delivery/