GKE Scaling Strategy and Anthos Service Mesh Use Cases at ABEMA: Deep Dive

AbemaTV, Inc. All Rights Reserved
Deep Dive: ABEMA's GKE Scaling Strategy and ASM Use Cases, as Presented at Google Cloud Day
June 9th, 2023
Katsutoshi Nagaoka (永岡克利), AbemaTV, Inc., Cloud Platform Group Manager

Speaker introduction
Katsutoshi Nagaoka (永岡克利), na_ga
AbemaTV, Inc., Cloud Platform Group Manager, Software Engineer
• 2011: Joined CyberAgent, Inc. as a mid-career hire
• 2017: Joined ABEMA
• 2023: Joined Jagu'e'r
At ABEMA, after serving as Technical Lead Engineer for product development, he now oversees Cloud Solutions in general as Manager of the Cloud Platform Group.
#Gopher #Snowboard #Scubadiving #Sauna #Paraglider #FatherOfTwo

Google Cloud Day '23 Tokyo
Archive: https://cloudonair.withgoogle.com/events/google-cloud-day-23/watch?talk=tok-d3-appdev04

ABEMA's Cloud Architecture (diagram). Components shown:
• CDN: Cloud CDN, Akamai, Fastly CDN, Amazon CloudFront
• Cloud Load Balancing, Cloud Armor
• GKE Tokyo Region: API Gateway, Micro Service
• GKE Taiwan Region: API Gateway, Micro Service
• Anthos Service Mesh ( gRPC / HTTP )
• Cloud Storage, Firestore, Cloud Spanner, Memorystore, Cloud Bigtable, Cloud Pub/Sub
• Managed Services: MongoDB Atlas, … etc
• Compute Engine: MongoDB, Zixi, Wowza, Redis Cluster, … etc

7 GKE Scaling Tactics and 7 Anthos Service Mesh Techniques
June 9th, 2023
Katsutoshi Nagaoka (永岡克利), AbemaTV, Inc., Cloud Platform Group Manager
Jagu'e'r Cloud Native Subcommittee Meetup #11

7 GKE Scaling Tactics

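GKE scaling strategy
Cost optimization
• Use Node Capacity as efficiently as possible
• Scale in line with the Resource Requests of Workloads
Requirements
• Avoid Workloads going Pending because of insufficient Node Capacity wherever possible
• During periods such as large-scale events, be able to spend more and put stability first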

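GKE scaling tactics
1. Node Pool design specialized by Resource Type
2. Node Pool design that follows changes in Resource Requests
3. Proactive Node Pool scale-out that follows changes in concurrent connections ( CCU )
4. Aggressive scale-in of Node Pools with large surplus Capacity
5. A mechanism that automatically corrects uneven Workload placement across Nodes
6. Stockout countermeasures for large-scale events
7. Preparing for disaster-class spike access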

Node Capacity utilization
Issues
• Resource Requests trends differ from Workload to Workload
• Extremely large Resource Requests waste Node Capacity

Node Capacity utilization
Example
• One Node Capacity: vCPU 8 core, Memory 16 GiB
• One Workload Resource Requests: vCPU 1 core, Memory 15 GiB
• Little Memory Resource is left free, so CPU Resource is wasted
One Workload
resources:
  requests:
    cpu: "1"
    memory: "15Gi"
One Node
• Memory Usage 15/16 ( 93.75% )
• CPU Usage 1/8 ( 12.5% )
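The utilization figures on this slide can be reproduced with a few lines of Python. This is only a sketch of the arithmetic; the dictionary keys and the `utilization` helper are illustrative names, not anything from ABEMA's tooling.

```python
def utilization(requested: float, capacity: float) -> float:
    """Return requested / capacity as a percentage."""
    return requested / capacity * 100


# Figures from the slide: one Node and one Workload.
node = {"cpu_cores": 8, "memory_gib": 16}
workload = {"cpu_cores": 1, "memory_gib": 15}

cpu_pct = utilization(workload["cpu_cores"], node["cpu_cores"])
mem_pct = utilization(workload["memory_gib"], node["memory_gib"])
print(f"CPU {cpu_pct:.2f}% / Memory {mem_pct:.2f}%")

# A second copy of the Workload does not fit: Memory runs out first,
# so the remaining 7 cores stay idle.
fits_second = 2 * workload["memory_gib"] <= node["memory_gib"]
```

Memory is the binding resource here, which is why the CPU side of the Node cannot be filled.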

Node Pools with different Resource ratios
High CPU Node Pool
• More CPU, less Memory
• Hosts Workloads with large CPU Requests
High Memory Node Pool
• Less CPU, more Memory
• Hosts Workloads with large Memory Requests (Prometheus, Batch …etc)

Node Pools with different Resource ratios
N2 unit price per 1 CPU core and per 1 GB of Memory
• CPU $0.040618 / core hour, Memory $0.005419 / GB hour
• About 7.5x ( 0.040618 / 0.005419 = 7.4954788706 )
Basic approach to Node Capacity utilization
• Avoid wasting CPU as much as possible
• "Some" spare Memory is acceptable
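To make the cost argument concrete, the sketch below compares the hourly cost of idle CPU with that of idle Memory, using the two N2 unit prices quoted on the slide. The helper name `idle_cost_per_hour` and the sample quantities are assumptions chosen for illustration.

```python
CPU_PRICE_PER_CORE_HOUR = 0.040618  # USD, N2, from the slide
MEM_PRICE_PER_GB_HOUR = 0.005419    # USD, N2, from the slide

ratio = CPU_PRICE_PER_CORE_HOUR / MEM_PRICE_PER_GB_HOUR


def idle_cost_per_hour(idle_cores: float, idle_gb: float) -> float:
    """Hourly cost of capacity that is paid for but not requested."""
    return idle_cores * CPU_PRICE_PER_CORE_HOUR + idle_gb * MEM_PRICE_PER_GB_HOUR


one_idle_core = idle_cost_per_hour(1, 0)
seven_idle_gb = idle_cost_per_hour(0, 7)
print(f"ratio={ratio:.2f}, 1 idle core=${one_idle_core:.4f}/h, 7 idle GB=${seven_idle_gb:.4f}/h")
```

Even 7 GB of unused Memory costs less per hour than a single unused core, which is the reasoning behind tolerating some spare Memory but not spare CPU.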

Node Pool design
High CPU Node Pool
• Machine Type: n2-custom-36-73728
• Node Capacity: CPU 36 core, Memory 72 GB ( Ratio 1 : 2 )
High Memory Node Pool
• Machine Type: n2-highmem-16
• Node Capacity: CPU 16 core, Memory 128 GB ( Ratio 1 : 8 )

Workload Resource Rules
Always set Resource Requests and Limits
• For CPU Resource, keep the Limit within 2x the Request
• For Memory Resource, set the Request and the Limit to the same value
Split a Workload when it exceeds 50% of Node Capacity ( CPU / Memory )
• Increase Replicas to reduce the Resource Requests per Pod
• If it cannot be split, consider adding a dedicated Node Pool
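These rules are mechanical enough to check automatically. The following is a minimal sketch of such a check, assuming a hypothetical `Resources` record and `rule_violations` function; the slides do not say whether ABEMA enforces the rules with tooling or by review.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    cpu_request: float     # cores
    cpu_limit: float       # cores
    memory_request: float  # GB
    memory_limit: float    # GB


def rule_violations(r: Resources, node_cpu: float, node_memory: float) -> list[str]:
    """Return the Workload Resource Rules that the given resources break."""
    problems = []
    if r.cpu_limit > 2 * r.cpu_request:
        problems.append("CPU limit exceeds 2x the request")
    if r.memory_limit != r.memory_request:
        problems.append("Memory limit differs from the request")
    if r.cpu_request > 0.5 * node_cpu or r.memory_request > 0.5 * node_memory:
        problems.append("Requests exceed 50% of Node Capacity")
    return problems


# Checked against the High CPU Node Pool capacity (36 core / 72 GB).
ok = rule_violations(Resources(2, 4, 4, 4), node_cpu=36, node_memory=72)
too_big = rule_violations(Resources(20, 20, 4, 4), node_cpu=36, node_memory=72)
```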

Workload Deploy Rules
Resource Requests with less than 4 GB of Memory per 1 CPU core
• Place on the High CPU Node Pool
• Typical Workloads ( e.g. CPU 2 core / Memory 4 GB )
Resource Requests with 4 GB or more of Memory per 1 CPU core
• Place on the High Memory Node Pool
• Prometheus, Batch Cron Jobs, and similar ( e.g. CPU 2 core / Memory 8 GB )
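The deploy rule reduces to a single ratio test. A short sketch, in which the function name and the returned pool labels are placeholders rather than ABEMA's actual identifiers:

```python
def select_node_pool(cpu_cores: float, memory_gb: float) -> str:
    """Pick a Node Pool from the Memory-per-core ratio of the Resource Requests."""
    memory_per_core = memory_gb / cpu_cores
    # 4 GB per core or more goes to High Memory, anything below to High CPU.
    return "high-memory" if memory_per_core >= 4 else "high-cpu"


typical = select_node_pool(cpu_cores=2, memory_gb=4)      # 2 GB per core
prometheus = select_node_pool(cpu_cores=2, memory_gb=8)   # 4 GB per core
```

Both sample calls use the example Workloads given on the slide.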

Node Capacity Scalability
Issues
• Workload Replicas are controlled by HPA
• Node Capacity is wasted during hours when load is not concentrated
Example
• ABEMA peak hours are 10:00-16:00 and 19:00-25:00 ( JST )
• Concurrent connections ( CCU ) stay particularly low late at night, 2:00-7:00 AM

Node Pools with different Capacity
Two kinds of GKE Node Pool: fixed-size Fixed, and Auto Scale, which changes dynamically according to Resource Requests
• Fixed: the number of Nodes is always fixed
• Auto Scale: the number of Nodes changes dynamically

Placement suited to Workload characteristics
Set a Pod Disruption Budget ( PDB ) on each GKE Workload
• maxUnavailable specifies the share of Pods that may be evicted
PDB
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
spec:
  maxUnavailable: 30%
  selector:
    matchLabels:
      name: xxx
GKE Node Pool
• Fixed: the number of Nodes is always fixed
• Auto Scale: the number of Nodes changes dynamically
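How many Pods a percentage-based `maxUnavailable` actually allows depends on the replica count. The sketch below assumes the rounding behaviour described in the Kubernetes documentation, where a percentage is rounded up to the nearest whole Pod; the function name is illustrative.

```python
def max_unavailable_pods(replicas: int, max_unavailable_percent: int) -> int:
    """Number of Pods a PDB with a percentage maxUnavailable lets be evicted at once.

    Assumes the percentage is rounded up to a whole Pod. Integer arithmetic
    avoids floating-point surprises.
    """
    return (replicas * max_unavailable_percent + 99) // 100


for replicas in (3, 7, 10, 20):
    print(replicas, "replicas ->", max_unavailable_pods(replicas, 30), "evictable")
```

With the 30% value from the slide, a Workload with only 3 replicas can still lose 1 Pod at a time, so small Workloads are never completely blocked from eviction.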

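Placement suited to Workload characteristics
Set a Priority Class and Node Affinity on each GKE Workload
• Workloads that should avoid eviction: Priority Class High, Node Affinity Fixed
• Workloads that can tolerate eviction: Priority Class Middle, Node Affinity Fixed or Auto Scale, protected by a PDB
GKE Node Pool
• Fixed: the number of Nodes is always fixed
• Auto Scale: the number of Nodes changes dynamically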

Priority Class
Role
• Pod priority
• Eviction decisions follow the priority
Definitions
• Zero is the Global Default
• High and Middle are added
$ kubectl get priorityclass
NAME                      VALUE        GLOBAL-DEFAULT
system-cluster-critical   2000000000   false
system-node-critical      2000001000   false
gmp-critical              1000000000   false
high-priority             100          false
middle-priority           50           false
zero-priority             0            true
balloon-priority          -5           false

Kustomize Patch
Name
• public-cpu.yaml
Priority Class
• High
Node Affinity
• Fixed
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      priorityClassName: high-priority
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: type
                operator: In
                values:
                - high-cpu # fixed node pool

Kustomize Patch
Name
• public-cpu-asg.yaml
Priority Class
• Middle
Node Affinity
• Fixed (preferred) or Auto Scale
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      priorityClassName: middle-priority
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: type
                operator: In
                values:
                - high-cpu # fixed node pool
                - high-cpu-asg # auto scale node pool
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
              - key: type
                operator: In
                values:
                - high-cpu # fixed node pool

Selection for Kustomize Patch (figure)

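Toward rapid scale-out
Issues
• Scaling out Nodes requires startup time ( 20 to 40 seconds )
• Scaling out Workloads temporarily leaves Pods in the Pending state
Example
• Not a major problem when load increases gradually
• Spike access caused by breaking news or viral spread on SNS cannot be absorbed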

Securing surplus with Balloon
Rapid scale-out by placing Balloon Pods with a low Priority Class on the Auto Scale Node Pool
GKE Node Pool
• Fixed: the number of Nodes is always fixed
• Auto Scale: the number of Nodes changes dynamically
GKE Workload: Balloon Pod(s)
• Priority Class: Balloon
• Node Affinity: Auto Scale Only
• Resource Requests: Capacity * 0.6
• Pod Replicas: controlled by HPA based on CCU

Balloon Priority
Role
• Pod priority
• Eviction decisions follow the priority
Definitions
• Zero is the Global Default
• Balloon has the lowest priority
$ kubectl get priorityclass
NAME                      VALUE        GLOBAL-DEFAULT
system-cluster-critical   2000000000   false
system-node-critical      2000001000   false
gmp-critical              1000000000   false
high-priority             100          false
middle-priority           50           false
zero-priority             0            true
balloon-priority          -5           false

Balloon Deployment
Priority Class
• Immediately subject to eviction
Node Affinity
• Placed on Auto Scale
Capacity
• Node Capacity * 0.6
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: balloon
  name: balloon-high-cpu-asg
spec:
  template:
    spec:
      priorityClassName: balloon-priority # -5
      terminationGracePeriodSeconds: 0
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: type
                operator: In
                values:
                - high-cpu-asg # auto scale node pool
      containers:
      - resources:
          requests:
            cpu: 21600m # node capacity * 0.6
            memory: 43.2Gi # node capacity * 0.6
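The request values in the Balloon manifest follow directly from the High CPU Node Pool capacity (36 core, 72 GB) multiplied by 0.6. The sketch below reproduces them; `balloon_requests` is a hypothetical helper, and it mirrors the slide in writing the Memory figure with a `Gi` suffix even though the capacity is quoted in GB.

```python
def balloon_requests(node_cpu_cores: int, node_memory_gb: int,
                     factor_percent: int = 60) -> tuple[str, str]:
    """Return (cpu, memory) request strings for a Balloon Pod."""
    cpu_millicores = node_cpu_cores * 1000 * factor_percent // 100
    memory = node_memory_gb * factor_percent / 100
    return f"{cpu_millicores}m", f"{memory:g}Gi"


cpu, memory = balloon_requests(node_cpu_cores=36, node_memory_gb=72)

# With a factor above 50%, two Balloon Pods can never share one Node,
# so each Balloon replica keeps roughly one Node's worth of headroom warm.
balloons_per_node = 100 // 60
```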

Balloon HPA
Replicas
• Min 1
• Max 3
Custom Metrics
• Concurrent connections ( CCU )
• 150,000 per Pod
kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta2
metadata:
  name: balloon-public-cpu-asg
  namespace: balloon
spec:
  scaleTargetRef:
    kind: Deployment
    name: balloon-public-cpu-asg
  minReplicas: 1
  maxReplicas: 3
  behavior: # skip
  metrics:
  - type: External
    external:
      metric:
        name: "prometheus.googleapis.com|ccu_scale|gauge"
      target:
        type: AverageValue
        averageValue: 150000
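For an External metric with an `AverageValue` target, the HPA effectively divides the metric total by the target and rounds up. The sketch below applies that to the values on the slide. It is a simplification: it ignores the HPA tolerance band and the elided `behavior` section, and the function name is illustrative.

```python
import math


def balloon_replicas(ccu: int, target_per_pod: int = 150_000,
                     min_replicas: int = 1, max_replicas: int = 3) -> int:
    """Approximate the replica count the HPA would choose for a given CCU."""
    desired = math.ceil(ccu / target_per_pod)
    return max(min_replicas, min(max_replicas, desired))


for ccu in (0, 150_000, 150_001, 400_000, 1_000_000):
    print(f"CCU {ccu:>9,} -> {balloon_replicas(ccu)} Balloon Pod(s)")
```

Because the Balloon count rises with CCU, headroom on the Auto Scale Node Pool grows ahead of the real Workloads that will need it.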

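Optimizing surplus Capacity
Issues
• Underutilized Nodes are not scaled in
• Even after a manual scale-in attempt, the Node count returns within a few minutes
Example
• Balloon Pods that could be placed on other Nodes exist
• Manually draining an underutilized Node just causes a new Node to start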

GKE Autoscaling Profile
Types
• balanced: prioritizes availability in each zone ( default )
• optimize-utilization: prioritizes utilization and scales in aggressively

GKE Autoscaling Profile
Change the Autoscaling Profile to optimize-utilization
• balanced: default
• optimize-utilization: prioritizes utilization and scales in aggressively
Graph: Node count around the change of the Autoscaling Profile from balanced to optimize-utilization (labels: Minimum 43 node, Minimum 37 node)

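Uneven Workload placement
Issues
• Replicas sometimes end up on the same Node
• We want to minimize the Workload impact when the Auto Scale Node Pool shrinks
Example
• A sudden load surge concentrates Pod Replicas on newly added Nodes
• After a Zone failure is resolved, uneven Workload placement has to be corrected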

Descheduler for Kubernetes
A solution that relocates Pods according to 10 kinds of Strategy
Source: https://github.com/kubernetes-sigs/descheduler/blob/master/strategies_diagram.png

Descheduler Strategies
• RemoveDuplicates / LowNodeUtilization / HighNodeUtilization
• RemovePodsViolatingInterPodAntiAffinity
• RemovePodsViolatingNodeAffinity
• RemovePodsViolatingNodeTaints
• RemovePodsViolatingTopologySpreadConstraint
• RemovePodsHavingTooManyRestarts
• PodLifeTime / RemoveFailedPods

RemoveDuplicates
Strategy
• Evicts duplicate Pods on the same Node
• Evens out placement across Nodes
Target
• All Node Pools
• Below Priority Class High
apiVersion: v1
kind: ConfigMap
metadata:
  name: descheduler-policy-configmap-all
  namespace: kube-system
data:
  policy.yaml: |
    apiVersion: "descheduler/v1alpha1"
    kind: "DeschedulerPolicy"
    evictLocalStoragePods: true
    maxNoOfPodsToEvictPerNode: 1
    maxNoOfPodsToEvictPerNamespace: 1
    strategies:
      "RemoveDuplicates":
        enabled: true
        params:
          nodeFit: true # only target Pods that can be placed on another Node
          thresholdPriorityClassName: "high-priority"

HighNodeUtilization
Strategy
• Evicts Pods on surplus Nodes
• Encourages the Node Pool to shrink
Target
• Auto Scale Node Pool
• Below Priority Class High
apiVersion: v1
kind: ConfigMap
metadata:
  name: descheduler-policy-configmap-public-cpu-asg
  namespace: kube-system
data:
  policy.yaml: |
    apiVersion: "descheduler/v1alpha1"
    kind: "DeschedulerPolicy"
    evictLocalStoragePods: true
    nodeSelector: type=public-cpu-asg
    maxNoOfPodsToEvictPerNode: 1
    maxNoOfPodsToEvictPerNamespace: 1
    strategies:
      "HighNodeUtilization":
        enabled: true
        params:
          nodeFit: true # only target Pods that can be placed on another Node
          thresholdPriorityClassName: "high-priority"
          nodeResourceUtilizationThresholds:
            numberOfNodes: 1
            thresholds:
              "cpu": 30
              "memory": 30
              "pods": 30
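The thresholds in this policy decide which Nodes count as underutilized. As far as the Descheduler documentation describes it, a Node qualifies only when its usage is at or below the threshold for every listed resource. The following sketch models that test; the function and the sample usage figures are illustrative and are not Descheduler code.

```python
THRESHOLDS = {"cpu": 30, "memory": 30, "pods": 30}  # percent, from the slide's policy


def is_underutilized(usage_percent: dict[str, float],
                     thresholds: dict[str, float] = THRESHOLDS) -> bool:
    """True when usage is at or below the threshold for every resource."""
    return all(usage_percent[name] <= limit for name, limit in thresholds.items())


nearly_empty = is_underutilized({"cpu": 12, "memory": 20, "pods": 8})
memory_heavy = is_underutilized({"cpu": 12, "memory": 55, "pods": 8})
```

A Node that is light on CPU but holds a Memory-heavy Pod is therefore left alone, since evicting from it would not help the Node Pool shrink.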

PodLifeTime
Strategy
• Evicts long-lived Pods
• Encourages placement on Fixed Nodes
Target
• Auto Scale Node Pool
• Below Priority Class High
apiVersion: v1
kind: ConfigMap
metadata:
  name: descheduler-policy-configmap-public-cpu-asg
  namespace: kube-system
data:
  policy.yaml: |
    apiVersion: "descheduler/v1alpha1"
    kind: "DeschedulerPolicy"
    evictLocalStoragePods: true
    nodeSelector: type=public-cpu-asg
    maxNoOfPodsToEvictPerNode: 1
    maxNoOfPodsToEvictPerNamespace: 1
    strategies:
      "PodLifeTime":
        enabled: true
        params:
          thresholdPriorityClassName: "high-priority"
          podLifeTime:
            maxPodLifeTimeSeconds: 604800 # 7 days

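Stockout
Issues
• Stability matters most during large-scale events
• A sudden surge in demand can make Resources unavailable
Example
• The World Cup lasts about one month
• Thanksgiving and Black Friday in the United States fall within that period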

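Future Reservations
Commit Capacity for a future period in advance
Specify the Machine Type and the number of Nodes per Zone, and obtain approval from Google Cloud
Period for which Capacity is secured in advance (timeline figure)
• Dates shown: 11/20, 11/22, 11/23, 11/24, 11/27, 12/01, 12/03, 12/05, 12/09, 12/10, 12/13, 12/17, 12/18
• Stages: opening, round of 16 ( BEST16 ), quarter-finals, semi-finals, third-place match ( 12/17 ), final
• Japan matches: 11/23 Japan vs Germany, 11/27 Japan vs Costa Rica, 12/01 Japan vs Spain, 12/05 Japan vs Croatia
• Other matches: Qatar vs Ecuador, Argentina vs Saudi Arabia ( 11/22 ), Spain vs Germany, Netherlands vs USA, Netherlands vs Argentina, England vs France, France vs Argentina
• Thanksgiving & Black Friday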

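Prioritizing stability above all
Place all Workloads on the Fixed Node Pool to secure Capacity
GKE Workload
• Workloads that should avoid eviction: Priority Class High, Node Affinity Fixed
• Workloads that can tolerate eviction: Priority Class Middle, Node Affinity Fixed or Auto Scale, protected by a PDB
GKE Node Pool
• Fixed: the Capacity secured with Future Reservations runs at all times
• Auto Scale: hosts Balloon Pods only, and normally runs in a minimal configuration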

Capacity during large-scale events
Capacity that allows for a single-zone failure in the GKE Tokyo region is secured with Future Reservations
• About 6x the normal Capacity
• Node 300+
• CPU 12,000+ core
• Memory 30+ TiB

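Disaster-class spike access
Issues
• Match developments and viral spread on SNS cause momentary, extremely large bursts of access
• Always provisioning for momentary spike access wastes a great deal of cost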

Disaster-class spike access
Example
• Earthquakes
• Incidents
• Peaky access from all over Japan

Disaster-class spike access
Example
• Tweets by celebrities
• Peaky access from all over the world

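Inflow control
Throttling mechanism at the CDN layer
• When an acceptable threshold is exceeded, clients are asked to wait for a few seconds
• Eases the load on the backend and buys time for scale-out
Startup / resume sequence (diagram)
❶ Rate Check
❷ The API Request is sent only when the Rate Check succeeds
Components and labels shown: Amazon Route 53, Akamai, Amazon CloudFront, API Gateway, ABEMA API, Active, Passive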
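The slides do not describe how the Rate Check is implemented at the CDN. As one possible illustration only, the sketch below uses a fixed-window counter: requests within the limit pass, and the rest are told to wait a few seconds before retrying. The class name, limits, and wait time are all assumptions.

```python
class RateCheck:
    """Fixed-window rate check: admit up to `limit` requests per window."""

    def __init__(self, limit: int, window_seconds: int, retry_after_seconds: int):
        self.limit = limit
        self.window_seconds = window_seconds
        self.retry_after_seconds = retry_after_seconds
        self.current_window = None
        self.count = 0

    def check(self, now: float) -> tuple[bool, int]:
        """Return (allowed, seconds the client should wait before retrying)."""
        window = int(now // self.window_seconds)
        if window != self.current_window:
            # A new window has started, so reset the counter.
            self.current_window = window
            self.count = 0
        if self.count >= self.limit:
            return False, self.retry_after_seconds
        self.count += 1
        return True, 0


rate_check = RateCheck(limit=2, window_seconds=1, retry_after_seconds=3)
results = [rate_check.check(t) for t in (0.0, 0.1, 0.2, 1.0)]
```

Only clients whose check succeeds go on to send the API Request, so the backend sees a bounded inflow while Nodes and Pods scale out.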

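GKE scaling tactics summary
1. Node Pool design specialized by Resource Type
2. Node Pool design that follows changes in Resource Requests
3. Proactive Node Pool scale-out that follows changes in concurrent connections ( CCU )
4. Aggressive scale-in of Node Pools with large surplus Capacity
5. A mechanism that automatically corrects uneven Workload placement across Nodes
6. Stockout countermeasures for large-scale events
7. Preparing for disaster-class spike access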

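GKE scaling tactics summary
Normal times
• A strategy focused on scaling that closely tracks requests
During large-scale events
• A strategy that spends what is necessary, anticipates every situation, and puts stability first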

GKE scaling tactics summary
Normal times
• Average CPU utilization over the last 30 days is 93.2%

7 Anthos Service Mesh Techniques

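Anthos Service Mesh use cases
Before adoption
• Service-to-service communication used client-side load balancing through a common SDK
• Network processing was embedded in the application layer
Adopted across the board
• Delegate Network processing to ASM and focus on application development
• Take full advantage of ASM to build robust applications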

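Anthos Service Mesh techniques
1. Working with access logs
2. Achieving safe shutdown
3. Controlling where connection Endpoints are exported
4. Self-healing through retries
5. Localizing failures with circuit breakers
6. Failure testing with Fault Injection
7. Identifying bottlenecks with distributed tracing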

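Data Plane access logs
Issues
• We want to inspect the Data Plane ( Envoy ) access logs
• Emitting them in bulk increases Cloud Logging cost
Example
• Needed when investigating problems such as connectivity failures
• The application itself does not implement access logs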

Access log control
Settings
• Disabled by default
• Standard Provider ( TEXT )
• Custom Provider ( JSON )
kind: ConfigMap
apiVersion: v1
metadata:
  name: istio-asm-managed
  namespace: istio-system
data:
  mesh: |
    … skip …
    extensionProviders:
    - name: envoy_access_log_json
      envoyFileAccessLog:
        path: /dev/stdout
        logFormat:
          labels: {}

Access log control
Settings
• Each consumer defines a Telemetry resource
• A Selector specifies where it applies
• The Provider used for log output is specified
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  namespace: example
  name: gateway
spec:
  selector:
    matchLabels:
      name: gateway
  accessLogging: # Enable to access logging of envoy
  - providers:
    - name: envoy_access_log_json

Access log control
Settings
• Cloud Logging exclusion Rule
• Only 4xx and 5xx are ingested
resource "google_logging_project_sink" "default" {
  … skip …
  exclusions {
    name     = "ExclusionIstioProxyAccessLog"
    disabled = false
    filter   = <<-EOT
      (
        logName = (
          "projects/xxx/logs/stdout" OR
          "projects/xxx/logs/server-accesslog-stackdriver"
        ) AND
        resource.labels.container_name = "istio-proxy" AND
        httpRequest.status < "400"
      )
    EOT
  }
}
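To see which entries the exclusion filter drops, the predicate can be modelled in a few lines. This sketch flattens a log entry into a plain dictionary with hypothetical keys (`logName`, `container_name`, `status`); real Cloud Logging entries are nested, and the filter itself is evaluated by Cloud Logging, not by code like this.

```python
EXCLUDED_LOG_NAMES = {
    "projects/xxx/logs/stdout",
    "projects/xxx/logs/server-accesslog-stackdriver",
}


def is_excluded(entry: dict) -> bool:
    """Mirror the exclusion filter: istio-proxy access logs with status below 400."""
    status = entry.get("status")
    return (
        entry.get("logName") in EXCLUDED_LOG_NAMES
        and entry.get("container_name") == "istio-proxy"
        and status is not None
        and status < 400
    )


ok_log = {"logName": "projects/xxx/logs/stdout", "container_name": "istio-proxy", "status": 200}
error_log = {**ok_log, "status": 503}
app_log = {**ok_log, "container_name": "app"}
```

Successful proxy requests are excluded, while 4xx and 5xx responses and every log line from other containers are still ingested.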

Shutdown handling
Issues
• With Multi Container Pods, the Life Cycle has to be understood correctly
• Without proper shutdown handling, 5xx errors occur when a Pod terminates
Example
• Some requests return 5xx during a Rolling Update
• Graceful Shutdown in the Application alone does not resolve it

Container Life Cycle
Use these settings to shut down in the correct order
• Pod
  ◦ spec.terminationGracePeriodSeconds
• Application Container
  ◦ Lifecycle Pre Stop
• Data Plane Container
  ◦ MINIMUM_DRAIN_DURATION / EXIT_ON_ZERO_ACTIVE_CONNECTIONS

Pod
terminationGracePeriodSeconds
• Maximum time allowed for termination
apiVersion: apps/v1
kind: Deployment
metadata:
  name: xxx
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 30

BackendConfig
drainingTimeoutSec
• How long the LB keeps connections open
• Set to the Pod's terminationGracePeriodSeconds plus a few seconds
healthCheck
• Tune appropriately from the defaults
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: xxx
spec:
  connectionDraining:
    drainingTimeoutSec: 35
  healthCheck:
    checkIntervalSec: 5 # interval (Default: 15)
    timeoutSec: 2 # timeout (Default: 5)
    healthyThreshold: 1 # successes required to be judged healthy (Default: 1)
    unhealthyThreshold: 2 # failures required to be judged unhealthy (Default: 2)
    type: HTTP
    requestPath: /
    port: 8000

Application
lifecycle.preStop
• Processing that runs before SIGTERM
• Period during which new connections are still accepted
apiVersion: apps/v1
kind: Deployment
metadata:
  name: xxx
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 30
      containers:
      - name: app
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "sleep 10"]

Data Plane ( Pilot Agent & Envoy )
drainDuration & MINIMUM_DRAIN_DURATION
• Period during which new connections are still accepted
EXIT_ON_ZERO_ACTIVE_CONNECTIONS
• Enables shutdown based on Active Connections
apiVersion: apps/v1
kind: Deployment
metadata:
  name: xxx
spec:
  template:
    metadata:
      annotations:
        traffic.sidecar.istio.io/excludeInboundPorts: "30001" # annotation values must be strings
        proxy.istio.io/config: |
          drainDuration: '10s'
          proxyMetadata:
            MINIMUM_DRAIN_DURATION: '10s'
            EXIT_ON_ZERO_ACTIVE_CONNECTIONS: 'true'

Data Plane ( Pilot Agent & Envoy )
traffic.sidecar.istio.io/excludeInboundPorts
• Ports excluded from ASM
• Separate multiple ports with commas
• Excluded from the Active Connection count
apiVersion: apps/v1
kind: Deployment
metadata:
  name: xxx
spec:
  template:
    metadata:
      annotations:
        traffic.sidecar.istio.io/excludeInboundPorts: "30001" # annotation values must be strings
        proxy.istio.io/config: |
          drainDuration: '10s'
          proxyMetadata:
            MINIMUM_DRAIN_DURATION: '10s'
            EXIT_ON_ZERO_ACTIVE_CONNECTIONS: 'true'
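The shutdown settings across the Pod, the application container, the sidecar, and the load balancer only work together if their durations nest correctly. The sketch below checks the values shown on these slides against three relationships: the preStop sleep and the proxy drain must finish inside the grace period, and the LB draining timeout must exceed it. The function name and the exact wording of the checks are this sketch's own reading of the slides, not an official rule set.

```python
def shutdown_budget_problems(grace: int, pre_stop_sleep: int,
                             proxy_min_drain: int, lb_draining_timeout: int) -> list[str]:
    """Return inconsistencies between shutdown-related durations (all in seconds)."""
    problems = []
    if pre_stop_sleep >= grace:
        problems.append("preStop sleep leaves no time for the application to shut down")
    if proxy_min_drain >= grace:
        problems.append("proxy drain does not finish within the grace period")
    if lb_draining_timeout <= grace:
        problems.append("LB may drop connections before the Pod has terminated")
    return problems


# Values from the slides: grace 30s, preStop sleep 10s, drain 10s, LB draining 35s.
slide_values = shutdown_budget_problems(30, 10, 10, 35)
too_short_lb = shutdown_budget_problems(30, 10, 10, 30)
```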

Data Plane ( Envoy ) Memory usage
Issues
• Endpoint information is propagated by the Endpoint Discovery Service
• By default, Endpoint information is propagated to all Namespaces
Example
• Envoy Memory usage is higher in the PRD environment than in the DEV environment
• In some cases, heavy Memory use at startup causes an OOM

Controlling where connection Endpoints are exported
When consumers are limited to some Namespaces ( NS ), restrict the export targets to reduce Memory usage
• Service Resource
  ◦ Specify the Annotation ( networking.istio.io/exportTo )
• VirtualService Resource
  ◦ Specify spec.exportTo

Service
networking.istio.io/exportTo
• Specifies the NS to export to
• Unrestricted by default
Example
• The reserved word "." limits export to the same NS as the resource itself
apiVersion: v1
kind: Service
metadata:
  namespace: "service-mesh-example"
  name: "news"
  annotations:
    networking.istio.io/exportTo: "." # separate multiple values with commas

VirtualService
spec.exportTo
• Specifies the NS to export to
• Unrestricted by default
Example
• The reserved word "." limits export to the same NS as the resource itself
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  namespace: "service-mesh-example"
  name: "news"
spec:
  exportTo:
  - "."

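Appropriate retry handling
Issues
• Transient failures occur for a variety of reasons
• Retries based on response codes, response times, and similar signals are effective
Example
• Exponential Backoff Retry has to be implemented correctly
• The level of implementation differs from application to application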

Retry handling with ASM
VirtualService
• Retries can be configured based on response codes and response times
• Retries go to a different Pod, which avoids transient Pod-level failures

VirtualService
Retry up to 2 times
• attempts
1-second timeout per try
• perTryTimeout
Codes that trigger a retry
• retryOn
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: viewing-history
  namespace: viewing-history
spec:
  hosts:
  - viewing-history.viewing-history.svc.cluster.local
  http:
  - name: default
    retries:
      attempts: 2
      perTryTimeout: 1s
      retryOn: 5xx,unavailable,internal,deadline-exceeded,cancelled
    route:
    - destination:
        host: viewing-history.viewing-history.svc.cluster.local
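The effect of `attempts: 2` can be illustrated with a small model: one initial try plus up to two retries while the outcome is retryable. This is a sketch of the policy's semantics, not of Envoy's implementation; it ignores the backoff interval between tries, and `send` stands in for whatever performs the request.

```python
from typing import Callable

RETRY_ON = {"5xx", "unavailable", "internal", "deadline-exceeded", "cancelled"}


def call_with_retries(send: Callable[[], str], attempts: int = 2) -> tuple[str, int]:
    """Call send() once, then retry up to `attempts` times while the outcome is retryable."""
    tries = 0
    outcome = ""
    while tries <= attempts:
        outcome = send()
        tries += 1
        if outcome not in RETRY_ON:
            break
    return outcome, tries


def worst_case_seconds(attempts: int = 2, per_try_timeout: float = 1.0) -> float:
    """Upper bound on time spent in tries, excluding backoff between them."""
    return (1 + attempts) * per_try_timeout


outcomes = iter(["5xx", "unavailable", "ok"])
recovered = call_with_retries(lambda: next(outcomes))
exhausted = call_with_retries(lambda: "5xx")
```

With the values on the slide, a caller waits roughly 3 seconds at most before a persistent failure is surfaced, which is worth keeping in mind when setting the caller's own timeout.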

Visualizing Network Metrics
Why it is needed
• Detect early when retries keep occurring
• Investigate the details through Alerts based on the number of retries
Visualizing Retry Metrics
• Ingest Envoy's envoy_cluster_upstream_rq_retry into Prometheus
• Visualize with Grafana and send notifications with Grafana Unified Alert

Visualizing Network Metrics
Retry Metrics
• envoy_cluster_upstream_rq_retry
• envoy_cluster_upstream_rq_retry_backoff_exponential
• envoy_cluster_upstream_rq_retry_limit_exceeded
• envoy_cluster_upstream_rq_retry_success

Istio Document
Defaults
• Only a minimal set of Metrics is emitted
• Network Metrics are not included
Caveats
• All Endpoints are covered
• CPU / Memory usage increases
apiVersion: apps/v1
kind: Deployment
metadata:
  name: <SRC_POD_NAME>
spec:
  template:
    metadata:
      annotations:
        proxy.istio.io/config: |
          proxyStatsMatcher:
            inclusionRegexps:
            - ".*upstream_rq_.*"

Recommended procedure at ABEMA
Add the destinations for which Network Metrics should be emitted
apiVersion: apps/v1
kind: Deployment
metadata:
  name: <SRC_POD_NAME>
spec:
  template:
    metadata:
      annotations:
        proxy.istio.io/config: |
          proxyStatsMatcher:
            inclusionPrefixes:
            - "cluster.outbound|<DST_POD_PORT>||<DST_POD_NAME>.<DST_POD_NAMESPACE>.svc.cluster.local"

Checking the configuration
Envoy Config Dump API
• The setting is applied correctly
• The destinations are restricted
$ curl http://localhost:15000/config_dump | jq '.configs[].bootstrap.stats_config.stats_matcher'
{
  "inclusion_list": {
    "patterns": [
      { "prefix": "reporter=" },
      { "prefix": "cluster_manager" },
      { "prefix": "listener_manager" },
      { "prefix": "server" },
      { "prefix": "cluster.xds-grpc" },
      { "prefix": "wasm" },
      { "prefix": "component" },
      { "prefix": "cluster.outbound|<Port>|<Name>.<NS>.svc.cluster.local" },
      { "suffix": "rbac.allowed" },
      { "suffix": "rbac.denied" },
      { "suffix": "shadow_allowed" },
      { "suffix": "shadow_denied" },
      { "suffix": "downstream_cx_active" }
    ]
  }
}

Checking the output
The Network Metrics for the specified destination appear in the Prometheus Stats
$ curl -s http://localhost:15020/stats/prometheus \
  | grep envoy_cluster_upstream_rq_retry \
  | grep -v "0$" \
  | grep -v '^#'
envoy_cluster_upstream_rq_retry{cluster_name="outbound|<Port>|<Name>.<NS>.svc.cluster.local"} 4
envoy_cluster_upstream_rq_retry_backoff_exponential{cluster_name="outbound|<Port>|<Name>.<NS>.svc.cluster.local"} 4
envoy_cluster_upstream_rq_retry_limit_exceeded{cluster_name="outbound|<Port>|<Name>.<NS>.svc.cluster.local"} 1
envoy_cluster_upstream_rq_retry_success{cluster_name="outbound|<Port>|<Name>.<NS>.svc.cluster.local"} 2
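The same filtering can be done in Python once the stats text has been fetched. The sketch below takes the text as a string argument, so it makes no assumption about how it is retrieved. Comparing values numerically also avoids a side effect of `grep -v "0$"`, which would hide a counter whose value merely ends in 0, such as 10. The function name and the sample text are illustrative.

```python
def nonzero_retry_metrics(stats_text: str) -> dict[str, float]:
    """Return non-zero envoy_cluster_upstream_rq_retry* series from Prometheus text."""
    result = {}
    for line in stats_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP / TYPE comments
        if not line.startswith("envoy_cluster_upstream_rq_retry"):
            continue
        series, _, value = line.rpartition(" ")
        if float(value) != 0:
            result[series] = float(value)
    return result


sample = """\
# TYPE envoy_cluster_upstream_rq_retry counter
envoy_cluster_upstream_rq_retry{cluster_name="outbound|80||news.example.svc.cluster.local"} 10
envoy_cluster_upstream_rq_retry_overflow{cluster_name="outbound|80||news.example.svc.cluster.local"} 0
envoy_cluster_upstream_rq_total{cluster_name="outbound|80||news.example.svc.cluster.local"} 250
"""
found = nonzero_retry_metrics(sample)
```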

1. envoy_cluster_assignment_stale 2. envoy_cluster_assignment_timeout_received 3. envoy_cluster_bind_errors 4. envoy_cluster_circuit_breakers_default_cx_open
5. envoy_cluster_circuit_breakers_default_cx_pool_open 6. envoy_cluster_circuit_breakers_default_remaining_cx_pools 7. envoy_cluster_circuit_breakers_default_remaining_cx 8. envoy_cluster_circuit_breakers_default_remaining_pending 9. envoy_cluster_circuit_breakers_default_remaining_retries 10. envoy_cluster_circuit_breakers_default_remaining_rq 11. envoy_cluster_circuit_breakers_default_rq_open 12. envoy_cluster_circuit_breakers_default_rq_pending_open 13. envoy_cluster_circuit_breakers_default_rq_retry_open 14. envoy_cluster_circuit_breakers_high_cx_open 15. envoy_cluster_circuit_breakers_high_cx_pool_open 16. envoy_cluster_circuit_breakers_high_rq_open 17. envoy_cluster_circuit_breakers_high_rq_pending_open 18. envoy_cluster_circuit_breakers_high_rq_retry_open 19. envoy_cluster_client_ssl_socket_factory_downstream_context_secrets_not_ready 20. envoy_cluster_client_ssl_socket_factory_ssl_context_update_by_sds 21. envoy_cluster_client_ssl_socket_factory_upstream_context_secrets_not_ready 22. envoy_cluster_default_total_match_count 23. envoy_cluster_init_fetch_timeout 24. envoy_cluster_lb_healthy_panic 25. envoy_cluster_lb_local_cluster_not_ok 26. envoy_cluster_lb_recalculate_zone_structures 27. envoy_cluster_lb_subsets_active 28. envoy_cluster_lb_subsets_created 29. envoy_cluster_lb_subsets_fallback_panic 30. envoy_cluster_lb_subsets_fallback 31. envoy_cluster_lb_subsets_removed 32. envoy_cluster_lb_subsets_selected 33. envoy_cluster_lb_zone_cluster_too_small 34. envoy_cluster_lb_zone_no_capacity_left 35. envoy_cluster_lb_zone_number_differs 36. envoy_cluster_lb_zone_routing_all_directly 37. envoy_cluster_lb_zone_routing_cross_zone 38. envoy_cluster_lb_zone_routing_sampled 39. envoy_cluster_max_host_weight 40. envoy_cluster_membership_change 41. envoy_cluster_membership_degraded 42. envoy_cluster_membership_excluded 43. envoy_cluster_membership_healthy 44. envoy_cluster_membership_total 45. envoy_cluster_metadata_exchange_alpn_protocol_found 46. 
envoy_cluster_metadata_exchange_alpn_protocol_not_found 47. envoy_cluster_metadata_exchange_header_not_found 48. envoy_cluster_metadata_exchange_initial_header_not_found 49. envoy_cluster_metadata_exchange_metadata_added 50. envoy_cluster_original_dst_host_invalid 51. envoy_cluster_retry_or_shadow_abandoned 52. envoy_cluster_ssl_connection_error 53. envoy_cluster_ssl_fail_verify_cert_hash 54. envoy_cluster_ssl_fail_verify_error 55. envoy_cluster_ssl_fail_verify_no_cert 56. envoy_cluster_ssl_fail_verify_san 57. envoy_cluster_ssl_handshake 58. envoy_cluster_ssl_no_certificate 59. envoy_cluster_ssl_ocsp_staple_failed 60. envoy_cluster_ssl_ocsp_staple_omitted 61. envoy_cluster_ssl_ocsp_staple_requests 62. envoy_cluster_ssl_ocsp_staple_responses 63. envoy_cluster_ssl_session_reused 64. envoy_cluster_tlsMode_disabled_total_match_count 65. envoy_cluster_tlsMode_istio_total_match_count 66. envoy_cluster_update_attempt 67. envoy_cluster_update_duration_bucket 68. envoy_cluster_update_duration_count 69. envoy_cluster_update_duration_sum 70. envoy_cluster_update_empty 71. envoy_cluster_update_failure 72. envoy_cluster_update_no_rebuild 73. envoy_cluster_update_rejected 74. envoy_cluster_update_success 75. envoy_cluster_update_time 76. envoy_cluster_upstream_cx_active 77. envoy_cluster_upstream_cx_close_notify 78. envoy_cluster_upstream_cx_connect_attempts_exceeded 79. envoy_cluster_upstream_cx_connect_fail 80. envoy_cluster_upstream_cx_connect_ms_bucket 81. envoy_cluster_upstream_cx_connect_ms_count 82. envoy_cluster_upstream_cx_connect_ms_sum 83. envoy_cluster_upstream_cx_connect_timeout 84. envoy_cluster_upstream_cx_connect_with_0_rtt 85. envoy_cluster_upstream_cx_destroy_local_with_active_rq 86. envoy_cluster_upstream_cx_destroy_local 87. envoy_cluster_upstream_cx_destroy_remote_with_active_rq 88. envoy_cluster_upstream_cx_destroy_remote 89. envoy_cluster_upstream_cx_destroy_with_active_rq 90. envoy_cluster_upstream_cx_destroy 104 91. 
envoy_cluster_upstream_cx_http1_total 92. envoy_cluster_upstream_cx_http2_total 93. envoy_cluster_upstream_cx_http3_total 94. envoy_cluster_upstream_cx_idle_timeout 95. envoy_cluster_upstream_cx_length_ms_bucket 96. envoy_cluster_upstream_cx_length_ms_count 97. envoy_cluster_upstream_cx_length_ms_sum 98. envoy_cluster_upstream_cx_max_duration_reached 99. envoy_cluster_upstream_cx_max_requests 100. envoy_cluster_upstream_cx_none_healthy 101. envoy_cluster_upstream_cx_overflow 102. envoy_cluster_upstream_cx_pool_overflow 103. envoy_cluster_upstream_cx_protocol_error 104. envoy_cluster_upstream_cx_rx_bytes_buffered 105. envoy_cluster_upstream_cx_rx_bytes_total 106. envoy_cluster_upstream_cx_total 107. envoy_cluster_upstream_cx_tx_bytes_buffered 108. envoy_cluster_upstream_cx_tx_bytes_total 109. envoy_cluster_upstream_flow_control_backed_up_total 110. envoy_cluster_upstream_flow_control_drained_total 111. envoy_cluster_upstream_flow_control_paused_reading_total 112. envoy_cluster_upstream_flow_control_resumed_reading_total 113. envoy_cluster_upstream_internal_redirect_failed_total 114. envoy_cluster_upstream_internal_redirect_succeeded_total 115. envoy_cluster_upstream_rq_active 116. envoy_cluster_upstream_rq_cancelled 117. envoy_cluster_upstream_rq_completed 118. envoy_cluster_upstream_rq_maintenance_mode 119. envoy_cluster_upstream_rq_max_duration_reached 120. envoy_cluster_upstream_rq_pending_active 121. envoy_cluster_upstream_rq_pending_failure_eject 122. envoy_cluster_upstream_rq_pending_overflow 123. envoy_cluster_upstream_rq_pending_total 124. envoy_cluster_upstream_rq_per_try_idle_timeout 125. envoy_cluster_upstream_rq_per_try_timeout 126. envoy_cluster_upstream_rq_retry_backoff_exponential 127. envoy_cluster_upstream_rq_retry_backoff_ratelimited 128. envoy_cluster_upstream_rq_retry_limit_exceeded 129. envoy_cluster_upstream_rq_retry_overflow 130. envoy_cluster_upstream_rq_retry_success 131. envoy_cluster_upstream_rq_retry 132. 
envoy_cluster_upstream_rq_rx_reset 133. envoy_cluster_upstream_rq_timeout 134. envoy_cluster_upstream_rq_total 135. envoy_cluster_upstream_rq_tx_reset 136. envoy_cluster_version

136 kinds of Metrics are emitted for the specified Endpoint. Many of them are not actually used, so we are looking for a way to specify both the Endpoint and the Metrics.

Visualizing Network Metrics
Monitoring of Network Metrics is strengthened for important service-to-service communication.
Metrics on the dashboard: envoy_cluster_upstream_rq_total, envoy_cluster_upstream_rq_time_bucket, envoy_cluster_upstream_rq, envoy_cluster_upstream_rq_timeout, envoy_cluster_upstream_rq_retry

Visualizing Network Metrics (Retry)
rate(envoy_cluster_upstream_rq_retry{platform="$platform", platform_id="$platform_id", region="$region", cluster="$cluster", namespace="$namespace", service="$service", cluster_name!="xds-grpc"}[1m])

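An architecture that fails small
Challenges
• A failure in one service cascades to the whole system
• The architecture needs to be improved so that it fails small
Example
• ABEMA is made up of more than 300 microservices
• In particular, we want to keep slowdowns in external services from cascading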

Circuit Breaker with ASM
DestinationRules
• Cut off traffic according to the number of Active Requests or the number of 5xx errors
• Keeps the backend from receiving more load than necessary

Circuit Breaker with ASM
Active Requests
• Judged from Envoy's max_requests
• Not Requests per Second (RPS) but the number of requests in flight at that instant
• For example, with a Capacity of 160 RPS and an average Latency of 200 ms, the value is 32 (160 × 0.2 s)
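The figure of 32 on this slide follows from Little's Law (in-flight requests = arrival rate × average time in the system). A minimal sketch of that arithmetic; the function name is hypothetical and not taken from the slides:

```python
def max_active_requests(capacity_rps: float, avg_latency_seconds: float) -> float:
    """Little's Law: in-flight requests = arrival rate x average latency."""
    return capacity_rps * avg_latency_seconds


# Figures from the slide: Capacity 160 RPS, average Latency 200 ms.
limit = max_active_requests(160, 0.200)
print(round(limit))
```

The same function can be used to translate a measured RPS capacity and latency into a candidate http2MaxRequests value, which would then still need validation under load.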

Configuration example
Active Requests of 4000 or more
• http2MaxRequests
Five or more consecutive 5xx errors
• consecutive5xxErrors

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: chat-server
  namespace: chat
spec:
  host: chat-server.chat.svc.cluster.local
  trafficPolicy:
    connectionPool:
      http:
        http2MaxRequests: 4000
    outlierDetection:
      baseEjectionTime: 30s
      consecutive5xxErrors: 5
      maxEjectionPercent: 30
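To illustrate what an Active Requests cap such as http2MaxRequests: 4000 means, here is a simplified single-process sketch of a limiter that rejects requests once the in-flight count reaches the cap. It is an illustration only, not Envoy's implementation; the class and method names are hypothetical.

```python
class ActiveRequestLimiter:
    """Rejects new requests once the number of in-flight requests hits a cap."""

    def __init__(self, max_requests: int) -> None:
        self.max_requests = max_requests
        self.active = 0       # requests currently in flight
        self.overflowed = 0   # requests rejected because the cap was reached

    def try_acquire(self) -> bool:
        """Admit a request if there is room, otherwise count an overflow."""
        if self.active >= self.max_requests:
            self.overflowed += 1
            return False
        self.active += 1
        return True

    def release(self) -> None:
        """Mark one in-flight request as finished."""
        if self.active > 0:
            self.active -= 1


limiter = ActiveRequestLimiter(max_requests=4000)
accepted = sum(limiter.try_acquire() for _ in range(4005))
print(accepted, limiter.overflowed)

limiter.release()              # one request completes
print(limiter.try_acquire())   # so the next one is admitted
```

The point of the sketch is that the limit applies to concurrency, not to throughput: a fast backend can serve far more than 4000 requests per second without ever reaching 4000 in flight.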

Visualizing the Circuit Breaker
Request Metrics
• Ingest Istio's istio_requests_total into Prometheus
• Visualize with Grafana and notify with Grafana Unified Alerting

Visualizing the Circuit Breaker (response flag UO is Envoy's upstream overflow, i.e. circuit breaking)
sum by (response_code, response_flags, destination_workload, source_workload, source_workload_namespace) (rate(istio_requests_total{reporter="source", request_protocol="http", response_flags="UO"}[1m]))

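Why failure testing is hard
Challenges
• On paper, we achieved an architecture in which failures do not cascade
• Running failure tests that confirm it behaves as expected is difficult
Example
• Coordination with the owners of related services is required
• Implementing partial slowdowns in the application costs engineering effort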

Fault Injection with ASM
VirtualService
• Delays and error status codes such as 5xx can be injected at a fixed percentage
• Because this is done at the Network Layer, no application changes are needed

Configuration example
Inject a 10-second delay into 100% of requests
• delay.fixedDelay
• delay.percentage
Inject a 500 Error into 5% of requests
• abort.httpStatus
• abort.percentage

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: targeting-server
  namespace: user-targeting
spec:
  hosts:
  - targeting-server.user-targeting.svc.cluster.local
  http:
  - name: targeting-server
    fault:
      delay:
        fixedDelay: 10s
        percentage:
          value: 100
      abort:
        httpStatus: 500
        percentage:
          value: 5
    route:
    - destination:
        host: targeting-server.user-targeting.svc.cluster.local
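The effect of these percentages can be sketched with a small simulation. This is a simplified model, not the proxy's implementation: it assumes the delay and abort decisions are independent random draws per request, and the function name and defaults are hypothetical (the defaults mirror the values in the configuration example).

```python
import random


def inject_fault(rng, delay_pct=100.0, delay_seconds=10.0,
                 abort_pct=5.0, abort_status=500):
    """Return (added_delay_seconds, injected_status_or_None) for one request."""
    # Assumption: delay and abort are decided by independent draws.
    delay = delay_seconds if rng.random() * 100 < delay_pct else 0.0
    status = abort_status if rng.random() * 100 < abort_pct else None
    return delay, status


rng = random.Random(0)  # fixed seed so the run is repeatable
results = [inject_fault(rng) for _ in range(100_000)]
delayed = sum(1 for delay, _ in results if delay > 0)
aborted = sum(1 for _, status in results if status == 500)
print(f"delayed: {delayed / len(results):.1%}, aborted: {aborted / len(results):.1%}")
```

Under this model every request is delayed by 10 seconds and roughly one in twenty also receives an injected 500, which is the kind of partial slowdown the callers' timeouts, retries, and circuit breakers are expected to absorb.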

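Why microservices are hard
Challenges
• A microservice architecture in which each service talks to several others
• Understanding the dependencies between services and measuring performance is difficult
Example
• ABEMA is made up of more than 300 microservices
• In integrated load tests, there were cases where the bottleneck was hard to pinpoint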

Distributed tracing with ASM
Global Setting
• The default Sampling Rate is 0%
Workload Setting
• The Sampling Rate is changed on the Workload where requests originate

Configuration example: head-based sampling that respects the caller's decision

ASM Global Setting:

apiVersion: v1
kind: ConfigMap
metadata:
  name: istio-asm-managed
  namespace: istio-system
data:
  mesh: |
    defaultConfig:
      tracing:
        stackdriver: {}
        sampling: 0

Override Sampling Rate:

annotations:
  proxy.istio.io/config: |
    tracing:
      stackdriver: {}
      sampling: 0.05

Diagram labels: GKE, Cloud Trace, NS: istio-system, NS: chat, Pod api, Pod video-clip, Pod text-filter, Sidecar Injection by Admission Webhook, Sampling 0.00%, Sampling 0.00%, Sampling 0.05%
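Head-based sampling with caller-respecting propagation can be sketched as follows: the workload where the request originates makes the sampling decision once, and downstream workloads carry that decision forward regardless of their own rate. The names below are hypothetical, and the rates are percentages, matching the 0 and 0.05 values in the configuration example.

```python
import random
from dataclasses import dataclass


@dataclass
class TraceContext:
    trace_id: str
    sampled: bool


def start_trace(trace_id: str, sampling_pct: float, rng: random.Random) -> TraceContext:
    """Origin workload: decide once whether this request is traced."""
    return TraceContext(trace_id, rng.random() * 100 < sampling_pct)


def continue_trace(ctx: TraceContext, local_sampling_pct: float) -> TraceContext:
    """Downstream workload: honor the caller's decision; the local rate is not consulted."""
    return TraceContext(ctx.trace_id, ctx.sampled)


rng = random.Random(0)
# Origin workload overrides the global 0% default with 0.05%.
origin = [start_trace(f"t{i}", 0.05, rng) for i in range(200_000)]
# Downstream workloads keep the global default of 0% but follow the caller.
downstream = [continue_trace(ctx, 0.0) for ctx in origin]

sampled = sum(ctx.sampled for ctx in downstream)
print(f"sampled {sampled} of {len(downstream)} requests end to end")
```

With this arrangement only the origin workload needs an override, and a sampled request stays sampled across every hop, so each recorded trace is complete.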

Configuration example (same diagram and settings as the previous slide): the Sampling Rate is set on the Workload where the request originates.

Anthos Service Mesh techniques: summary
4. Automatic recovery through retries
5. Localizing failures with circuit breakers
6. Failure testing with Fault Injection
7. Pinpointing bottlenecks with distributed tracing

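Anthos Service Mesh techniques: summary
Before adoption
• Service-to-service communication was handled on the client side by a common SDK
After adopting ASM across the board
• Take full advantage of ASM to achieve a robust architecture
• Visualize the important Metrics and strengthen initial response with early Alerts
• Achieve fault tolerance through failure testing and analysis based on distributed tracing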

Anthos Service Mesh techniques: summary
Future outlook
• We are extending our use of ASM further, for example toward region-level fault tolerance
• We are watching developments in Multi-cluster Gateway API and Multi-cluster ASM
