Slide 1

Slide 1 text

© 2024 Wantedly, Inc.
If you keep Datadog on Kubernetes, you're missing out without Autodiscovery
Japan Datadog User Group Meetup#5
Aug. 7 2024 - Atsushi Tanaka @bgpat

Slide 2

Slide 2 text

$ whoami
@bgpat / Atsushi Tanaka
Wantedly, Inc.
Infrastructure Engineer
Kubernetes / Terraform
SRE / Platform Engineering
Datadog experience: about 6 to 7 years

Slide 3

Slide 3 text

The Autodiscovery feature
https://docs.datadoghq.com/ja/getting_started/containers/autodiscovery

Slide 4

Slide 4 text

The Autodiscovery feature

Slide 5

Slide 5 text

Where configuration can be written
Written in annotations:
● Pod: ad.datadoghq.com/<CONTAINER_NAME>.checks
● Service: ad.datadoghq.com/service.checks
● Endpoint: ad.datadoghq.com/endpoints.checks
(Features close to Autodiscovery?)
● Tag labels:
  ○ tags.datadoghq.com/env
  ○ tags.datadoghq.com/version
  ○ tags.datadoghq.com/service
● ConfigMap: applied to images that match ad_identifiers
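The mapping from "where the check is declared" to "which key it uses" is small enough to capture in code, which can help when manifests are generated rather than hand-written. This is a minimal Python sketch and not part of any Datadog tooling. The helpers `checks_annotation_key`, `checks_annotations` and `unified_service_labels` are hypothetical. They encode only the keys listed on this slide, and the label values in the usage lines are made up.

```python
import json
from typing import Any, Dict, Optional


def checks_annotation_key(target: str, container: Optional[str] = None) -> str:
    """Return the Autodiscovery annotation key for a target listed on the slide."""
    if target == "pod":
        # Pod annotations are keyed by the container name, e.g. "redis".
        if not container:
            raise ValueError("a container name is required for Pod annotations")
        return f"ad.datadoghq.com/{container}.checks"
    if target == "service":
        return "ad.datadoghq.com/service.checks"
    if target == "endpoints":
        return "ad.datadoghq.com/endpoints.checks"
    raise ValueError(f"unknown target: {target!r}")


def checks_annotations(target: str, checks: Dict[str, Any],
                       container: Optional[str] = None) -> Dict[str, str]:
    """Build the annotation dict; the value is the check config serialized as JSON."""
    return {checks_annotation_key(target, container): json.dumps(checks)}


def unified_service_labels(env: str, service: str, version: str) -> Dict[str, str]:
    """Build the tags.datadoghq.com/* labels listed on the slide."""
    return {
        "tags.datadoghq.com/env": env,
        "tags.datadoghq.com/service": service,
        "tags.datadoghq.com/version": version,
    }


if __name__ == "__main__":
    redis_checks = {
        "redisdb": {
            "init_config": {},
            "instances": [{"host": "%%host%%", "port": "%%port%%"}],
        }
    }
    print(checks_annotations("pod", redis_checks, container="redis"))
    print(unified_service_labels("production", "redis", "7.2"))
```

Generating the JSON with `json.dumps` rather than typing it into YAML by hand rules out one class of mistakes: malformed JSON inside the annotation string.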

Slide 6

Slide 6 text

Available template variables
https://docs.datadoghq.com/containers/guide/template_variables/
● %%host%%, %%host_<NETWORK NAME>%%
● %%port%%, %%port_<NUMBER_X>%%, %%port_<NAME>%%
● %%pid%%, %%hostname%%
● %%env_<ENV_VAR>%%
● %%kube_namespace%%, %%kube_pod_name%%, %%kube_pod_uid%%
Secrets management: https://docs.datadoghq.com/ja/agent/configuration/secrets-management/
● ENC[file@/path/to/file]
● ENC[k8s_secret@some_namespace/some_name/a_key]
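The toy resolver below shows how these substitutions behave for a subset of the variables. It is an illustration only, not the Agent's implementation. `ContainerContext` and `resolve_template` are hypothetical, and `%%pid%%`, `%%hostname%%` and `%%kube_pod_uid%%` are not modeled (they are left untouched).

The rules follow my reading of the template variables page:
● `%%port%%` is the highest exposed port.
● `%%port_<NUMBER_X>%%` indexes the exposed ports sorted in ascending order.
● `%%env_<ENV_VAR>%%` reads the Agent process's environment, not the monitored container's.

When a container has several networks, this sketch refuses to guess for `%%host%%`.

```python
import re
from dataclasses import dataclass, field
from typing import Dict

_VARIABLE = re.compile(r"%%([A-Za-z0-9_]+)%%")


@dataclass
class ContainerContext:
    """Hypothetical view of what the Agent knows about one container."""
    networks: Dict[str, str]                  # network name -> IP address
    ports: Dict[str, int]                     # port name -> port number
    agent_env: Dict[str, str] = field(default_factory=dict)  # Agent's own env vars
    kube_namespace: str = ""
    kube_pod_name: str = ""


def resolve_template(template: str, ctx: ContainerContext) -> str:
    """Replace %%...%% variables; variables this sketch does not model are left as-is."""
    ascending_ports = sorted(ctx.ports.values())

    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name == "host":
            # Simplification: only unambiguous when the container has one network.
            if len(ctx.networks) != 1:
                raise ValueError("several networks: use %%host_<NETWORK NAME>%%")
            return next(iter(ctx.networks.values()))
        if name.startswith("host_"):
            return ctx.networks[name[len("host_"):]]
        if name == "port":
            return str(ascending_ports[-1])              # highest exposed port
        if name.startswith("port_"):
            suffix = name[len("port_"):]
            if suffix.isdigit():
                return str(ascending_ports[int(suffix)])  # index into sorted ports
            return str(ctx.ports[suffix])                # port looked up by name
        if name.startswith("env_"):
            return ctx.agent_env[name[len("env_"):]]
        if name == "kube_namespace":
            return ctx.kube_namespace
        if name == "kube_pod_name":
            return ctx.kube_pod_name
        return match.group(0)                            # not modeled here

    return _VARIABLE.sub(substitute, template)


if __name__ == "__main__":
    ctx = ContainerContext(
        networks={"bridge": "10.0.0.5"},
        ports={"redis": 6379, "metrics": 9121},
        agent_env={"REDIS_PASSWORD": "example"},
    )
    print(resolve_template('{"host": "%%host%%", "port": "%%port_redis%%"}', ctx))
```

With a Redis exporter sidecar exposing a higher port, plain `%%port%%` would point at the exporter rather than Redis. Under these assumed rules, the named form `%%port_redis%%` is the safer choice in that case.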

Slide 7

Slide 7 text

Example
apiVersion: v1
kind: Pod
metadata:
  annotations:
    ad.datadoghq.com/redis.checks: |
      {
        "redisdb": {
          "init_config": {},
          "instances": [
            {
              "host": "%%host%%",
              "port": "%%port%%",
              "password": "%%env_REDIS_PASSWORD%%"
            }
          ]
        }
      }
    ad.datadoghq.com/redis.logs: '[{"source":"redis"}]'
spec:
  containers:
    - name: redis
︙
Written in the Pod's annotations
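The check configuration is JSON embedded in a YAML string. A typo such as a trailing comma still produces a valid Kubernetes object, because an annotation is just a string. A small lint run before `kubectl apply` can catch this.

The Python sketch below is hypothetical (`lint_ad_annotations` is not a Datadog tool) and deliberately shallow:
● `*.checks` values must parse as a JSON object in which every check has a non-empty `instances` list.
● `*.logs` values must parse as a JSON array.

Check-specific options are not validated.

```python
import json
from typing import Dict, List


def lint_ad_annotations(annotations: Dict[str, str]) -> List[str]:
    """Return human-readable problems found in Autodiscovery annotations."""
    problems: List[str] = []
    for key, value in annotations.items():
        if not key.startswith("ad.datadoghq.com/"):
            continue  # not an Autodiscovery annotation
        if not (key.endswith(".checks") or key.endswith(".logs")):
            continue  # other ad.datadoghq.com/* keys are out of scope for this sketch
        try:
            parsed = json.loads(value)
        except json.JSONDecodeError as err:
            problems.append(f"{key}: invalid JSON ({err.msg} at line {err.lineno})")
            continue
        if key.endswith(".logs"):
            if not isinstance(parsed, list):
                problems.append(f"{key}: expected a JSON array")
            continue
        if not isinstance(parsed, dict):
            problems.append(f"{key}: expected a JSON object keyed by check name")
            continue
        for check, config in parsed.items():
            instances = config.get("instances") if isinstance(config, dict) else None
            if not isinstance(instances, list) or not instances:
                problems.append(f"{key}: check {check!r} needs a non-empty 'instances' list")
    return problems


if __name__ == "__main__":
    broken = {
        # Trailing comma after the last field: a fine YAML string, but invalid JSON.
        "ad.datadoghq.com/redis.checks": '{"redisdb": {"init_config": {}, "instances": [{"host": "%%host%%",}]}}',
    }
    for problem in lint_ad_annotations(broken):
        print(problem)
```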

Slide 8

Slide 8 text

Example
apiVersion: v1
kind: Service
metadata:
  annotations:
    ad.datadoghq.com/service.checks: |
      {
        "redisdb": {
          "init_config": {},
          "instances": [
            {
              "host": "%%host%%",
              "port": "%%port%%",
              "password": "ENC[k8s_secret@my-redis/redis-secret/password]"
            }
          ]
        }
      }
︙
Written in the Service's annotations
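The `ENC[...]` handle here points at a key in a Kubernetes Secret. The hypothetical helpers below show how such handles break down, assuming the `<provider>@<path>` form listed on Slide 6 (`file@/path/to/file`, `k8s_secret@<namespace>/<name>/<key>`):
● `find_secret_handles` extracts every handle from a config string.
● `split_k8s_secret` splits a `k8s_secret` handle into its namespace, Secret name and key.

This is a minimal Python sketch. The actual lookup is performed by the Agent's configured secret backend, and this code does not interact with it.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

_HANDLE = re.compile(r"ENC\[([^\]]+)\]")


@dataclass(frozen=True)
class SecretHandle:
    provider: str  # e.g. "file" or "k8s_secret"
    path: str      # provider-specific part after "@"


def find_secret_handles(text: str) -> List[SecretHandle]:
    """Collect every ENC[<provider>@<path>] handle in a config string."""
    handles = []
    for body in _HANDLE.findall(text):
        provider, sep, path = body.partition("@")
        if not sep or not provider or not path:
            raise ValueError(f"malformed handle: ENC[{body}]")
        handles.append(SecretHandle(provider, path))
    return handles


def split_k8s_secret(handle: SecretHandle) -> Tuple[str, str, str]:
    """Split a k8s_secret handle into (namespace, secret name, key)."""
    if handle.provider != "k8s_secret":
        raise ValueError(f"not a k8s_secret handle: {handle.provider}")
    parts = handle.path.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected <namespace>/<name>/<key>, got {handle.path!r}")
    namespace, name, key = parts
    return namespace, name, key


if __name__ == "__main__":
    config = '{"password": "ENC[k8s_secret@my-redis/redis-secret/password]"}'
    for handle in find_secret_handles(config):
        print(split_k8s_secret(handle))
```

For the Service above, this yields namespace `my-redis`, Secret `redis-secret`, key `password`. Listing the handles this way makes it easy to confirm that every referenced Secret actually exists before the manifest is applied.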

Slide 9

Slide 9 text

Applied examples

Slide 10

Slide 10 text

How to verify (debug) the setup
1. (When using the Cluster Agent) Find the Agent that runs the check
   i. Find the Cluster Agent leader
   ii. Run agent clusterchecks to find the target check
   iii. Track down the target Agent's pod (this takes some effort)
2. Check the Agent's status
   i. Run agent status to find the target check

Slide 11

Slide 11 text

How to verify (debug) the setup
(When using the Cluster Agent) Find the Agent that runs the check

# Find the Cluster Agent leader
$ kubectl -n default get cm datadog-agent-leader-election -o json | \
    jq '.metadata.annotations["control-plane.alpha.kubernetes.io/leader"] | fromjson | .holderIdentity'
"datadog-agent-cluster-agent-7474855779-z2zjf"

# Run agent clusterchecks to find the target check
$ kubectl -n default exec datadog-agent-cluster-agent-7474855779-z2zjf -- agent clusterchecks
︙
===== Checks on i-0123456789abcdef0 =====
=== postgres check ===
Configuration provider: kubernetes-services
Configuration source: kube_services:kube_service://default/aurora-postgres
Config for instance ID: postgres:65af62e418817e1e
︙

# Track down the target Agent's pod (in Wantedly's environment, the node could be identified from its providerID)
$ kubectl get no -o json | \
    jq '.items[] | select(.spec.providerID | endswith("i-0123456789abcdef0")) | .metadata.name'
"ip-10-3-96-189.ap-northeast-1.compute.internal"
$ kube sandbox -n default get po \
    --field-selector spec.nodeName=ip-10-3-96-189.ap-northeast-1.compute.internal -l app=datadog-agent
NAME                 READY   STATUS    RESTARTS   AGE
datadog-agent-dncs6  3/3     Running   0          61m
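The three manual lookups above can be scripted. Below is a minimal Python sketch of the parsing side. It works on the JSON and text those commands print, and leaves out fetching the data (kubectl or a Kubernetes client) so it can be tested in isolation. The functions are hypothetical:
● `leader_from_configmap` mirrors the jq filter.
● `node_running_check` scans `agent clusterchecks` output for a matching `Configuration source:` line.
● `node_name_for_instance` matches a Node by the suffix of its `spec.providerID`.

Two assumptions: the text format is taken from the excerpt on this slide and may differ between Agent versions, and the providerID is in the EKS-style `aws:///<zone>/<instance-id>` format.

```python
import json
import re
from typing import Any, Dict, Optional

LEADER_ANNOTATION = "control-plane.alpha.kubernetes.io/leader"
_NODE_HEADER = re.compile(r"^=+ Checks on (\S+) =+$")
_CONFIG_SOURCE = re.compile(r"^\s*Configuration source: (\S+)\s*$")


def leader_from_configmap(configmap: Dict[str, Any]) -> str:
    """Equivalent of the jq filter: decode the leader annotation, read holderIdentity."""
    record = json.loads(configmap["metadata"]["annotations"][LEADER_ANNOTATION])
    return record["holderIdentity"]


def node_running_check(clusterchecks_output: str, config_source: str) -> Optional[str]:
    """Return the node from the nearest preceding "Checks on <node>" header."""
    node: Optional[str] = None
    for line in clusterchecks_output.splitlines():
        header = _NODE_HEADER.match(line.strip())
        if header:
            node = header.group(1)
            continue
        source = _CONFIG_SOURCE.match(line)
        if source and source.group(1) == config_source:
            return node
    return None


def node_name_for_instance(node_list: Dict[str, Any], instance_id: str) -> Optional[str]:
    """Find the Node whose spec.providerID ends with "/<instance-id>"."""
    for item in node_list.get("items", []):
        provider_id = item.get("spec", {}).get("providerID", "")
        if provider_id.endswith("/" + instance_id):
            return item["metadata"]["name"]
    return None
```

The lookup uses `"/" + instance_id` instead of a bare suffix so that one ID can never accidentally match the tail of a longer one. The resulting node name then goes into `--field-selector spec.nodeName=<node>` together with `-l app=datadog-agent`, as in the last command above.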

Slide 12

Slide 12 text

How to verify (debug) the setup
Check the Agent's status

# Run agent status to find the target check
$ kubectl -n default exec datadog-agent-dncs6 -- agent status
Defaulted container "agent" out of: agent, trace-agent, process-agent, init-volume (init), init-config (init)
disable most components. It's recommended to use autoconfig_exclude_features and autoconfig_include_features to activate/deactivate features selectively
Getting the status from the agent.

===============
Agent (v7.54.0)
===============
︙
    postgres (18.2.2)
    -----------------
      Instance ID: postgres:65af62e418817e1e [OK]
      Configuration Source: kube_services:kube_service://default/aurora-postgres
      Total Runs: 33
      Metric Samples: Last Run: 10,503, Total: 346,599
      Events: Last Run: 0, Total: 0
      Database Monitoring Metadata Samples: Last Run: 1, Total: 2
      Service Checks: Last Run: 1, Total: 33
      Average Execution Time : 498ms
︙
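Scanning `agent status` by eye gets tedious when a node runs many checks. Here is a minimal Python filter, assuming the plain-text layout shown above: a check name on a line underlined with dashes, followed by `Instance ID: <id> [<STATE>]` and `Configuration Source:` lines.

`find_check_instances` and `CheckInstance` are hypothetical names. The parsing rules are inferred from this one excerpt and will need adjusting if the output format changes. `SAMPLE` is an abridged copy of the output on this slide.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

_INSTANCE = re.compile(r"^\s*Instance ID: (\S+) \[([A-Z]+)\]")
_SOURCE = re.compile(r"^\s*Configuration Source: (.+?)\s*$")


@dataclass
class CheckInstance:
    check: str
    instance_id: str
    state: str                          # e.g. "OK" or "ERROR"
    config_source: Optional[str] = None


def find_check_instances(status_text: str, check_name: str) -> List[CheckInstance]:
    """Return the instances of one check found in `agent status` output."""
    lines = status_text.splitlines()
    found: List[CheckInstance] = []
    current: Optional[str] = None
    for i, line in enumerate(lines):
        following = lines[i + 1].strip() if i + 1 < len(lines) else ""
        if following and set(following) == {"-"}:
            # A line underlined with dashes starts a new block, e.g. "postgres (18.2.2)".
            current = line.split()[0] if line.strip() else None
            continue
        if current != check_name:
            continue
        instance = _INSTANCE.match(line)
        if instance:
            found.append(CheckInstance(check_name, instance.group(1), instance.group(2)))
            continue
        source = _SOURCE.match(line)
        if source and found and found[-1].config_source is None:
            found[-1].config_source = source.group(1)
    return found


SAMPLE = """\
===============
Agent (v7.54.0)
===============
    postgres (18.2.2)
    -----------------
      Instance ID: postgres:65af62e418817e1e [OK]
      Configuration Source: kube_services:kube_service://default/aurora-postgres
      Total Runs: 33
"""

if __name__ == "__main__":
    for inst in find_check_instances(SAMPLE, "postgres"):
        print(inst)
```

Headers are detected by the dashed underline rather than by a `name (version)` pattern, because the version in parentheses may not appear for every check. Matching on the underline keeps the current block from leaking into the next check.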

Slide 13

Slide 13 text

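Summary
● Kubernetes and Datadog work well together
● Monitoring targets can be declared on either containers or Services
● Handy template variables are also available
● Debugging is still cumbersome (I'd love to hear about a better way)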