Slide 1

Slide 2

Slide 3

Slide 4

Slide 5

Slide 6

Slide 7

Slide 8

[Diagram: a Controller whose Informer feeds events to a pool of Workers running the Reconciler]

Slide 9

[Diagram: a Controller whose Informer feeds events to a pool of Workers running the Reconciler]
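
A minimal controller-runtime sketch of this architecture (the Pod type and the worker count of 4 are illustrative assumptions, not from the slides): the Informer-backed cache enqueues events, and MaxConcurrentReconciles sets how many Workers invoke the Reconciler in parallel.

package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller"
)

// podReconciler stands in for the Reconciler box in the diagram.
type podReconciler struct {
	client.Client
}

func (r *podReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	var pod corev1.Pod
	if err := r.Get(ctx, req.NamespacedName, &pod); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}
	// ... reconcile the observed state toward the desired state ...
	return ctrl.Result{}, nil
}

func main() {
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{})
	if err != nil {
		panic(err)
	}
	// The Informer (via the manager's cache) feeds a workqueue;
	// MaxConcurrentReconciles is the number of Workers draining it.
	err = ctrl.NewControllerManagedBy(mgr).
		For(&corev1.Pod{}).
		WithOptions(controller.Options{MaxConcurrentReconciles: 4}).
		Complete(&podReconciler{Client: mgr.GetClient()})
	if err != nil {
		panic(err)
	}
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}

More workers drain the queue faster, but each one also consumes requests against the client-side rate limit discussed later.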

Slide 10

Slide 11

Slide 12

Slide 13

- https://github.com/kubernetes/enhancements/issues/1602
- https://kubernetes.io/docs/reference/instrumentation/metrics/
- https://kubernetes.io/docs/concepts/cluster-administration/system-traces/

Slide 14

- https://cybozu-go.github.io/moco/metrics.html

Slide 15

Slide 16

- https://github.com/cybozu-go/moco/pull/500

Slide 17

Slide 18

Slide 19

Slide 20

Slide 21

Slide 22

Slide 23

[Diagram: Kubernetes Cluster containing the Application Controller, ArgoCD Server, and Repo Server, which manage Application resources]

Slide 24

[Diagram: application-controller with worker pools of Status Processors and Operation Processors; Informers watch Application resources and Events]
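
This is the classic client-go informer plus workqueue layout. A rough sketch of the pattern under illustrative assumptions (ConfigMaps instead of Application resources, a fixed pool of 4 workers):

package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/util/workqueue"
	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	cs := kubernetes.NewForConfigOrDie(ctrl.GetConfigOrDie())

	// Informer: watches the API server and pushes object keys into the queue.
	factory := informers.NewSharedInformerFactory(cs, 10*time.Minute)
	queue := workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter())
	factory.Core().V1().ConfigMaps().Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			if key, err := cache.MetaNamespaceKeyFunc(obj); err == nil {
				queue.Add(key)
			}
		},
	})

	stop := make(chan struct{})
	factory.Start(stop)
	factory.WaitForCacheSync(stop)

	// Worker pool: like the Status/Operation Processors above,
	// N goroutines drain the queue independently.
	for i := 0; i < 4; i++ {
		go func() {
			for {
				key, shutdown := queue.Get()
				if shutdown {
					return
				}
				fmt.Println("processing", key) // reconcile logic goes here
				queue.Done(key)
			}
		}()
	}
	select {} // block forever
}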

Slide 25

Slide 26

Slide 27

Slide 28

Slide 29

[Diagram: application-controller with worker pools of Status Processors and Operation Processors; Informers watch Application resources and Events]

Slide 30

Slide 31

Slide 32

Slide 33

workqueue_depth{job="kube-controller-manager",name="volumes"}
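
This query selects the current depth of the volumes workqueue inside kube-controller-manager; a depth that stays above zero means items are being enqueued faster than the workers can drain them.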

Slide 34

histogram_quantile(0.99,
  sum(rate(
    rest_client_rate_limiter_duration_seconds_bucket{
      job="kube-controller-manager"
    }[1m]
  )) by (le)
)

Slide 35

[Diagram: the PersistentVolume Controller inside kube-controller-manager]

Slide 36

- --kube-api-qps (client-side QPS limit for kube-controller-manager's requests to the API server)
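
This flag feeds client-go's client-side token-bucket rate limiter. A small sketch of the mechanism, assuming the kube-controller-manager defaults of QPS=20 and Burst=30:

package main

import (
	"context"
	"fmt"

	"k8s.io/client-go/util/flowcontrol"
)

func main() {
	// --kube-api-qps / --kube-api-burst behave like a token bucket:
	// tokens refill at QPS per second up to Burst capacity, and every
	// API request must take a token before it is sent.
	limiter := flowcontrol.NewTokenBucketRateLimiter(20, 30)

	for i := 0; i < 50; i++ {
		// Once the initial burst of 30 tokens is spent, Wait blocks
		// so that requests proceed at roughly 20 per second.
		if err := limiter.Wait(context.Background()); err != nil {
			panic(err)
		}
		fmt.Println("request", i)
	}
}

The time spent in Wait is the delay surfaced by the rest_client_rate_limiter_duration_seconds histogram shown earlier.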

Slide 37

Slide 38

Slide 39

Slide 40

- https://github.com/zoetrope/kubbernecker

Slide 41

Slide 42

# 99th percentile of Reconcile duration
histogram_quantile(0.99,
  sum(
    rate(controller_runtime_reconcile_time_seconds_bucket[1m])
  ) by (job, controller, le)
)

# Reconcile rate, broken down by result
sum(rate(controller_runtime_reconcile_total[1m])) by (job, controller, result)
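
The result label on controller_runtime_reconcile_total takes the values success, error, requeue, and requeue_after, so the second query also breaks out error and requeue rates per controller.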

Slide 43

# 99th percentile of time items wait in the workqueue
histogram_quantile(0.99,
  sum(rate(workqueue_queue_duration_seconds_bucket[1m])) by (job, name, le)
)

# Current depth of each workqueue
sum(workqueue_depth) by (job, name)

Slide 44

import (
	"context"
	"net/url"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	clmetrics "k8s.io/client-go/tools/metrics"
	crmetrics "sigs.k8s.io/controller-runtime/pkg/metrics"
)

var (
	rateLimiterDelay = prometheus.NewHistogramVec(
		prometheus.HistogramOpts{
			Name:    "rest_client_rate_limiter_duration_seconds",
			Help:    "client-go rate limiter delay in seconds. Broken down by verb, and host.",
			Buckets: []float64{0.005, 0.025, 0.1, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0, 15.0, 30.0, 60.0},
		},
		[]string{"verb", "host"},
	)

	_ clmetrics.LatencyMetric = &latencyAdapter{}
)

func init() {
	// Register the histogram with controller-runtime's metrics registry and
	// hook it into client-go's rate limiter latency callback.
	crmetrics.Registry.MustRegister(rateLimiterDelay)
	adapter := latencyAdapter{metric: rateLimiterDelay}
	clmetrics.RateLimiterLatency = &adapter
}

type latencyAdapter struct {
	metric *prometheus.HistogramVec
}

func (c *latencyAdapter) Observe(_ context.Context, verb string, u url.URL, latency time.Duration) {
	c.metric.WithLabelValues(verb, u.Host).Observe(latency.Seconds())
}

Slide 45

# 99th percentile of client-side rate limiter delay
histogram_quantile(0.99,
  sum(
    rate(rest_client_rate_limiter_duration_seconds_bucket[1m])
  ) by (job, verb, le)
)

Slide 46

# Reconcile time per Application (Status Processor)
{job=~"argocd/argocd-application-controller"}
  | logfmt
  | msg = "Reconciliation completed"
  | line_format "{{.application}}: {{.time_ms}}"

# Sync time per Application (Operation Processor)
{job=~"argocd/argocd-application-controller"}
  | logfmt
  | msg = "sync/terminate complete"
  | line_format "{{.application}}: {{.duration}}"

Slide 47

# Logs for each app refresh (requires debug logging)
{job=~"argocd/argocd-application-controller"}
  | logfmt
  | level = "debug"
  | msg =~ "Refreshing app .*"

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cmd-params-cm
data:
  # Log level of the Application Controller: set to debug (default "info")
  controller.log.level: "debug"

Slide 48

Slide 49

$ kubectl port-forward svc/argocd-application-controller-metrics -n argocd 8082:8082

# Capture a CPU profile (30 seconds by default)
$ curl localhost:8082/debug/pprof/profile > cpu.pprof

# Dump goroutine stacks
$ curl localhost:8082/debug/pprof/goroutine?debug=1
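
For controllers built on controller-runtime, a comparable endpoint can be exposed through the manager; a minimal sketch assuming controller-runtime v0.15 or later, which added the PprofBindAddress option:

package main

import (
	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		// Serves /debug/pprof/* on this address next to the manager's
		// other endpoints (option added in controller-runtime v0.15).
		PprofBindAddress: ":8082",
	})
	if err != nil {
		panic(err)
	}
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}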

Slide 50

- --otlp-address (endpoint for exporting OTLP traces)

Slide 51

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cmd-params-cm
data:
  # Number of application status processors (default 20)
  controller.status.processors: "20"
  # Number of application operation processors (default 10)
  controller.operation.processors: "10"

Slide 52

import ctrl "sigs.k8s.io/controller-runtime"

// ... snip ...

cfg, err := ctrl.GetConfig()
if err != nil {
	return err
}

// Raise the client-side rate limit
cfg.QPS = 50
cfg.Burst = int(cfg.QPS * 1.5)

mgr, err := ctrl.NewManager(cfg, ctrl.Options{ ... })
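
With these values the client sustains 50 requests per second with bursts of up to 75; controller-runtime defaults to QPS=20 and Burst=30. Raising them only helps while the API server has headroom, so it is worth confirming with the rate limiter histogram from earlier that the waiting time actually drops.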