Slide 1

Slide 1 text

Metrics: From Spring Boot to Grafana Charts
Frank Gerberding, 10 November 2021

Slide 2

Slide 2 text

@MilesBehind69
Metrics & Dashboards

Slide 3

Slide 3 text

"One must measure what is measurable, and make measurable what is not yet measurable." (Archimedes)
"There is only one boss. The customer." (Sam Walton)

Slide 4

Slide 4 text

Iteration (cycle diagram)
Steps: build, measure, learn
Labels: User/Customer, Development, Analysis

Slide 5

Slide 5 text

3 Pillars of Observability: Logging, Tracing, Metrics

Slide 6

Slide 6 text

Information and its use
Information: user behavior; internal technical details
Use: improving UX; improving system stability; alerting (when values rise above or fall below thresholds); post-mortem analysis after system outages

Slide 7

Slide 7 text

Questions and measurements (1)
Question: Where does performance need to be improved?
Measurement: How long do requests to the backend take?

Slide 8

Slide 8 text

Questions and measurements (2)
Questions: When are good maintenance windows? Which times are favorable for mailings and similar actions?
Measurement: At which times of day and on which days of the week is the system used?

Slide 9

Slide 9 text

Questions and measurements (3)
Questions: Which functions should appear where in the user interface? Which functions are hardly used and can be removed?
Measurement: How often is each function used?

Slide 10

Slide 10 text

Questions and measurements (4)
Question: How large should a cache be?
Measurement: What is the hit rate of the cache?

Slide 11

Slide 11 text

Questions and measurements (5)
Question: What data volumes are to be expected?
Measurement: How often is which new data stored?

Slide 12

Slide 12 text

Tools/Frameworks/Libraries: Spring Boot, Micrometer, Prometheus, Grafana

Slide 13

Slide 13 text

Interplay (architecture diagram)
Spring Boot Application: Application Code, Abstract Micrometer Interface, Micrometer Prometheus Implementation, Spring Boot Actuator
Prometheus: scrapes the application via HTTP(S); collects and stores metrics
Grafana: queries and aggregates via HTTP(S); graphical visualization

Slide 14

Slide 14 text

Spring Boot Actuator + Micrometer/Prometheus

Dependencies:

dependencies {
    implementation("org.springframework.boot", "spring-boot-starter-actuator")
    implementation("io.micrometer", "micrometer-registry-prometheus", "1.7.5")
}

application.yml:

management:
  metrics:
    export:
      prometheus:
        enabled: true
    tags:
      environment: "development"
  endpoints:
    web:
      exposure:
        include: metrics, prometheus, health
#---------------------------------
# production Profile
#---------------------------------
---
spring:
  profiles: production
management:
  metrics:
    tags:
      environment: "production"

Slide 15

Slide 15 text

Micrometer JVM dashboard (#4701)

Slide 16

Slide 16 text

Interplay (architecture diagram with service discovery and load test)
Spring Boot Application (two instances): register with Consul
Consul: service discovery
Fabio: load balancer; discovers instances and balances requests
Prometheus: discovers instances and scrapes them; collects and stores metrics
Grafana: graphical visualization
Gatling: load test
Analysis

Slide 19

Slide 19 text

Metrics: Counters, Gauges, Timers, Distribution Summaries, Long Task Timers, Histograms and Percentiles

Slide 21

Slide 21 text

Counter

Counter definition:

val taxComputerCounter = Counter
    .builder("metrics_demo.tax_computer.count")
    .description("counts number of calls to tax computer")
    .baseUnit(BaseUnits.OPERATIONS)
    .register(meterRegistry)

Counter usage:

fun computeTax(…) {
    taxComputerCounter.increment()
    return …
}

Slide 22

Slide 22 text

Counter and Rate

http_server_requests_seconds_count{uri="/computeTaxRate"}

rate(http_server_requests_seconds_count{uri="/computeTaxRate"}[5m])
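To make the relation between a raw counter and its rate concrete, here is a minimal sketch in plain Kotlin. It is a simplification and not how Prometheus implements `rate()`: Prometheus additionally extrapolates to the window boundaries. The names `Sample` and `simpleRate` are hypothetical and chosen only for this illustration.

```kotlin
// One scraped sample of a monotonically increasing counter (hypothetical helper type).
data class Sample(val timestampSeconds: Double, val value: Double)

// Simplified per-second rate over the given samples, which must be sorted by time.
// A drop in value is treated as a counter reset (for example an application restart).
fun simpleRate(samples: List<Sample>): Double {
    if (samples.size < 2) return 0.0
    var increase = 0.0
    for ((previous, current) in samples.zipWithNext()) {
        // After a reset the counter restarts at zero, so the whole new value counts as increase.
        increase += if (current.value >= previous.value) current.value - previous.value else current.value
    }
    val elapsed = samples.last().timestampSeconds - samples.first().timestampSeconds
    return if (elapsed > 0.0) increase / elapsed else 0.0
}
```

With samples of 100, 120, 180, 200 and 220 taken one minute apart, the counter grows by 120 in 240 seconds, so `simpleRate` returns 0.5 requests per second.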

Slide 23

Slide 23 text

Single Instance vs. Sum

rate(http_server_requests_seconds_count{uri="/computeTaxRate"}[5m])

sum(rate(http_server_requests_seconds_count{uri="/computeTaxRate"}[5m]))

Slide 24

Slide 24 text

Counter with tags

Counter definition:

private val maximumIncome = 300_000
private val incomeGranularity = 10_000
private val taxableIncomeTenThousands: List<Counter> =
    (0..maximumIncome step incomeGranularity).map {
        Counter
            .builder("metrics_demo.taxable_income.ten_thousands")
            .description("distribution of taxable incomes in steps of ten-thousand")
            .tag("ten_thousands", it.toString())
            .register(meterRegistry)
    }

Counter usage:

val bucketIndex = (taxableIncome / incomeGranularity)
    .coerceAtMost(taxableIncomeTenThousands.size - 1)
taxableIncomeTenThousands[bucketIndex].increment()
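The bucket selection can be tried out without Micrometer. The following sketch reuses the constants and the index computation from the slide but replaces the counters with their tag values. It assumes that the taxable income is a non-negative `Int`; the names `bucketTags` and `bucketIndexFor` are hypothetical.

```kotlin
val maximumIncome = 300_000
val incomeGranularity = 10_000

// Tag values produced by the range on the slide: "0", "10000", ..., "300000".
val bucketTags: List<String> = (0..maximumIncome step incomeGranularity).map { it.toString() }

// Same index computation as on the slide: incomes above the maximum land in the last bucket.
// Assumption: taxableIncome is not negative, otherwise the index would be negative as well.
fun bucketIndexFor(taxableIncome: Int): Int =
    (taxableIncome / incomeGranularity).coerceAtMost(bucketTags.size - 1)
```

The range yields 31 buckets. An income of 25,000 maps to index 2 (tag "20000"), and an income of 1,000,000 is capped at index 30 (tag "300000").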

Slide 25

Slide 25 text

Single Instance vs. Sum

sum(increase(metrics_demo_taxable_income_ten_thousands_total[5m])) by (ten_thousands)

Slide 26

Slide 26 text

Counters in multi-node environments

Counter with a single node:

Micrometer (node 1): t1 = 100, ..., t5 = 220

Prometheus (after scrape):

t    node 1
t1   100
t2   120
t3   180
t4   200
t5   220

Grafana (after query):

t    sum   delta
t1   100   0
t2   120   20
t3   180   60
t4   200   20
t5   220   20

Slide 27

Slide 27 text

Counters in multi-node environments

Counter with a single node:

Micrometer (node 1): t1 = 100, ..., t5 = 220

Prometheus (after scrape):

t    node 1
t1   100
t2   120
t3   180
t4   200
t5   220

Grafana (after query):

t    sum   delta
t1   100   0
t2   120   20
t3   180   60
t4   200   20
t5   220   20

Counter with multiple nodes:

Micrometer (node 1): t1 = 100, ..., t5 = 220
Micrometer (node 2): t1 = 80, ..., t5 = 200

Prometheus (after scraping both nodes):

t    node 1   node 2
t1   100      80
t2   120      110
t3   180      160
t4   200      190
t5   220      200

Grafana (after query):

t    sum   delta
t1   180   0
t2   230   50
t3   340   110
t4   390   50
t5   420   30
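The multi-node table can be reproduced with a few lines of plain Kotlin. This is only a sketch of the arithmetic, using the numbers from the slide; the function names `sumAcrossNodes` and `deltas` are hypothetical, and in practice Prometheus and Grafana do this work.

```kotlin
// Counter values scraped per node at t1..t5 (numbers taken from the slide).
val node1 = listOf(100, 120, 180, 200, 220)
val node2 = listOf(80, 110, 160, 190, 200)

// Sum across nodes per scrape time, comparable to sum(...) in a query.
fun sumAcrossNodes(nodes: List<List<Int>>): List<Int> =
    nodes.first().indices.map { t -> nodes.sumOf { it[t] } }

// Increase between consecutive scrapes; the first entry has no predecessor, so it is 0.
fun deltas(values: List<Int>): List<Int> =
    listOf(0) + values.zipWithNext { previous, current -> current - previous }
```

`sumAcrossNodes(listOf(node1, node2))` gives 180, 230, 340, 390, 420, and `deltas` of that list gives 0, 50, 110, 50, 30. Counters are additive, so summing them across nodes is safe.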

Slide 28

Slide 28 text

Percentiles

Slide 29

Slide 29 text

Percentiles: 99th percentile

Slide 30

Slide 30 text

Percentiles

Slide 31

Slide 31 text

Timer

Timer via annotation:

@GetMapping("/computeTaxRate")
@Timed(description = "duration of tax computation requests", histogram = true)
fun computeTaxRate(@RequestParam taxableIncome: Euro): ResponseEntity<…> {
    log.debug("compute tax rate for taxable income {}", taxableIncome)
    return ResponseEntity.ok(cachingTaxComputer.computeTax(taxableIncome))
}

Manual timer:

val taxComputerTimer = Timer
    .builder("metrics_demo.tax_computer.requests_time")
    .description("duration of tax computation")
    .publishPercentileHistogram()
    .register(meterRegistry)

taxComputerTimer.record { … }
taxComputerTimer.record(Duration.ofMillis(…))
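To illustrate what recording a block with a timer amounts to, here is a hypothetical stand-in written in plain Kotlin. It is not the Micrometer API and it omits histograms entirely; it only tracks how many calls were timed and how long they took in total. The class name `SimpleTimer` is an assumption made for this sketch.

```kotlin
// Hypothetical minimal stand-in for a timer; not the Micrometer implementation.
class SimpleTimer {
    var count = 0L
        private set
    var totalNanos = 0L
        private set

    // Runs the block, records its duration even if it throws, and returns the block's result.
    fun <T> record(block: () -> T): T {
        val start = System.nanoTime()
        try {
            return block()
        } finally {
            count++
            totalNanos += System.nanoTime() - start
        }
    }
}
```

A call such as `SimpleTimer().record { 21 * 2 }` returns 42 and increases `count` by one. The average duration is `totalNanos / count`.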

Slide 32

Slide 32 text

Percentiles in multi-node environments

Percentiles with a single node (the same table is passed from Micrometer via scrape to Prometheus and via query to Grafana):

percentile   limit
80 %         < 80 ms
90 %         < 120 ms
95 %         < 140 ms
99 %         < 200 ms

Slide 33

Slide 33 text

Percentiles in multi-node environments

Percentiles with a single node (the same table is passed from Micrometer via scrape to Prometheus and via query to Grafana):

percentile   limit
80 %         < 80 ms
90 %         < 120 ms
95 %         < 140 ms
99 %         < 200 ms

Percentiles with multiple nodes:

Prometheus (after scraping both Micrometer nodes):

percentile   node 1     node 2
80 %         < 80 ms    < 70 ms
90 %         < 120 ms   < 110 ms
95 %         < 140 ms   < 130 ms
99 %         < 200 ms   < 180 ms

Grafana (after query):

percentile   sum
80 %         ???
90 %         ???
95 %         ???
99 %         ???
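Why the combined column stays open can be shown with a small experiment: percentiles computed per node cannot be averaged into the percentile of all requests. The sketch below uses a nearest-rank percentile and made-up request durations (not taken from the talk), where node A serves ten requests and node B only two.

```kotlin
// Nearest-rank percentile: the smallest value with at least p percent of all values at or below it.
fun percentile(values: List<Int>, p: Int): Int {
    require(values.isNotEmpty() && p in 1..100) { "need values and a percentile between 1 and 100" }
    val sorted = values.sorted()
    val rank = (p * sorted.size + 99) / 100 // ceil(p * n / 100) in integer arithmetic
    return sorted[rank - 1]
}

// Hypothetical request durations in milliseconds.
val nodeA = List(9) { 10 } + listOf(100) // nine fast requests, one slow request
val nodeB = listOf(100, 100)             // two slow requests

val averagedP90 = (percentile(nodeA, 90) + percentile(nodeB, 90)) / 2
val combinedP90 = percentile(nodeA + nodeB, 90)
```

Here the 90th percentile is 10 ms on node A and 100 ms on node B, so the average is 55 ms, while the 90th percentile over all twelve requests is 100 ms. Without the raw data or bucket counts, the combined value cannot be derived from the per-node percentiles.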

Slide 34

Slide 34 text

Percentiles in multi-node environments with buckets

Micrometer node 1 records: 1 ms, 3 ms, 10 ms, 20 ms, 25 ms
Micrometer node 2 records: 2 ms, 5 ms, 8 ms, 30 ms, 33 ms

Prometheus (after scraping both nodes; cumulative counts per bucket upper bound "le"):

le      node 1   node 2
1 ms    1        0
2 ms    1        1
4 ms    2        1
8 ms    2        3
16 ms   3        3
32 ms   5        4
64 ms   5        5

Grafana (after query):

le      sum
1 ms    1
2 ms    2
4 ms    3
8 ms    5
16 ms   6
32 ms   9
64 ms   10

Resulting percentiles:

pctl   ms
50 %   8 ms
80 %   32 ms
90 %   32 ms
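The bucket tables above can be recomputed in plain Kotlin. This sketch builds the cumulative counts from the recorded durations, adds them per bucket, and reads a percentile as the upper bound of the first bucket that covers the requested share. That matches the values on the slide; Prometheus' `histogram_quantile` additionally interpolates linearly inside the bucket, so its results can be lower. All function names are hypothetical.

```kotlin
// Upper bounds ("le") of the histogram buckets in milliseconds, as on the slide.
val bucketBounds = listOf(1, 2, 4, 8, 16, 32, 64)

// Cumulative counts: entry i is the number of recorded durations <= bucketBounds[i].
// Assumption: no duration exceeds the largest bound (a real histogram has a +Inf bucket).
fun cumulativeCounts(durations: List<Int>): List<Int> =
    bucketBounds.map { bound -> durations.count { it <= bound } }

// Bucket counts from several nodes can simply be added per bucket.
fun mergeCounts(a: List<Int>, b: List<Int>): List<Int> = a.zip(b) { x, y -> x + y }

// Upper bound of the first bucket that covers p percent of all observations.
fun percentileBound(counts: List<Int>, p: Int): Int {
    val total = counts.last()
    val rank = (p * total + 99) / 100 // ceil(p * total / 100) in integer arithmetic
    return bucketBounds[counts.indexOfFirst { it >= rank }]
}
```

With the records from the slide, `mergeCounts(cumulativeCounts(listOf(1, 3, 10, 20, 25)), cumulativeCounts(listOf(2, 5, 8, 30, 33)))` yields 1, 2, 3, 5, 6, 9, 10, and `percentileBound` returns 8 for 50 %, 32 for 80 % and 32 for 90 %.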

Slide 35

Slide 35 text

Percentiles

histogram_quantile(0.90, sum(rate(http_server_requests_seconds_bucket[5m])) by (le))

histogram_quantile(0.95, sum(rate(http_server_requests_seconds_bucket[5m])) by (le))

histogram_quantile(0.99, sum(rate(http_server_requests_seconds_bucket[5m])) by (le))

Slide 36

Slide 36 text

Demo

Slide 37

Slide 37 text

Wrap-Up
Which information is useful?
Which metrics are suitable?
Implementing the metrics in code
Building the dashboards
Analyzing the data
Adjustments in UI and backend
Load tests

Slide 39

Slide 39 text

Thank you for your attention!
https://speakerdeck.com/milesbehind69
http://frank.gerberding.blog
https://twitter.com/milesbehind69
https://github.com/Frank-Gerberding/metrics-demo