Slide 1

Slide 1 text

Logging is coming to Grafana David Kaltschmidt @davkals OSMC 2018

Slide 2

Slide 2 text

I’m David
All things UX at Grafana Labs
If you click and are stuck, reach out to me.
[email protected]
Twitter: @davkals

Slide 3

Slide 3 text

Outline
● Quick intro
● What’s new since 5.0
● Logging
● Towards 6.0

Slide 4

Slide 4 text

Grafana intro

Slide 5

Slide 5 text

Grafana: from dashboarding solution to observability platform

Slide 6

Slide 6 text

Unified way to look at data from different sources
[Image: logos of data sources]

Slide 7

Slide 7 text

Custom data sources
http://docs.grafana.org/plugins/developing/datasources/

Slide 8

Slide 8 text

Create dashboards

Slide 9

Slide 9 text

Define alerts
● Direct manipulation
● Timeseries-based alerts evaluated per panel on the Grafana server

Slide 10

Slide 10 text

Grafana adoption (mid-year)
● 2016: 36K
● 2017: 92K
● 2018: 186K

Slide 11

Slide 11 text

New since 5.0

Slide 12

Slide 12 text

Heatmap panel released in 5.1
Prometheus query example: rate(foo_metric_bucket[10m])
Legend format: {{le}}
Format as: Heatmap
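Prometheus histogram buckets are cumulative: each series with an `le` label counts all observations up to that upper bound. A heatmap needs the count that falls into each individual bucket. The following is a minimal sketch of that conversion, not Grafana's implementation; the function name and the sample values are assumptions made for illustration.

```python
def per_bucket_counts(cumulative):
    """Convert cumulative histogram buckets into per-bucket values.

    `cumulative` maps each upper bound (the `le` label, as a string)
    to the cumulative value of that bucket.
    """
    # Sort by numeric upper bound; float("+Inf") parses to infinity.
    ordered = sorted(cumulative.items(), key=lambda kv: float(kv[0]))
    result = []
    previous = 0.0
    for le, value in ordered:
        # Each bucket's own share is its value minus the bucket below it.
        result.append((le, value - previous))
        previous = value
    return result


# Hypothetical rates for three buckets of foo_metric_bucket.
example = {"0.5": 5.0, "0.1": 2.0, "+Inf": 6.0}
rows = per_bucket_counts(example)
```

Each entry in `rows` pairs an `le` bound with the value for that bucket alone, ordered from the lowest bound to `+Inf`.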

Slide 13

Slide 13 text

Datasource updates
● New: MS SQL Server
● New: Google Stackdriver
● New: Flux (Influx, BETA)
● Elasticsearch alerting
● Postgres query builder

Slide 14

Slide 14 text

Provisioning API
● Define data sources and dashboards in files
● Auto-reload on change
● Allows version control of files
http://docs.grafana.org/administration/provisioning/
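As an illustration of defining a data source in a file, here is a small sketch that writes a data source provisioning file. The YAML keys follow the documented provisioning format; the data source name, the Prometheus URL, the file name, and the helper `write_datasource_file` are assumptions chosen for the example.

```python
from pathlib import Path

# Data source definition in the provisioning file format.
# The name and URL below are placeholders, not values from the talk.
DATASOURCE_YAML = """\
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
"""


def write_datasource_file(provisioning_dir):
    """Write the definition into <provisioning_dir>/datasources/."""
    target = Path(provisioning_dir) / "datasources" / "prometheus.yaml"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(DATASOURCE_YAML)
    return target
```

Because the definition is an ordinary file, it can be committed to version control alongside the rest of the deployment configuration.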

Slide 15

Slide 15 text

Grafana is now fully CI’ed
With ARM and Windows builds
Test new features that are in master:
    docker run -d --name=grafana -p 3000:3000 grafana/grafana:master
https://hub.docker.com/r/grafana/grafana/

Slide 16

Slide 16 text

New: Explore UI (Beta) with Logging

Slide 17

Slide 17 text

Troubleshooting journey

Slide 18

Slide 18 text

Problems
● Once a panel is found, it’s difficult to interact with
● Overwhelming style and display options

Slide 19

Slide 19 text

Explore UI wireframes
[Wireframe: a query row with rate(http_requests_total[5m]), GRAPH / TABLE / BOTH toggles, a time picker (Last 1 hour, Refresh: 10s) and a RUN button; a second query 1 - rate(http_requests_total[5m]); query timings of 4.2s and 3.2s. A second wireframe repeats this layout with tabs: First tab, Second tab, 3rd tab, My tab ╳]

Slide 20

Slide 20 text

Now add logging...

Slide 21

Slide 21 text

Extended Explore to have metrics and logs side-by-side
[Wireframe, metrics side: query rate(http_requests_total{job="app1"}[5m]) with GRAPH / TABLE / BOTH toggles, a time picker (Last 1 hour, Refresh: 10s) and a RUN button; a second query 1 - rate(http_requests_total{job="app1"}[5m]); series legends rate(http_requests_total[5m]) and 1 - rate(http_requests_total[5m]); query timings of 4.2s and 3.2s]
[Wireframe, logs side: selector {job="app1"}, DATASOURCE picker, a time picker (Last 1 hour, Refresh: 10s), a RUN button, query timing of 4.2s, and a LOGS view showing:]
level=info ts=2018-11-05T17:13:48.774738335Z caller=main.go:244 msg="Starting Prometheus" version="(version=2.4.2, branch=master, revision=3e6b9d43c36921e318a8722772160be4184ddad5)"
level=info ts=2018-11-05T17:13:48.775413199Z caller=main.go:245 build_context="(go=go1.10.3, [email protected], date=20181011-08:29:54)"
level=info ts=2018-11-05T17:13:48.77545838Z caller=main.go:246 host_details=(darwin)
level=info ts=2018-11-05T17:13:48.775499098Z caller=main.go:247 fd_limits="(soft=256, hard=unlimited)"
level=info ts=2018-11-05T17:13:48.775545138Z caller=main.go:248 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2018-11-05T17:13:48.777071286Z caller=main.go:562 msg="Starting TSDB ..."
level=info ts=2018-11-05T17:13:48.778020546Z caller=web.go:399 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2018-11-05T17:13:48.807390226Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1539583200000 maxt=1539648000000 ulid=01CT0XT8W5N1E07K3ZQ5PGPFHM
level=info ts=2018-11-05T17:13:48.807946341Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1539648000000 maxt=1539712800000 ulid=01CT0XT9051Q2D6Q4FD1CN52BG
level=info ts=2018-11-05T17:13:48.808972634Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1539712800000 maxt=1539777600000 ulid=01CT18NCBATPKCZ9PVFPMSEZD6
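The log lines on this slide use a key=value layout (often called logfmt), with quoted values where spaces occur. As a rough sketch of how such a line could be split into fields, the snippet below leans on the standard library's `shlex` for quote handling. The helper name `parse_logfmt` is an assumption, and this is not how Grafana or its logging backend parses logs.

```python
import shlex


def parse_logfmt(line):
    """Split a key=value log line into a dict of fields."""
    fields = {}
    # shlex.split keeps quoted values together and drops the quotes.
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # Skip tokens that carry no '='.
            fields[key] = value
    return fields


# One of the lines shown on the slide.
line = (
    "level=info ts=2018-11-05T17:13:48.777071286Z "
    'caller=main.go:562 msg="Starting TSDB ..."'
)
fields = parse_logfmt(line)
```

Here `fields["msg"]` holds the unquoted message and `fields["level"]` the log level. Note that `shlex` follows shell quoting rules, so a line with an unbalanced quote or apostrophe raises `ValueError`; a dedicated logfmt parser would be more forgiving.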

Slide 22

Slide 22 text

Demo: http://localhost:3000/explore

Slide 23

Slide 23 text

Goal: Keeping it simple https://twitter.com/alicegoldfuss/status/981947777256079360

Slide 24

Slide 24 text

Logging for Kubernetes
[Diagram labels: {job="app1"}, {job="app3"}, {job="app2"}]

Slide 25

Slide 25 text

Logging for Kubernetes (2)
[Diagram labels: {job="app1"}, {job="app3"}, {job="app2"}]

Slide 26

Slide 26 text

Service Discovery for Grafana Logging
● Prometheus-style service discovery of logging targets
● Labels are indexed as metadata, e.g.: {job="app1"}
● Relabeling rules
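To make the relabeling idea concrete, here is a minimal sketch of a Prometheus-style "replace" rule applied to a discovered target's labels. It mirrors the semantics Prometheus documents (source values joined with ";", a fully anchored regex, internal labels prefixed with "__" dropped afterwards). Whether the logging agent behaves exactly this way is an assumption, and the `relabel` helper and sample rule are hypothetical. Replacements use Python's `\1` group syntax rather than Prometheus's `$1`.

```python
import re


def relabel(labels, rules):
    """Apply 'replace'-style rules to a copy of a target's label set."""
    out = dict(labels)
    for rule in rules:
        # Join the source label values with ';', as Prometheus does.
        joined = ";".join(out.get(name, "") for name in rule["source_labels"])
        match = re.fullmatch(rule["regex"], joined)
        if match is None:
            continue  # Rule does not apply to this target.
        out[rule["target_label"]] = match.expand(rule["replacement"])
    return out


# Labels as Kubernetes service discovery might report them (sample values).
discovered = {
    "__meta_kubernetes_pod_label_app": "app1",
    "__meta_kubernetes_namespace": "default",
}
rules = [
    {
        "source_labels": ["__meta_kubernetes_pod_label_app"],
        "regex": "(.+)",
        "target_label": "job",
        "replacement": r"\1",
    }
]
target = relabel(discovered, rules)
# Keep only the labels that would be indexed; '__' labels are internal.
indexed = {k: v for k, v in target.items() if not k.startswith("__")}
```

With this rule, the pod's `app` label ends up as the `job` label, which matches the `{job="app1"}` selector used throughout the slides.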

Slide 27

Slide 27 text

Logging architecture
[Diagram labels: {job="app1"}, {job="app2"}, Node, Logging agent, Logging service, Logging datasource]

Slide 28

Slide 28 text

Logging TODOs
● Dedup logic
● Pattern engine that emits time series
● Triggers/webhooks
● Cost-effective
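The slides do not describe how the planned pattern engine would work. Purely as an illustration of the general idea of turning log lines into a time series, the sketch below counts lines matching a pattern per minute. The function name, the reliance on a `ts=` field, and the one-minute bucket size are all assumptions.

```python
import re
from collections import Counter

# Captures the timestamp of a log line truncated to the minute.
TS_MINUTE = re.compile(r"ts=(\d{4}-\d{2}-\d{2}T\d{2}:\d{2})")


def count_matches_per_minute(lines, pattern):
    """Return sorted (minute, count) pairs for lines matching `pattern`."""
    regex = re.compile(pattern)
    counts = Counter()
    for line in lines:
        if not regex.search(line):
            continue
        ts = TS_MINUTE.search(line)
        if ts:  # Lines without a ts= field are ignored.
            counts[ts.group(1)] += 1
    return sorted(counts.items())


# The first two lines are shortened from the slides; the third is made up.
lines = [
    'level=info ts=2018-11-05T17:13:48.774738335Z msg="Starting Prometheus"',
    'level=info ts=2018-11-05T17:13:48.807390226Z msg="found healthy block"',
    'level=info ts=2018-11-05T17:14:02.000000000Z msg="found healthy block"',
]
series = count_matches_per_minute(lines, r"found healthy block")
```

The resulting `series` is a list of minute buckets with counts, which is the shape a graph panel or an alert rule could consume.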

Slide 29

Slide 29 text

Logging (BETA)
● Need lots of feedback: [email protected]
● OSS Logging BETA ready in Dec 2018

Slide 30

Slide 30 text

Enable Explore UI (BETA: Prometheus)
Behind feature flag. To enable, edit the Grafana config ini file:
    [explore]
    enabled = true
Set up a datasource that supports Explore, e.g., Prometheus.
Will be released in 6.0 (Feb 2019)

Slide 31

Slide 31 text

What we’re working on

Slide 32

Slide 32 text

Explore UI needs to be refined
● Still behind feature flag, feedback welcome: @davkals or [email protected]
● UX improvements on logs and metrics views
● Unify query editors for Explore and dashboards
● Performance improvements

Slide 33

Slide 33 text

MultiStat panel
https://github.com/grafana/grafana/pull/12620

Slide 34

Slide 34 text

New graph panel controller to quickly iterate on how to visualize data

Slide 35

Slide 35 text

Datasources for all 3 major clouds

Slide 36

Slide 36 text

Dashboard management: Git integration and custom defaults
RFCs waiting for feedback:
● Dashboard changes trigger GitHub PR: https://github.com/grafana/grafana/issues/13823
● Reference panels for custom defaults: https://github.com/grafana/grafana/issues/13888

Slide 37

Slide 37 text

One last thing...

Slide 38

Slide 38 text

https://www.grafanacon.org/2019/

Slide 39

Slide 39 text

Thanks for listening
UX feedback to [email protected] or @davkals