OSMC 2018: Logging is coming to Grafana

Logging is coming to Grafana David Kaltschmidt @davkals OSMC 2018

I’m David. All things UX at Grafana Labs.
If you click and are stuck, reach out to me.
[email protected] Twitter: @davkals

Outline
• Quick intro
• What’s new since 5.0
• Logging
• Towards 6.0

Grafana intro

Grafana: from dashboarding solution to observability platform

Unified way to look at data from different sources
[Slide shows logos of datasources]

Custom data sources http://docs.grafana.org/plugins/developing/datasources/

Create dashboards

Define alerts
• Direct manipulation
• Timeseries-based alerts evaluated per panel on the Grafana server
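To make "timeseries-based alerts evaluated per panel" concrete, here is a minimal sketch of a threshold rule: reduce the recent points of one series to a single number, then compare it with a threshold. This is not Grafana's implementation; the function name, the reducer table, and the state strings are assumptions made for illustration.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

# One series: (unix timestamp in seconds, value) pairs, oldest first.
Series = List[Tuple[float, float]]

# Hypothetical reducer table; the names are illustrative only.
REDUCERS: Dict[str, Callable[[List[float]], float]] = {
    "avg": mean,
    "min": min,
    "max": max,
    "last": lambda values: values[-1],
}


def evaluate_rule(series: Series, now: float, window_s: float,
                  reducer: str, threshold: float) -> str:
    """Return 'alerting', 'ok', or 'no_data' for one series."""
    # Keep only the points inside the evaluation window [now - window_s, now].
    values = [v for ts, v in series if now - window_s <= ts <= now]
    if not values:
        return "no_data"
    # Reduce the window to one number and compare it with the threshold.
    reduced = REDUCERS[reducer](values)
    return "alerting" if reduced > threshold else "ok"


if __name__ == "__main__":
    now = 1_000.0
    cpu = [(now - 240, 0.4), (now - 120, 0.7), (now, 0.9)]
    print(evaluate_rule(cpu, now, window_s=300, reducer="avg", threshold=0.6))
```

Running the rule on the server rather than in the browser means it can fire even when nobody has the dashboard open.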

Grafana adoption
2016: 36K
2017: 92K
2018: 186K
(chart note: Mid-year)

New since 5.0

Heatmap panel released in 5.1
Prometheus query example: rate(foo_metric_bucket[10m])
Legend format: {{le}}
Format as: Heatmap
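Prometheus histogram buckets are cumulative: the series with le="0.5" also counts everything already counted under le="0.1". A heatmap cell should show only the observations that fall into its own bucket, so the cumulative values have to be turned into differences between neighbouring buckets. The sketch below shows that arithmetic for a single timestamp. It is an illustration, not Grafana's code, and the function name is an assumption.

```python
from typing import Dict


def per_bucket_counts(cumulative: Dict[str, float]) -> Dict[str, float]:
    """Convert cumulative `le` bucket values into per-bucket values."""

    def upper_bound(le: str) -> float:
        # Prometheus labels the last bucket "+Inf".
        return float("inf") if le == "+Inf" else float(le)

    result: Dict[str, float] = {}
    previous = 0.0
    # Walk the buckets from the smallest upper bound to the largest and
    # subtract what the previous (smaller) bucket already counted.
    for le in sorted(cumulative, key=upper_bound):
        result[le] = cumulative[le] - previous
        previous = cumulative[le]
    return result


if __name__ == "__main__":
    sample = {"+Inf": 6.0, "0.1": 2.0, "0.5": 5.0}
    print(per_bucket_counts(sample))
```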

Datasource updates
• New: MS SQL Server
• New: Google Stackdriver
• New: Flux (Influx, BETA)
• ElasticSearch alerting
• Postgres query builder

Provisioning API
Define data sources and dashboards in files
Auto-reload on change
Allows version control of files
http://docs.grafana.org/administration/provisioning/
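As an illustration of "data sources in files", the sketch below renders a small datasource provisioning file as text. The keys (apiVersion, datasources, name, type, access, url, isDefault) follow the provisioning documentation linked above as far as I recall it, so check them against the docs for your Grafana version. The function name and the Prometheus URL are assumptions.

```python
def datasource_yaml(name: str, ds_type: str, url: str,
                    is_default: bool = False) -> str:
    """Render a one-datasource provisioning file as YAML text.

    Values are written unquoted, so this sketch only suits simple names
    and URLs without YAML special characters.
    """
    lines = [
        "apiVersion: 1",
        "datasources:",
        f"  - name: {name}",
        f"    type: {ds_type}",
        "    access: proxy",
        f"    url: {url}",
        f"    isDefault: {'true' if is_default else 'false'}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Assumed example values; adjust to your own setup.
    print(datasource_yaml("Prometheus", "prometheus",
                          "http://localhost:9090", is_default=True))
```

Because the result is plain text, the file can be committed to version control and reviewed like any other change.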

Grafana is now fully CI’ed, with ARM and Windows builds
Test new features that are in master:
docker run -d --name=grafana -p 3000:3000 grafana/grafana:master
https://hub.docker.com/r/grafana/grafana/

New: Explore UI (Beta) with Logging

Troubleshooting journey

Problems
• Once a panel is found, it’s difficult to interact with
• Overwhelming style and display options

Explore UI wireframes
[Wireframe: query rows rate(http_requests_total[5m]) and 1 - rate(http_requests_total[5m]); timings 4.2s and 3.2s; GRAPH / TABLE / BOTH toggle; time picker "Last 1 hour, Refresh: 10s"; RUN button. A second variant adds tabs: First tab, Second tab, 3rd tab, My tab ╳]

Now add logging...

Extended Explore to have metrics and logs side-by-side
[Wireframe, metrics side: query rows rate(http_requests_total{job="app1"}[5m]) and 1 - rate(http_requests_total{job="app1"}[5m]); timings 4.2s and 3.2s; GRAPH / TABLE / BOTH toggle; "Last 1 hour, Refresh: 10s"; RUN button]
[Wireframe, logs side: selector {job="app1"}; DATASOURCE picker; "Last 1 hour, Refresh: 10s"; RUN button; 4.2s; LOGS view with the sample lines below]
level=info ts=2018-11-05T17:13:48.774738335Z caller=main.go:244 msg="Starting Prometheus" version="(version=2.4.2, branch=master, revision=3e6b9d43c36921e318a8722772160be4184ddad5)"
level=info ts=2018-11-05T17:13:48.775413199Z caller=main.go:245 build_context="(go=go1.10.3, [email protected], date=20181011-08:29:54)"
level=info ts=2018-11-05T17:13:48.77545838Z caller=main.go:246 host_details=(darwin)
level=info ts=2018-11-05T17:13:48.775499098Z caller=main.go:247 fd_limits="(soft=256, hard=unlimited)"
level=info ts=2018-11-05T17:13:48.775545138Z caller=main.go:248 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2018-11-05T17:13:48.777071286Z caller=main.go:562 msg="Starting TSDB ..."
level=info ts=2018-11-05T17:13:48.778020546Z caller=web.go:399 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2018-11-05T17:13:48.807390226Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1539583200000 maxt=1539648000000 ulid=01CT0XT8W5N1E07K3ZQ5PGPFHM
level=info ts=2018-11-05T17:13:48.807946341Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1539648000000 maxt=1539712800000 ulid=01CT0XT9051Q2D6Q4FD1CN52BG
level=info ts=2018-11-05T17:13:48.808972634Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1539712800000 maxt=1539777600000 ulid=01CT18NCBATPKCZ9PVFPMSEZD6

Demo: http://localhost:3000/explore

Goal: Keeping it simple https://twitter.com/alicegoldfuss/status/981947777256079360

Logging for Kubernetes {job="app1"} {job="app3"} {job="app2"}

Logging for Kubernetes (2) {job="app1"} {job="app3"} {job="app2"}

Service Discovery for Grafana Logging
• Prometheus-style service discovery of logging targets
• Labels are indexed as metadata, e.g.: {job="app1"}
• Relabeling rules
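The idea that labels, not log contents, are indexed can be sketched with a small inverted index: each (label, value) pair maps to the log streams that carry it, and a selector such as {job="app1"} becomes a set intersection. This is a toy illustration, not the logging service's design. The class name, the method names, and the pod label are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class LabelIndex:
    """Toy inverted index from (label, value) pairs to log streams."""

    def __init__(self) -> None:
        self._streams: Dict[int, Dict[str, str]] = {}
        self._index: Dict[Tuple[str, str], Set[int]] = defaultdict(set)

    def add(self, labels: Dict[str, str]) -> int:
        """Register a stream under each of its label pairs; return its id."""
        stream_id = len(self._streams)
        self._streams[stream_id] = dict(labels)
        for pair in labels.items():
            self._index[pair].add(stream_id)
        return stream_id

    def select(self, matchers: Dict[str, str]) -> List[Dict[str, str]]:
        """Return the label sets of streams matching every equality matcher."""
        if not matchers:
            return []
        # One candidate set per matcher; a stream must appear in all of them.
        candidates = [self._index.get(pair, set()) for pair in matchers.items()]
        ids = set.intersection(*candidates)
        return [self._streams[i] for i in sorted(ids)]


if __name__ == "__main__":
    index = LabelIndex()
    index.add({"job": "app1", "pod": "a"})
    index.add({"job": "app2", "pod": "b"})
    index.add({"job": "app1", "pod": "c"})
    print(index.select({"job": "app1"}))  # the selector {job="app1"}
```

Only the small label sets go into the index, so it stays cheap even when the log volume is large.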

Logging architecture
[Diagram components, in the order shown: {job="app1"}, {job="app2"}, Node, Logging agent, Logging service, Logging datasource]

Logging TODOs
• Dedup logic
• Pattern engine that emits time series
• Triggers/webhooks
• Cost-effective

Logging (BETA)
• Need lots of feedback: [email protected]
• OSS Logging BETA ready in Dec 2018

Enable Explore UI (BETA: Prometheus)
Behind feature flag. To enable, edit the Grafana config ini file:
[explore]
enabled = true
Set up a datasource that supports Explore, e.g., Prometheus.
Will be released in 6.0 (Feb 2019)
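To check whether the flag is actually set in a given config file, a small script with Python's standard configparser can help. This is a sketch: the function name is an assumption, and the dummy section is a workaround for ini files that put keys before the first section header, which configparser would otherwise reject.

```python
import configparser


def explore_enabled(ini_text: str) -> bool:
    """Return True if the text contains `enabled = true` under [explore]."""
    # interpolation=None: treat '%' literally.
    # strict=False: tolerate duplicate sections or keys.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    # Prepend a dummy section so keys before the first header still parse.
    parser.read_string("[__top__]\n" + ini_text)
    # A missing section or key counts as "not enabled".
    return parser.getboolean("explore", "enabled", fallback=False)


if __name__ == "__main__":
    sample = "app_mode = production\n\n[explore]\nenabled = true\n"
    print(explore_enabled(sample))
```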

What we’re working on

Explore UI needs to be refined
• Still behind feature flag, feedback welcome: @davkals or [email protected]
• UX improvements on logs and metrics views
• Unify query editors for Explore and dashboards
• Performance improvements

MultiStat panel https://github.com/grafana/grafana/pull/12620

New graph panel controller to quickly iterate how to visualize

Datasources for all 3 major clouds

Dashboard management: Git integration and custom defaults
RFCs waiting for feedback:
• Dashboard changes trigger GitHub PR: https://github.com/grafana/grafana/issues/13823
• Reference panels for custom defaults: https://github.com/grafana/grafana/issues/13888

One last thing...

https://www.grafanacon.org/2019/

Thanks for listening
UX feedback to [email protected] @davkals
