OSMC 2018: Logging is coming to Grafana

David
November 06, 2018

A preview of Grafana's upcoming log aggregation solution, also showcasing the new Explore UI for Prometheus.

Transcript

  1. Logging is coming to Grafana David Kaltschmidt @davkals OSMC 2018

  2. I’m David • All things UX at Grafana Labs • If you click and are stuck, reach out to me: david@grafana.com • Twitter: @davkals
  3. Outline • Quick intro • What’s new since 5.0 • Logging • Towards 6.0
  4. Grafana intro

  5. Grafana From Dashboarding solution To Observability platform

  6. Unified way to look at data from different sources (logos of datasources)
  7. Custom data sources http://docs.grafana.org/plugins/developing/datasources/

  8. Create dashboards

  9. Define alerts • Direct manipulation • Timeseries-based alerts evaluated per panel on the Grafana server
  10. Grafana adoption: 36K (2016), 92K (2017), 186K (2018); chart label: Mid-year

  11. New since 5.0

  12. Heatmap panel released in 5.1 • Prometheus query example: rate(foo_metric_bucket[10m]) • Legend format: {{le}} • Format as: Heatmap
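
Prometheus histogram `_bucket` series are cumulative: each `le` ("less than or equal") bucket also counts everything in the smaller buckets, so a heatmap needs the per-bucket share instead. The following is a minimal Python sketch of that conversion, not Grafana's actual implementation. The function name and the sample values are made up for illustration.

```python
def per_bucket_counts(cumulative):
    """Turn cumulative `le` bucket values into per-bucket values.

    `cumulative` maps an upper bound (the `le` label, as a string) to the
    cumulative value reported for that bound.
    """
    # Sort by numeric upper bound; float("+Inf") parses to infinity.
    bounds = sorted(cumulative, key=float)
    result = {}
    previous = 0.0
    for bound in bounds:
        value = cumulative[bound]
        # A bucket's own share is the increase over the next-smaller bound.
        result[bound] = value - previous
        previous = value
    return result


# Hypothetical per-second rates for four buckets of foo_metric_bucket.
sample = {"0.1": 2.0, "0.5": 5.0, "1": 9.0, "+Inf": 10.0}
buckets = per_bucket_counts(sample)
print(buckets)
```

Each resulting value would correspond to one cell in a heatmap column, with the `le` label as the row name.
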
  13. Datasource updates • New: MS SQL Server • New: Google Stackdriver • New: Flux (Influx, BETA) • Elasticsearch alerting • Postgres query builder
  14. Provisioning API • Define data sources and dashboards in files • Auto-reload on change • Allows version control of files • http://docs.grafana.org/administration/provisioning/
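
To make "data sources in files" concrete, here is a small Python sketch that renders a minimal datasource provisioning file as text. The field names (`apiVersion`, `datasources`, `name`, `type`, `access`, `url`) follow the datasource provisioning format as I understand it from the Grafana documentation linked on the slide. The helper name and the Prometheus URL are assumptions for illustration, and values are written unquoted, so this only suits simple strings.

```python
def render_datasource_provisioning(name, ds_type, url, access="proxy"):
    """Return the text of a minimal datasource provisioning file (YAML)."""
    lines = [
        "apiVersion: 1",
        "datasources:",
        f"  - name: {name}",
        f"    type: {ds_type}",
        f"    access: {access}",
        f"    url: {url}",
    ]
    return "\n".join(lines) + "\n"


# Assumed example: a local Prometheus server on its default port.
provisioning_yaml = render_datasource_provisioning(
    "Prometheus", "prometheus", "http://localhost:9090"
)
print(provisioning_yaml)
```

The resulting text would be saved as a `.yaml` file in Grafana's datasource provisioning directory and can be kept under version control like any other file.
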
  15. Grafana is now fully CI’ed, with ARM and Windows builds • Test new features that are in master: docker run -d --name=grafana -p 3000:3000 grafana/grafana:master • https://hub.docker.com/r/grafana/grafana/
  16. New: Explore UI (Beta) with Logging

  17. Troubleshooting journey

  18. Problems • Once panel is found, it’s difficult to interact with • Overwhelming style and display options
  19. Explore UI wireframes: queries rate(http_requests_total[5m]) and 1 - rate(http_requests_total[5m]) • view toggles GRAPH / TABLE / BOTH • Last 1 hour, Refresh: 10s • RUN • timing labels 4.2s and 3.2s • tabs: First tab, Second tab, 3rd tab, My tab ╳
  20. Now add logging...

  21. Extended Explore to have metrics and logs side-by-side
    Metrics: rate(http_requests_total{job="app1"}[5m]) • GRAPH TABLE BOTH • Last 1 hour, Refresh: 10s • RUN • 1 - rate(http_requests_total{job="app1"}[5m]) • rate(http_requests_total[5m]) • 1 - rate(http_requests_total[5m]) • 4.2s 3.2s
    Logs: {job="app1"} • DATASOURCE • Last 1 hour, Refresh: 10s • RUN • 4.2s • LOGS
    level=info ts=2018-11-05T17:13:48.774738335Z caller=main.go:244 msg="Starting Prometheus" version="(version=2.4.2, branch=master, revision=3e6b9d43c36921e318a8722772160be4184ddad5)"
    level=info ts=2018-11-05T17:13:48.775413199Z caller=main.go:245 build_context="(go=go1.10.3, user=david@kenobi.fritz.box, date=20181011-08:29:54)"
    level=info ts=2018-11-05T17:13:48.77545838Z caller=main.go:246 host_details=(darwin)
    level=info ts=2018-11-05T17:13:48.775499098Z caller=main.go:247 fd_limits="(soft=256, hard=unlimited)"
    level=info ts=2018-11-05T17:13:48.775545138Z caller=main.go:248 vm_limits="(soft=unlimited, hard=unlimited)"
    level=info ts=2018-11-05T17:13:48.777071286Z caller=main.go:562 msg="Starting TSDB ..."
    level=info ts=2018-11-05T17:13:48.778020546Z caller=web.go:399 component=web msg="Start listening for connections" address=0.0.0.0:9090
    level=info ts=2018-11-05T17:13:48.807390226Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1539583200000 maxt=1539648000000 ulid=01CT0XT8W5N1E07K3ZQ5PGPFHM
    level=info ts=2018-11-05T17:13:48.807946341Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1539648000000 maxt=1539712800000 ulid=01CT0XT9051Q2D6Q4FD1CN52BG
    level=info ts=2018-11-05T17:13:48.808972634Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1539712800000 maxt=1539777600000 ulid=01CT18NCBATPKCZ9PVFPMSEZD6
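
The sample log lines on this slide are made of `key=value` pairs, with double quotes around values that contain spaces. As a rough illustration of how such a line could be split into fields, here is a Python sketch that leans on `shlex` for the quoting. It is a simplification for lines shaped like the ones above, not a complete parser for this format, and the function name is made up.

```python
import shlex


def parse_logfmt(line):
    """Split a key=value log line into a dict of fields."""
    fields = {}
    # shlex.split keeps quoted values together and drops the quotes.
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # ignore stray tokens without an "=" sign
            fields[key] = value
    return fields


# One of the log lines shown on the slide.
line = (
    "level=info ts=2018-11-05T17:13:48.777071286Z "
    'caller=main.go:562 msg="Starting TSDB ..."'
)
fields = parse_logfmt(line)
print(fields["level"], fields["caller"], fields["msg"])
```
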
  22. Demo: http://localhost:3000/explore

  23. Goal: Keeping it simple https://twitter.com/alicegoldfuss/status/981947777256079360

  24. Logging for Kubernetes {job="app1"} {job="app3"} {job="app2"}

  25. Logging for Kubernetes (2) {job="app1"} {job="app3"} {job="app2"}

  26. Service Discovery for Grafana Logging • Prometheus-style service discovery of logging targets • Labels are indexed as metadata, e.g.: {job="app1"} • Relabeling rules
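
The idea of selecting log streams by their indexed labels can be sketched in a few lines of Python. This is only an illustration of label matching with equality matchers such as `{job="app1"}`, written with plain double quotes. The function names, the `instance` label, and the sample streams are hypothetical and do not come from the talk.

```python
import re

# Matches label="value" pairs inside a selector.
_PAIR = re.compile(r'(\w+)="([^"]*)"')


def parse_selector(selector):
    """Parse a selector like {job="app1"} into a dict of required labels."""
    text = selector.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError(f"not a label selector: {selector!r}")
    return dict(_PAIR.findall(text[1:-1]))


def matches(selector, labels):
    """Return True if every label in the selector equals the stream's label."""
    wanted = parse_selector(selector)
    return all(labels.get(key) == value for key, value in wanted.items())


# Hypothetical streams, each identified only by its label set.
streams = [
    {"job": "app1", "instance": "node-a"},
    {"job": "app2", "instance": "node-a"},
    {"job": "app1", "instance": "node-b"},
]
selected = [s for s in streams if matches('{job="app1"}', s)]
print(selected)
```

Because only the labels are consulted, the log text itself never has to be indexed to find the right streams.
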
  27. Logging architecture (diagram labels: {job="app1"}, {job="app2"}, Node, Logging agent, Logging service, Logging datasource)
  28. Logging TODOs • Dedup logic • Pattern engine that emits time series • Triggers/webhooks • Cost-effective
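
The slide does not say how the planned pattern engine would work. As one possible reading of "pattern engine that emits time series", the Python sketch below counts log lines that match a regular expression per minute. Everything in it (the function name, the one-minute bucketing, the sample entries) is an assumption for illustration, not the design presented in the talk.

```python
import re
from collections import Counter
from datetime import datetime


def count_matches_per_minute(entries, pattern):
    """Count matching log lines per minute.

    `entries` is an iterable of (timestamp, line) tuples; the result is a
    time-sorted list of (minute, count) points.
    """
    regex = re.compile(pattern)
    counts = Counter()
    for timestamp, line in entries:
        if regex.search(line):
            # Truncate the timestamp to the start of its minute.
            bucket = timestamp.replace(second=0, microsecond=0)
            counts[bucket] += 1
    return sorted(counts.items())


# Made-up log entries; only their shape mirrors the key=value style above.
entries = [
    (datetime(2018, 11, 5, 17, 13, 48), 'level=info msg="Starting TSDB ..."'),
    (datetime(2018, 11, 5, 17, 13, 52), 'level=error msg="hypothetical failure"'),
    (datetime(2018, 11, 5, 17, 14, 3), 'level=error msg="hypothetical failure"'),
    (datetime(2018, 11, 5, 17, 14, 40), 'level=error msg="hypothetical failure"'),
]
series = count_matches_per_minute(entries, r"level=error")
print(series)
```
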
  29. Logging (BETA) • Need lots of feedback: david@grafana.com • OSS Logging BETA ready in Dec 2018
  30. Enable Explore UI (BETA: Prometheus) • Behind feature flag. To enable, edit Grafana config ini file: [explore] enabled = true • Set up a datasource that supports Explore, e.g., Prometheus • Will be released in 6.0 (Feb 2019)
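
The flag above is a plain ini setting, so it can also be set by a script. The Python sketch below uses the standard `configparser` module to add `enabled = true` under `[explore]`. It is a simplified illustration: the function name and the `[server]` sample are made up, `configparser` drops comments when it rewrites the text, and it would raise an error on keys that appear before the first section header, so it only suits small override files.

```python
import configparser
import io


def enable_explore(ini_text):
    """Return ini text with `enabled = true` set in the [explore] section."""
    # interpolation=None keeps literal "%" characters in values untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_section("explore"):
        parser.add_section("explore")
    parser.set("explore", "enabled", "true")
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Hypothetical existing config with one unrelated section.
original = "[server]\nhttp_port = 3000\n"
updated = enable_explore(original)
print(updated)
```
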
  31. What we’re working on

  32. Explore UI needs to be refined • Still behind feature flag, feedback welcome: @davkals or david@grafana.com • UX improvements on logs and metrics views • Unify query editors for Explore and dashboards • Performance improvements
  33. MultiStat panel https://github.com/grafana/grafana/pull/12620

  34. New graph panel controller to quickly iterate how to visualize

  35. Datasources for all 3 major clouds

  36. Dashboard management: Git integration and custom defaults • RFCs waiting for feedback • Dashboard changes trigger GitHub PR: https://github.com/grafana/grafana/issues/13823 • Reference panels for custom defaults: https://github.com/grafana/grafana/issues/13888
  37. One last thing...

  38. https://www.grafanacon.org/2019/

  39. Tack for listening UX feedback to david@grafana.com @davkals