Beyond Dashboarding The Grafana Observability Stack by Steve Caron

Beyond Dashboarding: The Grafana Observability Stack
Steve Caron, Staff Solutions Engineer

How most people started with Grafana

• Loki for logs
• Grafana for visualizations
• Tempo for traces
• Mimir for metrics

Grafana Mimir

Running Prometheus at scale
Prometheus is great but…
• Limited horizontal scalability: out of the box, Prometheus scales only vertically.
• No robust federation: a centralised view of metrics can only be achieved using hierarchical federation or cross-service federation.
• Not designed for long-term retention: Prometheus uses local storage. Traditionally, retention is set to 15 days and rarely above 30 days.
• No security model: no authentication mechanism or role-based access controls for protecting your data.

Prometheus on steroids: Mimir
Mimir = Prometheus +
• Durable storage
• Blazing fast query performance
• Production-proven dashboards, alerts, and playbooks
• High availability
• Horizontal scalability
• Real multi-tenancy

Running Prometheus at scale
[Diagram: Application 1, Application 2 and Application N in Region A and in Region B; arrows labelled "Remote write" and "Query"]
For Prometheus users
• Leverage your existing investment by using Prometheus as a metrics forwarder.
• Your existing queries, alerts and recording rules are 100% compatible.
For all
• Get started in a few clicks using the Grafana Agent (embeds the Prometheus agent).
• Query your Mimir metrics using Grafana.
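
Since Mimir is queried the same way as Prometheus, a query can also be sent over its Prometheus-compatible HTTP API. The following is a minimal sketch that only builds the request and does not send it. It assumes the default `/prometheus` API prefix and the `X-Scope-OrgID` tenant header; the host `mimir.example`, the tenant `demo` and the function name are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_instant_query(base_url: str, promql: str, tenant_id: str) -> Request:
    """Build (but do not send) a Prometheus-style instant query for Mimir."""
    # Assumption: the Prometheus HTTP API is served under the default
    # "/prometheus" prefix; adjust this if your deployment changes it.
    params = urlencode({"query": promql})
    url = f"{base_url.rstrip('/')}/prometheus/api/v1/query?{params}"
    # Assumption: multi-tenancy is enabled, so the tenant is named in a header.
    return Request(url, headers={"X-Scope-OrgID": tenant_id})


# Hypothetical endpoint and tenant, for illustration only.
request = build_instant_query("http://mimir.example:8080", "up", "demo")
print(request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would execute the query against a reachable Mimir instance.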

Grafana Loki

Who did we make Loki for?
• Effective debugging and troubleshooting of applications
• Visualise and alert on service/app performance metrics
• Build actionable insights from log data and other supported data sources
Personas: DevOps, SRE, DataEng

Why do they like Loki?
• Format agnostic
• Efficient at scale
• Built for correlation
• Logs as metrics

Under the hood
2019-12-11T10:01:02.123456789Z {app="nginx", env="dev"} GET /about 1034 Debug "page not found"
• Timestamp: with nanosecond precision
• Labels/Selectors: key-value pairs (indexed)
• Log content: JSON, logfmt, custom, etc. (unindexed)
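
The split between indexed labels and unindexed log content is visible in the shape of an entry sent to Loki. The following is a minimal sketch of that shape, assuming the JSON body format of Loki's push API (`streams`, `stream`, `values`). It reuses the timestamp and labels above and treats the sample log content as a single line; the function name is hypothetical and nothing is sent over the network.

```python
import json
from datetime import datetime, timezone


def build_push_payload(labels: dict, timestamp_ns: int, line: str) -> dict:
    """Shape one log entry as a Loki-style push payload."""
    return {
        "streams": [
            {
                "stream": labels,  # key-value pairs, indexed
                # Each value is [timestamp in ns as a string, raw log line].
                "values": [[str(timestamp_ns), line]],  # content, unindexed
            }
        ]
    }


# datetime only carries microseconds, so whole seconds and the
# nanosecond remainder are combined by hand.
seconds = int(datetime(2019, 12, 11, 10, 1, 2, tzinfo=timezone.utc).timestamp())
timestamp_ns = seconds * 1_000_000_000 + 123_456_789

payload = build_push_payload(
    {"app": "nginx", "env": "dev"},
    timestamp_ns,
    'GET /about 1034 Debug "page not found"',
)
print(json.dumps(payload, indent=2))
```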

Get the most out of your logs with LogQL
• Inspired by PromQL syntax, for effortless correlation between metrics and logs.
• Build metrics from logs and unlock new use cases.
• Use your LogQL queries to create advanced alerting rules.
{app="nginx",instance="1.1.1.1"} != "Googlebot/" | json | request_time >= 100 and status == 200
• Label matchers: {app="nginx",instance="1.1.1.1"}
• Line filters: != "Googlebot/"
• Parser: | json
• Label filters: | request_time >= 100 and status == 200
*Successful requests with a latency of at least 100ms (Googlebot requests excluded)
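
To make the four stages of that query concrete, here is a plain-Python imitation of what each stage keeps or drops. It is a simplified sketch and not how Loki evaluates LogQL. The function name and the sample log lines are hypothetical, and lines that fail to parse are simply dropped.

```python
import json


def matches_query(stream_labels: dict, line: str) -> bool:
    """Imitate the example query's stages on a single log line."""
    # Label matchers: {app="nginx",instance="1.1.1.1"}
    if stream_labels.get("app") != "nginx":
        return False
    if stream_labels.get("instance") != "1.1.1.1":
        return False
    # Line filter: != "Googlebot/"
    if "Googlebot/" in line:
        return False
    # Parser: | json (simplification: unparsable lines are dropped)
    try:
        fields = json.loads(line)
    except json.JSONDecodeError:
        return False
    if not isinstance(fields, dict):
        return False
    # Label filters: | request_time >= 100 and status == 200
    try:
        return float(fields["request_time"]) >= 100 and int(fields["status"]) == 200
    except (KeyError, TypeError, ValueError):
        return False


# Hypothetical sample data, for illustration only.
labels = {"app": "nginx", "instance": "1.1.1.1"}
slow_ok = json.dumps({"request_time": 250, "status": 200, "agent": "curl/8.0"})
fast_ok = json.dumps({"request_time": 20, "status": 200, "agent": "curl/8.0"})
bot = json.dumps({"request_time": 300, "status": 200, "agent": "Googlebot/2.1"})

for line in (slow_ok, fast_ok, bot):
    print(matches_query(labels, line), line)
```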

Promtail
Makes log collection easy with...
• Target discovery for Kubernetes, Syslog, files and more
• Automatically attach labels to your log lines
• Advanced pipeline mechanism for parsing, transforming and filtering your logs
• Build and expose custom metrics from your log data
But Loki is open. [Logos of other clients: Logstash, Lambda]

Grafana Tempo

What is distributed tracing?
A way to observe requests as they propagate through a distributed system.

How to get started with distributed tracing?
• Instrument: instrument your code using agents and libraries to generate spans for your services.
• Collect: use tracing pipelines to collect, transform and enrich spans.
• Store: store all the traces for querying and building more insights.
• Visualize: use Grafana to detect and investigate service issues. Correlate your traces with metrics and log data.
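
As an illustration of what instrumentation produces, the sketch below records spans with only the Python standard library. It is a toy model, and real services would normally rely on an instrumentation SDK such as OpenTelemetry. The `Span` fields, the `span` helper and the operation names are all hypothetical.

```python
import secrets
import time
from contextlib import contextmanager
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    trace_id: str             # shared by every span of one request
    span_id: str              # unique per unit of work
    parent_id: Optional[str]  # links a child to its caller
    name: str
    start_ns: int
    end_ns: int = 0


FINISHED: List[Span] = []  # stands in for an exporter or pipeline
_stack: List[Span] = []    # currently open spans, innermost last


@contextmanager
def span(name: str):
    """Open a span, nesting it under the currently open span if any."""
    parent = _stack[-1] if _stack else None
    current = Span(
        trace_id=parent.trace_id if parent else secrets.token_hex(16),
        span_id=secrets.token_hex(8),
        parent_id=parent.span_id if parent else None,
        name=name,
        start_ns=time.time_ns(),
    )
    _stack.append(current)
    try:
        yield current
    finally:
        current.end_ns = time.time_ns()
        _stack.pop()
        FINISHED.append(current)


# A request handled by one service that calls another operation.
with span("GET /checkout") as root:
    with span("auth.verify") as child:
        pass

for s in FINISHED:
    print(s.name, s.trace_id, s.parent_id)
```

The shared `trace_id` and the `parent_id` links are what let a tracing backend reassemble the request path from individual spans.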

Metrics Generation
• traces_spanmetrics_calls_total - Counter, total count of the span (Rate, Error)
• traces_spanmetrics_latency - Histogram, duration of the span (Duration); includes exemplars
• traces_spanmetrics_size_total - Counter, total size of spans ingested (Volume)
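
The idea behind these metrics can be sketched as a simple aggregation over spans: a call counter per service and status, and a cumulative latency histogram. This is an illustrative sketch under stated assumptions, not Tempo's implementation. The bucket boundaries, label choice and sample spans are hypothetical.

```python
from collections import Counter

BUCKETS_S = [0.1, 0.5, 1.0]  # hypothetical histogram boundaries, in seconds


def generate_span_metrics(spans):
    """Aggregate spans into a call counter and a cumulative latency histogram."""
    calls_total = Counter()      # plays the role of traces_spanmetrics_calls_total
    latency_buckets = Counter()  # plays the role of traces_spanmetrics_latency
    for s in spans:
        calls_total[(s["service"], s["status"])] += 1
        # Prometheus-style histograms are cumulative: a span is counted in
        # every bucket whose upper bound is at or above its duration.
        for upper in BUCKETS_S:
            if s["duration_s"] <= upper:
                latency_buckets[(s["service"], upper)] += 1
        latency_buckets[(s["service"], float("inf"))] += 1
    return calls_total, latency_buckets


# Hypothetical spans, for illustration only.
spans = [
    {"service": "auth", "status": "ok", "duration_s": 0.05},
    {"service": "auth", "status": "error", "duration_s": 0.7},
    {"service": "auth", "status": "ok", "duration_s": 0.3},
]
calls, buckets = generate_span_metrics(spans)
total = sum(n for (svc, _), n in calls.items() if svc == "auth")
error_ratio = calls[("auth", "error")] / total
print(dict(calls), dict(buckets), error_ratio)
```

Rate and error come from the counter, and duration comes from the histogram buckets.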

TraceQL
• Inspired by PromQL and LogQL
• Extract insights from traces interactively
• Analyze traces based on their structure
Example queries:
{}
{ .http.status_code = 500 }
{ .http.status_code = 500 } | count() > 1
{ .namespace = "prod" } > { .service.name = "auth" && …
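
As a rough reading of `{ .http.status_code = 500 } | count() > 1`: select the spans with a 500 status code, then keep only the traces in which more than one span matched. The sketch below imitates that in plain Python. It is not how Tempo evaluates TraceQL, and the span dictionaries and function name are hypothetical.

```python
from collections import defaultdict


def traces_with_repeated_500s(spans):
    """Return the IDs of traces containing more than one span with status 500."""
    matches = defaultdict(int)
    for s in spans:
        # Span selector: { .http.status_code = 500 }
        if s.get("http.status_code") == 500:
            matches[s["trace_id"]] += 1
    # Aggregate filter: | count() > 1
    return sorted(trace_id for trace_id, n in matches.items() if n > 1)


# Hypothetical spans, for illustration only.
spans = [
    {"trace_id": "a", "http.status_code": 500},
    {"trace_id": "a", "http.status_code": 500},
    {"trace_id": "b", "http.status_code": 500},
    {"trace_id": "b", "http.status_code": 200},
    {"trace_id": "c", "http.status_code": 200},
]
print(traces_with_repeated_500s(spans))
```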

How to run the Grafana stack?
Monolithic mode
• Simplest deployment mode
• All components in one single process
• Great for testing
Microservices mode
• Maximum scalability
• Separate read/write paths
• Recommended for production deployments and large volumes
Grafana Cloud
• Fully managed by Grafana
• Available in 7 regions
• Free-forever tier (50GB logs and traces per month, 10K active series, 3 users)

Simplified architecture

How to collect your telemetry data
The community way
Or
The Grafana way

Anatomy of the Grafana Agent
• Metrics - Shares the same codebase as the Prometheus Agent.
• Logs - Embeds Promtail, the log forwarder built by Grafana, for Loki.
• Traces - Based on OpenTelemetry Collector.

Demo

+ An open source, highly scalable and cost efficient continuous
profiling database

An open source web SDK for frontend application observability 1.5M+
NPM Downloads

Open source eBPF auto-instrumentation for application observability

Have more questions?
Join us at community.grafana.com or on the Grafana public Slack: slack.grafana.com
Get involved: #channel, grafana/, community.grafana.com
Thank you!
