Beyond Dashboarding: The Grafana Observability Stack, by Steve Caron

cncf-canada-meetups

October 19, 2023

Transcript

  1. Running Prometheus at scale

    Prometheus is great but…
    • Limited horizontal scalability: out of the box, Prometheus scales only vertically.
    • No robust federation: a centralised view of metrics can only be achieved using hierarchical federation or cross-service federation.
    • Not designed for long-term retention: Prometheus uses local storage. Traditionally, retention is set to 15 days and rarely above 30 days.
    • No security model: no authentication mechanism or role-based access controls for protecting your data.
  2. Prometheus on steroids: Mimir

    Mimir = Prometheus +
    • Durable storage
    • Blazing fast query performance
    • Production-proven dashboards, alerts, and playbooks
    • High availability
    • Horizontal scalability
    • Real multi-tenancy
  3. Running Prometheus at scale

    [Diagram: Applications 1 to N in Region A and in Region B, each sending metrics via remote write; a single query path reads them back.]

    For Prometheus users
    • Leverage your existing investment by using Prometheus as a metrics forwarder.
    • Your existing queries, alerts and recording rules are 100% compatible.

    For all
    • Get started in a few clicks using the Grafana Agent (embeds the Prometheus agent).
    • Query your Mimir metrics using Grafana.
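Grafana is the usual way to query Mimir, but because Mimir exposes a Prometheus-compatible HTTP API, the same query can be sketched directly. The snippet below only builds the request and does not send it. The host name, tenant ID and PromQL expression are hypothetical, and the `/prometheus` path prefix and `X-Scope-OrgID` tenant header assume a default multi-tenant Mimir configuration.

```python
import urllib.parse
import urllib.request


def build_mimir_query_request(base_url: str, promql: str, tenant_id: str) -> urllib.request.Request:
    """Build (but do not send) an instant-query request for a Mimir endpoint."""
    params = urllib.parse.urlencode({"query": promql})
    # Assumption: Mimir serves the Prometheus API under its default /prometheus prefix.
    url = f"{base_url.rstrip('/')}/prometheus/api/v1/query?{params}"
    # Assumption: multi-tenancy is enabled, so the tenant is selected via X-Scope-OrgID.
    return urllib.request.Request(url, headers={"X-Scope-OrgID": tenant_id})


# Hypothetical endpoint, tenant and query, for illustration only.
request = build_mimir_query_request(
    "http://mimir.example.internal:8080",
    "sum(rate(http_requests_total[5m]))",
    "demo-tenant",
)
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` would execute it against a real endpoint; an existing Prometheus query should work unchanged, which is the compatibility point made on the slide.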
  4. Who did we make Loki for?

    Audiences: DevOps, SRE, DataEng
    • Effective debugging and troubleshooting of applications
    • Visualise and alert on services/apps performance metrics
    • Build actionable insights from log data and other supported data sources
  5. Why do they like Loki?

    • Format agnostic
    • Efficient at scale
    • Built for correlation
    • Logs as metrics
  6. Under the hood

    2019-12-11T10:01:02.123456789Z {app="nginx", env="dev"} GET /about 1034 Debug "page not found"

    • Timestamp (indexed): nanosecond precision
    • Labels/Selectors (indexed): key-value pairs
    • Log content (unindexed): JSON, logfmt, custom, etc.
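To make the three parts of a log entry concrete, here is a minimal sketch that packs the slide's example into the JSON body accepted by Loki's push endpoint (`/loki/api/v1/push`). The helper names are hypothetical, and nothing is sent over the network; the point is only to show where the timestamp, the labels and the log line end up.

```python
import json
from datetime import datetime, timezone


def rfc3339_to_unix_nanos(timestamp: str) -> int:
    """Convert an RFC 3339 UTC timestamp with a fractional part to Unix nanoseconds."""
    seconds_part, fraction = timestamp.rstrip("Z").split(".")
    whole = datetime.strptime(seconds_part, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    # Pad or truncate the fraction to exactly nine digits (nanoseconds).
    nanos = int(fraction.ljust(9, "0")[:9])
    return int(whole.timestamp()) * 1_000_000_000 + nanos


def build_push_payload(labels: dict, timestamp: str, line: str) -> dict:
    """Build a push body: labels identify the stream, values carry [timestamp, line] pairs."""
    return {
        "streams": [
            {
                "stream": dict(labels),  # indexed key-value pairs
                "values": [[str(rfc3339_to_unix_nanos(timestamp)), line]],  # line is not indexed
            }
        ]
    }


payload = build_push_payload(
    {"app": "nginx", "env": "dev"},
    "2019-12-11T10:01:02.123456789Z",
    'GET /about 1034 Debug "page not found"',
)
print(json.dumps(payload, indent=2))
```

Only the label set distinguishes one stream from another, which is why keeping labels few and low-cardinality keeps the index small.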
  7. Get the most out of your logs with LogQL

    • Inspired by PromQL syntax for effortless correlation between metrics and logs.
    • Build metrics from logs and unlock new use cases.
    • Use your LogQL queries to create advanced alerting rules.

    {app="nginx",instance="1.1.1.1"}             Label matchers
      != "Googlebot/"                            Line filters
      | json                                     Parser
      | request_time >= 100 and status == 200    Label filters

    *Successful requests with a latency of at least 100 ms (Googlebot requests excluded)
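The stages of the example query can be read as a pipeline applied to each log line. The sketch below emulates that pipeline locally in plain Python; it is not how Loki evaluates LogQL, and the sample lines and field names other than `request_time` and `status` are made up for illustration. The label matchers are assumed to have already selected the stream.

```python
import json


def successful_slow_requests(lines, min_request_time=100):
    """Emulate: != "Googlebot/" | json | request_time >= 100 and status == 200."""
    matches = []
    for line in lines:
        # Line filter: drop any line containing the substring "Googlebot/".
        if "Googlebot/" in line:
            continue
        # Parser: extract fields from the JSON log line (simplification: skip bad lines).
        try:
            fields = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(fields, dict):
            continue
        # Label filter: keep slow but successful requests.
        if fields.get("request_time", 0) >= min_request_time and fields.get("status") == 200:
            matches.append(fields)
    return matches


# Hypothetical nginx access-log lines in JSON format.
sample_lines = [
    '{"request_time": 250, "status": 200, "agent": "Mozilla/5.0"}',
    '{"request_time": 300, "status": 200, "agent": "Googlebot/2.1"}',
    '{"request_time": 40, "status": 200, "agent": "curl/8.0"}',
    '{"request_time": 180, "status": 500, "agent": "Mozilla/5.0"}',
]
print(successful_slow_requests(sample_lines))
```

Putting the cheap substring filter before the parser mirrors the usual LogQL advice: discard lines early so fewer of them need to be parsed.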
  8. Promtail

    Makes log collection easy with...
    • Target discovery for Kubernetes, Syslog, files and more
    • Automatically attach labels to your log lines
    • Advanced pipeline mechanism for parsing, transforming and filtering your logs
    • Build and expose custom metrics from your log data

    But Loki is open. [Logos: logstash, Lambda]
  9. What is distributed tracing?

    A way to observe requests as they propagate through a distributed system.
  10. How to get started with distributed tracing?

    • Instrument: Instrument your code using agents and libraries to generate spans for your services.
    • Collect: Use tracing pipelines to collect, transform and enrich spans.
    • Store: Store all the traces for querying and building more insights.
    • Visualize: Use Grafana to detect and investigate service issues. Correlate your traces with metrics and log data.
  11. Metrics Generation

    • traces_spanmetrics_calls_total - Counter, total count of spans (Rate, Error)
    • traces_spanmetrics_latency - Histogram, duration of the span (Duration) (includes Exemplars)
    • traces_spanmetrics_size_total - Counter, total size of spans ingested (Volume)
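As a rough illustration of how the three metrics relate to spans, the sketch below aggregates a handful of in-memory spans into a call counter, a cumulative latency histogram and a size counter. It is a toy model, not Tempo's metrics generator: the `Span` type, the label sets and the bucket boundaries are all assumptions chosen for the example.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    service: str
    name: str
    duration_seconds: float
    size_bytes: int
    is_error: bool = False


# Hypothetical histogram upper bounds, in seconds.
BUCKETS = (0.1, 0.5, 1.0, float("inf"))


def generate_span_metrics(spans):
    """Aggregate spans into call counts, cumulative latency buckets and ingested size."""
    calls = Counter()    # models traces_spanmetrics_calls_total
    size = Counter()     # models traces_spanmetrics_size_total
    latency = defaultdict(lambda: [0] * len(BUCKETS))  # models traces_spanmetrics_latency
    for span in spans:
        status = "error" if span.is_error else "ok"
        calls[(span.service, span.name, status)] += 1
        size[span.service] += span.size_bytes
        # Prometheus-style buckets are cumulative: count the span in every bucket it fits.
        for index, upper_bound in enumerate(BUCKETS):
            if span.duration_seconds <= upper_bound:
                latency[(span.service, span.name)][index] += 1
    return calls, dict(latency), size


spans = [
    Span("auth", "login", 0.05, 200),
    Span("auth", "login", 0.7, 300, is_error=True),
    Span("auth", "login", 0.3, 100),
]
calls, latency, size = generate_span_metrics(spans)
print(calls, latency, size)
```

Rate and error ratio come from the call counter, duration from the histogram, and volume from the size counter, which matches the annotations on the slide.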
  12. TraceQL

    • Inspired by PromQL and LogQL
    • Extract insights from traces interactively
    • Analyze traces based on their structure

    { .namespace = "prod" } > { .service.name = "auth" && { .http.status_code = 500 }
    { .http.status_code = 500 } | count() > 1
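The query `{ .http.status_code = 500 } | count() > 1` can be read as "select the spans with status 500, then keep only traces where more than one span matched". The sketch below emulates that reading over in-memory data; it is not Tempo's query engine, and the trace IDs and span dictionaries are hypothetical.

```python
def traces_with_repeated_500s(traces):
    """Emulate: { .http.status_code = 500 } | count() > 1."""
    result = {}
    for trace_id, spans in traces.items():
        # Spanset filter: keep spans whose http.status_code attribute equals 500.
        matching = [span for span in spans if span.get("http.status_code") == 500]
        # Aggregate filter: keep the trace only if more than one span matched.
        if len(matching) > 1:
            result[trace_id] = matching
    return result


# Hypothetical traces: each trace ID maps to a list of span attribute dictionaries.
traces = {
    "trace-a": [
        {"service.name": "auth", "http.status_code": 500},
        {"service.name": "auth", "http.status_code": 500},
        {"service.name": "db", "http.status_code": 200},
    ],
    "trace-b": [{"service.name": "auth", "http.status_code": 500}],
    "trace-c": [{"service.name": "web", "http.status_code": 200}],
}
print(traces_with_repeated_500s(traces))
```

The filter is applied per trace rather than across all spans, which is what lets the query express structure such as "a request that failed more than once".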
  13. How to run the Grafana stack?

    Monolithic mode
    • Simplest deployment mode
    • All components in one single process
    • Great for testing

    Microservices mode
    • Maximum scalability
    • Separate read/write paths
    • Recommended for production deployments and large volumes

    Grafana Cloud
    • Fully managed by Grafana
    • Available in 7 regions
    • Free-forever tier (50GB logs and traces per month, 10K active series, 3 users)
  14. Anatomy of the Grafana Agent

    • Metrics - Shares the same codebase as the Prometheus Agent.
    • Logs - Embeds Promtail, the log forwarder built by Grafana, for Loki.
    • Traces - Based on OpenTelemetry Collector.
  15. Have more questions?

    Join us at community.grafana.com or on the Grafana public Slack: slack.grafana.com (#channel)

    Get involved: grafana/, community.grafana.com

    Thank you!