Blazin' Fast PromQL

Grafana
August 20, 2019

(Presented at London Prometheus Meetup 20/09/2019)

PromQL, the Prometheus Query Language, is a concise, powerful and increasingly popular language for querying time series data. But PromQL queries can take a long time when they have to consider >100k series and months of data. Even with Prometheus’ compression, a 90-day query over 200k series can touch ~100GB of data.

In this talk, we will present a series of techniques employed by Cortex (a CNCF project for clustered Prometheus) for accelerating PromQL queries: query results caching, time-slice parallelisation, aggregation sharding, and automatic recording rule substitutions.

But there’s more: we will show how you can use this technology to get these improvements with Thanos and Prometheus.
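The ~100GB figure quoted above can be sanity-checked with a few lines of arithmetic. This is only a rough sketch: the 15-second scrape interval and the ~1 byte per compressed sample are assumptions, not numbers from the talk (the Prometheus storage documentation quotes an average of 1-2 bytes per sample).

```python
# Back-of-the-envelope check of the "~100GB" figure from the abstract.
# ASSUMPTIONS (not from the talk): 15 s scrape interval, ~1 byte per compressed sample.
SERIES = 200_000
DAYS = 90
SCRAPE_INTERVAL_S = 15   # assumed
BYTES_PER_SAMPLE = 1.0   # assumed; Prometheus docs quote 1-2 bytes on average


def bytes_touched(series, days, scrape_interval_s, bytes_per_sample):
    """Estimate how many bytes of sample data a range query has to read."""
    samples_per_series = days * 24 * 60 * 60 // scrape_interval_s
    return series * samples_per_series * bytes_per_sample


total = bytes_touched(SERIES, DAYS, SCRAPE_INTERVAL_S, BYTES_PER_SAMPLE)
print(f"{total / 1e9:.1f} GB")  # prints "103.7 GB"
```

At 2 bytes per sample the same query would touch roughly twice as much, so the abstract's figure sits at the optimistic end of that range.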

Transcript

  1. Cortex: horizontally scalable Prometheus

    Cortex is a time-series store built on Prometheus that is:

    - Horizontally scalable
    - Highly Available
    - Long-term storage
    - Multi-tenant

    Cortex gives you:

    - A global view of as many metrics as you need
    - With no gaps in the charts
    - On durable, long term storage
    - Across multiple tenants

    Cortex is a CNCF Sandbox project: github.com/cortexproject/cortex
  2. [Architecture diagram] Querier (PromQL Engine, Chunk Store, Ingester Client); Ingesters; NoSQL Index; Blob Store. Annotation: ">1 yr ago".
  3. [Architecture diagram: Caching] Same components, with Index Memcached and Chunk Memcached added.
  4. [Architecture diagram: More Caching] Same components, with a Query Frontend and Results Memcached added.
  5. Query Frontend

    Incoming query: rate(http_duration_seconds_count{job="shipping"}[1m])

    1. Step align
    2. Split by day (one rate... sub-query per day)
    3. Cache lookup
    4. Queue & Parallel Dispatch

    Also shown on the slide: rate(request_durations_seconds_count[1m])
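The four steps on this slide can be sketched roughly as follows. This is a simplified illustration, not Cortex's implementation: the function names, the in-memory dict standing in for the results Memcached, and the `downstream` callable are all hypothetical, and the queueing step is reduced to a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor

DAY_MS = 24 * 60 * 60 * 1000


def step_align(start_ms, end_ms, step_ms):
    """Round start and end down to multiples of step so repeated queries share cache keys."""
    return start_ms - start_ms % step_ms, end_ms - end_ms % step_ms


def split_by_day(start_ms, end_ms, step_ms):
    """Split [start, end] into sub-ranges that never cross a UTC day boundary."""
    ranges = []
    cur = start_ms
    while cur <= end_ms:
        next_day = (cur // DAY_MS + 1) * DAY_MS
        # Last evaluation step that still falls inside the current day.
        last_in_day = cur + ((next_day - 1 - cur) // step_ms) * step_ms
        sub_end = min(end_ms, last_in_day)
        ranges.append((cur, sub_end))
        cur = sub_end + step_ms
    return ranges


def query_range(query, start_ms, end_ms, step_ms, downstream, cache, workers=4):
    """Align, split, look up the cache, then run the missing days in parallel."""
    start_ms, end_ms = step_align(start_ms, end_ms, step_ms)
    keys = [(query, step_ms, s, e) for s, e in split_by_day(start_ms, end_ms, step_ms)]
    misses = [k for k in keys if k not in cache]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda k: downstream(k[0], k[2], k[3], k[1]), misses)
        for key, result in zip(misses, results):
            cache[key] = result
    merged = []
    for key in keys:  # keys are in time order, so the merged samples are too
        merged.extend(cache[key])
    return merged


def fake_downstream(query, start_ms, end_ms, step_ms):
    """Hypothetical stand-in for a Prometheus server: one sample per step."""
    return [(t, 1.0) for t in range(start_ms, end_ms + 1, step_ms)]
```

Calling `query_range("up", 30_000, 2 * DAY_MS + 59_000, 60_000, fake_downstream, {})` would issue three downstream sub-queries, and a second call with the same cache would issue none. A real results cache would presumably also need to avoid storing very recent data that may still change.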
  6. $ ./cortex \
       -config.file=./docs/prometheus-frontend.yml \
       -frontend.downstream-url=http://demo.robustperception.io:9090
     ...

    Try this query over 7 days:

      histogram_quantile(0.50, sum by (job, le) (rate(prometheus_http_request_duration_seconds_bucket[1m])))
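To try the 7-day query from a script rather than a dashboard, one option is to build a Prometheus `query_range` request and point it at the frontend. The sketch below only constructs the URL. The listen address `http://localhost:9091`, the 60-second step, and the assumption that the frontend serves the standard `/api/v1/query_range` path are all guesses that depend on the config file in use.

```python
from urllib.parse import urlencode

# ASSUMPTION: address the query frontend listens on; check your config file.
FRONTEND_URL = "http://localhost:9091"

QUERY = (
    "histogram_quantile(0.50, sum by (job, le) ("
    "rate(prometheus_http_request_duration_seconds_bucket[1m])))"
)


def query_range_url(base_url, query, end_s, days=7, step_s=60):
    """Build a Prometheus HTTP API range-query URL covering the last `days` days."""
    params = {
        "query": query,
        "start": end_s - days * 24 * 60 * 60,  # Unix seconds
        "end": end_s,
        "step": step_s,  # assumed resolution, in seconds
    }
    return f"{base_url}/api/v1/query_range?{urlencode(params)}"
```

Passing `int(time.time())` as `end_s` and fetching the resulting URL twice should make the effect of the results cache visible in the response times.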
  7. What does the future hold?

    - Start sharding aggregations by series to accelerate high-cardinality queries (design doc).
    - Automatically replace with recording rules where appropriate?
    - Embed this as a library in Thanos...
    - Handle gaps from HA pairs...
    - What do you want to see?
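The idea behind sharding an aggregation by series can be illustrated with `sum by (...)`: because addition is associative, each shard can compute a partial sum over its own series and the partial results can then be added together. The sketch below is an illustration of that property only, not the approach in the design doc; the hashing scheme and function names are hypothetical.

```python
from collections import defaultdict
from zlib import crc32


def shard_of(labels, num_shards):
    """Assign a series to a shard by hashing its full label set (hypothetical scheme)."""
    key = ",".join(f"{k}={v}" for k, v in sorted(labels.items()))
    return crc32(key.encode()) % num_shards


def sum_by(series, by):
    """Plain `sum by (...)` over (labels, value) pairs."""
    out = defaultdict(float)
    for labels, value in series:
        out[tuple(labels.get(k, "") for k in by)] += value
    return dict(out)


def sharded_sum_by(series, by, num_shards):
    """Same result as sum_by, computed as per-shard partial sums that are then merged."""
    shards = [[] for _ in range(num_shards)]
    for labels, value in series:
        shards[shard_of(labels, num_shards)].append((labels, value))
    partials = [sum_by(shard, by) for shard in shards]  # each could run on a separate querier
    merged = defaultdict(float)
    for partial in partials:
        for group, value in partial.items():
            merged[group] += value
    return dict(merged)
```

Aggregations such as `count`, `min` and `max` merge in the same way; others, such as quantiles, cannot be combined from per-shard results this directly.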
  8. How do we compare to Trickster? (https://github.com/Comcast/trickster)

    - Reusable: set of HTTP middlewares, useable as a library
    - Memcached for “external” cache (vs Redis for Trickster)
    - We are multi-tenant
    - We split by day and execute in parallel
    - We have some rudimentary QOS / queueing / scheduling

    However:

    - No “Fast Forward” like Trickster
    - Trickster more widely used