
Blazin' Fast PromQL

Grafana
August 20, 2019


(Presented at the London Prometheus Meetup, 20/08/2019)

PromQL, the Prometheus Query Language, is a concise, powerful and increasingly popular language for querying time series data. But PromQL queries can take a long time when they have to consider >100k series and months of data. Even with Prometheus’ compression, a 90-day query over 200k series can touch ~100GB of data.
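As a rough sanity check on that figure, assuming a 15-second scrape interval and Prometheus' typical ~1.3 bytes per sample: 200,000 series × 90 days × 5,760 samples per series per day ≈ 1.0 × 10^11 samples, or about 135GB, i.e. on the order of 100GB.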

In this talk, we will present a series of techniques employed by Cortex (a CNCF project for clustered Prometheus) for accelerating PromQL queries: query results caching, time slice parallelisation, aggregation sharding, and automatic recording rule substitutions.

But there’s more: we will show how you can use this technology to get these improvements with Thanos and Prometheus.


Transcript

  1. Blazin’ Fast PromQL
    Prometheus London Meetup, August 2019
    @tom_wilkie

  2. Cortex: horizontally scalable Prometheus
    Cortex is a time-series store built on Prometheus that is:
    - Horizontally scalable
    - Highly available
    - Long-term storage
    - Multi-tenant
    Cortex gives you:
    - A global view of as many metrics as you need
    - With no gaps in the charts
    - On durable, long-term storage
    - Across multiple tenants
    Cortex is a CNCF Sandbox project: github.com/cortexproject/cortex

  3. [Architecture diagram: the Cortex read path. The Querier (PromQL Engine) fetches recent series from the Ingesters via the Ingester Client; older data (">1 yr ago") comes from the Chunk Store, backed by a NoSQL Index and a Blob Store.]

  4. Caching
    [Architecture diagram: the same read path with caching added: an Index Memcached in front of the NoSQL Index and a Chunk Memcached in front of the Blob Store.]

  5. More Caching
    [Architecture diagram: a Query Frontend with a Results Memcached is added in front of the Querier, alongside the existing Index and Chunk Memcacheds.]

  6. Query Frontend
    [Diagram: the frontend processes rate(http_duration_seconds_count{job="shipping"}[1m]) in four stages:
    1. Step align
    2. Split by day (one sub-query per day)
    3. Cache lookup (previously computed days are served from the results cache)
    4. Queue & parallel dispatch of the remaining sub-queries]
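    As a concrete illustration of steps 1 and 2, here is a minimal Go sketch; the types and function names are invented for this example and are not Cortex's actual internals.

    package main

    import (
        "fmt"
        "time"
    )

    // RangeQuery is a simplified PromQL range query: an expression evaluated
    // every Step between Start and End.
    type RangeQuery struct {
        Expr       string
        Start, End time.Time
        Step       time.Duration
    }

    // stepAlign rounds Start and End down to a multiple of Step, so that the
    // same query issued a few seconds apart yields identical, cacheable results.
    func stepAlign(q RangeQuery) RangeQuery {
        q.Start = q.Start.Truncate(q.Step)
        q.End = q.End.Truncate(q.Step)
        return q
    }

    // splitByDay cuts one long query into per-day sub-queries, split at UTC day
    // boundaries. Each sub-query is a natural unit for the results cache and
    // can be dispatched to a different querier in parallel.
    func splitByDay(q RangeQuery) []RangeQuery {
        var subs []RangeQuery
        for start := q.Start; start.Before(q.End); {
            end := start.Truncate(24 * time.Hour).Add(24 * time.Hour)
            if end.After(q.End) {
                end = q.End
            }
            subs = append(subs, RangeQuery{Expr: q.Expr, Start: start, End: end, Step: q.Step})
            start = end
        }
        return subs
    }

    func main() {
        q := stepAlign(RangeQuery{
            Expr:  `rate(http_duration_seconds_count{job="shipping"}[1m])`,
            Start: time.Now().Add(-3 * 24 * time.Hour),
            End:   time.Now(),
            Step:  time.Minute,
        })
        for _, sub := range splitByDay(q) {
            fmt.Println(sub.Expr, sub.Start.Format(time.RFC3339), "to", sub.End.Format(time.RFC3339))
        }
    }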

  7. But wait! One more thing...

  8. https://github.com/cortexproject/cortex/pull/1441

  9. $ ./cortex \
        -config.file=./docs/prometheus-frontend.yml \
        -frontend.downstream-url=http://demo.robustperception.io:9090
    ...
    Try this query over 7 days:
    histogram_quantile(0.50,
      sum by (job, le) (
        rate(prometheus_http_request_duration_seconds_bucket[1m])
      )
    )
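    This query estimates the median (50th percentile) Prometheus HTTP request duration per job. Over a 7-day range, the frontend splits it into roughly seven one-day sub-queries, dispatches them in parallel to the downstream Prometheus, and can serve repeated runs largely from the results cache.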

  10. What does the future hold?
    - Start sharding aggregations by series to accelerate high-cardinality queries (design doc); see the sketch below.
    - Automatically replace with recording rules where appropriate?
    - Embed this as a library in Thanos...
    - Handle gaps from HA pairs...
    - What do you want to see?
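    Why sharding aggregations helps, schematically (the shard label below is purely illustrative; the real mechanism is described in the design doc): if the series behind a metric m are partitioned into N disjoint shards, then
      sum by (job, le) (rate(m[1m]))
    is the element-wise sum of the per-shard partial aggregations
      sum by (job, le) (rate(m{shard="1"}[1m])), ..., sum by (job, le) (rate(m{shard="N"}[1m]))
    so each partial aggregation can run on a different querier in parallel, and the frontend only has to combine N much smaller intermediate results.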

  11. Thank You!
    @tom_wilkie
    https://github.com/cortexproject/cortex

  12. How do we compare to Trickster?
    - Reusable: a set of HTTP middlewares, usable as a library
    - Memcached for the "external" cache (vs Redis for Trickster)
    - We are multi-tenant
    - We split by day and execute in parallel
    - We have some rudimentary QoS / queueing / scheduling
    However:
    - No "Fast Forward" like Trickster
    - Trickster is more widely used
    https://github.com/Comcast/trickster
