Thanos Deep Dive: Look into Distributed System

Bartek
November 20, 2019

KubeCon 2019

Transcript

  1. @ThanosMetrics
    Inside a Distributed Monitoring System
    San Diego, 20th November 2019
    Bartłomiej Płotka
    Frederic Branczyk
    brancz fredbrancz bwplotka

  2. @ThanosMetrics
    Speakers
    Frederic Branczyk
    Principal Software Engineer @ Red Hat; OpenShift Monitoring Team
    Prometheus Maintainer; Thanos Maintainer; SIG Instrumentation Lead
    Bartek Plotka
    Principal Software Engineer @ Red Hat; OpenShift Monitoring Team
    Prometheus Maintainer; Thanos Maintainer

  3. @ThanosMetrics
    Agenda
    ● Quick intro: a brief recap of the components
    ● StoreAPI
    ○ Querier (discovery, fanout, filtering)
    ○ Producer vs Browser
    ○ Integrations: OpenTSDB
    ● Downsampling
    ● Horizontal Query scaling
    ● Summary

  4. @ThanosMetrics
    Thanos Community
    ● Fully open source from start
    ● Started in Nov 2017
    ● Part of CNCF Sandbox
    ● 4600+ GitHub stars
    ● 160+ contributors
    ● ~500 Slack users
    ● 8 maintainers, 3 triagers from 7 different companies
    ● Transparent Governance
    ● Prometheus Ecosystem

  5. @ThanosMetrics
    Let’s quickly reiterate on Thanos
    [Diagram labels: pull; Querier; Sidecar exposing StoreAPI]

  6. @ThanosMetrics
    Let’s quickly reiterate on Thanos
    [Diagram labels: Querier; Sidecar; StoreAPI]

  7. @ThanosMetrics
    Let’s quickly reiterate on Thanos
    [Diagram labels: Querier; Sidecar; Gateway; StoreAPI (x2)]

  8. @ThanosMetrics
    Let’s quickly reiterate on Thanos
    [Diagram labels: Querier; Sidecar; Gateway; Ruler; StoreAPI (x3)]

  9. @ThanosMetrics
    Let’s quickly reiterate on Thanos
    [Diagram labels: Querier; Sidecar; Gateway; Ruler; StoreAPI; Federated Querier]

  10. @ThanosMetrics
    There was something common in all these architectures

  11. @ThanosMetrics
    StoreAPI

  12. @ThanosMetrics
    StoreAPI
    ● Every component in Thanos serves data via gRPC StoreAPI
    ○ sidecar
    ○ store
    ○ rule
    ○ receive (experimental component)
    ○ query
    ● Integrations! https://thanos.io/integrations.md/
    ○ OpenTSDB as StoreAPI: https://github.com/G-Research/geras

  13. @ThanosMetrics
    StoreAPI
    From: rpc.proto

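The slide showed the service definition from rpc.proto, which this transcript does not reproduce. As a rough sketch of the idea only, the Python below models a StoreAPI-like interface with an `info` call and a `series` call. All class, method and field names here are illustrative assumptions, not the actual gRPC definition.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Protocol, Tuple

Labels = Dict[str, str]
Sample = Tuple[int, float]  # (timestamp in ms, value)


@dataclass
class StoreInfo:
    labels: Labels  # labels identifying the data this store serves
    min_time: int   # oldest timestamp available (ms)
    max_time: int   # newest timestamp available (ms)


class StoreAPI(Protocol):
    """Hypothetical Python stand-in for the gRPC StoreAPI."""

    def info(self) -> StoreInfo: ...

    def series(
        self, matchers: Labels, min_time: int, max_time: int
    ) -> Iterator[Tuple[Labels, List[Sample]]]: ...


class InMemoryStore:
    """Toy store; each real component would wrap its own data source."""

    def __init__(self, labels, data):
        self._labels = labels
        self._data = data  # list of (series labels, samples)

    def info(self):
        times = [t for _, samples in self._data for t, _ in samples]
        return StoreInfo(self._labels, min(times), max(times))

    def series(self, matchers, min_time, max_time):
        # Stream every series that matches all equality matchers
        # and has samples inside the requested time range.
        for labels, samples in self._data:
            if all(labels.get(k) == v for k, v in matchers.items()):
                hits = [(t, v) for t, v in samples if min_time <= t <= max_time]
                if hits:
                    yield labels, hits
```

Because every component answers the same two calls, a caller such as the Querier does not need to know whether it is talking to a sidecar, a gateway or another querier.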

  14. @ThanosMetrics
    Thanos Query: Store Discovery
    ● --store flag
    ○ Exact endpoints
    ○ DNS discovery: A, AAAA, SRV

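A minimal sketch of how a `--store` value could be classified before resolution. It assumes the `dns+` and `dnssrv+` address prefixes described in the Thanos service discovery documentation; the function name and the host names in the usage note are hypothetical.

```python
def classify_store_address(addr):
    """Return (kind, target) for one --store value."""
    if addr.startswith("dnssrv+"):
        # Resolve SRV records; they carry both host and port.
        return "srv", addr[len("dnssrv+"):]
    if addr.startswith("dns+"):
        # Resolve A/AAAA records; the port comes from the address itself.
        return "a/aaaa", addr[len("dns+"):]
    # No prefix: an exact endpoint, used as-is.
    return "static", addr
```

For example, `classify_store_address("dns+stores.example.org:10901")` returns `("a/aaaa", "stores.example.org:10901")`, while a plain `host:port` value is returned unchanged as a static endpoint.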

  15. @ThanosMetrics
    Thanos Query: Store Infos
    ● Requests the Info endpoint every 10s
    ● Healthiness
    ● Metadata propagation

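The refresh loop on this slide can be sketched as follows. This is an illustration under assumptions, not the Querier's actual code: `StoreSet`, `fetch_info` and the metadata shape are hypothetical, and the caller is expected to invoke `update` on a 10-second timer.

```python
class StoreSet:
    """Keeps only stores whose latest Info call succeeded."""

    def __init__(self, fetch_info):
        # fetch_info: callable taking an address and returning metadata;
        # it may raise if the store is unreachable.
        self._fetch_info = fetch_info
        self.healthy = {}  # address -> last known metadata

    def update(self, addresses):
        fresh = {}
        for addr in addresses:
            try:
                fresh[addr] = self._fetch_info(addr)
            except Exception:
                # Unhealthy store: left out until a later refresh succeeds.
                continue
        self.healthy = fresh
        return self.healthy
```

The returned metadata (labels, time range) is what later lets the Querier decide which stores are worth asking for a given query.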

  16. @ThanosMetrics
    Thanos Query: Life of a query
    ● Query
    ○ Select possible stores
    ○ Fan out to gather data
    ○ Process query

  17. @ThanosMetrics
    Thanos Query: Life of a query
    pull
    Querier
    {region="us-east-1"}
    {region="us-east-2"}
    {region="us-west-1"}

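The "select possible stores, then fan out" steps can be sketched like this, assuming each store advertises labels such as `region` plus a time range. The dictionary layout and function names are hypothetical, and only equality matchers are handled.

```python
def select_stores(stores, matchers, start, end):
    """Keep stores whose labels and time range could match the query."""
    selected = []
    for store in stores:
        # A store is skipped only if it advertises a different value for a
        # matched label; labels it does not advertise cannot rule it out.
        labels_ok = all(store["labels"].get(k, v) == v for k, v in matchers.items())
        time_ok = store["min_time"] <= end and store["max_time"] >= start
        if labels_ok and time_ok:
            selected.append(store)
    return selected


def fan_out(stores, matchers, start, end):
    """Query every selected store and concatenate the returned series."""
    merged = []
    for store in select_stores(stores, matchers, start, end):
        merged.extend(store["fetch"](matchers, start, end))
    return merged
```

With the three regions from the slide, a query carrying `region="us-east-1"` would be sent to one store only, while a query without a region matcher would go to all three.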

  18. @ThanosMetrics
    ProxyStore

  19. @ThanosMetrics
    Challenges of Querying Years of Data

  20. @ThanosMetrics
    Query Resolution
    ● Typical scrape period of Prometheus is 15s
    ● Querying 30 days means ~170k samples per series

  21. @ThanosMetrics
    Query Resolution
    [Diagram: samples on a time axis at a scrape interval of ~15s; evaluation on a second time axis at step = 1m]

  22. @ThanosMetrics
    Query Resolution: 5h range
    Displayed: step 1m, ~250 samples
    Fetched from storage: ~1k samples

  23. @ThanosMetrics
    Query Resolution: 30d range
    Displayed: step 3h, ~250 samples
    Fetched from storage: ~170k samples

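The gap between displayed and fetched samples is plain division. The helper below is a small sketch (its name is made up here) that reproduces the per-series figures from the two slides; the slide values are rounded.

```python
HOUR = 3600
DAY = 24 * HOUR


def samples_per_series(range_seconds, interval_seconds):
    """Number of points covering a time range at a fixed interval."""
    return range_seconds // interval_seconds


# 5h range: 300 points displayed at step 1m, 1200 samples fetched at 15s.
displayed_5h = samples_per_series(5 * HOUR, 60)
fetched_5h = samples_per_series(5 * HOUR, 15)

# 30d range: 240 points displayed at step 3h, 172800 samples fetched at 15s.
displayed_30d = samples_per_series(30 * DAY, 3 * HOUR)
fetched_30d = samples_per_series(30 * DAY, 15)
```

The displayed count stays roughly constant because the step grows with the range, while the fetched count grows linearly with the range.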

  24. @ThanosMetrics
    Chunks
    Samples are stored in chunks
    [Diagram: chunks along a time axis]

  25. @ThanosMetrics
    Chunks
    Samples are stored in chunks
    Raw sample: 16 bytes/sample
    Sample in a chunk: 1.3 bytes/sample

  26. @ThanosMetrics
    Chunk tradeoff
    Decompressing one sample takes 10-40 nanoseconds

  27. @ThanosMetrics
    Chunk tradeoff
    Query range   Samples for 1000 series   Decompression latency   Chunk data size
    30m           ~120 000                  ~5ms                    ~160KB
    1d            ~6 million                ~240ms                  ~8MB
    Decompressing one sample takes 10-40 nanoseconds

  28. @ThanosMetrics
    Chunk tradeoff
    Query range   Samples for 1000 series   Decompression latency   Chunk data size
    30m           ~120 000                  ~5ms                    ~160KB
    1d            ~6 million                ~240ms                  ~8MB
    30d           ~170 million              ~7s                     ~240MB
    Decompressing one sample takes 10-40 nanoseconds

  29. @ThanosMetrics
    Chunk tradeoff
    Query range   Samples for 1000 series   Decompression latency   Chunk data size
    30m           ~120 000                  ~5ms                    ~160KB
    1d            ~6 million                ~240ms                  ~8MB
    30d           ~170 million              ~7s                     ~240MB
    1y            ~2 billion                ~1m20s                  ~2GB
    Decompressing one sample takes 10-40 nanoseconds

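The table rows can be approximated with a back-of-the-envelope calculation. The sketch below assumes a 15s scrape interval, the upper bound of 40 ns per decompressed sample, and 1.3 bytes per sample in a chunk, all taken from earlier slides. The function name is hypothetical and the slide figures are rounded, so the results agree only approximately.

```python
def chunk_cost(range_seconds, series=1000, scrape_interval=15,
               ns_per_sample=40, bytes_per_sample=1.3):
    """Estimate samples, decompression time and chunk size for a query."""
    samples = series * range_seconds // scrape_interval
    return {
        "samples": samples,
        "decompress_seconds": samples * ns_per_sample / 1e9,
        "chunk_bytes": samples * bytes_per_sample,
    }


half_hour = chunk_cost(30 * 60)        # 120000 samples, about 5 ms
one_day = chunk_cost(24 * 3600)        # 5760000 samples, about 230 ms
thirty_days = chunk_cost(30 * 86400)   # 172800000 samples, about 7 s
```

Both latency and size scale linearly with the number of samples, which is why long ranges over raw data become expensive.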

  30. @ThanosMetrics
    Downsampling

  31. @ThanosMetrics
    Downsampling
    [Diagram: raw block, downsampled 10-20x into a block @ 5m, downsampled 12x into a block @ 1h]

  32. @ThanosMetrics
    Downsampling
    [Diagram: each chunk holds five aggregates: count, sum, min, max, counter]

  33. @ThanosMetrics
    Downsampling
    count sum min max counter
    count(requests_total)
    count_over_time(requests_total[1h])

  34. @ThanosMetrics
    Downsampling
    count sum min max counter
    sum_over_time(requests_total[1h])

  35. @ThanosMetrics
    Downsampling
    count sum min max counter
    min(requests_total)
    min_over_time(requests_total[1h])

  36. @ThanosMetrics
    Downsampling
    count sum min max counter
    max(requests_total)
    max_over_time(requests_total[1h])

  37. @ThanosMetrics
    Downsampling
    count sum min max counter
    rate(requests_total[1h])
    increase(requests_total[1h])

  38. @ThanosMetrics
    Downsampling
    count sum min max counter
    requests_total
    avg(requests_total)
    sum(requests_total)
    avg

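A simplified sketch of how raw samples might be folded into the five aggregates named on these slides. It is an illustration under assumptions, not Thanos code: the function names are hypothetical, and the `counter` aggregate is reduced to "last value in the window", with no handling of counter resets.

```python
def downsample(samples, window):
    """Aggregate (timestamp_seconds, value) pairs into fixed windows.

    samples must be sorted by timestamp.
    """
    buckets = {}
    for ts, value in samples:
        buckets.setdefault(ts - ts % window, []).append(value)
    return [
        {
            "start": start,
            "count": len(values),
            "sum": sum(values),
            "min": min(values),
            "max": max(values),
            "counter": values[-1],  # simplification: resets are ignored
        }
        for start, values in sorted(buckets.items())
    ]


def average(aggregate):
    """avg is not stored; it is derived from sum and count."""
    return aggregate["sum"] / aggregate["count"]
```

Keeping several aggregates per window is what lets different PromQL functions be answered from downsampled data without losing their meaning.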

  39. @ThanosMetrics
    Downsampling: What chunk to use on query?
    PromQL range query from t0 to t1, step 10s:
    rate(alerts_total[5m])

  40. @ThanosMetrics
    Downsampling: What chunk to use on query?
    PromQL range query from t0 to t1, step 10s:
    rate(alerts_total[5m])
    Select:
    labels: __name__ = "alerts_total"
    time: start: t0-5m, end: t1
    step: 10s
    read hints: func: "rate"

  41. @ThanosMetrics
    Downsampling: What chunk to use on query?
    PromQL range query from t0 to t1, step 10s:
    rate(alerts_total[5m])
    Select:
    labels: __name__ = "alerts_total"
    time: start: t0-5m, end: t1
    step: 10s
    read hints: func: "rate"
    Fetch: raw chunks

  42. @ThanosMetrics
    Downsampling: What chunk to use on query?
    PromQL range query from t0 to t1, step 30m:
    rate(alerts_total[1h])
    Select:
    labels: __name__ = "alerts_total"
    time: start: t0-1h, end: t1
    step: 30m
    read hints: func: "rate"
    Can we fit 5 samples for this step with lower resolution?

  43. @ThanosMetrics
    Downsampling: What chunk to use on query?
    PromQL range query from t0 to t1, step 30m:
    rate(alerts_total[1h])
    Select:
    labels: __name__ = "alerts_total"
    time: start: t0-1h, end: t1
    step: 30m
    read hints: func: "rate"
    Can we fit 5 samples for this step with lower resolution?
    Yes, for 5m resolution!
    Fetch: counter chunks

  44. @ThanosMetrics
    Downsampling: What chunk to use on query?
    PromQL range query from t0 to t1, step 30m:
    avg(alerts{state="active"})
    Select:
    labels: __name__ = "alerts", state = "active"
    time: start: t0, end: t1
    step: 30m
    read hints: func: "avg"
    Fetch: sum and count chunks

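The two decisions walked through on these slides (which resolution, which aggregate) can be sketched as below. The "5 samples per step" rule and the function-to-aggregate mapping follow the slides; the names, and the fallback to sum and count for any function not shown, are assumptions rather than the actual Thanos logic.

```python
RESOLUTIONS = [3600, 300, 0]  # seconds: 1h, 5m, raw (0)

AGGREGATES = {
    "count": ["count"], "count_over_time": ["count"],
    "sum_over_time": ["sum"],
    "min": ["min"], "min_over_time": ["min"],
    "max": ["max"], "max_over_time": ["max"],
    "rate": ["counter"], "increase": ["counter"],
}


def pick_resolution(step_seconds, samples_per_step=5):
    """Coarsest resolution that still fits samples_per_step samples in a step."""
    for resolution in RESOLUTIONS:
        # Raw data (0) always satisfies the check, so the loop always returns.
        if resolution * samples_per_step <= step_seconds:
            return resolution


def pick_aggregates(func):
    """Aggregates to fetch for a read hint; avg is served from sum and count."""
    return AGGREGATES.get(func, ["sum", "count"])
```

With a 10s step only raw data qualifies, while a 30m step leaves room for five 5m samples, which matches the two `rate` examples above.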

  45. @ThanosMetrics
    Downsampling
    Query range   Resolution   Samples for 1000 series   Decompression latency   Fetched chunks size
    30m           raw          ~120 000                  ~5ms                    ~160KB
    1d            raw          ~6 million                ~240ms                  ~8MB
    30d           raw          ~170 million              ~7s                     ~240MB
    30d           5m           ~8 million                ~300ms                  ~9MB
    1y            raw          ~2 billion                ~80s                    ~2GB
    1y            1h           ~8 million                ~300ms                  ~9MB
    5m resolution [~5d+ queries]
    1h resolution [~50d+ queries]

  46. @ThanosMetrics
    Downsampling: Caveats
    ● Thanos/Prometheus UI: Step (evaluation interval in seconds)
    ● Grafana: Resolutions (1/x samples per pixel)
    ● rate[<5m] vs rate[1h] / rate[5h] / rate[$__interval]
    ● Storing only downsampled data and trying to zoom-in

  47. @ThanosMetrics
    Downsampling: Caveats
    ● Thanos/Prometheus UI: Step (evaluation interval in seconds)
    ● Grafana: Resolutions (1/x samples per pixel)
    ● rate[<5m] vs rate[1h] / rate[5h] / rate[$__interval]
    ● Storing only downsampled data and trying to zoom-in
    Standardize downsampling?

  48. @ThanosMetrics
    Horizontal Scaling of Long Term Storage Read Path

  49. @ThanosMetrics
    Querying long term storage backend
    Querier
    Gateway

  50. @ThanosMetrics
    Time partitioning
    Querier
    Gateway: --min-time=1y --max-time=150d
    Gateway: --min-time=150d

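Time partitioning can be pictured as each Gateway owning a window of data age. The sketch below is a simplified illustration: it expresses the two windows from the slide as age ranges in seconds, and the gateway names and function are hypothetical. A real Gateway applies its limits to block timestamps rather than to ages.

```python
DAY = 86400

# name -> (newest age served, oldest age served), in seconds before now
GATEWAYS = {
    "old": (150 * DAY, 365 * DAY),   # data between 150 days and 1 year old
    "recent": (0, 150 * DAY),        # data from the last 150 days
}


def gateways_for(block_age_seconds, gateways):
    """Names of gateways whose age window covers a block of the given age."""
    return [
        name
        for name, (newest_age, oldest_age) in gateways.items()
        if newest_age <= block_age_seconds < oldest_age
    ]
```

A block that is 10 days old is served only by the "recent" gateway and one that is 200 days old only by the "old" gateway, so each instance loads a fraction of the bucket.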

  51. @ThanosMetrics
    Block Sharding
    Querier
    Gateway: --selector.relabel-config=
      - action: keep
        regex: "eu.*"
        source_labels:
        - region
    Gateway: --selector.relabel-config=
      - action: keep
        regex: "us.*"
        source_labels:
        - region

  52. @ThanosMetrics
    Block Sharding
    Querier
    Gateway: --selector.relabel-config=
      - action: keep
        regex: "eu.*"
        source_labels:
        - region
    Gateway: --selector.relabel-config=
      - action: keep
        regex: "us.*"
        source_labels:
        - region

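A minimal sketch of what a `keep` relabel rule does to a block's labels, assuming Prometheus relabeling semantics (source label values joined with `;`, regex anchored at both ends). The function name and the sample label values are hypothetical.

```python
import re


def keeps_block(block_labels, source_labels, regex, separator=";"):
    """True if a `keep` rule with this regex would retain the block."""
    # Missing labels contribute an empty string, as in Prometheus relabeling.
    value = separator.join(block_labels.get(name, "") for name in source_labels)
    # fullmatch mirrors the anchoring of relabel regexes.
    return re.fullmatch(regex, value) is not None
```

With the two configurations above, a block labelled `region="eu-west-1"` is kept by the first Gateway and dropped by the second, so each Gateway serves a disjoint set of blocks.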

  53. @ThanosMetrics
    Summary
    ● Common StoreAPI
    ● Downsampling
    ● Horizontal Scaling of Long Term Storage

  54. @ThanosMetrics
    Thank You!
    https://thanos.io

  55. @ThanosMetrics
    Bonus: Caching
    Caching
    Querier
    Sidecars
    Gateway
    Cortex Frontend

  56. @ThanosMetrics
    Response Caching: Challenges
    ● Extremely useful for rolling windows (e.g. Grafana “last 1h”)
    ● Dynamically changing StoreAPIs
    ● Downsampling
    ● Partial Response
    ● Backfilling/Deletion
