Upgrade to Pro — share decks privately, control downloads, hide ads and more …

How to measure Postgres Query Performance

OnGres
April 08, 2022

How to measure Postgres Query Performance

If you don’t know how to measure query performance in Postgres, this talk is for you.

If you know how to do it and consider it a simple and solved problem, this talk is definitely for you.

It doesn’t sound too complicated. There are several tools and techniques to report query performance. But a deeper look reveals some troubling conclusions:

* Some techniques may outright lie to you.
* There are disagreements between different tools/techniques.
* You can only collect some limited information.
* You may not be able to measure what really matters to database users/clients.

If you want to learn more; or maybe even unlearn something, join this talk. We will explore all the caveats and provide some key understandings into how to really measure Postgres query performance.

OnGres

April 08, 2022
Tweet

More Decks by OnGres

Other Decks in Technology

Transcript

  1. How to measure query performance
    How to measure
    query performance
    Álvaro Hernández
    @ahachete

    View full-size slide

  2. How to measure query performance
    ` whoami `
    Álvaro Hernández

    aht.es
    ● Founder & CEO, OnGres
    ● 20+ years Postgres user and DBA
    ● Mostly doing R&D to create new, innovative
    software on Postgres
    ● More than 100 tech talks, most about Postgres
    ● Founder and President of the NPO Fundación
    PostgreSQL
    ● AWS Data Hero

    View full-size slide

  3. How to measure query performance
    ` whoami ` : my main dedication
    StackGres.io
    The most Advanced Postgres Platform on Kubernetes

    View full-size slide

  4. How to measure query performance
    How to measure query
    performance
    Or, alternatively, how NOT to measure query
    performance

    View full-size slide

  5. How to measure query performance
    This talk should last 30 seconds

    View full-size slide

  6. How to measure query performance
    EXPLAIN ANALYZE. Easy!
    $ explain analyze UPDATE pgbench_accounts SET abalance = abalance + 2060
    WHERE aid = 287460257;
    QUERY PLAN
    -----------------------------------------------------------------------------
    ------------------------------------------------------------------
    Update on pgbench_accounts (cost=0.57..2.99 rows=0 width=0) (actual
    time=0.080..0.080 rows=0 loops=1)
    -> Index Scan using pgbench_accounts_pkey on pgbench_accounts
    (cost=0.57..2.99 rows=1 width=10) (actual time=0.032..0.034 rows=1 loops=1)
    Index Cond: (aid = 287460257)
    Planning Time: 0.042 ms
    Execution Time: 0.104 ms

    View full-size slide

  7. How to measure query performance
    EXPLAIN ANALYZE. Easy! But use “buffers”!
    $ explain (analyze, buffers) UPDATE pgbench_accounts SET abalance = abalance
    + 2060 WHERE aid = 287460257;
    QUERY PLAN
    -----------------------------------------------------------------------------
    ------------- Update on pgbench_accounts (cost=0.57..2.99 rows=0 width=0)
    (actual time=0.102..0.103 rows=0 loops=1)
    Buffers: shared hit=20
    -> Index Scan using pgbench_accounts_pkey on pgbench_accounts
    (cost=0.57..2.99 rows=1 width=10) (actual time=0.037..0.038 rows=1 loops=1)
    Index Cond: (aid = 287460257)
    Buffers: shared hit=7
    Planning Time: 0.057 ms
    Execution Time: 0.122 ms

    View full-size slide

  8. How to measure query performance
    So what’s wrong with EXPLAIN ANALYZE?
    ● It shows the result of an isolated query, not a system operating
    concurrently.
    ● It shows the plan for a query with given parameters.
    ● The plan you see, may not be always the plan you get.
    ● Query performance varies over time. For 1,000 runs on idle system:
    avg | percentile_50 | percentile_90 | percentile_95 | percentile_99
    -----+---------------+---------------+---------------+---------------
    99 | 96 | 104 | 118 | 163

    View full-size slide

  9. How to measure query performance
    EXPLAIN ANALYZE may lie to you
    https://ongres.com/blog/explain_analyze_may_be_lying_to_you/

    View full-size slide

  10. How to measure query performance
    Techniques to measure query
    performance.
    Benchmarks

    View full-size slide

  11. How to measure query performance
    Other techniques to measure query performance
    ● Client side
    ● pg_stat_statistics
    ● pg_stat_monitor
    ● log_min_duration_statement = 0
    ● auto_explain
    ● The future?

    View full-size slide

  12. How to measure query performance
    Client side
    ● Any benchmarking tool, but easy ones are:
    ○ pg_bench --report-latencies
    ○ psql’s \timing
    ● It includes network overhead time and variance.
    It’s small on localhost UDS (~ 0.6ms),
    but still introduces overhead and noise.

    View full-size slide

  13. How to measure query performance
    Benchmark
    ● Instance: m5ad.4xlarge (16 vcores, 64GB RAM, local SSD)
    ● Standard pgbench with 24 clients, 12 jobs, scale=5000 (+100GB) and --log
    ● 1 warm up run, 3 runs of 10 minutes:
    ○ Tps: averaged over three runs
    ○ Latencies: computed percentiles over all results
    ● Cache cleaning, vacuum, checkpoints before every run. No autovacuum.
    ● Config:
    https://postgresqlco.nf/manage/config/public/bd513ff0-e16b-4650-9918-a1452437a030
    ● Source code: https://gitlab.com/-/snippets/2290113

    View full-size slide

  14. How to measure query performance
    pg_stat_statements
    query SELECT abalance FROM pgbench_accounts WHERE aid = $1
    plans 0
    calls 29359968
    total_exec_time 442499.3529
    min_exec_time 0.0070
    max_exec_time 22.7373
    mean_exec_time 0.0151
    stddev_exec_time 0.0142

    View full-size slide

  15. How to measure query performance
    pg_stat_statements
    TPS:
    Baseline: 12,471
    12,205
    -2.1%
    Query Latency Percentiles:

    View full-size slide

  16. How to measure query performance
    pg_stat_statements: statements, not TXs!
    pgbench (14.2 (Ubuntu 14.2-1.pgdg20.04+1+b1))

    latency average = 2.342 ms

    statement latencies in milliseconds:
    0.058 BEGIN;
    0.965 UPDATE pgbench_accounts SET abalance = abalance …
    0.139 SELECT abalance FROM pgbench_accounts WHERE aid …
    0.153 UPDATE pgbench_tellers SET tbalance = tbalance …
    0.152 UPDATE pgbench_branches SET bbalance = bbalance …
    0.124 INSERT INTO pgbench_history (tid, bid, aid, …
    0.749 END;

    View full-size slide

  17. How to measure query performance
    pg_stat_statements: conclusions
    ● Low overhead.
    ● No percentiles: min, max, avg and stddev far from enough.
    ● How to see trending, evolution? Sample + compute diff: really hard.
    ● Measures statements, no transactions.
    ● By default doesn’t include plan time (has more overhead).
    ● Doesn’t include parse time.

    View full-size slide

  18. How to measure query performance
    pg_stat_monitor
    TPS:
    Baseline: 12,471
    11,107
    -10.9%
    Query Latency Percentiles:

    View full-size slide

  19. How to measure query performance
    pg_stat_monitor: conclusions
    ● Supports buckets: acceptable approach to trending, diff.
    ● Supports percentiles: good!
    ● Quite noticeable overhead as of today.
    ● Some erratic behavior observed during benchmarks.
    ● Shares most other limitations with pg_stat_statements.

    View full-size slide

  20. How to measure query performance
    log_min_duration_statement=0 (logs on SSD)
    TPS:
    Baseline: 12,471
    11,078
    -11.2%
    Query Latency Percentiles:

    View full-size slide

  21. How to measure query performance
    log_min_duration_statement=0: Conclusions
    ● 12GB of logs per every 10 min test.
    ● Don’t do this in production.
    ● Overhead can be much higher under circumstances like:
    ○ Not using logging_collector.
    ○ Bottlenecked I/O (specially slower, networked disks).
    ○ System under high contention.

    View full-size slide

  22. How to measure query performance
    auto_explain (logging all plans with ANALYZE)
    TPS:
    Baseline: 12,471
    9,658
    -22.6%
    Query Latency Percentiles:

    View full-size slide

  23. How to measure query performance
    auto_explain: Conclusions
    ● With appropriate sampling, it provides help.
    ● Still don’t trust much ANALYZE (it lies to you!).
    ● Need to parse logs or use SaaS tool.
    ● Helps with plans, not with query latencies.

    View full-size slide

  24. How to measure query performance
    The perfect query performance tool

    View full-size slide

  25. How to measure query performance
    The ideal query performance tool
    ● Record the latency of all queries. All.
    ● Can compute trending and percentiles. Trending of percentiles.
    ● Low or negligible overhead.
    ● Can record statements and transaction latency.
    ● Ideally, can also split parse, plan and execution time. And waiting on locks!
    ● Even more ideally, can record the plan on sampled cases.
    Does this tool exist? NO. Can it exist? Maybe ;)

    View full-size slide

  26. How to measure query performance
    Q & A
    Álvaro Hernández
    @ahachete

    View full-size slide