How to measure Postgres Query Performance

OnGres
April 08, 2022

If you don’t know how to measure query performance in Postgres, this talk is for you.

If you know how to do it and consider it a simple and solved problem, this talk is definitely for you.

It doesn’t sound too complicated. There are several tools and techniques to report query performance. But a deeper look reveals some troubling conclusions:

* Some techniques may outright lie to you.
* There are disagreements between different tools/techniques.
* You can only collect limited information.
* You may not be able to measure what really matters to database users/clients.

If you want to learn more, or maybe even unlearn something, join this talk. We will explore all the caveats and provide some key insights into how to really measure Postgres query performance.

Transcript

  1. ` whoami `
     Álvaro Hernández <[email protected]> aht.es
     • Founder & CEO, OnGres
     • 20+ years Postgres user and DBA
     • Mostly doing R&D to create new, innovative software on Postgres
     • More than 100 tech talks, most about Postgres
     • Founder and President of the NPO Fundación PostgreSQL
     • AWS Data Hero
  2. ` whoami `: my main dedication
     StackGres.io
     The most Advanced Postgres Platform on Kubernetes
  3. How to measure query performance
     Or, alternatively, how NOT to measure query performance
  4. EXPLAIN ANALYZE. Easy!
     $ explain analyze UPDATE pgbench_accounts SET abalance = abalance + 2060 WHERE aid = 287460257;
                                                   QUERY PLAN
     -----------------------------------------------------------------------------------------------
      Update on pgbench_accounts  (cost=0.57..2.99 rows=0 width=0) (actual time=0.080..0.080 rows=0 loops=1)
        ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts  (cost=0.57..2.99 rows=1 width=10) (actual time=0.032..0.034 rows=1 loops=1)
              Index Cond: (aid = 287460257)
      Planning Time: 0.042 ms
      Execution Time: 0.104 ms
  5. EXPLAIN ANALYZE. Easy! But use “buffers”!
     $ explain (analyze, buffers) UPDATE pgbench_accounts SET abalance = abalance + 2060 WHERE aid = 287460257;
                                                   QUERY PLAN
     -----------------------------------------------------------------------------------------------
      Update on pgbench_accounts  (cost=0.57..2.99 rows=0 width=0) (actual time=0.102..0.103 rows=0 loops=1)
        Buffers: shared hit=20
        ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts  (cost=0.57..2.99 rows=1 width=10) (actual time=0.037..0.038 rows=1 loops=1)
              Index Cond: (aid = 287460257)
              Buffers: shared hit=7
      Planning Time: 0.057 ms
      Execution Time: 0.122 ms
  6. So what’s wrong with EXPLAIN ANALYZE?
     • It shows the result of an isolated query, not of a system operating concurrently.
     • It shows the plan for a query with given parameters.
     • The plan you see may not always be the plan you get.
     • Query performance varies over time. For 1,000 runs on an idle system:
        avg | percentile_50 | percentile_90 | percentile_95 | percentile_99
       -----+---------------+---------------+---------------+---------------
         99 |            96 |           104 |           118 |           163
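
     A table like the one above can be produced with a quick, hypothetical harness
     such as the following (not from the talk: the temp table, the loop and the
     probed query are illustrative; it reuses the SELECT on pgbench_accounts shown
     earlier):

     -- time 1,000 runs of the same statement and store each elapsed time
     CREATE TEMP TABLE run_times (elapsed double precision);

     DO $$
     DECLARE
       t0 timestamptz;
     BEGIN
       FOR i IN 1..1000 LOOP
         t0 := clock_timestamp();
         PERFORM abalance FROM pgbench_accounts WHERE aid = 287460257;
         INSERT INTO run_times
           VALUES (1e6 * extract(epoch FROM clock_timestamp() - t0));  -- microseconds
       END LOOP;
     END $$;

     -- average and percentiles over all runs
     SELECT round(avg(elapsed))                                          AS avg,
            round(percentile_cont(0.50) WITHIN GROUP (ORDER BY elapsed)) AS percentile_50,
            round(percentile_cont(0.90) WITHIN GROUP (ORDER BY elapsed)) AS percentile_90,
            round(percentile_cont(0.95) WITHIN GROUP (ORDER BY elapsed)) AS percentile_95,
            round(percentile_cont(0.99) WITHIN GROUP (ORDER BY elapsed)) AS percentile_99
     FROM run_times;
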
  7. EXPLAIN ANALYZE may lie to you
     https://ongres.com/blog/explain_analyze_may_be_lying_to_you/
  8. Other techniques to measure query performance
     • Client side
     • pg_stat_statements
     • pg_stat_monitor
     • log_min_duration_statement = 0
     • auto_explain
     • The future?
  9. Client side
     • Any benchmarking tool, but the easy ones are (see the example below):
       ◦ pgbench --report-latencies
       ◦ psql’s \timing
     • The measured time includes network overhead and its variance. It’s small over a localhost UDS (~0.6 ms), but it still introduces overhead and noise.
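
     A minimal client-side measurement in psql (a sketch; the query reuses the aid
     value from the earlier slides, and the pgbench flags are only an example
     invocation):

     \timing on
     SELECT abalance FROM pgbench_accounts WHERE aid = 287460257;
     -- psql prints "Time: … ms" measured at the client, so it includes the
     -- network (or UDS) round trip.
     -- Per-statement latencies for a whole benchmark run come from, e.g.:
     --   $ pgbench --report-latencies -c 24 -j 12 -T 600 bench_db
     --   ("bench_db" is a placeholder database name)
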
  10. Benchmark
      • Instance: m5ad.4xlarge (16 vcores, 64 GB RAM, local SSD)
      • Standard pgbench with 24 clients, 12 jobs, scale=5000 (+100 GB) and --log (an approximate invocation is sketched below)
      • 1 warm-up run, 3 runs of 10 minutes:
        ◦ TPS: averaged over the three runs
        ◦ Latencies: percentiles computed over all results
      • Cache cleaning, vacuum and checkpoints before every run. No autovacuum.
      • Config: https://postgresqlco.nf/manage/config/public/bd513ff0-e16b-4650-9918-a1452437a030
      • Source code: https://gitlab.com/-/snippets/2290113
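
      The exact commands are in the source-code snippet linked above; an approximate
      reconstruction of the described setup (the flag choices are an assumption, and
      "bench_db" is a placeholder database name) would be:

        $ pgbench -i -s 5000 bench_db
        $ pgbench -c 24 -j 12 -T 600 --log bench_db
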
  11. pg_stat_statements
      query               SELECT abalance FROM pgbench_accounts WHERE aid = $1
      plans               0
      calls               29359968
      total_exec_time     442499.3529
      min_exec_time       0.0070
      max_exec_time       22.7373
      mean_exec_time      0.0151
      stddev_exec_time    0.0142
      …
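
      A row like the one above comes from querying the view directly; a typical
      query (the column list and ordering are a choice made here, not taken from the
      talk) looks like:

      SELECT query, plans, calls, total_exec_time, min_exec_time,
             max_exec_time, mean_exec_time, stddev_exec_time
      FROM pg_stat_statements
      ORDER BY total_exec_time DESC
      LIMIT 10;
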
  12. pg_stat_statements: statements, not TXs!
      pgbench (14.2 (Ubuntu 14.2-1.pgdg20.04+1+b1))
      …
      latency average = 2.342 ms
      …
      statement latencies in milliseconds:
        0.058  BEGIN;
        0.965  UPDATE pgbench_accounts SET abalance = abalance …
        0.139  SELECT abalance FROM pgbench_accounts WHERE aid …
        0.153  UPDATE pgbench_tellers SET tbalance = tbalance …
        0.152  UPDATE pgbench_branches SET bbalance = bbalance …
        0.124  INSERT INTO pgbench_history (tid, bid, aid, …
        0.749  END;
  13. pg_stat_statements: conclusions
      • Low overhead.
      • No percentiles: min, max, avg and stddev are far from enough.
      • How to see trending, evolution? Sample + compute diffs: really hard (see the sketch below).
      • Measures statements, not transactions.
      • By default doesn’t include planning time (tracking it adds more overhead).
      • Doesn’t include parse time.
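
      A sketch of the “sample + compute diffs” approach (the snapshot table, the
      scheduling and the derived metric are assumptions for illustration; the real
      statement key is (userid, dbid, queryid), simplified here to queryid):

      -- take a snapshot periodically, e.g. from cron
      CREATE TABLE IF NOT EXISTS pgss_snapshots AS
        SELECT now() AS snapped_at, queryid, query, calls, total_exec_time
        FROM pg_stat_statements
        WITH NO DATA;

      INSERT INTO pgss_snapshots
        SELECT now(), queryid, query, calls, total_exec_time
        FROM pg_stat_statements;

      -- mean execution time per statement between the last two snapshots
      WITH s AS (
        SELECT *, row_number() OVER (PARTITION BY queryid
                                     ORDER BY snapped_at DESC) AS rn
        FROM pgss_snapshots
      )
      SELECT cur.query,
             cur.calls - prev.calls                        AS calls_delta,
             (cur.total_exec_time - prev.total_exec_time)
               / NULLIF(cur.calls - prev.calls, 0)         AS mean_exec_time_ms
      FROM s cur
      JOIN s prev USING (queryid)
      WHERE cur.rn = 1 AND prev.rn = 2
      ORDER BY cur.total_exec_time - prev.total_exec_time DESC
      LIMIT 10;
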
  14. pg_stat_monitor: conclusions
      • Supports buckets: acceptable approach to trending / diffing.
      • Supports percentiles: good!
      • Quite noticeable overhead as of today.
      • Some erratic behavior observed during benchmarks.
      • Shares most other limitations with pg_stat_statements.
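
      For reference, a minimal look at the extension’s view (a sketch; the library
      must also be preloaded, and exact column names vary across pg_stat_monitor
      versions, so bucket, bucket_start_time, query and calls are assumptions based
      on the upstream documentation):

      CREATE EXTENSION IF NOT EXISTS pg_stat_monitor;

      SELECT bucket, bucket_start_time, query, calls
      FROM pg_stat_monitor
      ORDER BY bucket DESC, calls DESC
      LIMIT 10;
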
  15. log_min_duration_statement = 0 (logs on SSD)
      TPS: baseline 12,471 → 11,078 (-11.2%)
      Query latency percentiles: (chart in the original slides)
  16. log_min_duration_statement = 0: conclusions
      • 12 GB of logs per 10-minute test.
      • Don’t do this in production.
      • Overhead can be much higher under circumstances like:
        ◦ Not using logging_collector.
        ◦ Bottlenecked I/O (especially slower, networked disks).
        ◦ A system under high contention.
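
      How this kind of full-statement logging is typically turned on (a sketch of
      the relevant settings, not the benchmark’s exact configuration):

      -- logging_collector needs a server restart; log_min_duration_statement does not
      ALTER SYSTEM SET logging_collector = on;
      ALTER SYSTEM SET log_min_duration_statement = 0;  -- log every statement with its duration
      SELECT pg_reload_conf();
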
  17. auto_explain (logging all plans with ANALYZE)
      TPS: baseline 12,471 → 9,658 (-22.6%)
      Query latency percentiles: (chart in the original slides)
  18. auto_explain: conclusions
      • With appropriate sampling, it provides help (see the configuration sketch below).
      • Still, don’t trust ANALYZE too much (it lies to you!).
      • Need to parse the logs or use a SaaS tool.
      • Helps with plans, not with query latencies.
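
      A typical auto_explain setup with sampling (a sketch; the threshold and sample
      rate below are illustrative choices, not the values used in the benchmark).
      auto_explain must first be added to shared_preload_libraries (restart
      required) or loaded per session with LOAD 'auto_explain'. In postgresql.conf:

        auto_explain.log_min_duration = '250ms'   # only log plans slower than this
        auto_explain.log_analyze = on             # include actual run times (the costly part)
        auto_explain.log_buffers = on             # needs log_analyze
        auto_explain.sample_rate = 0.01           # explain a ~1% sample of eligible queries
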
  19. The ideal query performance tool
      • Records the latency of all queries. All of them.
      • Can compute trending and percentiles. Trending of percentiles.
      • Low or negligible overhead.
      • Can record statement and transaction latency.
      • Ideally, can also split parse, plan and execution time. And time waiting on locks!
      • Even more ideally, can record the plan for sampled cases.
      Does this tool exist? NO. Can it exist? Maybe ;)