How to measure Postgres Query Performance

Slide 1

Slide 1 text

How to measure query performance How to measure query performance Álvaro Hernández @ahachete

Slide 2

Slide 2 text

How to measure query performance ` whoami ` Álvaro Hernández aht.es ● Founder & CEO, OnGres ● 20+ years Postgres user and DBA ● Mostly doing R&D to create new, innovative software on Postgres ● More than 100 tech talks, most about Postgres ● Founder and President of the NPO Fundación PostgreSQL ● AWS Data Hero

Slide 3

Slide 3 text

How to measure query performance ` whoami ` : my main dedication StackGres.io The most Advanced Postgres Platform on Kubernetes

Slide 4

Slide 4 text

How to measure query performance How to measure query performance Or, alternatively, how NOT to measure query performance

Slide 5

Slide 5 text

How to measure query performance This talk should last 30 seconds

Slide 6

Slide 6 text

How to measure query performance EXPLAIN ANALYZE. Easy! $ explain analyze UPDATE pgbench_accounts SET abalance = abalance + 2060 WHERE aid = 287460257; QUERY PLAN ----------------------------------------------------------------------------- ------------------------------------------------------------------ Update on pgbench_accounts (cost=0.57..2.99 rows=0 width=0) (actual time=0.080..0.080 rows=0 loops=1) -> Index Scan using pgbench_accounts_pkey on pgbench_accounts (cost=0.57..2.99 rows=1 width=10) (actual time=0.032..0.034 rows=1 loops=1) Index Cond: (aid = 287460257) Planning Time: 0.042 ms Execution Time: 0.104 ms

Slide 7

Slide 7 text

How to measure query performance EXPLAIN ANALYZE. Easy! But use “buffers”! $ explain (analyze, buffers) UPDATE pgbench_accounts SET abalance = abalance + 2060 WHERE aid = 287460257; QUERY PLAN ----------------------------------------------------------------------------- ------------- Update on pgbench_accounts (cost=0.57..2.99 rows=0 width=0) (actual time=0.102..0.103 rows=0 loops=1) Buffers: shared hit=20 -> Index Scan using pgbench_accounts_pkey on pgbench_accounts (cost=0.57..2.99 rows=1 width=10) (actual time=0.037..0.038 rows=1 loops=1) Index Cond: (aid = 287460257) Buffers: shared hit=7 Planning Time: 0.057 ms Execution Time: 0.122 ms

Slide 8

Slide 8 text

How to measure query performance So what’s wrong with EXPLAIN ANALYZE? ● It shows the result of an isolated query, not a system operating concurrently. ● It shows the plan for a query with given parameters. ● The plan you see, may not be always the plan you get. ● Query performance varies over time. For 1,000 runs on idle system: avg | percentile_50 | percentile_90 | percentile_95 | percentile_99 -----+---------------+---------------+---------------+--------------- 99 | 96 | 104 | 118 | 163

Slide 9

Slide 9 text

How to measure query performance EXPLAIN ANALYZE may lie to you https://ongres.com/blog/explain_analyze_may_be_lying_to_you/

Slide 10

Slide 10 text

How to measure query performance Techniques to measure query performance. Benchmarks

Slide 11

Slide 11 text

How to measure query performance Other techniques to measure query performance ● Client side ● pg_stat_statistics ● pg_stat_monitor ● log_min_duration_statement = 0 ● auto_explain ● The future?

Slide 12

Slide 12 text

How to measure query performance Client side ● Any benchmarking tool, but easy ones are: ○ pg_bench --report-latencies ○ psql’s \timing ● It includes network overhead time and variance. It’s small on localhost UDS (~ 0.6ms), but still introduces overhead and noise.

Slide 13

Slide 13 text

How to measure query performance Benchmark ● Instance: m5ad.4xlarge (16 vcores, 64GB RAM, local SSD) ● Standard pgbench with 24 clients, 12 jobs, scale=5000 (+100GB) and --log ● 1 warm up run, 3 runs of 10 minutes: ○ Tps: averaged over three runs ○ Latencies: computed percentiles over all results ● Cache cleaning, vacuum, checkpoints before every run. No autovacuum. ● Conﬁg: https://postgresqlco.nf/manage/conﬁg/public/bd513ff0-e16b-4650-9918-a1452437a030 ● Source code: https://gitlab.com/-/snippets/2290113

Slide 14

Slide 14 text

How to measure query performance pg_stat_statements query SELECT abalance FROM pgbench_accounts WHERE aid = $1 plans 0 calls 29359968 total_exec_time 442499.3529 min_exec_time 0.0070 max_exec_time 22.7373 mean_exec_time 0.0151 stddev_exec_time 0.0142 …

Slide 15

Slide 15 text

How to measure query performance pg_stat_statements TPS: Baseline: 12,471 12,205 -2.1% Query Latency Percentiles:

Slide 16

Slide 16 text

How to measure query performance pg_stat_statements: statements, not TXs! pgbench (14.2 (Ubuntu 14.2-1.pgdg20.04+1+b1)) … latency average = 2.342 ms … statement latencies in milliseconds: 0.058 BEGIN; 0.965 UPDATE pgbench_accounts SET abalance = abalance … 0.139 SELECT abalance FROM pgbench_accounts WHERE aid … 0.153 UPDATE pgbench_tellers SET tbalance = tbalance … 0.152 UPDATE pgbench_branches SET bbalance = bbalance … 0.124 INSERT INTO pgbench_history (tid, bid, aid, … 0.749 END;

Slide 17

Slide 17 text

How to measure query performance pg_stat_statements: conclusions ● Low overhead. ● No percentiles: min, max, avg and stddev far from enough. ● How to see trending, evolution? Sample + compute diff: really hard. ● Measures statements, no transactions. ● By default doesn’t include plan time (has more overhead). ● Doesn’t include parse time.

Slide 18

Slide 18 text

How to measure query performance pg_stat_monitor TPS: Baseline: 12,471 11,107 -10.9% Query Latency Percentiles:

Slide 19

Slide 19 text

How to measure query performance pg_stat_monitor: conclusions ● Supports buckets: acceptable approach to trending, diff. ● Supports percentiles: good! ● Quite noticeable overhead as of today. ● Some erratic behavior observed during benchmarks. ● Shares most other limitations with pg_stat_statements.

Slide 20

Slide 20 text

How to measure query performance log_min_duration_statement=0 (logs on SSD) TPS: Baseline: 12,471 11,078 -11.2% Query Latency Percentiles:

Slide 21

Slide 21 text

How to measure query performance log_min_duration_statement=0: Conclusions ● 12GB of logs per every 10 min test. ● Don’t do this in production. ● Overhead can be much higher under circumstances like: ○ Not using logging_collector. ○ Bottlenecked I/O (specially slower, networked disks). ○ System under high contention.

Slide 22

Slide 22 text

How to measure query performance auto_explain (logging all plans with ANALYZE) TPS: Baseline: 12,471 9,658 -22.6% Query Latency Percentiles:

Slide 23

Slide 23 text

How to measure query performance auto_explain: Conclusions ● With appropriate sampling, it provides help. ● Still don’t trust much ANALYZE (it lies to you!). ● Need to parse logs or use SaaS tool. ● Helps with plans, not with query latencies.

Slide 24

Slide 24 text

How to measure query performance The perfect query performance tool

Slide 25

Slide 25 text

How to measure query performance The ideal query performance tool ● Record the latency of all queries. All. ● Can compute trending and percentiles. Trending of percentiles. ● Low or negligible overhead. ● Can record statements and transaction latency. ● Ideally, can also split parse, plan and execution time. And waiting on locks! ● Even more ideally, can record the plan on sampled cases. Does this tool exist? NO. Can it exist? Maybe ;)

Slide 26

Slide 26 text

How to measure query performance Q & A Álvaro Hernández @ahachete