
Practical Query Tuning in PostgreSQL

Slow queries can be a common challenge when working with PostgreSQL, and understanding how to identify and resolve performance bottlenecks is key to maintaining a high-performing database.
In this talk, we'll dive into practical examples that demonstrate how to diagnose performance bottlenecks using tools like EXPLAIN and optimize queries effectively. Using relatable and actionable scenarios, we'll walk through common causes of slow queries, how to analyze execution plans, and practical steps for improvement, such as optimizing indexes and rewriting queries. We'll also explore the impact of statistics and planner decisions—key factors in understanding why PostgreSQL chooses a particular execution plan.
This session is designed for those who are familiar with the basics of reading EXPLAIN plans and want to expand their skills by applying them to practical scenarios.

Keiko Oda

April 25, 2025


Transcript

  1. Practical Query Tuning in PostgreSQL From Diagnosis to Optimization PGDay

    Chicago 2025 April 25, 2025 Keiko Oda - pganalyze
  2. Table of Contents • Query Tuning Workflow ◦ How to

    find a slow query? ◦ Benchmarking with EXPLAIN • EXPLAIN Refresher ◦ Recap basics • Case 1: Missing Index ◦ Statistics, Selectivity and Index • Case 2: ORDER BY with LIMIT ◦ My query got slow without touching it! • Case 3: Nested Loop with misestimate ◦ JOIN types and order 2
  3. Speaker Intro • Keiko Oda • Product Engineer at pganalyze

    ◦ Sponsoring! Say hi at our booth! • From Japan ◦ Kanazawa, Ishikawa ◦ The 25 best places in the world to travel to in 2025 by National Geographic 3 I love beers! Let me know if you have any recommendations in Chicago
  4. Speaker Intro • Keiko Oda • Product Engineer at pganalyze

    ◦ Sponsoring! Say hi at our booth! • From Japan ◦ Kanazawa, Ishikawa ◦ The 25 best places in the world to travel to in 2025 by National Geographic 4 I love beers! Let me know if you have any recommendations in Chicago Map source: JAIST https://www.jaist.ac.jp/english/top/access/
  5. Query Tuning Workflow 6 Form hypothesis Apply changes (test) Slow

    Query Apply changes (permanent) Benchmark with EXPLAIN
  7. How to find a slow query? • Application performance monitoring

    (APM) tools ◦ Which endpoint or job is slow? ◦ Using trace views to pinpoint slow parts of the request 8
  8. How to find a slow query? • Isolate issues ◦

    System level: “Is the query slow, or is the system overloaded?” ◦ Database level (locks, etc.): “Is the query slow, or is it waiting for something else?” 9
  10. How to find a slow query? • pg_stat_statements ◦ Tracks

    query execution stats (total time, call counts, rows, I/O metrics) 11
  11. Postgres Platform "Best Practices" for the Modern DBA 📹-3 How

    to find more info about a slow query? • log_min_duration_statement (config) ◦ Logs all queries that run longer than the specified duration (e.g. 1000ms) • auto_explain (extension) ◦ Automatically logs the EXPLAIN output of slow queries ▪ It’s supported by most cloud providers and relatively easy to set up ◦ auto_explain configuration options ▪ auto_explain supports many of the same options as the EXPLAIN command ▪ auto_explain.log_min_duration ▪ auto_explain.log_analyze: Runs EXPLAIN ANALYZE - ON recommended ▪ auto_explain.log_buffers: Runs EXPLAIN BUFFERS - ON recommended ▪ auto_explain.log_timing: OFF as a starter, ON if possible ◦ “Prioritize observability > latency” - by Chelsea Dole 12 Collect Postgres EXPLAIN plans using auto_explain 📖-4
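Put together, the auto_explain settings above could look like this in postgresql.conf (a minimal sketch; the 1000ms threshold is the example value from the slide, and loading via shared_preload_libraries is one common option):

```
# Load the extension at server start (requires a restart)
shared_preload_libraries = 'auto_explain'

# Log the plan of any statement slower than 1s
auto_explain.log_min_duration = '1000ms'
auto_explain.log_analyze = on    # include actual row counts (EXPLAIN ANALYZE)
auto_explain.log_buffers = on    # include buffer usage (EXPLAIN BUFFERS)
auto_explain.log_timing = off    # per-node timing off as a starter; enable if overhead allows
```

Managed cloud providers typically expose these same parameters through their configuration UI rather than a raw postgresql.conf file.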
  12. Query Tuning Workflow 13 Form hypothesis Apply changes (test) Slow

    Query Apply changes (permanent) Benchmark with EXPLAIN
  13. Benchmarking with EXPLAIN • Use EXPLAIN ANALYZE to measure the

    baseline ◦ If possible, enable BUFFERS (ON by default with ANALYZE from Postgres 18), VERBOSE, I/O Timing (track_io_timing) for more detailed insights • Run EXPLAIN multiple times to reduce cold cache effects ◦ The first run can be affected by cache, making it unstable as a benchmark ▪ Cold cache: Disk I/O, buffer cache, CPU overhead, etc. ◦ Important part: measure each run under the same conditions ◦ Use the pg_buffercache_evict_all() function (new in Postgres 18) • Benchmark with different parameter sets ◦ The same query may use different plans depending on the parameter values ◦ A query improvement for one parameter set might hurt performance for others 14
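A sketch of that baseline measurement, using the orders table from the later examples (exact plans and timings will differ on your system):

```sql
-- Capture I/O wait times in the plan output for this session
SET track_io_timing = on;

-- Run several times; discard the first (cold-cache) run and compare the rest
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT count(*) FROM orders WHERE product_id = 5;
```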
  14. pg_buffercache_evict and pg_buffercache_evict_all • pg_buffercache_evict() ◦ New in Postgres 17

    ◦ Requires specifying a buffer ID • pg_buffercache_evict_all() ◦ New in Postgres 18 ◦ Evicts all buffers from shared buffers ◦ Do not run this in production 15 Understanding Postgres IOPS: Why They Matter Even When Everything Fits in Cache 📖-2
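A sketch of evicting cache between benchmark runs (assumes the pg_buffercache extension is installed; never run this on a production server):

```sql
CREATE EXTENSION IF NOT EXISTS pg_buffercache;

-- Postgres 18+: evict everything in shared buffers
SELECT pg_buffercache_evict_all();

-- Postgres 17: evict buffers one at a time, e.g. only those of the orders table
SELECT pg_buffercache_evict(bufferid)
FROM pg_buffercache
WHERE relfilenode = pg_relation_filenode('orders');
```

Note that this only clears Postgres' shared buffers; the OS page cache may still serve reads, so disk timings can remain warm.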
  15. What is EXPLAIN? EXPLAIN will show a plan tree the

    Planner picked 17 Parser Rewriter Planner Executor SELECT id, name FROM users WHERE org_id = 123; id | name ----+------- 1 | Alice 3 | Beth 7 | Emily Plan tree Query tree Query Result Tweaked Query tree Plan tree 1 Plan tree 2 Plan tree 3
  16. Planner’s role Planner’s role: Creating an optimal execution plan 1.

    Consider all(-ish) possible plans for a given query ◦ Whether to use an index or not, which join types to use, and in what order to join tables ◦ In some cases, it doesn’t explore all possible plans (it would take too long to plan) 2. Calculate the cost of each plan ◦ Cost: internal unit of effort to compare different execution plans ◦ e.g. Random page reads (random_page_cost, default 4.0) are costed higher than sequential page reads (seq_page_cost, default 1.0) 3. Choose the plan with the lowest estimated cost 18
  17. When the query is slow… The planner is not picking

    up “the best” plan • Bad estimates: Stats are stale or too basic • Missing options: No index = no chance to use one • Cost can be tricky: What looks cheap might not be cheap (or fast) • Query shape matters: CTEs, functions, etc. can block optimization 19 Planner Plan tree 1 Plan tree 2 Plan tree 3
  18. Statistics • Postgres maintains statistics per table ◦ Statistics are

    per-column by default ▪ multi-column statistics (extended stats) can be created with CREATE STATISTICS ◦ Collected automatically by autovacuum(analyze) or manually with ANALYZE 22 A Deep Dive into Postgres Statistics 📹-5 column avg_width Average width of the column’s values n_distinct Estimated number of distinct values (-1 indicates all values are unique) most_common_vals A list of the most common values in the column histogram_bounds A list of values that divide the column’s values into approx. equally sized groups
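Refreshing statistics and adding extended statistics might look like this (the statistics object name and column pair are illustrative, reusing the orders table from the test data):

```sql
-- Refresh per-column statistics for one table
ANALYZE orders;

-- Extended (multi-column) statistics for columns the planner
-- would otherwise treat as statistically independent
CREATE STATISTICS orders_product_user_stats (ndistinct, dependencies)
  ON product_id, user_id FROM orders;

ANALYZE orders;  -- extended statistics are computed on the next ANALYZE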
  19. Statistics 23 SELECT attname, avg_width, n_distinct, most_common_vals, histogram_bounds FROM pg_stats

    WHERE tablename = 'postgres_roles' AND attname IN ('create_db', 'connection_limit'); -[ RECORD 1 ]----+----------------------------------------------------------------------------- attname | connection_limit avg_width | 4 n_distinct | 27 most_common_vals | {-1,20,5,10} histogram_bounds | {0,0,1,3,3,6,6,30,30,40,50,50,75,100,100,200,325,500,500,999,1000,1024,2000} -[ RECORD 2 ]----+----------------------------------------------------------------------------- attname | create_db avg_width | 1 n_distinct | 2 most_common_vals | {f,t} histogram_bounds | The pg_stats system view
  20. Selectivity and Index • Selectivity: How many rows match a

    given search condition ◦ High selectivity (few rows match) → Planner may choose Index Scan ◦ Low selectivity (many rows match) → Planner may prefer Sequential Scan • Not all indexes are worth creating ◦ Low selectivity may be ignored + write overhead and storage cost 24 Postgres' Clever Query Planning System 📖-3
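One way to eyeball selectivity before creating an index is to check the column's most-common-value frequencies in pg_stats (the ~0.2 figure assumes the five evenly distributed product IDs from the test data that follows):

```sql
SELECT most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'product_id';
-- A frequency around 0.2 means ~20% of rows match product_id = 5:
-- low selectivity, so a sequential scan may beat an index scan
```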
  21. Test data ☕ 25 CREATE TABLE products( id int not

    null, name text not null, PRIMARY KEY(id) ); INSERT INTO products (id, name) VALUES (1, 'Latte'), (2, 'Espresso'), (3, 'Cappuccino'), (4, 'Americano'), (5, 'Mocha'); ANALYZE products; ☕ Table: products
  22. Test data 🧾 26 CREATE TABLE orders( id int not

    null, product_id int not null, user_id int not null, created_at date not null, PRIMARY KEY (id) ); INSERT INTO orders (id, product_id, user_id, created_at) SELECT generate_series(1, 5000000) AS id, -- 5M orders floor(random() * 5 + 1)::int AS product_id, -- No “users” table, but assuming there are 100k users by issuing 1-100k user IDs floor(random() * 100000 + 1)::int AS user_id, -- 100k users (about ~50 orders per user) date '2024-01-01' + (random() * 365)::int AS created_at; ANALYZE orders; 🧾 Table: orders
  23. Missing index 27 EXPLAIN ANALYZE SELECT count(*) FROM orders WHERE

    product_id = 5; QUERY PLAN ------------------------------------------------------------------------------------------------------------------- Finalize Aggregate (cost=55117.28..55117.29 rows=1 width=8) (actual time=106.312..106.978 rows=1 loops=1) -> Gather (cost=55117.06..55117.27 rows=2 width=8) (actual time=106.271..106.972 rows=3 loops=1) Workers Planned: 2 Workers Launched: 2 -> Partial Aggregate (cost=54117.06..54117.07 rows=1 width=8) (actual time=99.692..99.692 rows=1 loops=3) -> Parallel Seq Scan on orders (cost=0.00..53069.67 rows=418958 width=0) (actual time=0.138..88.288 rows=333854 loops=3) Filter: (product_id = 5) Rows Removed by Filter: 1332813 Planning Time: 0.487 ms Execution Time: 107.031 ms (10 rows) Total order count of Mocha (product_id = 5)
  24. Missing index 28 EXPLAIN ANALYZE SELECT count(*) FROM orders WHERE

    product_id = 5; QUERY PLAN ------------------------------------------------------------------------------------------------------------------- Finalize Aggregate (cost=17186.88..17186.89 rows=1 width=8) (actual time=51.927..52.955 rows=1 loops=1) -> Gather (cost=17186.66..17186.87 rows=2 width=8) (actual time=51.822..52.937 rows=3 loops=1) Workers Planned: 2 Workers Launched: 2 -> Partial Aggregate (cost=16186.66..16186.67 rows=1 width=8) (actual time=46.825..46.826 rows=1 loops=3) -> Parallel Index Only Scan using orders_product_id_idx on orders (cost=0.43..15139.27 rows=418958 width=0) (actual time=0.084..28.528 rows=333854 loops=3) Index Cond: (product_id = 5) Heap Fetches: 0 Planning Time: 0.277 ms Execution Time: 53.019 ms (10 rows) CREATE INDEX ON orders(product_id); Cost: 55,117 → 17,186 (-70%) Execution time: 107ms → 53ms (-50%)
  25. Missing index 29 EXPLAIN ANALYZE SELECT count(*) FROM orders WHERE

    user_id = 1; QUERY PLAN ------------------------------------------------------------------------------------------------------------------- Aggregate (cost=5.45..5.46 rows=1 width=8) (actual time=0.293..0.295 rows=1 loops=1) -> Index Only Scan using orders_user_id_idx on orders (cost=0.43..5.33 rows=51 width=0) (actual time=0.255..0.272 rows=47 loops=1) Index Cond: (user_id = 1) Heap Fetches: 0 Planning Time: 0.827 ms Execution Time: 0.418 ms (6 rows) CREATE INDEX ON orders(user_id); Cost: 17,186 → 5 Execution time: 53ms → 0.4ms High selectivity: the index is used more effectively, scanning fewer rows and reducing total cost
  26. Test data 31 CREATE INDEX ON orders(product_id); CREATE INDEX ON

    orders(user_id); CREATE INDEX ON orders(created_at); SELECT attname, n_distinct FROM pg_stats WHERE tablename = 'orders'; attname | n_distinct ------------+------------ product_id | 5 user_id | 97702 id | -1 created_at | 366 (4 rows) Created indexes on product_id, user_id, and created_at \d orders Table "public.orders" Column | Type | Collation | Nullable | Default ------------+---------+-----------+----------+--------- id | integer | | not null | product_id | integer | | not null | user_id | integer | | not null | created_at | date | | not null | Indexes: "orders_pkey" PRIMARY KEY, btree (id) "orders_created_at_idx" btree (created_at) "orders_product_id_idx" btree (product_id) "orders_user_id_idx" btree (user_id)
  27. ORDER BY with LIMIT 32 EXPLAIN ANALYZE SELECT * FROM

    orders WHERE user_id = 1 ORDER BY created_at DESC LIMIT 30; QUERY PLAN ------------------------------------------------------------------------------------------------------------------- Limit (cost=204.27..204.34 rows=30 width=16) (actual time=1.116..1.127 rows=30 loops=1) -> Sort (cost=204.27..204.39 rows=51 width=16) (actual time=1.112..1.117 rows=30 loops=1) Sort Key: created_at DESC Sort Method: top-N heapsort Memory: 26kB -> Bitmap Heap Scan on orders (cost=4.83..202.82 rows=51 width=16) (actual time=0.328..1.071 rows=67 loops=1) Recheck Cond: (user_id = 1) Heap Blocks: exact=67 -> Bitmap Index Scan on orders_user_id_idx (cost=0.00..4.82 rows=51 width=0) (actual time=0.283..0.283 rows=67 loops=1) Index Cond: (user_id = 1) Planning Time: 1.180 ms Execution Time: 1.174 ms (11 rows) Fetch the 30 most recent orders for user_id=1
  29. ORDER BY with LIMIT with 100k new orders 34 INSERT

    INTO orders SELECT generate_series(5000001, 5100000), -- Adding 100k new orders (on top of existing 5M) floor(random() * 5 + 1)::int AS product_id, floor(random() * 10 + 1)::int AS user_id, -- Orders only for user_id 1-10 date '2024-01-01' + (random() * 365)::int AS created_at; Reduce the selectivity of user_id (esp. user_id=1) by adding 100k new orders SELECT n_distinct, most_common_vals, most_common_freqs FROM pg_stats WHERE attname = 'user_id'; Before: -[ RECORD 1 ]-----+--------- n_distinct | 97702 most_common_vals | {53246} most_common_freqs | {0.0002} After: -[ RECORD 1 ]-----+--------------------------- n_distinct | 92373 most_common_vals | {6,1,3,2,7,8,4,9,10,5} most_common_freqs | {0.0022333334,0.0021,0.0020333333,0.002,0.002,0.002,0.0018666667,0.0018,0.0017666667,0.0017}
  30. ORDER BY with LIMIT with 100k new orders 35 EXPLAIN

    ANALYZE SELECT * FROM orders WHERE user_id = 1 ORDER BY created_at DESC LIMIT 30; QUERY PLAN ------------------------------------------------------------------------------------------------------------------- Limit (cost=0.43..605.41 rows=30 width=16) (actual time=0.058..11.740 rows=30 loops=1) -> Index Scan Backward using orders_created_at_idx on orders (cost=0.43..215976.74 rows=10710 width=16) (actual time=0.056..11.734 rows=30 loops=1) Filter: (user_id = 1) Rows Removed by Filter: 7280 Planning Time: 0.210 ms Execution Time: 11.779 ms (6 rows) Fetch the 30 most recent orders for user_id=1 Prev: Index scan with user_id + sort on created_at Now: Index scan with created_at + filter
  31. ORDER BY with LIMIT with 100k new orders 36 EXPLAIN

    ANALYZE SELECT * FROM orders WHERE user_id = 1 ORDER BY created_at DESC LIMIT 1000; QUERY PLAN ------------------------------------------------------------------------------------------------------------------- Limit (cost=1000.46..16789.51 rows=1000 width=16) (actual time=2038.116..2046.743 rows=1000 loops=1) -> Gather Merge (cost=1000.46..180837.81 rows=11390 width=16) (actual time=2038.115..2046.706 rows=1000 loops=1) Workers Planned: 2 Workers Launched: 2 -> Parallel Index Scan Backward using orders_created_at_idx on orders (cost=0.43..178523.10 rows=4746 width=16) (actual time=148.825..1387.316 rows=518 loops=3) Filter: (user_id = 1) Rows Removed by Filter: 1632248 Planning Time: 0.236 ms Execution Time: 2046.784 ms (9 rows) Fetch the 1k most recent orders for user_id=1 (new orders are created in the beginning of the year) Postgres Planner Quirks: The impact of ORDER BY + LIMIT on index usage 📖-5
  32. ORDER BY with LIMIT with 100k new orders 37 EXPLAIN

    ANALYZE SELECT * FROM orders WHERE user_id = 1 ORDER BY created_at+0 DESC LIMIT 1000; QUERY PLAN ------------------------------------------------------------------------------------------------------------------- Limit (cost=22119.99..22122.49 rows=1000 width=20) (actual time=10.200..10.389 rows=1000 loops=1) -> Sort (cost=22119.99..22148.47 rows=11390 width=20) (actual time=10.197..10.283 rows=1000 loops=1) Sort Key: ((created_at + 0)) DESC Sort Method: top-N heapsort Memory: 161kB -> Bitmap Heap Scan on orders (cost=136.70..21495.49 rows=11390 width=20) (actual time=1.920..7.302 rows=10280 loops=1) Recheck Cond: (user_id = 1) Heap Blocks: exact=596 -> Bitmap Index Scan on orders_user_id_idx (cost=0.00..133.86 rows=11390 width=0) (actual time=1.814..1.814 rows=10280 loops=1) Index Cond: (user_id = 1) Planning Time: 0.826 ms Execution Time: 10.585 ms Fetch the 1k most recent orders for user_id=1 (new orders are created in the beginning of the year)
  33. Plan Nodes - Join Nodes Join Nodes Nested Loop For

    each row in the outer table, fetch and join with every row in the inner table 👍 Effective when the result from the outer table is small (= low loop count) or the inner table can use a parameterized index scan 👎 Not effective for large tables, inefficient compared to the other two join types Merge Join Join two children already sorted by their shared join key 👍 Good for joining large tables, especially if already sorted 👎 Can be slow if there’s no index on the join key Hash Join Build a hash table from the inner table, keyed by the join key, then scan the outer table to find matches 👍 Works well when the inner table is small (aka hash table is small) and the outer table is large; best suited for equality joins 👎 Very slow if the hash table doesn’t fit in work_mem 39 Postgres' Clever Query Planning System 📖-6
  34. Plan Nodes - Join Nodes Join Nodes Nested Loop SELECT

    * FROM small_outer_table JOIN large_inner_table ON small_outer_table.id = large_inner_table.some_id; (index on large_inner_table.some_id) Merge Join SELECT * FROM large_table1 JOIN large_table2 ON large_table1.created_at > large_table2.last_seen; (tables are sorted by join keys or have indexes on join keys) Hash Join SELECT * FROM smaller_table JOIN larger_table ON smaller_table.id = larger_table.some_id; (equi-join) 40 Join Nodes: Combine rows from child nodes Internals of physical join operators 📹-4
  35. Misestimate and Nested Loop 41 [Diagram: a Nested Loop scans the inner table once per outer row. 👍 Small outer table: loop count 3. 👎 Big outer table (+1000 rows): loop count 1005]
  36. Table Stats for Server query 42 WITH stats AS (

    SELECT stats_columns FROM postgres_server_stats_35d ), schema_table_infos AS ( SELECT stats_columns FROM schema_table_infos_35d sti_outer JOIN ( SELECT stats_columns FROM schema_table_infos_35d ) sti_inner USING (things) ) SELECT stats_columns FROM unpack_schema_table_stats(function_call) sts JOIN schema_tables st ON sts.schema_table_id = st.id JOIN databases d ON st.database_id = d.id JOIN schema_table_infos sti ON sti.schema_table_id = st.id WHERE some_conditions
  39. Table Stats for Server query 45 WITH stats AS (

    SELECT stats_columns FROM postgres_server_stats_35d ), schema_table_infos AS ( SELECT stats_columns FROM schema_table_infos_35d sti_outer JOIN ( SELECT stats_columns FROM schema_table_infos_35d ) sti_inner USING (things) ) SELECT stats_columns FROM unpack_schema_table_stats(function_call) sts JOIN schema_tables st ON sts.schema_table_id = st.id JOIN databases d ON st.database_id = d.id JOIN schema_table_infos sti ON sti.schema_table_id = st.id WHERE some_conditions A CTE without MATERIALIZED can be inlined, letting the planner use a parameterized index scan (which might not be optimal)
  40. Table Stats for Server query 46 WITH stats AS (

    SELECT stats_columns FROM postgres_server_stats_35d ), schema_table_infos AS MATERIALIZED ( SELECT stats_columns FROM schema_table_infos_35d sti_outer JOIN ( SELECT stats_columns FROM schema_table_infos_35d ) sti_inner USING (things) ) SELECT stats_columns FROM unpack_schema_table_stats(function_call) sts JOIN schema_tables st ON sts.schema_table_id = st.id JOIN databases d ON st.database_id = d.id JOIN schema_table_infos sti ON sti.schema_table_id = st.id WHERE some_conditions
  44. 📖 References (Blogs, Documents) 📖 1. Postgres document: Using EXPLAIN

    ◦ https://www.postgresql.org/docs/current/using-explain.html 2. Understanding Postgres IOPS: Why They Matter Even When Everything Fits in Cache – John Dalton (Crunchy Data) ◦ Due to Postgres’ tendency to access disk frequently, IOPS is an unavoidable topic when discussing its performance. This blog post provides a solid opportunity to learn about Postgres’ I/O behavior in depth. ◦ https://www.crunchydata.com/blog/understanding-postgres-iops 3. Postgres' Clever Query Planning System – Paul Ramsey (Crunchy Data) ◦ Explains the relationship between indexes, selectivity, and statistics using plenty of intuitive and easy-to-understand visuals ◦ https://www.crunchydata.com/blog/indexes-selectivity-and-statistics 51
  45. 📖 References (Blogs, Documents) 📖 4. Collect Postgres EXPLAIN plans

    using auto_explain – pganalyze docs ◦ Covers how to set up auto_explain per provider, also mentions recommended settings ◦ https://pganalyze.com/docs/explain/setup/auto_explain 5. Postgres Planner Quirks: The impact of ORDER BY + LIMIT on index usage - Lukas Fittl ◦ Covers the impact of ORDER BY + LIMIT with the real world examples ◦ https://pganalyze.com/blog/5mins-postgres-planner-order-by-limit 6. How Postgres Chooses Which Index To Use For A Query - Lukas Fittl ◦ Talks about why the nested loop is a good fit for the parameterized index scans ◦ https://pganalyze.com/blog/how-postgres-chooses-index 52
  46. 📹 References (Videos) 📹 1. Explaining the Postgres Query Optimizer

    | Citus Con: An Event for Postgres 2022 - Bruce Momjian ◦ Explains how Postgres’ statistics influence query planning through real-world examples, making it highly intuitive and easy to grasp ◦ https://www.youtube.com/watch?v=wLpcVM9qxV0 2. Webinar recording: Optimizing slow queries with EXPLAIN to fix bad query plans - Lukas Fittl (pganalyze) ◦ A webinar I referred to while creating this slide. It covers an overview of EXPLAIN and query optimization, with plenty of practical examples. The 70+ Q&A items included in the slides also contain a wealth of valuable insights ◦ https://www.youtube.com/watch?v=NE-cf1h301I ◦ [slides] https://resources.pganalyze.com/pganalyze_Query_Optimization_EXPLAIN_Webinar_June_2024.pdf 53
  47. 📹 References (Videos) 📹 3. Postgres Platform "Best Practices" for

    the Modern DBA (PGConf.EU 2024) - Chelsea Dole ◦ While it only briefly touches on EXPLAIN (mostly auto_explain), the talk focuses on best practices for managing PostgreSQL for others. It’s full of practical, experience-based tips and is an excellent talk for those involved in PostgreSQL operations ◦ https://www.youtube.com/watch?v=R8RBkx-ysqg 4. Internals of physical join operators (SQL with BRET) - Bret Wagner ◦ A video series explaining how JOINs work in SQL Server. While there may be minor differences from PostgreSQL, the core concepts are the same. The animations make it easy to understand the three main JOIN types ◦ Nested Loop Join: https://www.youtube.com/watch?v=0arjvMJihJo ◦ Merge Join: https://www.youtube.com/watch?v=IFUB8iw46RI ◦ Hash Match Join: https://www.youtube.com/watch?v=59C8c7p_hII 54
  48. 📹 References (Videos) 📹 5. A Deep Dive into Postgres

    Statistics (PGConf.EU 2024) - Louise Grandjonc ◦ A deep-dive talk on PostgreSQL statistics. Packed with concrete examples and math, it provides a thorough and intuitive understanding of how stats work ◦ https://www.youtube.com/watch?v=ApAClPFJ_rU 55
  49. EXPLAIN terminology 57 EXPLAIN ANALYZE SELECT * FROM tenk1 t1,

    tenk2 t2 WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2; QUERY PLAN ------------------------------------------------------------------- ----------------------------- Nested Loop (cost=4.65..118.50 rows=10 width=488) (actual time=0.017..0.051 rows=10 loops=1) -> Bitmap Heap Scan on tenk1 t1 (cost=4.36..39.38 rows=10 width=244) (actual time=.. ) Recheck Cond: (unique1 < 10) Heap Blocks: exact=10 -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..4.36 rows=10 width=0) (actual…) Index Cond: (unique1 < 10) -> Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..7.90 rows=1 width=244) (actual…) Index Cond: (unique2 = t1.unique2) Planning Time: 0.485 ms Execution Time: 0.073 ms Plan tree
  50. EXPLAIN terminology 58 EXPLAIN ANALYZE SELECT * FROM tenk1 t1,

    tenk2 t2 WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2; QUERY PLAN ------------------------------------------------------------------- ----------------------------- Nested Loop (cost=4.65..118.50 rows=10 width=488) (actual time=0.017..0.051 rows=10 loops=1) -> Bitmap Heap Scan on tenk1 t1 (cost=4.36..39.38 rows=10 width=244) (actual time=.. ) Recheck Cond: (unique1 < 10) Heap Blocks: exact=10 -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..4.36 rows=10 width=0) (actual…) Index Cond: (unique1 < 10) -> Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..7.90 rows=1 width=244) (actual…) Index Cond: (unique2 = t1.unique2) Planning Time: 0.485 ms Execution Time: 0.073 ms Plan node
  51. Reading EXPLAIN result • (Advanced) Manually inspect the text format

    output • Or use an EXPLAIN plan visualizer ◦ Explain Dalibo https://explain.dalibo.com/ ◦ Explain Depesz https://explain.depesz.com/ ◦ pgMustard https://www.pgmustard.com/ ◦ pganalyze https://pganalyze.com/ 59
  52. Example of Apply changes • Indexes ◦ Adding missing indexes

    or revisit existing ones • Statistics (update selectivity) ◦ Run ANALYZE to refresh outdated statistics ◦ Use CREATE STATISTICS to add extended stats when needed • Query rewrites ◦ Simplify queries or avoid fetching unnecessary columns ◦ Materialize CTEs (WITH x AS MATERIALIZED) • Forcing plan choices ◦ Adjust planner config parameters (e.g. enable_nestloop = off) ◦ Use pg_hint_plan to suggest specific strategies 60
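As a sketch of testing a planner-parameter hypothesis per session before making any permanent change (the join below is an illustrative query on the talk's test tables; pg_hint_plan is a separate extension that must be installed):

```sql
-- Test: does avoiding a nested loop help this query?
SET enable_nestloop = off;
EXPLAIN ANALYZE
SELECT p.name, count(*)
FROM orders o
JOIN products p ON o.product_id = p.id
GROUP BY p.name;
RESET enable_nestloop;

-- With pg_hint_plan installed, a leading comment can suggest a join strategy:
-- /*+ HashJoin(o p) */ SELECT ... FROM orders o JOIN products p ON ...
```

SET only affects the current session, which makes it safe for experiments; permanent plan forcing via config parameters is usually a last resort.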