How Postgres Could Index Itself

github.com/ankane

Read speed vs. write speed and space

Collect queries
Analyze queries

pg_stat_statements

Query       Total Time (ms)   Calls     Average Time (ms)
SELECT …    40,000            80,000    0.5
SELECT …    30,000            300       100
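The average column above is just total time divided by calls. A minimal sketch of that arithmetic, using the two rows from the slide; the labels "query A" and "query B" and the function name are placeholders, not anything pg_stat_statements returns.

```python
# Rows as they might be read from pg_stat_statements (numbers from the slide).
# "query A" / "query B" are placeholder labels for the two elided SELECTs.
rows = [
    ("query A", 40_000, 80_000),  # (label, total time in ms, calls)
    ("query B", 30_000, 300),
]


def average_time_ms(total_time_ms, calls):
    """Average time per call in milliseconds."""
    return total_time_ms / calls


# Sort by total time, highest first, and show the average per call.
for label, total, calls in sorted(rows, key=lambda r: r[1], reverse=True):
    print(label, total, calls, average_time_ms(total, calls))
```

Both queries have a similar total time, but the second is slow per call, which is where an index is more likely to help.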

SELECT * FROM products WHERE store_id = 1

pg_query github.com/lfittl/pg_query

SELECT * FROM products WHERE store_id = 1

SELECT * FROM products WHERE store_id = 1 AND brand_id = 2

Stores have many products
Brands have a few products

id   store_id   brand_id
1    1          2
2    4          8
3    1          9
4    1          3

Fetch store_id = 1, then filter brand_id = 2
or
Fetch brand_id = 2, then filter store_id = 1
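To make the difference between the two orders concrete, here is a small sketch over the four rows shown on the slide. It counts how many rows each strategy has to fetch before filtering; the function name is made up for illustration, and a plain list scan stands in for an index lookup.

```python
# The four example rows from the slide.
products = [
    {"id": 1, "store_id": 1, "brand_id": 2},
    {"id": 2, "store_id": 4, "brand_id": 8},
    {"id": 3, "store_id": 1, "brand_id": 9},
    {"id": 4, "store_id": 1, "brand_id": 3},
]


def fetch_then_filter(rows, fetch_col, fetch_val, filter_col, filter_val):
    """Return (rows fetched, ids left after filtering)."""
    # "Fetch" stands in for what an index on fetch_col would return.
    fetched = [r for r in rows if r[fetch_col] == fetch_val]
    # "Filter" is applied to each fetched row afterwards.
    matched = [r["id"] for r in fetched if r[filter_col] == filter_val]
    return len(fetched), matched


by_store = fetch_then_filter(products, "store_id", 1, "brand_id", 2)
by_brand = fetch_then_filter(products, "brand_id", 2, "store_id", 1)
print(by_store)  # fetches 3 rows to find 1 match
print(by_brand)  # fetches 1 row to find 1 match
```

Both orders return the same row, but fetching by brand_id touches fewer rows, which is why the more selective column is the better one to index.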

pg_stats
n_distinct
null_frac

                  store_id    brand_id
Rows              100,000     100,000
null_frac         0           0.10
n_distinct        100         9,000
Estimated Rows    1,000       10

                  store_id       brand_id
Rows              100,000,000    100,000,000
null_frac         0              0.10
n_distinct        100            9,000
Estimated Rows    1,000,000      10,000

                  store_id
Rows              10,000
null_frac         0
n_distinct        100
Estimated Rows    100
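The estimated rows in these tables are consistent with rows × (1 − null_frac) / n_distinct. A short sketch of that formula follows; the function name is illustrative, and it assumes n_distinct is a positive count (pg_stats can also store it as a negative ratio of the row count, which this sketch does not handle).

```python
def estimated_rows(rows, null_frac, n_distinct):
    """Estimate rows matching `column = <value>` for a typical value.

    Assumes values are evenly spread across the non-null rows and that
    n_distinct is a positive count of distinct values.
    """
    return round(rows * (1 - null_frac) / n_distinct)


# Numbers from the slides.
print(estimated_rows(100_000, 0, 100))         # store_id, 100,000 rows
print(estimated_rows(100_000, 0.10, 9_000))    # brand_id, 100,000 rows
print(estimated_rows(100_000_000, 0, 100))     # store_id, 100,000,000 rows
print(estimated_rows(10_000, 0, 100))          # store_id, 10,000 rows
```

The lower the estimate, the more selective the column, and the more an index on it is likely to help.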

SELECT * FROM products ORDER BY created_at DESC LIMIT 10

SELECT * FROM products WHERE store_id = 1 ORDER BY created_at DESC LIMIT 10

Shortcomings

Single table
plus
Simple WHERE clause
and/or
Simple ORDER BY clause

Duplicating planner logic

pg_stats
n_distinct
null_frac
✗ most_common_vals
✗ most_common_freqs
✗ histogram_bounds

most_common_vals {2, 5, 1}
most_common_freqs {0.9, 0.05, 0.01}
store_id = 1 vs store_id = 2
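Why the two predicates differ can be seen by looking the value up in the most-common-values list. This is a simplified sketch of the idea, not the planner's code; the function name is made up.

```python
# Statistics from the slide.
most_common_vals = [2, 5, 1]
most_common_freqs = [0.9, 0.05, 0.01]


def mcv_selectivity(value, vals, freqs):
    """Return the stored frequency if value is a most-common value, else None."""
    for v, f in zip(vals, freqs):
        if v == value:
            return f
    return None  # not in the list; a different estimate would be needed


print(mcv_selectivity(1, most_common_vals, most_common_freqs))  # 1% of rows
print(mcv_selectivity(2, most_common_vals, most_common_freqs))  # 90% of rows
```

With these numbers, store_id = 1 matches about 1% of rows, so an index helps, while store_id = 2 matches about 90% of rows, so a sequential scan is likely cheaper. An estimate based only on n_distinct cannot tell the two apart.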

histogram_bounds {0, 9, 25, 60, 99}
qty < 5 vs qty > 5
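The histogram bounds split the values into buckets holding roughly equal numbers of rows, so a range predicate can be estimated by counting full buckets and interpolating inside the partial one. The sketch below does that with linear interpolation; it is a simplification that ignores nulls and most-common values, and the function name is illustrative.

```python
def fraction_below(value, bounds):
    """Estimate the fraction of rows with column < value.

    Assumes `bounds` is strictly increasing and each bucket holds an
    equal share of the rows.
    """
    buckets = len(bounds) - 1
    if value <= bounds[0]:
        return 0.0
    if value >= bounds[-1]:
        return 1.0
    for i in range(buckets):
        lo, hi = bounds[i], bounds[i + 1]
        if lo <= value < hi:
            # Full buckets below, plus a linear share of this bucket.
            within = (value - lo) / (hi - lo)
            return (i + within) / buckets


histogram_bounds = [0, 9, 25, 60, 99]
below = fraction_below(5, histogram_bounds)
print(round(below, 3))      # share of rows estimated for qty < 5
print(round(1 - below, 3))  # share of rows estimated for qty > 5 (roughly)
```

Under these assumptions qty < 5 matches about 14% of rows and qty > 5 about 86%, so only the first predicate is a good candidate for an index.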

SELECT * FROM products WHERE store_id = ?

log_min_duration_statement
duration: 100 ms statement: SELECT * FROM products WHERE store_id = 1
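Unlike the normalized query text, a log line like this keeps the actual parameter values. A rough sketch of pulling the duration and statement out of such a line; the regular expression and function name are assumptions for illustration, and it only handles single-line statements without any log line prefix handling.

```python
import re

# Matches "duration: <number> ms" followed by "statement: <sql>".
LOG_PATTERN = re.compile(r"duration: (\d+(?:\.\d+)?) ms\s+statement: (.*)")


def parse_log_line(line):
    """Return (duration in ms, statement) or None if the line does not match."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    return float(match.group(1)), match.group(2).strip()


line = "duration: 100 ms statement: SELECT * FROM products WHERE store_id = 1"
print(parse_log_line(line))
```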

Given a query and a set of indexes, find the best indexes to use

Given a query and all possible indexes, find the best indexes possible

/*
 * Allow a plugin to editorialize on the info we obtained from the
 * catalogs.  Actions might include altering the assumed relation size,
 * removing an index, or adding a hypothetical index to the indexlist.
 */
get_relation_info_hook
604ffd2

hypopg github.com/dalibo/hypopg

SELECT * FROM products WHERE store_id = 1 AND brand_id = 2

EXPLAIN
Seq Scan on products (cost=0.00..1000.00 rows=100 width=108)
  Filter: (store_id = 1 AND brand_id = 2)
Final cost: 1000.00

                 Cost    Hypothetical Indexes
Original         1000

Add hypothetical indexes
store_id
brand_id

EXPLAIN
Index Scan using <41072>hypo_btree on products (cost=0.28..50.29 rows=1 width=108)
  Index Cond: (brand_id = 2)
  Filter: (store_id = 1)
Final cost: 50.29
Index: <41072>hypo_btree

                 Cost    Hypothetical Indexes
Original         1000
Single Column    50      brand_id

Add hypothetical indexes
store_id, brand_id
brand_id, store_id
(does not try different sort orders right now)

                 Cost    Hypothetical Indexes
Original         1000
Single Column    50      brand_id
Multi Column     45      brand_id, store_id
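Once each set of hypothetical indexes has a planner cost, picking a candidate comes down to comparing those costs. The sketch below simply takes the minimum over the three rows of the table; it is an illustration of the comparison, not a description of how Dexter decides, and the names are made up.

```python
# Planner cost for the query under each set of hypothetical indexes
# (numbers from the table above). The empty tuple means no new index.
costs = {
    (): 1000,
    ("brand_id",): 50,
    ("brand_id", "store_id"): 45,
}


def cheapest_candidate(costs):
    """Return the index columns with the lowest planner cost."""
    return min(costs, key=costs.get)


best = cheapest_candidate(costs)
print(best, costs[best])
```

Taking the plain minimum favors the multi-column index even though it saves little over the single-column one. A real tool would probably want to weigh that small gain against the write speed and space trade-off mentioned at the start.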

Dexter github.com/ankane/dexter

tail -F -n +1 <log-file> | dexter <conn-opts>

--create --exclude big_table --min-time 10

Shortcomings

SELECT * FROM products WHERE a = 1 AND b = 2
SELECT * FROM products WHERE b = 2

B-tree only
No expressions
No partial

SELECT * FROM products WHERE qty = 0

DROP INDEX
Unused indexes

HypoPG Extension Support

pg_query HypoPG

Get Involved github.com/ankane/dexter
