Slide 1

Slide 1 text

How Postgres Could Index Itself

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

github.com/ankane

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

Read speed vs Write speed Space

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

v1

Slide 8

Slide 8 text

Collect queries Analyze queries

Slide 9

Slide 9 text

pg_stat_statements Query Total Time (ms) Calls Average Time (ms) SELECT … 40,000 80,000 0.5 SELECT … 30,000 300 100

Slide 10

Slide 10 text

SELECT * FROM products WHERE store_id = 1

Slide 11

Slide 11 text

pg query github.com/lfittl/pg_query _

Slide 12

Slide 12 text

SELECT * FROM products WHERE store_id = 1

Slide 13

Slide 13 text

SELECT * FROM products WHERE store_id = 1 AND brand_id = 2

Slide 14

Slide 14 text

Stores have many products Brands have a few products

Slide 15

Slide 15 text

id store_id brand_id 1 1 2 2 4 8 3 1 9 4 1 3 fetch store_id = 1 id store_id brand_id 1 1 2 2 4 8 3 1 9 4 1 3 filter brand_id = 2 id store_id brand_id 1 1 2 2 4 8 3 1 9 4 1 3 id store_id brand_id 1 1 2 2 4 8 3 1 9 4 1 3 filter store_id = 1 fetch brand_id = 2

Slide 16

Slide 16 text

pg_stats n_distinct null_frac

Slide 17

Slide 17 text

store_id brand_id Rows 100,000 100,000 null_frac 0 0.10 n_distinct 100 9,000 Estimated Rows 1,000 10

Slide 18

Slide 18 text

store_id brand_id Rows 100,000,000 100,000,000 null_frac 0 0.10 n_distinct 100 9,000 Estimated Rows 1,000,000 10,000

Slide 19

Slide 19 text

store_id Rows 10,000 null_frac 0 n_distinct 100 Estimated Rows 100

Slide 20

Slide 20 text

SELECT * FROM products ORDER BY created_at DESC LIMIT 10

Slide 21

Slide 21 text

SELECT * FROM products WHERE store_id = 1 ORDER BY created_at DESC LIMIT 10

Slide 22

Slide 22 text

No content

Slide 23

Slide 23 text

Shortcomings

Slide 24

Slide 24 text

Single table plus Simple WHERE clause and/or Simple ORDER BY clause

Slide 25

Slide 25 text

Duplicating planner logic

Slide 26

Slide 26 text

pg_stats n_distinct null_frac ✗ most_common_vals ✗ most_common_freqs ✗ histogram_bounds

Slide 27

Slide 27 text

most_common_vals {2, 5, 1} most_common_freqs {0.9, 0.05, 0.01} store_id = 1 vs store_id = 2

Slide 28

Slide 28 text

histogram_bounds {0, 9, 25, 60, 99} qty < 5 vs qty > 5

Slide 29

Slide 29 text

SELECT * FROM products WHERE store_id = ?

Slide 30

Slide 30 text

v2

Slide 31

Slide 31 text

log_min_statement_duration duration: 100 ms statement: SELECT * FROM products WHERE store_id = 1

Slide 32

Slide 32 text

Given a query and a set of indexes best indexes to use

Slide 33

Slide 33 text

Given a query and all possible indexes best indexes possible

Slide 34

Slide 34 text

/* Allow a plugin to editorialize on the info we obtained from the catalogs. Actions might include altering the assumed relation size, removing an index, or adding a hypothetical index to the indexlist. */ get_relation_info_hook 604ffd2

Slide 35

Slide 35 text

hypopg github.com/dalibo/hypopg

Slide 36

Slide 36 text

SELECT * FROM products WHERE store_id = 1 AND brand_id = 2

Slide 37

Slide 37 text

EXPLAIN Seq Scan on products (cost=0.00..1000.00 rows=100 width=108) Filter: (store_id = 1 AND brand_id = 2) Final Cost

Slide 38

Slide 38 text

Cost Hypothetical Indexes Original 1000

Slide 39

Slide 39 text

Add hypothetical indexes store_id brand_id

Slide 40

Slide 40 text

EXPLAIN Index Scan using <41072>hypo_btree on products (cost=0.28..50.29 rows=1 width=108) Index Cond: (brand_id = 2) Filter: (store_id = 1) Final Cost Index

Slide 41

Slide 41 text

Cost Hypothetical Indexes Original 1000 Single Column 50 brand_id

Slide 42

Slide 42 text

Add hypothetical indexes store_id, brand_id brand_id, store_id (does not try different sort orders right now)

Slide 43

Slide 43 text

Cost Hypothetical Indexes Original 1000 Single Column 50 brand_id Multi Column 45 brand_id, store_id

Slide 44

Slide 44 text

Dexter github.com/ankane/dexter

Slide 45

Slide 45 text

tail -F -n +1 | dexter

Slide 46

Slide 46 text

--create --exclude big_table --min-time 10

Slide 47

Slide 47 text

Shortcomings

Slide 48

Slide 48 text

SELECT * FROM products WHERE a = 1 AND b = 2 SELECT * FROM products WHERE b = 2

Slide 49

Slide 49 text

B-TREE Only No Expressions No Partial

Slide 50

Slide 50 text

SELECT * FROM products WHERE qty = 0

Slide 51

Slide 51 text

DROP INDEX Unused indexes

Slide 52

Slide 52 text

HypoPG Extension Support

Slide 53

Slide 53 text

No content

Slide 54

Slide 54 text

pg_query HypoPG

Slide 55

Slide 55 text

Get Involved github.com/ankane/dexter