Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
How Postgres Could Index Itself
Search
Andrew Kane
September 08, 2017
Programming
0
3.2k
How Postgres Could Index Itself
Andrew Kane
September 08, 2017
Tweet
Share
Other Decks in Programming
See All in Programming
どの様にAIエージェントと 協業すべきだったのか?
takefumiyoshii
2
590
iOSアプリの信頼性を向上させる取り組み/ios-app-improve-reliability
shino8rayu9
0
150
止められない医療アプリ、そっと Swift 6 へ
medley
1
110
なぜGoのジェネリクスはこの形なのか? Featherweight Goが明かす設計の核心
ryotaros
7
1k
プログラミングどうやる? ~テスト駆動開発から学ぶ達人の型~
a_okui
0
190
Pythonスレッドとは結局何なのか? CPython実装から見るNoGIL時代の変化
curekoshimizu
4
1.3k
育てるアーキテクチャ:戦い抜くPythonマイクロサービスの設計と進化戦略
fujidomoe
1
150
Catch Up: Go Style Guide Update
andpad
0
160
CSC509 Lecture 01
javiergs
PRO
1
430
After go func(): Goroutines Through a Beginner’s Eye
97vaibhav
0
230
NetworkXとGNNで学ぶグラフデータ分析入門〜複雑な関係性を解き明かすPythonの力〜
mhrtech
3
990
猫と暮らすネットワークカメラ生活🐈 ~Vision frameworkでペットを愛でよう~ / iOSDC Japan 2025
yutailang0119
0
220
Featured
See All Featured
Faster Mobile Websites
deanohume
310
31k
Fantastic passwords and where to find them - at NoRuKo
philnash
52
3.4k
ピンチをチャンスに:未来をつくるプロダクトロードマップ #pmconf2020
aki_iinuma
127
53k
[RailsConf 2023 Opening Keynote] The Magic of Rails
eileencodes
31
9.7k
Site-Speed That Sticks
csswizardry
11
880
Principles of Awesome APIs and How to Build Them.
keavy
127
17k
Designing Experiences People Love
moore
142
24k
10 Git Anti Patterns You Should be Aware of
lemiorhan
PRO
657
61k
Distributed Sagas: A Protocol for Coordinating Microservices
caitiem20
333
22k
Save Time (by Creating Custom Rails Generators)
garrettdimon
PRO
32
1.6k
YesSQL, Process and Tooling at Scale
rocio
173
14k
Why You Should Never Use an ORM
jnunemaker
PRO
59
9.6k
Transcript
How Postgres Could Index Itself
None
github.com/ankane
None
Read speed vs Write speed Space
None
v1
Collect queries Analyze queries
pg_stat_statements Query Total Time (ms) Calls Average Time (ms) SELECT
… 40,000 80,000 0.5 SELECT … 30,000 300 100
SELECT * FROM products WHERE store_id = 1
pg query github.com/lfittl/pg_query _
SELECT * FROM products WHERE store_id = 1
SELECT * FROM products WHERE store_id = 1 AND brand_id
= 2
Stores have many products Brands have a few products
id store_id brand_id 1 1 2 2 4 8 3
1 9 4 1 3 fetch store_id = 1 id store_id brand_id 1 1 2 2 4 8 3 1 9 4 1 3 filter brand_id = 2 id store_id brand_id 1 1 2 2 4 8 3 1 9 4 1 3 id store_id brand_id 1 1 2 2 4 8 3 1 9 4 1 3 filter store_id = 1 fetch brand_id = 2
pg_stats n_distinct null_frac
store_id brand_id Rows 100,000 100,000 null_frac 0 0.10 n_distinct 100
9,000 Estimated Rows 1,000 10
store_id brand_id Rows 100,000,000 100,000,000 null_frac 0 0.10 n_distinct 100
9,000 Estimated Rows 1,000,000 10,000
store_id Rows 10,000 null_frac 0 n_distinct 100 Estimated Rows 100
SELECT * FROM products ORDER BY created_at DESC LIMIT 10
SELECT * FROM products WHERE store_id = 1 ORDER BY
created_at DESC LIMIT 10
None
Shortcomings
Single table plus Simple WHERE clause and/or Simple ORDER BY
clause
Duplicating planner logic
pg_stats n_distinct null_frac ✗ most_common_vals ✗ most_common_freqs ✗ histogram_bounds
most_common_vals {2, 5, 1} most_common_freqs {0.9, 0.05, 0.01} store_id =
1 vs store_id = 2
histogram_bounds {0, 9, 25, 60, 99} qty < 5 vs
qty > 5
SELECT * FROM products WHERE store_id = ?
v2
log_min_statement_duration duration: 100 ms statement: SELECT * FROM products WHERE
store_id = 1
Given a query and a set of indexes best indexes
to use
Given a query and all possible indexes best indexes possible
/* Allow a plugin to editorialize on the info we
obtained from the catalogs. Actions might include altering the assumed relation size, removing an index, or adding a hypothetical index to the indexlist. */ get_relation_info_hook 604ffd2
hypopg github.com/dalibo/hypopg
SELECT * FROM products WHERE store_id = 1 AND brand_id
= 2
EXPLAIN Seq Scan on products (cost=0.00..1000.00 rows=100 width=108) Filter: (store_id
= 1 AND brand_id = 2) Final Cost
Cost Hypothetical Indexes Original 1000
Add hypothetical indexes store_id brand_id
EXPLAIN Index Scan using <41072>hypo_btree on products (cost=0.28..50.29 rows=1 width=108)
Index Cond: (brand_id = 2) Filter: (store_id = 1) Final Cost Index
Cost Hypothetical Indexes Original 1000 Single Column 50 brand_id
Add hypothetical indexes store_id, brand_id brand_id, store_id (does not try
different sort orders right now)
Cost Hypothetical Indexes Original 1000 Single Column 50 brand_id Multi
Column 45 brand_id, store_id
Dexter github.com/ankane/dexter
tail -F -n +1 <log-file> | dexter <conn-opts>
--create --exclude big_table --min-time 10
Shortcomings
SELECT * FROM products WHERE a = 1 AND b
= 2 SELECT * FROM products WHERE b = 2
B-TREE Only No Expressions No Partial
SELECT * FROM products WHERE qty = 0
DROP INDEX Unused indexes
HypoPG Extension Support
None
pg_query HypoPG
Get Involved github.com/ankane/dexter