Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
How Postgres Could Index Itself
Search
Sponsored
·
SiteGround - Reliable hosting with speed, security, and support you can count on.
→
Andrew Kane
September 08, 2017
Programming
3.3k
0
Share
Embed
Copy iframe code
Copy JS code
Copy link
Start on current slide
How Postgres Could Index Itself
Andrew Kane
September 08, 2017
Other Decks in Programming
See All in Programming
JJUG CCC 2026 Spring: JSpecify で実現する Kotlin フレンドリーな Java API 設計
ternbusty
1
150
運用エージェントは "作る" から "育てる" へ - 記憶と自己進化の3層設計パターン / self-evolving-agents-three-layer-agent-design
gawa
12
3.6k
Spec-Driven Development with AI-Agents: From High-Level Requirements to Working Software
antonarhipov
2
470
Inside Stream API
skrb
1
660
[2026年度第1回ORセミナー] 計画最適化ベンチャーと競技プログラミング人材
terryu16
0
250
PHPで使える日時の表現と、その知り方 #frontend_phpcon_do
o0h
PRO
0
220
AI 時代のソフトウェア設計の学び方
masuda220
PRO
29
12k
RTSPクライアントを自作してみた話
simotin13
0
520
Vite+ Unified Toolchain for the Web
naokihaba
0
170
生成AI時代にこそ効くGo | Why Go Works in the Age of Generative AI
mom0tomo
8
3.2k
Spec Driven Development | AI Summit Lisbon
danielsogl
PRO
0
170
Agentic UI
manfredsteyer
PRO
0
120
Featured
See All Featured
The Art of Delivering Value - GDevCon NA Keynote
reverentgeek
16
2k
職位にかかわらず全員がリーダーシップを発揮するチーム作り / Building a team where everyone can demonstrate leadership regardless of position
madoxten
62
54k
Building Flexible Design Systems
yeseniaperezcruz
330
40k
Documentation Writing (for coders)
carmenintech
77
5.4k
BBQ
matthewcrist
89
10k
SERP Conf. Vienna - Web Accessibility: Optimizing for Inclusivity and SEO
sarafernandez
2
1.5k
Become a Pro
speakerdeck
PRO
31
6k
The #1 spot is gone: here's how to win anyway
tamaranovitovic
2
1.1k
Marketing to machines
jonoalderson
1
5.4k
How Fast Is Fast Enough? [PerfNow 2025]
tammyeverts
3
600
The browser strikes back
jonoalderson
0
1.2k
<Decoding/> the Language of Devs - We Love SEO 2024
nikkihalliwell
1
240
Transcript
How Postgres Could Index Itself
None
github.com/ankane
None
Read speed vs Write speed Space
None
v1
Collect queries Analyze queries
pg_stat_statements Query Total Time (ms) Calls Average Time (ms) SELECT
… 40,000 80,000 0.5 SELECT … 30,000 300 100
SELECT * FROM products WHERE store_id = 1
pg query github.com/lfittl/pg_query _
SELECT * FROM products WHERE store_id = 1
SELECT * FROM products WHERE store_id = 1 AND brand_id
= 2
Stores have many products Brands have a few products
id store_id brand_id 1 1 2 2 4 8 3
1 9 4 1 3 fetch store_id = 1 id store_id brand_id 1 1 2 2 4 8 3 1 9 4 1 3 filter brand_id = 2 id store_id brand_id 1 1 2 2 4 8 3 1 9 4 1 3 id store_id brand_id 1 1 2 2 4 8 3 1 9 4 1 3 filter store_id = 1 fetch brand_id = 2
pg_stats n_distinct null_frac
store_id brand_id Rows 100,000 100,000 null_frac 0 0.10 n_distinct 100
9,000 Estimated Rows 1,000 10
store_id brand_id Rows 100,000,000 100,000,000 null_frac 0 0.10 n_distinct 100
9,000 Estimated Rows 1,000,000 10,000
store_id Rows 10,000 null_frac 0 n_distinct 100 Estimated Rows 100
SELECT * FROM products ORDER BY created_at DESC LIMIT 10
SELECT * FROM products WHERE store_id = 1 ORDER BY
created_at DESC LIMIT 10
None
Shortcomings
Single table plus Simple WHERE clause and/or Simple ORDER BY
clause
Duplicating planner logic
pg_stats n_distinct null_frac ✗ most_common_vals ✗ most_common_freqs ✗ histogram_bounds
most_common_vals {2, 5, 1} most_common_freqs {0.9, 0.05, 0.01} store_id =
1 vs store_id = 2
histogram_bounds {0, 9, 25, 60, 99} qty < 5 vs
qty > 5
SELECT * FROM products WHERE store_id = ?
v2
log_min_statement_duration duration: 100 ms statement: SELECT * FROM products WHERE
store_id = 1
Given a query and a set of indexes best indexes
to use
Given a query and all possible indexes best indexes possible
/* Allow a plugin to editorialize on the info we
obtained from the catalogs. Actions might include altering the assumed relation size, removing an index, or adding a hypothetical index to the indexlist. */ get_relation_info_hook 604ffd2
hypopg github.com/dalibo/hypopg
SELECT * FROM products WHERE store_id = 1 AND brand_id
= 2
EXPLAIN Seq Scan on products (cost=0.00..1000.00 rows=100 width=108) Filter: (store_id
= 1 AND brand_id = 2) Final Cost
Cost Hypothetical Indexes Original 1000
Add hypothetical indexes store_id brand_id
EXPLAIN Index Scan using <41072>hypo_btree on products (cost=0.28..50.29 rows=1 width=108)
Index Cond: (brand_id = 2) Filter: (store_id = 1) Final Cost Index
Cost Hypothetical Indexes Original 1000 Single Column 50 brand_id
Add hypothetical indexes store_id, brand_id brand_id, store_id (does not try
different sort orders right now)
Cost Hypothetical Indexes Original 1000 Single Column 50 brand_id Multi
Column 45 brand_id, store_id
Dexter github.com/ankane/dexter
tail -F -n +1 <log-file> | dexter <conn-opts>
--create --exclude big_table --min-time 10
Shortcomings
SELECT * FROM products WHERE a = 1 AND b
= 2 SELECT * FROM products WHERE b = 2
B-TREE Only No Expressions No Partial
SELECT * FROM products WHERE qty = 0
DROP INDEX Unused indexes
HypoPG Extension Support
None
pg_query HypoPG
Get Involved github.com/ankane/dexter