Upgrade to Pro — share decks privately, control downloads, hide ads and more …

How Postgres Could Index Itself

Andrew Kane
September 08, 2017

How Postgres Could Index Itself

Andrew Kane

September 08, 2017
Tweet

More Decks by Andrew Kane

Other Decks in Programming

Transcript

  1. How Postgres Could Index Itself

  2. None
  3. github.com/ankane

  4. None
  5. Read speed vs Write speed Space

  6. None
  7. v1

  8. Collect queries Analyze queries

  9. pg_stat_statements Query Total Time (ms) Calls Average Time (ms) SELECT

    … 40,000 80,000 0.5 SELECT … 30,000 300 100
  10. SELECT * FROM products WHERE store_id = 1

  11. pg query github.com/lfittl/pg_query _

  12. SELECT * FROM products WHERE store_id = 1

  13. SELECT * FROM products WHERE store_id = 1 AND brand_id

    = 2
  14. Stores have many products Brands have a few products

  15. id store_id brand_id 1 1 2 2 4 8 3

    1 9 4 1 3 fetch store_id = 1 id store_id brand_id 1 1 2 2 4 8 3 1 9 4 1 3 filter brand_id = 2 id store_id brand_id 1 1 2 2 4 8 3 1 9 4 1 3 id store_id brand_id 1 1 2 2 4 8 3 1 9 4 1 3 filter store_id = 1 fetch brand_id = 2
  16. pg_stats n_distinct null_frac

  17. store_id brand_id Rows 100,000 100,000 null_frac 0 0.10 n_distinct 100

    9,000 Estimated Rows 1,000 10
  18. store_id brand_id Rows 100,000,000 100,000,000 null_frac 0 0.10 n_distinct 100

    9,000 Estimated Rows 1,000,000 10,000
  19. store_id Rows 10,000 null_frac 0 n_distinct 100 Estimated Rows 100

  20. SELECT * FROM products ORDER BY created_at DESC LIMIT 10

  21. SELECT * FROM products WHERE store_id = 1 ORDER BY

    created_at DESC LIMIT 10
  22. None
  23. Shortcomings

  24. Single table plus Simple WHERE clause and/or Simple ORDER BY

    clause
  25. Duplicating planner logic

  26. pg_stats n_distinct null_frac ✗ most_common_vals ✗ most_common_freqs ✗ histogram_bounds

  27. most_common_vals {2, 5, 1} most_common_freqs {0.9, 0.05, 0.01} store_id =

    1 vs store_id = 2
  28. histogram_bounds {0, 9, 25, 60, 99} qty < 5 vs

    qty > 5
  29. SELECT * FROM products WHERE store_id = ?

  30. v2

  31. log_min_statement_duration duration: 100 ms statement: SELECT * FROM products WHERE

    store_id = 1
  32. Given a query and a set of indexes best indexes

    to use
  33. Given a query and all possible indexes best indexes possible

  34. /* Allow a plugin to editorialize on the info we

    obtained from the catalogs. Actions might include altering the assumed relation size, removing an index, or adding a hypothetical index to the indexlist. */ get_relation_info_hook 604ffd2
  35. hypopg github.com/dalibo/hypopg

  36. SELECT * FROM products WHERE store_id = 1 AND brand_id

    = 2
  37. EXPLAIN Seq Scan on products (cost=0.00..1000.00 rows=100 width=108) Filter: (store_id

    = 1 AND brand_id = 2) Final Cost
  38. Cost Hypothetical Indexes Original 1000

  39. Add hypothetical indexes store_id brand_id

  40. EXPLAIN Index Scan using <41072>hypo_btree on products (cost=0.28..50.29 rows=1 width=108)

    Index Cond: (brand_id = 2) Filter: (store_id = 1) Final Cost Index
  41. Cost Hypothetical Indexes Original 1000 Single Column 50 brand_id

  42. Add hypothetical indexes store_id, brand_id brand_id, store_id (does not try

    different sort orders right now)
  43. Cost Hypothetical Indexes Original 1000 Single Column 50 brand_id Multi

    Column 45 brand_id, store_id
  44. Dexter github.com/ankane/dexter

  45. tail -F -n +1 <log-file> | dexter <conn-opts>

  46. --create --exclude big_table --min-time 10

  47. Shortcomings

  48. SELECT * FROM products WHERE a = 1 AND b

    = 2 SELECT * FROM products WHERE b = 2
  49. B-TREE Only No Expressions No Partial

  50. SELECT * FROM products WHERE qty = 0

  51. DROP INDEX Unused indexes

  52. HypoPG Extension Support

  53. None
  54. pg_query HypoPG

  55. Get Involved github.com/ankane/dexter