How Postgres Could Index Itself

A2f2287b6054926438eb65f8726cc77d?s=47 Andrew Kane
September 08, 2017

How Postgres Could Index Itself

A2f2287b6054926438eb65f8726cc77d?s=128

Andrew Kane

September 08, 2017
Tweet

Transcript

  1. How Postgres Could Index Itself

  2. None
  3. github.com/ankane

  4. None
  5. Read speed vs Write speed Space

  6. None
  7. v1

  8. Collect queries Analyze queries

  9. pg_stat_statements Query Total Time (ms) Calls Average Time (ms) SELECT

    … 40,000 80,000 0.5 SELECT … 30,000 300 100
  10. SELECT * FROM products WHERE store_id = 1

  11. pg query github.com/lfittl/pg_query _

  12. SELECT * FROM products WHERE store_id = 1

  13. SELECT * FROM products WHERE store_id = 1 AND brand_id

    = 2
  14. Stores have many products Brands have a few products

  15. id store_id brand_id 1 1 2 2 4 8 3

    1 9 4 1 3 fetch store_id = 1 id store_id brand_id 1 1 2 2 4 8 3 1 9 4 1 3 filter brand_id = 2 id store_id brand_id 1 1 2 2 4 8 3 1 9 4 1 3 id store_id brand_id 1 1 2 2 4 8 3 1 9 4 1 3 filter store_id = 1 fetch brand_id = 2
  16. pg_stats n_distinct null_frac

  17. store_id brand_id Rows 100,000 100,000 null_frac 0 0.10 n_distinct 100

    9,000 Estimated Rows 1,000 10
  18. store_id brand_id Rows 100,000,000 100,000,000 null_frac 0 0.10 n_distinct 100

    9,000 Estimated Rows 1,000,000 10,000
  19. store_id Rows 10,000 null_frac 0 n_distinct 100 Estimated Rows 100

  20. SELECT * FROM products ORDER BY created_at DESC LIMIT 10

  21. SELECT * FROM products WHERE store_id = 1 ORDER BY

    created_at DESC LIMIT 10
  22. None
  23. Shortcomings

  24. Single table plus Simple WHERE clause and/or Simple ORDER BY

    clause
  25. Duplicating planner logic

  26. pg_stats n_distinct null_frac ✗ most_common_vals ✗ most_common_freqs ✗ histogram_bounds

  27. most_common_vals {2, 5, 1} most_common_freqs {0.9, 0.05, 0.01} store_id =

    1 vs store_id = 2
  28. histogram_bounds {0, 9, 25, 60, 99} qty < 5 vs

    qty > 5
  29. SELECT * FROM products WHERE store_id = ?

  30. v2

  31. log_min_statement_duration duration: 100 ms statement: SELECT * FROM products WHERE

    store_id = 1
  32. Given a query and a set of indexes best indexes

    to use
  33. Given a query and all possible indexes best indexes possible

  34. /* Allow a plugin to editorialize on the info we

    obtained from the catalogs. Actions might include altering the assumed relation size, removing an index, or adding a hypothetical index to the indexlist. */ get_relation_info_hook 604ffd2
  35. hypopg github.com/dalibo/hypopg

  36. SELECT * FROM products WHERE store_id = 1 AND brand_id

    = 2
  37. EXPLAIN Seq Scan on products (cost=0.00..1000.00 rows=100 width=108) Filter: (store_id

    = 1 AND brand_id = 2) Final Cost
  38. Cost Hypothetical Indexes Original 1000

  39. Add hypothetical indexes store_id brand_id

  40. EXPLAIN Index Scan using <41072>hypo_btree on products (cost=0.28..50.29 rows=1 width=108)

    Index Cond: (brand_id = 2) Filter: (store_id = 1) Final Cost Index
  41. Cost Hypothetical Indexes Original 1000 Single Column 50 brand_id

  42. Add hypothetical indexes store_id, brand_id brand_id, store_id (does not try

    different sort orders right now)
  43. Cost Hypothetical Indexes Original 1000 Single Column 50 brand_id Multi

    Column 45 brand_id, store_id
  44. Dexter github.com/ankane/dexter

  45. tail -F -n +1 <log-file> | dexter <conn-opts>

  46. --create --exclude big_table --min-time 10

  47. Shortcomings

  48. SELECT * FROM products WHERE a = 1 AND b

    = 2 SELECT * FROM products WHERE b = 2
  49. B-TREE Only No Expressions No Partial

  50. SELECT * FROM products WHERE qty = 0

  51. DROP INDEX Unused indexes

  52. HypoPG Extension Support

  53. None
  54. pg_query HypoPG

  55. Get Involved github.com/ankane/dexter