Upgrade to Pro — share decks privately, control downloads, hide ads and more …

How Postgres Could Index Itself

Andrew Kane
September 08, 2017

How Postgres Could Index Itself

Andrew Kane

September 08, 2017
Tweet

Other Decks in Programming

Transcript

  1. How Postgres
    Could Index
    Itself

    View Slide

  2. View Slide

  3. github.com/ankane

    View Slide

  4. View Slide

  5. Read speed
    vs
    Write speed
    Space

    View Slide

  6. View Slide

  7. v1

    View Slide

  8. Collect queries
    Analyze queries

    View Slide

  9. pg_stat_statements
    Query
    Total Time
    (ms)
    Calls
    Average Time
    (ms)
    SELECT … 40,000 80,000 0.5
    SELECT … 30,000 300 100

    View Slide

  10. SELECT * FROM products
    WHERE store_id = 1

    View Slide

  11. pg query
    github.com/lfittl/pg_query
    _

    View Slide

  12. SELECT * FROM products
    WHERE store_id = 1

    View Slide

  13. SELECT * FROM products
    WHERE store_id = 1
    AND brand_id = 2

    View Slide

  14. Stores have many products
    Brands have a few products

    View Slide

  15. id store_id brand_id
    1 1 2
    2 4 8
    3 1 9
    4 1 3
    fetch store_id = 1
    id store_id brand_id
    1 1 2
    2 4 8
    3 1 9
    4 1 3
    filter brand_id = 2
    id store_id brand_id
    1 1 2
    2 4 8
    3 1 9
    4 1 3
    id store_id brand_id
    1 1 2
    2 4 8
    3 1 9
    4 1 3
    filter store_id = 1
    fetch brand_id = 2

    View Slide

  16. pg_stats
    n_distinct
    null_frac

    View Slide

  17. store_id brand_id
    Rows 100,000 100,000
    null_frac 0 0.10
    n_distinct 100 9,000
    Estimated Rows 1,000 10

    View Slide

  18. store_id brand_id
    Rows 100,000,000 100,000,000
    null_frac 0 0.10
    n_distinct 100 9,000
    Estimated Rows 1,000,000 10,000

    View Slide

  19. store_id
    Rows 10,000
    null_frac 0
    n_distinct 100
    Estimated Rows 100

    View Slide

  20. SELECT * FROM products
    ORDER BY created_at DESC
    LIMIT 10

    View Slide

  21. SELECT * FROM products
    WHERE store_id = 1
    ORDER BY created_at DESC
    LIMIT 10

    View Slide

  22. View Slide

  23. Shortcomings

    View Slide

  24. Single table
    plus
    Simple WHERE clause
    and/or
    Simple ORDER BY clause

    View Slide

  25. Duplicating
    planner logic

    View Slide

  26. pg_stats
    n_distinct
    null_frac
    ✗ most_common_vals
    ✗ most_common_freqs
    ✗ histogram_bounds

    View Slide

  27. most_common_vals
    {2, 5, 1}
    most_common_freqs
    {0.9, 0.05, 0.01}
    store_id = 1 vs store_id = 2

    View Slide

  28. histogram_bounds
    {0, 9, 25, 60, 99}
    qty < 5 vs qty > 5

    View Slide

  29. SELECT * FROM products
    WHERE store_id = ?

    View Slide

  30. v2

    View Slide

  31. log_min_statement_duration
    duration: 100 ms statement:
    SELECT * FROM products WHERE
    store_id = 1

    View Slide

  32. Given a query and
    a set of indexes
    best indexes to use

    View Slide

  33. Given a query and
    all possible indexes
    best indexes possible

    View Slide

  34. /* Allow a plugin to editorialize on
    the info we obtained from the
    catalogs. Actions might include
    altering the assumed relation size,
    removing an index, or adding a
    hypothetical index to the indexlist. */
    get_relation_info_hook
    604ffd2

    View Slide

  35. hypopg
    github.com/dalibo/hypopg

    View Slide

  36. SELECT * FROM products
    WHERE store_id = 1
    AND brand_id = 2

    View Slide

  37. EXPLAIN
    Seq Scan on products (cost=0.00..1000.00
    rows=100 width=108)
    Filter: (store_id = 1 AND brand_id = 2)
    Final Cost

    View Slide

  38. Cost
    Hypothetical
    Indexes
    Original 1000

    View Slide

  39. Add hypothetical indexes
    store_id
    brand_id

    View Slide

  40. EXPLAIN
    Index Scan using <41072>hypo_btree on
    products (cost=0.28..50.29 rows=1 width=108)
    Index Cond: (brand_id = 2)
    Filter: (store_id = 1) Final Cost
    Index

    View Slide

  41. Cost
    Hypothetical
    Indexes
    Original 1000
    Single Column 50 brand_id

    View Slide

  42. Add hypothetical indexes
    store_id, brand_id
    brand_id, store_id
    (does not try different sort orders right now)

    View Slide

  43. Cost
    Hypothetical
    Indexes
    Original 1000
    Single Column 50 brand_id
    Multi Column 45 brand_id, store_id

    View Slide

  44. Dexter
    github.com/ankane/dexter

    View Slide

  45. tail -F -n +1 | dexter

    View Slide

  46. --create
    --exclude big_table
    --min-time 10

    View Slide

  47. Shortcomings

    View Slide

  48. SELECT * FROM products
    WHERE a = 1 AND b = 2
    SELECT * FROM products
    WHERE b = 2

    View Slide

  49. B-TREE Only
    No Expressions
    No Partial

    View Slide

  50. SELECT * FROM products
    WHERE qty = 0

    View Slide

  51. DROP INDEX
    Unused indexes

    View Slide

  52. HypoPG
    Extension Support

    View Slide

  53. View Slide

  54. pg_query
    HypoPG

    View Slide

  55. Get Involved
    github.com/ankane/dexter

    View Slide