Upgrade to Pro — share decks privately, control downloads, hide ads and more …

How Postgres Could Index Itself

Andrew Kane
September 08, 2017

How Postgres Could Index Itself

Andrew Kane

September 08, 2017
Tweet

Other Decks in Programming

Transcript

  1. How Postgres
    Could Index
    Itself

    View full-size slide

  2. github.com/ankane

    View full-size slide

  3. Read speed
    vs
    Write speed
    Space

    View full-size slide

  4. Collect queries
    Analyze queries

    View full-size slide

  5. pg_stat_statements
    Query
    Total Time
    (ms)
    Calls
    Average Time
    (ms)
    SELECT … 40,000 80,000 0.5
    SELECT … 30,000 300 100

    View full-size slide

  6. SELECT * FROM products
    WHERE store_id = 1

    View full-size slide

  7. pg query
    github.com/lfittl/pg_query
    _

    View full-size slide

  8. SELECT * FROM products
    WHERE store_id = 1

    View full-size slide

  9. SELECT * FROM products
    WHERE store_id = 1
    AND brand_id = 2

    View full-size slide

  10. Stores have many products
    Brands have a few products

    View full-size slide

  11. id store_id brand_id
    1 1 2
    2 4 8
    3 1 9
    4 1 3
    fetch store_id = 1
    id store_id brand_id
    1 1 2
    2 4 8
    3 1 9
    4 1 3
    filter brand_id = 2
    id store_id brand_id
    1 1 2
    2 4 8
    3 1 9
    4 1 3
    id store_id brand_id
    1 1 2
    2 4 8
    3 1 9
    4 1 3
    filter store_id = 1
    fetch brand_id = 2

    View full-size slide

  12. pg_stats
    n_distinct
    null_frac

    View full-size slide

  13. store_id brand_id
    Rows 100,000 100,000
    null_frac 0 0.10
    n_distinct 100 9,000
    Estimated Rows 1,000 10

    View full-size slide

  14. store_id brand_id
    Rows 100,000,000 100,000,000
    null_frac 0 0.10
    n_distinct 100 9,000
    Estimated Rows 1,000,000 10,000

    View full-size slide

  15. store_id
    Rows 10,000
    null_frac 0
    n_distinct 100
    Estimated Rows 100

    View full-size slide

  16. SELECT * FROM products
    ORDER BY created_at DESC
    LIMIT 10

    View full-size slide

  17. SELECT * FROM products
    WHERE store_id = 1
    ORDER BY created_at DESC
    LIMIT 10

    View full-size slide

  18. Shortcomings

    View full-size slide

  19. Single table
    plus
    Simple WHERE clause
    and/or
    Simple ORDER BY clause

    View full-size slide

  20. Duplicating
    planner logic

    View full-size slide

  21. pg_stats
    n_distinct
    null_frac
    ✗ most_common_vals
    ✗ most_common_freqs
    ✗ histogram_bounds

    View full-size slide

  22. most_common_vals
    {2, 5, 1}
    most_common_freqs
    {0.9, 0.05, 0.01}
    store_id = 1 vs store_id = 2

    View full-size slide

  23. histogram_bounds
    {0, 9, 25, 60, 99}
    qty < 5 vs qty > 5

    View full-size slide

  24. SELECT * FROM products
    WHERE store_id = ?

    View full-size slide

  25. log_min_statement_duration
    duration: 100 ms statement:
    SELECT * FROM products WHERE
    store_id = 1

    View full-size slide

  26. Given a query and
    a set of indexes
    best indexes to use

    View full-size slide

  27. Given a query and
    all possible indexes
    best indexes possible

    View full-size slide

  28. /* Allow a plugin to editorialize on
    the info we obtained from the
    catalogs. Actions might include
    altering the assumed relation size,
    removing an index, or adding a
    hypothetical index to the indexlist. */
    get_relation_info_hook
    604ffd2

    View full-size slide

  29. hypopg
    github.com/dalibo/hypopg

    View full-size slide

  30. SELECT * FROM products
    WHERE store_id = 1
    AND brand_id = 2

    View full-size slide

  31. EXPLAIN
    Seq Scan on products (cost=0.00..1000.00
    rows=100 width=108)
    Filter: (store_id = 1 AND brand_id = 2)
    Final Cost

    View full-size slide

  32. Cost
    Hypothetical
    Indexes
    Original 1000

    View full-size slide

  33. Add hypothetical indexes
    store_id
    brand_id

    View full-size slide

  34. EXPLAIN
    Index Scan using <41072>hypo_btree on
    products (cost=0.28..50.29 rows=1 width=108)
    Index Cond: (brand_id = 2)
    Filter: (store_id = 1) Final Cost
    Index

    View full-size slide

  35. Cost
    Hypothetical
    Indexes
    Original 1000
    Single Column 50 brand_id

    View full-size slide

  36. Add hypothetical indexes
    store_id, brand_id
    brand_id, store_id
    (does not try different sort orders right now)

    View full-size slide

  37. Cost
    Hypothetical
    Indexes
    Original 1000
    Single Column 50 brand_id
    Multi Column 45 brand_id, store_id

    View full-size slide

  38. Dexter
    github.com/ankane/dexter

    View full-size slide

  39. tail -F -n +1 | dexter

    View full-size slide

  40. --create
    --exclude big_table
    --min-time 10

    View full-size slide

  41. Shortcomings

    View full-size slide

  42. SELECT * FROM products
    WHERE a = 1 AND b = 2
    SELECT * FROM products
    WHERE b = 2

    View full-size slide

  43. B-TREE Only
    No Expressions
    No Partial

    View full-size slide

  44. SELECT * FROM products
    WHERE qty = 0

    View full-size slide

  45. DROP INDEX
    Unused indexes

    View full-size slide

  46. HypoPG
    Extension Support

    View full-size slide

  47. pg_query
    HypoPG

    View full-size slide

  48. Get Involved
    github.com/ankane/dexter

    View full-size slide