Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Postgres Performance for Humans (Ancient City Ruby)

Postgres Performance for Humans (Ancient City Ruby)

Craig Kerstiens

April 07, 2014
Tweet

More Decks by Craig Kerstiens

Other Decks in Technology

Transcript

  1. Postgres Performance
    for Humans
    @craigkerstiens

    View full-size slide

  2. Shameless plugs
    http://www.postgresweekly.com!
    http://www.craigkerstiens.com!
    http://www.postgresguide.com!
    http://www.postgresapp.com!
    !
    http://postgres.heroku.com!

    View full-size slide

  3. Postgres
    It might help to explain that the pronunciation is "post-gres" or!
    "post-gres-cue-ell", not "post-gray-something".!
    !
    I heard people making this same mistake in presentations at this!
    past weekend's Postgres Anniversary Conference :-( Arguably,!
    the 1996 decision to call it PostgreSQL instead of reverting to!
    plain Postgres was the single worst mistake this project ever
    made.!
    It seems far too late to change now, though.!
    !
    ! ! ! regards, tom lane!

    View full-size slide

  4. Postgres - TLDR

    View full-size slide

  5. Postgres - TLDR
    Datatypes
    Conditional Indexes
    Transactional DDL
    Foreign Data Wrappers
    Concurrent Index Creation
    Extensions
    Common Table Expressions
    Fast Column Addition
    Listen/Notify
    Table Inheritance
    Per Transaction sync replication
    Window functions
    NoSQL inside SQL
    Momentum

    View full-size slide

  6. TLDR in a quote
    http://www.craigkerstiens.com/2012/04/30/why-postgres/
    “Its the emacs of databases”

    View full-size slide

  7. OLTP vs OLAP

    View full-size slide

  8. OLTP vs OLAP
    Web apps

    View full-size slide

  9. OLTP vs OLAP
    BI/Reporting

    View full-size slide

  10. Postgres Setup/Config
    On Amazon
    Use Heroku OR ‘postgresql when its not your dayjob’
    Other clouds
    ‘postgresql when its not your dayjob’
    Real hardware
    High performance PostgreSQL
    !
    http://thebuild.com/blog/2012/06/04/postgresql-when-its-not-your-job-at-djangocon-europe/

    View full-size slide

  11. Cache rules
    everything around me

    View full-size slide

  12. Cache Hit Rate
    SELECT
    'index hit rate' as name,
    (sum(idx_blks_hit) - sum(idx_blks_read)) /
    sum(idx_blks_hit + idx_blks_read) as ratio
    FROM pg_statio_user_indexes
    union all
    SELECT
    'cache hit rate' as name,
    case sum(idx_blks_hit)
    when 0 then 'NaN'::numeric
    else to_char((sum(idx_blks_hit) -
    sum(idx_blks_read)) / sum(idx_blks_hit + idx_blks_read),
    '99.99')::numeric
    end as ratio
    FROM pg_statio_user_indexes)

    View full-size slide

  13. Cache Hit Rate
    name | ratio
    ----------------+------------------------
    cache hit rate | 0.99

    View full-size slide

  14. Index Hit Rate
    SELECT
    relname,
    100 * idx_scan / (seq_scan + idx_scan),
    n_live_tup
    FROM pg_stat_user_tables
    ORDER BY n_live_tup DESC;
    !
    !
    !

    View full-size slide

  15. Index Hit Rate
    relname | percent_of_times_index_used | rows_in_table
    ---------------------+-----------------------------+---------------
    events | 0 | 669917
    app_infos_user_info | 0 | 198218
    app_infos | 50 | 175640
    user_info | 3 | 46718
    rollouts | 0 | 34078
    favorites | 0 | 3059
    schema_migrations | 0 | 2
    authorizations | 0 | 0
    delayed_jobs | 23 | 0

    View full-size slide

  16. Rough guidelines
    Cache hit rate >= 99%
    !
    Index hit rate >= 95%
    where on > 10,000 rows

    View full-size slide

  17. $ cat ~/.psqlrc
    !
    \set ON_ERROR_ROLLBACK interactive
    !
    -- automatically switch between extended and normal
    \x auto
    !
    -- always show how long a query takes
    \timing
    !
    \set show_slow_queries
    'SELECT
    (total_time / 1000 / 60) as total_minutes,
    (total_time/calls) as average_time, query
    FROM pg_stat_statements
    ORDER BY 1 DESC
    LIMIT 100;'
    !
    psql

    View full-size slide

  18. $ cat ~/.psqlrc
    !
    \set ON_ERROR_ROLLBACK interactive
    !
    -- automatically switch between extended and normal
    \x auto
    !
    -- always show how long a query takes
    \timing
    !
    \set show_slow_queries
    'SELECT
    (total_time / 1000 / 60) as total_minutes,
    (total_time/calls) as average_time, query
    FROM pg_stat_statements
    ORDER BY 1 DESC
    LIMIT 100;'
    !
    psql

    View full-size slide

  19. How Data is
    Retrieved

    View full-size slide

  20. Sequential Scanning
    Record 1 Record 2 Record 3 Record 4 Record 5 Record …
    Record 1 Record 2 Record 3 Record 4 Record 5 Record …

    View full-size slide

  21. Index Scans
    A-F G-L M-R S-Z
    G H I J K L
    Record 57
    Record … Record …

    View full-size slide

  22. Sequential Scans
    Good for large reports
    !
    Computing over lots of
    data (1k + rows)
    Index Scans
    Good for small results
    !
    Most common queries in
    your app

    View full-size slide

  23. Understanding
    Specific Query
    Performance

    View full-size slide

  24. Understanding Query Performance
    !
    SELECT last_name
    FROM employees
    WHERE salary >= 50000;

    View full-size slide

  25. Explain
    # EXPLAIN
    SELECT last_name
    FROM employees
    WHERE salary >= 50000;
    !
    QUERY PLAN
    --------------------------------------------------
    Seq Scan on employees (cost=0.00..35811.00 rows=1
    width=6)
    Filter: (salary >= 50000)
    (3 rows)

    View full-size slide

  26. Explain
    # EXPLAIN
    SELECT last_name
    FROM employees
    WHERE salary >= 50000;
    QUERY PLAN
    --------------------------------------------------
    Seq Scan on employees
    width=6)
    Filter: (salary >= 50000)
    (3 rows)
    startup time max time rows return
    (cost=0.00..35811.00 rows=1

    View full-size slide

  27. Explain Analyze
    # EXPLAIN ANALYZE
    SELECT last_name
    FROM employees
    WHERE salary >= 50000;
    QUERY PLAN
    --------------------------------------------------
    Seq Scan on employees (cost=0.00..35811.00 rows=1
    width=6) (actual time=2.401..295.247 rows=1428
    loops=1)
    Filter: (salary >= 50000)
    Total runtime: 295.379
    (3 rows)
    !
    Filter: (salary >= 50000)
    (3 rows)
    startup time max time rows return
    actual time
    2.401..295.247 rows=1428
    295.379

    View full-size slide

  28. Rough guidelines
    Page response times < 100 ms
    Common queries < 10ms
    Rare queries < 100ms

    View full-size slide

  29. Explain Analyze
    # EXPLAIN ANALYZE
    SELECT last_name
    FROM employees
    WHERE salary >= 50000;
    QUERY PLAN
    --------------------------------------------------
    Seq Scan on employees (cost=0.00..35811.00 rows=1
    width=6) (actual time=2.401..295.247 rows=1428
    loops=1)
    Filter: (salary >= 50000)
    Total runtime: 295.379
    (3 rows)
    !
    Filter: (salary >= 50000)
    (3 rows)
    startup time max time rows return
    actual time
    2.401..295.247 rows=1428
    295.379

    View full-size slide

  30. # CREATE INDEX idx_emps ON employees (salary);
    Indexes!

    View full-size slide

  31. Indexes!
    EXPLAIN ANALYZE
    SELECT last_name
    FROM employees
    WHERE salary >= 50000;
    QUERY PLAN
    --------------------------------------------------
    Index Scan using idx_emps on employees
    (cost=0.00..8.49 rows=1 width=6) (actual time =
    0.047..1.603 rows=1428 loops=1)
    Index Cond: (salary >= 50000)
    Total runtime: 1.771 ms
    (3 rows)

    View full-size slide

  32. pg_stat_statements
    $ select * from pg_stat_statements where query ~ 'from users where email';

    !
    !
    userid │ 16384
    dbid │ 16388
    query │ select * from users where email = ?;
    calls │ 2
    total_time │ 0.000268
    rows │ 2
    shared_blks_hit │ 16
    shared_blks_read │ 0
    shared_blks_dirtied │ 0
    shared_blks_written │ 0
    local_blks_hit │ 0
    local_blks_read │ 0
    local_blks_dirtied │ 0
    local_blks_written │ 0
    ...

    View full-size slide

  33. SELECT
    (total_time / 1000 / 60) as total,
    (total_time/calls) as avg,
    query
    FROM pg_stat_statements
    ORDER BY 1 DESC
    LIMIT 100;
    pg_stat_statements

    View full-size slide

  34. total | avg | query
    --------+--------+-------------------------
    295.76 | 10.13 | SELECT id FROM users...
    219.13 | 80.24 | SELECT * FROM ...
    (2 rows)
    !
    pg_stat_statements

    View full-size slide

  35. Indexes
    B-Tree
    Generalized Inverted Index (GIN)
    Generalized Search Tree (GIST)
    K Nearest Neighbors (KNN)
    Space Partitioned GIST (SP-GIST)

    View full-size slide

  36. Indexes
    Which do I use?

    View full-size slide

  37. BTree
    !
    !
    !
    This is what you usually want

    View full-size slide

  38. Generalized Inverted Index (GIN)
    !
    !
    !
    Use with multiple values in 1 column
    Array/hStore

    View full-size slide

  39. Generalized Search Tree (GIST)
    !
    !
    !
    Full text search
    Shapes

    View full-size slide

  40. Indexes
    B-Tree
    Generalized Inverted Index (GIN)
    Generalized Search Tree (GIST)
    K Nearest Neighbors (KNN)
    Space Partitioned GIST (SP-GIST)
    VODKA (Coming soon)

    View full-size slide

  41. More indexes

    View full-size slide

  42. Indexes
    Conditional
    Functional
    Concurrent creation

    View full-size slide

  43. Conditional
    > SELECT *
    FROM places;
    !
    name | population
    -----------------------------------
    ACMAR | 6055
    ARAB | 13650

    View full-size slide

  44. Conditional
    > SELECT *
    FROM places
    WHERE population > 10000;
    !
    name | population
    -----------------------------------
    ARAB | 13650

    View full-size slide

  45. Conditional
    > CREATE INDEX idx_large_population ON
    places(name) where population > 10000;
    !

    View full-size slide

  46. Functional
    > SELECT *
    FROM places;
    !
    data
    -----------------------------------
    {"city": "ACMAR", "pop": 6055}!
    {"city": "ARAB", "pop": 13650}!

    View full-size slide

  47. > SELECT *
    FROM places
    WHERE get_numeric('pop', data) > 10000;
    !
    data
    -----------------------------------
    {"city": "ARAB", "pop": 13650}
    Functional

    View full-size slide

  48. > CREATE INDEX idx_large_population ON
    places(get_numeric('pop', data));
    !
    Functional

    View full-size slide

  49. Conditional and Functional
    > CREATE INDEX idx_large_population ON
    places(data) WHERE
    get_numeric('pop', data) > 10000;
    !

    View full-size slide

  50. CREATE INDEX CONCURRENTLY ...
    !
    roughly 2-3x slower
    Doesn’t lock table
    One more thing

    View full-size slide

  51. hstore / JSON

    View full-size slide

  52. hstore
    CREATE EXTENSION hstore;
    CREATE TABLE users (
    id integer NOT NULL,
    email character varying(255),
    data hstore,
    created_at timestamp without time zone,
    last_login timestamp without time zone
    );

    View full-size slide

  53. hstore
    INSERT INTO users
    VALUES (
    1,
    '[email protected]',
    'sex => "M", state => “California”',
    now(),
    now()
    );
    !

    View full-size slide

  54. SELECT
    '{"id":1,"email":
    "[email protected]",}'::json;
    JSON

    View full-size slide

  55. hstore
    Indexes work
    gin
    gist

    View full-size slide

  56. json
    Functional indexes work
    have fun

    View full-size slide

  57. jsonb
    The world is better

    View full-size slide

  58. !
    Application/Framework layer
    !
    Stand alone daemon
    Options

    View full-size slide

  59. pgbouncer
    pgpool
    PG options

    View full-size slide

  60. Adding Cache

    View full-size slide

  61. Replication options
    slony
    londiste
    bucardo
    pgpool
    !
    !
    wal-e
    barman

    View full-size slide

  62. Replication options
    slony
    londiste
    bucardo
    pgpool
    !
    !
    wal-e
    barman

    View full-size slide

  63. Logical
    pg_dump
    !
    can be human readable, is portable

    View full-size slide

  64. Physical
    The bytes on disk
    !
    Base backup

    View full-size slide

  65. Logical
    Good across architectures
    Good for portability
    !
    Has load on DB
    !
    Works < 50 GB
    Physical
    More initial setup
    Less portability
    !
    Limited load on system
    !
    Use above 50 GB

    View full-size slide

  66. OLAP
    !
    Whole other talk
    Disk IO is important
    Order on disk is helpful (pg-reorg)
    MPP solutions on top of Postgres
    Recap

    View full-size slide

  67. OLTP (webapps)
    Ensure bulk of data is cache
    Optimize overall query load with pg_stat_statements
    Efficient use of indexes
    When cache sucks, throw more at it
    Recap

    View full-size slide

  68. Questions
    http://www.speakerdeck.com/u/
    craigkerstiens/!

    View full-size slide