Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Postgres Performance for Humans (Ancient City Ruby)

Postgres Performance for Humans (Ancient City Ruby)

Craig Kerstiens

April 07, 2014
Tweet

More Decks by Craig Kerstiens

Other Decks in Technology

Transcript

  1. Postgres Performance
    for Humans
    @craigkerstiens

    View Slide

  2. Shameless plugs
    http://www.postgresweekly.com!
    http://www.craigkerstiens.com!
    http://www.postgresguide.com!
    http://www.postgresapp.com!
    !
    http://postgres.heroku.com!

    View Slide

  3. Postgres
    It might help to explain that the pronunciation is "post-gres" or!
    "post-gres-cue-ell", not "post-gray-something".!
    !
    I heard people making this same mistake in presentations at this!
    past weekend's Postgres Anniversary Conference :-( Arguably,!
    the 1996 decision to call it PostgreSQL instead of reverting to!
    plain Postgres was the single worst mistake this project ever
    made.!
    It seems far too late to change now, though.!
    !
    ! ! ! regards, tom lane!

    View Slide

  4. Postgres - TLDR

    View Slide

  5. Postgres - TLDR
    Datatypes
    Conditional Indexes
    Transactional DDL
    Foreign Data Wrappers
    Concurrent Index Creation
    Extensions
    Common Table Expressions
    Fast Column Addition
    Listen/Notify
    Table Inheritance
    Per Transaction sync replication
    Window functions
    NoSQL inside SQL
    Momentum

    View Slide

  6. TLDR in a quote
    http://www.craigkerstiens.com/2012/04/30/why-postgres/
    “Its the emacs of databases”

    View Slide

  7. OLTP vs OLAP

    View Slide

  8. OLTP vs OLAP
    Web apps

    View Slide

  9. OLTP vs OLAP
    BI/Reporting

    View Slide

  10. Postgres Setup/Config
    On Amazon
    Use Heroku OR ‘postgresql when its not your dayjob’
    Other clouds
    ‘postgresql when its not your dayjob’
    Real hardware
    High performance PostgreSQL
    !
    http://thebuild.com/blog/2012/06/04/postgresql-when-its-not-your-job-at-djangocon-europe/

    View Slide

  11. View Slide

  12. Cache rules
    everything around me

    View Slide

  13. 80/20 rule

    View Slide

  14. Cache Hit Rate
    SELECT
    'index hit rate' as name,
    (sum(idx_blks_hit) - sum(idx_blks_read)) /
    sum(idx_blks_hit + idx_blks_read) as ratio
    FROM pg_statio_user_indexes
    union all
    SELECT
    'cache hit rate' as name,
    case sum(idx_blks_hit)
    when 0 then 'NaN'::numeric
    else to_char((sum(idx_blks_hit) -
    sum(idx_blks_read)) / sum(idx_blks_hit + idx_blks_read),
    '99.99')::numeric
    end as ratio
    FROM pg_statio_user_indexes)

    View Slide

  15. Cache Hit Rate
    name | ratio
    ----------------+------------------------
    cache hit rate | 0.99

    View Slide

  16. Index Hit Rate
    SELECT
    relname,
    100 * idx_scan / (seq_scan + idx_scan),
    n_live_tup
    FROM pg_stat_user_tables
    ORDER BY n_live_tup DESC;
    !
    !
    !

    View Slide

  17. Index Hit Rate
    relname | percent_of_times_index_used | rows_in_table
    ---------------------+-----------------------------+---------------
    events | 0 | 669917
    app_infos_user_info | 0 | 198218
    app_infos | 50 | 175640
    user_info | 3 | 46718
    rollouts | 0 | 34078
    favorites | 0 | 3059
    schema_migrations | 0 | 2
    authorizations | 0 | 0
    delayed_jobs | 23 | 0

    View Slide

  18. Rough guidelines
    Cache hit rate >= 99%
    !
    Index hit rate >= 95%
    where on > 10,000 rows

    View Slide

  19. Shortcuts

    View Slide

  20. $ cat ~/.psqlrc
    !
    \set ON_ERROR_ROLLBACK interactive
    !
    -- automatically switch between extended and normal
    \x auto
    !
    -- always show how long a query takes
    \timing
    !
    \set show_slow_queries
    'SELECT
    (total_time / 1000 / 60) as total_minutes,
    (total_time/calls) as average_time, query
    FROM pg_stat_statements
    ORDER BY 1 DESC
    LIMIT 100;'
    !
    psql

    View Slide

  21. $ cat ~/.psqlrc
    !
    \set ON_ERROR_ROLLBACK interactive
    !
    -- automatically switch between extended and normal
    \x auto
    !
    -- always show how long a query takes
    \timing
    !
    \set show_slow_queries
    'SELECT
    (total_time / 1000 / 60) as total_minutes,
    (total_time/calls) as average_time, query
    FROM pg_stat_statements
    ORDER BY 1 DESC
    LIMIT 100;'
    !
    psql

    View Slide

  22. How Data is
    Retrieved

    View Slide

  23. Sequential Scanning
    Record 1 Record 2 Record 3 Record 4 Record 5 Record …
    Record 1 Record 2 Record 3 Record 4 Record 5 Record …

    View Slide

  24. Index Scans
    A-F G-L M-R S-Z
    G H I J K L
    Record 57
    Record … Record …

    View Slide

  25. Sequential Scans
    Good for large reports
    !
    Computing over lots of
    data (1k + rows)
    Index Scans
    Good for small results
    !
    Most common queries in
    your app

    View Slide

  26. Understanding
    Specific Query
    Performance

    View Slide

  27. Understanding Query Performance
    !
    SELECT last_name
    FROM employees
    WHERE salary >= 50000;

    View Slide

  28. Explain
    # EXPLAIN
    SELECT last_name
    FROM employees
    WHERE salary >= 50000;
    !
    QUERY PLAN
    --------------------------------------------------
    Seq Scan on employees (cost=0.00..35811.00 rows=1
    width=6)
    Filter: (salary >= 50000)
    (3 rows)

    View Slide

  29. Explain
    # EXPLAIN
    SELECT last_name
    FROM employees
    WHERE salary >= 50000;
    QUERY PLAN
    --------------------------------------------------
    Seq Scan on employees
    width=6)
    Filter: (salary >= 50000)
    (3 rows)
    startup time max time rows return
    (cost=0.00..35811.00 rows=1

    View Slide

  30. Explain Analyze
    # EXPLAIN ANALYZE
    SELECT last_name
    FROM employees
    WHERE salary >= 50000;
    QUERY PLAN
    --------------------------------------------------
    Seq Scan on employees (cost=0.00..35811.00 rows=1
    width=6) (actual time=2.401..295.247 rows=1428
    loops=1)
    Filter: (salary >= 50000)
    Total runtime: 295.379
    (3 rows)
    !
    Filter: (salary >= 50000)
    (3 rows)
    startup time max time rows return
    actual time
    2.401..295.247 rows=1428
    295.379

    View Slide

  31. Rough guidelines
    Page response times < 100 ms
    Common queries < 10ms
    Rare queries < 100ms

    View Slide

  32. Explain Analyze
    # EXPLAIN ANALYZE
    SELECT last_name
    FROM employees
    WHERE salary >= 50000;
    QUERY PLAN
    --------------------------------------------------
    Seq Scan on employees (cost=0.00..35811.00 rows=1
    width=6) (actual time=2.401..295.247 rows=1428
    loops=1)
    Filter: (salary >= 50000)
    Total runtime: 295.379
    (3 rows)
    !
    Filter: (salary >= 50000)
    (3 rows)
    startup time max time rows return
    actual time
    2.401..295.247 rows=1428
    295.379

    View Slide

  33. # CREATE INDEX idx_emps ON employees (salary);
    Indexes!

    View Slide

  34. Indexes!
    EXPLAIN ANALYZE
    SELECT last_name
    FROM employees
    WHERE salary >= 50000;
    QUERY PLAN
    --------------------------------------------------
    Index Scan using idx_emps on employees
    (cost=0.00..8.49 rows=1 width=6) (actual time =
    0.047..1.603 rows=1428 loops=1)
    Index Cond: (salary >= 50000)
    Total runtime: 1.771 ms
    (3 rows)

    View Slide

  35. pg_stat_statements
    $ select * from pg_stat_statements where query ~ 'from users where email';

    !
    !
    userid │ 16384
    dbid │ 16388
    query │ select * from users where email = ?;
    calls │ 2
    total_time │ 0.000268
    rows │ 2
    shared_blks_hit │ 16
    shared_blks_read │ 0
    shared_blks_dirtied │ 0
    shared_blks_written │ 0
    local_blks_hit │ 0
    local_blks_read │ 0
    local_blks_dirtied │ 0
    local_blks_written │ 0
    ...

    View Slide

  36. SELECT
    (total_time / 1000 / 60) as total,
    (total_time/calls) as avg,
    query
    FROM pg_stat_statements
    ORDER BY 1 DESC
    LIMIT 100;
    pg_stat_statements

    View Slide

  37. total | avg | query
    --------+--------+-------------------------
    295.76 | 10.13 | SELECT id FROM users...
    219.13 | 80.24 | SELECT * FROM ...
    (2 rows)
    !
    pg_stat_statements

    View Slide

  38. Indexes

    View Slide

  39. Indexes
    B-Tree
    Generalized Inverted Index (GIN)
    Generalized Search Tree (GIST)
    K Nearest Neighbors (KNN)
    Space Partitioned GIST (SP-GIST)

    View Slide

  40. Indexes
    Which do I use?

    View Slide

  41. BTree
    !
    !
    !
    This is what you usually want

    View Slide

  42. Generalized Inverted Index (GIN)
    !
    !
    !
    Use with multiple values in 1 column
    Array/hStore

    View Slide

  43. Generalized Search Tree (GIST)
    !
    !
    !
    Full text search
    Shapes

    View Slide

  44. Indexes
    B-Tree
    Generalized Inverted Index (GIN)
    Generalized Search Tree (GIST)
    K Nearest Neighbors (KNN)
    Space Partitioned GIST (SP-GIST)
    VODKA (Coming soon)

    View Slide

  45. More indexes

    View Slide

  46. Indexes
    Conditional
    Functional
    Concurrent creation

    View Slide

  47. Conditional
    > SELECT *
    FROM places;
    !
    name | population
    -----------------------------------
    ACMAR | 6055
    ARAB | 13650

    View Slide

  48. Conditional
    > SELECT *
    FROM places
    WHERE population > 10000;
    !
    name | population
    -----------------------------------
    ARAB | 13650

    View Slide

  49. Conditional
    > CREATE INDEX idx_large_population ON
    places(name) where population > 10000;
    !

    View Slide

  50. Functional
    > SELECT *
    FROM places;
    !
    data
    -----------------------------------
    {"city": "ACMAR", "pop": 6055}!
    {"city": "ARAB", "pop": 13650}!

    View Slide

  51. > SELECT *
    FROM places
    WHERE get_numeric('pop', data) > 10000;
    !
    data
    -----------------------------------
    {"city": "ARAB", "pop": 13650}
    Functional

    View Slide

  52. > CREATE INDEX idx_large_population ON
    places(get_numeric('pop', data));
    !
    Functional

    View Slide

  53. Conditional and Functional
    > CREATE INDEX idx_large_population ON
    places(data) WHERE
    get_numeric('pop', data) > 10000;
    !

    View Slide

  54. CREATE INDEX CONCURRENTLY ...
    !
    roughly 2-3x slower
    Doesn’t lock table
    One more thing

    View Slide

  55. hstore / JSON

    View Slide

  56. hstore
    CREATE EXTENSION hstore;
    CREATE TABLE users (
    id integer NOT NULL,
    email character varying(255),
    data hstore,
    created_at timestamp without time zone,
    last_login timestamp without time zone
    );

    View Slide

  57. hstore
    INSERT INTO users
    VALUES (
    1,
    '[email protected]',
    'sex => "M", state => “California”',
    now(),
    now()
    );
    !

    View Slide

  58. SELECT
    '{"id":1,"email":
    "[email protected]",}'::json;
    JSON

    View Slide

  59. hstore
    Indexes work
    gin
    gist

    View Slide

  60. json
    Functional indexes work
    have fun

    View Slide

  61. jsonb
    The world is better

    View Slide

  62. Pooling

    View Slide

  63. !
    Application/Framework layer
    !
    Stand alone daemon
    Options

    View Slide

  64. pgbouncer
    pgpool
    PG options

    View Slide

  65. Adding Cache

    View Slide

  66. Replication options
    slony
    londiste
    bucardo
    pgpool
    !
    !
    wal-e
    barman

    View Slide

  67. Replication options
    slony
    londiste
    bucardo
    pgpool
    !
    !
    wal-e
    barman

    View Slide

  68. Backups

    View Slide

  69. Logical
    pg_dump
    !
    can be human readable, is portable

    View Slide

  70. Physical
    The bytes on disk
    !
    Base backup

    View Slide

  71. Logical
    Good across architectures
    Good for portability
    !
    Has load on DB
    !
    Works < 50 GB
    Physical
    More initial setup
    Less portability
    !
    Limited load on system
    !
    Use above 50 GB

    View Slide

  72. Recap

    View Slide

  73. OLAP
    !
    Whole other talk
    Disk IO is important
    Order on disk is helpful (pg-reorg)
    MPP solutions on top of Postgres
    Recap

    View Slide

  74. OLTP (webapps)
    Ensure bulk of data is cache
    Optimize overall query load with pg_stat_statements
    Efficient use of indexes
    When cache sucks, throw more at it
    Recap

    View Slide

  75. Questions
    http://www.speakerdeck.com/u/
    craigkerstiens/!

    View Slide