Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Postgres at any scale

Postgres at any scale

Craig Kerstiens

January 18, 2018
Tweet

More Decks by Craig Kerstiens

Other Decks in Technology

Transcript

  1. Postgres at any scale
    Craig Kerstiens @ Citus Data

    View Slide

  2. Who am I?
    • Craig Kerstiens
    • Run fully managed database-as-a-service
    • Curate Postgres Weekly
    • Blog at craigkerstiens.com
    • Previously Heroku, Truviso, Accenture

    View Slide

  3. Itty bitty
    little data

    View Slide

  4. Database is just
    a hash

    View Slide

  5. Postgres
    • Datatypes
    • GIS
    • Index types
    • Full text search
    • JSONB

    View Slide

  6. Datatypes
    • Varchar(255)
    • Timestamp vs. Timestamptz
    • JSON vs. JSONB

    View Slide

  7. Constraints
    • Django/SQL Alchemy good by default
    • Is this string really under 255 characters?
    • Is the IP address really an IP address?
    • Does the foreign key exist?
    • What about if you delete a user, do you clean up the
    rest of the records?

    View Slide

  8. Backups
    Backups let you
    1. recover data, but also
    2. verify quality of your data
    Backups don’t exist unless you test them

    View Slide

  9. Medium

    View Slide

  10. Configure your system
    • Setup your logs
    • Configure your memory
    • Tweak vacuum
    • Adjust checkpoints

    View Slide

  11. Don’t learn it all
    http://pgtune.leopard.in.ua/

    View Slide

  12. Cache rules
    everything around me

    View Slide

  13. Cache hit rate
    SELECT
    'index hit rate' as name,
    (sum(idx_blks_hit) - sum(idx_blks_read)) /
    sum(idx_blks_hit + idx_blks_read) as ratio
    FROM pg_statio_user_indexes
    union all
    SELECT
    'cache hit rate' as name,
    case sum(idx_blks_hit)
    when 0 then 'NaN'::numeric
    else to_char((sum(idx_blks_hit) -
    sum(idx_blks_read)) / sum(idx_blks_hit +
    idx_blks_read), '99.99')::numeric
    end as ratio
    FROM pg_statio_user_indexes)

    View Slide

  14. Cache hit rate
    name | ratio
    ----------------+------------------------
    cache hit rate | 0.99

    View Slide

  15. Index hit rate
    SELECT
    relname,
    100 * idx_scan / (seq_scan + idx_scan),
    n_live_tup
    FROM pg_stat_user_tables
    ORDER BY n_live_tup DESC;

    View Slide

  16. Index hit rate
    relname | percent_of_times_index_used | rows_in_table
    ---------------------+-----------------------------+---------------
    events | 0 | 669917
    app_infos_user_info | 0 | 198218
    app_infos | 50 | 175640
    user_info | 3 | 46718
    rollouts | 0 | 34078
    favorites | 0 | 3059

    View Slide

  17. Rough guidelines
    Cache hit rate >= 99%
    Index hit rate >= 95%
    where on > 10,000 rows

    View Slide

  18. Understanding Specific
    Query Performance

    View Slide

  19. Understanding Specific
    Query Performance
    SELECT last_name
    FROM employees
    WHERE salary >= 50000;

    View Slide

  20. Understanding Specific
    Query Performance
    EXPLAIN
    SELECT last_name
    FROM employees
    WHERE salary >= 50000;
    QUERY PLAN
    --------------------------------------------
    ------
    Seq Scan on employees (cost=0.00..35811.00
    rows=1 width=6)
    Filter: (salary >= 50000)
    (3 rows)

    View Slide

  21. Understanding Specific
    Query Performance
    EXPLAIN ANALYZE
    SELECT last_name
    FROM employees
    WHERE salary >= 50000;
    QUERY PLAN
    --------------------------------------------------
    Seq Scan on employees (cost=0.00..35811.00 rows=1
    width=6) (actual time=2.401..295.247 rows=1428
    loops=1)
    Filter: (salary >= 50000)
    Total runtime: 295.379
    (3 rows)

    View Slide

  22. Rough guidelines
    Page response times < 100 ms
    Common queries < 10ms
    Rare queries < 100ms

    View Slide

  23. Understanding Specific
    Query Performance
    EXPLAIN ANALYZE
    SELECT last_name
    FROM employees
    WHERE salary >= 50000;
    QUERY PLAN
    --------------------------------------------------
    Seq Scan on employees (cost=0.00..35811.00 rows=1
    width=6) (actual time=2.401..295.247 rows=1428
    loops=1)
    Filter: (salary >= 50000)
    Total runtime: 295.379
    (3 rows)

    View Slide

  24. Indexes
    CREATE INDEX idx_emps ON employees (salary);

    View Slide

  25. Indexes
    EXPLAIN ANALYZE
    SELECT last_name
    FROM employees
    WHERE salary >= 50000;
    QUERY PLAN
    --------------------------------------------------
    Index Scan using idx_emps on employees
    (cost=0.00..8.49 rows=1 width=6) (actual time =
    0.047..1.603 rows=1428 loops=1)
    Index Cond: (salary >= 50000)
    Total runtime: 1.771 ms
    (3 rows)

    View Slide

  26. pg_stat_statements
    $ select * from pg_stat_statements where query ~ 'from users
    where email';
    userid │ 16384
    dbid │ 16388
    query │ select * from users where email = ?;
    calls │ 2
    total_time │ 0.000268
    rows │ 2
    shared_blks_hit │ 16
    shared_blks_read │ 0
    shared_blks_dirtied │ 0
    shared_blks_written │ 0
    local_blks_hit │ 0
    local_blks_read │ 0
    local_blks_dirtied │ 0
    local_blks_written │ 0
    ...

    View Slide

  27. pg_stat_statements
    SELECT
    (total_time / 1000 / 60) as total,
    (total_time/calls) as avg,
    query
    FROM pg_stat_statements
    ORDER BY 1 DESC
    LIMIT 100;

    View Slide

  28. pg_stat_statements
    total | avg | query
    --------+--------+-------------------------
    295.76 | 10.13 | SELECT id FROM users...
    219.13 | 80.24 | SELECT * FROM ...
    (2 rows)

    View Slide

  29. Indexes

    View Slide

  30. Indexes
    B-Tree
    Generalized Inverted Index (GIN)
    Generalized Search Tree (GIST)
    K Nearest Neighbors (KNN)
    Space Partitioned GIST (SP-GIST)
    Block Range Index (BRIN)

    View Slide

  31. Which do you use?

    View Slide

  32. BTree
    You usually want this

    View Slide

  33. Generalized Inverted Index (GIN)
    Use with multiple values in 1 column
    Array/hStore

    View Slide

  34. Generalized Search Tree (GIST)
    Full text search
    Shapes

    View Slide

  35. Indexes
    B-Tree
    Generalized Inverted Index (GIN)
    Generalized Search Tree (GIST)
    K Nearest Neighbors (KNN)
    Space Partitioned GIST (SP-GIST)
    Block Range Index (BRIN)

    View Slide

  36. Other index tips

    View Slide

  37. Be specific
    • Composite indexes
    • Functional indexes
    • Conditional indexes

    View Slide

  38. Faster writes?
    Faster reads?

    View Slide

  39. Every write you make
    1. Figure out query plan
    2. Write to disk
    3. Wait for ack
    4. Return

    View Slide

  40. Every write you make
    1. Figure out query plan
    2. Write to disk
    3. Wait for ack
    4. Update the index
    5. Return

    View Slide

  41. Every write you make
    1. Figure out query plan
    2. Write to disk
    3. Wait for ack
    4. Update the index
    5. Update the other index
    6. Return

    View Slide

  42. Every write you make
    1. Figure out query plan
    2. Write to disk
    3. Wait for ack
    4. Update the index
    5. Update the other index
    6. Update the other other index
    7. Return

    View Slide

  43. Unused indexes?
    SELECT
    schemaname || '.' || relname AS table,
    indexrelname AS index,
    pg_size_pretty(pg_relation_size(i.indexrelid))
    AS index_size,
    idx_scan as index_scans
    FROM pg_stat_user_indexes ui
    JOIN pg_index i ON ui.indexrelid = i.indexrelid
    WHERE NOT indisunique AND idx_scan < 50 AND
    pg_relation_size(relid) > 5 * 8192
    ORDER BY pg_relation_size(i.indexrelid) /
    nullif(idx_scan, 0) DESC NULLS FIRST,
    pg_relation_size(i.indexrelid) DESC;

    View Slide

  44. Unused indexes
    table | index | index_size | index_scans
    -------------------------+-----------------------------+------------+-------------
    public.formations | index_formations_on_user_id | 40 Gb | 30
    public.resources | slack_pool_created_at_idx | 10 Gb | 40
    (2 rows)

    View Slide

  45. Migrations
    not null is good
    except in a migration

    View Slide

  46. Proper migrations
    1 becomes 3:
    1. Allow nulls, set a default value though
    2. Gradually backfill old data
    3. Add your constraint

    View Slide

  47. Large scale

    View Slide

  48. Index creation
    CREATE INDEX CONCURRENTLY ...
    roughly 2-3x slower
    Doesn’t lock table

    View Slide

  49. Connection pooling
    Application/Framework layer
    Stand alone daemon

    View Slide

  50. Pooler options
    pgBouncer
    pgpool

    View Slide

  51. Scaling cache

    View Slide

  52. Scaling cache

    View Slide

  53. Offload read traffic
    to a replica

    View Slide

  54. Replication
    slony
    londiste
    bucardo
    pgpool
    wal-e
    barman

    View Slide

  55. Replication
    slony
    londiste
    bucardo
    pgpool
    wal-e/wal-g
    barman

    View Slide

  56. Sharding
    • Split up large tables
    • Split up data by customer
    • 1 database per customer
    • 1 schema per customer
    • Shard within your application

    View Slide

  57. Sharding
    • Scalable options
    • Split out large tables
    • Multi-tenant application level sharding

    View Slide

  58. Citus

    View Slide

  59. Citus

    View Slide

  60. Logical backup
    pg_dump
    can be human readable, is portable

    View Slide

  61. Physical backup
    The bytes on disk
    Base backup

    View Slide

  62. Logical
    Good across architectures
    Good for portability
    Has load on DB
    Works < 50 GB
    More initial setup
    Less portability
    Limited load on system
    Use above 50 GB
    Physical

    View Slide

  63. Recap

    View Slide

  64. Small
    • Leverage datatypes, it’s not a dumb hash
    • Test your backups
    • Take time to

    View Slide

  65. Medium
    • Tune Postgres well
    • Watch cache hit ratio
    • Be generous with indexes

    View Slide

  66. Large
    • Move away from pg_dump
    • Setup connection pooling
    • If you need to shard, invest in the right way or use a
    tool to help

    View Slide

  67. Thanks!
    Questions?

    View Slide