Postgres at any scale | PyCon CA | Craig Kerstiens

Citus Data
November 18, 2017

We'll start with the basics you need to know as an app developer about interacting with your database, then dig into how to analyze performance. We'll cover what you need to know for a small application, then the things to be cautious of as you scale and performance starts to matter.


Transcript

  1. Postgres at any scale
    Craig Kerstiens @ Citus Data

  2. Who am I?
    • Craig Kerstiens
    • Run fully managed database-as-a-service
    • Curate Postgres Weekly
    • Blog at craigkerstiens.com
    • Previously Heroku, Truviso, Accenture

  3. Itty bitty
    little data

  4. Database is just
    a hash

  5. Why Postgres?

  6. “It’s the emacs of databases”
    @craigkerstiens

  7. Postgres
    • Datatypes
    • GIS
    • Index types
    • Full text search
    • JSONB

  8. Datatypes
    • Varchar(255)
    • Timestamp vs. Timestamptz
    • JSON vs. JSONB
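As a sketch of how these choices look in a table definition (table and column names here are hypothetical):

```sql
CREATE TABLE events (
    id         bigserial PRIMARY KEY,
    -- text has no arbitrary cap; add a CHECK constraint if you truly need a limit
    name       text,
    -- timestamptz records an absolute point in time; plain timestamp silently drops the zone
    created_at timestamptz NOT NULL DEFAULT now(),
    -- jsonb is parsed and indexable; json stores the raw text verbatim
    payload    jsonb
);
```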

  9. Constraints
    • Django/SQLAlchemy are good by default
    • Is this string really under 255 characters?
    • Is the IP address really an IP address?
    • Does the foreign key exist?
    • If you delete a user, do you clean up the rest of
    their records?
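A hypothetical schema showing how Postgres can enforce each of these checks itself:

```sql
CREATE TABLE users (
    id    bigserial PRIMARY KEY,
    email text NOT NULL CHECK (char_length(email) <= 255),
    ip    inet  -- inet rejects values that aren't valid IP addresses
);

CREATE TABLE api_keys (
    id      bigserial PRIMARY KEY,
    -- the referenced user must exist, and deleting a user removes their keys
    user_id bigint NOT NULL REFERENCES users (id) ON DELETE CASCADE
);
```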

  10. Master your tools
    psql is great, leverage it
    psqlrc
    (\x auto, \timing, saving history)
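A minimal ~/.psqlrc covering those settings (one possible setup, adjust to taste):

```
-- expanded display only when output is too wide for the terminal
\x auto
-- report how long each query took
\timing on
-- keep a separate, larger history file per database
\set HISTFILE ~/.psql_history- :DBNAME
\set HISTSIZE 10000
```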

  11. Backups
    Backups let you
    1. recover data, but also
    2. verify quality of your data
    Backups don’t exist unless you test them

  12. Medium

  13. Configure your system
    • Setup your logs
    • Configure your memory
    • Tweak vacuum
    • Adjust checkpoints
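As a rough sketch of what those postgresql.conf knobs look like (the values are illustrative starting points only, not recommendations for your hardware):

```
shared_buffers = 4GB                   # memory: commonly ~25% of RAM
work_mem = 16MB                        # per-sort/hash-operation memory
maintenance_work_mem = 512MB           # vacuum and index builds
autovacuum_vacuum_scale_factor = 0.05  # vacuum large tables more often
checkpoint_completion_target = 0.9     # spread checkpoint I/O over time
log_min_duration_statement = 50        # log queries slower than 50 ms
```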

  14. Don’t learn it all
    http://pgtune.leopard.in.ua/

  15. Cache rules
    everything around me

  16. Cache hit rate
    SELECT
      'index hit rate' AS name,
      sum(idx_blks_hit) / nullif(sum(idx_blks_hit + idx_blks_read), 0) AS ratio
    FROM pg_statio_user_indexes
    UNION ALL
    SELECT
      'cache hit rate' AS name,
      sum(heap_blks_hit) / nullif(sum(heap_blks_hit + heap_blks_read), 0) AS ratio
    FROM pg_statio_user_tables;

  17. Cache hit rate
    name | ratio
    ----------------+------------------------
    cache hit rate | 0.99

  18. Index hit rate
    SELECT
      relname,
      100 * idx_scan / nullif(seq_scan + idx_scan, 0) AS percent_of_times_index_used,
      n_live_tup AS rows_in_table
    FROM pg_stat_user_tables
    ORDER BY n_live_tup DESC;

  19. Index hit rate
    relname | percent_of_times_index_used | rows_in_table
    ---------------------+-----------------------------+---------------
    events | 0 | 669917
    app_infos_user_info | 0 | 198218
    app_infos | 50 | 175640
    user_info | 3 | 46718
    rollouts | 0 | 34078
    favorites | 0 | 3059

  20. Rough guidelines
    Cache hit rate >= 99%
    Index hit rate >= 95%
    on tables with > 10,000 rows

  21. Understanding Specific
    Query Performance

  22. Understanding Specific
    Query Performance
    SELECT last_name
    FROM employees
    WHERE salary >= 50000;

  23. Understanding Specific
    Query Performance
    EXPLAIN
    SELECT last_name
    FROM employees
    WHERE salary >= 50000;
    QUERY PLAN
    --------------------------------------------------
    Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6)
      Filter: (salary >= 50000)
    (3 rows)

  24. Understanding Specific
    Query Performance
    EXPLAIN ANALYZE
    SELECT last_name
    FROM employees
    WHERE salary >= 50000;
    QUERY PLAN
    --------------------------------------------------
    Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actual time=2.401..295.247 rows=1428 loops=1)
      Filter: (salary >= 50000)
    Total runtime: 295.379 ms
    (3 rows)

  25. Rough guidelines
    Page response times < 100 ms
    Common queries < 10ms
    Rare queries < 100ms

  26. Understanding Specific
    Query Performance
    EXPLAIN ANALYZE
    SELECT last_name
    FROM employees
    WHERE salary >= 50000;
    QUERY PLAN
    --------------------------------------------------
    Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actual time=2.401..295.247 rows=1428 loops=1)
      Filter: (salary >= 50000)
    Total runtime: 295.379 ms
    (3 rows)

  27. Indexes
    CREATE INDEX idx_emps ON employees (salary);

  28. Indexes
    EXPLAIN ANALYZE
    SELECT last_name
    FROM employees
    WHERE salary >= 50000;
    QUERY PLAN
    --------------------------------------------------
    Index Scan using idx_emps on employees (cost=0.00..8.49 rows=1 width=6) (actual time=0.047..1.603 rows=1428 loops=1)
    Index Cond: (salary >= 50000)
    Total runtime: 1.771 ms
    (3 rows)

  29. pg_stat_statements
    =# select * from pg_stat_statements where query ~ 'from users where email';
    userid | 16384
    dbid | 16388
    query | select * from users where email = ?;
    calls | 2
    total_time | 0.000268
    rows | 2
    shared_blks_hit | 16
    shared_blks_read | 0
    shared_blks_dirtied | 0
    shared_blks_written | 0
    local_blks_hit | 0
    local_blks_read | 0
    local_blks_dirtied | 0
    local_blks_written | 0
    ...
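Note that pg_stat_statements ships with Postgres but isn't on by default; enabling it takes a config change plus the extension:

```sql
-- in postgresql.conf (requires a restart):
--   shared_preload_libraries = 'pg_stat_statements'
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
```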

  30. pg_stat_statements
    SELECT
    (total_time / 1000 / 60) as total,
    (total_time/calls) as avg,
    query
    FROM pg_stat_statements
    ORDER BY 1 DESC
    LIMIT 100;

  31. pg_stat_statements
    total | avg | query
    --------+--------+-------------------------
    295.76 | 10.13 | SELECT id FROM users...
    219.13 | 80.24 | SELECT * FROM ...
    (2 rows)

  32. Indexes

  33. Indexes
    B-Tree
    Generalized Inverted Index (GIN)
    Generalized Search Tree (GIST)
    Space Partitioned GIST (SP-GIST)
    Block Range Index (BRIN)

  34. Which do you use?

  35. B-Tree
    You usually want this

  36. Generalized Inverted Index (GIN)
    Use with multiple values in 1 column
    Array/hStore
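A sketch with a hypothetical table, where a GIN index lets containment queries on the array column use the index:

```sql
CREATE TABLE posts (
    id   bigserial PRIMARY KEY,
    tags text[]
);

CREATE INDEX idx_posts_tags ON posts USING GIN (tags);

-- this containment query can now use the index:
SELECT id FROM posts WHERE tags @> ARRAY['postgres'];
```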

  37. Generalized Search Tree (GIST)
    Full text search
    Shapes
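For example (table and column names hypothetical), a GiST index over a tsvector expression supports full text search:

```sql
CREATE INDEX idx_docs_fts ON docs
    USING GIST (to_tsvector('english', body));

SELECT * FROM docs
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'postgres');
```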

  38. Indexes
    B-Tree
    Generalized Inverted Index (GIN)
    Generalized Search Tree (GIST)
    Space Partitioned GIST (SP-GIST)
    Block Range Index (BRIN)

  39. Other index tips

  40. Be specific
    • Composite indexes
    • Functional indexes
    • Conditional indexes
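One hypothetical example of each (table and column names are illustrative):

```sql
-- Composite: serves queries filtering on both columns together
CREATE INDEX idx_orders_user_created ON orders (user_id, created_at);

-- Functional: indexes the result of an expression, e.g. case-insensitive lookups
CREATE INDEX idx_users_lower_email ON users (lower(email));

-- Conditional (partial): indexes only the rows matching the predicate
CREATE INDEX idx_orders_unshipped ON orders (created_at)
    WHERE shipped_at IS NULL;
```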

  41. Faster writes?
    Faster reads?

  42. Every write you make
    1. Figure out query plan
    2. Write to disk
    3. Wait for ack
    4. Return

  43. Every write you make
    1. Figure out query plan
    2. Write to disk
    3. Wait for ack
    4. Update the index
    5. Return

  44. Every write you make
    1. Figure out query plan
    2. Write to disk
    3. Wait for ack
    4. Update the index
    5. Update the other index
    6. Return

  45. Every write you make
    1. Figure out query plan
    2. Write to disk
    3. Wait for ack
    4. Update the index
    5. Update the other index
    6. Update the other other index
    7. Return

  46. Unused indexes?
    SELECT
    schemaname || '.' || relname AS table,
    indexrelname AS index,
    pg_size_pretty(pg_relation_size(i.indexrelid))
    AS index_size,
    idx_scan as index_scans
    FROM pg_stat_user_indexes ui
    JOIN pg_index i ON ui.indexrelid = i.indexrelid
    WHERE NOT indisunique AND idx_scan < 50 AND
    pg_relation_size(relid) > 5 * 8192
    ORDER BY pg_relation_size(i.indexrelid) /
    nullif(idx_scan, 0) DESC NULLS FIRST,
    pg_relation_size(i.indexrelid) DESC;

  47. Unused indexes
    table | index | index_size | index_scans
    -------------------------+-----------------------------+------------+-------------
    public.formations | index_formations_on_user_id | 40 Gb | 30
    public.resources | slack_pool_created_at_idx | 10 Gb | 40
    (2 rows)

  48. Migrations
    not null is good
    except in a migration

  49. Proper migrations
    1 becomes 3:
    1. Allow nulls, set a default value though
    2. Gradually backfill old data
    3. Add your constraint
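The three steps sketched in SQL, with a hypothetical `status` column:

```sql
-- 1. Add the column nullable, but give new rows a default
ALTER TABLE users ADD COLUMN status text DEFAULT 'active';

-- 2. Backfill old rows gradually, in batches, to avoid one long-running lock
UPDATE users SET status = 'active'
WHERE status IS NULL AND id BETWEEN 1 AND 10000;
-- ...repeat for subsequent id ranges...

-- 3. Only once every row is populated, add the constraint
ALTER TABLE users ALTER COLUMN status SET NOT NULL;
```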

  50. Large scale

  51. Index creation
    CREATE INDEX CONCURRENTLY ...
    roughly 2-3x slower
    Doesn’t lock table

  52. Connection pooling
    Application/framework layer
    Standalone daemon

  53. Pooler options
    pgBouncer
    pgpool
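A minimal pgbouncer.ini sketch (hostnames and sizes are illustrative; tune the pool sizes for your workload):

```ini
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 500
default_pool_size = 20
```

Transaction pooling gives the best connection reuse, but session state such as prepared statements doesn't survive across transactions in that mode.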

  54. Scaling cache

  55. Scaling cache

  56. Offload read traffic
    to a replica

  57. Replication
    slony
    londiste
    bucardo
    pgpool
    wal-e
    barman

  58. Replication
    slony
    londiste
    bucardo
    pgpool
    wal-e/wal-g
    barman

  59. Sharding
    • Split up large tables
    • Split up data by customer
    • 1 database per customer
    • 1 schema per customer
    • Shard within your application

  60. Sharding
    • Scalable options
    • Split out large tables
    • Multi-tenant application level sharding

  61. Citus

  62. Citus

  63. Logical backup
    pg_dump
    can be human readable, is portable

  64. Physical backup
    The bytes on disk
    Base backup

  65. Logical
    Good across architectures
    Good for portability
    Has load on DB
    Works < 50 GB
    Physical
    More initial setup
    Less portability
    Limited load on system
    Use above 50 GB

  66. Bloat

  67. Bloat
    WITH constants AS (
    SELECT current_setting('block_size')::numeric AS bs, 23 AS hdr, 4 AS ma
    ), bloat_info AS (
    SELECT
    ma,bs,schemaname,tablename,
    (datawidth+(hdr+ma-(case when hdr%ma=0 THEN ma ELSE hdr%ma END)))::numeric AS datahdr,
    (maxfracsum*(nullhdr+ma-(case when nullhdr%ma=0 THEN ma ELSE nullhdr%ma END))) AS nullhdr2
    FROM (
    SELECT
    schemaname, tablename, hdr, ma, bs,
    SUM((1-null_frac)*avg_width) AS datawidth,
    MAX(null_frac) AS maxfracsum,
    hdr+(
    SELECT 1+count(*)/8
    FROM pg_stats s2
    WHERE null_frac<>0 AND s2.schemaname = s.schemaname AND s2.tablename = s.tablename
    ) AS nullhdr
    FROM pg_stats s, constants
    GROUP BY 1,2,3,4,5
    ) AS foo
    ), table_bloat AS (
    SELECT
    schemaname, tablename, cc.relpages, bs,
    CEIL((cc.reltuples*((datahdr+ma-
    (CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma END))+nullhdr2+4))/(bs-20::float)) AS otta
    FROM bloat_info
    JOIN pg_class cc ON cc.relname = bloat_info.tablename
    JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname = bloat_info.schemaname AND nn.nspname <> 'information_schema'
    ), index_bloat AS (
    SELECT
    schemaname, tablename, bs,
    COALESCE(c2.relname,'?') AS iname, COALESCE(c2.reltuples,0) AS ituples, COALESCE(c2.relpages,0) AS ipages,
    COALESCE(CEIL((c2.reltuples*(datahdr-12))/(bs-20::float)),0) AS iotta -- very rough approximation, assumes all cols
    FROM bloat_info
    JOIN pg_class cc ON cc.relname = bloat_info.tablename
    JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname = bloat_info.schemaname AND nn.nspname <> 'information_schema'
    JOIN pg_index i ON indrelid = cc.oid
    JOIN pg_class c2 ON c2.oid = i.indexrelid
    )
    SELECT
    type, schemaname, object_name, bloat, pg_size_pretty(raw_waste) as waste
    FROM
    (SELECT
    'table' as type,
    schemaname,
    tablename as object_name,
    ROUND(CASE WHEN otta=0 THEN 0.0 ELSE table_bloat.relpages/otta::numeric END,1) AS bloat,
    CASE WHEN relpages < otta THEN '0' ELSE (bs*(table_bloat.relpages-otta)::bigint)::bigint END AS raw_waste
    FROM
    table_bloat
    UNION
    SELECT
    'index' as type,
    schemaname,
    tablename || '::' || iname as object_name,
    ROUND(CASE WHEN iotta=0 OR ipages=0 THEN 0.0 ELSE ipages/iotta::numeric END,1) AS bloat,
    CASE WHEN ipages < iotta THEN '0' ELSE (bs*(ipages-iotta))::bigint END AS raw_waste
    FROM
    index_bloat) bloat_summary
    ORDER BY raw_waste DESC, bloat DESC;

  68. Bloat
    type | schemaname | object_name | bloat | waste
    -------+------------+---------------------------------------------------+-------+------------
    index | public | pg_depend::pg_depend_reference_index | 2.1 | 336 kB
    index | public | pg_depend::pg_depend_depender_index | 1.9 | 272 kB
    table | public | users | 1.3 | 256 kB

  69. Recap

  70. Small
    • Leverage datatypes, it’s not a dumb hash
    • Test your backups
    • Take time to master your tools

  71. Medium
    • Tune Postgres well
    • Watch cache hit ratio
    • Be generous with indexes

  72. Large
    • Move away from pg_dump
    • Setup connection pooling
    • If you need to shard, invest in the right way or use a
    tool to help

  73. Thanks!
    Questions?
