Postgres at any scale | PyCon CA | Craig Kerstiens

Citus Data
November 18, 2017


We'll start with the basics you need to know as an app developer about interacting with your database, then dig into how you can start to analyze performance. We'll look at what you need to know for a small application, then what to be cautious of as you scale and performance starts to matter.


Transcript

  1. Postgres at any scale Craig Kerstiens @ Citus Data

  2. Who am I? • Craig Kerstiens • Run a fully managed

    database-as-a-service • Curate Postgres Weekly • Blog at craigkerstiens.com • Previously Heroku, Truviso, Accenture
  3. Itty bitty little data

  4. Database is just a hash

  5. Why Postgres?

  6. “It’s the emacs of databases” @craigkerstiens

  7. Postgres • Datatypes • GIS • Index types • Full

    text search • JSONB
  8. Datatypes • Varchar(255) • Timestamp vs. Timestamptz • JSON vs.

    JSONB
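
    A minimal sketch of those choices (table and column names are hypothetical): timestamptz stores a point in time and renders it in the client's time zone, while plain timestamp silently drops the zone; jsonb is stored parsed and is indexable, while json preserves the exact input text; and in Postgres varchar(255) has no performance advantage over text.

      CREATE TABLE events (
          id          bigserial PRIMARY KEY,
          payload     jsonb,                -- parsed and indexable; json keeps the raw text instead
          occurred_at timestamptz NOT NULL  -- plain timestamp would lose the time zone
      );
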
  9. Constraints • Django/SQLAlchemy are good by default • Is this

    string really under 255 characters? • Is the IP address really an IP address? • Does the foreign key exist? • If you delete a user, do you clean up the rest of the records?
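
    A hedged sketch answering each of those questions with a constraint (table and column names are hypothetical):

      CREATE TABLE users (
          id    bigserial PRIMARY KEY,
          email varchar(255) NOT NULL,  -- really under 255 characters? the type enforces it
          ip    inet NOT NULL           -- really an IP address? the inet type validates it
      );

      CREATE TABLE records (
          id      bigserial PRIMARY KEY,
          user_id bigint NOT NULL
                  REFERENCES users (id)  -- does the foreign key exist?
                  ON DELETE CASCADE      -- deleting a user cleans up their records
      );
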
  10. Master your tools psql is great, leverage it psqlrc (\x

    auto, \timing, saving history)
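
    A minimal ~/.psqlrc along those lines (one possible setup, not necessarily the exact file from the talk):

      \set QUIET 1
      -- expanded output only when rows are too wide for the screen
      \x auto
      -- print how long each query took
      \timing
      -- keep a separate history file per database
      \set HISTFILE ~/.psql_history- :DBNAME
      \set HISTCONTROL ignoredups
      \unset QUIET
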
  11. Backups Backups let you 1. recover data, but also 2.

    verify the quality of your data. Backups don't exist unless you test them.
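
    One way to actually exercise a backup (a sketch; database and table names are hypothetical): restore it into a scratch database and sanity-check the result.

      $ pg_dump -Fc production_db > backup.dump    # custom-format logical backup
      $ createdb restore_test
      $ pg_restore -d restore_test backup.dump     # the backup exists once this succeeds
      $ psql restore_test -c "SELECT count(*) FROM users"
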
  12. Medium

  13. Configure your system • Set up your logs • Configure your

    memory • Tweak vacuum • Adjust checkpoints
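
    A hedged postgresql.conf sketch touching each of those knobs (values are illustrative, not recommendations; the tool on the next slide picks them for you):

      log_min_duration_statement = 500ms      # logs: record any query slower than 500 ms
      shared_buffers = 4GB                    # memory: commonly ~25% of system RAM
      work_mem = 64MB                         # memory: per-sort / per-hash allocation
      autovacuum_vacuum_scale_factor = 0.05   # vacuum: visit large tables more often
      checkpoint_completion_target = 0.9      # checkpoints: spread the I/O out
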
  14. Don’t learn it all http://pgtune.leopard.in.ua/

  15. Cache rules everything around me

  16. Cache hit rate

      SELECT 'index hit rate' AS name,
             (sum(idx_blks_hit) - sum(idx_blks_read)) / sum(idx_blks_hit + idx_blks_read) AS ratio
      FROM pg_statio_user_indexes
      UNION ALL
      SELECT 'cache hit rate' AS name,
             CASE sum(idx_blks_hit)
               WHEN 0 THEN 'NaN'::numeric
               ELSE to_char((sum(idx_blks_hit) - sum(idx_blks_read)) / sum(idx_blks_hit + idx_blks_read), '99.99')::numeric
             END AS ratio
      FROM pg_statio_user_indexes;
  17. Cache hit rate

            name      | ratio
      ----------------+-------
       cache hit rate |  0.99
  18. Index hit rate

      SELECT relname,
             100 * idx_scan / (seq_scan + idx_scan) AS percent_of_times_index_used,
             n_live_tup AS rows_in_table
      FROM pg_stat_user_tables
      ORDER BY n_live_tup DESC;
  19. Index hit rate

             relname       | percent_of_times_index_used | rows_in_table
      ---------------------+-----------------------------+---------------
       events              |                           0 |        669917
       app_infos_user_info |                           0 |        198218
       app_infos           |                          50 |        175640
       user_info           |                           3 |         46718
       rollouts             |                          0 |         34078
       favorites            |                          0 |          3059
  20. Rough guidelines

      Cache hit rate >= 99%
      Index hit rate >= 95% on tables with more than 10,000 rows
  21. Understanding Specific Query Performance

  22. Understanding Specific Query Performance

      SELECT last_name FROM employees WHERE salary >= 50000;
  23. Understanding Specific Query Performance

      EXPLAIN SELECT last_name FROM employees WHERE salary >= 50000;

                               QUERY PLAN
      --------------------------------------------------------------
       Seq Scan on employees  (cost=0.00..35811.00 rows=1 width=6)
         Filter: (salary >= 50000)
      (3 rows)
  24. Understanding Specific Query Performance

      EXPLAIN ANALYZE SELECT last_name FROM employees WHERE salary >= 50000;

                               QUERY PLAN
      --------------------------------------------------------------
       Seq Scan on employees  (cost=0.00..35811.00 rows=1 width=6)
                              (actual time=2.401..295.247 rows=1428 loops=1)
         Filter: (salary >= 50000)
       Total runtime: 295.379 ms
      (3 rows)
  25. Rough guidelines

      Page response times < 100 ms
      Common queries < 10 ms
      Rare queries < 100 ms
  26. Understanding Specific Query Performance

      EXPLAIN ANALYZE SELECT last_name FROM employees WHERE salary >= 50000;

                               QUERY PLAN
      --------------------------------------------------------------
       Seq Scan on employees  (cost=0.00..35811.00 rows=1 width=6)
                              (actual time=2.401..295.247 rows=1428 loops=1)
         Filter: (salary >= 50000)
       Total runtime: 295.379 ms
      (3 rows)
  27. Indexes CREATE INDEX idx_emps ON employees (salary);

  28. Indexes

      EXPLAIN ANALYZE SELECT last_name FROM employees WHERE salary >= 50000;

                               QUERY PLAN
      ----------------------------------------------------------------
       Index Scan using idx_emps on employees  (cost=0.00..8.49 rows=1 width=6)
                                               (actual time=0.047..1.603 rows=1428 loops=1)
         Index Cond: (salary >= 50000)
       Total runtime: 1.771 ms
      (3 rows)
  29. pg_stat_statements

      $ select * from pg_stat_statements where query ~ 'from users where email';

      userid              | 16384
      dbid                | 16388
      query               | select * from users where email = ?;
      calls               | 2
      total_time          | 0.000268
      rows                | 2
      shared_blks_hit     | 16
      shared_blks_read    | 0
      shared_blks_dirtied | 0
      shared_blks_written | 0
      local_blks_hit      | 0
      local_blks_read     | 0
      local_blks_dirtied  | 0
      local_blks_written  | 0
      ...
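
    pg_stat_statements ships with Postgres (in contrib) but is not enabled by default; turning it on looks roughly like this:

      # in postgresql.conf (requires a restart)
      shared_preload_libraries = 'pg_stat_statements'

      -- then, once per database
      CREATE EXTENSION pg_stat_statements;
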
  30. pg_stat_statements

      SELECT (total_time / 1000 / 60) AS total,
             (total_time / calls) AS avg,
             query
      FROM pg_stat_statements
      ORDER BY 1 DESC
      LIMIT 100;
  31. pg_stat_statements

       total  |  avg  |          query
      --------+-------+--------------------------
       295.76 | 10.13 | SELECT id FROM users...
       219.13 | 80.24 | SELECT * FROM ...
      (2 rows)
  32. Indexes

  33. Indexes B-Tree Generalized Inverted Index (GIN) Generalized Search Tree (GIST)

    Space Partitioned GIST (SP-GIST) Block Range Index (BRIN)
  34. Which do you use?

  35. B-Tree You usually want this

  36. Generalized Inverted Index (GIN) Use with multiple values in

    one column: arrays, hstore
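
    For example, a GIN index over an array column (hypothetical table):

      CREATE INDEX idx_posts_tags ON posts USING gin (tags);  -- tags is text[]
      SELECT * FROM posts WHERE tags @> ARRAY['postgres'];    -- containment queries can use the index
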
  37. Generalized Search Tree (GIST) Full text search Shapes
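
    For example, full text search with a GiST index (hypothetical table; GIN also handles tsvector and is often preferred when the data changes rarely):

      CREATE INDEX idx_docs_fts ON docs USING gist (to_tsvector('english', body));
      SELECT * FROM docs WHERE to_tsvector('english', body) @@ to_tsquery('postgres');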

  38. Indexes B-Tree Generalized Inverted Index (GIN) Generalized Search Tree (GIST)

    Space Partitioned GIST (SP-GIST) Block Range Index (BRIN)
  39. Other index tips

  40. Be specific • Composite indexes • Functional indexes • Conditional

    indexes
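
    Hedged examples of each (table and column names are hypothetical):

      CREATE INDEX idx_orders_user_created ON orders (user_id, created_at);  -- composite
      CREATE INDEX idx_users_lower_email ON users (lower(email));            -- functional
      CREATE INDEX idx_orders_pending ON orders (created_at)
          WHERE state = 'pending';                                           -- conditional (partial)
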
  41. Faster writes? Faster reads?

  42. Every write you make 1. Figure out query plan 2.

    Write to disk 3. Wait for ack 4. Return
  43. Every write you make 1. Figure out query plan 2.

    Write to disk 3. Wait for ack 4. Update the index 5. Return
  44. Every write you make 1. Figure out query plan 2.

    Write to disk 3. Wait for ack 4. Update the index 5. Update the other index 6. Return
  45. Every write you make 1. Figure out query plan 2.

    Write to disk 3. Wait for ack 4. Update the index 5. Update the other index 6. Update the other other index 7. Return
  46. Unused indexes?

      SELECT schemaname || '.' || relname AS table,
             indexrelname AS index,
             pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size,
             idx_scan AS index_scans
      FROM pg_stat_user_indexes ui
      JOIN pg_index i ON ui.indexrelid = i.indexrelid
      WHERE NOT indisunique
        AND idx_scan < 50
        AND pg_relation_size(relid) > 5 * 8192
      ORDER BY pg_relation_size(i.indexrelid) / nullif(idx_scan, 0) DESC NULLS FIRST,
               pg_relation_size(i.indexrelid) DESC;
  47. Unused indexes

             table       |            index            | index_size | index_scans
      -------------------+-----------------------------+------------+-------------
       public.formations | index_formations_on_user_id | 40 GB      |          30
       public.resources  | slack_pool_created_at_idx   | 10 GB      |          40
      (2 rows)
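
    Once you've confirmed an index really is unused, it can be dropped without blocking writes (index name taken from the output above):

      DROP INDEX CONCURRENTLY index_formations_on_user_id;
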
  48. Migrations NOT NULL is good, except in a migration

  49. Proper migrations One step becomes three:

      1. Allow nulls, but set a default value
      2. Gradually backfill old data
      3. Add your constraint
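
    The three steps as SQL (hypothetical table and column):

      ALTER TABLE users ADD COLUMN plan text;                  -- 1. allow nulls...
      ALTER TABLE users ALTER COLUMN plan SET DEFAULT 'free';  --    ...but give new rows a default
      UPDATE users SET plan = 'free' WHERE plan IS NULL;       -- 2. backfill old rows (batch this in practice)
      ALTER TABLE users ALTER COLUMN plan SET NOT NULL;        -- 3. add your constraint
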
  50. Large scale

  51. Index creation CREATE INDEX CONCURRENTLY ... is roughly 2-3x

    slower, but doesn't lock the table
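
    For example, reusing the index from earlier:

      CREATE INDEX CONCURRENTLY idx_emps ON employees (salary);
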
  52. Connection pooling Application/framework layer, or a standalone daemon

  53. Pooler options pgBouncer pgpool
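
    A minimal pgbouncer.ini sketch (illustrative values):

      [databases]
      mydb = host=127.0.0.1 port=5432 dbname=mydb

      [pgbouncer]
      listen_port = 6432
      pool_mode = transaction      ; return server connections after each transaction
      max_client_conn = 500
      default_pool_size = 20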

  54. Scaling cache

  55. Scaling cache

  56. Offload read traffic to a replica

  57. Replication slony londiste bucardo pgpool wal-e barman

  58. Replication slony londiste bucardo pgpool wal-e/wal-g barman

  59. Sharding • Split up large tables • Split up data

    by customer • 1 database per customer • 1 schema per customer • Shard within your application
  60. Sharding • Scalable options • Split out large tables •

    Multi-tenant application-level sharding
  61. Citus

  62. Citus

  63. Logical backup pg_dump can be human readable, is portable
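
    For example:

      $ pg_dump mydb > mydb.sql         # plain SQL, human readable
      $ pg_dump -Fc mydb > mydb.dump    # custom format: compressed, restorable with pg_restore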

  64. Physical backup The bytes on disk Base backup
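
    For example, taking a base backup with the built-in tool (the target directory is illustrative):

      $ pg_basebackup -D /backups/base -X stream -P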

  65. Logical vs. Physical

      Logical                      Physical
      -------------------------    ----------------------
      Good across architectures    More initial setup
      Good for portability         Less portability
      Has load on DB               Limited load on system
      Works < 50 GB                Use above 50 GB
  66. Bloat

  67. Bloat

      WITH constants AS (
        SELECT current_setting('block_size')::numeric AS bs, 23 AS hdr, 4 AS ma
      ), bloat_info AS (
        SELECT ma, bs, schemaname, tablename,
          (datawidth + (hdr + ma - (CASE WHEN hdr%ma = 0 THEN ma ELSE hdr%ma END)))::numeric AS datahdr,
          (maxfracsum * (nullhdr + ma - (CASE WHEN nullhdr%ma = 0 THEN ma ELSE nullhdr%ma END))) AS nullhdr2
        FROM (
          SELECT schemaname, tablename, hdr, ma, bs,
            SUM((1 - null_frac) * avg_width) AS datawidth,
            MAX(null_frac) AS maxfracsum,
            hdr + (
              SELECT 1 + count(*) / 8
              FROM pg_stats s2
              WHERE null_frac <> 0
                AND s2.schemaname = s.schemaname
                AND s2.tablename = s.tablename
            ) AS nullhdr
          FROM pg_stats s, constants
          GROUP BY 1, 2, 3, 4, 5
        ) AS foo
      ), table_bloat AS (
        SELECT schemaname, tablename, cc.relpages, bs,
          CEIL((cc.reltuples * ((datahdr + ma -
            (CASE WHEN datahdr%ma = 0 THEN ma ELSE datahdr%ma END)) + nullhdr2 + 4)) / (bs - 20::float)) AS otta
        FROM bloat_info
        JOIN pg_class cc ON cc.relname = bloat_info.tablename
        JOIN pg_namespace nn ON cc.relnamespace = nn.oid
          AND nn.nspname = bloat_info.schemaname
          AND nn.nspname <> 'information_schema'
      ), index_bloat AS (
        SELECT schemaname, tablename, bs,
          COALESCE(c2.relname, '?') AS iname,
          COALESCE(c2.reltuples, 0) AS ituples,
          COALESCE(c2.relpages, 0) AS ipages,
          COALESCE(CEIL((c2.reltuples * (datahdr - 12)) / (bs - 20::float)), 0) AS iotta
          -- very rough approximation, assumes all cols
        FROM bloat_info
        JOIN pg_class cc ON cc.relname = bloat_info.tablename
        JOIN pg_namespace nn ON cc.relnamespace = nn.oid
          AND nn.nspname = bloat_info.schemaname
          AND nn.nspname <> 'information_schema'
        JOIN pg_index i ON indrelid = cc.oid
        JOIN pg_class c2 ON c2.oid = i.indexrelid
      )
      SELECT type, schemaname, object_name, bloat, pg_size_pretty(raw_waste) AS waste
      FROM (
        SELECT 'table' AS type, schemaname, tablename AS object_name,
          ROUND(CASE WHEN otta = 0 THEN 0.0 ELSE table_bloat.relpages / otta::numeric END, 1) AS bloat,
          CASE WHEN relpages < otta THEN '0'
               ELSE (bs * (table_bloat.relpages - otta)::bigint)::bigint END AS raw_waste
        FROM table_bloat
        UNION
        SELECT 'index' AS type, schemaname, tablename || '::' || iname AS object_name,
          ROUND(CASE WHEN iotta = 0 OR ipages = 0 THEN 0.0 ELSE ipages / iotta::numeric END, 1) AS bloat,
          CASE WHEN ipages < iotta THEN '0'
               ELSE (bs * (ipages - iotta))::bigint END AS raw_waste
        FROM index_bloat
      ) bloat_summary
      ORDER BY raw_waste DESC, bloat DESC;
  68. Bloat

       type  | schemaname |             object_name              | bloat | waste
      -------+------------+--------------------------------------+-------+--------
       index | public     | pg_depend::pg_depend_reference_index |   2.1 | 336 kB
       index | public     | pg_depend::pg_depend_depender_index  |   1.9 | 272 kB
       table | public     | users                                |   1.3 | 256 kB
       …
  69. Recap

  70. Small • Leverage datatypes, it’s not a dumb hash •

    Test your backups • Take time to master your tools
  71. Medium • Tune Postgres well • Watch cache hit ratio

    • Be generous with indexes
  72. Large • Move away from pg_dump • Set up connection pooling

    • If you need to shard, invest in the right way or use a tool to help
  73. Thanks! Questions?