Slide 1

Postgres at any scale Craig Kerstiens @ Citus Data

Slide 2

Who am I? • Craig Kerstiens • Run a fully managed database-as-a-service • Curate Postgres Weekly • Blog at craigkerstiens.com • Previously Heroku, Truviso, Accenture

Slide 3

Itty bitty little data

Slide 4

Database is just a hash

Slide 5

Postgres • Datatypes • GIS • Index types • Full text search • JSONB

Slide 6

Datatypes • Varchar(255) • Timestamp vs. Timestamptz • JSON vs. JSONB
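A sketch of these datatype choices in a schema (table and column names are hypothetical):

```sql
-- text plus a CHECK instead of a reflexive varchar(255); timestamptz
-- stores an absolute point in time; jsonb is binary JSON that can be
-- indexed and queried efficiently.
CREATE TABLE users (
  id         bigserial PRIMARY KEY,
  email      text CHECK (char_length(email) < 255),
  created_at timestamptz NOT NULL DEFAULT now(),
  settings   jsonb
);
```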

Slide 7

Constraints • Django/SQLAlchemy are good by default • Is this string really under 255 characters? • Is the IP address really an IP address? • Does the foreign key exist? • If you delete a user, do you clean up the rest of their records?
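These checks can live in the database itself rather than only in the ORM; a sketch with hypothetical tables:

```sql
-- Let the database enforce what the application promises.
CREATE TABLE accounts (
  id      bigserial PRIMARY KEY,
  name    varchar(255) NOT NULL,  -- really under 255 characters
  last_ip inet                    -- really an IP address
);

CREATE TABLE projects (
  id         bigserial PRIMARY KEY,
  account_id bigint NOT NULL
    REFERENCES accounts (id)      -- the foreign key must exist
    ON DELETE CASCADE             -- deleting an account cleans up its projects
);
```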

Slide 8

Backups Backups let you: 1. recover data, but also 2. verify the quality of your data. Backups don’t exist unless you test them.

Slide 9

Medium

Slide 10

Configure your system • Set up your logs • Configure your memory • Tweak vacuum • Adjust checkpoints

Slide 11

Don’t learn it all http://pgtune.leopard.in.ua/

Slide 12

Cache rules everything around me

Slide 13

Cache hit rate

SELECT 'index hit rate' AS name,
       (sum(idx_blks_hit) - sum(idx_blks_read)) / sum(idx_blks_hit + idx_blks_read) AS ratio
FROM pg_statio_user_indexes
UNION ALL
SELECT 'cache hit rate' AS name,
       CASE sum(idx_blks_hit)
         WHEN 0 THEN 'NaN'::numeric
         ELSE to_char((sum(idx_blks_hit) - sum(idx_blks_read)) / sum(idx_blks_hit + idx_blks_read), '99.99')::numeric
       END AS ratio
FROM pg_statio_user_indexes;

Slide 14

Cache hit rate

      name      | ratio
----------------+-------
 cache hit rate |  0.99

Slide 15

Index hit rate

SELECT relname,
       100 * idx_scan / (seq_scan + idx_scan) AS percent_of_times_index_used,
       n_live_tup AS rows_in_table
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC;

Slide 16

Index hit rate

       relname       | percent_of_times_index_used | rows_in_table
---------------------+-----------------------------+---------------
 events              |                           0 |        669917
 app_infos_user_info |                           0 |        198218
 app_infos           |                          50 |        175640
 user_info           |                           3 |         46718
 rollouts            |                           0 |         34078
 favorites           |                           0 |          3059

Slide 17

Rough guidelines • Cache hit rate >= 99% • Index hit rate >= 95% on tables with > 10,000 rows

Slide 18

Understanding Specific Query Performance

Slide 19

Understanding Specific Query Performance SELECT last_name FROM employees WHERE salary >= 50000;

Slide 20

Understanding Specific Query Performance

EXPLAIN SELECT last_name FROM employees WHERE salary >= 50000;

                        QUERY PLAN
--------------------------------------------------
 Seq Scan on employees  (cost=0.00..35811.00 rows=1 width=6)
   Filter: (salary >= 50000)
(3 rows)

Slide 21

Understanding Specific Query Performance

EXPLAIN ANALYZE SELECT last_name FROM employees WHERE salary >= 50000;

                        QUERY PLAN
--------------------------------------------------
 Seq Scan on employees  (cost=0.00..35811.00 rows=1 width=6) (actual time=2.401..295.247 rows=1428 loops=1)
   Filter: (salary >= 50000)
 Total runtime: 295.379 ms
(3 rows)

Slide 22

Rough guidelines Page response times < 100 ms Common queries < 10ms Rare queries < 100ms

Slide 23

Understanding Specific Query Performance

EXPLAIN ANALYZE SELECT last_name FROM employees WHERE salary >= 50000;

                        QUERY PLAN
--------------------------------------------------
 Seq Scan on employees  (cost=0.00..35811.00 rows=1 width=6) (actual time=2.401..295.247 rows=1428 loops=1)
   Filter: (salary >= 50000)
 Total runtime: 295.379 ms
(3 rows)

Slide 24

Indexes CREATE INDEX idx_emps ON employees (salary);

Slide 25

Indexes

EXPLAIN ANALYZE SELECT last_name FROM employees WHERE salary >= 50000;

                        QUERY PLAN
--------------------------------------------------
 Index Scan using idx_emps on employees  (cost=0.00..8.49 rows=1 width=6) (actual time=0.047..1.603 rows=1428 loops=1)
   Index Cond: (salary >= 50000)
 Total runtime: 1.771 ms
(3 rows)

Slide 26

pg_stat_statements

SELECT * FROM pg_stat_statements WHERE query ~ 'from users where email';

userid              │ 16384
dbid                │ 16388
query               │ select * from users where email = ?;
calls               │ 2
total_time          │ 0.000268
rows                │ 2
shared_blks_hit     │ 16
shared_blks_read    │ 0
shared_blks_dirtied │ 0
shared_blks_written │ 0
local_blks_hit      │ 0
local_blks_read     │ 0
local_blks_dirtied  │ 0
local_blks_written  │ 0
...
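pg_stat_statements ships with Postgres but is not enabled by default; a minimal setup sketch:

```sql
-- In postgresql.conf (requires a server restart):
--   shared_preload_libraries = 'pg_stat_statements'

-- Then, in each database you want to inspect:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Reset the counters when you want a fresh measurement window:
SELECT pg_stat_statements_reset();
```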

Slide 27

pg_stat_statements

SELECT (total_time / 1000 / 60) AS total,
       (total_time / calls) AS avg,
       query
FROM pg_stat_statements
ORDER BY 1 DESC
LIMIT 100;

Slide 28

pg_stat_statements

 total  |  avg  | query
--------+-------+-------------------------
 295.76 | 10.13 | SELECT id FROM users...
 219.13 | 80.24 | SELECT * FROM ...
(2 rows)

Slide 29

Indexes

Slide 30

Indexes • B-Tree • Generalized Inverted Index (GIN) • Generalized Search Tree (GiST) • K Nearest Neighbors (KNN) • Space Partitioned GiST (SP-GiST) • Block Range Index (BRIN)

Slide 31

Which do you use?

Slide 32

B-Tree: you usually want this

Slide 33

Generalized Inverted Index (GIN) • Use with multiple values in one column • Arrays / hstore
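For example (hypothetical table), a GIN index on a jsonb column lets containment queries use the index:

```sql
CREATE TABLE events (
  id      bigserial PRIMARY KEY,
  payload jsonb
);

-- GIN indexes the individual keys and elements inside payload
CREATE INDEX idx_events_payload ON events USING GIN (payload);

-- The containment operator @> can now use idx_events_payload
SELECT id FROM events WHERE payload @> '{"type": "signup"}';
```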

Slide 34

Generalized Search Tree (GIST) Full text search Shapes
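A sketch of a GiST index for full text search (table and index names are hypothetical):

```sql
CREATE TABLE docs (
  id   bigserial PRIMARY KEY,
  body text
);

-- Index the tsvector expression; the two-argument form of
-- to_tsvector is immutable, so it can be indexed.
CREATE INDEX idx_docs_fts ON docs USING GIST (to_tsvector('english', body));

-- Queries matching the indexed expression can use it
SELECT id FROM docs WHERE to_tsvector('english', body) @@ to_tsquery('postgres');
```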

Slide 35

Indexes • B-Tree • Generalized Inverted Index (GIN) • Generalized Search Tree (GiST) • K Nearest Neighbors (KNN) • Space Partitioned GiST (SP-GiST) • Block Range Index (BRIN)

Slide 36

Other index tips

Slide 37

Be specific • Composite indexes • Functional indexes • Conditional indexes
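One example of each, against hypothetical tables:

```sql
-- Composite: serves queries filtering on user_id, or user_id + created_at
CREATE INDEX idx_orders_user_created ON orders (user_id, created_at);

-- Functional: index an expression so lookups on lower(email) are indexed
CREATE INDEX idx_users_lower_email ON users (lower(email));

-- Conditional (partial): only index the rows you actually query
CREATE INDEX idx_orders_unshipped ON orders (created_at)
  WHERE shipped_at IS NULL;
```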

Slide 38

Faster writes? Faster reads?

Slide 39

Every write you make 1. Figure out query plan 2. Write to disk 3. Wait for ack 4. Return

Slide 40

Every write you make 1. Figure out query plan 2. Write to disk 3. Wait for ack 4. Update the index 5. Return

Slide 41

Every write you make 1. Figure out query plan 2. Write to disk 3. Wait for ack 4. Update the index 5. Update the other index 6. Return

Slide 42

Every write you make 1. Figure out query plan 2. Write to disk 3. Wait for ack 4. Update the index 5. Update the other index 6. Update the other other index 7. Return

Slide 43

Unused indexes?

SELECT schemaname || '.' || relname AS table,
       indexrelname AS index,
       pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size,
       idx_scan AS index_scans
FROM pg_stat_user_indexes ui
JOIN pg_index i ON ui.indexrelid = i.indexrelid
WHERE NOT indisunique
  AND idx_scan < 50
  AND pg_relation_size(relid) > 5 * 8192
ORDER BY pg_relation_size(i.indexrelid) / nullif(idx_scan, 0) DESC NULLS FIRST,
         pg_relation_size(i.indexrelid) DESC;

Slide 44

Unused indexes

       table       |            index            | index_size | index_scans
-------------------+-----------------------------+------------+-------------
 public.formations | index_formations_on_user_id | 40 GB      |          30
 public.resources  | slack_pool_created_at_idx   | 10 GB      |          40
(2 rows)

Slide 45

Migrations NOT NULL is good, except in a migration

Slide 46

Proper migrations 1 step becomes 3: 1. Allow nulls, but set a default value 2. Gradually backfill old data 3. Add your constraint
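A sketch of the three steps (table and column names are illustrative):

```sql
-- 1. Allow nulls, but set a default so new rows are populated
ALTER TABLE users ADD COLUMN plan text DEFAULT 'free';

-- 2. Gradually backfill old data in small batches to avoid long locks
UPDATE users SET plan = 'free'
WHERE id IN (SELECT id FROM users WHERE plan IS NULL LIMIT 1000);
-- ...repeat until no NULLs remain...

-- 3. Add your constraint once every row is populated
ALTER TABLE users ALTER COLUMN plan SET NOT NULL;
```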

Slide 47

Large scale

Slide 48

Index creation CREATE INDEX CONCURRENTLY ... Roughly 2-3x slower, but doesn’t lock the table
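A sketch of a concurrent build (index and table names are illustrative):

```sql
-- Builds without holding a lock that blocks writes; cannot run
-- inside a transaction block.
CREATE INDEX CONCURRENTLY idx_emps_salary ON employees (salary);

-- A failed concurrent build leaves an INVALID index behind;
-- drop it and retry.
DROP INDEX CONCURRENTLY IF EXISTS idx_emps_salary;
```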

Slide 49

Connection pooling • Application/framework layer • Stand-alone daemon

Slide 50

Pooler options pgBouncer pgpool

Slide 51

Scaling cache

Slide 52

Scaling cache

Slide 53

Offload read traffic to a replica

Slide 54

Replication slony londiste bucardo pgpool wal-e barman

Slide 55

Replication slony londiste bucardo pgpool wal-e/wal-g barman

Slide 56

Sharding • Split up large tables • Split up data by customer • 1 database per customer • 1 schema per customer • Shard within your application

Slide 57

Sharding • Scalable options • Split out large tables • Multi-tenant application level sharding

Slide 58

Citus

Slide 59

Citus

Slide 60

Logical backup pg_dump can be human readable, is portable

Slide 61

Physical backup The bytes on disk Base backup

Slide 62

 Logical                   | Physical
---------------------------+-------------------------
 Good across architectures | More initial setup
 Good for portability      | Less portability
 Has load on DB            | Limited load on system
 Works < 50 GB             | Use above 50 GB

Slide 63

Recap

Slide 64

Small • Leverage datatypes, it’s not a dumb hash • Test your backups • Take time to

Slide 65

Medium • Tune Postgres well • Watch cache hit ratio • Be generous with indexes

Slide 66

Large • Move away from pg_dump • Set up connection pooling • If you need to shard, invest in the right way or use a tool to help

Slide 67

Thanks! Questions?