Slide 1

Slide 1 text

Postgres Performance for Humans @craigkerstiens citusdata.com | @citusdata

Slide 2

Slide 2 text

Shameless plugs http://www.postgresweekly.com http://www.craigkerstiens.com http://www.postgresguide.com http://www.postgresapp.com http://www.citusdata.com

Slide 3

Slide 3 text

Postgres - TLDR

Slide 4

Slide 4 text

Postgres - TLDR Datatypes Conditional Indexes Transactional DDL Foreign Data Wrappers Concurrent Index Creation Extensions Common Table Expressions Fast Column Addition Listen/Notify Table Inheritance Per Transaction sync replication Window functions JSONB Momentum

Slide 5

Slide 5 text

TLDR in a quote http://www.craigkerstiens.com/2012/04/30/why-postgres/ “It’s the emacs of databases”

Slide 6

Slide 6 text

OLTP vs OLAP

Slide 7

Slide 7 text

OLTP vs OLAP Web apps

Slide 8

Slide 8 text

OLTP vs OLAP BI/Reporting

Slide 9

Slide 9 text

Postgres Setup/Config On Amazon Use RDS, Heroku, Citus OR ‘postgresql when its not your dayjob’ Other clouds ‘postgresql when its not your dayjob’ Real hardware High performance PostgreSQL http://thebuild.com/blog/2012/06/04/postgresql-when-its-not-your-job-at-djangocon-europe/

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

Cache rules everything around me

Slide 12

Slide 12 text

Cache Hit Rate SELECT 'index hit rate' as name, (sum(idx_blks_hit) - sum(idx_blks_read)) / sum(idx_blks_hit + idx_blks_read) as ratio FROM pg_statio_user_indexes union all SELECT 'cache hit rate' as name, case sum(idx_blks_hit) when 0 then 'NaN'::numeric else to_char((sum(idx_blks_hit) - sum(idx_blks_read)) / sum(idx_blks_hit + idx_blks_read), '99.99')::numeric end as ratio FROM pg_statio_user_indexes)

Slide 13

Slide 13 text

Cache Hit Rate name | ratio ----------------+------------------------ cache hit rate | 0.99

Slide 14

Slide 14 text

Index Hit Rate SELECT relname, 100 * idx_scan / (seq_scan + idx_scan), n_live_tup FROM pg_stat_user_tables ORDER BY n_live_tup DESC;

Slide 15

Slide 15 text

Index Hit Rate relname | percent_of_times_index_used | rows_in_table ---------------------+-----------------------------+--------------- events | 0 | 669917 app_infos_user_info | 0 | 198218 app_infos | 50 | 175640 user_info | 3 | 46718 rollouts | 0 | 34078 favorites | 0 | 3059

Slide 16

Slide 16 text

Rough guidelines Cache hit rate >= 99% Index hit rate >= 95% where on > 10,000 rows

Slide 17

Slide 17 text

Shortcuts

Slide 18

Slide 18 text

$ cat ~/.psqlrc \set ON_ERROR_ROLLBACK interactive -- automatically switch between extended and normal \x auto -- always show how long a query takes \timing \set show_slow_queries 'SELECT (total_time / 1000 / 60) as total_minutes, (total_time/calls) as average_time, query FROM pg_stat_statements ORDER BY 1 DESC LIMIT 100;' psql

Slide 19

Slide 19 text

$ cat ~/.psqlrc \set ON_ERROR_ROLLBACK interactive -- automatically switch between extended and normal \x auto -- always show how long a query takes \timing \set show_slow_queries 'SELECT (total_time / 1000 / 60) as total_minutes, (total_time/calls) as average_time, query FROM pg_stat_statements ORDER BY 1 DESC LIMIT 100;' psql

Slide 20

Slide 20 text

Understanding Specific Query Performance

Slide 21

Slide 21 text

Understanding Query Performance SELECT last_name FROM employees WHERE salary >= 50000;

Slide 22

Slide 22 text

Explain # EXPLAIN SELECT last_name FROM employees WHERE salary >= 50000; QUERY PLAN -------------------------------------------------- Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) Filter: (salary >= 50000) (3 rows)

Slide 23

Slide 23 text

Explain # EXPLAIN SELECT last_name FROM employees WHERE salary >= 50000; QUERY PLAN -------------------------------------------------- Seq Scan on employees width=6) Filter: (salary >= 50000) (3 rows) startup time max time rows return (cost=0.00..35811.00 rows=1

Slide 24

Slide 24 text

Explain Analyze # EXPLAIN ANALYZE SELECT last_name FROM employees WHERE salary >= 50000; QUERY PLAN -------------------------------------------------- Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actual time=2.401..295.247 rows=1428 loops=1) Filter: (salary >= 50000) Total runtime: 295.379 (3 rows) Filter: (salary >= 50000) (3 rows) startup time max time rows return actual time 2.401..295.247 rows=1428 295.379

Slide 25

Slide 25 text

Rough guidelines Page response times < 100 ms Common queries < 10ms Rare queries < 100ms

Slide 26

Slide 26 text

Explain Analyze # EXPLAIN ANALYZE SELECT last_name FROM employees WHERE salary >= 50000; QUERY PLAN -------------------------------------------------- Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actual time=2.401..295.247 rows=1428 loops=1) Filter: (salary >= 50000) Total runtime: 295.379 (3 rows) Filter: (salary >= 50000) (3 rows) startup time max time rows return actual time 2.401..295.247 rows=1428 295.379

Slide 27

Slide 27 text

# CREATE INDEX idx_emps ON employees (salary); Indexes!

Slide 28

Slide 28 text

Indexes! EXPLAIN ANALYZE SELECT last_name FROM employees WHERE salary >= 50000; QUERY PLAN -------------------------------------------------- Index Scan using idx_emps on employees (cost=0.00..8.49 rows=1 width=6) (actual time = 0.047..1.603 rows=1428 loops=1) Index Cond: (salary >= 50000) Total runtime: 1.771 ms (3 rows)

Slide 29

Slide 29 text

pg_stat_statements $ select * from pg_stat_statements where query ~ 'from users where email'; userid │ 16384 dbid │ 16388 query │ select * from users where email = ?; calls │ 2 total_time │ 0.000268 rows │ 2 shared_blks_hit │ 16 shared_blks_read │ 0 shared_blks_dirtied │ 0 shared_blks_written │ 0 local_blks_hit │ 0 local_blks_read │ 0 local_blks_dirtied │ 0 local_blks_written │ 0 ...

Slide 30

Slide 30 text

SELECT (total_time / 1000 / 60) as total, (total_time/calls) as avg, query FROM pg_stat_statements ORDER BY 1 DESC LIMIT 100; pg_stat_statements

Slide 31

Slide 31 text

total | avg | query --------+--------+------------------------- 295.76 | 10.13 | SELECT id FROM users... 219.13 | 80.24 | SELECT * FROM ... (2 rows) pg_stat_statements

Slide 32

Slide 32 text

Indexes

Slide 33

Slide 33 text

Indexes B-Tree Generalized Inverted Index (GIN) Generalized Search Tree (GIST) Space Partitioned GIST (SP-GIST) Block Range (BRIN)

Slide 34

Slide 34 text

Indexes Which do I use?

Slide 35

Slide 35 text

BTree This is what you usually want

Slide 36

Slide 36 text

Generalized Inverted Index (GIN) Use with multiple values in 1 column Array/hStore/JSONB

Slide 37

Slide 37 text

Generalized Search Tree (GIST) Full text search Shapes

Slide 38

Slide 38 text

Indexes B-Tree Generalized Inverted Index (GIN) Generalized Search Tree (GIST) Space Partitioned GIST (SP-GIST) Block Range (BRIN) VODKA (Coming soon)

Slide 39

Slide 39 text

More indexes

Slide 40

Slide 40 text

Indexes Conditional Functional Concurrent creation

Slide 41

Slide 41 text

Conditional > SELECT * FROM places; name | population ----------------------------------- ACMAR | 6055 ARAB | 13650

Slide 42

Slide 42 text

Conditional > SELECT * FROM places WHERE population > 10000; name | population ----------------------------------- ARAB | 13650

Slide 43

Slide 43 text

Conditional > CREATE INDEX idx_large_population ON places(name) where population > 10000;

Slide 44

Slide 44 text

Functional > SELECT * FROM places; data ----------------------------------- {"city": "ACMAR", "pop": 6055} {"city": "ARAB", "pop": 13650}

Slide 45

Slide 45 text

> SELECT * FROM places WHERE get_numeric('pop', data) > 10000; data ----------------------------------- {"city": "ARAB", "pop": 13650} Functional

Slide 46

Slide 46 text

> CREATE INDEX idx_large_population ON places(get_numeric('pop', data)); Functional

Slide 47

Slide 47 text

Conditional and Functional > CREATE INDEX idx_large_population ON places(data) WHERE get_numeric('pop', data) > 10000;

Slide 48

Slide 48 text

CREATE INDEX CONCURRENTLY ... roughly 2-3x slower Doesn’t lock table One more thing

Slide 49

Slide 49 text

hstore / JSON / JSONB

Slide 50

Slide 50 text

hstore / JSON / JSONB

Slide 51

Slide 51 text

JSONB CREATE TABLE users ( id integer NOT NULL, email character varying(255), data jsonb, created_at timestamp without time zone, last_login timestamp without time zone );

Slide 52

Slide 52 text

SELECT '{"id":1,"email": “[email protected]",}'::jsonb; SELECT * FROM talks WHERE data->’conference’ = ‘AllThingsOpen’; JSONB

Slide 53

Slide 53 text

JSONB Indexes work gin gist

Slide 54

Slide 54 text

Pooling

Slide 55

Slide 55 text

Application/Framework layer Stand alone daemon Options

Slide 56

Slide 56 text

pgbouncer pgpool PG options

Slide 57

Slide 57 text

Backups

Slide 58

Slide 58 text

Logical pg_dump can be human readable, is portable

Slide 59

Slide 59 text

Physical The bytes on disk Base backup

Slide 60

Slide 60 text

Logical Good across architectures Good for portability Has load on DB Works < 50 GB Physical More initial setup Less portability Limited load on system Use above 50 GB

Slide 61

Slide 61 text

Adding Cache

Slide 62

Slide 62 text

Increase cache with cash

Slide 63

Slide 63 text

Vertical scaling

Slide 64

Slide 64 text

Vertical scaling

Slide 65

Slide 65 text

Horizontal scaling Reads to a replica

Slide 66

Slide 66 text

Replication slony londiste bucardo pgpool wal-e barman

Slide 67

Slide 67 text

Replication slony londiste bucardo pgpool wal-e barman

Slide 68

Slide 68 text

Horizontal scaling Reads to a replica Split up large tables Split up data by customer • 1 database per customer • 1 schema per customer Shard within your application

Slide 69

Slide 69 text

Horizontal scaling Use something like Citus

Slide 70

Slide 70 text

Recap

Slide 71

Slide 71 text

OLAP Whole other talk Disk IO is important Order on disk is helpful (pg-reorg) MPP solutions on top of Postgres Recap

Slide 72

Slide 72 text

OLTP (webapps) Ensure bulk of data is cache Optimize overall query load with pg_stat_statements Efficient use of indexes When cache sucks, throw more at it Recap

Slide 73

Slide 73 text

Thanks! Questions? http://www.speakerdeck.com/u/craigkerstiens/ @craigkerstiens www.citusdata.com | @citusdata