Slide 1

Slide 1 text

Postgres Performance for Humans @craigkerstiens

Slide 2

Slide 2 text

Shameless plugs http://www.postgresweekly.com! http://www.craigkerstiens.com! http://www.postgresguide.com! http://www.postgresapp.com! ! http://postgres.heroku.com!

Slide 3

Slide 3 text

Postgres It might help to explain that the pronunciation is "post-gres" or! "post-gres-cue-ell", not "post-gray-something".! ! I heard people making this same mistake in presentations at this! past weekend's Postgres Anniversary Conference :-( Arguably,! the 1996 decision to call it PostgreSQL instead of reverting to! plain Postgres was the single worst mistake this project ever made.! It seems far too late to change now, though.! ! ! ! ! regards, tom lane!

Slide 4

Slide 4 text

Postgres - TLDR

Slide 5

Slide 5 text

Postgres - TLDR Datatypes Conditional Indexes Transactional DDL Foreign Data Wrappers Concurrent Index Creation Extensions Common Table Expressions Fast Column Addition Listen/Notify Table Inheritance Per Transaction sync replication Window functions NoSQL inside SQL Momentum

Slide 6

Slide 6 text

TLDR in a quote http://www.craigkerstiens.com/2012/04/30/why-postgres/ “Its the emacs of databases”

Slide 7

Slide 7 text

OLTP vs OLAP

Slide 8

Slide 8 text

OLTP vs OLAP Web apps

Slide 9

Slide 9 text

OLTP vs OLAP BI/Reporting

Slide 10

Slide 10 text

Postgres Setup/Config On Amazon Use Heroku OR ‘postgresql when its not your dayjob’ Other clouds ‘postgresql when its not your dayjob’ Real hardware High performance PostgreSQL ! http://thebuild.com/blog/2012/06/04/postgresql-when-its-not-your-job-at-djangocon-europe/

Slide 11

Slide 11 text

No content

Slide 12

Slide 12 text

Cache rules everything around me

Slide 13

Slide 13 text

80/20 rule

Slide 14

Slide 14 text

Cache Hit Rate SELECT 'index hit rate' as name, (sum(idx_blks_hit) - sum(idx_blks_read)) / sum(idx_blks_hit + idx_blks_read) as ratio FROM pg_statio_user_indexes union all SELECT 'cache hit rate' as name, case sum(idx_blks_hit) when 0 then 'NaN'::numeric else to_char((sum(idx_blks_hit) - sum(idx_blks_read)) / sum(idx_blks_hit + idx_blks_read), '99.99')::numeric end as ratio FROM pg_statio_user_indexes)

Slide 15

Slide 15 text

Cache Hit Rate name | ratio ----------------+------------------------ cache hit rate | 0.99

Slide 16

Slide 16 text

Index Hit Rate SELECT relname, 100 * idx_scan / (seq_scan + idx_scan), n_live_tup FROM pg_stat_user_tables ORDER BY n_live_tup DESC; ! ! !

Slide 17

Slide 17 text

Index Hit Rate relname | percent_of_times_index_used | rows_in_table ---------------------+-----------------------------+--------------- events | 0 | 669917 app_infos_user_info | 0 | 198218 app_infos | 50 | 175640 user_info | 3 | 46718 rollouts | 0 | 34078 favorites | 0 | 3059 schema_migrations | 0 | 2 authorizations | 0 | 0 delayed_jobs | 23 | 0

Slide 18

Slide 18 text

Rough guidelines Cache hit rate >= 99% ! Index hit rate >= 95% where on > 10,000 rows

Slide 19

Slide 19 text

Shortcuts

Slide 20

Slide 20 text

$ cat ~/.psqlrc ! \set ON_ERROR_ROLLBACK interactive ! -- automatically switch between extended and normal \x auto ! -- always show how long a query takes \timing ! \set show_slow_queries 'SELECT (total_time / 1000 / 60) as total_minutes, (total_time/calls) as average_time, query FROM pg_stat_statements ORDER BY 1 DESC LIMIT 100;' ! psql

Slide 21

Slide 21 text

$ cat ~/.psqlrc ! \set ON_ERROR_ROLLBACK interactive ! -- automatically switch between extended and normal \x auto ! -- always show how long a query takes \timing ! \set show_slow_queries 'SELECT (total_time / 1000 / 60) as total_minutes, (total_time/calls) as average_time, query FROM pg_stat_statements ORDER BY 1 DESC LIMIT 100;' ! psql

Slide 22

Slide 22 text

How Data is Retrieved

Slide 23

Slide 23 text

Sequential Scanning Record 1 Record 2 Record 3 Record 4 Record 5 Record … Record 1 Record 2 Record 3 Record 4 Record 5 Record …

Slide 24

Slide 24 text

Index Scans A-F G-L M-R S-Z G H I J K L Record 57 Record … Record …

Slide 25

Slide 25 text

Sequential Scans Good for large reports ! Computing over lots of data (1k + rows) Index Scans Good for small results ! Most common queries in your app

Slide 26

Slide 26 text

Understanding Specific Query Performance

Slide 27

Slide 27 text

Understanding Query Performance ! SELECT last_name FROM employees WHERE salary >= 50000;

Slide 28

Slide 28 text

Explain # EXPLAIN SELECT last_name FROM employees WHERE salary >= 50000; ! QUERY PLAN -------------------------------------------------- Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) Filter: (salary >= 50000) (3 rows)

Slide 29

Slide 29 text

Explain # EXPLAIN SELECT last_name FROM employees WHERE salary >= 50000; QUERY PLAN -------------------------------------------------- Seq Scan on employees width=6) Filter: (salary >= 50000) (3 rows) startup time max time rows return (cost=0.00..35811.00 rows=1

Slide 30

Slide 30 text

Explain Analyze # EXPLAIN ANALYZE SELECT last_name FROM employees WHERE salary >= 50000; QUERY PLAN -------------------------------------------------- Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actual time=2.401..295.247 rows=1428 loops=1) Filter: (salary >= 50000) Total runtime: 295.379 (3 rows) ! Filter: (salary >= 50000) (3 rows) startup time max time rows return actual time 2.401..295.247 rows=1428 295.379

Slide 31

Slide 31 text

Rough guidelines Page response times < 100 ms Common queries < 10ms Rare queries < 100ms

Slide 32

Slide 32 text

Explain Analyze # EXPLAIN ANALYZE SELECT last_name FROM employees WHERE salary >= 50000; QUERY PLAN -------------------------------------------------- Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actual time=2.401..295.247 rows=1428 loops=1) Filter: (salary >= 50000) Total runtime: 295.379 (3 rows) ! Filter: (salary >= 50000) (3 rows) startup time max time rows return actual time 2.401..295.247 rows=1428 295.379

Slide 33

Slide 33 text

# CREATE INDEX idx_emps ON employees (salary); Indexes!

Slide 34

Slide 34 text

Indexes! EXPLAIN ANALYZE SELECT last_name FROM employees WHERE salary >= 50000; QUERY PLAN -------------------------------------------------- Index Scan using idx_emps on employees (cost=0.00..8.49 rows=1 width=6) (actual time = 0.047..1.603 rows=1428 loops=1) Index Cond: (salary >= 50000) Total runtime: 1.771 ms (3 rows)

Slide 35

Slide 35 text

pg_stat_statements $ select * from pg_stat_statements where query ~ 'from users where email'; ! ! userid │ 16384 dbid │ 16388 query │ select * from users where email = ?; calls │ 2 total_time │ 0.000268 rows │ 2 shared_blks_hit │ 16 shared_blks_read │ 0 shared_blks_dirtied │ 0 shared_blks_written │ 0 local_blks_hit │ 0 local_blks_read │ 0 local_blks_dirtied │ 0 local_blks_written │ 0 ...

Slide 36

Slide 36 text

SELECT (total_time / 1000 / 60) as total, (total_time/calls) as avg, query FROM pg_stat_statements ORDER BY 1 DESC LIMIT 100; pg_stat_statements

Slide 37

Slide 37 text

total | avg | query --------+--------+------------------------- 295.76 | 10.13 | SELECT id FROM users... 219.13 | 80.24 | SELECT * FROM ... (2 rows) ! pg_stat_statements

Slide 38

Slide 38 text

Indexes

Slide 39

Slide 39 text

Indexes B-Tree Generalized Inverted Index (GIN) Generalized Search Tree (GIST) K Nearest Neighbors (KNN) Space Partitioned GIST (SP-GIST)

Slide 40

Slide 40 text

Indexes Which do I use?

Slide 41

Slide 41 text

BTree ! ! ! This is what you usually want

Slide 42

Slide 42 text

Generalized Inverted Index (GIN) ! ! ! Use with multiple values in 1 column Array/hStore

Slide 43

Slide 43 text

Generalized Search Tree (GIST) ! ! ! Full text search Shapes

Slide 44

Slide 44 text

Indexes B-Tree Generalized Inverted Index (GIN) Generalized Search Tree (GIST) K Nearest Neighbors (KNN) Space Partitioned GIST (SP-GIST) VODKA (Coming soon)

Slide 45

Slide 45 text

More indexes

Slide 46

Slide 46 text

Indexes Conditional Functional Concurrent creation

Slide 47

Slide 47 text

Conditional > SELECT * FROM places; ! name | population ----------------------------------- ACMAR | 6055 ARAB | 13650

Slide 48

Slide 48 text

Conditional > SELECT * FROM places WHERE population > 10000; ! name | population ----------------------------------- ARAB | 13650

Slide 49

Slide 49 text

Conditional > CREATE INDEX idx_large_population ON places(name) where population > 10000; !

Slide 50

Slide 50 text

Functional > SELECT * FROM places; ! data ----------------------------------- {"city": "ACMAR", "pop": 6055}! {"city": "ARAB", "pop": 13650}!

Slide 51

Slide 51 text

> SELECT * FROM places WHERE get_numeric('pop', data) > 10000; ! data ----------------------------------- {"city": "ARAB", "pop": 13650} Functional

Slide 52

Slide 52 text

> CREATE INDEX idx_large_population ON places(get_numeric('pop', data)); ! Functional

Slide 53

Slide 53 text

Conditional and Functional > CREATE INDEX idx_large_population ON places(data) WHERE get_numeric('pop', data) > 10000; !

Slide 54

Slide 54 text

CREATE INDEX CONCURRENTLY ... ! roughly 2-3x slower Doesn’t lock table One more thing

Slide 55

Slide 55 text

hstore / JSON

Slide 56

Slide 56 text

hstore CREATE EXTENSION hstore; CREATE TABLE users ( id integer NOT NULL, email character varying(255), data hstore, created_at timestamp without time zone, last_login timestamp without time zone );

Slide 57

Slide 57 text

hstore INSERT INTO users VALUES ( 1, '[email protected]', 'sex => "M", state => “California”', now(), now() ); !

Slide 58

Slide 58 text

SELECT '{"id":1,"email": "[email protected]",}'::json; JSON

Slide 59

Slide 59 text

hstore Indexes work gin gist

Slide 60

Slide 60 text

json Functional indexes work have fun

Slide 61

Slide 61 text

jsonb The world is better

Slide 62

Slide 62 text

Pooling

Slide 63

Slide 63 text

! Application/Framework layer ! Stand alone daemon Options

Slide 64

Slide 64 text

pgbouncer pgpool PG options

Slide 65

Slide 65 text

Adding Cache

Slide 66

Slide 66 text

Replication options slony londiste bucardo pgpool ! ! wal-e barman

Slide 67

Slide 67 text

Replication options slony londiste bucardo pgpool ! ! wal-e barman

Slide 68

Slide 68 text

Backups

Slide 69

Slide 69 text

Logical pg_dump ! can be human readable, is portable

Slide 70

Slide 70 text

Physical The bytes on disk ! Base backup

Slide 71

Slide 71 text

Logical Good across architectures Good for portability ! Has load on DB ! Works < 50 GB Physical More initial setup Less portability ! Limited load on system ! Use above 50 GB

Slide 72

Slide 72 text

Recap

Slide 73

Slide 73 text

OLAP ! Whole other talk Disk IO is important Order on disk is helpful (pg-reorg) MPP solutions on top of Postgres Recap

Slide 74

Slide 74 text

OLTP (webapps) Ensure bulk of data is cache Optimize overall query load with pg_stat_statements Efficient use of indexes When cache sucks, throw more at it Recap

Slide 75

Slide 75 text

Questions http://www.speakerdeck.com/u/ craigkerstiens/!