Postgres Demystified - Speaker Deck

Slide 1

Slide 1 text

P!"#r$% D$&'%"()$* @+r,(#-$r%"($.%

Slide 2

Slide 2 text

Postgres.app PSA: Macs

Slide 3

Slide 3 text

PSA #2 http://postgresweekly.com

Slide 4

Slide 4 text

CVE 2013-1899 UPGRADE PSA #3

Slide 5

Slide 5 text

Agenda Brief History Developing w/ Postgres Postgres Performance Querying

Slide 6

Slide 6 text

Postgres History Postgres PostgreSQL Post Ingres Around since 1989/1995 Community Driven/Owned

Slide 7

Slide 7 text

Postgres It might help to explain that the pronunciation is "post-gres" or "post-gres-cue-ell", not "post-gray-something". I heard people making this same mistake in presentations at this past weekend's Postgres Anniversary Conference :-( Arguably, the 1996 decision to call it PostgreSQL instead of reverting to plain Postgres was the single worst mistake this project ever made. It seems far too late to change now, though. ! ! ! regards, tom lane

Slide 8

Slide 8 text

Postgres History Postgres PostgreSQL Post Ingres Around since 1989/1995 Community Driven/Owned

Slide 9

Slide 9 text

MVCC Each query sees transactions committed before it Locks for writing don’t conflict with reading

Slide 10

Slide 10 text

Why Postgres http://www.craigkerstiens.com/2012/04/30/why-postgres/ “Its the emacs of databases”

Slide 11

Slide 11 text

Datatypes Conditional Indexes Transactional DDL Foreign Data Wrappers Concurrent Index Creation Extensions Common Table Expressions Fast Column Addition Listen/Notify Table Inheritance Per Transaction sync replication Window functions NoSQL inside SQL Momentum TLDR

Slide 12

Slide 12 text

Developing

Slide 13

Slide 13 text

its your friend # \dt # \d # \d tablename # \x # \e psql

Slide 14

Slide 14 text

D,","/$%

Slide 15

Slide 15 text

Datatypes smallint bigint integer numeric float serial money char varchar text bytea timestamp timestamptz date time timetz interval boolean enum point line polygon box circle path inet cidr macaddr tsvector tsquery array XML UUID

Slide 16

Slide 16 text

Slide 17

Slide 17 text

Slide 18

Slide 18 text

Slide 19

Slide 19 text

Arrays CREATE TABLE item ( id serial NOT NULL, name varchar (255), tags varchar(255) [], created_at timestamp );

Slide 20

Slide 20 text

Arrays CREATE TABLE item ( id serial NOT NULL, name varchar (255), tags varchar(255) [], created_at timestamp );

Slide 21

Slide 21 text

Arrays INSERT INTO item VALUES (1, 'Django Pony', '{“Programming”,”Animal”}', now()); INSERT INTO item VALUES (2, 'Ruby Gem', '{“Programming”,”Jewelry”}', now());

Slide 22

Slide 22 text

Arrays INSERT INTO item VALUES (1, 'Django Pony', '{“Programming”,”Animal”}', now()); INSERT INTO item VALUES (2, 'Ruby Gem', '{“Programming”,”Jewelry”}', now());

Slide 23

Slide 23 text

CREATE TABLE talks ( room int, during tsrange ); INSERT INTO talks VALUES ( 3, '[2013-04-04 13:00, 2013-04-04 13:50)' ); Range Types

Slide 24

Slide 24 text

CREATE TABLE talks ( room int, during tsrange ); INSERT INTO talks VALUES ( 3, '[2013-04-04 13:00, 2013-04-04 13:50)' ); Range Types

Slide 25

Slide 25 text

CREATE TABLE talks ( room int, during tsrange ); INSERT INTO talks VALUES ( 3, '[2013-04-04 13:00, 2013-04-04 13:50)' ); Range Types

Slide 26

Slide 26 text

ALTER TABLE talks ADD EXCLUDE USING gist (during WITH &&); INSERT INTO talks VALUES ( 3, '[2013-04-04 13:30, 2013-04-04 14:00)' ); ERROR: conflicting key value violates exclusion constraint "talks_during_excl" Range Types

Slide 27

Slide 27 text

Slide 28

Slide 28 text

Slide 29

Slide 29 text

Slide 30

Slide 30 text

0"$.%(1.%

Slide 31

Slide 31 text

Extensions dblink hstore citext ltree isn cube pgcrypto tablefunc uuid-ossp earthdistance trigram fuzzystrmatch pgrowlocks pgstattuple btree_gist dict_int dict_xsyn unaccent

Slide 32

Slide 32 text

Extensions dblink hstore citext ltree isn cube pgcrypto tablefunc uuid-ossp earthdistance trigram fuzzystrmatch pgrowlocks pgstattuple btree_gist dict_int dict_xsyn unaccent

Slide 33

Slide 33 text

NoSQL in your SQL

Slide 34

Slide 34 text

NoSQL in your SQL CREATE EXTENSION hstore; CREATE TABLE users ( id integer NOT NULL, email character varying(255), data hstore, created_at timestamp without time zone, last_login timestamp without time zone );

Slide 35

Slide 35 text

Slide 36

Slide 36 text

hStore INSERT INTO users VALUES ( 1, '[email protected]', 'sex => "M", state => “California”', now(), now() );

Slide 37

Slide 37 text

hStore INSERT INTO users VALUES ( 1, '[email protected]', 'sex => "M", state => “California”', now(), now() );

Slide 38

Slide 38 text

JSON

Slide 39

Slide 39 text

SELECT '{"id":1,"email": "[email protected]",}'::json; JSON

Slide 40

Slide 40 text

SELECT '{"id":1,"email": "[email protected]",}'::json; JSON V8 w/ PLV8

Slide 41

Slide 41 text

SELECT '{"id":1,"email": "[email protected]",}'::json; JSON V8 w/ PLV8 create or replace function js(src text) returns text as $$ return eval( "(function() { " + src + "})" )(); $$ LANGUAGE plv8;

Slide 42

Slide 42 text

SELECT '{"id":1,"email": "[email protected]",}'::json; JSON V8 w/ PLV8 create or replace function js(src text) returns text as $$ return eval( "(function() { " + src + "})" )(); $$ LANGUAGE plv8; JS Injection in DB: Bad Idea

Slide 43

Slide 43 text

Full Text Search PostGIS

Slide 44

Slide 44 text

Performance

Slide 45

Slide 45 text

Sequential Scans

Slide 46

Slide 46 text

Sequential Scans They’re Bad

Slide 47

Slide 47 text

Sequential Scans They’re Bad (most of the time)

Slide 48

Slide 48 text

Indexes

Slide 49

Slide 49 text

Indexes They’re Good

Slide 50

Slide 50 text

Indexes They’re Good (most of the time)

Slide 51

Slide 51 text

I.*2$%

Slide 52

Slide 52 text

Indexes B-Tree Gin Gist KNN SP-Gist

Slide 53

Slide 53 text

Indexes Which do I use?

Slide 54

Slide 54 text

Indexes B-Tree Default Usually want this

Slide 55

Slide 55 text

Indexes Gin User w/ multiple values 1 column hstore/array

Slide 56

Slide 56 text

Indexes Gist Full text search Shapes GIS

Slide 57

Slide 57 text

Indexes B-Tree Gin Gist KNN SP-Gist

Slide 58

Slide 58 text

U.*$r%",.*(.# P$rf1r&,.+$

Slide 59

Slide 59 text

Understanding Query Performance

Slide 60

Slide 60 text

Understanding Query Performance SELECT last_name FROM employees WHERE salary >= 50000;

Slide 61

Slide 61 text

Explain # EXPLAIN SELECT last_name FROM employees WHERE salary >= 50000; QUERY PLAN -------------------------------------------------- Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) Filter: (salary >= 50000) (3 rows)

Slide 62

Slide 62 text

Explain # EXPLAIN SELECT last_name FROM employees WHERE salary >= 50000; QUERY PLAN -------------------------------------------------- Seq Scan on employees width=6) Filter: (salary >= 50000) (3 rows) startup time max time rows return (cost=0.00..35811.00 rows=1

Slide 63

Slide 63 text

Explain Analyze # EXPLAIN ANALYZE SELECT last_name FROM employees WHERE salary >= 50000; QUERY PLAN -------------------------------------------------- Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actual time=2.401..295.247 rows=1428 loops=1) Filter: (salary >= 50000) Total runtime: 295.379 (3 rows) Filter: (salary >= 50000) (3 rows) startup time max time rows return actual time 2.401..295.247 rows=1428 295.379

Slide 64

Slide 64 text

Rough guidelines Rare queries < 100ms Common queries < 10 ms

Slide 65

Slide 65 text

Slide 66

Slide 66 text

# CREATE INDEX idx_emps ON employees (salary); EXPLAIN ANALYZE SELECT last_name FROM employees WHERE salary >= 50000; QUERY PLAN -------------------------------------------------- Index Scan using idx_emps on employees (cost=0.00..8.49 rows=1 width=6) (actual time = 0.047..1.603 rows=1428 loops=1) Index Cond: (salary >= 50000) Total runtime: 1.771 ms (3 rows) Indexes!

Slide 67

Slide 67 text

Slide 68

Slide 68 text

Slide 69

Slide 69 text

Slide 70

Slide 70 text

Pro Tips

Slide 71

Slide 71 text

Pro Tips CREATE INDEX CONCURRENTLY

Slide 72

Slide 72 text

Pro Tips CREATE INDEX CONCURRENTLY CREATE INDEX WHERE foo=bar

Slide 73

Slide 73 text

Pro Tips CREATE INDEX CONCURRENTLY CREATE INDEX WHERE foo=bar SELECT * WHERE foo LIKE ‘%bar% is BAD

Slide 74

Slide 74 text

Pro Tips CREATE INDEX CONCURRENTLY CREATE INDEX WHERE foo=bar SELECT * WHERE foo LIKE ‘%bar% is BAD SELECT * WHERE Food LIKE ‘bar%’ is OKAY

Slide 75

Slide 75 text

Extensions dblink hstore citext ltree isn cube pgcrypto tablefunc uuid-ossp earthdistance trigram fuzzystrmatch pgrowlocks pgstattuple btree_gist dict_int dict_xsyn unaccent

Slide 76

Slide 76 text

Extensions dblink hstore citext ltree isn cube pgcrypto tablefunc uuid-ossp earthdistance trigram fuzzystrmatch pgrowlocks pgstattuple btree_gist dict_int dict_xsyn unaccent

Slide 77

Slide 77 text

No content

Slide 78

Slide 78 text

Cache Hit Rate SELECT 'index hit rate' as name, (sum(idx_blks_hit) - sum(idx_blks_read)) / sum(idx_blks_hit + idx_blks_read) as ratio FROM pg_statio_user_indexes union all SELECT 'cache hit rate' as name, case sum(idx_blks_hit) when 0 then 'NaN'::numeric else to_char((sum(idx_blks_hit) - sum(idx_blks_read)) / sum(idx_blks_hit + idx_blks_read), '99.99')::numeric end as ratio FROM pg_statio_user_indexes;)

Slide 79

Slide 79 text

Index Hit Rate SELECT relname, 100 * idx_scan / (seq_scan + idx_scan), n_live_tup FROM pg_stat_user_tables ORDER BY n_live_tup DESC;

Slide 80

Slide 80 text

Index Hit Rate relname | percent_of_used | rows_in_table ---------------------+-----------------+--------------- events | 0 | 669917 app_infos_user_info | 0 | 198218 app_infos | 50 | 175640 user_info | 3 | 46718 rollouts | 0 | 34078 favorites | 0 | 3059 schema_migrations | 0 | 2 authorizations | 0 | 0 delayed_jobs | 23 | 0

Slide 81

Slide 81 text

Rough guidelines Cache hit > 99% Index hit > 95% Indexes on > 10k rows

Slide 82

Slide 82 text

pg_stat_statements

Slide 83

Slide 83 text

pg_stat_statements $ select * from pg_stat_statements where query ~ 'from users where email'; userid │ 16384 dbid │ 16388 query │ select * from users where email = ?; calls │ 2 total_time │ 0.000268 rows │ 2 shared_blks_hit │ 16 shared_blks_read │ 0 shared_blks_dirtied │ 0 shared_blks_written │ 0 local_blks_hit │ 0 local_blks_read │ 0 local_blks_dirtied │ 0 local_blks_written │ 0 temp_blks_read │ 0 temp_blks_written │ 0 time_read │ 0 time_write │ 0

Slide 84

Slide 84 text

Slide 85

Slide 85 text

SELECT (total_time / 1000 / 60) as total, (total_time/calls) as avg, query FROM pg_stat_statements ORDER BY 1 DESC LIMIT 100; pg_stat_statements

Slide 86

Slide 86 text

total | avg | query --------+--------+------------------------- 295.76 | 10.13 | SELECT id FROM users... 219.13 | 80.24 | SELECT * FROM ... (2 rows) pg_stat_statements

Slide 87

Slide 87 text

https://github.com/will/datascope

Slide 88

Slide 88 text

heroku-pg-extras heroku pg:cache_hit heroku pg:index_hit heroku pg:ps heroku pg:locks heroku pg:kill heroku pg:index_size heroku pg:unused_indexes heroku pg:seq_scans heroku pg:mandelbrot

Slide 89

Slide 89 text

Querying

Slide 90

Slide 90 text

W(.*1w F3.+"(1.%

Slide 91

Slide 91 text

Window Functions SELECT email, users.data->'state', sum(total(items)), rank() OVER (PARTITION BY users.data->'state' ORDER BY sum(total(items)) desc) FROM users, purchases WHERE purchases.user_id = users.id GROUP BY 1, 2;

Slide 92

Slide 92 text

Slide 93

Slide 93 text

dblink hstore citext ltree isn cube pgcrypto tablefunc uuid-ossp earthdistance trigram fuzzystrmatch pgrowlocks pgstattuple btree_gist dict_int dict_xsyn unaccent Extensions

Slide 94

Slide 94 text

Fuzzy String Match SELECT soundex('Craig'), soundex('Will'), difference('Craig', 'Will'); SELECT soundex('Craig'), soundex('Greg'), difference('Craig', 'Greg');

Slide 95

Slide 95 text

Moving Data Around \copy (SELECT * FROM users) TO ‘~/ users.csv’; \copy users FROM ‘~/users.csv’;

Slide 96

Slide 96 text

dblink SELECT dblink_connect('myconn', 'dbname=postgres'); SELECT * FROM dblink('myconn','SELECT * FROM foo') AS t(a int, b text); a | b -------+------------ 1 | example 2 | example2

Slide 97

Slide 97 text

Foreign Data Wrappers oracle mysql informix twitter files www couch sybase ldap odbc s3 redis jdbc mongodb

Slide 98

Slide 98 text

Foreign Data Wrappers CREATE EXTENSION redis_fdw; CREATE SERVER redis_server FOREIGN DATA WRAPPER redis_fdw OPTIONS (address '127.0.0.1', port '6379'); CREATE FOREIGN TABLE redis_db0 (key text, value text) SERVER redis_server OPTIONS (database '0'); CREATE USER MAPPING FOR PUBLIC SERVER redis_server OPTIONS (password 'secret');

Slide 99

Slide 99 text

Redis in my Postgres SELECT * FROM redis_db0 SELECT id, email, value as visits FROM users, redis_db0 WHERE ('user_' || cast(id as text)) = cast(redis_db0.key as text) AND cast(value as int) > 40;

Slide 100

Slide 100 text

Redis in my Postgres id | email | visits ----+----------------------------+-------- 2 | [email protected] | 48 16 | [email protected] | 48 18 | [email protected] | 44 31 | [email protected] | 46 6 | [email protected] | 41 12 | [email protected] | 49 44 | [email protected] | 44 47 | [email protected] | 44 39 | [email protected] | 47 40 | [email protected] | 44 46 | [email protected] | 44 14 | [email protected] | 47

Slide 101

Slide 101 text

CTEs – Common Table Expressions Commonly “With clauses” Views within a specific query Readability (CTEs)

Slide 102

Slide 102 text

Readability (CTEs) WITH top_5_products AS ( SELECT products.*, count(*) FROM products, line_items WHERE products.id = line_items.product_id GROUP BY products.id ORDER BY count(*) DESC LIMIT 5 ) SELECT users.email, count(*) FROM users, line_items, top_5_products WHERE line_items.user_id = users.id AND line_items.product_id = top_5_products.id GROUP BY 1 ORDER BY 1;

Slide 103

Slide 103 text

Slide 104

Slide 104 text

Slide 105

Slide 105 text

Few More Things

Slide 106

Slide 106 text

Postgresql-hll

Slide 107

Slide 107 text

Postgresql-hll KMV Bit pattern observables Stochastic Averaging Harmonic Averaging

Slide 108

Slide 108 text

Postgresql-hll Uniques & Big data

Slide 109

Slide 109 text

Postgresql-hll CREATE EXTENSION hll; CREATE TABLE daily_unique_purchases ( date date unique, users hll ); INSERT INTO daily_unique_purchases (date, users) SELECT occurred_at::date, hll_add_agg(hll_hash_integer(user_id)) FROM purchases GROUP BY 1;

Slide 110

Slide 110 text

Slide 111

Slide 111 text

Slide 112

Slide 112 text

Slide 113

Slide 113 text

Postgresql-hll SELECT date, hll_cardinality(users) FROM daily_unique_purchases ; SELECT EXTRACT(MONTH FROM date) AS month, hll_cardinality(hll_union_agg(users)) FROM daily_unique_purchases WHERE date >= '2012-01-01' AND date < '2013-01-01' GROUP BY 1;

Slide 114

Slide 114 text

Listen/Notify Per Transaction Synchronous Replication SELECT for UPDATE Extras

Slide 115

Slide 115 text

Native in Ruby Full text search Upsert Listen/notify hstore arrays pg_search upsert queue_classic sequel sequel