Slide 1

Slide 1 text

Postgres Demystified Craig Kerstiens @craigkerstiens http://www.craigkerstiens.com https://speakerdeck.com/u/craigkerstiens/p/postgres-demystified

Slide 2

Slide 2 text

Postgres Demystified

Slide 3

Slide 3 text

Postgres Demystified

Slide 4

Slide 4 text

Getting Setup Postgres.app

Slide 5

Slide 5 text

Agenda Brief History Developing w/ Postgres Postgres Performance Querying

Slide 6

Slide 6 text

Postgres History Postgres PostgresQL Post Ingress Around since 1989/1995 Community Driven/Owned

Slide 7

Slide 7 text

MVCC Each query sees transactions committed before it Locks for writing don’t conflict with reading

Slide 8

Slide 8 text

Why Postgres

Slide 9

Slide 9 text

Why Postgres “ its the emacs of databases”

Slide 10

Slide 10 text

Developing w/ Postgres

Slide 11

Slide 11 text

Basics psql is your friend

Slide 12

Slide 12 text

Datatypes smallint bigint integer numeric float serial money char varchar text bytea timestamp timestamptz date time timetz interval boolean enum point line polygon box circle path inet cidr macaddr tsvector tsquery array XML UUID

Slide 13

Slide 13 text

smallint bigint integer numeric float serial money char varchar text bytea timestamp timestamptz date time timetz interval boolean enum point line polygon box circle path inet cidr macaddr tsvector tsquery array XML UUID Datatypes

Slide 14

Slide 14 text

smallint bigint integer numeric float serial money char varchar text bytea timestamp timestamptz date time timetz interval boolean enum point line polygon box circle path inet cidr macaddr tsvector tsquery array XML UUID Datatypes

Slide 15

Slide 15 text

Datatypes smallint bigint integer numeric float serial money char varchar text bytea timestamp timestamptz date time timetz interval boolean enum point line polygon box circle path inet cidr macaddr tsvector tsquery array XML UUID

Slide 16

Slide 16 text

CREATE TABLE items ( id serial NOT NULL, name varchar (255), tags varchar(255) [], created_at timestamp ); Datatypes

Slide 17

Slide 17 text

CREATE TABLE items ( id serial NOT NULL, name varchar (255), tags varchar(255) [], created_at timestamp ); Datatypes

Slide 18

Slide 18 text

CREATE TABLE items ( id serial NOT NULL, name varchar (255), tags varchar(255) [], created_at timestamp ); Datatypes

Slide 19

Slide 19 text

Datatypes INSERT INTO items VALUES (1, 'Ruby Gem', '{“Programming”,”Jewelry”}', now()); INSERT INTO items VALUES (2, 'Django Pony', '{“Programming”,”Animal”}', now());

Slide 20

Slide 20 text

Datatypes smallint bigint integer numeric float serial money char varchar text bytea timestamp timestamptz date time timetz interval boolean enum point line polygon box circle path inet cidr macaddr tsvector tsquery array XML UUID

Slide 21

Slide 21 text

Datatypes wish list email url phone zip

Slide 22

Slide 22 text

Extensions dblink hstore citext ltree isn cube pgcrypto tablefunc uuid-ossp earthdistance trigram fuzzystrmatch pgrowlocks pgstattuple btree_gist dict_int dict_xsyn unaccent

Slide 23

Slide 23 text

Extensions dblink hstore citext ltree isn cube pgcrypto tablefunc uuid-ossp earthdistance trigram fuzzystrmatch pgrowlocks pgstattuple btree_gist dict_int dict_xsyn unaccent

Slide 24

Slide 24 text

CREATE EXTENSION hstore; CREATE TABLE users ( id integer NOT NULL, email character varying(255), data hstore, created_at timestamp without time zone, last_login timestamp without time zone ); NoSQL in your SQL

Slide 25

Slide 25 text

INSERT INTO users VALUES ( 1, '[email protected]', 'sex => "M", state => “California”', now(), now() ); hStore

Slide 26

Slide 26 text

SELECT '{"id":1,"email": "[email protected]",}'::json; JSON V8 w/ PLV8 9.2

Slide 27

Slide 27 text

SELECT '{"id":1,"email": "[email protected]",}'::json; JSON V8 w/ PLV8 9.2

Slide 28

Slide 28 text

SELECT '{"id":1,"email": "[email protected]",}'::json; JSON V8 w/ PLV8 9.2 Bad Idea

Slide 29

Slide 29 text

SELECT '{"id":1,"email": "[email protected]",}'::json; JSON create or replace function js(src text) returns text as $$ return eval( "(function() { " + src + "})" )(); $$ LANGUAGE plv8; V8 w/ PLV8 9.2 Bad Idea

Slide 30

Slide 30 text

Range Types 9.2

Slide 31

Slide 31 text

CREATE TABLE talks (room int, during tsrange); INSERT INTO talks VALUES (3, '[2012-09-24 13:00, 2012-09-24 13:50)'); Range Types 9.2

Slide 32

Slide 32 text

CREATE TABLE talks (room int, during tsrange); INSERT INTO talks VALUES (3, '[2012-09-24 13:00, 2012-09-24 13:50)'); Range Types 9.2 ALTER TABLE talks ADD EXCLUDE USING gist (during WITH &&); INSERT INTO talks VALUES (1108, '[2012-09-24 13:30, 2012-09-24 14:00)'); ERROR: conflicting key value violates exclusion constraint "talks_during_excl"

Slide 33

Slide 33 text

Full Text Search

Slide 34

Slide 34 text

Full Text Search TSVECTOR - Text Data TSQUERY - Search Predicates Specialized Indexes and Operators

Slide 35

Slide 35 text

Datatypes smallint bigint integer numeric float serial money char varchar text bytea timestamp timestamptz date time timetz interval boolean enum point line polygon box circle path inet cidr macaddr tsvector tsquery array XML UUID

Slide 36

Slide 36 text

PostGIS

Slide 37

Slide 37 text

1. New datatypes i.e. (2d/3d boxes) PostGIS

Slide 38

Slide 38 text

1. New datatypes i.e. (2d/3d boxes) i.e. SELECT foo && bar ... 2. New operators PostGIS

Slide 39

Slide 39 text

1. New datatypes i.e. (2d/3d boxes) i.e. SELECT foo && bar ... i.e. person within location, nearest distance 2. New operators 3. Understand relationships and distance PostGIS

Slide 40

Slide 40 text

No content

Slide 41

Slide 41 text

Performance

Slide 42

Slide 42 text

Sequential Scans

Slide 43

Slide 43 text

Sequential Scans They’re Bad

Slide 44

Slide 44 text

Sequential Scans They’re Bad (most of the time)

Slide 45

Slide 45 text

Indexes

Slide 46

Slide 46 text

Indexes They’re Good

Slide 47

Slide 47 text

Indexes They’re Good (most of the time)

Slide 48

Slide 48 text

Indexes B-Tree Generalized Inverted Index (GIN) Generalized Search Tree (GIST) K Nearest Neighbors (KNN) Space Partitioned GIST (SP-GIST)

Slide 49

Slide 49 text

Indexes B-Tree Default Usually want this

Slide 50

Slide 50 text

Indexes Generalized Inverted Index (GIN) Use with multiple values in 1 column Array/hStore

Slide 51

Slide 51 text

Indexes Generalized Search Tree (GIST) Full text search Shapes

Slide 52

Slide 52 text

Understanding Query Perf SELECT last_name FROM employees WHERE salary >= 50000; Given

Slide 53

Slide 53 text

Explain # EXPLAIN SELECT last_name FROM employees WHERE salary >= 50000; QUERY PLAN ------------------------------------------------------------------------- Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) Filter: (salary >= 50000) (3 rows)

Slide 54

Slide 54 text

# EXPLAIN SELECT last_name FROM employees WHERE salary >= 50000; QUERY PLAN ------------------------------------------------------------------------- Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) Filter: (salary >= 50000) (3 rows) Startup Cost Explain

Slide 55

Slide 55 text

# EXPLAIN SELECT last_name FROM employees WHERE salary >= 50000; QUERY PLAN ------------------------------------------------------------------------- Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) Filter: (salary >= 50000) (3 rows) Startup Cost Max Time Explain

Slide 56

Slide 56 text

# EXPLAIN SELECT last_name FROM employees WHERE salary >= 50000; QUERY PLAN ------------------------------------------------------------------------- Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) Filter: (salary >= 50000) (3 rows) Startup Cost Max Time Rows Returned Explain

Slide 57

Slide 57 text

# EXPLAIN ANALYZE SELECT last_name FROM employees WHERE salary >= 50000; QUERY PLAN ------------------------------------------------------------------------- Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actual time=2.401..295.247 rows=1428 loops=1) Filter: (salary >= 50000) Total runtime: 295.379 (3 rows) Explain

Slide 58

Slide 58 text

# EXPLAIN ANALYZE SELECT last_name FROM employees WHERE salary >= 50000; QUERY PLAN ------------------------------------------------------------------------- Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actual time=2.401..295.247 rows=1428 loops=1) Filter: (salary >= 50000) Total runtime: 295.379 (3 rows) Startup Cost Explain

Slide 59

Slide 59 text

# EXPLAIN ANALYZE SELECT last_name FROM employees WHERE salary >= 50000; QUERY PLAN ------------------------------------------------------------------------- Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actual time=2.401..295.247 rows=1428 loops=1) Filter: (salary >= 50000) Total runtime: 295.379 (3 rows) Startup Cost Max Time Explain

Slide 60

Slide 60 text

# EXPLAIN ANALYZE SELECT last_name FROM employees WHERE salary >= 50000; QUERY PLAN ------------------------------------------------------------------------- Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actual time=2.401..295.247 rows=1428 loops=1) Filter: (salary >= 50000) Total runtime: 295.379 (3 rows) Startup Cost Max Time Rows Returned Explain

Slide 61

Slide 61 text

# EXPLAIN ANALYZE SELECT last_name FROM employees WHERE salary >= 50000; QUERY PLAN ------------------------------------------------------------------------- Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actual time=2.401..295.247 rows=1428 loops=1) Filter: (salary >= 50000) Total runtime: 295.379 (3 rows) Startup Cost Max Time Rows Returned Explain

Slide 62

Slide 62 text

# CREATE INDEX idx_emps ON employees (salary);

Slide 63

Slide 63 text

# CREATE INDEX idx_emps ON employees (salary); # EXPLAIN ANALYZE SELECT last_name FROM employees WHERE salary >= 50000; QUERY PLAN ------------------------------------------------------------------------- Index Scan using idx_emps on employees (cost=0.00..8.49 rows=1 width=6) (actual time = 0.047..1.603 rows=1428 loops=1) Index Cond: (salary >= 50000) Total runtime: 1.771 ms (3 rows)

Slide 64

Slide 64 text

# CREATE INDEX idx_emps ON employees (salary); # EXPLAIN ANALYZE SELECT last_name FROM employees WHERE salary >= 50000; QUERY PLAN ------------------------------------------------------------------------- Index Scan using idx_emps on employees (cost=0.00..8.49 rows=1 width=6) (actual time = 0.047..1.603 rows=1428 loops=1) Index Cond: (salary >= 50000) Total runtime: 1.771 ms (3 rows)

Slide 65

Slide 65 text

Indexes Pro Tips

Slide 66

Slide 66 text

Indexes Pro Tips CREATE INDEX CONCURRENTLY

Slide 67

Slide 67 text

Indexes Pro Tips CREATE INDEX CONCURRENTLY CREATE INDEX WHERE foo=bar

Slide 68

Slide 68 text

Indexes Pro Tips CREATE INDEX CONCURRENTLY CREATE INDEX WHERE foo=bar SELECT * WHERE foo LIKE ‘%bar% is BAD

Slide 69

Slide 69 text

Indexes Pro Tips CREATE INDEX CONCURRENTLY CREATE INDEX WHERE foo=bar SELECT * WHERE foo LIKE ‘%bar% is BAD SELECT * WHERE Food LIKE ‘bar%’ is OKAY

Slide 70

Slide 70 text

Extensions dblink hstore citext ltree isn cube pgcrypto tablefunc uuid-ossp earthdistance trigram fuzzystrmatch pgrowlocks pgstattuple btree_gist dict_int dict_xsyn unaccent

Slide 71

Slide 71 text

No content

Slide 72

Slide 72 text

Index Hit Rate SELECT relname, 100 * idx_scan / (seq_scan + idx_scan), n_live_tup FROM pg_stat_user_tables ORDER BY n_live_tup DESC;

Slide 73

Slide 73 text

Cache Hit Rate SELECT relname::text, heap_blks_read + heap_blks_hit as reads, round(100 * heap_blks_hit / (heap_blks_hit + heap_blks_read)) as hit_pct, round(100 * idx_blks_hit / (idx_blks_hit + idx_blks_read)) as idx_hit_pct FROM pg_statio_user_tables WHERE heap_blks_hit + heap_blks_read + idx_blks_hit + idx_blks_read > 0 ORDER BY 2 DESC;

Slide 74

Slide 74 text

pg_stats_statements 9.2

Slide 75

Slide 75 text

pg_stats_statements $ select * from pg_stat_statements where query ~ 'from users where email'; userid ᴹ 16384 dbid ᴹ 16388 query ᴹ select * from users where email = ?; calls ᴹ 2 total_time ᴹ 0.000268 rows ᴹ 2 shared_blks_hit ᴹ 16 shared_blks_read ᴹ 0 shared_blks_dirtied ᴹ 0 shared_blks_written ᴹ 0 local_blks_hit ᴹ 0 local_blks_read ᴹ 0 local_blks_dirtied ᴹ 0 local_blks_written ᴹ 0 temp_blks_read ᴹ 0 temp_blks_written ᴹ 0 time_read ᴹ 0 time_write ᴹ 0 9.2

Slide 76

Slide 76 text

pg_stats_statements 9.2

Slide 77

Slide 77 text

pg_stats_statements SELECT query, calls, total_time, rows, 100.0 * shared_blks_hit / nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent FROM pg_stat_statements ORDER BY total_time DESC LIMIT 5; ---------------------------------------------------------------------- query | UPDATE pgbench_branches SET bbalance = bbalance + ? WHERE bid = ?; calls | 3000 total_time | 9609.00100000002 rows | 2836 hit_percent | 99.9778970000200936 9.2

Slide 78

Slide 78 text

No content

Slide 79

Slide 79 text

Querying

Slide 80

Slide 80 text

Window Functions Example: Biggest spender by state

Slide 81

Slide 81 text

SELECT email, users.data->'state', sum(total(items)), rank() OVER (PARTITION BY users.data->'state' ORDER BY sum(total(items)) desc) FROM users, purchases WHERE purchases.user_id = users.id GROUP BY 1, 2; Window Functions

Slide 82

Slide 82 text

SELECT email, users.data->'state', sum(total(items)), rank() OVER (PARTITION BY users.data->'state' ORDER BY sum(total(items)) desc) FROM users, purchases WHERE purchases.user_id = users.id GROUP BY 1, 2; Window Functions

Slide 83

Slide 83 text

Extensions dblink hstore citext ltree isn cube pgcrypto tablefunc uuid-ossp earthdistance trigram fuzzystrmatch pgrowlocks pgstattuple btree_gist dict_int dict_xsyn unaccent

Slide 84

Slide 84 text

Fuzzystrmatch

Slide 85

Slide 85 text

Fuzzystrmatch SELECT soundex('Craig'), soundex('Will'), difference('Craig', 'Will');

Slide 86

Slide 86 text

Fuzzystrmatch SELECT soundex('Craig'), soundex('Will'), difference('Craig', 'Will'); SELECT soundex('Craig'), soundex('Greg'), difference('Craig', 'Greg'); SELECT soundex('Willl'), soundex('Will'), difference('Willl', 'Will');

Slide 87

Slide 87 text

Moving Data Around \copy (SELECT * FROM users) TO ‘~/users.csv’; \copy users FROM ‘~/users.csv’;

Slide 88

Slide 88 text

db_link SELECT dblink_connect('myconn', 'dbname=postgres'); SELECT * FROM dblink('myconn','SELECT * FROM foo') AS t(a int, b text); a | b -------+------------ 1 | example 2 | example2

Slide 89

Slide 89 text

Foreign Data Wrappers oracle mysql informix twitter files www couch sybase ldap odbc s3 redis jdbc

Slide 90

Slide 90 text

CREATE EXTENSION redis_fdw; CREATE SERVER redis_server FOREIGN DATA WRAPPER redis_fdw OPTIONS (address '127.0.0.1', port '6379'); CREATE FOREIGN TABLE redis_db0 (key text, value text) SERVER redis_server OPTIONS (database '0'); CREATE USER MAPPING FOR PUBLIC SERVER redis_server OPTIONS (password 'secret'); Foreign Data Wrappers

Slide 91

Slide 91 text

SELECT id, email, value as visits FROM users, redis_db0 WHERE ('user_' || cast(id as text)) = cast(redis_db0.key as text) AND cast(value as int) > 10; Query Redis from Postgres SELECT * FROM redis_db0;

Slide 92

Slide 92 text

Readability

Slide 93

Slide 93 text

Readability WITH top_5_products AS ( SELECT products.*, count(*) FROM products, line_items WHERE products.id = line_items.product_id GROUP BY products.id ORDER BY count(*) DESC LIMIT 5 ) SELECT users.email, count(*) FROM users, line_items, top_5_products WHERE line_items.user_id = users.id AND line_items.product_id = top_5_products.id GROUP BY 1 ORDER BY 1;

Slide 94

Slide 94 text

Common Table Expressions WITH top_5_products AS ( SELECT products.*, count(*) FROM products, line_items WHERE products.id = line_items.product_id GROUP BY products.id ORDER BY count(*) DESC LIMIT 5 ) SELECT users.email, count(*) FROM users, line_items, top_5_products WHERE line_items.user_id = users.id AND line_items.product_id = top_5_products.id GROUP BY 1 ORDER BY 1;

Slide 95

Slide 95 text

Brief History Developing w/ Postgres Postgres Performance Querying

Slide 96

Slide 96 text

Extras

Slide 97

Slide 97 text

Extras Listen/Notify

Slide 98

Slide 98 text

Extras Listen/Notify Per Transaction Synchronous Replication

Slide 99

Slide 99 text

Extras Listen/Notify Per Transaction Synchronous Replication Drop index concurrently

Slide 100

Slide 100 text

Postgres - TLDR Datatypes Conditional Indexes Transactional DDL Foreign Data Wrappers Concurrent Index Creation Extensions Common Table Expressions Fast Column Addition Listen/Notify Table Inheritance Per Transaction sync replication Window functions NoSQL inside SQL Momentum