Postgres – A Data Platform

Slide 1

Slide 1 text

Postgres: A Data Platform craigkerstiens

Slide 2

Slide 2 text

Who am I Product @ Heroku craigkerstiens.com postgresweekly.com postgresguide.com craigkerstiens

Slide 3

Slide 3 text

Postgres – TLDR; Datatypes Conditional Indexes Transactional DDL Foreign Data Wrappers Concurrent Index Creation Extensions Common Table Expressions Fast Column Addition Listen/Notify Table Inheritance Per Transaction sync replication Window functions NoSQL inside SQL Momentum craigkerstiens

Slide 4

Slide 4 text

Rough outline Context Day to day Datatypes Performance Expanding postgres

Slide 5

Slide 5 text

1989 • Gas was $1.09 • Movie ticket $2.75 • Microsoft releases Windows 1.0 • Postgres • Project called Postgres is released • Roots in Ingres (post-ingres) Context | Day to day | Datatypes | Performance | Expanding Postgres

Slide 6

Slide 6 text

1996 It might help to explain that the pronunciation is "post-gres" or! "post-gres-cue-ell", not "post-gray-something".! ! I heard people making this same mistake in presentations at this! past weekend's Postgres Anniversary Conference :-( Arguably,! the 1996 decision to call it PostgreSQL instead of reverting to! plain Postgres was the single worst mistake this project ever made.! It seems far too late to change now, though.! ! ! ! ! regards, tom lane! Context | Day to day | Datatypes | Performance | Expanding Postgres

Slide 7

Slide 7 text

MVCC Immutable structure ! Reads don’t block writes, each transaction sees what the state was when it started ! Context | Day to day | Datatypes | Performance | Expanding Postgres

Slide 8

Slide 8 text

Under the covers A giant append only log ! Context | Day to day | Datatypes | Performance | Expanding Postgres Writing data adds to the log Updating data adds to the log Deleting data adds to the log

Slide 9

Slide 9 text

A tangent Context | Day to day | Datatypes | Performance | Expanding Postgres

Slide 10

Slide 10 text

What is a relational DB • Stores the data and how its related • Data is in a flat two dimensional space • Has relationships between the data ! • Really though it’s math Context | Day to day | Datatypes | Performance | Expanding Postgres

Slide 11

Slide 11 text

What is SQL • SQL means SQL to access it • SQL also implies relational • A car has an owner • An owner has an address Context | Day to day | Datatypes | Performance | Expanding Postgres

Slide 12

Slide 12 text

NoSQL is overloaded It can relate to CAP but doesn’t necessarily • Consistency • Availability • Partition Tolerance ! • It can relate to SQL or user experience Context | Day to day | Datatypes | Performance | Expanding Postgres

Slide 13

Slide 13 text

Databases broken down • Relational databases • Key-Value stores • Document databases • Text search solutions • Distributed data stores • Time series Context | Day to day | Datatypes | Performance | Expanding Postgres

Slide 14

Slide 14 text

Day to Day

Slide 15

Slide 15 text

Postgres bag of tricks • My editor of choice • Writing better SQL • Misc tips/tricks Context | Day to day | Datatypes | Performance | Expanding Postgres

Slide 16

Slide 16 text

PSQL

Slide 17

Slide 17 text

PSQL \e - opens your default $EDITOR \d - describe something \dt - list all tables \x auto - pretty results \x help - help .psqlrc - really geek out (pro tip - name your queries) Context | Day to day | Datatypes | Performance | Expanding Postgres

Slide 18

Slide 18 text

Formatting SQL SELECT foo FROM bar a, baz b WHERE a.this = b.that AND condition2 = false GROUP BY 1 ORDER BY 1 ASC; Context | Day to day | Datatypes | Performance | Expanding Postgres

Slide 19

Slide 19 text

Slide 20

Slide 20 text

Common Table Expressions

Slide 21

Slide 21 text

An example ! --- Calculates the projects per each user tasks_per_project_per_user AS ( SELECT user_id, project_id, count(*) as task_count FROM tasks GROUP BY user_id, project_id ), ! --- Gets user ids that have over 50% of tasks assigned overloaded_users AS ( SELECT tasks_per_project_per_user.user_id, ! FROM tasks_per_project_per_user, total_tasks_per_project WHERE tasks_per_project_per_user.task_count > (total_tasks_per_project / 2) ) ! SELECT email, task_list, title FROM users_tasks, overloaded_users WHERE users_tasks.user_id = overloaded_users.user_id --- Initial query to grab project title and tasks per user WITH users_tasks AS ( SELECT users.id as user_id, users.email, array_agg(tasks.name) as task_list, projects.title FROM users, tasks, project WHERE users.id = tasks.user_id projects.title = tasks.project_id GROUP BY users.email, projects.title ), ! --- Calculates the total tasks per each project total_tasks_per_project AS ( SELECT project_id, count(*) as task_count FROM tasks GROUP BY project_id ), Context | Day to day | Datatypes | Performance | Expanding Postgres

Slide 22

Slide 22 text

An example --- Initial query to grab project title and tasks per user WITH users_tasks AS ( SELECT users.id as user_id, users.email, array_agg(tasks.name) as task_list, projects.title FROM users, tasks, project WHERE users.id = tasks.user_id projects.title = tasks.project_id GROUP BY users.email, projects.title ), Context | Day to day | Datatypes | Performance | Expanding Postgres

Slide 23

Slide 23 text

An example --- Calculates the total tasks per each project total_tasks_per_project AS ( SELECT project_id, count(*) as task_count FROM tasks GROUP BY project_id ), Context | Day to day | Datatypes | Performance | Expanding Postgres

Slide 24

Slide 24 text

An example --- Calculates the projects per each user tasks_per_project_per_user AS ( SELECT user_id, project_id, count(*) as task_count FROM tasks GROUP BY user_id, project_id ), Context | Day to day | Datatypes | Performance | Expanding Postgres

Slide 25

Slide 25 text

An example --- Gets user ids that have over 50% of tasks assigned overloaded_users AS ( SELECT tasks_per_project_per_user.user_id, ! FROM tasks_per_project_per_user, total_tasks_per_project WHERE tasks_per_project_per_user.task_count > (total_tasks_per_project / 2) ) Context | Day to day | Datatypes | Performance | Expanding Postgres

Slide 26

Slide 26 text

An example SELECT email, task_list, title FROM users_tasks, overloaded_users WHERE users_tasks.user_id = overloaded_users.user_id Context | Day to day | Datatypes | Performance | Expanding Postgres

Slide 27

Slide 27 text

Window Functions

Slide 28

Slide 28 text

An Example SELECT last_name, salary, department, rank() OVER (PARTITION BY department ORDER BY salary DESC) FROM employees; ! ! ! last_name salary department rank Jones 45000 Accounting 1 Williams 37000 Accounting 2 Smith 55000 Sales 1 Adams 50000 Sales 2 Johnson 40000 Marketing 1

Slide 29

Slide 29 text

Extensions

Slide 30

Slide 30 text

Extensions hstore citext db_link uuid_ossp cube PostGIS intarray pg_crypto isn table_func unaccent dict_int dict_xsyn earth_distance pg_stat_tuple ltree pg_row_locks trigram

Slide 31

Slide 31 text

Datatypes

Slide 32

Slide 32 text

The list integer float serial money character bytea timestamp timestamp with tz date time interval Context | Day to day | Datatypes | Performance | Expanding Postgres boolean enums point line box path polygon circle inet cidr bit tsvector tsquery UUID XML JSON array rangetypes

Slide 33

Slide 33 text

Money • Don’t use it • Created when the world was flat • Money only knows a single currency Context | Day to day | Datatypes | Performance | Expanding Postgres

Slide 34

Slide 34 text

Serial Context | Day to day | Datatypes | Performance | Expanding Postgres

Slide 35

Slide 35 text

Serial • Serial has a limit • UUID scales • UUID doesn’t leak your primary keys • uuid_ossp extension UUID Context | Day to day | Datatypes | Performance | Expanding Postgres

Slide 36

Slide 36 text

Shapes • Basic ones for a shortcut • earth_distance ! • PostGIS for robustness Context | Day to day | Datatypes | Performance | Expanding Postgres

Slide 37

Slide 37 text

XML Just kidding

Slide 38

Slide 38 text

XML

Slide 39

Slide 39 text

JSON vs. hStore • NoSQL often means schema-less ! • Adding columns is annoying • Just add data Context | Day to day | Datatypes | Performance | Expanding Postgres

Slide 40

Slide 40 text

hStore • Key value store directly in Postgres ! • Can filter where a key does/doesn’t exist • Can filter for specific values of keys Context | Day to day | Datatypes | Performance | Expanding Postgres

Slide 41

Slide 41 text

JSON Postgres 9.2 was the JSON release Marketing Postgres 9.3 started to become usable operators, basic indexing, etc. Postgres 9.4 Context | Day to day | Datatypes | Performance | Expanding Postgres

Slide 42

Slide 42 text

JSON Context | Day to day | Datatypes | Performance | Expanding Postgres

Slide 43

Slide 43 text

JSONB Context | Day to day | Datatypes | Performance | Expanding Postgres

Slide 44

Slide 44 text

Others • Timestamps with timezone • Intervals (now() - ‘1 hour’::interval) • Array • Range types Context | Day to day | Datatypes | Performance | Expanding Postgres

Slide 45

Slide 45 text

Performance

Slide 46

Slide 46 text

Mo cache, Less problems

Slide 47

Slide 47 text

Cache SELECT 'index hit rate' as name, (sum(idx_blks_hit) - sum(idx_blks_read)) / sum(idx_blks_hit + idx_blks_read) as ratio FROM pg_statio_user_indexes union all SELECT 'cache hit rate' as name, case sum(idx_blks_hit) when 0 then 'NaN'::numeric else to_char((sum(idx_blks_hit) - sum(idx_blks_read)) / sum(idx_blks_hit + idx_blks_read), '99.99')::numeric end as ratio FROM pg_statio_user_indexes) Context | Day to day | Datatypes | Performance | Expanding Postgres

Slide 48

Slide 48 text

Slide 49

Slide 49 text

Index Hit Rate SELECT relname, 100 * idx_scan / (seq_scan + idx_scan), n_live_tup FROM pg_stat_user_tables ORDER BY n_live_tup DESC; Context | Day to day | Datatypes | Performance | Expanding Postgres

Slide 50

Slide 50 text

Slide 51

Slide 51 text

Rule of thumb Cache > 99% Index hit rate > 95% where rows > 10000 Context | Day to day | Datatypes | Performance | Expanding Postgres

Slide 52

Slide 52 text

Speciﬁc Query Performance Context | Day to day | Datatypes | Performance | Expanding Postgres

Slide 53

Slide 53 text

An example SELECT last_name FROM employees WHERE salary >= 50000; Context | Day to day | Datatypes | Performance | Expanding Postgres

Slide 54

Slide 54 text

Context | Day to day | Datatypes | Performance | Expanding Postgres # EXPLAIN SELECT last_name FROM employees WHERE salary >= 50000; ! QUERY PLAN -------------------------------------------------- Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) Filter: (salary >= 50000) (3 rows) Query plans

Slide 55

Slide 55 text

Query plans # EXPLAIN SELECT last_name FROM employees WHERE salary >= 50000; ! QUERY PLAN -------------------------------------------------- Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) Filter: (salary >= 50000) (3 rows) Context | Day to day | Datatypes | Performance | Expanding Postgres

Slide 56

Slide 56 text

Query plans # EXPLAIN ANALYZE SELECT last_name FROM employees WHERE salary >= 50000; QUERY PLAN -------------------------------------------------- Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actual time=2.401..295.247 rows=1428 loops=1) Filter: (salary >= 50000) Total runtime: 295.379 (3 rows) ! Filter: (salary >= 50000) (3 rows) Context | Day to day | Datatypes | Performance | Expanding Postgres

Slide 57

Slide 57 text

Indexes fix it # CREATE INDEX idx_emps ON employees (salary); Context | Day to day | Datatypes | Performance | Expanding Postgres

Slide 58

Slide 58 text

EXPLAIN ANALYZE SELECT last_name FROM employees WHERE salary >= 50000; QUERY PLAN -------------------------------------------------- Index Scan using idx_emps on employees (cost=0.00..8.49 rows=1 width=6) (actual time = 0.047..1.603 rows=1428 loops=1) Index Cond: (salary >= 50000) Total runtime: 1.771 ms (3 rows) Context | Day to day | Datatypes | Performance | Expanding Postgres Indexes fix it

Slide 59

Slide 59 text

pg_stat_statments $ select * from pg_stat_statements where query ~ 'from users where email'; ! ! userid │ 16384 dbid │ 16388 query │ select * from users where email = ?; calls │ 2 total_time │ 0.000268 rows │ 2 shared_blks_hit │ 16 shared_blks_read │ 0 shared_blks_dirtied │ 0 shared_blks_written │ 0 local_blks_hit │ 0 local_blks_read │ 0 local_blks_dirtied │ 0 local_blks_written │ 0 ... Context | Day to day | Datatypes | Performance | Expanding Postgres

Slide 60

Slide 60 text

pg_stat_statements SELECT (total_time / 1000 / 60) as total, (total_time/calls) as avg, query FROM pg_stat_statements ORDER BY 1 DESC LIMIT 100; Context | Day to day | Datatypes | Performance | Expanding Postgres

Slide 61

Slide 61 text

Slide 62

Slide 62 text

Indexes B-Tree Generalized Inverted Index (GIN) Generalized Search Tree (GIST) K Nearest Neighbors (KNN) Space Partitioned GIST (SP-GIST) Context | Day to day | Datatypes | Performance | Expanding Postgres

Slide 63

Slide 63 text

Indexes Which do I use? Context | Day to day | Datatypes | Performance | Expanding Postgres craigkerstiens

Slide 64

Slide 64 text

BTree If you don’t know better, you’re using this. This is usually okay. Context | Day to day | Datatypes | Performance | Expanding Postgres

Slide 65

Slide 65 text

Generalized Inverted Index (GIN) Multiple values in a single column. ! Rule of thumb is Array/hStore Context | Day to day | Datatypes | Performance | Expanding Postgres

Slide 66

Slide 66 text

Generalized Search Tree (GiST) Values that span across a boundary ! Rule of thumb is full text and shapes Context | Day to day | Datatypes | Performance | Expanding Postgres

Slide 67

Slide 67 text

Indexes B-Tree Generalized Inverted Index (GIN) Generalized Search Tree (GIST) K Nearest Neighbors (KNN) Space Partitioned GIST (SP-GIST) VODKA (coming soon) Context | Day to day | Datatypes | Performance | Expanding Postgres

Slide 68

Slide 68 text

There’s more Context | Day to day | Datatypes | Performance | Expanding Postgres

Slide 69

Slide 69 text

Conditional & Functional Conditional ! CREATE INDEX … WHERE deleted_at is null ! Functional ! CREATE INDEX … LOWER(firstname) Context | Day to day | Datatypes | Performance | Expanding Postgres

Slide 70

Slide 70 text

Backups Logical ! Portable Has load on DB ! < 50 GB Context | Day to day | Datatypes | Performance | Expanding Postgres Phsyical ! More setup Limited load on DB ! > 50 GB

Slide 71

Slide 71 text

Replication Slony Londiste Bucardo PgPool WAL-e Barman Context | Day to day | Datatypes | Performance | Expanding Postgres

Slide 72

Slide 72 text

Replication Slony Londiste Bucardo PgPool WAL-e Barman Context | Day to day | Datatypes | Performance | Expanding Postgres

Slide 73

Slide 73 text

Expanding Postgres

Slide 74

Slide 74 text

Slide 75

Slide 75 text

Slide 76

Slide 76 text

What is Postgres Context | Day to day | Datatypes | Performance | Expanding Postgres

Slide 77

Slide 77 text

Foreign Data Wrappers • Query from Postgres 1 to Postgres 2 • Query other things from Postgres • Redis • Mongo • LDAP • Twitter? Context | Day to day | Datatypes | Performance | Expanding Postgres