Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Postgres – A Data Platform

Craig Kerstiens
October 09, 2014
420

Postgres – A Data Platform

Craig Kerstiens

October 09, 2014
Tweet

Transcript

  1. Postgres – TLDR; Datatypes Conditional Indexes Transactional DDL Foreign Data

    Wrappers Concurrent Index Creation Extensions Common Table Expressions Fast Column Addition Listen/Notify Table Inheritance Per Transaction sync replication Window functions NoSQL inside SQL Momentum craigkerstiens
  2. 1989 • Gas was $1.09 • Movie ticket $2.75 •

    Microsoft releases Windows 1.0 • Postgres • Project called Postgres is released • Roots in Ingres (post-ingres) Context | Day to day | Datatypes | Performance | Expanding Postgres
  3. 1996 It might help to explain that the pronunciation is

    "post-gres" or! "post-gres-cue-ell", not "post-gray-something".! ! I heard people making this same mistake in presentations at this! past weekend's Postgres Anniversary Conference :-( Arguably,! the 1996 decision to call it PostgreSQL instead of reverting to! plain Postgres was the single worst mistake this project ever made.! It seems far too late to change now, though.! ! ! ! ! regards, tom lane! Context | Day to day | Datatypes | Performance | Expanding Postgres
  4. MVCC Immutable structure ! Reads don’t block writes, each transaction

    sees what the state was when it started ! Context | Day to day | Datatypes | Performance | Expanding Postgres
  5. Under the covers A giant append only log ! Context

    | Day to day | Datatypes | Performance | Expanding Postgres Writing data adds to the log Updating data adds to the log Deleting data adds to the log
  6. A tangent Context | Day to day | Datatypes |

    Performance | Expanding Postgres
  7. What is a relational DB • Stores the data and

    how its related • Data is in a flat two dimensional space • Has relationships between the data ! • Really though it’s math Context | Day to day | Datatypes | Performance | Expanding Postgres
  8. What is SQL • SQL means SQL to access it

    • SQL also implies relational • A car has an owner • An owner has an address Context | Day to day | Datatypes | Performance | Expanding Postgres
  9. NoSQL is overloaded It can relate to CAP but doesn’t

    necessarily • Consistency • Availability • Partition Tolerance ! • It can relate to SQL or user experience Context | Day to day | Datatypes | Performance | Expanding Postgres
  10. Databases broken down • Relational databases • Key-Value stores •

    Document databases • Text search solutions • Distributed data stores • Time series Context | Day to day | Datatypes | Performance | Expanding Postgres
  11. Postgres bag of tricks • My editor of choice •

    Writing better SQL • Misc tips/tricks Context | Day to day | Datatypes | Performance | Expanding Postgres
  12. PSQL \e - opens your default $EDITOR \d - describe

    something \dt - list all tables \x auto - pretty results \x help - help .psqlrc - really geek out (pro tip - name your queries) Context | Day to day | Datatypes | Performance | Expanding Postgres
  13. Formatting SQL SELECT foo FROM bar a, baz b WHERE

    a.this = b.that AND condition2 = false GROUP BY 1 ORDER BY 1 ASC; Context | Day to day | Datatypes | Performance | Expanding Postgres
  14. An example ! --- Calculates the projects per each user

    tasks_per_project_per_user AS ( SELECT user_id, project_id, count(*) as task_count FROM tasks GROUP BY user_id, project_id ), ! --- Gets user ids that have over 50% of tasks assigned overloaded_users AS ( SELECT tasks_per_project_per_user.user_id, ! FROM tasks_per_project_per_user, total_tasks_per_project WHERE tasks_per_project_per_user.task_count > (total_tasks_per_project / 2) ) ! SELECT email, task_list, title FROM users_tasks, overloaded_users WHERE users_tasks.user_id = overloaded_users.user_id --- Initial query to grab project title and tasks per user WITH users_tasks AS ( SELECT users.id as user_id, users.email, array_agg(tasks.name) as task_list, projects.title FROM users, tasks, project WHERE users.id = tasks.user_id projects.title = tasks.project_id GROUP BY users.email, projects.title ), ! --- Calculates the total tasks per each project total_tasks_per_project AS ( SELECT project_id, count(*) as task_count FROM tasks GROUP BY project_id ), Context | Day to day | Datatypes | Performance | Expanding Postgres
  15. An example --- Initial query to grab project title and

    tasks per user WITH users_tasks AS ( SELECT users.id as user_id, users.email, array_agg(tasks.name) as task_list, projects.title FROM users, tasks, project WHERE users.id = tasks.user_id projects.title = tasks.project_id GROUP BY users.email, projects.title ), Context | Day to day | Datatypes | Performance | Expanding Postgres
  16. An example --- Calculates the total tasks per each project

    total_tasks_per_project AS ( SELECT project_id, count(*) as task_count FROM tasks GROUP BY project_id ), Context | Day to day | Datatypes | Performance | Expanding Postgres
  17. An example --- Calculates the projects per each user tasks_per_project_per_user

    AS ( SELECT user_id, project_id, count(*) as task_count FROM tasks GROUP BY user_id, project_id ), Context | Day to day | Datatypes | Performance | Expanding Postgres
  18. An example --- Gets user ids that have over 50%

    of tasks assigned overloaded_users AS ( SELECT tasks_per_project_per_user.user_id, ! FROM tasks_per_project_per_user, total_tasks_per_project WHERE tasks_per_project_per_user.task_count > (total_tasks_per_project / 2) ) Context | Day to day | Datatypes | Performance | Expanding Postgres
  19. An example SELECT email, task_list, title FROM users_tasks, overloaded_users WHERE

    users_tasks.user_id = overloaded_users.user_id Context | Day to day | Datatypes | Performance | Expanding Postgres
  20. An Example SELECT last_name, salary, department, rank() OVER (PARTITION BY

    department ORDER BY salary DESC) FROM employees; ! ! ! last_name salary department rank Jones 45000 Accounting 1 Williams 37000 Accounting 2 Smith 55000 Sales 1 Adams 50000 Sales 2 Johnson 40000 Marketing 1
  21. Extensions hstore citext db_link uuid_ossp cube PostGIS intarray pg_crypto isn

    table_func unaccent dict_int dict_xsyn earth_distance pg_stat_tuple ltree pg_row_locks trigram
  22. The list integer float serial money character bytea timestamp timestamp

    with tz date time interval Context | Day to day | Datatypes | Performance | Expanding Postgres boolean enums point line box path polygon circle inet cidr bit tsvector tsquery UUID XML JSON array rangetypes
  23. Money • Don’t use it • Created when the world

    was flat • Money only knows a single currency Context | Day to day | Datatypes | Performance | Expanding Postgres
  24. Serial • Serial has a limit • UUID scales •

    UUID doesn’t leak your primary keys • uuid_ossp extension UUID Context | Day to day | Datatypes | Performance | Expanding Postgres
  25. Shapes • Basic ones for a shortcut • earth_distance !

    • PostGIS for robustness Context | Day to day | Datatypes | Performance | Expanding Postgres
  26. XML

  27. JSON vs. hStore • NoSQL often means schema-less ! •

    Adding columns is annoying • Just add data Context | Day to day | Datatypes | Performance | Expanding Postgres
  28. hStore • Key value store directly in Postgres ! •

    Can filter where a key does/doesn’t exist • Can filter for specific values of keys Context | Day to day | Datatypes | Performance | Expanding Postgres
  29. JSON Postgres 9.2 was the JSON release Marketing Postgres 9.3

    started to become usable operators, basic indexing, etc. Postgres 9.4 Context | Day to day | Datatypes | Performance | Expanding Postgres
  30. Others • Timestamps with timezone • Intervals (now() - ‘1

    hour’::interval) • Array • Range types Context | Day to day | Datatypes | Performance | Expanding Postgres
  31. Cache SELECT 'index hit rate' as name, (sum(idx_blks_hit) - sum(idx_blks_read))

    / sum(idx_blks_hit + idx_blks_read) as ratio FROM pg_statio_user_indexes union all SELECT 'cache hit rate' as name, case sum(idx_blks_hit) when 0 then 'NaN'::numeric else to_char((sum(idx_blks_hit) - sum(idx_blks_read)) / sum(idx_blks_hit + idx_blks_read), '99.99')::numeric end as ratio FROM pg_statio_user_indexes) Context | Day to day | Datatypes | Performance | Expanding Postgres
  32. Cache name | ratio ----------------+------------------------ cache hit rate | 0.99

    Context | Day to day | Datatypes | Performance | Expanding Postgres
  33. Index Hit Rate SELECT relname, 100 * idx_scan / (seq_scan

    + idx_scan), n_live_tup FROM pg_stat_user_tables ORDER BY n_live_tup DESC; Context | Day to day | Datatypes | Performance | Expanding Postgres
  34. Index Hit Rate relname | percent_used | rows_in_table ------------------------------------+--------------- events

    | 0 | 669917 app_infos_user_info | 0 | 198218 app_infos | 50 | 175640 user_info | 3 | 46718 rollouts | 0 | 34078 favorites | 0 | 3059 schema_migrations | 0 | 2 authorizations | 0 | 0 delayed_jobs | 23 | 0 Context | Day to day | Datatypes | Performance | Expanding Postgres
  35. Rule of thumb Cache > 99% Index hit rate >

    95% where rows > 10000 Context | Day to day | Datatypes | Performance | Expanding Postgres
  36. An example SELECT last_name FROM employees WHERE salary >= 50000;

    Context | Day to day | Datatypes | Performance | Expanding Postgres
  37. Context | Day to day | Datatypes | Performance |

    Expanding Postgres # EXPLAIN SELECT last_name FROM employees WHERE salary >= 50000; ! QUERY PLAN -------------------------------------------------- Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) Filter: (salary >= 50000) (3 rows) Query plans
  38. Query plans # EXPLAIN SELECT last_name FROM employees WHERE salary

    >= 50000; ! QUERY PLAN -------------------------------------------------- Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) Filter: (salary >= 50000) (3 rows) Context | Day to day | Datatypes | Performance | Expanding Postgres
  39. Query plans # EXPLAIN ANALYZE SELECT last_name FROM employees WHERE

    salary >= 50000; QUERY PLAN -------------------------------------------------- Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actual time=2.401..295.247 rows=1428 loops=1) Filter: (salary >= 50000) Total runtime: 295.379 (3 rows) ! Filter: (salary >= 50000) (3 rows) Context | Day to day | Datatypes | Performance | Expanding Postgres
  40. Indexes fix it # CREATE INDEX idx_emps ON employees (salary);

    Context | Day to day | Datatypes | Performance | Expanding Postgres
  41. EXPLAIN ANALYZE SELECT last_name FROM employees WHERE salary >= 50000;

    QUERY PLAN -------------------------------------------------- Index Scan using idx_emps on employees (cost=0.00..8.49 rows=1 width=6) (actual time = 0.047..1.603 rows=1428 loops=1) Index Cond: (salary >= 50000) Total runtime: 1.771 ms (3 rows) Context | Day to day | Datatypes | Performance | Expanding Postgres Indexes fix it
  42. pg_stat_statments $ select * from pg_stat_statements where query ~ 'from

    users where email'; ! ! userid │ 16384 dbid │ 16388 query │ select * from users where email = ?; calls │ 2 total_time │ 0.000268 rows │ 2 shared_blks_hit │ 16 shared_blks_read │ 0 shared_blks_dirtied │ 0 shared_blks_written │ 0 local_blks_hit │ 0 local_blks_read │ 0 local_blks_dirtied │ 0 local_blks_written │ 0 ... Context | Day to day | Datatypes | Performance | Expanding Postgres
  43. pg_stat_statements SELECT (total_time / 1000 / 60) as total, (total_time/calls)

    as avg, query FROM pg_stat_statements ORDER BY 1 DESC LIMIT 100; Context | Day to day | Datatypes | Performance | Expanding Postgres
  44. pg_stat_statements total | avg | query --------+--------+------------------------- 295.76 | 10.13

    | SELECT id FROM users... 219.13 | 80.24 | SELECT * FROM ... (2 rows) Context | Day to day | Datatypes | Performance | Expanding Postgres
  45. Indexes B-Tree Generalized Inverted Index (GIN) Generalized Search Tree (GIST)

    K Nearest Neighbors (KNN) Space Partitioned GIST (SP-GIST) Context | Day to day | Datatypes | Performance | Expanding Postgres
  46. Indexes Which do I use? Context | Day to day

    | Datatypes | Performance | Expanding Postgres craigkerstiens
  47. BTree If you don’t know better, you’re using this. This

    is usually okay. Context | Day to day | Datatypes | Performance | Expanding Postgres
  48. Generalized Inverted Index (GIN) Multiple values in a single column.

    ! Rule of thumb is Array/hStore Context | Day to day | Datatypes | Performance | Expanding Postgres
  49. Generalized Search Tree (GiST) Values that span across a boundary

    ! Rule of thumb is full text and shapes Context | Day to day | Datatypes | Performance | Expanding Postgres
  50. Indexes B-Tree Generalized Inverted Index (GIN) Generalized Search Tree (GIST)

    K Nearest Neighbors (KNN) Space Partitioned GIST (SP-GIST) VODKA (coming soon) Context | Day to day | Datatypes | Performance | Expanding Postgres
  51. There’s more Context | Day to day | Datatypes |

    Performance | Expanding Postgres
  52. Conditional & Functional Conditional ! CREATE INDEX … WHERE deleted_at

    is null ! Functional ! CREATE INDEX … LOWER(firstname) Context | Day to day | Datatypes | Performance | Expanding Postgres
  53. Backups Logical ! Portable Has load on DB ! <

    50 GB Context | Day to day | Datatypes | Performance | Expanding Postgres Phsyical ! More setup Limited load on DB ! > 50 GB
  54. Replication Slony Londiste Bucardo PgPool WAL-e Barman Context | Day

    to day | Datatypes | Performance | Expanding Postgres
  55. Replication Slony Londiste Bucardo PgPool WAL-e Barman Context | Day

    to day | Datatypes | Performance | Expanding Postgres
  56. Databases broken down • Relational databases • Key-Value stores •

    Document databases • Text search solutions • Distributed data stores • Time series Context | Day to day | Datatypes | Performance | Expanding Postgres
  57. Databases broken down • Relational databases • Key-Value stores •

    Document databases • Text search solutions • Distributed data stores • Time series Context | Day to day | Datatypes | Performance | Expanding Postgres
  58. What is Postgres Context | Day to day | Datatypes

    | Performance | Expanding Postgres
  59. Foreign Data Wrappers • Query from Postgres 1 to Postgres

    2 • Query other things from Postgres • Redis • Mongo • LDAP • Twitter? Context | Day to day | Datatypes | Performance | Expanding Postgres
  60. Postgres – TLDR; Datatypes Conditional Indexes Transactional DDL Foreign Data

    Wrappers Concurrent Index Creation Extensions Common Table Expressions Fast Column Addition Listen/Notify Table Inheritance Per Transaction sync replication Window functions NoSQL inside SQL Momentum craigkerstiens