Postgres – A Data Platform

240da217b000604a2302dfc8b02a9cad?s=47 Craig Kerstiens
October 09, 2014
360

Postgres – A Data Platform

240da217b000604a2302dfc8b02a9cad?s=128

Craig Kerstiens

October 09, 2014
Tweet

Transcript

  1. Postgres: A Data Platform craigkerstiens

  2. Who am I Product @ Heroku craigkerstiens.com postgresweekly.com postgresguide.com craigkerstiens

  3. Postgres – TLDR; Datatypes Conditional Indexes Transactional DDL Foreign Data

    Wrappers Concurrent Index Creation Extensions Common Table Expressions Fast Column Addition Listen/Notify Table Inheritance Per Transaction sync replication Window functions NoSQL inside SQL Momentum craigkerstiens
  4. Rough outline Context Day to day Datatypes Performance Expanding postgres

  5. 1989 • Gas was $1.09 • Movie ticket $2.75 •

    Microsoft releases Windows 1.0 • Postgres • Project called Postgres is released • Roots in Ingres (post-ingres) Context | Day to day | Datatypes | Performance | Expanding Postgres
  6. 1996 It might help to explain that the pronunciation is

    "post-gres" or! "post-gres-cue-ell", not "post-gray-something".! ! I heard people making this same mistake in presentations at this! past weekend's Postgres Anniversary Conference :-( Arguably,! the 1996 decision to call it PostgreSQL instead of reverting to! plain Postgres was the single worst mistake this project ever made.! It seems far too late to change now, though.! ! ! ! ! regards, tom lane! Context | Day to day | Datatypes | Performance | Expanding Postgres
  7. MVCC Immutable structure ! Reads don’t block writes, each transaction

    sees what the state was when it started ! Context | Day to day | Datatypes | Performance | Expanding Postgres
  8. Under the covers A giant append only log ! Context

    | Day to day | Datatypes | Performance | Expanding Postgres Writing data adds to the log Updating data adds to the log Deleting data adds to the log
  9. A tangent Context | Day to day | Datatypes |

    Performance | Expanding Postgres
  10. What is a relational DB • Stores the data and

    how its related • Data is in a flat two dimensional space • Has relationships between the data ! • Really though it’s math Context | Day to day | Datatypes | Performance | Expanding Postgres
  11. What is SQL • SQL means SQL to access it

    • SQL also implies relational • A car has an owner • An owner has an address Context | Day to day | Datatypes | Performance | Expanding Postgres
  12. NoSQL is overloaded It can relate to CAP but doesn’t

    necessarily • Consistency • Availability • Partition Tolerance ! • It can relate to SQL or user experience Context | Day to day | Datatypes | Performance | Expanding Postgres
  13. Databases broken down • Relational databases • Key-Value stores •

    Document databases • Text search solutions • Distributed data stores • Time series Context | Day to day | Datatypes | Performance | Expanding Postgres
  14. Day to Day

  15. Postgres bag of tricks • My editor of choice •

    Writing better SQL • Misc tips/tricks Context | Day to day | Datatypes | Performance | Expanding Postgres
  16. PSQL

  17. PSQL \e - opens your default $EDITOR \d - describe

    something \dt - list all tables \x auto - pretty results \x help - help .psqlrc - really geek out (pro tip - name your queries) Context | Day to day | Datatypes | Performance | Expanding Postgres
  18. Formatting SQL SELECT foo FROM bar a, baz b WHERE

    a.this = b.that AND condition2 = false GROUP BY 1 ORDER BY 1 ASC; Context | Day to day | Datatypes | Performance | Expanding Postgres
  19. More readable SQL

  20. Common Table Expressions

  21. An example ! --- Calculates the projects per each user

    tasks_per_project_per_user AS ( SELECT user_id, project_id, count(*) as task_count FROM tasks GROUP BY user_id, project_id ), ! --- Gets user ids that have over 50% of tasks assigned overloaded_users AS ( SELECT tasks_per_project_per_user.user_id, ! FROM tasks_per_project_per_user, total_tasks_per_project WHERE tasks_per_project_per_user.task_count > (total_tasks_per_project / 2) ) ! SELECT email, task_list, title FROM users_tasks, overloaded_users WHERE users_tasks.user_id = overloaded_users.user_id --- Initial query to grab project title and tasks per user WITH users_tasks AS ( SELECT users.id as user_id, users.email, array_agg(tasks.name) as task_list, projects.title FROM users, tasks, project WHERE users.id = tasks.user_id projects.title = tasks.project_id GROUP BY users.email, projects.title ), ! --- Calculates the total tasks per each project total_tasks_per_project AS ( SELECT project_id, count(*) as task_count FROM tasks GROUP BY project_id ), Context | Day to day | Datatypes | Performance | Expanding Postgres
  22. An example --- Initial query to grab project title and

    tasks per user WITH users_tasks AS ( SELECT users.id as user_id, users.email, array_agg(tasks.name) as task_list, projects.title FROM users, tasks, project WHERE users.id = tasks.user_id projects.title = tasks.project_id GROUP BY users.email, projects.title ), Context | Day to day | Datatypes | Performance | Expanding Postgres
  23. An example --- Calculates the total tasks per each project

    total_tasks_per_project AS ( SELECT project_id, count(*) as task_count FROM tasks GROUP BY project_id ), Context | Day to day | Datatypes | Performance | Expanding Postgres
  24. An example --- Calculates the projects per each user tasks_per_project_per_user

    AS ( SELECT user_id, project_id, count(*) as task_count FROM tasks GROUP BY user_id, project_id ), Context | Day to day | Datatypes | Performance | Expanding Postgres
  25. An example --- Gets user ids that have over 50%

    of tasks assigned overloaded_users AS ( SELECT tasks_per_project_per_user.user_id, ! FROM tasks_per_project_per_user, total_tasks_per_project WHERE tasks_per_project_per_user.task_count > (total_tasks_per_project / 2) ) Context | Day to day | Datatypes | Performance | Expanding Postgres
  26. An example SELECT email, task_list, title FROM users_tasks, overloaded_users WHERE

    users_tasks.user_id = overloaded_users.user_id Context | Day to day | Datatypes | Performance | Expanding Postgres
  27. Window Functions

  28. An Example SELECT last_name, salary, department, rank() OVER (PARTITION BY

    department ORDER BY salary DESC) FROM employees; ! ! ! last_name salary department rank Jones 45000 Accounting 1 Williams 37000 Accounting 2 Smith 55000 Sales 1 Adams 50000 Sales 2 Johnson 40000 Marketing 1
  29. Extensions

  30. Extensions hstore citext db_link uuid_ossp cube PostGIS intarray pg_crypto isn

    table_func unaccent dict_int dict_xsyn earth_distance pg_stat_tuple ltree pg_row_locks trigram
  31. Datatypes

  32. The list integer float serial money character bytea timestamp timestamp

    with tz date time interval Context | Day to day | Datatypes | Performance | Expanding Postgres boolean enums point line box path polygon circle inet cidr bit tsvector tsquery UUID XML JSON array rangetypes
  33. Money • Don’t use it • Created when the world

    was flat • Money only knows a single currency Context | Day to day | Datatypes | Performance | Expanding Postgres
  34. Serial Context | Day to day | Datatypes | Performance

    | Expanding Postgres
  35. Serial • Serial has a limit • UUID scales •

    UUID doesn’t leak your primary keys • uuid_ossp extension UUID Context | Day to day | Datatypes | Performance | Expanding Postgres
  36. Shapes • Basic ones for a shortcut • earth_distance !

    • PostGIS for robustness Context | Day to day | Datatypes | Performance | Expanding Postgres
  37. XML Just kidding

  38. XML

  39. JSON vs. hStore • NoSQL often means schema-less ! •

    Adding columns is annoying • Just add data Context | Day to day | Datatypes | Performance | Expanding Postgres
  40. hStore • Key value store directly in Postgres ! •

    Can filter where a key does/doesn’t exist • Can filter for specific values of keys Context | Day to day | Datatypes | Performance | Expanding Postgres
  41. JSON Postgres 9.2 was the JSON release Marketing Postgres 9.3

    started to become usable operators, basic indexing, etc. Postgres 9.4 Context | Day to day | Datatypes | Performance | Expanding Postgres
  42. JSON Context | Day to day | Datatypes | Performance

    | Expanding Postgres
  43. JSONB Context | Day to day | Datatypes | Performance

    | Expanding Postgres
  44. Others • Timestamps with timezone • Intervals (now() - ‘1

    hour’::interval) • Array • Range types Context | Day to day | Datatypes | Performance | Expanding Postgres
  45. Performance

  46. Mo cache, Less problems

  47. Cache SELECT 'index hit rate' as name, (sum(idx_blks_hit) - sum(idx_blks_read))

    / sum(idx_blks_hit + idx_blks_read) as ratio FROM pg_statio_user_indexes union all SELECT 'cache hit rate' as name, case sum(idx_blks_hit) when 0 then 'NaN'::numeric else to_char((sum(idx_blks_hit) - sum(idx_blks_read)) / sum(idx_blks_hit + idx_blks_read), '99.99')::numeric end as ratio FROM pg_statio_user_indexes) Context | Day to day | Datatypes | Performance | Expanding Postgres
  48. Cache name | ratio ----------------+------------------------ cache hit rate | 0.99

    Context | Day to day | Datatypes | Performance | Expanding Postgres
  49. Index Hit Rate SELECT relname, 100 * idx_scan / (seq_scan

    + idx_scan), n_live_tup FROM pg_stat_user_tables ORDER BY n_live_tup DESC; Context | Day to day | Datatypes | Performance | Expanding Postgres
  50. Index Hit Rate relname | percent_used | rows_in_table ------------------------------------+--------------- events

    | 0 | 669917 app_infos_user_info | 0 | 198218 app_infos | 50 | 175640 user_info | 3 | 46718 rollouts | 0 | 34078 favorites | 0 | 3059 schema_migrations | 0 | 2 authorizations | 0 | 0 delayed_jobs | 23 | 0 Context | Day to day | Datatypes | Performance | Expanding Postgres
  51. Rule of thumb Cache > 99% Index hit rate >

    95% where rows > 10000 Context | Day to day | Datatypes | Performance | Expanding Postgres
  52. Specific Query Performance Context | Day to day | Datatypes

    | Performance | Expanding Postgres
  53. An example SELECT last_name FROM employees WHERE salary >= 50000;

    Context | Day to day | Datatypes | Performance | Expanding Postgres
  54. Context | Day to day | Datatypes | Performance |

    Expanding Postgres # EXPLAIN SELECT last_name FROM employees WHERE salary >= 50000; ! QUERY PLAN -------------------------------------------------- Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) Filter: (salary >= 50000) (3 rows) Query plans
  55. Query plans # EXPLAIN SELECT last_name FROM employees WHERE salary

    >= 50000; ! QUERY PLAN -------------------------------------------------- Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) Filter: (salary >= 50000) (3 rows) Context | Day to day | Datatypes | Performance | Expanding Postgres
  56. Query plans # EXPLAIN ANALYZE SELECT last_name FROM employees WHERE

    salary >= 50000; QUERY PLAN -------------------------------------------------- Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actual time=2.401..295.247 rows=1428 loops=1) Filter: (salary >= 50000) Total runtime: 295.379 (3 rows) ! Filter: (salary >= 50000) (3 rows) Context | Day to day | Datatypes | Performance | Expanding Postgres
  57. Indexes fix it # CREATE INDEX idx_emps ON employees (salary);

    Context | Day to day | Datatypes | Performance | Expanding Postgres
  58. EXPLAIN ANALYZE SELECT last_name FROM employees WHERE salary >= 50000;

    QUERY PLAN -------------------------------------------------- Index Scan using idx_emps on employees (cost=0.00..8.49 rows=1 width=6) (actual time = 0.047..1.603 rows=1428 loops=1) Index Cond: (salary >= 50000) Total runtime: 1.771 ms (3 rows) Context | Day to day | Datatypes | Performance | Expanding Postgres Indexes fix it
  59. pg_stat_statments $ select * from pg_stat_statements where query ~ 'from

    users where email'; ! ! userid │ 16384 dbid │ 16388 query │ select * from users where email = ?; calls │ 2 total_time │ 0.000268 rows │ 2 shared_blks_hit │ 16 shared_blks_read │ 0 shared_blks_dirtied │ 0 shared_blks_written │ 0 local_blks_hit │ 0 local_blks_read │ 0 local_blks_dirtied │ 0 local_blks_written │ 0 ... Context | Day to day | Datatypes | Performance | Expanding Postgres
  60. pg_stat_statements SELECT (total_time / 1000 / 60) as total, (total_time/calls)

    as avg, query FROM pg_stat_statements ORDER BY 1 DESC LIMIT 100; Context | Day to day | Datatypes | Performance | Expanding Postgres
  61. pg_stat_statements total | avg | query --------+--------+------------------------- 295.76 | 10.13

    | SELECT id FROM users... 219.13 | 80.24 | SELECT * FROM ... (2 rows) Context | Day to day | Datatypes | Performance | Expanding Postgres
  62. Indexes B-Tree Generalized Inverted Index (GIN) Generalized Search Tree (GIST)

    K Nearest Neighbors (KNN) Space Partitioned GIST (SP-GIST) Context | Day to day | Datatypes | Performance | Expanding Postgres
  63. Indexes Which do I use? Context | Day to day

    | Datatypes | Performance | Expanding Postgres craigkerstiens
  64. BTree If you don’t know better, you’re using this. This

    is usually okay. Context | Day to day | Datatypes | Performance | Expanding Postgres
  65. Generalized Inverted Index (GIN) Multiple values in a single column.

    ! Rule of thumb is Array/hStore Context | Day to day | Datatypes | Performance | Expanding Postgres
  66. Generalized Search Tree (GiST) Values that span across a boundary

    ! Rule of thumb is full text and shapes Context | Day to day | Datatypes | Performance | Expanding Postgres
  67. Indexes B-Tree Generalized Inverted Index (GIN) Generalized Search Tree (GIST)

    K Nearest Neighbors (KNN) Space Partitioned GIST (SP-GIST) VODKA (coming soon) Context | Day to day | Datatypes | Performance | Expanding Postgres
  68. There’s more Context | Day to day | Datatypes |

    Performance | Expanding Postgres
  69. Conditional & Functional Conditional ! CREATE INDEX … WHERE deleted_at

    is null ! Functional ! CREATE INDEX … LOWER(firstname) Context | Day to day | Datatypes | Performance | Expanding Postgres
  70. Backups Logical ! Portable Has load on DB ! <

    50 GB Context | Day to day | Datatypes | Performance | Expanding Postgres Phsyical ! More setup Limited load on DB ! > 50 GB
  71. Replication Slony Londiste Bucardo PgPool WAL-e Barman Context | Day

    to day | Datatypes | Performance | Expanding Postgres
  72. Replication Slony Londiste Bucardo PgPool WAL-e Barman Context | Day

    to day | Datatypes | Performance | Expanding Postgres
  73. Expanding Postgres

  74. Databases broken down • Relational databases • Key-Value stores •

    Document databases • Text search solutions • Distributed data stores • Time series Context | Day to day | Datatypes | Performance | Expanding Postgres
  75. Databases broken down • Relational databases • Key-Value stores •

    Document databases • Text search solutions • Distributed data stores • Time series Context | Day to day | Datatypes | Performance | Expanding Postgres
  76. What is Postgres Context | Day to day | Datatypes

    | Performance | Expanding Postgres
  77. Foreign Data Wrappers • Query from Postgres 1 to Postgres

    2 • Query other things from Postgres • Redis • Mongo • LDAP • Twitter? Context | Day to day | Datatypes | Performance | Expanding Postgres
  78. PL’s Context | Day to day | Datatypes | Performance

    | Expanding Postgres
  79. PLV8 Context | Day to day | Datatypes | Performance

    | Expanding Postgres
  80. Postgres – TLDR; Datatypes Conditional Indexes Transactional DDL Foreign Data

    Wrappers Concurrent Index Creation Extensions Common Table Expressions Fast Column Addition Listen/Notify Table Inheritance Per Transaction sync replication Window functions NoSQL inside SQL Momentum craigkerstiens
  81. Fin. craigkerstiens