Upgrade to Pro — share decks privately, control downloads, hide ads and more …

PostgreSQL Workshop

PostgreSQL Workshop

Avatar for Coopdevs

Coopdevs

June 06, 2019
Tweet

More Decks by Coopdevs

Other Decks in Programming

Transcript

  1. What is PostgreSQL? Relational Extensible SQL-standard compliant (mostly) ACID through

    MVCC Transactions Procedural language Triggers Views Replication
  2. What is PostgreSQL? Concurrent DDL Rules CTEs Full-text search Pub/Sub

    TOAST (The Oversized-Attribute Storage Technique) Foreign Data Wrappers Parallel query execution JIT
  3. What is PostgreSQL? Ingres, University of California, Berkeley, 1982 POSTGRES

    project starts in Berkeley, 1985 POSTGRES v1, 1989 MIT license, 1994 Replaced POSTQUEL with SQL + psql, 1995 Renamed to PostgreSQL and life outside Berkeley, 1996 v6.0 and PostgreSQL Global Development Group, 1997
  4. Data types uuid, monetary, enumerated, geometric, binary, network address, bit

    string, text search, xml, json, array, composites, ranges, hstore, etc.
  5. User-defined data types CREATE TYPE invoice_state as enum('pending', 'failed', 'charged');

    CREATE TABLE invoices ( id serial, customer_id int, amount int, state invoice_state );
  6. User-defined data types CREATE TYPE full_address AS ( city VARCHAR(90),

    street VARCHAR(90) ); SELECT (address).city, (address).street FROM enterprises;
  7. User-defined domains A domain is a user-defined data type that

    is based on another underlying type Can have constraints on top of the underlying type
  8. User-defined domains CREATE DOMAIN natural AS integer CHECK (VALUE >

    0); CREATE TABLE mytable (id natural); INSERT INTO mytable VALUES(1); -- works INSERT INTO mytable VALUES(-1); -- fails
  9. json and jsonb Enforces valid JSON Rich set of JSON

    functions, operators and indexes to retrieve and process data
  10. json and jsonb json is a text data type with

    JSON validation without any processing It came first (and too early?)
  11. json and jsonb CREATE TABLE books (book_id serial NOT NULL,

    data jsonb); INSERT INTO books VALUES ( 1, '{"title": "Sleeping Beauties", "genres": ["Fiction", "Thriller", "Horror"], "published": false}' ); SELECT data->'title' FROM books WHERE data->'genres' @> '["Fiction", "Horror"]'::jsonb; Sleeping Beauties
  12. PostGIS Spatial types, functions, operators and indexes Raster and vector

    data support Interoperability with 3rd pary geospatial tools Makes PostgreSQL a spatial database management system
  13. Text search types tsvector: represents a document in a form

    optimized for text search tsquery: represents a text query
  14. Text search types SELECT 'a fat cat sat on a

    mat and ate a fat rat'::tsvector @@ 'cat & rat'::tsquery;
  15. Text search types Full text search performs: — Parsing documents

    into tokens — Converting tokens into lexemes — Storing preprocessed documents optimized for searching
  16. Constraints It’s a lot cheaper to catch bad data before

    it goes in than it is to clean it up afterwards
  17. CHECK constraints CREATE TABLE products ( (...) sale_price numeric CHECK

    (sale_price > 0), CHECK (price > sale_price) );
  18. Exclusion constraints ensures that if any two rows are compared

    on the specified columns or expressions using the specified operators, at least one of these operator comparisons will return false or null
  19. Exclusion constraints CREATE TABLE billings ( id uuid NOT NULL,

    period tstzrange NOT NULL, price_per_month integer NOT NULL );
  20. Rule system CREATE VIEW myview AS SELECT * FROM mytab;

    is the same as: CREATE TABLE myview (same column list as mytab); CREATE RULE "_RETURN" AS ON SELECT TO myview DO INSTEAD SELECT * FROM mytab;
  21. Updatable Views Simple views are automatically updatable. The system converts

    writes to the corresponing base relation Only one base relation is allowed
  22. Materialized views CREATE MATERIALIZED VIEW mymatview AS SELECT * FROM

    mytab; A materialized view is a relation, just like a table or a view Data is returned from the materialized view, like a table
  23. Expression indexes CREATE INDEX articles_day ON articles ( date(published_at) );

    WHERE date(articles.published_at) = date('2011-03-07')
  24. GIN Generalized Inverted Index, when indexed items are composite values

    queries search for values that appear within the composite items GIN GIN only takes care of: — concurrency
  25. GIN Strategy defines: — how keys are extracted from indexed

    items — how keys are extracted from query conditions — when a query is satisfied
  26. GIN Interface Datum* extractValue(Datum inputValue, uint32* nentries) int compareEntry(Datum a,

    Datum b) Datum* extractQuery(Datum query, uint32* nentries, StrategyNumber n) bool consistent(bool[] check, StrategyNumber n, Datum query)
  27. built-in GIN operator classes Name Indexable operators array_ops tsvector_ops jsonb_ops

    indexable operators: ? ?& ? jsonb_path_ops indexable operators: @>* Fewer operators => better performance
  28. Statistics Collector track_activities: server's top track_counts: stats about table and

    index accesses track_functions: usage of user-defined functions track_io_timing: block read and write times
  29. pg_stat_activity datid | 16386 datname | openfoodnetwork pid | 2763

    usesysid | 16384 usename | ofn_user application_name | delayed_job client_addr | ::1 client_hostname | client_port | 57372 backend_start | 2019-06-05 07:52:16.372732+00 xact_start | query_start | 2019-06-05 09:40:43.282826+00 state_change | 2019-06-05 09:40:43.283123+00 waiting | f state | idle backend_xid | backend_xmin | query | UPDATE "delayed_jobs" SET locked_at = '2019-06-05 09:40:43.279994', ...
  30. Collected Statistics Views pg_stat_archiver: WAL archiver process' activity pg_stat_bgwriter: Background

    writer process's activity pg_stat_database: Database-wide stats pg_stat_all_tables: Stats about accesses to each table pg_stat_sys_tables: System tables only pg_stat_user_tables: User tables only (...) pg_stat_xact_all_tables: Stats about current transaction wich are not yet in pg_stat_all_tables (...) pg_stat_user_indexes: Indexes accesses on user tables
  31. Collected Statistics Views SELECT relname, 100 * idx_scan / (seq_scan

    + idx_scan) percent_of_times_index_used, n_live_tup rows_in_table FROM pg_stat_user_tables ORDER BY n_live_tup DESC; relname | percent_of_times_index_used | rows_in_table -------------------------------------+-----------------------------+--------------- spree_state_changes | 3 | 150212 sessions | 97 | 136295
  32. Statistics Collector It also provides functions to build our own

    custom views SELECT pg_stat_get_backend_pid(s.backendid) AS pid, pg_stat_get_backend_activity(s.backendid) AS query FROM (SELECT pg_stat_get_backend_idset() AS backendid) AS s;
  33. pg_stat_statements SELECT query, calls, total_time, rows, 100.0 * shared_blks_hit /

    nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent FROM pg_stat_statements ORDER BY total_time DESC LIMIT 5; query | INSERT INTO "sessions" ("created_at", "data", "session_id", "updated_at") VALUES ($1, $2, $3, $4) RETURNING "id" calls | 5 total_time | 83.215 rows | 5 hit_percent | 73.5294117647058824
  34. Your turn Clone openfoodnetwork and check out the code Ruby

    on Rails: — app/models/ — app/views/ — app/controllers/
  35. Your turn Take a look at the dataset we gave

    you access to and analyze how the web app makes use of the schema and data. Your task is to come up with ways to optimize its performance, simplify its manipulation or remove app code. Are intrigued by some other PostgreSQL feature? take the time to play with it with this sample dataset. Be creative and use the docs!
  36. Ideas Analyze all the information made available by the statistics

    collector and draw conclusions Find ways we could enrich the db schema with views that could make the app code simpler Replace app-level code with PostgreSQL data types and constraints
  37. Ideas Take the most time-consuming query and try to optimize

    it with indexes and/or schema changes. The current design might not the best suited for it. Check the app code and see what can be done at the DB level to simplify it. Take a look at model validations.
  38. Ideas Do you have transformations (ETL) in mind that could

    make it easier to analyze the business in a classic data warehouse? Can you find a way to improve the data integrity of the database with the options PostgreSQL gives us?
  39. Q&A