Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Confuse a cat - the Postgres way: The main conc...

Confuse a cat - the Postgres way: The main concepts of advanced query processing and optimization in examples

When it comes to optimization of a query not only indexes come into play. Of course modifications to data access methods are very important and often offer huge performance gain but it’s definitely not a “silver bullet”.
This talk focuses on some of the most important concepts of query processing including layers above data access methods, i.e. the flow and dependencies between subqueries, the methods of writing queries and its consequences and also some “hacks” for the optimizer will be shown.

Presented in Prague on Prague PostgreSQL Meetup

https://www.meetup.com/pl-PL/Prague-PostgreSQL-Meetup/events/254172810/

AwdotiaRomanowna

April 29, 2019
Tweet

More Decks by AwdotiaRomanowna

Other Decks in Technology

Transcript

  1. Alicja Kucharczyk Senior Solution Architect PostgreSQL Team Leader Confuse a

    cat - the Postgres way. The main concepts of advanced query processing and its optimization in examples Prague, 2019-04-29
  2. o Query processing concepts: – pipelining vs. materialization – flattening

    – inlining – optimization fence – double evaluation o WITH queries behavior o Repeated calculations o Repeated calculations and the functions Overview
  3. SELECT * FROM (SELECT n i_name, n i_price FROM generate_series(1,

    10000) n) AS price_list WHERE i_price BETWEEN 10 AND 100; Pipelining vs. materialization
  4. o Pipelining Stream the output as it is produced, without

    waiting to the end of parent operations. o Materialization Use intermediate results materialized into temporary relations to evaluate next-level operations. The entire output have to be computed. Pipelining vs. materialization
  5. SELECT * FROM (SELECT n i_name, n i_price FROM generate_series(1,

    10000) n) AS price_list WHERE i_price BETWEEN 10 AND 100; Pipelining vs. materialization
  6. SELECT expensive_function(TABLE) FROM table WHERE expensive_function(TABLE) IS NOT NULL; =

    SELECT f FROM (SELECT expensive_function(TABLE) AS f FROM table) ss WHERE f IS NOT NULL; Flattening - avoiding double evaluation
  7. SELECT f FROM (SELECT expensive_function(TABLE) AS f FROM table OFFSET

    0) ss WHERE f IS NOT NULL; Flattening - avoiding double evaluation
  8. a style choice? SELECT * FROM (SELECT min(birth_date), position_id FROM

    (SELECT * FROM test.staff WHERE contract_type_id = 'UP') tylko_up GROUP BY position_id HAVING min(birth_date) < '2000-01-01') zagreguj ORDER BY 2, 1; = WITH tylko_up AS ( SELECT * FROM test.staff WHERE contract_type_id = 'UP' ), zagreguj AS ( SELECT min(birth_date), position_id FROM tylko_up GROUP BY position_id HAVING min(birth_date) < '2000-01-01' ) SELECT * FROM zagreguj ORDER BY 2, 1;
  9. a style choice? SELECT * FROM (SELECT n i_name, n

    i_price FROM generate_series(1, 10000) n) AS price_list WHERE i_price BETWEEN 10 AND 100; = WITH price_list AS ( SELECT n i_name, n i_price FROM generate_series(1, 10000) n ) SELECT * FROM price_list WHERE i_price BETWEEN 10 AND 100;
  10. a style choice? SELECT f FROM (SELECT expensive_function(TABLE) AS f

    FROM table) ss WHERE f IS NOT NULL; = WITH ss AS ( SELECT expensive_function(TABLE) AS f FROM table ) SELECT f FROM ss WHERE f IS NOT NULL;
  11. o PostgreSQL’s CTEs are optimisation fences o They can be

    treated as „hints” for optimizer o Are always materialized o „A query that should touch a small amount of data instead reads a whole table and possibly spills it to a tempfile;” Craig Ringer o „CTEs are likewise a rather blunt tool that could cost more than they save.” Tom Lane WITH Queries Behavior
  12. SELECT (a + b) * c AS c1, (a +

    b) * d AS d1, (a + b) * e AS e1 FROM table; An example from the mailing list
  13. Questions asked: o „is PostgreSQL "smart" enough to cache this

    (a+b) calculation”? o „Or perhaps I could use a CTE?” o „would I generally get better performance by using a subquery to convince PostgreSQL to cache the result?” An example from the mailing list
  14. Tom’s Lane answers: o Postgres will not notice that there's

    duplicate subexpressions. o Subquery? Maybe… if: – the subexpression is expensive enough – the subquery will not be "flattened" into the upper query – But the OFFSET 0 hack could prevent optimizations that are far more important than avoiding duplicated calculations o CTEs are likewise a rather blunt tool that could cost more than they save. An example from the mailing list
  15. SELECT x * c AS c1, x * d AS

    d1, x * e AS e1 FROM tab, LATERAL (SELECT a + b AS x OFFSET 0) ss; An example from the mailing list
  16. Tom’s Lane answers: o This construction should avoid most of

    the optimization pitfalls, o except one: in this formulation, a+b will be evaluated for every "tab" row even if the particular use of the view does not demand any of the columns that use "x". An example from the mailing list
  17. CREATE TABLE random_data AS SELECT s AS kol1, md5(random() ::

    TEXT) AS kol2, md5((random() / 2) :: TEXT) AS kol3, md5(random() :: TEXT) AS kol4, md5(random() :: TEXT) AS kol5, md5(random() :: TEXT) || md5(random() :: TEXT) AS kol6, md5(random() :: TEXT) AS kol7, md5(random() :: TEXT) AS kol8, md5(random() :: TEXT) || md5(random() :: TEXT) AS kol9, md5(random() :: TEXT) AS kol10, md5(random() :: TEXT) AS kol11, md5(random() :: TEXT) || md5(random() :: TEXT) AS kol12, md5(random() :: TEXT) || md5(random() :: TEXT) AS kol13, md5(random() :: TEXT) || md5(random() :: TEXT) AS kol14, md5(random() :: TEXT) || md5(random() :: TEXT) AS kol15, md5(random() :: TEXT) || md5(random() :: TEXT) AS kol16, md5(random() :: TEXT) || md5(random() :: TEXT) AS kol17, s AS kol18 FROM generate_series(1, 500000) s; Repeated calculations and the functions
  18. CREATE OR REPLACE FUNCTION sql_function() RETURNS INTEGER LANGUAGE SQL AS

    $$ SELECT 123; $$ STABLE; Repeated calculations and the functions
  19. CREATE OR REPLACE FUNCTION plpgsql_function() RETURNS INTEGER LANGUAGE 'plpgsql' AS

    $$ BEGIN RETURN 123; END; $$ STABLE; Repeated calculations and the functions