Slide 1

Slide 1 text

Alicja Kucharczyk Senior Solution Architect PostgreSQL Team Leader Confuse a cat - the Postgres way. The main concepts of advanced query processing and its optimization in examples Prague, 2019-04-29

Slide 2

Slide 2 text

o Query processing concepts: – pipelining vs. materialization – flattening – inlining – optimization fence – double evaluation o WITH queries behavior o Repeated calculations o Repeated calculations and the functions Overview

Slide 3

Slide 3 text

Query Processing Concepts

Slide 4

Slide 4 text

SELECT n i_name, n i_price FROM generate_series(1, 10000) n; Pipelining vs. materialization

Slide 5

Slide 5 text

SELECT n i_name, n i_price FROM generate_series(1, 10000) n; Pipelining vs. materialization

Slide 6

Slide 6 text

SELECT * FROM (SELECT n i_name, n i_price FROM generate_series(1, 10000) n) AS price_list WHERE i_price BETWEEN 10 AND 100; Pipelining vs. materialization

Slide 7

Slide 7 text

o Pipelining Stream the output as it is produced, without waiting to the end of parent operations. o Materialization Use intermediate results materialized into temporary relations to evaluate next-level operations. The entire output have to be computed. Pipelining vs. materialization

Slide 8

Slide 8 text

SELECT * FROM (SELECT n i_name, n i_price FROM generate_series(1, 10000) n) AS price_list WHERE i_price BETWEEN 10 AND 100; Pipelining vs. materialization

Slide 9

Slide 9 text

o Flattening Merging sub-queries into upper queries, for example as a JOIN. Flattening

Slide 10

Slide 10 text

Flattening

Slide 11

Slide 11 text

Flattening

Slide 12

Slide 12 text

Flattening

Slide 13

Slide 13 text

Flattening

Slide 14

Slide 14 text

SELECT expensive_function(TABLE) FROM table WHERE expensive_function(TABLE) IS NOT NULL; = SELECT f FROM (SELECT expensive_function(TABLE) AS f FROM table) ss WHERE f IS NOT NULL; Flattening - avoiding double evaluation

Slide 15

Slide 15 text

SELECT f FROM (SELECT expensive_function(TABLE) AS f FROM table OFFSET 0) ss WHERE f IS NOT NULL; Flattening - avoiding double evaluation

Slide 16

Slide 16 text

WITH Queries Behavior

Slide 17

Slide 17 text

a style choice? SELECT * FROM (SELECT min(birth_date), position_id FROM (SELECT * FROM test.staff WHERE contract_type_id = 'UP') tylko_up GROUP BY position_id HAVING min(birth_date) < '2000-01-01') zagreguj ORDER BY 2, 1; = WITH tylko_up AS ( SELECT * FROM test.staff WHERE contract_type_id = 'UP' ), zagreguj AS ( SELECT min(birth_date), position_id FROM tylko_up GROUP BY position_id HAVING min(birth_date) < '2000-01-01' ) SELECT * FROM zagreguj ORDER BY 2, 1;

Slide 18

Slide 18 text

a style choice? SELECT * FROM (SELECT n i_name, n i_price FROM generate_series(1, 10000) n) AS price_list WHERE i_price BETWEEN 10 AND 100; = WITH price_list AS ( SELECT n i_name, n i_price FROM generate_series(1, 10000) n ) SELECT * FROM price_list WHERE i_price BETWEEN 10 AND 100;

Slide 19

Slide 19 text

a style choice? SELECT f FROM (SELECT expensive_function(TABLE) AS f FROM table) ss WHERE f IS NOT NULL; = WITH ss AS ( SELECT expensive_function(TABLE) AS f FROM table ) SELECT f FROM ss WHERE f IS NOT NULL;

Slide 20

Slide 20 text

a style choice?

Slide 21

Slide 21 text

o PostgreSQL’s CTEs are optimisation fences o They can be treated as „hints” for optimizer o Are always materialized o „A query that should touch a small amount of data instead reads a whole table and possibly spills it to a tempfile;” Craig Ringer o „CTEs are likewise a rather blunt tool that could cost more than they save.” Tom Lane WITH Queries Behavior

Slide 22

Slide 22 text

And Postgres 12 comes along...

Slide 23

Slide 23 text

MATERIALIZED or NOT MATERIALIZED And Postgres 12 comes along...

Slide 24

Slide 24 text

Repeated Calculations

Slide 25

Slide 25 text

SELECT (a + b) * c AS c1, (a + b) * d AS d1, (a + b) * e AS e1 FROM table; An example from the mailing list

Slide 26

Slide 26 text

Questions asked: o „is PostgreSQL "smart" enough to cache this (a+b) calculation”? o „Or perhaps I could use a CTE?” o „would I generally get better performance by using a subquery to convince PostgreSQL to cache the result?” An example from the mailing list

Slide 27

Slide 27 text

Tom’s Lane answers: o Postgres will not notice that there's duplicate subexpressions. o Subquery? Maybe… if: – the subexpression is expensive enough – the subquery will not be "flattened" into the upper query – But the OFFSET 0 hack could prevent optimizations that are far more important than avoiding duplicated calculations o CTEs are likewise a rather blunt tool that could cost more than they save. An example from the mailing list

Slide 28

Slide 28 text

SELECT x * c AS c1, x * d AS d1, x * e AS e1 FROM tab, LATERAL (SELECT a + b AS x OFFSET 0) ss; An example from the mailing list

Slide 29

Slide 29 text

Tom’s Lane answers: o This construction should avoid most of the optimization pitfalls, o except one: in this formulation, a+b will be evaluated for every "tab" row even if the particular use of the view does not demand any of the columns that use "x". An example from the mailing list

Slide 30

Slide 30 text

Repeated calculations and the functions

Slide 31

Slide 31 text

CREATE TABLE random_data AS SELECT s AS kol1, md5(random() :: TEXT) AS kol2, md5((random() / 2) :: TEXT) AS kol3, md5(random() :: TEXT) AS kol4, md5(random() :: TEXT) AS kol5, md5(random() :: TEXT) || md5(random() :: TEXT) AS kol6, md5(random() :: TEXT) AS kol7, md5(random() :: TEXT) AS kol8, md5(random() :: TEXT) || md5(random() :: TEXT) AS kol9, md5(random() :: TEXT) AS kol10, md5(random() :: TEXT) AS kol11, md5(random() :: TEXT) || md5(random() :: TEXT) AS kol12, md5(random() :: TEXT) || md5(random() :: TEXT) AS kol13, md5(random() :: TEXT) || md5(random() :: TEXT) AS kol14, md5(random() :: TEXT) || md5(random() :: TEXT) AS kol15, md5(random() :: TEXT) || md5(random() :: TEXT) AS kol16, md5(random() :: TEXT) || md5(random() :: TEXT) AS kol17, s AS kol18 FROM generate_series(1, 500000) s; Repeated calculations and the functions

Slide 32

Slide 32 text

CREATE OR REPLACE FUNCTION sql_function() RETURNS INTEGER LANGUAGE SQL AS $$ SELECT 123; $$ STABLE; Repeated calculations and the functions

Slide 33

Slide 33 text

CREATE OR REPLACE FUNCTION plpgsql_function() RETURNS INTEGER LANGUAGE 'plpgsql' AS $$ BEGIN RETURN 123; END; $$ STABLE; Repeated calculations and the functions

Slide 34

Slide 34 text

Repeated calculations and the functions

Slide 35

Slide 35 text

Repeated calculations and the functions

Slide 36

Slide 36 text

o https://www.isical.ac.in/~debapriyo/teaching/dbms/QueryProcessing.pdf o http://www.postgresql-archive.org/Query-optimization-with-repeated-calculations-td6001785.html o https://www.postgresql.org/message-id/flat/8569.1128439517%40sss.pgh.pa.us#[email protected] o https://www.2ndquadrant.com/en/blog/postgresql-ctes-are-optimization-fences/ o https://www.depesz.com/2019/02/19/waiting-for-postgresql-12-allow-user-control-of-cte-materialization-and-change-the-de fault-behavior/ o https://paquier.xyz/postgresql-2/postgres-12-with-materialize/ References

Slide 37

Slide 37 text

Alicja Kucharczyk Thank You! Senior Solution Architect PostgreSQL Team Leader [email protected]