Confuse a cat - the Postgres way: The main concepts of advanced query processing and optimization in examples

Alicja Kucharczyk Senior Solution Architect PostgreSQL Team Leader Confuse a
cat - the Postgres way. The main concepts of advanced query processing and its optimization in examples Prague, 2019-04-29

o Query processing concepts: – pipelining vs. materialization – flattening
– inlining – optimization fence – double evaluation o WITH queries behavior o Repeated calculations o Repeated calculations and the functions Overview

Query Processing Concepts

SELECT n i_name, n i_price FROM generate_series(1, 10000) n; Pipelining
vs. materialization

SELECT * FROM (SELECT n i_name, n i_price FROM generate_series(1,
10000) n) AS price_list WHERE i_price BETWEEN 10 AND 100; Pipelining vs. materialization

o Pipelining Stream the output as it is produced, without
waiting to the end of parent operations. o Materialization Use intermediate results materialized into temporary relations to evaluate next-level operations. The entire output have to be computed. Pipelining vs. materialization

SELECT * FROM (SELECT n i_name, n i_price FROM generate_series(1,
10000) n) AS price_list WHERE i_price BETWEEN 10 AND 100; Pipelining vs. materialization

o Flattening Merging sub-queries into upper queries, for example as
a JOIN. Flattening

Flattening

SELECT expensive_function(TABLE) FROM table WHERE expensive_function(TABLE) IS NOT NULL; =
SELECT f FROM (SELECT expensive_function(TABLE) AS f FROM table) ss WHERE f IS NOT NULL; Flattening - avoiding double evaluation

SELECT f FROM (SELECT expensive_function(TABLE) AS f FROM table OFFSET
0) ss WHERE f IS NOT NULL; Flattening - avoiding double evaluation

WITH Queries Behavior

a style choice? SELECT * FROM (SELECT min(birth_date), position_id FROM
(SELECT * FROM test.staff WHERE contract_type_id = 'UP') tylko_up GROUP BY position_id HAVING min(birth_date) < '2000-01-01') zagreguj ORDER BY 2, 1; = WITH tylko_up AS ( SELECT * FROM test.staff WHERE contract_type_id = 'UP' ), zagreguj AS ( SELECT min(birth_date), position_id FROM tylko_up GROUP BY position_id HAVING min(birth_date) < '2000-01-01' ) SELECT * FROM zagreguj ORDER BY 2, 1;

a style choice? SELECT * FROM (SELECT n i_name, n
i_price FROM generate_series(1, 10000) n) AS price_list WHERE i_price BETWEEN 10 AND 100; = WITH price_list AS ( SELECT n i_name, n i_price FROM generate_series(1, 10000) n ) SELECT * FROM price_list WHERE i_price BETWEEN 10 AND 100;

a style choice? SELECT f FROM (SELECT expensive_function(TABLE) AS f
FROM table) ss WHERE f IS NOT NULL; = WITH ss AS ( SELECT expensive_function(TABLE) AS f FROM table ) SELECT f FROM ss WHERE f IS NOT NULL;

a style choice?

o PostgreSQL’s CTEs are optimisation fences o They can be
treated as „hints” for optimizer o Are always materialized o „A query that should touch a small amount of data instead reads a whole table and possibly spills it to a tempfile;” Craig Ringer o „CTEs are likewise a rather blunt tool that could cost more than they save.” Tom Lane WITH Queries Behavior

And Postgres 12 comes along...

MATERIALIZED or NOT MATERIALIZED And Postgres 12 comes along...

Repeated Calculations

SELECT (a + b) * c AS c1, (a +
b) * d AS d1, (a + b) * e AS e1 FROM table; An example from the mailing list

Questions asked: o „is PostgreSQL "smart" enough to cache this
(a+b) calculation”? o „Or perhaps I could use a CTE?” o „would I generally get better performance by using a subquery to convince PostgreSQL to cache the result?” An example from the mailing list

Tom’s Lane answers: o Postgres will not notice that there's
duplicate subexpressions. o Subquery? Maybe… if: – the subexpression is expensive enough – the subquery will not be "flattened" into the upper query – But the OFFSET 0 hack could prevent optimizations that are far more important than avoiding duplicated calculations o CTEs are likewise a rather blunt tool that could cost more than they save. An example from the mailing list

SELECT x * c AS c1, x * d AS
d1, x * e AS e1 FROM tab, LATERAL (SELECT a + b AS x OFFSET 0) ss; An example from the mailing list

Tom’s Lane answers: o This construction should avoid most of
the optimization pitfalls, o except one: in this formulation, a+b will be evaluated for every "tab" row even if the particular use of the view does not demand any of the columns that use "x". An example from the mailing list

Repeated calculations and the functions

CREATE TABLE random_data AS SELECT s AS kol1, md5(random() ::
TEXT) AS kol2, md5((random() / 2) :: TEXT) AS kol3, md5(random() :: TEXT) AS kol4, md5(random() :: TEXT) AS kol5, md5(random() :: TEXT) || md5(random() :: TEXT) AS kol6, md5(random() :: TEXT) AS kol7, md5(random() :: TEXT) AS kol8, md5(random() :: TEXT) || md5(random() :: TEXT) AS kol9, md5(random() :: TEXT) AS kol10, md5(random() :: TEXT) AS kol11, md5(random() :: TEXT) || md5(random() :: TEXT) AS kol12, md5(random() :: TEXT) || md5(random() :: TEXT) AS kol13, md5(random() :: TEXT) || md5(random() :: TEXT) AS kol14, md5(random() :: TEXT) || md5(random() :: TEXT) AS kol15, md5(random() :: TEXT) || md5(random() :: TEXT) AS kol16, md5(random() :: TEXT) || md5(random() :: TEXT) AS kol17, s AS kol18 FROM generate_series(1, 500000) s; Repeated calculations and the functions

CREATE OR REPLACE FUNCTION sql_function() RETURNS INTEGER LANGUAGE SQL AS
$$ SELECT 123; $$ STABLE; Repeated calculations and the functions

CREATE OR REPLACE FUNCTION plpgsql_function() RETURNS INTEGER LANGUAGE 'plpgsql' AS
$$ BEGIN RETURN 123; END; $$ STABLE; Repeated calculations and the functions

Repeated calculations and the functions

o https://www.isical.ac.in/~debapriyo/teaching/dbms/QueryProcessing.pdf o http://www.postgresql-archive.org/Query-optimization-with-repeated-calculations-td6001785.html o https://www.postgresql.org/message-id/flat/8569.1128439517%40sss.pgh.pa.us#[email protected] o https://www.2ndquadrant.com/en/blog/postgresql-ctes-are-optimization-fences/ o https://www.depesz.com/2019/02/19/waiting-for-postgresql-12-allow-user-control-of-cte-materialization-and-change-the-de
fault-behavior/ o https://paquier.xyz/postgresql-2/postgres-12-with-materialize/ References

Alicja Kucharczyk Thank You! Senior Solution Architect PostgreSQL Team Leader
[email protected]

Confuse a cat - the Postgres way: The main conc...

Confuse a cat - the Postgres way: The main concepts of advanced query processing and optimization in examples

More Decks by AwdotiaRomanowna

Other Decks in Technology

Featured

Transcript