PostgreSQL – Tomasz Borek
Database for next project?
Why, PostgreSQL of course!
@LAFK_pl
Consultant @
Slide 2
Slide 2 text
About me
@LAFK_pl
Consultant @
Tomasz Borek
Slide 3
Slide 3 text
No content
Slide 4
Slide 4 text
What will I tell you?
●
Colourful history of PostgreSQL
– So, DB wars
●
Chosen features
●
Architecture and internals
●
Query path and optimization (no hinting)
●
Multithreading (very briefly, too little time)
Slide 5
Slide 5 text
Colourful history
Slide 6
Slide 6 text
History
Slide 7
Slide 7 text
In-/Postgres forks
Slide 8
Slide 8 text
No content
Slide 9
Slide 9 text
Support?
Slide 10
Slide 10 text
Chosen features
Slide 11
Slide 11 text
My Faves
●
Error reporting / logging
●
PL/xSQL – feel free to use Perl, Python, Ruby, Java,
LISP...
●
XML and JSON handling
●
Foreign Data Wrappers (FDW)
●
Windowing functions
●
Common table expressions (CTE) and recursive queries
●
Power of Indexes
Slide 12
Slide 12 text
Will DB eat your cake?
●
Thanks @anandology
Slide 13
Slide 13 text
Will DB eat your cake?
●
Thanks @anandology
Slide 14
Slide 14 text
Will DB eat your cake?
●
Thanks @anandology
Slide 15
Slide 15 text
The cake is a lie!
Slide 16
Slide 16 text
Will DB eat your cake?
●
Thanks @anandology
Slide 17
Slide 17 text
Will DB eat your cake?
●
Thanks @anandology
Slide 18
Slide 18 text
Will DB eat your cake?
●
Thanks @anandology
Consider password VARCHAR(8)
Slide 19
Slide 19 text
Logging, ‘gotchas’
●
Default is to stderr only
●
Set on CLI or in config, not through sets
●
Where is it?
●
How to log queries… or turning log_collector on
Slide 20
Slide 20 text
Where is it?
●
Default
– data/pg_log
●
Launchers can set it (Mac Homebrew/plist)
●
Version and config dependent
Slide 21
Slide 21 text
Ask DB
Slide 22
Slide 22 text
Logging, turn it on
●
Default is to stderr only
●
In PG:
logging_collector = on
log_filename = strftime-patterned filename
[log_destination = [stderr|syslog|csvlog] ]
log_statement = [none|ddl|mod|all] /
/ all
log_min_error_statement = ERROR
log_line_prefix = '%t %c %u ' # time sessionid user
Slide 23
Slide 23 text
Log line prefix
Slide 24
Slide 24 text
PL/pgSQL
●
Stored procedure dilemma
– Where to keep your logic?
– How your logic is NOT in your SCM
Slide 25
Slide 25 text
PL/pgSQL
●
Stored procedure dilemma
– Where to keep your logic?
– How your logic is NOT in your SCM
●
Over dozen of options:
– Perl, Python, Ruby,
– pgSQL, Java,
– TCL, LISP…
Slide 26
Slide 26 text
PL/pgSQL
●
Stored procedure dilemma
– Where to keep your logic?
– How your logic is NOT in your SCM
●
Over dozen of options:
– Perl, Python, Ruby,
– pgSQL, Java,
– TCL, LISP…
●
DevOps, SysAdmins, DBAs… ETLs etc.
Slide 27
Slide 27 text
PL/pgSQL
●
Stored procedure dilemma
– Where to keep your logic?
– How your logic is NOT in your SCM
●
Over dozen of options:
– Perl, Python, Ruby,
– pgSQL, Java,
– TCL, LISP…
●
DevOps, SysAdmins, DBAs… ETLs etc.
Slide 28
Slide 28 text
Perl function example
CREATE FUNCTION perl_max (integer, integer) RETURNS integer AS $$
my ($x, $y) = @_;
if (not defined $x) {
return undef if not defined $y;
return $y;
}
return $x if not defined $y;
return $x if $x > $y;
return $y;
$$ LANGUAGE plperl;
Slide 29
Slide 29 text
XML or JSON support
●
Parsing and retrieving XML (functions)
●
Valid JSON checks (type)
●
Careful with encoding!
– PG allows only one server encoding per database
– Specify it to UTF-8 or weep
●
Document database instead of OO or rel
– JSON, JSONB, HSTORE – noSQL fun welcome!
Slide 30
Slide 30 text
HSTORE?
CREATE TABLE example (
id serial PRIMARY KEY,
data hstore);
Slide 31
Slide 31 text
HSTORE?
CREATE TABLE example (
id serial PRIMARY KEY,
data hstore);
INSERT INTO example (data) VALUES
('name => "John Smith", age => 28, gender => "M"'),
('name => "Jane Smith", age => 24');
Slide 32
Slide 32 text
HSTORE?
CREATE TABLE example (
id serial PRIMARY KEY,
data hstore);
INSERT INTO example (data)
VALUES
('name => "John Smith", age => 28,
gender => "M"'),
('name => "Jane Smith", age => 24');
SELECT id,
data->'name'
FROM example;
SELECT id, data->'age'
FROM example
WHERE data->'age' >=
'25';
Slide 33
Slide 33 text
XML and JSON datatype
CREATE TABLE test (
...,
xml_file xml,
json_file json,
...
);
Slide 34
Slide 34 text
XML functions example
XMLROOT (
XMLELEMENT (
NAME gazonk,
XMLATTRIBUTES (
’val’ AS name,
1 + 1 AS num
),
XMLELEMENT (
NAME qux,
’foo’
)
),
VERSION ’1.0’,
STANDALONE YES
)
foo
xml 'bar'
'bar'::xml
Slide 35
Slide 35 text
Foreign Data Wrappers (FDW)
●
Stop ETL, start FDW
●
Read AND write
●
FS, Mongo, Hadoop, Redis…
●
You can write your own!
Slide 36
Slide 36 text
FDW vs ETL?
Slide 37
Slide 37 text
Windowing functions
●
Replacement for procedures (somewhat)
●
In a nutshell:
– Take row,
– find related rows,
– compute things over related rows,
– return result along with the row
●
Ranking, averaging, growth per time...
http://www.craigkerstiens.com/2014/02/26/Tracking-MoM-growth-in-SQL/
https://www.postgresql.org/docs/9.1/static/tutorial-window.html
Slide 38
Slide 38 text
CTEs and recursive queries
●
Common table expressions (CTE) and
recursive queries
Slide 39
Slide 39 text
Index power
●
Geo and spherical indexes
●
Partial indexes (email like @company.com)
●
Function indexes
●
JSON(B) has index support
●
You may create your own index
Slide 40
Slide 40 text
Architecture and internals
Slide 41
Slide 41 text
No content
Slide 42
Slide 42 text
No content
Slide 43
Slide 43 text
Check out processes
●
pgrep -l postgres
●
htop > filter: postgres
●
Whatever you like / use usually
●
Careful with kill -9 on connections
– kill -15 better
Parser
●
Syntax checks, like FRIM is not a keyword
– SELECT * FRIM myTable;
●
Catalog lookup
– MyTable may not exist
●
In the end query tree is built
– Query tokenization: SELECT (keyword)
employeeName (field id) count (function call)...
Slide 50
Slide 50 text
Grammar and a query tree
Slide 51
Slide 51 text
Planner
●
Where Planner Tree is built
●
Where best execution is decided upon
– Seq or index scan? Index or bitmap index?
– Which join order?
– Which join strategy (nested, hashed, merge)?
– Inner or outer?
– Aggregation: plain, hashed, sorted…
●
Heuristic, if finding all plans too costly
Slide 52
Slide 52 text
Full query path
Slide 53
Slide 53 text
Example to explain EXPLAIN
EXPLAIN SELECT * FROM tenk1;
QUERY PLAN
------------------------------------------------------------
Seq Scan on tenk1 (cost=0.00..458.00
rows=10000 width=244)
Slide 54
Slide 54 text
Explaining EXPLAIN - what
EXPLAIN SELECT * FROM tenk1;
QUERY PLAN
------------------------------------------------------------
Seq Scan on tenk1 (cost=0.00..458.00 rows=10000
width=244)
●
Startup cost – time before output phase begins
●
Total cost – in page fetches, may change, assumed to
run node to completion
●
Rows – estimated number to scan (but LIMIT etc.)
●
Estimated average width of output from that node (in
bytes)
Slide 55
Slide 55 text
Explaining EXPLAIN - how
EXPLAIN SELECT * FROM tenk1;
QUERY PLAN
------------------------------------------------------------
Seq Scan on tenk1 (cost=0.00..458.00 rows=10000 width=244)
SELECT relpages, reltuples FROM pg_class WHERE relname = 'tenk1'; /
/358|10k
●
No WHERE, no index
●
Cost = disk pages read * seq page cost + rows scanned
* cpu tuple cost
●
358 * 1.0 + 10000 * 0.01 = 458 // default values
Slide 56
Slide 56 text
Analyzing EXPLAIN ANALYZE
EXPLAIN ANALYZE SELECT *
FROM tenk1 t1, tenk2 t2
WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=4.65..118.62 rows=10 width=488) (actual time=0.128..0.377 rows=10 loops=1)
-> Bitmap Heap Scan on tenk1 t1 (cost=4.36..39.47 rows=10 width=244) (actual time=0.057..0.121 rows=10 loops=1)
Recheck Cond: (unique1 < 10)
-> Bitmap Index Scan on tenk1_unique1 (cost=0.00..4.36 rows=10 width=0) (actual time=0.024..0.024 rows=10 loops=1)
Index Cond: (unique1 < 10)
-> Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..7.91 rows=1 width=244) (actual time=0.021..0.022 rows=1 loops=10)
Index Cond: (unique2 = t1.unique2)
Planning time: 0.181 ms
Execution time: 0.501 ms
●
Actually runs the query
●
More info: actual times, rows removed by filter,
sort method used, disk/memory used...
Slide 57
Slide 57 text
Analyzing EXPLAIN ANALYZE
EXPLAIN ANALYZE SELECT *
FROM tenk1 t1, tenk2 t2
WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=4.65..118.62 rows=10 width=488) (actual time=0.128..0.377 rows=10 loops=1)
-> Bitmap Heap Scan on tenk1 t1 (cost=4.36..39.47 rows=10 width=244) (actual time=0.057..0.121 rows=10
loops=1)
Recheck Cond: (unique1 < 10)
-> Bitmap Index Scan on tenk1_unique1 (cost=0.00..4.36 rows=10 width=0) (actual time=0.024..0.024
rows=10 loops=1)
Index Cond: (unique1 < 10)
-> Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..7.91 rows=1 width=244) (actual time=0.021..0.022
rows=1 loops=10)
Index Cond: (unique2 = t1.unique2)
Planning time: 0.181 ms
Execution time: 0.501 ms
Slide 58
Slide 58 text
Analyzing EXPLAIN ANALYZE
EXPLAIN ANALYZE SELECT *
FROM tenk1 t1, tenk2 t2
WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=4.65..118.62 rows=10 width=488) (actual time=0.128..0.377 rows=10 loops=1)
-> Bitmap Heap Scan on tenk1 t1 (cost=4.36..39.47 rows=10 width=244) (actual time=0.057..0.121 rows=10
loops=1)
Recheck Cond: (unique1 < 10)
-> Bitmap Index Scan on tenk1_unique1 (cost=0.00..4.36 rows=10 width=0) (actual time=0.024..0.024
rows=10 loops=1)
Index Cond: (unique1 < 10)
-> Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..7.91 rows=1 width=244) (actual time=0.021..0.022
rows=1 loops=10)
Index Cond: (unique2 = t1.unique2)
Planning time: 0.181 ms
Execution time: 0.501 ms
Slide 59
Slide 59 text
Analyzing EXPLAIN ANALYZE
EXPLAIN ANALYZE SELECT *
FROM tenk1 t1, tenk2 t2
WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=4.65..118.62 rows=10 width=488) (actual time=0.128..0.377 rows=10 loops=1)
-> Bitmap Heap Scan on tenk1 t1 (cost=4.36..39.47 rows=10 width=244) (actual time=0.057..0.121 rows=10
loops=1)
Recheck Cond: (unique1 < 10)
-> Bitmap Index Scan on tenk1_unique1 (cost=0.00..4.36 rows=10 width=0) (actual time=0.024..0.024
rows=10 loops=1)
Index Cond: (unique1 < 10)
-> Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..7.91 rows=1 width=244) (actual time=0.021..0.022
rows=1 loops=10)
Index Cond: (unique2 = t1.unique2)
Planning time: 0.181 ms
Execution time: 0.501 ms
Slide 60
Slide 60 text
Multithreading and PostgreSQL
Slide 61
Slide 61 text
Summary
Slide 62
Slide 62 text
Battle-tested
●
Could mature since 1987
●
Comes in many flavours (forks)
●
Largest cluster – PBs in Yahoo
●
Skype, NASA, Instagram
●
Stable:
– Many years on one version
– Good version support
– Every year something new
– Follows ANSI SQL standards
https://www.postgresql.org/about/users/
Slide 63
Slide 63 text
Great features
●
Java, Perl, Python for stored procedures
●
Add CTEs and FDWs => great ETL or µservice
●
Handles XMLs and JSONs
●
Error reporting / logging
●
MVCC built-in
●
Windowing functions
●
...
Slide 64
Slide 64 text
Solid internals
●
Well-thought out processes
●
Built-in security (dozen of solutions)
●
WAL, stats collector, vacuum
●
Good rule engine and clear query optimization
– No hinting will bother some people
●
Plethora of data types
Slide 65
Slide 65 text
Disadvantages
●
More like Python then Perl/PHP
●
Some learning curve
●
Some say:
– replication(‘s performance)
●
I can’t think of more, doesn’t mean none are
present :-)
Slide 66
Slide 66 text
PostgreSQL – Tomasz Borek
Database for next project?
Why, PostgreSQL of course!
@LAFK_pl
Consultant @