DB for your next project? Why, Postgres, of course

PostgreSQL – Tomasz Borek Database for next project? Why, PostgreSQL
of course! @LAFK_pl Consultant @

About me @LAFK_pl Consultant @ Tomasz Borek

What will I tell you? • Colourful history of PostgreSQL
– So, DB wars • Chosen features • Architecture and internals • Query path and optimization (no hinting) • Multithreading (very briefly, too little time)

Colourful history

History

In-/Postgres forks

Support?

Chosen features

My Faves • Error reporting / logging • PL/xSQL –
feel free to use Perl, Python, Ruby, Java, LISP... • XML and JSON handling • Foreign Data Wrappers (FDW) • Windowing functions • Common table expressions (CTE) and recursive queries • Power of Indexes

Will DB eat your cake? • Thanks @anandology

The cake is a lie!

Will DB eat your cake? • Thanks @anandology

Will DB eat your cake? • Thanks @anandology Consider password
VARCHAR(8)

Logging, ‘gotchas’ • Default is to stderr only • Set
on CLI or in config, not through sets • Where is it? • How to log queries… or turning log_collector on

Where is it? • Default – data/pg_log • Launchers can
set it (Mac Homebrew/plist) • Version and config dependent

Ask DB

Logging, turn it on • Default is to stderr only
• In PG: logging_collector = on log_filename = strftime-patterned filename [log_destination = [stderr|syslog|csvlog] ] log_statement = [none|ddl|mod|all] / / all log_min_error_statement = ERROR log_line_prefix = '%t %c %u ' # time sessionid user

Log line prefix

PL/pgSQL • Stored procedure dilemma – Where to keep your
logic? – How your logic is NOT in your SCM

logic? – How your logic is NOT in your SCM • Over dozen of options: – Perl, Python, Ruby, – pgSQL, Java, – TCL, LISP…

logic? – How your logic is NOT in your SCM • Over dozen of options: – Perl, Python, Ruby, – pgSQL, Java, – TCL, LISP… • DevOps, SysAdmins, DBAs… ETLs etc.

Perl function example CREATE FUNCTION perl_max (integer, integer) RETURNS integer
AS $$ my ($x, $y) = @_; if (not defined $x) { return undef if not defined $y; return $y; } return $x if not defined $y; return $x if $x > $y; return $y; $$ LANGUAGE plperl;

XML or JSON support • Parsing and retrieving XML (functions)
• Valid JSON checks (type) • Careful with encoding! – PG allows only one server encoding per database – Specify it to UTF-8 or weep • Document database instead of OO or rel – JSON, JSONB, HSTORE – noSQL fun welcome!

HSTORE? CREATE TABLE example ( id serial PRIMARY KEY, data
hstore);

hstore); INSERT INTO example (data) VALUES ('name => "John Smith", age => 28, gender => "M"'), ('name => "Jane Smith", age => 24');

hstore); INSERT INTO example (data) VALUES ('name => "John Smith", age => 28, gender => "M"'), ('name => "Jane Smith", age => 24'); SELECT id, data->'name' FROM example; SELECT id, data->'age' FROM example WHERE data->'age' >= '25';

XML and JSON datatype CREATE TABLE test ( ..., xml_file
xml, json_file json, ... );

XML functions example XMLROOT ( XMLELEMENT ( NAME gazonk, XMLATTRIBUTES
( ’val’ AS name, 1 + 1 AS num ), XMLELEMENT ( NAME qux, ’foo’ ) ), VERSION ’1.0’, STANDALONE YES ) <?xml version=’1.0’ standalone=’yes’ ?> <gazonk name=’val’ num=’2’> <qux>foo</qux> </gazonk> xml '<foo>bar</foo>' '<foo>bar</foo>'::xml

Foreign Data Wrappers (FDW) • Stop ETL, start FDW •
Read AND write • FS, Mongo, Hadoop, Redis… • You can write your own!

FDW vs ETL?

Windowing functions • Replacement for procedures (somewhat) • In a
nutshell: – Take row, – find related rows, – compute things over related rows, – return result along with the row • Ranking, averaging, growth per time... http://www.craigkerstiens.com/2014/02/26/Tracking-MoM-growth-in-SQL/ https://www.postgresql.org/docs/9.1/static/tutorial-window.html

CTEs and recursive queries • Common table expressions (CTE) and
recursive queries

Index power • Geo and spherical indexes • Partial indexes
(email like @company.com) • Function indexes • JSON(B) has index support • You may create your own index

Architecture and internals

Check out processes • pgrep -l postgres • htop >
filter: postgres • Whatever you like / use usually • Careful with kill -9 on connections – kill -15 better

Regions

Query path and optimization (no hinting)

Query Path http://www.slideshare.net/SFScon/sfscon15-peter-moser-the-path-of-a-query-postgresql-internals

Parser • Syntax checks, like FRIM is not a keyword
– SELECT * FRIM myTable; • Catalog lookup – MyTable may not exist • In the end query tree is built – Query tokenization: SELECT (keyword) employeeName (field id) count (function call)...

Grammar and a query tree

Planner • Where Planner Tree is built • Where best
execution is decided upon – Seq or index scan? Index or bitmap index? – Which join order? – Which join strategy (nested, hashed, merge)? – Inner or outer? – Aggregation: plain, hashed, sorted… • Heuristic, if finding all plans too costly

Full query path

Example to explain EXPLAIN EXPLAIN SELECT * FROM tenk1; QUERY
PLAN ------------------------------------------------------------ Seq Scan on tenk1 (cost=0.00..458.00 rows=10000 width=244)

Explaining EXPLAIN - what EXPLAIN SELECT * FROM tenk1; QUERY
PLAN ------------------------------------------------------------ Seq Scan on tenk1 (cost=0.00..458.00 rows=10000 width=244) • Startup cost – time before output phase begins • Total cost – in page fetches, may change, assumed to run node to completion • Rows – estimated number to scan (but LIMIT etc.) • Estimated average width of output from that node (in bytes)

Explaining EXPLAIN - how EXPLAIN SELECT * FROM tenk1; QUERY
PLAN ------------------------------------------------------------ Seq Scan on tenk1 (cost=0.00..458.00 rows=10000 width=244) SELECT relpages, reltuples FROM pg_class WHERE relname = 'tenk1'; / /358|10k • No WHERE, no index • Cost = disk pages read * seq page cost + rows scanned * cpu tuple cost • 358 * 1.0 + 10000 * 0.01 = 458 // default values

Analyzing EXPLAIN ANALYZE EXPLAIN ANALYZE SELECT * FROM tenk1 t1,
tenk2 t2 WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=4.65..118.62 rows=10 width=488) (actual time=0.128..0.377 rows=10 loops=1) -> Bitmap Heap Scan on tenk1 t1 (cost=4.36..39.47 rows=10 width=244) (actual time=0.057..0.121 rows=10 loops=1) Recheck Cond: (unique1 < 10) -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..4.36 rows=10 width=0) (actual time=0.024..0.024 rows=10 loops=1) Index Cond: (unique1 < 10) -> Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..7.91 rows=1 width=244) (actual time=0.021..0.022 rows=1 loops=10) Index Cond: (unique2 = t1.unique2) Planning time: 0.181 ms Execution time: 0.501 ms • Actually runs the query • More info: actual times, rows removed by filter, sort method used, disk/memory used...

Analyzing EXPLAIN ANALYZE EXPLAIN ANALYZE SELECT * FROM tenk1 t1,
tenk2 t2 WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=4.65..118.62 rows=10 width=488) (actual time=0.128..0.377 rows=10 loops=1) -> Bitmap Heap Scan on tenk1 t1 (cost=4.36..39.47 rows=10 width=244) (actual time=0.057..0.121 rows=10 loops=1) Recheck Cond: (unique1 < 10) -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..4.36 rows=10 width=0) (actual time=0.024..0.024 rows=10 loops=1) Index Cond: (unique1 < 10) -> Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..7.91 rows=1 width=244) (actual time=0.021..0.022 rows=1 loops=10) Index Cond: (unique2 = t1.unique2) Planning time: 0.181 ms Execution time: 0.501 ms

Multithreading and PostgreSQL

Summary

Battle-tested • Could mature since 1987 • Comes in many
flavours (forks) • Largest cluster – PBs in Yahoo • Skype, NASA, Instagram • Stable: – Many years on one version – Good version support – Every year something new – Follows ANSI SQL standards https://www.postgresql.org/about/users/

Great features • Java, Perl, Python for stored procedures •
Add CTEs and FDWs => great ETL or µservice • Handles XMLs and JSONs • Error reporting / logging • MVCC built-in • Windowing functions • ...

Solid internals • Well-thought out processes • Built-in security (dozen
of solutions) • WAL, stats collector, vacuum • Good rule engine and clear query optimization – No hinting will bother some people • Plethora of data types

Disadvantages • More like Python then Perl/PHP • Some learning
curve • Some say: – replication(‘s performance) • I can’t think of more, doesn’t mean none are present :-)

PostgreSQL – Tomasz Borek Database for next project? Why, PostgreSQL
of course! @LAFK_pl Consultant @

DB for your next project? Why, Postgres, of course

DB for your next project? Why, Postgres, of course

More Decks by Tomasz Borek

Other Decks in Technology

Featured

Transcript