$30 off During Our Annual Pro Sale. View Details »

DB for your next project? Why, Postgres, of course

DB for your next project? Why, Postgres, of course

This talk highlights nice sides of Postgres which IMO make it an excellent choice for your next project - aside from money. I'm talking about NoSQL support, datatypes, logging, error handling, procedures in languages devs know as well as about Postgres internals: it's query path, architecture and processes.

Tomasz Borek

October 11, 2016
Tweet

More Decks by Tomasz Borek

Other Decks in Technology

Transcript

  1. PostgreSQL – Tomasz Borek
    Database for next project?
    Why, PostgreSQL of course!
    @LAFK_pl
    Consultant @

    View Slide

  2. About me
    @LAFK_pl
    Consultant @
    Tomasz Borek

    View Slide

  3. View Slide

  4. What will I tell you?

    Colourful history of PostgreSQL
    – So, DB wars

    Chosen features

    Architecture and internals

    Query path and optimization (no hinting)

    Multithreading (very briefly, too little time)

    View Slide

  5. Colourful history

    View Slide

  6. History

    View Slide

  7. In-/Postgres forks

    View Slide

  8. View Slide

  9. Support?

    View Slide

  10. Chosen features

    View Slide

  11. My Faves

    Error reporting / logging

    PL/xSQL – feel free to use Perl, Python, Ruby, Java,
    LISP...

    XML and JSON handling

    Foreign Data Wrappers (FDW)

    Windowing functions

    Common table expressions (CTE) and recursive queries

    Power of Indexes

    View Slide

  12. Will DB eat your cake?

    Thanks @anandology

    View Slide

  13. Will DB eat your cake?

    Thanks @anandology

    View Slide

  14. Will DB eat your cake?

    Thanks @anandology

    View Slide

  15. The cake is a lie!

    View Slide

  16. Will DB eat your cake?

    Thanks @anandology

    View Slide

  17. Will DB eat your cake?

    Thanks @anandology

    View Slide

  18. Will DB eat your cake?

    Thanks @anandology
    Consider password VARCHAR(8)

    View Slide

  19. Logging, ‘gotchas’

    Default is to stderr only

    Set on CLI or in config, not through sets

    Where is it?

    How to log queries… or turning log_collector on

    View Slide

  20. Where is it?

    Default
    – data/pg_log

    Launchers can set it (Mac Homebrew/plist)

    Version and config dependent

    View Slide

  21. Ask DB

    View Slide

  22. Logging, turn it on

    Default is to stderr only

    In PG:
    logging_collector = on
    log_filename = strftime-patterned filename
    [log_destination = [stderr|syslog|csvlog] ]
    log_statement = [none|ddl|mod|all] /
    / all
    log_min_error_statement = ERROR
    log_line_prefix = '%t %c %u ' # time sessionid user

    View Slide

  23. Log line prefix

    View Slide

  24. PL/pgSQL

    Stored procedure dilemma
    – Where to keep your logic?
    – How your logic is NOT in your SCM

    View Slide

  25. PL/pgSQL

    Stored procedure dilemma
    – Where to keep your logic?
    – How your logic is NOT in your SCM

    Over dozen of options:
    – Perl, Python, Ruby,
    – pgSQL, Java,
    – TCL, LISP…

    View Slide

  26. PL/pgSQL

    Stored procedure dilemma
    – Where to keep your logic?
    – How your logic is NOT in your SCM

    Over dozen of options:
    – Perl, Python, Ruby,
    – pgSQL, Java,
    – TCL, LISP…

    DevOps, SysAdmins, DBAs… ETLs etc.

    View Slide

  27. PL/pgSQL

    Stored procedure dilemma
    – Where to keep your logic?
    – How your logic is NOT in your SCM

    Over dozen of options:
    – Perl, Python, Ruby,
    – pgSQL, Java,
    – TCL, LISP…

    DevOps, SysAdmins, DBAs… ETLs etc.

    View Slide

  28. Perl function example
    CREATE FUNCTION perl_max (integer, integer) RETURNS integer AS $$
    my ($x, $y) = @_;
    if (not defined $x) {
    return undef if not defined $y;
    return $y;
    }
    return $x if not defined $y;
    return $x if $x > $y;
    return $y;
    $$ LANGUAGE plperl;

    View Slide

  29. XML or JSON support

    Parsing and retrieving XML (functions)

    Valid JSON checks (type)

    Careful with encoding!
    – PG allows only one server encoding per database
    – Specify it to UTF-8 or weep

    Document database instead of OO or rel
    – JSON, JSONB, HSTORE – noSQL fun welcome!

    View Slide

  30. HSTORE?
    CREATE TABLE example (
    id serial PRIMARY KEY,
    data hstore);

    View Slide

  31. HSTORE?
    CREATE TABLE example (
    id serial PRIMARY KEY,
    data hstore);
    INSERT INTO example (data) VALUES
    ('name => "John Smith", age => 28, gender => "M"'),
    ('name => "Jane Smith", age => 24');

    View Slide

  32. HSTORE?
    CREATE TABLE example (
    id serial PRIMARY KEY,
    data hstore);
    INSERT INTO example (data)
    VALUES
    ('name => "John Smith", age => 28,
    gender => "M"'),
    ('name => "Jane Smith", age => 24');
    SELECT id,
    data->'name'
    FROM example;
    SELECT id, data->'age'
    FROM example
    WHERE data->'age' >=
    '25';

    View Slide

  33. XML and JSON datatype
    CREATE TABLE test (
    ...,
    xml_file xml,
    json_file json,
    ...
    );

    View Slide

  34. XML functions example
    XMLROOT (
    XMLELEMENT (
    NAME gazonk,
    XMLATTRIBUTES (
    ’val’ AS name,
    1 + 1 AS num
    ),
    XMLELEMENT (
    NAME qux,
    ’foo’
    )
    ),
    VERSION ’1.0’,
    STANDALONE YES
    )
    standalone=’yes’ ?>
    num=’2’>
    foo

    xml 'bar'
    'bar'::xml

    View Slide

  35. Foreign Data Wrappers (FDW)

    Stop ETL, start FDW

    Read AND write

    FS, Mongo, Hadoop, Redis…

    You can write your own!

    View Slide

  36. FDW vs ETL?

    View Slide

  37. Windowing functions

    Replacement for procedures (somewhat)

    In a nutshell:
    – Take row,
    – find related rows,
    – compute things over related rows,
    – return result along with the row

    Ranking, averaging, growth per time...
    http://www.craigkerstiens.com/2014/02/26/Tracking-MoM-growth-in-SQL/
    https://www.postgresql.org/docs/9.1/static/tutorial-window.html

    View Slide

  38. CTEs and recursive queries

    Common table expressions (CTE) and
    recursive queries

    View Slide

  39. Index power

    Geo and spherical indexes

    Partial indexes (email like @company.com)

    Function indexes

    JSON(B) has index support

    You may create your own index

    View Slide

  40. Architecture and internals

    View Slide

  41. View Slide

  42. View Slide

  43. Check out processes

    pgrep -l postgres

    htop > filter: postgres

    Whatever you like / use usually

    Careful with kill -9 on connections
    – kill -15 better

    View Slide

  44. Regions

    View Slide

  45. View Slide

  46. View Slide

  47. Query path and optimization (no hinting)

    View Slide

  48. Query Path
    http://www.slideshare.net/SFScon/sfscon15-peter-moser-the-path-of-a-query-postgresql-internals

    View Slide

  49. Parser

    Syntax checks, like FRIM is not a keyword
    – SELECT * FRIM myTable;

    Catalog lookup
    – MyTable may not exist

    In the end query tree is built
    – Query tokenization: SELECT (keyword)
    employeeName (field id) count (function call)...

    View Slide

  50. Grammar and a query tree

    View Slide

  51. Planner

    Where Planner Tree is built

    Where best execution is decided upon
    – Seq or index scan? Index or bitmap index?
    – Which join order?
    – Which join strategy (nested, hashed, merge)?
    – Inner or outer?
    – Aggregation: plain, hashed, sorted…

    Heuristic, if finding all plans too costly

    View Slide

  52. Full query path

    View Slide

  53. Example to explain EXPLAIN
    EXPLAIN SELECT * FROM tenk1;
    QUERY PLAN
    ------------------------------------------------------------
    Seq Scan on tenk1 (cost=0.00..458.00
    rows=10000 width=244)

    View Slide

  54. Explaining EXPLAIN - what
    EXPLAIN SELECT * FROM tenk1;
    QUERY PLAN
    ------------------------------------------------------------
    Seq Scan on tenk1 (cost=0.00..458.00 rows=10000
    width=244)

    Startup cost – time before output phase begins

    Total cost – in page fetches, may change, assumed to
    run node to completion

    Rows – estimated number to scan (but LIMIT etc.)

    Estimated average width of output from that node (in
    bytes)

    View Slide

  55. Explaining EXPLAIN - how
    EXPLAIN SELECT * FROM tenk1;
    QUERY PLAN
    ------------------------------------------------------------
    Seq Scan on tenk1 (cost=0.00..458.00 rows=10000 width=244)
    SELECT relpages, reltuples FROM pg_class WHERE relname = 'tenk1'; /
    /358|10k

    No WHERE, no index

    Cost = disk pages read * seq page cost + rows scanned
    * cpu tuple cost

    358 * 1.0 + 10000 * 0.01 = 458 // default values

    View Slide

  56. Analyzing EXPLAIN ANALYZE
    EXPLAIN ANALYZE SELECT *
    FROM tenk1 t1, tenk2 t2
    WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2;
    QUERY PLAN
    ---------------------------------------------------------------------------------------------------------------------------------
    Nested Loop (cost=4.65..118.62 rows=10 width=488) (actual time=0.128..0.377 rows=10 loops=1)
    -> Bitmap Heap Scan on tenk1 t1 (cost=4.36..39.47 rows=10 width=244) (actual time=0.057..0.121 rows=10 loops=1)
    Recheck Cond: (unique1 < 10)
    -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..4.36 rows=10 width=0) (actual time=0.024..0.024 rows=10 loops=1)
    Index Cond: (unique1 < 10)
    -> Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..7.91 rows=1 width=244) (actual time=0.021..0.022 rows=1 loops=10)
    Index Cond: (unique2 = t1.unique2)
    Planning time: 0.181 ms
    Execution time: 0.501 ms

    Actually runs the query

    More info: actual times, rows removed by filter,
    sort method used, disk/memory used...

    View Slide

  57. Analyzing EXPLAIN ANALYZE
    EXPLAIN ANALYZE SELECT *
    FROM tenk1 t1, tenk2 t2
    WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2;
    QUERY PLAN
    ---------------------------------------------------------------------------------------------------------------------------------
    Nested Loop (cost=4.65..118.62 rows=10 width=488) (actual time=0.128..0.377 rows=10 loops=1)
    -> Bitmap Heap Scan on tenk1 t1 (cost=4.36..39.47 rows=10 width=244) (actual time=0.057..0.121 rows=10
    loops=1)
    Recheck Cond: (unique1 < 10)
    -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..4.36 rows=10 width=0) (actual time=0.024..0.024
    rows=10 loops=1)
    Index Cond: (unique1 < 10)
    -> Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..7.91 rows=1 width=244) (actual time=0.021..0.022
    rows=1 loops=10)
    Index Cond: (unique2 = t1.unique2)
    Planning time: 0.181 ms
    Execution time: 0.501 ms

    View Slide

  58. Analyzing EXPLAIN ANALYZE
    EXPLAIN ANALYZE SELECT *
    FROM tenk1 t1, tenk2 t2
    WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2;
    QUERY PLAN
    ---------------------------------------------------------------------------------------------------------------------------------
    Nested Loop (cost=4.65..118.62 rows=10 width=488) (actual time=0.128..0.377 rows=10 loops=1)
    -> Bitmap Heap Scan on tenk1 t1 (cost=4.36..39.47 rows=10 width=244) (actual time=0.057..0.121 rows=10
    loops=1)
    Recheck Cond: (unique1 < 10)
    -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..4.36 rows=10 width=0) (actual time=0.024..0.024
    rows=10 loops=1)
    Index Cond: (unique1 < 10)
    -> Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..7.91 rows=1 width=244) (actual time=0.021..0.022
    rows=1 loops=10)
    Index Cond: (unique2 = t1.unique2)
    Planning time: 0.181 ms
    Execution time: 0.501 ms

    View Slide

  59. Analyzing EXPLAIN ANALYZE
    EXPLAIN ANALYZE SELECT *
    FROM tenk1 t1, tenk2 t2
    WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2;
    QUERY PLAN
    ---------------------------------------------------------------------------------------------------------------------------------
    Nested Loop (cost=4.65..118.62 rows=10 width=488) (actual time=0.128..0.377 rows=10 loops=1)
    -> Bitmap Heap Scan on tenk1 t1 (cost=4.36..39.47 rows=10 width=244) (actual time=0.057..0.121 rows=10
    loops=1)
    Recheck Cond: (unique1 < 10)
    -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..4.36 rows=10 width=0) (actual time=0.024..0.024
    rows=10 loops=1)
    Index Cond: (unique1 < 10)
    -> Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..7.91 rows=1 width=244) (actual time=0.021..0.022
    rows=1 loops=10)
    Index Cond: (unique2 = t1.unique2)
    Planning time: 0.181 ms
    Execution time: 0.501 ms

    View Slide

  60. Multithreading and PostgreSQL

    View Slide

  61. Summary

    View Slide

  62. Battle-tested

    Could mature since 1987

    Comes in many flavours (forks)

    Largest cluster – PBs in Yahoo

    Skype, NASA, Instagram

    Stable:
    – Many years on one version
    – Good version support
    – Every year something new
    – Follows ANSI SQL standards
    https://www.postgresql.org/about/users/

    View Slide

  63. Great features

    Java, Perl, Python for stored procedures

    Add CTEs and FDWs => great ETL or µservice

    Handles XMLs and JSONs

    Error reporting / logging

    MVCC built-in

    Windowing functions

    ...

    View Slide

  64. Solid internals

    Well-thought out processes

    Built-in security (dozen of solutions)

    WAL, stats collector, vacuum

    Good rule engine and clear query optimization
    – No hinting will bother some people

    Plethora of data types

    View Slide

  65. Disadvantages

    More like Python then Perl/PHP

    Some learning curve

    Some say:
    – replication(‘s performance)

    I can’t think of more, doesn’t mean none are
    present :-)

    View Slide

  66. PostgreSQL – Tomasz Borek
    Database for next project?
    Why, PostgreSQL of course!
    @LAFK_pl
    Consultant @

    View Slide