Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Postgres Demystified

Postgres Demystified

Craig Kerstiens

September 24, 2012
Tweet

More Decks by Craig Kerstiens

Other Decks in Programming

Transcript

  1. Postgres Demystified
    Craig Kerstiens
    @craigkerstiens
    http://www.craigkerstiens.com
    https://speakerdeck.com/u/craigkerstiens/p/postgres-demystified

    View full-size slide

  2. Postgres Demystified

    View full-size slide

  3. Postgres Demystified

    View full-size slide

  4. Getting Setup
    Postgres.app

    View full-size slide

  5. Agenda
    Brief History
    Developing w/ Postgres
    Postgres Performance
    Querying

    View full-size slide

  6. Postgres History
    Postgres
    PostgresQL
    Post Ingress
    Around since 1989/1995
    Community Driven/Owned

    View full-size slide

  7. MVCC
    Each query sees transactions committed before it
    Locks for writing don’t conflict with reading

    View full-size slide

  8. Why Postgres

    View full-size slide

  9. Why Postgres
    “ its the emacs of databases”

    View full-size slide

  10. Developing w/ Postgres

    View full-size slide

  11. Basics
    psql is your friend

    View full-size slide

  12. Datatypes
    smallint bigint integer
    numeric float
    serial money char
    varchar
    text
    bytea
    timestamp
    timestamptz date
    time
    timetz
    interval boolean
    enum
    point
    line
    polygon
    box
    circle
    path
    inet
    cidr
    macaddr
    tsvector
    tsquery
    array
    XML
    UUID

    View full-size slide

  13. smallint bigint integer
    numeric float
    serial money char
    varchar
    text
    bytea
    timestamp
    timestamptz date
    time
    timetz
    interval boolean
    enum
    point
    line
    polygon
    box
    circle
    path
    inet
    cidr
    macaddr
    tsvector
    tsquery
    array
    XML
    UUID
    Datatypes

    View full-size slide

  14. smallint bigint integer
    numeric float
    serial money char
    varchar
    text
    bytea
    timestamp
    timestamptz date
    time
    timetz
    interval boolean
    enum
    point
    line
    polygon
    box
    circle
    path
    inet
    cidr
    macaddr
    tsvector
    tsquery
    array
    XML
    UUID
    Datatypes

    View full-size slide

  15. Datatypes
    smallint bigint integer
    numeric float
    serial money char
    varchar
    text
    bytea
    timestamp
    timestamptz date
    time
    timetz
    interval boolean
    enum
    point
    line
    polygon
    box
    circle
    path
    inet
    cidr
    macaddr
    tsvector
    tsquery
    array
    XML
    UUID

    View full-size slide

  16. CREATE TABLE items (
    id serial NOT NULL,
    name varchar (255),
    tags varchar(255) [],

    created_at timestamp
    );
    Datatypes

    View full-size slide

  17. CREATE TABLE items (
    id serial NOT NULL,
    name varchar (255),
    tags varchar(255) [],

    created_at timestamp
    );
    Datatypes

    View full-size slide

  18. CREATE TABLE items (
    id serial NOT NULL,
    name varchar (255),
    tags varchar(255) [],

    created_at timestamp
    );
    Datatypes

    View full-size slide

  19. Datatypes
    INSERT INTO items
    VALUES (1, 'Ruby Gem', '{“Programming”,”Jewelry”}', now());
    INSERT INTO items
    VALUES (2, 'Django Pony', '{“Programming”,”Animal”}', now());

    View full-size slide

  20. Datatypes
    smallint bigint integer
    numeric float
    serial money char
    varchar
    text
    bytea
    timestamp
    timestamptz date
    time
    timetz
    interval boolean
    enum
    point
    line
    polygon
    box
    circle
    path
    inet
    cidr
    macaddr
    tsvector
    tsquery
    array
    XML
    UUID

    View full-size slide

  21. Datatypes wish list
    email url
    phone
    zip

    View full-size slide

  22. Extensions
    dblink hstore
    citext
    ltree
    isn
    cube
    pgcrypto
    tablefunc
    uuid-ossp
    earthdistance
    trigram
    fuzzystrmatch
    pgrowlocks
    pgstattuple
    btree_gist
    dict_int
    dict_xsyn
    unaccent

    View full-size slide

  23. Extensions
    dblink hstore
    citext
    ltree
    isn
    cube
    pgcrypto
    tablefunc
    uuid-ossp
    earthdistance
    trigram
    fuzzystrmatch
    pgrowlocks
    pgstattuple
    btree_gist
    dict_int
    dict_xsyn
    unaccent

    View full-size slide

  24. CREATE EXTENSION hstore;
    CREATE TABLE users (
    id integer NOT NULL,
    email character varying(255),
    data hstore,
    created_at timestamp without time zone,
    last_login timestamp without time zone
    );
    NoSQL in your SQL

    View full-size slide

  25. INSERT INTO users
    VALUES (
    1,
    '[email protected]',
    'sex => "M", state => “California”',
    now(),
    now()
    );
    hStore

    View full-size slide

  26. SELECT
    '{"id":1,"email": "[email protected]",}'::json;
    JSON
    V8 w/ PLV8
    9.2

    View full-size slide

  27. SELECT
    '{"id":1,"email": "[email protected]",}'::json;
    JSON
    V8 w/ PLV8
    9.2

    View full-size slide

  28. SELECT
    '{"id":1,"email": "[email protected]",}'::json;
    JSON
    V8 w/ PLV8
    9.2
    Bad Idea

    View full-size slide

  29. SELECT
    '{"id":1,"email": "[email protected]",}'::json;
    JSON
    create or replace function
    js(src text) returns text as $$
    return eval(
    "(function() { " + src + "})"
    )();
    $$ LANGUAGE plv8;
    V8 w/ PLV8
    9.2
    Bad Idea

    View full-size slide

  30. Range Types
    9.2

    View full-size slide

  31. CREATE TABLE talks (room int, during tsrange);
    INSERT INTO talks VALUES
    (3, '[2012-09-24 13:00, 2012-09-24 13:50)');
    Range Types
    9.2

    View full-size slide

  32. CREATE TABLE talks (room int, during tsrange);
    INSERT INTO talks VALUES
    (3, '[2012-09-24 13:00, 2012-09-24 13:50)');
    Range Types
    9.2
    ALTER TABLE talks ADD EXCLUDE USING gist (during WITH &&);
    INSERT INTO talks VALUES
    (1108, '[2012-09-24 13:30, 2012-09-24 14:00)');
    ERROR: conflicting key value violates exclusion constraint
    "talks_during_excl"

    View full-size slide

  33. Full Text Search

    View full-size slide

  34. Full Text Search
    TSVECTOR - Text Data
    TSQUERY - Search Predicates
    Specialized Indexes and Operators

    View full-size slide

  35. Datatypes
    smallint bigint integer
    numeric float
    serial money char
    varchar
    text
    bytea
    timestamp
    timestamptz date
    time
    timetz
    interval boolean
    enum
    point
    line
    polygon
    box
    circle
    path
    inet
    cidr
    macaddr
    tsvector
    tsquery
    array
    XML
    UUID

    View full-size slide

  36. 1. New datatypes i.e. (2d/3d boxes)
    PostGIS

    View full-size slide

  37. 1. New datatypes i.e. (2d/3d boxes)
    i.e. SELECT foo && bar ...
    2. New operators
    PostGIS

    View full-size slide

  38. 1. New datatypes i.e. (2d/3d boxes)
    i.e. SELECT foo && bar ...
    i.e. person within location, nearest distance
    2. New operators
    3. Understand relationships and distance
    PostGIS

    View full-size slide

  39. Sequential Scans

    View full-size slide

  40. Sequential Scans
    They’re Bad

    View full-size slide

  41. Sequential Scans
    They’re Bad (most of the time)

    View full-size slide

  42. Indexes
    They’re Good

    View full-size slide

  43. Indexes
    They’re Good (most of the time)

    View full-size slide

  44. Indexes
    B-Tree
    Generalized Inverted Index (GIN)
    Generalized Search Tree (GIST)
    K Nearest Neighbors (KNN)
    Space Partitioned GIST (SP-GIST)

    View full-size slide

  45. Indexes
    B-Tree
    Default
    Usually want this

    View full-size slide

  46. Indexes
    Generalized Inverted Index (GIN)
    Use with multiple values in 1 column
    Array/hStore

    View full-size slide

  47. Indexes
    Generalized Search Tree (GIST)
    Full text search
    Shapes

    View full-size slide

  48. Understanding Query Perf
    SELECT last_name
    FROM employees
    WHERE salary >= 50000;
    Given

    View full-size slide

  49. Explain
    # EXPLAIN SELECT last_name FROM employees WHERE salary >=
    50000;
    QUERY PLAN
    -------------------------------------------------------------------------
    Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6)
    Filter: (salary >= 50000)
    (3 rows)

    View full-size slide

  50. # EXPLAIN SELECT last_name FROM employees WHERE salary >=
    50000;
    QUERY PLAN
    -------------------------------------------------------------------------
    Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6)
    Filter: (salary >= 50000)
    (3 rows)
    Startup Cost
    Explain

    View full-size slide

  51. # EXPLAIN SELECT last_name FROM employees WHERE salary >=
    50000;
    QUERY PLAN
    -------------------------------------------------------------------------
    Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6)
    Filter: (salary >= 50000)
    (3 rows)
    Startup Cost
    Max Time
    Explain

    View full-size slide

  52. # EXPLAIN SELECT last_name FROM employees WHERE salary >=
    50000;
    QUERY PLAN
    -------------------------------------------------------------------------
    Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6)
    Filter: (salary >= 50000)
    (3 rows)
    Startup Cost
    Max Time
    Rows Returned
    Explain

    View full-size slide

  53. # EXPLAIN ANALYZE SELECT last_name FROM employees WHERE
    salary >= 50000;
    QUERY PLAN
    -------------------------------------------------------------------------
    Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actual
    time=2.401..295.247 rows=1428 loops=1)
    Filter: (salary >= 50000)
    Total runtime: 295.379
    (3 rows)
    Explain

    View full-size slide

  54. # EXPLAIN ANALYZE SELECT last_name FROM employees WHERE
    salary >= 50000;
    QUERY PLAN
    -------------------------------------------------------------------------
    Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actual
    time=2.401..295.247 rows=1428 loops=1)
    Filter: (salary >= 50000)
    Total runtime: 295.379
    (3 rows)
    Startup Cost
    Explain

    View full-size slide

  55. # EXPLAIN ANALYZE SELECT last_name FROM employees WHERE
    salary >= 50000;
    QUERY PLAN
    -------------------------------------------------------------------------
    Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actual
    time=2.401..295.247 rows=1428 loops=1)
    Filter: (salary >= 50000)
    Total runtime: 295.379
    (3 rows)
    Startup Cost Max Time
    Explain

    View full-size slide

  56. # EXPLAIN ANALYZE SELECT last_name FROM employees WHERE
    salary >= 50000;
    QUERY PLAN
    -------------------------------------------------------------------------
    Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actual
    time=2.401..295.247 rows=1428 loops=1)
    Filter: (salary >= 50000)
    Total runtime: 295.379
    (3 rows)
    Startup Cost Max Time
    Rows Returned
    Explain

    View full-size slide

  57. # EXPLAIN ANALYZE SELECT last_name FROM employees WHERE
    salary >= 50000;
    QUERY PLAN
    -------------------------------------------------------------------------
    Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actual
    time=2.401..295.247 rows=1428 loops=1)
    Filter: (salary >= 50000)
    Total runtime: 295.379
    (3 rows)
    Startup Cost Max Time
    Rows Returned
    Explain

    View full-size slide

  58. # CREATE INDEX idx_emps ON employees (salary);

    View full-size slide

  59. # CREATE INDEX idx_emps ON employees (salary);
    # EXPLAIN ANALYZE SELECT last_name FROM employees WHERE
    salary >= 50000;
    QUERY PLAN
    -------------------------------------------------------------------------
    Index Scan using idx_emps on employees (cost=0.00..8.49 rows=1
    width=6) (actual time = 0.047..1.603 rows=1428 loops=1)
    Index Cond: (salary >= 50000)
    Total runtime: 1.771 ms
    (3 rows)

    View full-size slide

  60. # CREATE INDEX idx_emps ON employees (salary);
    # EXPLAIN ANALYZE SELECT last_name FROM employees WHERE
    salary >= 50000;
    QUERY PLAN
    -------------------------------------------------------------------------
    Index Scan using idx_emps on employees (cost=0.00..8.49 rows=1
    width=6) (actual time = 0.047..1.603 rows=1428 loops=1)
    Index Cond: (salary >= 50000)
    Total runtime: 1.771 ms
    (3 rows)

    View full-size slide

  61. Indexes Pro Tips

    View full-size slide

  62. Indexes Pro Tips
    CREATE INDEX CONCURRENTLY

    View full-size slide

  63. Indexes Pro Tips
    CREATE INDEX CONCURRENTLY
    CREATE INDEX WHERE foo=bar

    View full-size slide

  64. Indexes Pro Tips
    CREATE INDEX CONCURRENTLY
    CREATE INDEX WHERE foo=bar
    SELECT * WHERE foo LIKE ‘%bar% is BAD

    View full-size slide

  65. Indexes Pro Tips
    CREATE INDEX CONCURRENTLY
    CREATE INDEX WHERE foo=bar
    SELECT * WHERE foo LIKE ‘%bar% is BAD
    SELECT * WHERE Food LIKE ‘bar%’ is OKAY

    View full-size slide

  66. Extensions
    dblink hstore
    citext
    ltree
    isn
    cube
    pgcrypto
    tablefunc
    uuid-ossp
    earthdistance
    trigram
    fuzzystrmatch
    pgrowlocks
    pgstattuple
    btree_gist
    dict_int
    dict_xsyn
    unaccent

    View full-size slide

  67. Index Hit Rate
    SELECT
    relname,
    100 * idx_scan / (seq_scan + idx_scan),
    n_live_tup
    FROM pg_stat_user_tables
    ORDER BY n_live_tup DESC;

    View full-size slide

  68. Cache Hit Rate
    SELECT
    relname::text,
    heap_blks_read + heap_blks_hit as reads,
    round(100 * heap_blks_hit / (heap_blks_hit + heap_blks_read)) as
    hit_pct,
    round(100 * idx_blks_hit / (idx_blks_hit + idx_blks_read)) as idx_hit_pct
    FROM pg_statio_user_tables
    WHERE heap_blks_hit + heap_blks_read + idx_blks_hit + idx_blks_read
    > 0
    ORDER BY 2 DESC;

    View full-size slide

  69. pg_stats_statements
    9.2

    View full-size slide

  70. pg_stats_statements
    $ select * from pg_stat_statements where query ~ 'from users where email';
    userid ᴹ 16384
    dbid ᴹ 16388
    query ᴹ select * from users where email = ?;
    calls ᴹ 2
    total_time ᴹ 0.000268
    rows ᴹ 2
    shared_blks_hit ᴹ 16
    shared_blks_read ᴹ 0
    shared_blks_dirtied ᴹ 0
    shared_blks_written ᴹ 0
    local_blks_hit ᴹ 0
    local_blks_read ᴹ 0
    local_blks_dirtied ᴹ 0
    local_blks_written ᴹ 0
    temp_blks_read ᴹ 0
    temp_blks_written ᴹ 0
    time_read ᴹ 0
    time_write ᴹ 0
    9.2

    View full-size slide

  71. pg_stats_statements
    9.2

    View full-size slide

  72. pg_stats_statements
    SELECT query, calls, total_time, rows, 100.0 * shared_blks_hit /
    nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent
    FROM pg_stat_statements ORDER BY total_time DESC LIMIT 5;
    ----------------------------------------------------------------------
    query | UPDATE pgbench_branches SET bbalance = bbalance + ?
    WHERE bid = ?;
    calls | 3000
    total_time | 9609.00100000002
    rows | 2836
    hit_percent | 99.9778970000200936
    9.2

    View full-size slide

  73. Window Functions
    Example:
    Biggest spender by state

    View full-size slide

  74. SELECT
    email,
    users.data->'state',
    sum(total(items)),
    rank() OVER
    (PARTITION BY users.data->'state'
    ORDER BY sum(total(items)) desc)
    FROM
    users, purchases
    WHERE purchases.user_id = users.id
    GROUP BY 1, 2;
    Window Functions

    View full-size slide

  75. SELECT
    email,
    users.data->'state',
    sum(total(items)),
    rank() OVER
    (PARTITION BY users.data->'state'
    ORDER BY sum(total(items)) desc)
    FROM
    users, purchases
    WHERE purchases.user_id = users.id
    GROUP BY 1, 2;
    Window Functions

    View full-size slide

  76. Extensions
    dblink hstore
    citext
    ltree
    isn
    cube
    pgcrypto
    tablefunc
    uuid-ossp
    earthdistance
    trigram
    fuzzystrmatch
    pgrowlocks
    pgstattuple
    btree_gist
    dict_int
    dict_xsyn
    unaccent

    View full-size slide

  77. Fuzzystrmatch

    View full-size slide

  78. Fuzzystrmatch
    SELECT soundex('Craig'), soundex('Will'), difference('Craig', 'Will');

    View full-size slide

  79. Fuzzystrmatch
    SELECT soundex('Craig'), soundex('Will'), difference('Craig', 'Will');
    SELECT soundex('Craig'), soundex('Greg'), difference('Craig', 'Greg');
    SELECT soundex('Willl'), soundex('Will'), difference('Willl', 'Will');

    View full-size slide

  80. Moving Data Around
    \copy (SELECT * FROM users) TO ‘~/users.csv’;
    \copy users FROM ‘~/users.csv’;

    View full-size slide

  81. db_link
    SELECT dblink_connect('myconn', 'dbname=postgres');
    SELECT * FROM dblink('myconn','SELECT * FROM foo') AS t(a int, b text);
    a | b
    -------+------------
    1 | example
    2 | example2

    View full-size slide

  82. Foreign Data Wrappers
    oracle mysql
    informix
    twitter
    files
    www
    couch
    sybase
    ldap
    odbc
    s3
    redis jdbc

    View full-size slide

  83. CREATE EXTENSION redis_fdw;
    CREATE SERVER redis_server

    FOREIGN DATA WRAPPER redis_fdw

    OPTIONS (address '127.0.0.1', port '6379');
    CREATE FOREIGN TABLE redis_db0 (key text, value text)

    SERVER redis_server

    OPTIONS (database '0');
    CREATE USER MAPPING FOR PUBLIC

    SERVER redis_server

    OPTIONS (password 'secret');
    Foreign Data Wrappers

    View full-size slide

  84. SELECT
    id,
    email,
    value as visits
    FROM
    users,
    redis_db0
    WHERE ('user_' || cast(id as text)) = cast(redis_db0.key as text)
    AND cast(value as int) > 10;
    Query Redis from Postgres
    SELECT *
    FROM redis_db0;

    View full-size slide

  85. Readability
    WITH top_5_products AS (
    SELECT products.*, count(*)
    FROM products, line_items
    WHERE products.id = line_items.product_id
    GROUP BY products.id
    ORDER BY count(*) DESC
    LIMIT 5
    )
    SELECT users.email, count(*)
    FROM users, line_items, top_5_products
    WHERE line_items.user_id = users.id
    AND line_items.product_id = top_5_products.id
    GROUP BY 1
    ORDER BY 1;

    View full-size slide

  86. Common Table Expressions
    WITH top_5_products AS (
    SELECT products.*, count(*)
    FROM products, line_items
    WHERE products.id = line_items.product_id
    GROUP BY products.id
    ORDER BY count(*) DESC
    LIMIT 5
    )
    SELECT users.email, count(*)
    FROM users, line_items, top_5_products
    WHERE line_items.user_id = users.id
    AND line_items.product_id = top_5_products.id
    GROUP BY 1
    ORDER BY 1;

    View full-size slide

  87. Brief History
    Developing w/ Postgres
    Postgres Performance
    Querying

    View full-size slide

  88. Extras
    Listen/Notify

    View full-size slide

  89. Extras
    Listen/Notify
    Per Transaction Synchronous Replication

    View full-size slide

  90. Extras
    Listen/Notify
    Per Transaction Synchronous Replication
    Drop index concurrently

    View full-size slide

  91. Postgres - TLDR
    Datatypes
    Conditional Indexes
    Transactional DDL
    Foreign Data Wrappers
    Concurrent Index Creation
    Extensions
    Common Table Expressions
    Fast Column Addition
    Listen/Notify
    Table Inheritance
    Per Transaction sync replication
    Window functions
    NoSQL inside SQL
    Momentum

    View full-size slide