Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Postgres Demystified

Postgres Demystified

Craig Kerstiens

September 24, 2012
Tweet

More Decks by Craig Kerstiens

Other Decks in Programming

Transcript

  1. Postgres Demystified
    Craig Kerstiens
    @craigkerstiens
    http://www.craigkerstiens.com
    https://speakerdeck.com/u/craigkerstiens/p/postgres-demystified

    View Slide

  2. Postgres Demystified

    View Slide

  3. Postgres Demystified

    View Slide

  4. Getting Setup
    Postgres.app

    View Slide

  5. Agenda
    Brief History
    Developing w/ Postgres
    Postgres Performance
    Querying

    View Slide

  6. Postgres History
    Postgres
    PostgresQL
    Post Ingress
    Around since 1989/1995
    Community Driven/Owned

    View Slide

  7. MVCC
    Each query sees transactions committed before it
    Locks for writing don’t conflict with reading

    View Slide

  8. Why Postgres

    View Slide

  9. Why Postgres
    “ its the emacs of databases”

    View Slide

  10. Developing w/ Postgres

    View Slide

  11. Basics
    psql is your friend

    View Slide

  12. Datatypes
    smallint bigint integer
    numeric float
    serial money char
    varchar
    text
    bytea
    timestamp
    timestamptz date
    time
    timetz
    interval boolean
    enum
    point
    line
    polygon
    box
    circle
    path
    inet
    cidr
    macaddr
    tsvector
    tsquery
    array
    XML
    UUID

    View Slide

  13. smallint bigint integer
    numeric float
    serial money char
    varchar
    text
    bytea
    timestamp
    timestamptz date
    time
    timetz
    interval boolean
    enum
    point
    line
    polygon
    box
    circle
    path
    inet
    cidr
    macaddr
    tsvector
    tsquery
    array
    XML
    UUID
    Datatypes

    View Slide

  14. smallint bigint integer
    numeric float
    serial money char
    varchar
    text
    bytea
    timestamp
    timestamptz date
    time
    timetz
    interval boolean
    enum
    point
    line
    polygon
    box
    circle
    path
    inet
    cidr
    macaddr
    tsvector
    tsquery
    array
    XML
    UUID
    Datatypes

    View Slide

  15. Datatypes
    smallint bigint integer
    numeric float
    serial money char
    varchar
    text
    bytea
    timestamp
    timestamptz date
    time
    timetz
    interval boolean
    enum
    point
    line
    polygon
    box
    circle
    path
    inet
    cidr
    macaddr
    tsvector
    tsquery
    array
    XML
    UUID

    View Slide

  16. CREATE TABLE items (
    id serial NOT NULL,
    name varchar (255),
    tags varchar(255) [],

    created_at timestamp
    );
    Datatypes

    View Slide

  17. CREATE TABLE items (
    id serial NOT NULL,
    name varchar (255),
    tags varchar(255) [],

    created_at timestamp
    );
    Datatypes

    View Slide

  18. CREATE TABLE items (
    id serial NOT NULL,
    name varchar (255),
    tags varchar(255) [],

    created_at timestamp
    );
    Datatypes

    View Slide

  19. Datatypes
    INSERT INTO items
    VALUES (1, 'Ruby Gem', '{“Programming”,”Jewelry”}', now());
    INSERT INTO items
    VALUES (2, 'Django Pony', '{“Programming”,”Animal”}', now());

    View Slide

  20. Datatypes
    smallint bigint integer
    numeric float
    serial money char
    varchar
    text
    bytea
    timestamp
    timestamptz date
    time
    timetz
    interval boolean
    enum
    point
    line
    polygon
    box
    circle
    path
    inet
    cidr
    macaddr
    tsvector
    tsquery
    array
    XML
    UUID

    View Slide

  21. Datatypes wish list
    email url
    phone
    zip

    View Slide

  22. Extensions
    dblink hstore
    citext
    ltree
    isn
    cube
    pgcrypto
    tablefunc
    uuid-ossp
    earthdistance
    trigram
    fuzzystrmatch
    pgrowlocks
    pgstattuple
    btree_gist
    dict_int
    dict_xsyn
    unaccent

    View Slide

  23. Extensions
    dblink hstore
    citext
    ltree
    isn
    cube
    pgcrypto
    tablefunc
    uuid-ossp
    earthdistance
    trigram
    fuzzystrmatch
    pgrowlocks
    pgstattuple
    btree_gist
    dict_int
    dict_xsyn
    unaccent

    View Slide

  24. CREATE EXTENSION hstore;
    CREATE TABLE users (
    id integer NOT NULL,
    email character varying(255),
    data hstore,
    created_at timestamp without time zone,
    last_login timestamp without time zone
    );
    NoSQL in your SQL

    View Slide

  25. INSERT INTO users
    VALUES (
    1,
    '[email protected]',
    'sex => "M", state => “California”',
    now(),
    now()
    );
    hStore

    View Slide

  26. SELECT
    '{"id":1,"email": "[email protected]",}'::json;
    JSON
    V8 w/ PLV8
    9.2

    View Slide

  27. SELECT
    '{"id":1,"email": "[email protected]",}'::json;
    JSON
    V8 w/ PLV8
    9.2

    View Slide

  28. SELECT
    '{"id":1,"email": "[email protected]",}'::json;
    JSON
    V8 w/ PLV8
    9.2
    Bad Idea

    View Slide

  29. SELECT
    '{"id":1,"email": "[email protected]",}'::json;
    JSON
    create or replace function
    js(src text) returns text as $$
    return eval(
    "(function() { " + src + "})"
    )();
    $$ LANGUAGE plv8;
    V8 w/ PLV8
    9.2
    Bad Idea

    View Slide

  30. Range Types
    9.2

    View Slide

  31. CREATE TABLE talks (room int, during tsrange);
    INSERT INTO talks VALUES
    (3, '[2012-09-24 13:00, 2012-09-24 13:50)');
    Range Types
    9.2

    View Slide

  32. CREATE TABLE talks (room int, during tsrange);
    INSERT INTO talks VALUES
    (3, '[2012-09-24 13:00, 2012-09-24 13:50)');
    Range Types
    9.2
    ALTER TABLE talks ADD EXCLUDE USING gist (during WITH &&);
    INSERT INTO talks VALUES
    (1108, '[2012-09-24 13:30, 2012-09-24 14:00)');
    ERROR: conflicting key value violates exclusion constraint
    "talks_during_excl"

    View Slide

  33. Full Text Search

    View Slide

  34. Full Text Search
    TSVECTOR - Text Data
    TSQUERY - Search Predicates
    Specialized Indexes and Operators

    View Slide

  35. Datatypes
    smallint bigint integer
    numeric float
    serial money char
    varchar
    text
    bytea
    timestamp
    timestamptz date
    time
    timetz
    interval boolean
    enum
    point
    line
    polygon
    box
    circle
    path
    inet
    cidr
    macaddr
    tsvector
    tsquery
    array
    XML
    UUID

    View Slide

  36. PostGIS

    View Slide

  37. 1. New datatypes i.e. (2d/3d boxes)
    PostGIS

    View Slide

  38. 1. New datatypes i.e. (2d/3d boxes)
    i.e. SELECT foo && bar ...
    2. New operators
    PostGIS

    View Slide

  39. 1. New datatypes i.e. (2d/3d boxes)
    i.e. SELECT foo && bar ...
    i.e. person within location, nearest distance
    2. New operators
    3. Understand relationships and distance
    PostGIS

    View Slide

  40. View Slide

  41. Performance

    View Slide

  42. Sequential Scans

    View Slide

  43. Sequential Scans
    They’re Bad

    View Slide

  44. Sequential Scans
    They’re Bad (most of the time)

    View Slide

  45. Indexes

    View Slide

  46. Indexes
    They’re Good

    View Slide

  47. Indexes
    They’re Good (most of the time)

    View Slide

  48. Indexes
    B-Tree
    Generalized Inverted Index (GIN)
    Generalized Search Tree (GIST)
    K Nearest Neighbors (KNN)
    Space Partitioned GIST (SP-GIST)

    View Slide

  49. Indexes
    B-Tree
    Default
    Usually want this

    View Slide

  50. Indexes
    Generalized Inverted Index (GIN)
    Use with multiple values in 1 column
    Array/hStore

    View Slide

  51. Indexes
    Generalized Search Tree (GIST)
    Full text search
    Shapes

    View Slide

  52. Understanding Query Perf
    SELECT last_name
    FROM employees
    WHERE salary >= 50000;
    Given

    View Slide

  53. Explain
    # EXPLAIN SELECT last_name FROM employees WHERE salary >=
    50000;
    QUERY PLAN
    -------------------------------------------------------------------------
    Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6)
    Filter: (salary >= 50000)
    (3 rows)

    View Slide

  54. # EXPLAIN SELECT last_name FROM employees WHERE salary >=
    50000;
    QUERY PLAN
    -------------------------------------------------------------------------
    Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6)
    Filter: (salary >= 50000)
    (3 rows)
    Startup Cost
    Explain

    View Slide

  55. # EXPLAIN SELECT last_name FROM employees WHERE salary >=
    50000;
    QUERY PLAN
    -------------------------------------------------------------------------
    Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6)
    Filter: (salary >= 50000)
    (3 rows)
    Startup Cost
    Max Time
    Explain

    View Slide

  56. # EXPLAIN SELECT last_name FROM employees WHERE salary >=
    50000;
    QUERY PLAN
    -------------------------------------------------------------------------
    Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6)
    Filter: (salary >= 50000)
    (3 rows)
    Startup Cost
    Max Time
    Rows Returned
    Explain

    View Slide

  57. # EXPLAIN ANALYZE SELECT last_name FROM employees WHERE
    salary >= 50000;
    QUERY PLAN
    -------------------------------------------------------------------------
    Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actual
    time=2.401..295.247 rows=1428 loops=1)
    Filter: (salary >= 50000)
    Total runtime: 295.379
    (3 rows)
    Explain

    View Slide

  58. # EXPLAIN ANALYZE SELECT last_name FROM employees WHERE
    salary >= 50000;
    QUERY PLAN
    -------------------------------------------------------------------------
    Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actual
    time=2.401..295.247 rows=1428 loops=1)
    Filter: (salary >= 50000)
    Total runtime: 295.379
    (3 rows)
    Startup Cost
    Explain

    View Slide

  59. # EXPLAIN ANALYZE SELECT last_name FROM employees WHERE
    salary >= 50000;
    QUERY PLAN
    -------------------------------------------------------------------------
    Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actual
    time=2.401..295.247 rows=1428 loops=1)
    Filter: (salary >= 50000)
    Total runtime: 295.379
    (3 rows)
    Startup Cost Max Time
    Explain

    View Slide

  60. # EXPLAIN ANALYZE SELECT last_name FROM employees WHERE
    salary >= 50000;
    QUERY PLAN
    -------------------------------------------------------------------------
    Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actual
    time=2.401..295.247 rows=1428 loops=1)
    Filter: (salary >= 50000)
    Total runtime: 295.379
    (3 rows)
    Startup Cost Max Time
    Rows Returned
    Explain

    View Slide

  61. # EXPLAIN ANALYZE SELECT last_name FROM employees WHERE
    salary >= 50000;
    QUERY PLAN
    -------------------------------------------------------------------------
    Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actual
    time=2.401..295.247 rows=1428 loops=1)
    Filter: (salary >= 50000)
    Total runtime: 295.379
    (3 rows)
    Startup Cost Max Time
    Rows Returned
    Explain

    View Slide

  62. # CREATE INDEX idx_emps ON employees (salary);

    View Slide

  63. # CREATE INDEX idx_emps ON employees (salary);
    # EXPLAIN ANALYZE SELECT last_name FROM employees WHERE
    salary >= 50000;
    QUERY PLAN
    -------------------------------------------------------------------------
    Index Scan using idx_emps on employees (cost=0.00..8.49 rows=1
    width=6) (actual time = 0.047..1.603 rows=1428 loops=1)
    Index Cond: (salary >= 50000)
    Total runtime: 1.771 ms
    (3 rows)

    View Slide

  64. # CREATE INDEX idx_emps ON employees (salary);
    # EXPLAIN ANALYZE SELECT last_name FROM employees WHERE
    salary >= 50000;
    QUERY PLAN
    -------------------------------------------------------------------------
    Index Scan using idx_emps on employees (cost=0.00..8.49 rows=1
    width=6) (actual time = 0.047..1.603 rows=1428 loops=1)
    Index Cond: (salary >= 50000)
    Total runtime: 1.771 ms
    (3 rows)

    View Slide

  65. Indexes Pro Tips

    View Slide

  66. Indexes Pro Tips
    CREATE INDEX CONCURRENTLY

    View Slide

  67. Indexes Pro Tips
    CREATE INDEX CONCURRENTLY
    CREATE INDEX WHERE foo=bar

    View Slide

  68. Indexes Pro Tips
    CREATE INDEX CONCURRENTLY
    CREATE INDEX WHERE foo=bar
    SELECT * WHERE foo LIKE ‘%bar% is BAD

    View Slide

  69. Indexes Pro Tips
    CREATE INDEX CONCURRENTLY
    CREATE INDEX WHERE foo=bar
    SELECT * WHERE foo LIKE ‘%bar% is BAD
    SELECT * WHERE Food LIKE ‘bar%’ is OKAY

    View Slide

  70. Extensions
    dblink hstore
    citext
    ltree
    isn
    cube
    pgcrypto
    tablefunc
    uuid-ossp
    earthdistance
    trigram
    fuzzystrmatch
    pgrowlocks
    pgstattuple
    btree_gist
    dict_int
    dict_xsyn
    unaccent

    View Slide

  71. View Slide

  72. Index Hit Rate
    SELECT
    relname,
    100 * idx_scan / (seq_scan + idx_scan),
    n_live_tup
    FROM pg_stat_user_tables
    ORDER BY n_live_tup DESC;

    View Slide

  73. Cache Hit Rate
    SELECT
    relname::text,
    heap_blks_read + heap_blks_hit as reads,
    round(100 * heap_blks_hit / (heap_blks_hit + heap_blks_read)) as
    hit_pct,
    round(100 * idx_blks_hit / (idx_blks_hit + idx_blks_read)) as idx_hit_pct
    FROM pg_statio_user_tables
    WHERE heap_blks_hit + heap_blks_read + idx_blks_hit + idx_blks_read
    > 0
    ORDER BY 2 DESC;

    View Slide

  74. pg_stats_statements
    9.2

    View Slide

  75. pg_stats_statements
    $ select * from pg_stat_statements where query ~ 'from users where email';
    userid ᴹ 16384
    dbid ᴹ 16388
    query ᴹ select * from users where email = ?;
    calls ᴹ 2
    total_time ᴹ 0.000268
    rows ᴹ 2
    shared_blks_hit ᴹ 16
    shared_blks_read ᴹ 0
    shared_blks_dirtied ᴹ 0
    shared_blks_written ᴹ 0
    local_blks_hit ᴹ 0
    local_blks_read ᴹ 0
    local_blks_dirtied ᴹ 0
    local_blks_written ᴹ 0
    temp_blks_read ᴹ 0
    temp_blks_written ᴹ 0
    time_read ᴹ 0
    time_write ᴹ 0
    9.2

    View Slide

  76. pg_stats_statements
    9.2

    View Slide

  77. pg_stats_statements
    SELECT query, calls, total_time, rows, 100.0 * shared_blks_hit /
    nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent
    FROM pg_stat_statements ORDER BY total_time DESC LIMIT 5;
    ----------------------------------------------------------------------
    query | UPDATE pgbench_branches SET bbalance = bbalance + ?
    WHERE bid = ?;
    calls | 3000
    total_time | 9609.00100000002
    rows | 2836
    hit_percent | 99.9778970000200936
    9.2

    View Slide

  78. View Slide

  79. Querying

    View Slide

  80. Window Functions
    Example:
    Biggest spender by state

    View Slide

  81. SELECT
    email,
    users.data->'state',
    sum(total(items)),
    rank() OVER
    (PARTITION BY users.data->'state'
    ORDER BY sum(total(items)) desc)
    FROM
    users, purchases
    WHERE purchases.user_id = users.id
    GROUP BY 1, 2;
    Window Functions

    View Slide

  82. SELECT
    email,
    users.data->'state',
    sum(total(items)),
    rank() OVER
    (PARTITION BY users.data->'state'
    ORDER BY sum(total(items)) desc)
    FROM
    users, purchases
    WHERE purchases.user_id = users.id
    GROUP BY 1, 2;
    Window Functions

    View Slide

  83. Extensions
    dblink hstore
    citext
    ltree
    isn
    cube
    pgcrypto
    tablefunc
    uuid-ossp
    earthdistance
    trigram
    fuzzystrmatch
    pgrowlocks
    pgstattuple
    btree_gist
    dict_int
    dict_xsyn
    unaccent

    View Slide

  84. Fuzzystrmatch

    View Slide

  85. Fuzzystrmatch
    SELECT soundex('Craig'), soundex('Will'), difference('Craig', 'Will');

    View Slide

  86. Fuzzystrmatch
    SELECT soundex('Craig'), soundex('Will'), difference('Craig', 'Will');
    SELECT soundex('Craig'), soundex('Greg'), difference('Craig', 'Greg');
    SELECT soundex('Willl'), soundex('Will'), difference('Willl', 'Will');

    View Slide

  87. Moving Data Around
    \copy (SELECT * FROM users) TO ‘~/users.csv’;
    \copy users FROM ‘~/users.csv’;

    View Slide

  88. db_link
    SELECT dblink_connect('myconn', 'dbname=postgres');
    SELECT * FROM dblink('myconn','SELECT * FROM foo') AS t(a int, b text);
    a | b
    -------+------------
    1 | example
    2 | example2

    View Slide

  89. Foreign Data Wrappers
    oracle mysql
    informix
    twitter
    files
    www
    couch
    sybase
    ldap
    odbc
    s3
    redis jdbc

    View Slide

  90. CREATE EXTENSION redis_fdw;
    CREATE SERVER redis_server

    FOREIGN DATA WRAPPER redis_fdw

    OPTIONS (address '127.0.0.1', port '6379');
    CREATE FOREIGN TABLE redis_db0 (key text, value text)

    SERVER redis_server

    OPTIONS (database '0');
    CREATE USER MAPPING FOR PUBLIC

    SERVER redis_server

    OPTIONS (password 'secret');
    Foreign Data Wrappers

    View Slide

  91. SELECT
    id,
    email,
    value as visits
    FROM
    users,
    redis_db0
    WHERE ('user_' || cast(id as text)) = cast(redis_db0.key as text)
    AND cast(value as int) > 10;
    Query Redis from Postgres
    SELECT *
    FROM redis_db0;

    View Slide

  92. Readability

    View Slide

  93. Readability
    WITH top_5_products AS (
    SELECT products.*, count(*)
    FROM products, line_items
    WHERE products.id = line_items.product_id
    GROUP BY products.id
    ORDER BY count(*) DESC
    LIMIT 5
    )
    SELECT users.email, count(*)
    FROM users, line_items, top_5_products
    WHERE line_items.user_id = users.id
    AND line_items.product_id = top_5_products.id
    GROUP BY 1
    ORDER BY 1;

    View Slide

  94. Common Table Expressions
    WITH top_5_products AS (
    SELECT products.*, count(*)
    FROM products, line_items
    WHERE products.id = line_items.product_id
    GROUP BY products.id
    ORDER BY count(*) DESC
    LIMIT 5
    )
    SELECT users.email, count(*)
    FROM users, line_items, top_5_products
    WHERE line_items.user_id = users.id
    AND line_items.product_id = top_5_products.id
    GROUP BY 1
    ORDER BY 1;

    View Slide

  95. Brief History
    Developing w/ Postgres
    Postgres Performance
    Querying

    View Slide

  96. Extras

    View Slide

  97. Extras
    Listen/Notify

    View Slide

  98. Extras
    Listen/Notify
    Per Transaction Synchronous Replication

    View Slide

  99. Extras
    Listen/Notify
    Per Transaction Synchronous Replication
    Drop index concurrently

    View Slide

  100. Postgres - TLDR
    Datatypes
    Conditional Indexes
    Transactional DDL
    Foreign Data Wrappers
    Concurrent Index Creation
    Extensions
    Common Table Expressions
    Fast Column Addition
    Listen/Notify
    Table Inheritance
    Per Transaction sync replication
    Window functions
    NoSQL inside SQL
    Momentum

    View Slide