Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Safety and Convenience: PostgreSQL's concurrency model and a CTE HOWTO

Safety and Convenience: PostgreSQL's concurrency model and a CTE HOWTO

PostgreSQL is known for it's reliability and advanced features. Learn about the concurrency model that makes PostgreSQL so awesome, and learn how Common Table Expressions can make everything better. Talk given at RICON 2012 in San Francisco.

Selena Deckelmann

October 11, 2012
Tweet

More Decks by Selena Deckelmann

Other Decks in Technology

Transcript

  1. Thursday, October 11, 12

    View full-size slide

  2. How Postgres handles
    concurrency. Also, CTEs.
    Selena Deckelmann
    @selenamarie
    [email protected]
    Thursday, October 11, 12

    View full-size slide

  3. Safety and convenience
    Thursday, October 11, 12

    View full-size slide

  4. select ‘fail’ from test
    where
    now() - ‘1 hour’::interval
    > max(timestamp)
    Thursday, October 11, 12

    View full-size slide

  5. with maxcte as (
    select max(stamp) as lastupdate from test
    )
    select ‘fail’ from maxcte
    where
    now() - ‘1 hour’::interval > lastupdate;
    Thursday, October 11, 12

    View full-size slide

  6. "Postgres is bloatware by design:
    it was built to house PhD theses."
    -Joe Hellerstein, 2002
    Thursday, October 11, 12

    View full-size slide

  7. • Started in 1986 as a PhD project
    • First open source commit: July 1996
    • About 672,670 lines of C as of 10/10/12
    Thursday, October 11, 12

    View full-size slide

  8. 7.4 (2002-2010)
    8.0 (2004-2010)
    8.1 (2005-2010)
    8.2 (2005-2011)
    8.3 (2006-2012)
    8.4 (2008-2012)
    9.0 (2010-2012)
    9.1 (2010-2012)
    9.2 (2011-2012)
    0 175 350 525 700
    14
    15
    17
    18
    20
    17
    18
    20
    23
    289
    336
    298
    450
    624
    455
    421
    342
    470
    Number of PostgreSQL contributors by release
    Contributors Named Committers
    Thursday, October 11, 12

    View full-size slide

  9. 2002
    2003
    2004
    2005
    2006
    2007
    2008
    2009
    2010
    2011
    2012
    0 75 150 225 300
    13
    14
    16
    16
    13
    15
    11
    13
    14
    15
    13
    196
    242
    240
    245
    209
    238
    223
    276
    248
    260
    244
    PostgreSQL contributors by year
    Contributors Named Committers
    Thursday, October 11, 12

    View full-size slide

  10. "Salesforce.com bases its entire cloud on
    Oracle database, but its database platform
    offering is PostgreSQL.
    I find that interesting."
    -Larry Ellison, OpenWorld 2012
    Thursday, October 11, 12

    View full-size slide

  11. How Postgres handles concurrent
    operations
    How to write application queries to
    get the right results, without
    sacrificing performance.
    Thursday, October 11, 12

    View full-size slide

  12. What we want as developers:
    Transactions that appear to execute one at a
    time.
    Easy way? Run one transaction at a time.
    This is the thinking behind a global lock.
    Thursday, October 11, 12

    View full-size slide

  13. Concurrency the hard right way
    Snapshot Isolation
    Transactions appear to execute one at a
    time, in order.
    Also, each transaction has a version of
    the world that it sees, isolated from
    other transactions.
    Thursday, October 11, 12

    View full-size slide

  14. Inside a snapshot
    Each transaction runs against a “snapshot” of the
    committed state of the database at a particular instant. It
    cannot see the effects of uncommitted transactions
    (except itself), nor of transactions that commit after the
    snapshot time.
    Every update of a database row generates a new version of
    the row.
    Snapshot information determines which version of a given
    row is visible to a given transaction. Old versions of rows
    can be reclaimed once they are no longer visible to any
    running transaction.
    Thursday, October 11, 12

    View full-size slide

  15. How this works
    “Bunnies are dangerous”
    Thursday, October 11, 12

    View full-size slide

  16. “Bunnies are dangerous”
    ID #2 - SELECT
    Thursday, October 11, 12

    View full-size slide

  17. “Bunnies are dangerous”
    ID #2 - SELECT
    “Bunnies are adorable.” ID #3 - UPDATE
    There’s a
    new row
    version!
    Thursday, October 11, 12

    View full-size slide

  18. “Bunnies are dangerous”
    ID #2 - SELECT
    “Bunnies are adorable.”
    Once Transaction ID #3 is committed, the original row is
    no longer visible to future transactions, but still exists and
    is visible to ID #2.
    Thursday, October 11, 12

    View full-size slide

  19. “Bunnies are dangerous”
    ID #2 - SELECT
    “Bunnies are adorable.”
    ID #4 - SELECT
    Thursday, October 11, 12

    View full-size slide

  20. “Bunnies are dangerous”
    ID #2 - SELECT
    “Bunnies are adorable.”
    ID #4 - SELECT
    Once Transaction ID #2 ends, then the row associated with
    ID #2 is no longer visible to any future transactions.
    Thursday, October 11, 12

    View full-size slide

  21. “Bunnies are dangerous”
    “Bunnies are adorable.”
    “Bunnies will rule the world.” ID #6 - UPDATE
    Thursday, October 11, 12

    View full-size slide

  22. “Bunnies are dangerous”
    “Bunnies are adorable.”
    “Bunnies will rule the world.” ID #6 - UPDATE
    ID #6 - ROLLBACK
    Thursday, October 11, 12

    View full-size slide

  23. Vacuum?
    “Cats are not very cute.”
    “Cats are adorable.”
    “Cats should rule the world.”
    “Bunnies are dangerous”
    “Bunnies are adorable.”
    “Bunnies will rule the world.”
    Thursday, October 11, 12

    View full-size slide

  24. “Cats are not very cute.”
    “Cats are adorable.”
    “Cats should rule the world.”
    VACUUM cleans up these rows
    Vacuum!
    “Bunnies are dangerous”
    “Bunnies are adorable.”
    “Bunnies will rule the world.”
    Thursday, October 11, 12

    View full-size slide

  25. “Cats are adorable.”
    “Bunnies are adorable.”
    Thursday, October 11, 12

    View full-size slide

  26. Life of a row
    Versions of a row are visible for the life of a transaction.
    New versions do not become visible until the transaction
    that created the rows commits.
    Cleaning up “dead” versions of rows is handled by
    VACUUM, asynchronously and automatically.
    Thursday, October 11, 12

    View full-size slide

  27. Three kinds of transaction isolation
    READ COMMITTED
    See all committed activity at QUERY start
    REPEATABLE READ
    See all committed activity at TRANSACTION start.
    Throw serialization error on conflicts.
    SERIALIZABLE
    Guarantee apparent serial execution of transactions
    with predicate locking.
    Thursday, October 11, 12

    View full-size slide

  28. Predicate locking is awesome.
    Postgres keeps track of the rows that your
    query is changing or accessing and uses that
    to determine whether or not a serialization
    conflict might occur.
    Thursday, October 11, 12

    View full-size slide

  29. BEGIN TRANSACTION ISOLATION SERIALIZABLE;
    UPDATE clicks
    SET count = count + 1;
    BEGIN TRANSACTION ISOLATION SERIALIZABLE;
    DELETE FROM clicks
    WHERE count < 10;
    Thursday, October 11, 12

    View full-size slide

  30. COMMIT;
    And afterward we commit both transactions.
    COMMIT;
    ERROR: could not serialize access due to
    concurrent update
    Thursday, October 11, 12

    View full-size slide

  31. BEGIN TRANSACTION ISOLATION READ COMMITTED;
    UPDATE clicks
    SET count = count + 1;
    TRY AGAIN.
    BEGIN TRANSACTION ISOLATION READ COMMITTED;
    DELETE FROM clicks
    WHERE count < 10;
    Thursday, October 11, 12

    View full-size slide

  32. COMMIT;
    And afterward we commit both transactions.
    COMMIT;
    This is what we told Postgres to do:
    Before each query, refresh the snapshot. So, we see changes carried out by
    other, concurrent transactions and that affects our operation.
    Thursday, October 11, 12

    View full-size slide

  33. How to write application queries to
    get the right results, without
    sacrificing performance.
    Thursday, October 11, 12

    View full-size slide

  34. Easy cases
    All read-only transactions
    Many reader, one writer
    Both are trivial - readers don’t block each
    other. One writer in a transaction is invisible
    to readers until after commit.
    Thursday, October 11, 12

    View full-size slide

  35. Closer look at one-writer
    The writer’s updates will all become visible when it commits.
    Readers running in SERIALIZABLE or READ COMMITTED
    mode will not see the changes from writers that commit after
    they start. So, readers have a stable view of the database
    throughout each transaction.
    Readers running in READ COMMITTED mode will be able to
    see changes beginning at first SQL statement issued after writer
    commits. So, the view is stable within a statement but not across
    statements. This is messier, but necessary if (for example) you
    have procedures that want to observe concurrent changes.
    Either way, readers do not block the writer, and the writer does
    not block readers, so concurrent performance is
    excellent.
    Thursday, October 11, 12

    View full-size slide

  36. Suppose we want to cross check deposit totals against bank branch
    totals:
    BEGIN;
    SELECT SUM(balance) FROM accounts;
    SELECT SUM(branch balance) FROM branches;
    -- check to see that we got the same result
    COMMIT;
    How do we ensure correctness?
    Thursday, October 11, 12

    View full-size slide

  37. Suppose we want to cross check deposit totals against bank branch
    totals:
    BEGIN;
    SELECT SUM(balance) FROM accounts;
    SELECT SUM(branch balance) FROM branches;
    -- check to see that we got the same result
    COMMIT;
    This is correct in SERIALIZABLE or REPEATABLE READ mode, but
    wrong in READ COMMITTED mode. To select SERIALIZABLE mode,
    either add
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
    READ ONLY;
    to each transaction block (just after BEGIN), or do
    SET DEFAULT TRANSACTION ISOLATION TO
    SERIALIZABLE READ ONLY;
    (this can be set in the postgresql.conf file, or per-connection).
    Thursday, October 11, 12

    View full-size slide

  38. Rules for read-only transactions
    Don’t put more reader activity into a single transaction than is
    necessary.
    Pure reader transactions should typically be run in SERIALIZABLE
    READ ONLY mode, so that they see a consistent database view
    throughout the transaction; then the transaction boundaries define
    the times when you want to recognize updates committed by other
    clients.
    If you start one serializable transaction and hold it for the entire
    run of an application, you’ll never observe any changes in database
    state, other than those you make yourself. This is what you want in
    a few cases (for instance, the backup utility pg_dump does it to
    make a self-consistent dump) but usually it’s not what you want.
    Thursday, October 11, 12

    View full-size slide

  39. Beware of really long transactions
    Rows visible to the long-running transaction aren’t available to be
    reclaimed, even while they’re long-dead to other transactions.
    Postgres doesn’t have “rollback segments” like Oracle that can run
    out of space and bomb, but it is possible to run out of disk overall.
    Worst thing to do: Start a transaction (BEGIN;) and then go to
    sleep for hours or days.
    Thursday, October 11, 12

    View full-size slide

  40. Rules for concurrent writing
    Two concurrent transactions cannot write (UPDATE, DELETE, or
    SELECT FOR UPDATE) the same row. They can proceed as long
    as they write non-overlapping sets of rows.
    If a transaction tries to write a row already written by a concurrent
    transaction:
    1. If other transaction is still in progress, wait for it to commit or
    rollback.
    2. If it rolled back, proceed (using the non-updated version of the row).
    3. If it committed:
    Thursday, October 11, 12

    View full-size slide

  41. Rules for concurrent writing
    3. If it committed:
    (a) In READ COMMITTED mode: proceed, using the newly-
    committed (latest) version of the row as the starting point for my
    operation. (But first, recheck the new row to see if it still satisfies
    the WHERE clause of my query; if not, ignore it.)
    (b) In REPEATABLE READ or SERIALIZABLE mode: abort with
    “can’t serialize” error.
    SERIALIZABLE uses predicate locking and REPEATABLE READ
    supports pre-9.1 behavior for legacy support.
    Thursday, October 11, 12

    View full-size slide

  42. Concurrent writers can fail without some interlock.
    It’s easy to see that there could be a problem:
    UPDATE webpages SET hits = hits + 1 WHERE url = ’...’;
    CLIENT 1 CLIENT 2
    reads row, sees hits = 531
    reads row, sees hits = 531
    computes hits+1 = 532
    computes hits+1 = 532
    writes hits = 532
    writes hits = 532
    commit
    commit
    Oops . . . final state of the row is 532, not 533 as we’d wish.
    Thursday, October 11, 12

    View full-size slide

  43. READ COMMITTED works perfectly for this.
    That bug won’t occur in Postgres. In READ COMMITTED, the second
    client is forced to redo it’s calculation.
    CLIENT 1 CLIENT 2
    reads row, sees hits = 531
    reads row, sees hits = 531
    computes hits+1 = 532
    computes hits+1 = 532
    writes hits = 532
    Tries to write, waits.
    commit
    re-reads row, sees hits = 532
    re-computes hits+1 = 533
    writes hits = 533
    commit
    Thursday, October 11, 12

    View full-size slide

  44. READ COMMITTED vs SERIALIZABLE
    In SERIALIZABLE mode, the second client would instead get a
    “Can’t serialize access due to concurrent update” error, and would
    have to retry the transaction until he succeeded.
    You always need to code a retry loop in the client when you are
    using SERIALIZABLE mode for a writing transaction.
    Why would you want to use SERIALIZABLE mode, given that it
    requires extra client-side programming? Because READ
    COMMITTED doesn’t scale very well to more complex situations.
    Thursday, October 11, 12

    View full-size slide

  45. Example with separate read and write commands
    BEGIN;
    SELECT hits FROM webpages WHERE url = ’...’;
    -- client internally computes $newval = $hits + 1
    UPDATE webpages SET hits = $newval WHERE url = ’...’;
    COMMIT;
    This might look silly, but it’s not so silly if the “internal computation” is
    complex, or if you need data from multiple database rows to make the
    computation, or if you’d like to know exactly what you just updated
    the hits count to.
    Thursday, October 11, 12

    View full-size slide

  46. READ COMMITTED for this case, does not work.
    CLIENT 1 CLIENT 2
    reads row, sees hits = 531
    reads row, sees hits = 531
    computes $newval = 532
    computes $newval = 532
    writes hits = 532
    Tries to write row, waits.
    commit
    re-reads row, sees hits = 532
    re-computes $newval = 532
    writes hits = 532
    commit
    Thursday, October 11, 12

    View full-size slide

  47. Solution #1: use SELECT FOR UPDATE, not just SELECT,
    to read the input data
    We can make it work by doing this:
    BEGIN;
    SELECT hits FROM webpages WHERE url = ’...’ FOR UPDATE;
    -- client internally computes $newval = $hits + 1
    UPDATE webpages SET hits = $newval WHERE url = ’...’;
    COMMIT;
    FOR UPDATE causes the initial SELECT to acquire a row lock on the
    target row, thus making it safe to do our internal computation and
    update the row.
    SELECT FOR UPDATE returns the latest updated contents of the row,
    so we aren’t working with stale data.
    Thursday, October 11, 12

    View full-size slide

  48. How SELECT FOR UPDATE solution works.
    CLIENT 1 CLIENT 2
    reads row, sees hits = 531
    tries to lock row, waits.
    computes $newval = 532
    writes hits = 532
    commit
    locks row, reads hits = 532
    re-computes $newval = 533
    writes hits = 533
    commit
    We need both the delay till commit and the read of the updated value
    to make this work. Notice SELECT FOR UPDATE may return data
    committed after its start time, which is different from plain SELECT’s
    behavior in READ COMMITTED mode.
    Thursday, October 11, 12

    View full-size slide

  49. Solution #2: use REPEATABLE READ or SERIALIZABLE
    mode and retry on serialization error
    loop
    BEGIN;
    SELECT hits FROM webpages WHERE url = ’...’;
    -- client internally computes $newval = $hits + 1
    UPDATE webpages SET hits = $newval WHERE url = ’...’;
    if (no error)
    break out of loop;
    else
    ROLLBACK;
    end loop
    COMMIT;
    This works because client will re-read the new hits value during retry.
    The UPDATE will succeed only if the data read is still current.
    Thursday, October 11, 12

    View full-size slide

  50. The client could get confused, too
    Consider this small variant of the example, running in READ
    COMMITTED mode:
    BEGIN;
    SELECT hits FROM webpages WHERE url = ’...’;
    UPDATE webpages SET hits = hits + 1 WHERE url = ’...’;
    COMMIT;
    If someone else increments the hits count while the SELECT is running,
    the UPDATE will compute the correct updated value . . . but it will be
    two more than what we just read in the SELECT, not one more. (And
    that’s what we’d see if we did another SELECT.)
    If your application logic could get confused by this sort of thing, you
    need to use either SELECT FOR UPDATE or SERIALIZABLE mode.
    Thursday, October 11, 12

    View full-size slide

  51. The difference between REPEATABLE READ and
    SERIALIZABLE mode
    REPEATABLE READ does not have the ability to take out predicate
    locks and does not use the conflict detection algorithm in
    SERIALIZABLE.
    Thursday, October 11, 12

    View full-size slide

  52. Multi-row constraints, or why serializable is really
    serializable
    Suppose we want to check that we always have at least $1000 total cash
    in our checking and savings accounts. A typical transaction will look like:
    BEGIN;
    UPDATE myaccounts
    SET balance = balance - $withdrawal
    WHERE accountid = ’checking’;
    SELECT SUM(balance) FROM myaccounts;
    if (sum 1000.00)
    COMMIT;
    else
    ROLLBACK;
    Is this safe? YES, only in SERIALIZABLE mode.
    Thursday, October 11, 12

    View full-size slide

  53. Suppose we have $600 in each account and we concurrently try to
    withdraw $200 from each, and we use SERIALIZABLE.
    CLIENT 1 CLIENT 2
    reads checking balance, get 600
    update checking balance to 400
    reads savings balance, get 600
    update savings balance to 400
    run SUM, get 400+600=1000
    run SUM, get 400+600=1000
    commit
    ERROR: serialization failure
    Predicate locking detects the dependency! The second transaction
    would have to retry, and the SUM test would fail.
    Thursday, October 11, 12

    View full-size slide

  54. Performance comparison
    The READ COMMITTED SELECT FOR UPDATE approach is
    essentially pessimistic locking: get the lock first, then do your work.
    The REPEATABLE READ retry-loop approach is essentially optimistic
    locking with retry. It avoids the overhead of getting the row lock
    beforehand, at the cost of having to redo the whole transaction if there
    is a conflict.
    SERIALIZABLE uses predicate locking to ensure concurrent writers
    don’t conflict. Overhead is significant, but may simplify code and ensures
    safety in complex, high concurrency situations. Practically, you’ll see
    more rollbacks than in REPEATABLE READ.
    SELECT FOR UPDATE wins if contention is heavy (since the serializable
    approach will be wasting work pretty often). REPEATABLE READ wins
    if contention for specific rows is light. SERIALIZABLE wins when you
    need predicate locking to ensure data safety.
    Thursday, October 11, 12

    View full-size slide

  55. • PostgreSQL Documentation for SET
    TRANSACTION and Concurrency Control
    • Presentations by Kevin Grittner and Dan
    Ports about Serializable Snapshot Isolation
    (SSI)
    To learn more...
    Thursday, October 11, 12

    View full-size slide

  56. Common Table
    Expressions
    Thursday, October 11, 12

    View full-size slide

  57. Temp tables for a query
    WITH regional_sales AS (
    SELECT region, SUM(amount) AS total_sales
    FROM orders
    GROUP BY region
    ), top_regions AS (
    SELECT region
    FROM regional_sales
    WHERE total_sales > (SELECT SUM(total_sales)/10 FROM
    regional_sales)
    )
    SELECT region,
    product,
    SUM(quantity) AS product_units,
    SUM(amount) AS product_sales
    FROM orders
    WHERE region IN (SELECT region FROM top_regions)
    GROUP BY region, product;
    Thursday, October 11, 12

    View full-size slide

  58. Recursive CTE
    WITH RECURSIVE t1(a, b) AS (
    VALUES(0,0)
    UNION ALL
    SELECT CASE CAST(b AS BOOLEAN)
    WHEN b % 3 = 0 THEN b
    WHEN b % 5 = 0 THEN b
    END,
    b + 1
    FROM t1
    WHERE b < 1000
    )
    SELECT sum(a) FROM t1;
    Euler Project, question 1:
    If we list all the natural numbers
    below 10 that are multiples of 3 or
    5, we get 3, 5, 6 and 9. The sum of
    these multiples is 23.
    Find the sum of all the multiples
    of 3 or 5 below 1000.
    http://wiki.postgresql.org/wiki/Euler_Project,_Question_1
    Thursday, October 11, 12

    View full-size slide

  59. Writeable CTE
    WITH deleted_posts AS (
    DELETE FROM posts
    WHERE created < now()
    - '6 months'::INTERVAL
    RETURNING *
    )
    SELECT user_id, count(*) FROM deleted_posts group BY 1;
    Makes maintaining partitions easier:
    http://www.depesz.com/index.php/2011/03/16/waiting-for-9-1-writable-cte/
    http://xzilla.net/blog/2011/Mar/Upserting-via-Writeable-CTE.html
    Thursday, October 11, 12

    View full-size slide

  60. Real world CTE
    Analyzing sample data to detect correlations.
    Start with a semi-structured text field:
    AdapterVendorID: 1002, AdapterDeviceID: 6779,
    AdapterDriverVersion: 8.784.1.0
    D2D? D2D+
    DWrite? DWrite+
    D3D10 Layers? D3D10 Layers+
    Thursday, October 11, 12

    View full-size slide

  61. Pick a variable:
    substring(app_notes from E'AdapterDeviceID: (\\w+)')
    Normalize the data by creating tables to hold
    all the DeviceIDs... OR...
    Thursday, October 11, 12

    View full-size slide

  62. WITH AllDeviceCrashes AS (
    SELECT count(*) crash_by_deviceid
    , substring(app_notes from
    E'AdapterDeviceID: (\\w+)') AdapterDeviceID
    FROM
    reports
    WHERE
    GROUP BY AdapterDeviceID
    )
    SELECT ...
    Thursday, October 11, 12

    View full-size slide

  63. `
    WITH AllDeviceCrashes AS (
    ...
    ), AllSignatureCrashes AS (
    ...
    ), AllSignatureDeviceCrashes AS (
    ...
    ), AllCrashes AS (
    ...
    )
    SELECT Adapter,
    Signature,
    crash_sig / total_crashes as “P(sig)”,
    crash_devid / total_crashes as “P(devid)”,
    crash_sig_devid / total_crashes as “P(sig, devid)”,
    ...
    Thursday, October 11, 12

    View full-size slide

  64. What features get you
    excited about Postgres?
    Thursday, October 11, 12

    View full-size slide

  65. Transactional DDL
    No MyISAM tables
    ANSI compliance (MORE THAN
    ONCE)
    Portability
    Table functions
    Stored procedures in tons of
    languages
    Complex types
    Extensibility
    LISTEN/NOTIFY
    Foreign Data Wrappers
    Common Table Expressions
    Functional INDEXes
    Temporal data types
    Simple licensing
    psql
    Data integrity
    Sugary syntax
    Stability
    JSON datatype
    Fast bug fixes
    ...
    What they said:
    Thursday, October 11, 12

    View full-size slide

  66. Plans for 9.3
    • Event Triggers (DDL, for example)
    • Dramatically reduce shared memory use
    • LATERAL (performance, prettier queries)
    • UPSERT
    • Expansion of JSON type support
    • Performance improvement in SSI
    Thursday, October 11, 12

    View full-size slide

  67. Where to watch
    • Depesz “Waiting for 9.3”
    • Planet PostgreSQL
    http://planet.postgresql.org
    • Talks for FOSDEM 2013 and PgCon 2013
    Thursday, October 11, 12

    View full-size slide

  68. “[A] funny thing happened on the way
    to the SQL cemetery: PostgreSQL
    became cool again.”
    -Matt Asay
    http://www.theregister.co.uk/2012/08/31/postgresql_too_cool_for_school/
    Thursday, October 11, 12

    View full-size slide

  69. Thursday, October 11, 12

    View full-size slide