
Safety and Convenience: PostgreSQL's concurrency model and a CTE HOWTO

PostgreSQL is known for its reliability and advanced features. Learn about the concurrency model that makes PostgreSQL so awesome, and learn how Common Table Expressions can make everything better. Talk given at RICON 2012 in San Francisco.

Selena Deckelmann

October 11, 2012

Transcript

  1. with maxcte as ( select max(stamp) as lastupdate from test

    ) select 'fail' from maxcte where now() - '1 hour'::interval > lastupdate;
  2. "Postgres is bloatware by design: it was built to house

    PhD theses." -Joe Hellerstein, 2002
  3. • Started in 1986 as a PhD project • First

    open source commit: July 1996 • About 672,670 lines of C as of 10/10/12
  4. Number of PostgreSQL contributors by release (contributors / named committers):

     7.4 (2002-2010): 289 / 14
     8.0 (2004-2010): 336 / 15
     8.1 (2005-2010): 298 / 17
     8.2 (2005-2011): 450 / 18
     8.3 (2006-2012): 624 / 20
     8.4 (2008-2012): 455 / 17
     9.0 (2010-2012): 421 / 18
     9.1 (2010-2012): 342 / 20
     9.2 (2011-2012): 470 / 23
  5. PostgreSQL contributors by year (contributors / named committers):

     2002: 196 / 13
     2003: 242 / 14
     2004: 240 / 16
     2005: 245 / 16
     2006: 209 / 13
     2007: 238 / 15
     2008: 223 / 11
     2009: 276 / 13
     2010: 248 / 14
     2011: 260 / 15
     2012: 244 / 13
  6. "Salesforce.com bases its entire cloud on Oracle database, but its

    database platform offering is PostgreSQL. I find that interesting." -Larry Ellison, OpenWorld 2012
  7. How Postgres handles concurrent operations How to write application queries

    to get the right results, without sacrificing performance.
  8. What we want as developers: Transactions that appear to execute

    one at a time. Easy way? Run one transaction at a time. This is the thinking behind a global lock.
  9. Concurrency the hard right way Snapshot Isolation Transactions appear to

    execute one at a time, in order. Also, each transaction has a version of the world that it sees, isolated from other transactions.
  10. Inside a snapshot Each transaction runs against a “snapshot” of

    the committed state of the database at a particular instant. It cannot see the effects of uncommitted transactions (except itself), nor of transactions that commit after the snapshot time. Every update of a database row generates a new version of the row. Snapshot information determines which version of a given row is visible to a given transaction. Old versions of rows can be reclaimed once they are no longer visible to any running transaction.
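    You can watch this row versioning directly by selecting the xmin and xmax system columns that every PostgreSQL table exposes. A minimal sketch (the opinions table here is invented for illustration):

        -- xmin = transaction that created this row version; xmax = transaction that
        -- updated/deleted it (0 while the version is still live).
        CREATE TABLE opinions (id int PRIMARY KEY, body text);
        INSERT INTO opinions VALUES (1, 'Bunnies are dangerous');

        -- Session A:
        BEGIN;
        UPDATE opinions SET body = 'Bunnies are adorable.' WHERE id = 1;
        SELECT xmin, xmax, body FROM opinions WHERE id = 1;  -- sees the new version it created

        -- Session B, concurrently (before A commits):
        SELECT xmin, xmax, body FROM opinions WHERE id = 1;  -- still sees 'Bunnies are dangerous'

        -- Session A:
        COMMIT;  -- new snapshots now see the new version; the old one becomes dead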
  11. “Bunnies are dangerous” ID #2 - SELECT “Bunnies are adorable.”

    ID #3 - UPDATE There’s a new row version!
  12. “Bunnies are dangerous” ID #2 - SELECT “Bunnies are adorable.”

    Once Transaction ID #3 is committed, the original row is no longer visible to future transactions, but still exists and is visible to ID #2.
  13. “Bunnies are dangerous” ID #2 - SELECT “Bunnies are adorable.”

    ID #4 - SELECT Once Transaction ID #2 ends, then the row associated with ID #2 is no longer visible to any future transactions.
  14. “Bunnies are dangerous” “Bunnies are adorable.” “Bunnies will rule the

    world.” ID #6 - UPDATE ID #6 - ROLLBACK
  15. Vacuum? “Cats are not very cute.” “Cats are adorable.” “Cats

    should rule the world.” “Bunnies are dangerous” “Bunnies are adorable.” “Bunnies will rule the world.”
  16. “Cats are not very cute.” “Cats are adorable.” “Cats should

    rule the world.” VACUUM cleans up these rows Vacuum! “Bunnies are dangerous” “Bunnies are adorable.” “Bunnies will rule the world.”
  17. Life of a row Versions of a row are visible

    for the life of a transaction. New versions do not become visible until the transaction that created the rows commits. Cleaning up “dead” versions of rows is handled by VACUUM, asynchronously and automatically.
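    A quick way to watch that housekeeping is the statistics views; a sketch against the hypothetical opinions table above (pg_stat_user_tables and its columns are real):

        -- Dead row versions waiting to be reclaimed, and when autovacuum last ran
        SELECT relname, n_dead_tup, last_autovacuum
        FROM pg_stat_user_tables
        WHERE relname = 'opinions';

        -- Autovacuum normally handles cleanup; a manual pass looks like this:
        VACUUM VERBOSE opinions;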
  18. Three kinds of transaction isolation:

    READ COMMITTED: see all committed activity as of QUERY start.
    REPEATABLE READ: see all committed activity as of TRANSACTION start; throws a serialization error on conflicts.
    SERIALIZABLE: guarantees apparent serial execution of transactions, using predicate locking.
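    The level is chosen per transaction (or per session). A minimal sketch of the syntax, which is standard PostgreSQL:

        BEGIN;
        SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
        -- every statement in this transaction now sees the same snapshot
        COMMIT;

        -- or set it in the BEGIN itself:
        BEGIN ISOLATION LEVEL SERIALIZABLE;
        -- reads and writes here are tracked for serialization conflicts
        COMMIT;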
  19. Predicate locking is awesome. Postgres keeps track of the rows

    that your query is changing or accessing and uses that to determine whether or not a serialization conflict might occur.
  20. Two concurrent sessions:

    -- Session 1
    BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    UPDATE clicks SET count = count + 1;

    -- Session 2
    BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    DELETE FROM clicks WHERE count < 10;
  21. COMMIT; And afterward we commit both transactions. COMMIT; ERROR: could

    not serialize access due to concurrent update
  22. TRY AGAIN, this time in READ COMMITTED:

    -- Session 1
    BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
    UPDATE clicks SET count = count + 1;

    -- Session 2
    BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
    DELETE FROM clicks WHERE count < 10;
  23. COMMIT; And afterward we commit both transactions. COMMIT; This is

    what we told Postgres to do: Before each query, refresh the snapshot. So, we see changes carried out by other, concurrent transactions and that affects our operation.
  24. How to write application queries to get the right results,

    without sacrificing performance.
  25. Easy cases: all read-only transactions; many readers, one writer. Both

    are trivial - readers don’t block each other. One writer in a transaction is invisible to readers until after commit.
  26. Closer look at one-writer The writer’s updates will all become

    visible when it commits. Readers running in SERIALIZABLE or REPEATABLE READ mode will not see the changes from writers that commit after they start. So, readers have a stable view of the database throughout each transaction. Readers running in READ COMMITTED mode will be able to see changes beginning at the first SQL statement issued after the writer commits. So, the view is stable within a statement but not across statements. This is messier, but necessary if (for example) you have procedures that want to observe concurrent changes. Either way, readers do not block the writer, and the writer does not block readers, so concurrent performance is excellent.
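    A sketch of that difference with two psql sessions, reusing the hypothetical webpages(url, hits) table from the later examples:

        -- Session A (reader):
        BEGIN ISOLATION LEVEL REPEATABLE READ;
        SELECT count(*) FROM webpages;          -- say it returns 10

        -- Session B (writer), meanwhile:
        INSERT INTO webpages (url, hits) VALUES ('http://example.com/new', 0);

        -- Session A again:
        SELECT count(*) FROM webpages;          -- still 10: the snapshot is stable
        COMMIT;
        SELECT count(*) FROM webpages;          -- now 11
        -- In READ COMMITTED, the second SELECT inside the transaction would already see 11.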
  27. Suppose we want to cross check deposit totals against bank

    branch totals: BEGIN; SELECT SUM(balance) FROM accounts; SELECT SUM(branch_balance) FROM branches; -- check to see that we got the same result COMMIT; How do we ensure correctness?
  28. Suppose we want to cross check deposit totals against bank

    branch totals: BEGIN; SELECT SUM(balance) FROM accounts; SELECT SUM(branch_balance) FROM branches; -- check to see that we got the same result COMMIT; This is correct in SERIALIZABLE or REPEATABLE READ mode, but wrong in READ COMMITTED mode. To select SERIALIZABLE mode, either add SET TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY; to each transaction block (just after BEGIN), or set the defaults default_transaction_isolation = 'serializable' and default_transaction_read_only = on (in postgresql.conf, or per-connection).
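    Spelled out, the per-session defaults the slide alludes to are these settings (real configuration parameter names), a sketch:

        -- Per connection/session:
        SET default_transaction_isolation TO 'serializable';
        SET default_transaction_read_only TO on;

        -- Or as defaults in postgresql.conf:
        --   default_transaction_isolation = 'serializable'
        --   default_transaction_read_only = on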
  29. Rules for read-only transactions Don’t put more reader activity into

    a single transaction than is necessary. Pure reader transactions should typically be run in SERIALIZABLE READ ONLY mode, so that they see a consistent database view throughout the transaction; then the transaction boundaries define the times when you want to recognize updates committed by other clients. If you start one serializable transaction and hold it for the entire run of an application, you’ll never observe any changes in database state, other than those you make yourself. This is what you want in a few cases (for instance, the backup utility pg_dump does it to make a self-consistent dump) but usually it’s not what you want.
  30. Beware of really long transactions Rows visible to the long-running

    transaction aren’t available to be reclaimed, even while they’re long-dead to other transactions. Postgres doesn’t have “rollback segments” like Oracle that can run out of space and bomb, but it is possible to run out of disk overall. Worst thing to do: Start a transaction (BEGIN;) and then go to sleep for hours or days.
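    A hedged way to spot such offenders, using the pg_stat_activity view (column names shown are the 9.2 ones):

        -- Open transactions, oldest first
        SELECT pid, state, now() - xact_start AS xact_age, query
        FROM pg_stat_activity
        WHERE xact_start IS NOT NULL
        ORDER BY xact_start;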
  31. Rules for concurrent writing Two concurrent transactions cannot write (UPDATE,

    DELETE, or SELECT FOR UPDATE) the same row. They can proceed as long as they write non-overlapping sets of rows. If a transaction tries to write a row already written by a concurrent transaction: 1. If other transaction is still in progress, wait for it to commit or rollback. 2. If it rolled back, proceed (using the non-updated version of the row). 3. If it committed:
  32. Rules for concurrent writing 3. If it committed: (a) In

    READ COMMITTED mode: proceed, using the newly-committed (latest) version of the row as the starting point for my operation. (But first, recheck the new row to see if it still satisfies the WHERE clause of my query; if not, ignore it.) (b) In REPEATABLE READ or SERIALIZABLE mode: abort with a “can’t serialize” error. SERIALIZABLE uses predicate locking; REPEATABLE READ preserves the pre-9.1 SERIALIZABLE behavior.
  33. Concurrent writers can fail without some interlock. It’s easy to

    see that there could be a problem: UPDATE webpages SET hits = hits + 1 WHERE url = '...';

    CLIENT 1                       CLIENT 2
    reads row, sees hits = 531     reads row, sees hits = 531
    computes hits+1 = 532          computes hits+1 = 532
    writes hits = 532              writes hits = 532
    commit                         commit

    Oops . . . final state of the row is 532, not 533 as we’d wish.
  34. READ COMMITTED works perfectly for this. That bug won’t occur

    in Postgres. In READ COMMITTED, the second client is forced to redo its calculation.

    CLIENT 1                       CLIENT 2
    reads row, sees hits = 531     reads row, sees hits = 531
    computes hits+1 = 532          computes hits+1 = 532
    writes hits = 532              tries to write, waits
    commit
                                   re-reads row, sees hits = 532
                                   re-computes hits+1 = 533
                                   writes hits = 533
                                   commit
  35. READ COMMITTED vs SERIALIZABLE In SERIALIZABLE mode, the second client

    would instead get a “Can’t serialize access due to concurrent update” error, and would have to retry the transaction until it succeeds. You always need to code a retry loop in the client when you are using SERIALIZABLE mode for a writing transaction. Why would you want to use SERIALIZABLE mode, given that it requires extra client-side programming? Because READ COMMITTED doesn’t scale very well to more complex situations.
  36. Example with separate read and write commands BEGIN; SELECT hits

    FROM webpages WHERE url = '...'; -- client internally computes $newval = $hits + 1 UPDATE webpages SET hits = $newval WHERE url = '...'; COMMIT; This might look silly, but it’s not so silly if the “internal computation” is complex, or if you need data from multiple database rows to make the computation, or if you’d like to know exactly what you just updated the hits count to.
  37. READ COMMITTED does not work for this case.

    CLIENT 1                       CLIENT 2
    reads row, sees hits = 531     reads row, sees hits = 531
    computes $newval = 532         computes $newval = 532
    writes hits = 532              tries to write row, waits
    commit
                                   re-reads row, sees hits = 532
                                   re-computes $newval = 532
                                   writes hits = 532
                                   commit
  38. Solution #1: use SELECT FOR UPDATE, not just SELECT, to

    read the input data We can make it work by doing this: BEGIN; SELECT hits FROM webpages WHERE url = '...' FOR UPDATE; -- client internally computes $newval = $hits + 1 UPDATE webpages SET hits = $newval WHERE url = '...'; COMMIT; FOR UPDATE causes the initial SELECT to acquire a row lock on the target row, thus making it safe to do our internal computation and update the row. SELECT FOR UPDATE returns the latest updated contents of the row, so we aren’t working with stale data.
  39. How the SELECT FOR UPDATE solution works.

    CLIENT 1                       CLIENT 2
    reads row, sees hits = 531     tries to lock row, waits
    computes $newval = 532
    writes hits = 532
    commit
                                   locks row, reads hits = 532
                                   re-computes $newval = 533
                                   writes hits = 533
                                   commit

    We need both the delay till commit and the read of the updated value to make this work. Notice SELECT FOR UPDATE may return data committed after its start time, which is different from plain SELECT’s behavior in READ COMMITTED mode.
  40. Solution #2: use REPEATABLE READ or SERIALIZABLE mode and retry

    on serialization error:

    loop:
        BEGIN;
        SELECT hits FROM webpages WHERE url = '...';
        -- client internally computes $newval = $hits + 1
        UPDATE webpages SET hits = $newval WHERE url = '...';
        if (no error) break out of loop; else ROLLBACK;
    end loop
    COMMIT;

    This works because the client will re-read the new hits value during retry. The UPDATE will succeed only if the data read is still current.
  41. The client could get confused, too Consider this small variant

    of the example, running in READ COMMITTED mode: BEGIN; SELECT hits FROM webpages WHERE url = '...'; UPDATE webpages SET hits = hits + 1 WHERE url = '...'; COMMIT; If someone else increments the hits count while the SELECT is running, the UPDATE will compute the correct updated value . . . but it will be two more than what we just read in the SELECT, not one more. (And that’s what we’d see if we did another SELECT.) If your application logic could get confused by this sort of thing, you need to use either SELECT FOR UPDATE or SERIALIZABLE mode.
  42. The difference between REPEATABLE READ and SERIALIZABLE mode REPEATABLE READ

    does not have the ability to take out predicate locks and does not use the conflict detection algorithm in SERIALIZABLE.
  43. Multi-row constraints, or why serializable is really serializable Suppose we

    want to check that we always have at least $1000 total cash in our checking and savings accounts. A typical transaction will look like: BEGIN; UPDATE myaccounts SET balance = balance - $withdrawal WHERE accountid = 'checking'; SELECT SUM(balance) FROM myaccounts; if (sum >= 1000.00) COMMIT; else ROLLBACK; Is this safe? Yes, but only in SERIALIZABLE mode.
  44. Suppose we have $600 in each account and we concurrently

    try to withdraw $200 from each, and we use SERIALIZABLE.

    CLIENT 1                            CLIENT 2
    reads checking balance, gets 600
    updates checking balance to 400
                                        reads savings balance, gets 600
                                        updates savings balance to 400
    runs SUM, gets 400+600=1000
                                        runs SUM, gets 400+600=1000
    commit
                                        ERROR: serialization failure

    Predicate locking detects the dependency! The second transaction would have to retry, and the SUM test would fail.
  45. Performance comparison The READ COMMITTED SELECT FOR UPDATE approach is

    essentially pessimistic locking: get the lock first, then do your work. The REPEATABLE READ retry-loop approach is essentially optimistic locking with retry. It avoids the overhead of getting the row lock beforehand, at the cost of having to redo the whole transaction if there is a conflict. SERIALIZABLE uses predicate locking to ensure concurrent writers don’t conflict. Overhead is significant, but may simplify code and ensures safety in complex, high concurrency situations. Practically, you’ll see more rollbacks than in REPEATABLE READ. SELECT FOR UPDATE wins if contention is heavy (since the serializable approach will be wasting work pretty often). REPEATABLE READ wins if contention for specific rows is light. SERIALIZABLE wins when you need predicate locking to ensure data safety.
  46. • PostgreSQL Documentation for SET TRANSACTION and Concurrency Control •

    Presentations by Kevin Grittner and Dan Ports about Serializable Snapshot Isolation (SSI) To learn more...
  47. Temp tables for a query WITH regional_sales AS ( SELECT

    region, SUM(amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM(quantity) AS product_units, SUM(amount) AS product_sales FROM orders WHERE region IN (SELECT region FROM top_regions) GROUP BY region, product;
  48. Recursive CTE WITH RECURSIVE t1(a, b) AS ( VALUES(0,0) UNION

    ALL SELECT CASE CAST(b AS BOOLEAN) WHEN b % 3 = 0 THEN b WHEN b % 5 = 0 THEN b END, b + 1 FROM t1 WHERE b < 1000 ) SELECT sum(a) FROM t1; Euler Project, question 1: If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23. Find the sum of all the multiples of 3 or 5 below 1000. http://wiki.postgresql.org/wiki/Euler_Project,_Question_1
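    The CASE/CAST trick above is terse; here is a more literal sketch that computes the same answer by generating the numbers first and filtering afterward:

        WITH RECURSIVE nums(n) AS (
            VALUES (1)
          UNION ALL
            SELECT n + 1 FROM nums WHERE n + 1 < 1000
        )
        SELECT sum(n)
        FROM nums
        WHERE n % 3 = 0 OR n % 5 = 0;  -- sum of all multiples of 3 or 5 below 1000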
  49. Writeable CTE WITH deleted_posts AS ( DELETE FROM posts WHERE

    created < now() - '6 months'::INTERVAL RETURNING * ) SELECT user_id, count(*) FROM deleted_posts GROUP BY 1; Makes maintaining partitions easier: http://www.depesz.com/index.php/2011/03/16/waiting-for-9-1-writable-cte/ http://xzilla.net/blog/2011/Mar/Upserting-via-Writeable-CTE.html
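    The second link above describes upserting with a writable CTE. A sketch of the shape of that pattern, with an invented counters(url, hits) table; note it still needs a retry loop (or a lock) to be fully safe under concurrent inserts:

        WITH updated AS (
            UPDATE counters SET hits = hits + 1
            WHERE url = 'http://example.com/'
            RETURNING url
        )
        INSERT INTO counters (url, hits)
        SELECT 'http://example.com/', 1
        WHERE NOT EXISTS (SELECT 1 FROM updated);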
  50. Real world CTE Analyzing sample data to detect correlations. Start

    with a semi-structured text field: AdapterVendorID: 1002, AdapterDeviceID: 6779, AdapterDriverVersion: 8.784.1.0 D2D? D2D+ DWrite? DWrite+ D3D10 Layers? D3D10 Layers+
  51. Pick a variable: substring(app_notes from E'AdapterDeviceID: (\\w+)') Normalize the data

    by creating tables to hold all the DeviceIDs... OR...
  52. WITH AllDeviceCrashes AS ( SELECT count(*) crash_by_deviceid , substring(app_notes from

    E'AdapterDeviceID: (\\w+)') AdapterDeviceID FROM reports WHERE ... GROUP BY AdapterDeviceID ) SELECT ...
  53. WITH AllDeviceCrashes AS ( ... ), AllSignatureCrashes AS (

    ... ), AllSignatureDeviceCrashes AS ( ... ), AllCrashes AS ( ... ) SELECT Adapter, Signature, crash_sig / total_crashes as "P(sig)", crash_devid / total_crashes as "P(devid)", crash_sig_devid / total_crashes as "P(sig, devid)", ...
  54. Transactional DDL No MyISAM tables ANSI compliance (MORE THAN ONCE)

    Portability Table functions Stored procedures in tons of languages Complex types Extensibility LISTEN/NOTIFY Foreign Data Wrappers Common Table Expressions Functional INDEXes Temporal data types Simple licensing psql Data integrity Sugary syntax Stability JSON datatype Fast bug fixes ... What they said:
  55. Plans for 9.3 • Event Triggers (DDL, for example) •

    Dramatically reduce shared memory use • LATERAL (performance, prettier queries) • UPSERT • Expansion of JSON type support • Performance improvement in SSI
  56. Where to watch • Depesz “Waiting for 9.3” • Planet

    PostgreSQL http://planet.postgresql.org • Talks for FOSDEM 2013 and PgCon 2013
  57. “[A] funny thing happened on the way to the SQL

    cemetery: PostgreSQL became cool again.” -Matt Asay http://www.theregister.co.uk/2012/08/31/postgresql_too_cool_for_school/