Safety and Convenience: PostgreSQL's concurrency model and a CTE HOWTO

Slide 1

Slide 1 text

Thursday, October 11, 12

Slide 2

Slide 2 text

How Postgres handles concurrency. Also, CTEs. Selena Deckelmann @selenamarie [email protected] Thursday, October 11, 12

Slide 3

Slide 3 text

Safety and convenience Thursday, October 11, 12

Slide 4

Slide 4 text

select ‘fail’ from test where now() - ‘1 hour’::interval > max(timestamp) Thursday, October 11, 12

Slide 5

Slide 5 text

with maxcte as ( select max(stamp) as lastupdate from test ) select ‘fail’ from maxcte where now() - ‘1 hour’::interval > lastupdate; Thursday, October 11, 12

Slide 6

Slide 6 text

"Postgres is bloatware by design: it was built to house PhD theses." -Joe Hellerstein, 2002 Thursday, October 11, 12

Slide 7

Slide 7 text

• Started in 1986 as a PhD project • First open source commit: July 1996 • About 672,670 lines of C as of 10/10/12 Thursday, October 11, 12

Slide 8

Slide 8 text

7.4 (2002-2010) 8.0 (2004-2010) 8.1 (2005-2010) 8.2 (2005-2011) 8.3 (2006-2012) 8.4 (2008-2012) 9.0 (2010-2012) 9.1 (2010-2012) 9.2 (2011-2012) 0 175 350 525 700 14 15 17 18 20 17 18 20 23 289 336 298 450 624 455 421 342 470 Number of PostgreSQL contributors by release Contributors Named Committers Thursday, October 11, 12

Slide 9

Slide 9 text

2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 0 75 150 225 300 13 14 16 16 13 15 11 13 14 15 13 196 242 240 245 209 238 223 276 248 260 244 PostgreSQL contributors by year Contributors Named Committers Thursday, October 11, 12

Slide 10

Slide 10 text

"Salesforce.com bases its entire cloud on Oracle database, but its database platform offering is PostgreSQL. I ﬁnd that interesting." -Larry Ellison, OpenWorld 2012 Thursday, October 11, 12

Slide 11

Slide 11 text

How Postgres handles concurrent operations How to write application queries to get the right results, without sacriﬁcing performance. Thursday, October 11, 12

Slide 12

Slide 12 text

What we want as developers: Transactions that appear to execute one at a time. Easy way? Run one transaction at a time. This is the thinking behind a global lock. Thursday, October 11, 12

Slide 13

Slide 13 text

Concurrency the hard right way Snapshot Isolation Transactions appear to execute one at a time, in order. Also, each transaction has a version of the world that it sees, isolated from other transactions. Thursday, October 11, 12

Slide 14

Slide 14 text

Inside a snapshot Each transaction runs against a “snapshot” of the committed state of the database at a particular instant. It cannot see the effects of uncommitted transactions (except itself), nor of transactions that commit after the snapshot time. Every update of a database row generates a new version of the row. Snapshot information determines which version of a given row is visible to a given transaction. Old versions of rows can be reclaimed once they are no longer visible to any running transaction. Thursday, October 11, 12

Slide 15

Slide 15 text

How this works “Bunnies are dangerous” Thursday, October 11, 12

Slide 16

Slide 16 text

“Bunnies are dangerous” ID #2 - SELECT Thursday, October 11, 12

Slide 17

Slide 17 text

“Bunnies are dangerous” ID #2 - SELECT “Bunnies are adorable.” ID #3 - UPDATE There’s a new row version! Thursday, October 11, 12

Slide 18

Slide 18 text

“Bunnies are dangerous” ID #2 - SELECT “Bunnies are adorable.” Once Transaction ID #3 is committed, the original row is no longer visible to future transactions, but still exists and is visible to ID #2. Thursday, October 11, 12

Slide 19

Slide 19 text

“Bunnies are dangerous” ID #2 - SELECT “Bunnies are adorable.” ID #4 - SELECT Thursday, October 11, 12

Slide 20

Slide 20 text

“Bunnies are dangerous” ID #2 - SELECT “Bunnies are adorable.” ID #4 - SELECT Once Transaction ID #2 ends, then the row associated with ID #2 is no longer visible to any future transactions. Thursday, October 11, 12

Slide 21

Slide 21 text

“Bunnies are dangerous” “Bunnies are adorable.” “Bunnies will rule the world.” ID #6 - UPDATE Thursday, October 11, 12

Slide 22

Slide 22 text

“Bunnies are dangerous” “Bunnies are adorable.” “Bunnies will rule the world.” ID #6 - UPDATE ID #6 - ROLLBACK Thursday, October 11, 12

Slide 23

Slide 23 text

Vacuum? “Cats are not very cute.” “Cats are adorable.” “Cats should rule the world.” “Bunnies are dangerous” “Bunnies are adorable.” “Bunnies will rule the world.” Thursday, October 11, 12

Slide 24

Slide 24 text

“Cats are not very cute.” “Cats are adorable.” “Cats should rule the world.” VACUUM cleans up these rows Vacuum! “Bunnies are dangerous” “Bunnies are adorable.” “Bunnies will rule the world.” Thursday, October 11, 12

Slide 25

Slide 25 text

“Cats are adorable.” “Bunnies are adorable.” Thursday, October 11, 12

Slide 26

Slide 26 text

Life of a row Versions of a row are visible for the life of a transaction. New versions do not become visible until the transaction that created the rows commits. Cleaning up “dead” versions of rows is handled by VACUUM, asynchronously and automatically. Thursday, October 11, 12

Slide 27

Slide 27 text

Three kinds of transaction isolation READ COMMITTED See all committed activity at QUERY start REPEATABLE READ See all committed activity at TRANSACTION start. Throw serialization error on conﬂicts. SERIALIZABLE Guarantee apparent serial execution of transactions with predicate locking. Thursday, October 11, 12

Slide 28

Slide 28 text

Predicate locking is awesome. Postgres keeps track of the rows that your query is changing or accessing and uses that to determine whether or not a serialization conﬂict might occur. Thursday, October 11, 12

Slide 29

Slide 29 text

BEGIN TRANSACTION ISOLATION SERIALIZABLE; UPDATE clicks SET count = count + 1; BEGIN TRANSACTION ISOLATION SERIALIZABLE; DELETE FROM clicks WHERE count < 10; Thursday, October 11, 12

Slide 30

Slide 30 text

COMMIT; And afterward we commit both transactions. COMMIT; ERROR: could not serialize access due to concurrent update Thursday, October 11, 12

Slide 31

Slide 31 text

BEGIN TRANSACTION ISOLATION READ COMMITTED; UPDATE clicks SET count = count + 1; TRY AGAIN. BEGIN TRANSACTION ISOLATION READ COMMITTED; DELETE FROM clicks WHERE count < 10; Thursday, October 11, 12

Slide 32

Slide 32 text

COMMIT; And afterward we commit both transactions. COMMIT; This is what we told Postgres to do: Before each query, refresh the snapshot. So, we see changes carried out by other, concurrent transactions and that affects our operation. Thursday, October 11, 12

Slide 33

Slide 33 text

How to write application queries to get the right results, without sacriﬁcing performance. Thursday, October 11, 12

Slide 34

Slide 34 text

Easy cases All read-only transactions Many reader, one writer Both are trivial - readers don’t block each other. One writer in a transaction is invisible to readers until after commit. Thursday, October 11, 12

Slide 35

Slide 35 text

Closer look at one-writer The writer’s updates will all become visible when it commits. Readers running in SERIALIZABLE or READ COMMITTED mode will not see the changes from writers that commit after they start. So, readers have a stable view of the database throughout each transaction. Readers running in READ COMMITTED mode will be able to see changes beginning at ﬁrst SQL statement issued after writer commits. So, the view is stable within a statement but not across statements. This is messier, but necessary if (for example) you have procedures that want to observe concurrent changes. Either way, readers do not block the writer, and the writer does not block readers, so concurrent performance is excellent. Thursday, October 11, 12

Slide 36

Slide 36 text

Slide 37

Slide 37 text

Suppose we want to cross check deposit totals against bank branch totals: BEGIN; SELECT SUM(balance) FROM accounts; SELECT SUM(branch balance) FROM branches; -- check to see that we got the same result COMMIT; This is correct in SERIALIZABLE or REPEATABLE READ mode, but wrong in READ COMMITTED mode. To select SERIALIZABLE mode, either add SET TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY; to each transaction block (just after BEGIN), or do SET DEFAULT TRANSACTION ISOLATION TO SERIALIZABLE READ ONLY; (this can be set in the postgresql.conf ﬁle, or per-connection). Thursday, October 11, 12

Slide 38

Slide 38 text

Rules for read-only transactions Don’t put more reader activity into a single transaction than is necessary. Pure reader transactions should typically be run in SERIALIZABLE READ ONLY mode, so that they see a consistent database view throughout the transaction; then the transaction boundaries deﬁne the times when you want to recognize updates committed by other clients. If you start one serializable transaction and hold it for the entire run of an application, you’ll never observe any changes in database state, other than those you make yourself. This is what you want in a few cases (for instance, the backup utility pg_dump does it to make a self-consistent dump) but usually it’s not what you want. Thursday, October 11, 12

Slide 39

Slide 39 text

Beware of really long transactions Rows visible to the long-running transaction aren’t available to be reclaimed, even while they’re long-dead to other transactions. Postgres doesn’t have “rollback segments” like Oracle that can run out of space and bomb, but it is possible to run out of disk overall. Worst thing to do: Start a transaction (BEGIN;) and then go to sleep for hours or days. Thursday, October 11, 12

Slide 40

Slide 40 text

Rules for concurrent writing Two concurrent transactions cannot write (UPDATE, DELETE, or SELECT FOR UPDATE) the same row. They can proceed as long as they write non-overlapping sets of rows. If a transaction tries to write a row already written by a concurrent transaction: 1. If other transaction is still in progress, wait for it to commit or rollback. 2. If it rolled back, proceed (using the non-updated version of the row). 3. If it committed: Thursday, October 11, 12

Slide 41

Slide 41 text

Rules for concurrent writing 3. If it committed: (a) In READ COMMITTED mode: proceed, using the newly- committed (latest) version of the row as the starting point for my operation. (But ﬁrst, recheck the new row to see if it still satisﬁes the WHERE clause of my query; if not, ignore it.) (b) In REPEATABLE READ or SERIALIZABLE mode: abort with “can’t serialize” error. SERIALIZABLE uses predicate locking and REPEATABLE READ supports pre-9.1 behavior for legacy support. Thursday, October 11, 12

Slide 42

Slide 42 text

Concurrent writers can fail without some interlock. It’s easy to see that there could be a problem: UPDATE webpages SET hits = hits + 1 WHERE url = ’...’; CLIENT 1 CLIENT 2 reads row, sees hits = 531 reads row, sees hits = 531 computes hits+1 = 532 computes hits+1 = 532 writes hits = 532 writes hits = 532 commit commit Oops . . . ﬁnal state of the row is 532, not 533 as we’d wish. Thursday, October 11, 12

Slide 43

Slide 43 text

READ COMMITTED works perfectly for this. That bug won’t occur in Postgres. In READ COMMITTED, the second client is forced to redo it’s calculation. CLIENT 1 CLIENT 2 reads row, sees hits = 531 reads row, sees hits = 531 computes hits+1 = 532 computes hits+1 = 532 writes hits = 532 Tries to write, waits. commit re-reads row, sees hits = 532 re-computes hits+1 = 533 writes hits = 533 commit Thursday, October 11, 12

Slide 44

Slide 44 text

READ COMMITTED vs SERIALIZABLE In SERIALIZABLE mode, the second client would instead get a “Can’t serialize access due to concurrent update” error, and would have to retry the transaction until he succeeded. You always need to code a retry loop in the client when you are using SERIALIZABLE mode for a writing transaction. Why would you want to use SERIALIZABLE mode, given that it requires extra client-side programming? Because READ COMMITTED doesn’t scale very well to more complex situations. Thursday, October 11, 12

Slide 45

Slide 45 text

Example with separate read and write commands BEGIN; SELECT hits FROM webpages WHERE url = ’...’; -- client internally computes $newval = $hits + 1 UPDATE webpages SET hits = $newval WHERE url = ’...’; COMMIT; This might look silly, but it’s not so silly if the “internal computation” is complex, or if you need data from multiple database rows to make the computation, or if you’d like to know exactly what you just updated the hits count to. Thursday, October 11, 12

Slide 46

Slide 46 text

READ COMMITTED for this case, does not work. CLIENT 1 CLIENT 2 reads row, sees hits = 531 reads row, sees hits = 531 computes $newval = 532 computes $newval = 532 writes hits = 532 Tries to write row, waits. commit re-reads row, sees hits = 532 re-computes $newval = 532 writes hits = 532 commit Thursday, October 11, 12

Slide 47

Slide 47 text

Solution #1: use SELECT FOR UPDATE, not just SELECT, to read the input data We can make it work by doing this: BEGIN; SELECT hits FROM webpages WHERE url = ’...’ FOR UPDATE; -- client internally computes $newval = $hits + 1 UPDATE webpages SET hits = $newval WHERE url = ’...’; COMMIT; FOR UPDATE causes the initial SELECT to acquire a row lock on the target row, thus making it safe to do our internal computation and update the row. SELECT FOR UPDATE returns the latest updated contents of the row, so we aren’t working with stale data. Thursday, October 11, 12

Slide 48

Slide 48 text

How SELECT FOR UPDATE solution works. CLIENT 1 CLIENT 2 reads row, sees hits = 531 tries to lock row, waits. computes $newval = 532 writes hits = 532 commit locks row, reads hits = 532 re-computes $newval = 533 writes hits = 533 commit We need both the delay till commit and the read of the updated value to make this work. Notice SELECT FOR UPDATE may return data committed after its start time, which is different from plain SELECT’s behavior in READ COMMITTED mode. Thursday, October 11, 12

Slide 49

Slide 49 text

Solution #2: use REPEATABLE READ or SERIALIZABLE mode and retry on serialization error loop BEGIN; SELECT hits FROM webpages WHERE url = ’...’; -- client internally computes $newval = $hits + 1 UPDATE webpages SET hits = $newval WHERE url = ’...’; if (no error) break out of loop; else ROLLBACK; end loop COMMIT; This works because client will re-read the new hits value during retry. The UPDATE will succeed only if the data read is still current. Thursday, October 11, 12

Slide 50

Slide 50 text

The client could get confused, too Consider this small variant of the example, running in READ COMMITTED mode: BEGIN; SELECT hits FROM webpages WHERE url = ’...’; UPDATE webpages SET hits = hits + 1 WHERE url = ’...’; COMMIT; If someone else increments the hits count while the SELECT is running, the UPDATE will compute the correct updated value . . . but it will be two more than what we just read in the SELECT, not one more. (And that’s what we’d see if we did another SELECT.) If your application logic could get confused by this sort of thing, you need to use either SELECT FOR UPDATE or SERIALIZABLE mode. Thursday, October 11, 12

Slide 51

Slide 51 text

The difference between REPEATABLE READ and SERIALIZABLE mode REPEATABLE READ does not have the ability to take out predicate locks and does not use the conﬂict detection algorithm in SERIALIZABLE. Thursday, October 11, 12

Slide 52

Slide 52 text

Multi-row constraints, or why serializable is really serializable Suppose we want to check that we always have at least $1000 total cash in our checking and savings accounts. A typical transaction will look like: BEGIN; UPDATE myaccounts SET balance = balance - $withdrawal WHERE accountid = ’checking’; SELECT SUM(balance) FROM myaccounts; if (sum 1000.00) COMMIT; else ROLLBACK; Is this safe? YES, only in SERIALIZABLE mode. Thursday, October 11, 12

Slide 53

Slide 53 text

Suppose we have $600 in each account and we concurrently try to withdraw $200 from each, and we use SERIALIZABLE. CLIENT 1 CLIENT 2 reads checking balance, get 600 update checking balance to 400 reads savings balance, get 600 update savings balance to 400 run SUM, get 400+600=1000 run SUM, get 400+600=1000 commit ERROR: serialization failure Predicate locking detects the dependency! The second transaction would have to retry, and the SUM test would fail. Thursday, October 11, 12

Slide 54

Slide 54 text

Performance comparison The READ COMMITTED SELECT FOR UPDATE approach is essentially pessimistic locking: get the lock first, then do your work. The REPEATABLE READ retry-loop approach is essentially optimistic locking with retry. It avoids the overhead of getting the row lock beforehand, at the cost of having to redo the whole transaction if there is a conflict. SERIALIZABLE uses predicate locking to ensure concurrent writers don’t conflict. Overhead is significant, but may simplify code and ensures safety in complex, high concurrency situations. Practically, you’ll see more rollbacks than in REPEATABLE READ. SELECT FOR UPDATE wins if contention is heavy (since the serializable approach will be wasting work pretty often). REPEATABLE READ wins if contention for specific rows is light. SERIALIZABLE wins when you need predicate locking to ensure data safety. Thursday, October 11, 12

Slide 55

Slide 55 text

• PostgreSQL Documentation for SET TRANSACTION and Concurrency Control • Presentations by Kevin Grittner and Dan Ports about Serializable Snapshot Isolation (SSI) To learn more... Thursday, October 11, 12

Slide 56

Slide 56 text

Common Table Expressions Thursday, October 11, 12

Slide 57

Slide 57 text

Temp tables for a query WITH regional_sales AS ( SELECT region, SUM(amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM(quantity) AS product_units, SUM(amount) AS product_sales FROM orders WHERE region IN (SELECT region FROM top_regions) GROUP BY region, product; Thursday, October 11, 12

Slide 58

Slide 58 text

Recursive CTE WITH RECURSIVE t1(a, b) AS ( VALUES(0,0) UNION ALL SELECT CASE CAST(b AS BOOLEAN) WHEN b % 3 = 0 THEN b WHEN b % 5 = 0 THEN b END, b + 1 FROM t1 WHERE b < 1000 ) SELECT sum(a) FROM t1; Euler Project, question 1: If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23. Find the sum of all the multiples of 3 or 5 below 1000. http://wiki.postgresql.org/wiki/Euler_Project,_Question_1 Thursday, October 11, 12

Slide 59

Slide 59 text

Writeable CTE WITH deleted_posts AS ( DELETE FROM posts WHERE created < now() - '6 months'::INTERVAL RETURNING * ) SELECT user_id, count(*) FROM deleted_posts group BY 1; Makes maintaining partitions easier: http://www.depesz.com/index.php/2011/03/16/waiting-for-9-1-writable-cte/ http://xzilla.net/blog/2011/Mar/Upserting-via-Writeable-CTE.html Thursday, October 11, 12

Slide 60

Slide 60 text

Real world CTE Analyzing sample data to detect correlations. Start with a semi-structured text ﬁeld: AdapterVendorID: 1002, AdapterDeviceID: 6779, AdapterDriverVersion: 8.784.1.0 D2D? D2D+ DWrite? DWrite+ D3D10 Layers? D3D10 Layers+ Thursday, October 11, 12

Slide 61

Slide 61 text

Pick a variable: substring(app_notes from E'AdapterDeviceID: (\\w+)') Normalize the data by creating tables to hold all the DeviceIDs... OR... Thursday, October 11, 12

Slide 62

Slide 62 text

WITH AllDeviceCrashes AS ( SELECT count(*) crash_by_deviceid , substring(app_notes from E'AdapterDeviceID: (\\w+)') AdapterDeviceID FROM reports WHERE GROUP BY AdapterDeviceID ) SELECT ... Thursday, October 11, 12

Slide 63

Slide 63 text

` WITH AllDeviceCrashes AS ( ... ), AllSignatureCrashes AS ( ... ), AllSignatureDeviceCrashes AS ( ... ), AllCrashes AS ( ... ) SELECT Adapter, Signature, crash_sig / total_crashes as “P(sig)”, crash_devid / total_crashes as “P(devid)”, crash_sig_devid / total_crashes as “P(sig, devid)”, ... Thursday, October 11, 12

Slide 64

Slide 64 text

What features get you excited about Postgres? Thursday, October 11, 12

Slide 65

Slide 65 text

Transactional DDL No MyISAM tables ANSI compliance (MORE THAN ONCE) Portability Table functions Stored procedures in tons of languages Complex types Extensibility LISTEN/NOTIFY Foreign Data Wrappers Common Table Expressions Functional INDEXes Temporal data types Simple licensing psql Data integrity Sugary syntax Stability JSON datatype Fast bug ﬁxes ... What they said: Thursday, October 11, 12

Slide 66

Slide 66 text

Plans for 9.3 • Event Triggers (DDL, for example) • Dramatically reduce shared memory use • LATERAL (performance, prettier queries) • UPSERT • Expansion of JSON type support • Performance improvement in SSI Thursday, October 11, 12

Slide 67

Slide 67 text

Where to watch • Depesz “Waiting for 9.3” • Planet PostgreSQL http://planet.postgresql.org • Talks for FOSDEM 2013 and PgCon 2013 Thursday, October 11, 12

Slide 68

Slide 68 text

“[A] funny thing happened on the way to the SQL cemetery: PostgreSQL became cool again.” -Matt Asay http://www.theregister.co.uk/2012/08/31/postgresql_too_cool_for_school/ Thursday, October 11, 12

Slide 69

Slide 69 text

Thursday, October 11, 12