Upgrade to Pro — share decks privately, control downloads, hide ads and more …

PGOpen '17 - Concurrency Deep-Dive

Tejas Manohar
September 07, 2017

PGOpen '17 - Concurrency Deep-Dive

Unlike many traditional RBDMS, PostgreSQL was architected with concurrency at top of mind. However, many developers (including the speaker once upon a time) don't understand all its magic under the hood. In this talk, we'll explore how Postgres deals with concurrency (MVCC, locks, transactions, etc.), why it matters, and how to make the best of it (with production stories!).

https://postgresql.us/events/schedule/pgopen2017/session/374-concurrency-deep-dive/

Tejas Manohar

September 07, 2017
Tweet

More Decks by Tejas Manohar

Other Decks in Programming

Transcript

  1. POSTGRES CONCURRENCY $ whoami ▸ Tejas Manohar ▸ tejasmanohar on

    GitHub, Twitter, etc. ▸ Software Engineer at Segment
  2. POSTGRES CONCURRENCY WHAT'S THIS TALK ABOUT? ▸ Common ways the

    world handles concurrency ▸ How Postgres implements concurrency control ▸ How you should think about concurrency
  3. WHAT DO DATABASES VALUE? HOW DO WE EVALUATE DATABASES? ▸

    Cost ▸ Scope ▸ Performance ▸ Guarantees
  4. WHAT DO DATABASES GUARANTEE? WELCOME TO BUZZWORD HEAVEN ▸ Many

    databases are "transactional" ▸ Some databases are "ACID" compliant ▸ Others are considered "BASE" ▸ Distributed stores are graded using "CAP" theorem
  5. WHAT DO DATABASES GUARANTEE? POSTGRES IS TRANSACTIONAL transaction. noun unit

    of work performed by a database, characterized by being Atomic, Consistent, Isolated, and Durable ("ACID")
  6. WHAT DOES DATABASES GUARANTEE? WHAT DOES A TRANSACTION LOOK LIKE?

    BEGIN; INSERT INTO transfers (sender, recipient, amount) VALUES ('alice_xyz', 'bob123' 100); UPDATE accounts SET balance = balance - 100 WHERE id = 'alice_xyz'; UPDATE accounts SET balance = balance + 100 WHERE id = 'bob123'; COMMIT;
  7. WHAT DOES DATABASES GUARANTEE? EVERYTHING IS A TRANSACTION BEGIN; INSERT

    INTO my_table VALUES ('useless', 101); COMMIT;
  8. TRANSACTIONS GUARANTEE ATOMICITY atomic. adjective "all or nothing" operation, no

    partial failures derived from Greek atomos for "undividable" ex. If US Bank fails to add to Bob's balance, it should not deduct from Alice's balance. WHAT DOES DATABASES GUARANTEE?
  9. TRANSACTIONS GUARANTEE CONSISTENCY consistent. adjective transactions must keep database in

    valid state WHAT DOES DATABASES GUARANTEE? create table accounts ( id text unique, balance int, constraint no_overdraft (check balance >= 0) );
  10. TRANSACTIONS GUARANTEE ISOLATION isolated. adjective transactions executed concurrently produce the

    same result as transactions executed serially ex. Isolation allows us to write queries as if they're the only thing running against the database. WHAT DOES DATABASES GUARANTEE?
  11. TRANSACTIONS GUARANTEE DURABILITY durable. adjective committed transactions cannot be rolled

    back, even in the case of failure (e.g. kernel panic, RAM failure, etc.) WHAT DOES DATABASES GUARANTEE?
  12. WHY HANDLE CONCURRENCY CONCURRENCY CONTROL PREVENTS RACE CONDITIONS ▸ Dirty

    read-- tx1 reads tx2's uncommitted work ▸ Non-repeatable read -- tx1 reads row twice and gets different result ▸ Phantom read -- tx1 sees row in stmt2 that wasn't reported in stmt1, as tx2 deleted it ▸ Serialization anomaly -- concurrent execution of transactions are nondeterministic or arbitrary
  13. HOW DO YOU HANDLE CONCURRENCY NAIVE SOLUTION: LOCKING ▸ NOT

    Postgres' main strategy of concurrency control ▸ Acquire a lock before each operation, release afterwards ▸ Prevent race conditions altogether func Read(key string) string { mutex.Lock() defer mutex.Unlock() return sharedMemory[key] } func Write(key, val string) { mutex.Lock() defer mutex.Unlock() sharedMemory[key] = val }
  14. "SLIGHTLY LESS NAIVE" LOCKING func Read(key string) string { rwMutex.RLock()

    defer rwMutex.RUnlock() return sharedMemory[key] } func Write(key, val string) { rwMutex.Lock() defer rwMutex.Unlock() sharedMemory[key] = val } HOW DO YOU HANDLE CONCURRENCY
  15. HOW DOES YOUR DATABASE THINK? ▸ Two major lines of

    thought for concurrency control ▸ Pessimistic concurrency control locks everything ▸ Two-phase Locking ▸ Optimistic concurrency control aborts conflicts ▸ Timestamp ordering HOW DO DATABASES HANDLE CONCURRENCY
  16. HOW DO DATABASES HANDLE CONCURRENCY TWO-PHASE LOCKING (2PL) IS PESSIMISTIC

    ▸ Original concurrency control (1970s) ▸ Readers acquire a shared lock ▸ Writers acquire an exclusive lock ▸ Not optimized for high throughput
  17. TIMESTAMP-ORDERING IS OPTIMISTIC ▸ Lock-free concurrency control ▸ The "timestamp"

    must be monotonically increasing ▸ Timestamp is used to evaluate or detect conflicts HOW DO DATABASES HANDLE CONCURRENCY
  18. HOW DOES MVCC WORK POSTGRES IMPLEMENTS MVCC ▸ Multi-Version Concurrency

    Control (MVCC) ▸ Postgres makes multiple versions or "snapshots" of the data for each write transaction ▸ Reading never blocks writing ▸ Writing never blocks reading
  19. AT T=2, UPDATE THAT RECORD UPDATE t SET score =

    5 WHERE name='bob' HOW DOES MVCC WORK
  20. HOW DOES MVCC WORK? Begin TS End TS 0 ∞

    → 2 2 ∞ HOW DOES MVCC WORK
  21. WHAT HAPPENS WHEN WE READ? Begin TS End TS 0

    2 2 ∞ SELECT at t = 3 HOW DOES MVCC WORK
  22. WHAT HAPPENS WHEN WE READ? Begin TS End TS 0

    2 2 ∞ SELECT at t = 3 3 > 2 HOW DOES MVCC WORK
  23. WHAT HAPPENS WHEN WE READ? Begin TS End TS 0

    2 2 ∞ SELECT at t = 3 ∞ > 3 HOW DOES MVCC WORK
  24. WHAT HAPPENS WHEN WE DELETE? Begin TS End TS 0

    2 2 ∞ → 4 HOW DOES MVCC WORK
  25. AT T=5, READ AGAIN Begin TS End TS 0 2

    2 4 SELECT at t = 5 5 > 2 && 5 > 4 => NOTHING HOW DOES MVCC WORK
  26. BUT, IF T=1 READS AGAIN? Begin TS End TS 0

    2 2 4 SELECT at t = 1 HOW DOES MVCC WORK
  27. HOW DOES POSTGRES MODEL THE VERSION CHAIN? HOW DOES MVCC

    WORK ▸ "Begin TS" -> xmin ▸ "End TS" -> xmax ▸ "x" ≈ tx ≈ transaction ID ▸ Instead of a timestamp, "x" is just a sequence
  28. UNDER THE HOOD HOW DOES MVCC WORK postgres@localhost:postgres> CREATE TABLE

    t (i int) postgres@localhost:postgres> INSERT INTO t VALUES (1) postgres@localhost:postgres> SELECT xmin, xmax, * FROM t -[ RECORD 1 ]------------------------- xmin | 557 xmax | 0 i | 1 postgres@localhost:postgres> UPDATE t SET i = 2 postgres@localhost:postgres> SELECT xmin, xmax, * FROM t -[ RECORD 1 ]------------------------- xmin | 558 xmax | 0 i | 2
  29. WHAT HAPPENS IF WE REMOVE VACUUMS? HOW DOES MVCC WORK

    ▸ You're going to run out of disk space ▸ ... or, really weird stuff many queries later
  30. WHAT HAPPENS TO THE OLD VERSIONS? HOW DOES MVCC WORK

    ▸ VACUUMs to the rescue! The process is simple. ▸ Find lowest open transaction ID (xid) ▸ Delete all versions where min(xid) > tx.xmax ▸ If you don't VACUUM, you keep everything
  31. SQL TRANSACTION ISOLATION LEVELS WHAT DOES MVCC GUARANTEE? ▸ ANSI

    SQL defines 4 levels ▸ Read Uncommitted ▸ Read Committed ▸ Repeatable Read ▸ Serializable ▸ Postgres does not allow "Read Uncommitted"
  32. WHAT DOES "READ COMMITTED" GUARANTEE? WHAT DOES MVCC GUARANTEE? ▸

    Default and lowest isolation level in Postgres ▸ Each statement in transaction sees latest committed snapshot ▸ Allows non-repeatable reads, phantom reads, and other classes of serialization anomalies
  33. WHAT DOES "REPEATABLE READ" GUARANTEE? WHAT DOES MVCC GUARANTEE? ▸

    Snapshot for all statements in a transaction is determined before the transaction begins ▸ Allows serialization anomalies
  34. THERE ARE STILL ANOMALIES... WHAT DOES MVCC GUARANTEE? ▸ Imagine

    a bank needs to verify the sum of all your accounts' balances is over 0 ▸ tx1 reads both accounts and updates account 1 ▸ tx2 reads both accounts and updates account 2 ▸ Both update different rows so no conflicts?
  35. WHAT DOES "SERIALIZABLE" GUARANTEE? WHAT DOES MVCC GUARANTEE? ▸ Strictest

    isolation level, entirely deterministic ▸ Guesses write dependencies based on orders of reads & writes in a transaction ▸ Result serially and concurrently should be identical ▸ Never sacrifices isolation for anomalies ▸ More frequently aborts transactions due to "serialization failures", do not consider data good until committed
  36. TRANSACTION ISOLATION LEVELS WHAT DOES MVCC GUARANTEE? Non-repeatable Read Phantom

    Serialization Anomaly Read committed Yes Yes Yes Repeatable Read No No Yes Serializable No No No
  37. POSTGRES HIDES LOCKS TRICKS OF THE TRADE ▸ MVCC provides

    "lock-free concurrency" but many operations require light locks under the hood ▸ Don't need to be an expert on lock taxonomy to administer Postgres ▸ Taxonomy of locks (5+ types) ▸ pg_locks is your on-call friend
  38. USE ON CONFLICT TO PREVENT CONFLICTS TRICKS OF THE TRADE

    ▸ Prefer ON CONFLICT to retrying aborted transactions ▸ Especially useful for batch operations INSERT INTO transfers VALUES ($id, 'bob', 'alice', 100) ON CONFLICT (id) DO NOTHING;