Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Transaction Isolation in Postgres (with cats)

Transaction Isolation in Postgres (with cats)

Transactions are a key part of many modern databases, relational and non-relational systems alike. At a basic level, transactions allow us to treat a series of operations as a single unit. Transactions are so important because they provide guarantees that developers can use as assumptions when writing code. This means that you, the developer, don't need to worry about an entire set of concerns because the DB guarantees certain behaviors. This greatly simplifies the code and drastically improves its reliability.

What are these guarantees? You may already be familiar with ACID, which refers to atomicity, consistency, isolation, and durability. From these sets of guarantees, “isolation” tends to be the most confusing, which is a shame because it is also the one where a deeper understanding directly translates to making better design decisions and more secure SaaS.

I start by explaining what problem transaction isolation is even trying to solve. Then, I explain the standard isolation levels as they appear in the SQL92 standard and are still mostly used today. Then, we talk about the problems with SQL92 levels and how Postgres handles isolation and these problems today. This talk will illustrate isolation-level problems and solutions with many SQL examples involving my cats and my attempts to track their eating.

Gwen (Chen) Shapira

April 20, 2024
Tweet

More Decks by Gwen (Chen) Shapira

Other Decks in Programming

Transcript

  1. Nice to meet you! I’m Gwen Shapira - Co-founder of

    Nile: Serverless Postgres for Moden SaaS - Previously: Cloud Native Kafka lead @ Confluent - Previously: Data architect (Hadoop, MySQL, Oracle) - Apache Kafka PMC - Wrote books, tweet a lot @gwenshap 02
  2. We’ll talk about: - Cats. - How to accurately track

    how much they eat - SQL 92 isolation levels - Postgres isolation levels and how they are better - How Postgres Isolation is implemented efficiently
  3. Hello, transactions! BEGIN; UPDATE cat_meals SET dried_food_given = dried_food_given+100 WHERE

    cat_name = 'Luke' and date = '2020-01-01'; UPDATE food_inventory SET left_quantity = left_quantity - 100 WHERE food_name = 'dried_food'; COMMIT;
  4. A C I D tomic - when the cats are

    fed, the inventory is updated onsistent - can’t feed non-existing cats or negative inventory solated - my husband and I can safely feed cats at the same time urable - even if the DB crashes, everything is up to date
  5. What isolation guarantees BEGIN; UPDATE cat_meals SET dried_food_given = dried_food_given+100

    WHERE cat_name = 'Luke' and currdate = '2020- 01-01'; UPDATE food_inventory SET left_quantity = left_quantity - 100 WHERE food_name = 'dried_food'; COMMIT; SELECT * FROM cat_meals WHERE currdate = '2020-01-01';
  6. SQL92 committee started by defining an ideal When you have

    multiple transactions happening at the same time, the resulting state of the DB is one that is possible to achieve by running this set of transactions not concurrently, but sequentially.
  7. Read uncommitted => dirty reads BEGIN; INSERT INTO food_inventory (food_name,

    left_quantity) values ('bonitas', 245); -- stopped at a coffee shop and forgot the food there ROLLBACK; SELECT * FROM food_inventory;
  8. Read committed => non-repeatable reads BEGIN; UPDATE cats SET weight

    = weight-2 WHERE name = 'Luke' and currdate = '2020-01- 02'; COMMIT; BEGIN; SELECT name, weight FROM cats WHERE currdate = current_date; -- a lot more reporting… SELECT sex, avg(weight) FROM cats WHERE currdate = current_date GROUP BY sex;
  9. Write after read problem BEGIN; UPDATE cat_meals SET dried_food_given =

    dried_food_given+100 WHERE cat_name = 'Luke' and currdate = '2020- 01-01'; COMMIT; BEGIN; IF (SELECT dried_food_given FROM cat_meals WHERE cat_name = 'Luke' and currdate = '2020- 01-01') < 240 THEN UPDATE cat_meals SET dried_food_given = dried_food_given + 20 WHERE cat_name = 'Luke' and currdate = '2020- 01-01'; END IF; COMMIT;
  10. Better check: UPDATE cat_meals SET dried_food_given = dried_food_given + 20

    WHERE cat_name = 'Luke' and currdate = '2020-01-01' and dried_food_given < 240;
  11. But this can be tricky UPDATE cat_meals SET dried_food_given =

    dried_food_given + 20 WHERE currdate = '2020-01-01' and cat_name in ( SELECT cat_name FROM cat_meals WHERE dried_food_given < 240 );
  12. We can fix it UPDATE cat_meals SET dried_food_given = dried_food_given

    + 20 WHERE currdate = '2020-01-01' and cat_name in ( SELECT cat_name FROM cat_meals WHERE dried_food_given < 240 FOR UPDATE -- making sure no one else can feed cats at the same time );
  13. Job queue on Postgres: BEGIN; SELECT job FROM job_queue FOR

    UPDATE SKIP LOCKED LIMIT 1 -- do stuff for the job DELETE job from job_queue COMMIT
  14. Repeatable Reads => Phantom Reads BEGIN; UPDATE cats SET weight

    = 24 WHERE name = 'Luke'; INSERT INTO cats (name, weight) VALUES ('Leia', 10); COMMIT; BEGIN; SELECT * FROM cats; -- a lot more reporting… SELECT * FROM cats;
  15. Some isolation levels no longer make much sense. Let’s take

    “Repeatable Reads”. Why would it allow for phantom rows?
  16. MVCC implementation • Sequential transaction ID / XID assigned to

    each transaction • Store multiple versions of same logical row • Each version has ID of transactions that created and deleted it. • Rules about what each transaction is / is not allowed to see
  17. Follow along as these transaction execute In repeatable read /

    serializable mode BEGIN; UPDATE cats SET weight = 24 WHERE name = 'Luke'; INSERT INTO cats (name, weight) VALUES ('Leia', 10); COMMIT; BEGIN; SELECT * FROM cats; -- a lot more reporting… SELECT * FROM cats;
  18. BEGIN; UPDATE cats SET weight = 24 WHERE name =

    'Luke'; INSERT INTO cats (name, weight) VALUES ('Leia', 10); COMMIT; BEGIN; SELECT * FROM cats; -- a lot more reporting… SELECT * FROM cats; Take snapshot: select * from pg_current_snapshot() -- Lowest active: 190
  19. BEGIN; UPDATE cats SET weight = 24 WHERE name =

    'Luke'; INSERT INTO cats (name, weight) VALUES ('Leia', 10); COMMIT; BEGIN; SELECT * FROM cats; -- a lot more reporting… SELECT * FROM cats; t_xmin t_xmax name weight 125 0 Luke 19
  20. BEGIN; UPDATE cats SET weight = 24 WHERE name =

    'Luke'; INSERT INTO cats (name, weight) VALUES ('Leia', 10); COMMIT; BEGIN; SELECT * FROM cats; -- a lot more reporting… SELECT * FROM cats; t_xmin t_xmax name weight 125 201 Luke 19 201 0 Luke 24 Assigned TXID 201
  21. BEGIN; UPDATE cats SET weight = 24 WHERE name =

    'Luke'; INSERT INTO cats (name, weight) VALUES ('Leia', 10); COMMIT; BEGIN; SELECT * FROM cats; -- a lot more reporting… SELECT * FROM cats; t_xmin t_xmax name weight 125 201 Luke 19 201 0 Luke 24 201 0 Leia 10
  22. BEGIN; UPDATE cats SET weight = 24 WHERE name =

    'Luke'; INSERT INTO cats (name, weight) VALUES ('Leia', 10); COMMIT; BEGIN; SELECT * FROM cats; -- a lot more reporting… SELECT * FROM cats; t_xmin t_xmax name weight 125 201 Luke 19 201 0 Luke 24 201 0 Leia 10 Only sees version 190 and below
  23. Read committed BEGIN; UPDATE cats SET weight = 24 WHERE

    name = 'Luke'; INSERT INTO cats (name, weight) VALUES ('Leia', 10); COMMIT; BEGIN; SELECT * FROM cats; -- a lot more reporting… SELECT * FROM cats; Take snapshot: select * from pg_current_snapshot() Take snapshot: select * from pg_current_snapshot()
  24. Serialization error BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE; SELECT * FROM

    cats; UPDATE cats SET weight = weight+3 WHERE name = 'Luke'; COMMIT; BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE; SELECT * FROM cats; UPDATE cats SET weight = 24 WHERE name = 'Luke'; COMMIT; Attempt to modify a value with new version we can’t see. Leading to “lost write” anomaly.
  25. 3. In high concurrency systems, serialization errors lead to rollbacks

    and retries. Which can be a performance issue.
  26. Testing - postgres/src/test/isolation - Each test defines sessions and series

    of transaction steps - Specify interleaving to test (or it will automatically try them all) - Compare to expected results