Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Serializability and Snapshot Isolation.pdf

Serializability and Snapshot Isolation.pdf

Wataru Hirota

July 24, 2019
Tweet

More Decks by Wataru Hirota

Other Decks in Programming

Transcript

  1. Outline - Write skew and snapshot Isolation - 2PL: a

    famous implementation of serializable isolation - Compromise for the trade-off between performance and preventing from race condition
  2. Begin by an example; Is this a case of lost

    update? A doctor call system Constraint: At least one doctor should be on-call (a doctor can leave iff currently_on_call >= 2) Situation: Bob and Alice are on-call -> Bob read the current number (=2) -> Alice read the current number (=2) -> Bob left -> Alice left (anomaly) Is this a lost update?
  3. Write skew: multiple objects are involved Lost update One transaction

    overwrites the other’s write without incorporating its changes (so data is lost). Write skew Update not lost, but constraints fail due to concurrent commits.
  4. How to prevent from write skew? Current situation - Automatic

    detection △ - Most popular implementations of snapshot isolation don’t support the automatic detection of write skew. - Constraints that involves multiple objects △ - Few databases support - Use serializable isolation ✓ - SELECT FOR UPDATE ✓
  5. SELECT FOR UPDATE: a locking statement BEGIN TRANSACTION; SELECT *

    FROM doctors WHERE on_call = true AND shift_id = 1234 FOR UPDATE; UPDATE doctors SET on_call = false WHERE name = 'Alice' AND shift_id = 1234; COMMIT;
  6. Phantoms causing write skew - SELECT query to check if

    some requirements are satisfied - Depending on the result of the first query, the application code decides how to continue - If the application decides to go ahead, it makes a write (INSERT, UPDATE, or DELETE) to the database and commits the transaction.
  7. Phantoms causing write skew - SELECT query to check if

    some requirements are satisfied - Depending on the result of the first query, the application code decides how to continue - If the application decides to go ahead, it makes a write (INSERT, UPDATE, or DELETE) to the database and commits the transaction.
  8. Phantoms causing write skew - SELECT query to check if

    some requirements are satisfied - Depending on the result of the first query, the application code decides how to continue - If the application decides to go ahead, it makes a write (INSERT, UPDATE, or DELETE) to the database and commits the transaction.
  9. Current situation around isolation level - Isolation level is hard

    to understand - Inconsistent term usage - “repeatable read” varies from database engines - it’s difficult to tell whether it is safe to run at a particular isolation level - Some static analysis tools may help race detection - but it’s still hard because most races happen due to bad timings
  10. Is single-threaded execution actually efficient? - Transactions can execute much

    faster - RAM is getting cheaper → the entire active dataset can be on memory - OLTP transactions are usually short - cf) OLAP … usually adopts snapshot isolation (typically read only)
  11. Pros and cons of stored procedure Pros - Prevent from

    the overhead of other concurrency control mechanisms Cons - Code running in a database is difficult to manage - debug, deploy - Badly written stored procedure can cause much more troubles
  12. Partitioning is effective If each transaction reads and writes only

    a single partition → across multiple partitions → (additional overhead)
  13. 2 Phase Locking (2PL) - Widely used algorithm for serializability

    - So it prevents from write skew - Writers don’t just block other writers; they also block readers and vice versa - (cf) Snapshot Isolation … readers never block writers and writers never block readers Acquire a lock Release the lock
  14. Read and write in 2PL Read - shared mode lock

    - some transactions are allowed simultaneously - if the object has an exclusive lock, the read should wait for it Write - Exclusive mode lock - No transactions can have a lock on the object at the same time
  15. Performance of 2PL - Throughput: 2PL << weak isolation -

    2PL heavily reduces concurrency - No limit on how long it may have to wait - frequent dead-lock
  16. Predicate lock 2PL disabled lots of concurrency - One of

    the problem is a too large locking range → How about narrowing down the range of locking lock objects which meet the condition
  17. Index-range query: more efficient than predicate lock Performance of predicate

    query is poor; - many locks - checking locks is time-consuming Index-range locking locks a greater set of objects for the fast checking of locks e.g. room 123 →room 100-150 (a greater and easy to search set)
  18. There’s a major trade-off: performance vs race Serializable isolation ✓

    no race condition Snapshot Isolation ✓ good performance ✓ scalability
  19. There’s a major trade-off: performance vs race Serializable isolation ✓

    No race condition Snapshot Isolation ✓ Good performance ✓ Scalability Serializable snapshot isolation…?
  20. Serializable Snapshot Isolation (SSI) - Fully serializable - Small performance

    penalty compared to snapshot isolation - Firstly execute query, and abort it when it is suspicious SSI is fairly new: it was first described in 2008 [40] and is the subject of Michael Cahill’s PhD thesis [51].
  21. Pessimistic vs optimistic Serializable isolation Pessimistic “anything might possibly go

    wrong” Serializable snapshot isolation Optimistic “Instead of locking, commits which violate constraints will be aborted”
  22. When to use optimistic concurrency control? - enough sparse capacity

    - connection between transactions is not too high e.g. counter increment - Order doesn’t matter - as long as the counter isn’t read in the same transaction,
  23. How the database know if a query result might have

    changed? We need to consider - Detecting reads of a old MVCC object version - Detecting writes that affect prior reads
  24. Detect stale MVCC reads MVCC: Multi Version Concurrency Control Check

    all premises are up-to-date Not checked here
  25. Summary - Write skew may happen in snapshot Isolation -

    2PL: a famous implementation of serializable isolation, usually with index-range lock - Serializable Snapshot Isolation performs well, while guarantee of serializability