The Case for Invariant-Based Concurrency Control

pbailis
January 05, 2015

Transcript

  1. THE CASE FOR INVARIANT-BASED CONCURRENCY CONTROL. Peter Bailis, UC Berkeley, with Alan Fekete, Mike Franklin, Ali Ghodsi, Ion Stoica, Joe Hellerstein

  2. THE CASE FOR INVARIANT-BASED CONCURRENCY CONTROL. Peter Bailis, UC Berkeley, with Alan Fekete, Mike Franklin, Ali Ghodsi, Ion Stoica, Joe Hellerstein. CIDR 2015 Gong Show, 5 January 2015, Pacific Grove, CA
  5. Serializability is expensive

  6. Serializability is expensive. Use weaker models instead.

  7. Serializability is expensive. Use weaker models instead. 1975!

  8. do not support serializability HANA [VLDB 2014]

  10. do not support serializability HANA. Serializability supported? [VLDB 2014]

    Actian Ingres: YES
    Aerospike: NO
    Persistit: NO
    Clustrix: NO
    Greenplum: YES
    IBM DB2: YES
    IBM Informix: YES
    MySQL: YES
    MemSQL: NO
    MS SQL Server: YES
    NuoDB: NO
    Oracle 11G: NO
    Oracle BDB: YES
    Oracle BDB JE: YES
    Postgres 9.2.2: YES*
    SAP Hana: NO
    ScaleDB: NO
    VoltDB: YES

    8/18 databases surveyed didn’t; 15/18 used weak models by default
  11. READ COMMITTED

  14. READ COMMITTED [Atul Adya’s Ph.D. thesis, 1999]

    G0: Write Cycles. A history H exhibits phenomenon G0 if DSG(H) contains a directed cycle consisting entirely of write-dependency edges.
    G1a: Aborted Reads. A history H shows phenomenon G1a if it contains an aborted transaction T1 and a committed transaction T2 such that T2 has read some object (maybe via a predicate) modified by T1.
    G1b: Intermediate Reads. A history H shows phenomenon G1b if it contains a committed transaction T2 that has read a version of object x (maybe via a predicate) written by transaction T1 that was not T1’s final modification of x.
    G1c: Circular Information Flow. A history H exhibits phenomenon G1c if DSG(H) contains a directed cycle consisting entirely of dependency edges.

    Highly nuanced, very technical, sometimes incomplete!
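To make the G0 and G1c definitions concrete, here is a minimal Python sketch that checks a dependency graph for the cycles they describe. It assumes the DSG is already given as a list of (from, to, kind) edges; the edge labels "ww" (write-dependency), "wr" (read-dependency) and "rw" (anti-dependency) and all function names are illustrative choices, not taken from the slides. Treating "dependency edges" as write- and read-dependencies only is likewise an assumption about the thesis terminology.

```python
from collections import defaultdict


def has_cycle(edges, allowed_kinds):
    """Return True if the graph restricted to allowed_kinds has a directed cycle."""
    graph = defaultdict(list)
    nodes = set()
    for src, dst, kind in edges:
        nodes.update((src, dst))
        if kind in allowed_kinds:
            graph[src].append(dst)

    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    color = {node: WHITE for node in nodes}

    def visit(node):
        color[node] = GRAY
        for nxt in graph[node]:
            if color[nxt] == GRAY:  # back edge: a cycle exists
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in nodes)


def exhibits_g0(edges):
    # G0: a cycle made only of write-dependency edges.
    return has_cycle(edges, {"ww"})


def exhibits_g1c(edges):
    # G1c: a cycle made only of dependency edges (assumed here: ww and wr).
    return has_cycle(edges, {"ww", "wr"})


# Hypothetical history: T2 overwrites T1's write, and T1 reads T2's write.
history = [("T1", "T2", "ww"), ("T2", "T1", "wr")]
```

With this hypothetical history, `exhibits_g1c(history)` is true while `exhibits_g0(history)` is false, because the cycle mixes a write-dependency with a read-dependency.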
  15. It is insane to assume users can/should reason about weak isolation…

  16. It is insane to assume users can/should reason about weak isolation… a fate worse than death

  17. It is insane to assume users can/should reason about weak isolation… a fate worse than death …and yet they still use it!
  19. Coordination costs increase with distribution!

  21. Can we provide a more usable, high-performance concurrency control primitive?
  22. Invariants:

  23. Invariants: “usernames should be unique”

  24. Invariants: “usernames should be unique”, “each patient should have an attending doctor”

  25. Invariants: “usernames should be unique”, “each patient should have an attending doctor”, “account balances should be positive”

  26. Invariants: 1.) Are easier to reason about than weak isolation. “usernames should be unique”, “each patient should have an attending doctor”, “account balances should be positive”

  27. Invariants: 1.) Are easier to reason about than weak isolation 2.) Are already specified in many applications. “usernames should be unique”, “each patient should have an attending doctor”, “account balances should be positive”
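As a rough illustration of why such invariants are easy to state, the three examples can be written as plain predicates over application state. This is only a sketch: the in-memory dictionary layout, the field names and the `violated` helper are assumptions made for the example, not something the slides specify.

```python
# Each invariant is a predicate over a hypothetical in-memory "database" dict.
def usernames_unique(db):
    names = [user["username"] for user in db["users"]]
    return len(names) == len(set(names))


def patients_have_attending(db):
    doctor_ids = {doctor["id"] for doctor in db["doctors"]}
    return all(p.get("attending_id") in doctor_ids for p in db["patients"])


def balances_positive(db):
    # Follows the slide's wording literally: strictly positive balances.
    return all(account["balance"] > 0 for account in db["accounts"])


INVARIANTS = [usernames_unique, patients_have_attending, balances_positive]


def violated(db):
    """Return the names of the invariants that do not hold in db."""
    return [inv.__name__ for inv in INVARIANTS if not inv(db)]


db = {
    "users": [{"username": "alice"}, {"username": "bob"}],
    "doctors": [{"id": 1}],
    "patients": [{"id": 10, "attending_id": 1}],
    "accounts": [{"balance": 25}],
}
```

Calling `violated(db)` on the sample state returns an empty list; adding a second user named "alice" would make it report `usernames_unique`.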
  28. adopt-a-hydrant alchemy_cms amahi bostonrb boxroom brevidy browsercms bucketwise calagator canvas-lms carter chiliproject citizenry comas comfortable-mexican-sofa communityengine copycopter-server danbooru diaspora discourse enki fat_free_crm fedena forem fulcrum gitlab-ci gitlabhq govsgo heaven inkwell insoshi jobsworth juvia kandan linuxfr.org lobsters lovd-by-less nimbleshop obtvse onebody opal opencongress opengovernment openproject piggybak publify radiant railscollab redmine refinerycms ror_ecommerce rucksack saasy salor-retail selfstarter sharetribe skyline spot-us spree sprintapp squaresquash sugar teambox tracks tryshoppe wallgig
  32. adopt-a-hydrant alchemy_cms amahi bostonrb boxroom brevidy browsercms bucketwise calagator canvas-lms carter chiliproject citizenry comas comfortable-mexican-sofa communityengine copycopter-server danbooru diaspora discourse enki fat_free_crm fedena forem fulcrum gitlab-ci gitlabhq govsgo heaven inkwell insoshi jobsworth juvia kandan linuxfr.org lobsters lovd-by-less nimbleshop obtvse onebody opal opencongress opengovernment openproject piggybak publify radiant railscollab redmine refinerycms ror_ecommerce rucksack saasy salor-retail selfstarter sharetribe skyline spot-us spree sprintapp squaresquash sugar teambox tracks tryshoppe wallgig zena

    67 projects; 1.77M LoC; 1957 tables. 9986 total (avg. 5.1 per table) vs. 259 total (avg. 0.13 per table): 39.2x more common! [Ask for draft; or interview me]
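The arithmetic behind these figures can be checked directly. The sketch below uses only the numbers on the slide; the transcript does not preserve which construct each total counts, so the variable names are deliberately neutral. Under that reading, the 39.2x figure appears to be the ratio of the two rounded per-table averages, while the unrounded totals give a slightly lower ratio.

```python
tables = 1957
larger_total = 9986   # first total on the slide (label not preserved in the transcript)
smaller_total = 259   # second total on the slide (label not preserved in the transcript)

# Per-table averages, rounded the way the slide reports them.
larger_avg = round(larger_total / tables, 1)
smaller_avg = round(smaller_total / tables, 2)

# Ratio computed from the rounded averages versus from the raw totals.
ratio_from_averages = round(larger_avg / smaller_avg, 1)
ratio_from_totals = round(larger_total / smaller_total, 1)

print(larger_avg, smaller_avg, ratio_from_averages, ratio_from_totals)
```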
  33. Invariants: 1.) Are easier to reason about than weak isolation 2.) Are already specified in many applications. “usernames should be unique”, “each patient should have an attending doctor”, “account balances should be positive”
  34. DB supported invariants today:

    Foreign Key Constraints: YES
    Primary Key Constraints: YES
    Row-Level Check Constraints: YES
    Multi-Row Check Constraints: NO
    Generic ADT Invariants: NO
    UDF Invariants: NO
  36. DB supported invariants today:

    Foreign Key Constraints: YES
    Primary Key Constraints: YES
    Row-Level Check Constraints: YES
    Multi-Row Check Constraints: NO
    Generic ADT Invariants: NO
    UDF Invariants: NO

    & little support for distributing, suggesting, mining invariants
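The gap between the supported and unsupported rows can be seen with a small experiment. The sketch below uses Python's built-in sqlite3 module purely as a convenient stand-in for "a database"; the schema, the `rejected` helper and the per-doctor patient limit are hypothetical. Key, foreign-key and row-level CHECK constraints are declared in the schema and enforced by the engine, whereas the multi-row rule has to be written as an application-level query.

```python
import sqlite3

# Autocommit mode keeps each statement independent for this demonstration.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE doctors (id INTEGER PRIMARY KEY)")
conn.execute(
    """
    CREATE TABLE patients (
        id INTEGER PRIMARY KEY,
        attending_id INTEGER NOT NULL REFERENCES doctors(id),
        balance INTEGER NOT NULL CHECK (balance > 0)
    )
    """
)
conn.execute("INSERT INTO doctors (id) VALUES (1)")
conn.execute("INSERT INTO patients VALUES (1, 1, 100)")


def rejected(sql):
    """Return True if the database itself refuses the statement."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


def attending_load_ok(conn, limit):
    """Multi-row rule (hypothetical): no doctor attends more than `limit` patients.

    This is checked by the application, not declared in the schema.
    """
    row = conn.execute(
        "SELECT COUNT(*) FROM patients GROUP BY attending_id "
        "ORDER BY COUNT(*) DESC LIMIT 1"
    ).fetchone()
    return row is None or row[0] <= limit
```

For example, `rejected("INSERT INTO patients VALUES (2, 99, 100)")` is true because doctor 99 does not exist, while nothing in the schema stops a caller from exceeding the limit that `attending_load_ok` checks.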
  37. Invariants: 1.) Are easier to reason about than weak isolation 2.) Are already specified in many applications 3.) Should be a first-class database primitive 4.) Enable more efficient systems design
  41. TPC-C: Coordination-Avoiding vs. Serializable (2PL). Scale to over 25x prior best on New-Order; 6-11x faster than ACID/serializability on New-Order. [Plots: Total Throughput (txn/s) and Throughput (txn/s/server) vs. Number of Servers; Throughput (txns/s) vs. Number of Warehouses]
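One way to build intuition for why knowing the invariant can remove coordination is a toy check on the "account balances should be positive" example. The sketch below is an illustration only and is not the system measured on this slide: it assumes two replicas each apply a local balance change to the same starting value and that their results are later merged by summing the changes. All names are hypothetical.

```python
def positive(balance):
    return balance > 0


def merged_balance(initial, delta_a, delta_b):
    # Assumed merge rule: both replicas' changes are applied to the shared start value.
    return initial + delta_a + delta_b


def needs_coordination(initial, delta_a, delta_b, invariant=positive):
    """Return True if two locally valid updates can jointly break the invariant."""
    for delta in (delta_a, delta_b):
        if not invariant(initial + delta):
            raise ValueError("update violates the invariant even in isolation")
    return not invariant(merged_balance(initial, delta_a, delta_b))
```

Under these assumptions, two deposits such as `needs_coordination(100, 50, 30)` never require the replicas to talk to each other, whereas two withdrawals such as `needs_coordination(100, -60, -60)` each look fine locally but violate the invariant once merged.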
  43. Invariants: 1.) Are easier to reason about than weak isolation 2.) Are already specified in many applications 3.) Should be a first-class database primitive 4.) Enable more efficient systems design. We can do so much better than weak isolation
  44. Image Credits: world by Wayne Tyler Sall; surprised by Julian Deveaux; database by Austin Condiff; man by Simon Child. By the Noun Project, Creative Commons - Attribution (CC BY 3.0)