2017 Jim Gray Award Talk: Coordination Avoidance in Distributed Databases

Transcript

  1. How should we design database systems that enable new applications to scale?

    Example operations: “post on timeline”, “accept friend request”
  2. Problem: Serializability requires coordination; transactions cannot make progress independently

    Mechanisms: Two-Phase Locking, Optimistic Concurrency Control, Pre-Scheduling, Multi-Version Concurrency Control. Effects: blocking, waiting, aborts.
  3. Major focus: coordination-free execution, or guaranteed response from every replica

    Benefits: availability, low latency, perfect horizontal scalability. OpTRM!
  4. Major focus: coordination-free execution, or guaranteed response from every replica

    Benefits: availability, low latency, perfect horizontal scalability. Cost: rarely guarantee application safety properties.
  5. “ACID”: Serializability (coordination required, guaranteed safety). “NoSQL”: Eventual Consistency (coordination free, no safety).

    Thesis work: Coordination Avoidance (guaranteed safety without coordination). What is the coordination cost of a given safety guarantee? How do we achieve the minimum?
  6. Serializability: coordination required, guaranteed safety. Eventual Consistency: coordination free, no safety. Coordination Avoidance: guaranteed safety without coordination. Labels: more app semantics, more safety.

    Topics: Atomic Visibility [SIGMOD14]; Database Constraints [VLDB15, SIGMOD15]; Weak Isolation [HotOS13, VLDB14]; Causality [SOCC12, SIGMOD13]; PBS [VLDB12, VLDBJ14, SIGMOD13, CACM14].
  7. Coordination Avoidance: guaranteed safety without coordination. Areas: Data Serving and Transactions; Analytics.

    Topics: Model Prediction and Training [CIDR15, LearningSys15]; Atomic Visibility [SIGMOD14]; Database Constraints [VLDB15, SIGMOD15]; Weak Isolation [HotOS13, VLDB14]; Causality [SOCC12, SIGMOD13].
  8. Serializability: coordination required, guaranteed safety. Eventual Consistency: coordination free, no safety. Coordination Avoidance: guaranteed safety without coordination. Labels: more app semantics, more safety, coordination free.

    Topics: Atomic Visibility [SIGMOD14]; Database Constraints [VLDB15, SIGMOD15]; Model Prediction and Training [CIDR15, LearningSys15]; Weak Isolation [HotOS13, VLDB14]; Causality [SOCC12, SIGMOD13]; PBS [VLDB12, VLDBJ14, SIGMOD13, CACM14].
  9. What the application says: “post on timeline”, “accept friend request”

    What the database hears: write read write read write write read write write write read write; read read read read read read
  10. (Abridged) Related Work

    » Semantics-based concurrency control: esp. commutativity and CALM analysis, laws of order
    » Available storage systems: optimistic replication, causal memory, CRDTs, eventually consistent transactions
    » Distributed computing: CAP, FLP, NBAC, quorums
    » Here: focus on necessary coordination for common, modern data-intensive apps
  11. Serializability: coordination required, guaranteed safety. Eventual Consistency: coordination free, no safety. Coordination Avoidance: guaranteed safety without coordination. Labels: more semantics, more safety.

    Topics: Atomic Visibility [SIGMOD14]; Database Constraints [VLDB15, SIGMOD15]; Weak Isolation [HotOS13, VLDB14]; Causality [SOCC12, SIGMOD13]; PBS [VLDB12, VLDBJ14, SIGMOD13, CACM14]; Model Prediction and Training [CIDR15, LearningSys15].
  12. Figure: classification into Highly Available, Sticky Available, and Unavailable [VLDB 2014]

    Legend: † prevents lost update, ‡ prevents write skew, ⊕ requires recency guarantees
  13. Serializability: coordination required, guaranteed safety. Eventual Consistency: coordination free, no safety. Coordination Avoidance: guaranteed safety without coordination. Labels: more semantics, more safety.

    Topics: Atomic Visibility [SIGMOD14]; Database Constraints [VLDB15, SIGMOD15]; Weak Isolation [HotOS13, VLDB14]; Causality [SOCC12, SIGMOD13]; PBS [VLDB12, VLDBJ14, SIGMOD13, CACM14]; Model Prediction and Training [CIDR15, LearningSys15].
  14. Typical database constraints and operations (SQL)

    | Constraint | Operation |
    | --- | --- |
    | Equality, Inequality | Any |
    | Generate unique ID | Any |
    | Specify unique ID | Insert |
    | > | Increment |
    | > | Decrement |
    | < | Decrement |
    | < | Increment |
    | Foreign Key | Insert |
    | Foreign Key | Delete |
    | Secondary Indexing | Any |
    | Materialized Views | Any |
    | AUTO_INCREMENT | Insert |
  15. ICT: Invariant Confluence Test. Key idea: check if constraints can be violated by “merging” independent operations.

    CONSTRAINT: User IDs are unique. OPERATION: Add users. MERGE: Set union. Starting from {}, the independent operations add {Stu,ID=1} and add {Ann,ID=1} merge to {{Stu,ID=1}, {Ann,ID=1}}. Constraint violated!
  16. ICT: Invariant Confluence Test. Key idea: check if constraints can be violated by “merging” independent operations.

    CONSTRAINT: User IDs are positive. OPERATION: Add users. MERGE: Set union. Starting from {}, the independent operations add {Stu,ID=1} and add {Ann,ID=1} merge to {{Stu,ID=1}, {Ann,ID=1}}. Constraint holds!
  17. ICT: Invariant Confluence Test. Key idea: check if constraints can be violated by “merging” independent operations.

    Theorem. A globally I-valid system can execute a set of transactions T with coordination-freedom, transactional availability, and convergence if and only if T is I-confluent with respect to I. [VLDB 2015]

    ICT ⟺ safe, coordination-free execution possible. Our contribution: generalizes classic partitioning-based indistinguishability arguments.
  18. Typical database constraints and operations (SQL), under set merge

    | Constraint | Operation | OK? |
    | --- | --- | --- |
    | Equality, Inequality | Any | ??? |
    | Generate unique ID | Any | ??? |
    | Specify unique ID | Insert | ??? |
    | > | Increment | ??? |
    | > | Decrement | ??? |
    | < | Decrement | ??? |
    | < | Increment | ??? |
    | Foreign Key | Insert | ??? |
    | Foreign Key | Delete | ??? |
    | Secondary Indexing | Any | ??? |
    | Materialized Views | Any | ??? |
    | AUTO_INCREMENT | Insert | ??? |
  19. Typical database constraints and operations (SQL), under set merge [VLDB 2015]

    | Constraint | Operation | OK? |
    | --- | --- | --- |
    | Equality, Inequality | Any | Y |
    | Generate unique ID | Any | Y |
    | Specify unique ID | Insert | N |
    | > | Increment | Y |
    | > | Decrement | N |
    | < | Decrement | Y |
    | < | Increment | N |
    | Foreign Key | Insert | Y |
    | Foreign Key | Delete | Y* |
    | Secondary Indexing | Any | Y |
    | Materialized Views | Any | Y |
    | AUTO_INCREMENT | Insert | N |

    RAMP [SIGMOD 2014]
  20. adopt-a-hydrant alchemy_cms amahi bostonrb boxroom brevidy browsercms bucketwise calagator canvas-lms

    carter chiliproject citizenry comas comfortable-mexican-sofa communityengine copycopter-server danbooru diaspora discourse enki fat_free_crm fedena forem fulcrum gitlab-ci gitlabhq govsgo heaven inkwell insoshi jobsworth juvia kandan linuxfr.org lobsters lovd-by-less nimbleshop obtvse onebody opal opencongress opengovernment openproject piggybak publify radiant railscollab redmine refinerycms ror_ecommerce rucksack saasy salor-retail selfstarter sharetribe skyline spot-us spree sprintapp squaresquash sugar teambox tracks tryshoppe wallgig
  21. CONSTRAINTS INCREDIBLY COMMON: 67 projects, 1.77M LoC, 1957 tables, 9986 constraints total (avg. 5.1 per table); 86.9% pass ICT [SIGMOD 2015]

    Projects: adopt-a-hydrant alchemy_cms amahi bostonrb boxroom brevidy browsercms bucketwise calagator canvas-lms carter chiliproject citizenry comas comfortable-mexican-sofa communityengine copycopter-server danbooru diaspora discourse enki fat_free_crm fedena forem fulcrum gitlab-ci gitlabhq govsgo heaven inkwell insoshi jobsworth juvia kandan linuxfr.org lobsters lovd-by-less nimbleshop obtvse onebody opal opencongress opengovernment openproject piggybak publify radiant railscollab redmine refinerycms ror_ecommerce rucksack saasy salor-retail selfstarter sharetribe skyline spot-us spree sprintapp squaresquash sugar teambox tracks tryshoppe wallgig zena
  22. TPC-C: 14/16 constraints pass ICT; scale to over 25x best listed result; 6-11x faster than ACID/serializability

    Figures: total throughput (txn/s) and per-server throughput (txn/s/server) vs. number of servers (0 to 200); throughput (txns/s) vs. number of warehouses (8 to 64) for Coordination-Avoiding and Serializable (2PL).
  23. Serializability: coordination required, guaranteed safety. Eventual Consistency: coordination free, no safety. Coordination Avoidance: guaranteed safety without coordination. Labels: more semantics, more safety, coordination free.

    Topics: Atomic Visibility [SIGMOD14]; Database Constraints [VLDB15, SIGMOD15]; Model Prediction and Training [CIDR15, TBA]; Weak Isolation [HotOS13, VLDB14]; Causality [SOCC12, SIGMOD13]; PBS [VLDB12, VLDBJ14, SIGMOD13, CACM14].
  24. Unruly developers are fantastic inspiration

    • As applications have evolved, so have their database demands and desired semantics
    • Our opportunity: build systems that implement the semantics users want (not just what we want)
  25. • Mounting evidence: many programmers don’t use transactions correctly (or at all!)

    • Need not despair: opportunity for new theory and systems
    ACIDRain: Concurrency-Related Attacks on Database-Backed Web Applications [SIGMOD17]
  26. A cast of unruly developers and renegades: Michael R. Bernstein, Rick Branson, Mark Callaghan, Adrian Colyer, Sean Cribbs, Jonathan Ellis, Alex Feinberg, Andy Gross, Coda Hale, Colin Jones, Evan Jones, Kyle Kingsbury, Adam Marcus, Caitie McCaffrey, Christopher Meiklejohn, Mike Miller, Jeremiah Peschka, Mark Phillips, Henry Robinson, Mehul Shah, Xavier Shay, Justin Sheehy, Ines Sombra, Kelly Sommers, Sriram Srinivasan

    Also many thanks to a host of phenomenal colleagues and collaborators: Peter Alvaro, Neil Conway, Shivaram Venkataraman, Joey Gonzalez, Haoyuan Li, Zhao Zhang, Aaron Davidson, Mike Jordan
  27. Eventual Consistency: coordination free, no safety. Coordination Avoidance: guaranteed safety without coordination. Labels: more app semantics, more safety, coordination free.

    Topics: Atomic Visibility [SIGMOD14]; Database Constraints [VLDB15, SIGMOD15]; Model Prediction and Training [CIDR15, TBA]; Weak Isolation [HotOS13, VLDB14]; Causality [SOCC12, SIGMOD13]; PBS [VLDB12, VLDBJ14, SIGMOD13, CACM14].

    Joint work with Ali Ghodsi, Joe Hellerstein, Ion Stoica, Mike Franklin, Michael Jordan, Alan Fekete, Dan Crankshaw, Shivaram Venkataraman, Neil Conway, Peter Alvaro, Aaron Davidson, Joey Gonzalez, Kyle Kingsbury, Haoyuan Li, and Zhao Zhang
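
The invariant confluence test from slides 15 and 16 can be illustrated with a small sketch. This is not the talk's implementation: the names `merge`, `add_user`, `unique_ids`, `positive_ids`, and `find_violation` are hypothetical, and the search only checks one step of divergence from a single starting state, whereas the theorem on slide 17 quantifies over all reachable states. It does reproduce the two slide examples: under set-union merge, "user IDs are unique" can be violated by independent inserts, while "user IDs are positive" cannot.

```python
from itertools import product

# A database state is modeled as a frozenset of (name, user_id) records.
# This modeling choice is an assumption made for illustration.
EMPTY = frozenset()


def merge(a, b):
    """Merge two diverged replica states by set union (the slides' merge)."""
    return a | b


def add_user(name, uid):
    """Return an operation that inserts one user record into a state."""
    def op(state):
        return state | {(name, uid)}
    return op


def unique_ids(state):
    """Invariant: no two records share a user ID."""
    ids = [uid for _, uid in state]
    return len(ids) == len(set(ids))


def positive_ids(state):
    """Invariant: every user ID is positive."""
    return all(uid > 0 for _, uid in state)


def find_violation(invariant, start, ops):
    """Bounded check: run each pair of operations on independent copies of
    `start`, merge the results, and return the first (left, right, merged)
    triple whose merged state breaks the invariant, or None if none does."""
    for op_a, op_b in product(ops, repeat=2):
        left, right = op_a(start), op_b(start)
        # Only consider divergent states that are each valid on their own.
        if not (invariant(left) and invariant(right)):
            continue
        merged = merge(left, right)
        if not invariant(merged):
            return left, right, merged
    return None


# The two independent operations from the slides.
OPS = [add_user("Stu", 1), add_user("Ann", 1)]

for label, invariant in [("unique IDs", unique_ids), ("positive IDs", positive_ids)]:
    violation = find_violation(invariant, EMPTY, OPS)
    print(label, "->", "constraint violated" if violation else "constraint holds")
```

A `None` result from this bounded search is only evidence, not a proof, that the operations are invariant confluent; a returned triple, by contrast, is a concrete counterexample showing that some coordination is needed for that constraint.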