Silence is Golden: Coordination-Avoiding Systems Design

pbailis
August 21, 2015

MesosCon 2015 Keynote
26 August 2015
Seattle, WA

Talk video: https://www.youtube.com/watch?v=EYJnWttrC9k
More information: http://bailis.org/

Abstract:

Computer networks make it difficult to design scalable, robust distributed systems that exhibit good performance. Networks can be slow, have limited capacity, and are often unreliable. In an ideal world, we'd build systems that don't rely on the network at all. Unfortunately, as a slew of negative results like the CAP Theorem illustrate, this isn't always possible. Traditional systems abstractions like ACID transactions fundamentally require synchronous communication, or coordination, to implement. As a result, coordination-free system designs often forgo many programmer-friendly abstractions. These systems leave the task of reasoning about correctness to the application developer or, worse, to the end user.

In this talk, I'll discuss an alternative: system designs that coordinate only when necessary to guarantee application correctness. This coordination avoidance maximizes scalability and robustness by minimizing reliance on the network. To illustrate the power of coordination-avoiding systems design, I'll present several case studies from our research spanning database isolation guarantees, indexes and constraints, and open source applications. Perhaps surprisingly, even though traditional implementations of these tasks rely on coordination, many of these tasks don't actually require coordination for correctness. The resulting systems are among the fastest prototypes ever built and operated at scale. Based on these case studies, I'll provide concrete and practical design principles for reasoning about and applying coordination avoidance in the wild.

Transcript

  1. SILENCE IS GOLDEN: COORDINATION-AVOIDING SYSTEMS DESIGN. Peter Bailis (@pbailis). MesosCon 2015 Keynote, 21 August, Seattle, WA
  2. Reasoning about Distribution is Hard (diagram labels: Attendee Login, Room Reservations, Social Media Monitoring, Database)
  5. Reasoning about Distribution is Hard (diagram labels: Attendee Login, Room Reservations, Social Media Monitoring, Database)
    • Should you and I be able to simultaneously reserve rooms?
    • Can you reserve a room while I log in?
    • Can you tweet while I change my username?
  6. Simple, classic strategy: Hide concurrency by coordinating

  7. Simple, classic strategy: Hide concurrency by coordinating. Abstraction: Serial access to state. Mechanisms: Consensus (Paxos, VR, Raft); Replicated State Machines; Zookeeper, etcd, Doozer; ACID transactions
  8. Coordination is expensive: Processes cannot make progress independently
  9. Coordination is expensive: Processes cannot make progress independently. This limits: 1.) Scalability 2.) Throughput 3.) Low Latency 4.) Availability
  20. [Figure: DISTRIBUTED TRANSACTIONS (EC2), servers A to H. Throughput (txns/s, LOG SCALE!) versus Number of Servers (Items) Accessed per Transaction, 1 to 7. Labels: IN-MEMORY LOCKING, COORDINATED; annotation: -398x]
  21. Coordination is expensive: Processes cannot make progress independently. This limits: 1.) Scalability 2.) Throughput 3.) Low Latency 4.) Availability
  25. 133.7+ ms RTT

  27. 133.7+ ms RTT 85.1+ ms RTT
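
One way to read these round-trip times: if each coordinated operation has to wait for at least one full round trip before the next conflicting operation may start, the RTT caps how many such operations can run per second. The sketch below is back-of-the-envelope arithmetic under that single assumption (one round trip per operation, which is an illustration and not a figure from the slides); the function name is hypothetical.

```python
def max_serial_ops_per_second(rtt_ms: float) -> float:
    """Upper bound on operations/s when each operation waits one full RTT.

    Assumption for illustration: exactly one round trip per coordinated operation.
    """
    return 1000.0 / rtt_ms


# The two round-trip times shown on the slides.
for rtt_ms in (133.7, 85.1):
    bound = max_serial_ops_per_second(rtt_ms)
    print(f"{rtt_ms} ms RTT -> at most {bound:.1f} serialized ops/s")
```

Under this assumption the bound is roughly 7.5 operations per second at 133.7 ms and roughly 11.8 at 85.1 ms, regardless of how fast the servers themselves are.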

  28. Coordination is expensive: Processes cannot make progress independently. This limits: 1.) Scalability 2.) Throughput 3.) Low Latency 4.) Availability
  31. Simple, classic strategy: Hide concurrency by coordinating. Abstraction: Serial access to state. High cost! Fundamental penalties to Scalability, Throughput, Latency, Availability
  32. Surely there’s a better way to build systems!

  34. “Why do we feel it's necessary to yak in order to be comfortable? That's when you know you've found somebody really special: when you can just shut up for a minute and comfortably share silence.”
  36. Scalable systems can just shut up and comfortably share silence

  37. Scalable systems can just shut up and comfortably share silence. This talk: 1.) Why is shutting up good for systems? 2.) When can systems comfortably share silence?
  39. Why is shutting up good?

  40. Why is shutting up good? Coordination-free systems:
  44. Why is shutting up good? Coordination-free systems: 1.) Enable infinite scale-out
  47. [Figure: DISTRIBUTED TRANSACTIONS (EC2), servers A to H. Throughput (txns/s) versus Number of Servers (Items) Accessed per Transaction, 1 to 7. Labels: IN-MEMORY LOCKING, COORDINATED, COORDINATION-FREE; annotation: -398x]
  48. Why is shutting up good? Coordination-free systems: 1.) Enable infinite scale-out 2.) Improve throughput 3.) Ensure low latency
  52. Why is shutting up good? Coordination-free systems: 1.) Enable infinite scale-out 2.) Improve throughput 3.) Ensure low latency 4.) Improve availability
  53. “Always on” Availability: any replica can respond to any request

  58. Why is shutting up good? Coordination-free systems: 1.) Enable infinite scale-out 2.) Improve throughput 3.) Ensure low latency 4.) Guarantee “always on” response
  61. Why is shutting up good? Coordination-free systems: 1.) Enable infinite scale-out 2.) Improve throughput 3.) Ensure low latency 4.) Guarantee “always on” response. Silence is key to scalability!
  62. Scalable systems can just shut up and comfortably share silence. This talk: 1.) Why is shutting up good for systems? 2.) When can systems comfortably share silence?
  64. Reasoning about Distribution is Hard (diagram labels: Attendee Login, Room Reservations, Social Media Monitoring, Database)
  65. Reasoning about Distribution is Hard (diagram labels: Attendee Login, Room Reservations, Social Media Monitoring, Database)
    • Should you and I be able to simultaneously reserve rooms?
    • Can you reserve a room while I log in?
    • Can you tweet while I change my username?
  67. THOSE LIGHT CONES: If operations happen concurrently… …ensure their side-effects can be COMPOSED IN A WAY THAT MAKES “SENSE”
  72. COMPOSED (“merged”): 1+1=2; {“a”}+{“b”}={“a”, “b”}. IN A WAY THAT MAKES “SENSE” (invariants over state will hold): Counters are positive; No two talks share a timeslot; No NULL values; Usernames are unique
  74. Key question: Can invariants be violated by merging independent operations? ICT: Invariant Confluence Test [VLDB 2015]
  80. ICT example. INVARIANT: User IDs are positive. OPERATION: Save new user. MERGE: Add both records to DB. From {}, two independent operations: add {Stu,ID=1} and add {Ann,ID=1}. MERGE: {{Stu,ID=1}, {Ann,ID=1}}. Invariant holds!
  83. ICT example. INVARIANT: User IDs are unique. OPERATION: Save new user. MERGE: Add both records to DB. From {}, two independent operations: add {Stu,ID=1} and add {Ann,ID=1}. MERGE: {{Stu,ID=1}, {Ann,ID=1}}. Invariant broken!
  86. Key question: Can invariants be violated by merging independent operations? ICT: Invariant Confluence Test [VLDB 2015]. ICT passes? Coordination not required. ICT fails? Coordination required
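
The two ICT examples above can be replayed in a few lines. This is a minimal sketch, not the formal test from the VLDB 2015 paper: it brute-forces a handful of sample states instead of reasoning about all reachable states, it models MERGE as set union, and every function name here is hypothetical.

```python
from itertools import combinations


def merge(a: frozenset, b: frozenset) -> frozenset:
    """MERGE as described on the slides: add both sets of records to the DB."""
    return a | b


def ids_positive(db: frozenset) -> bool:
    return all(uid > 0 for _name, uid in db)


def ids_unique(db: frozenset) -> bool:
    ids = [uid for _name, uid in db]
    return len(ids) == len(set(ids))


def passes_ict(invariant, states) -> bool:
    """Sample-based check: merging any two individually valid states stays valid."""
    valid = [s for s in states if invariant(s)]
    return all(invariant(merge(a, b)) for a, b in combinations(valid, 2))


# Two replicas start from {} and each saves a new user independently.
stu = frozenset({("Stu", 1)})
ann = frozenset({("Ann", 1)})
samples = [frozenset(), stu, ann]

print("IDs are positive:", passes_ict(ids_positive, samples))  # True
print("IDs are unique:  ", passes_ict(ids_unique, samples))    # False
```

Both replicas satisfy "IDs are unique" locally, so the violation only appears after the merge, which is why that invariant needs coordination while "IDs are positive" does not.
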
  88. THOSE LIGHT CONES: If operations happen concurrently… …ensure their side-effects can be COMPOSED IN A WAY THAT MAKES “SENSE” (formalized by ICT)
  90. When can we comfortably share silence? (diagram labels: Attendee Login, Room Reservations, Social Media Monitoring, Database)
    • Can we simultaneously reserve rooms?
    • Can I log in while you reserve a room?
    • Can I tweet while you change your username?
  92. When can we comfortably share silence? When operations are composable
  94. Typical database constraints and operations (SQL) [VLDB 2015]

    | Constraint | Operation | Passes ICT? |
    |---|---|---|
    | Equality, Inequality | Any | Y |
    | Generate unique ID | Any | Y |
    | Specify unique ID | Insert | N |
    | > | Increment | Y |
    | > | Decrement | N |
    | < | Decrement | Y |
    | < | Increment | N |
    | Foreign Key | Insert | Y |
    | Foreign Key | Delete | Y* |
    | Secondary Indexing | Any | Y |
    | Materialized Views | Any | Y |
    | AUTO_INCREMENT | Insert | N |
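
The ">" rows can be illustrated with a single counter. The sketch below assumes the merge applies both replicas' deltas to the value they started from; that merge rule, the starting values, and the names are assumptions chosen for illustration, not taken from the slides.

```python
def merge_deltas(start: int, delta_a: int, delta_b: int) -> int:
    """Merge two independent updates by applying both deltas to the shared start."""
    return start + delta_a + delta_b


def merged_state_is_valid(start: int, delta_a: int, delta_b: int, invariant) -> bool:
    # Each replica's own result is valid on its own...
    assert invariant(start + delta_a) and invariant(start + delta_b)
    # ...the open question is whether the merged result still is.
    return invariant(merge_deltas(start, delta_a, delta_b))


def greater_than_zero(value: int) -> bool:
    return value > 0


# "> Increment": merging two increments can only move further above zero.
print(merged_state_is_valid(1, +1, +1, greater_than_zero))  # True
# "> Decrement": each replica sees 2 - 1 = 1, but the merge gives 2 - 1 - 1 = 0.
print(merged_state_is_valid(2, -1, -1, greater_than_zero))  # False
```
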
  95. adopt-a-hydrant, alchemy_cms, amahi, bostonrb, boxroom, brevidy, browsercms, bucketwise, calagator, canvas-lms, carter, chiliproject, citizenry, comas, comfortable-mexican-sofa, communityengine, copycopter-server, danbooru, diaspora, discourse, enki, fat_free_crm, fedena, forem, fulcrum, gitlab-ci, gitlabhq, govsgo, heaven, inkwell, insoshi, jobsworth, juvia, kandan, linuxfr.org, lobsters, lovd-by-less, nimbleshop, obtvse, onebody, opal, opencongress, opengovernment, openproject, piggybak, publify, radiant, railscollab, redmine, refinerycms, ror_ecommerce, rucksack, saasy, salor-retail, selfstarter, sharetribe, skyline, spot-us, spree, sprintapp, squaresquash, sugar, teambox, tracks, tryshoppe, wallgig, zena
  98. Always coordinating is inefficient! 67 projects; 1.77M LoC; 1957 tables; 9986 total, avg. 5.1 per table; 86.9% PASS ICT [SIGMOD 2015]
  99. Everything Happens At Once Legacy Implementations Overcoordinate

  103. Everything Happens At Once: Legacy Implementations Overcoordinate. Read Committed RDBMS: Users never read intermediate data. Transactions: w(name=“peter”); w(name=“pbailis”); commit; and r(name=“peter”); commit; Classic implementation: lock records during access
  109. Everything Happens At Once: Legacy Implementations Overcoordinate. Read Committed RDBMS: Users never read intermediate data. Classic implementation: lock records during access. name record: “peter”, then “pbailis”
  115. Everything Happens At Once: Legacy Implementations Overcoordinate. Read Committed RDBMS. Better implementation: use multi-versioning, commit tag. name record: “peter”, “pbailis”, OK
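
The "multi-versioning, commit tag" idea can be sketched as follows. This is a single-process illustration under stated assumptions, not how any particular RDBMS implements Read Committed; the class and method names are hypothetical.

```python
class VersionedRecord:
    """Keeps every written version; readers only see versions whose writer committed."""

    def __init__(self, initial):
        # Each version is (value, txn_id). txn_id None marks the initial value,
        # which is treated as already committed.
        self._versions = [(initial, None)]
        self._committed = {None}

    def write(self, txn_id, value):
        # No lock is taken: the new version is invisible until its commit tag is set.
        self._versions.append((value, txn_id))

    def commit(self, txn_id):
        # The "commit tag": from now on this transaction's versions are readable.
        self._committed.add(txn_id)

    def read(self):
        # Return the newest version written by a committed transaction.
        for value, txn_id in reversed(self._versions):
            if txn_id in self._committed:
                return value


name = VersionedRecord("old")
name.write(1, "peter")      # w(name="peter")
print(name.read())          # old  (the intermediate value is never exposed)
name.write(1, "pbailis")    # w(name="pbailis")
name.commit(1)              # commit
print(name.read())          # pbailis
```

The reader never blocks on the writer and never observes "peter", which is the Read Committed guarantee the slide states, obtained here without locking records during access.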
  116. Everything Happens At Once Next Level Technique: RAMP Transactions

  120. Everything Happens At Once. Next Level Technique: RAMP Transactions. Desired property: see all updates, or see none (used in indexing, materialized views, foreign keys). Example: w(status=“talking”); w(loc=“seattle”); commit; Classic implementation: lock records. Result: typically implemented incorrectly at scale
  131. Everything Happens At Once. Next Level Technique: RAMP Transactions. Desired property: see all updates, or see none. Example: w(status=“talking”); w(loc=“seattle”); commit; RAMP: multi-versioning with intention metadata. status record: “talking” (@t=10, also loc), OK. loc record: “seattle” (@t=10, also status), OK. Key: Prevent read stalls; Compact metadata. [SIGMOD 2014]
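
The following sketch shows how "multi-versioning with intention metadata" can give the see-all-or-none property without blocking readers. It is loosely modelled on the two-round read described for RAMP in the SIGMOD 2014 paper, simplified to a single process; the `Store` and `read_all` names and the in-memory layout are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    value: str
    ts: int               # transaction timestamp, e.g. @t=10
    siblings: frozenset   # "also ...": other keys written by the same transaction


class Store:
    def __init__(self):
        self.versions = {}     # key -> {ts: Version}; includes prepared versions
        self.last_commit = {}  # key -> ts of the latest committed version

    def prepare(self, key, version):
        self.versions.setdefault(key, {})[version.ts] = version

    def commit(self, key, ts):
        self.last_commit[key] = max(ts, self.last_commit.get(key, 0))

    def get(self, key, ts=None):
        ts = self.last_commit.get(key) if ts is None else ts
        return self.versions.get(key, {}).get(ts)


def read_all(store, keys):
    first = {k: store.get(k) for k in keys}      # round 1: latest committed versions
    required = {}
    for version in first.values():
        if version is None:
            continue
        for sibling in version.siblings:         # metadata says what else was written
            if sibling in keys:
                required[sibling] = max(required.get(sibling, 0), version.ts)
    result = dict(first)
    for key, ts in required.items():             # round 2: fetch any missing sibling
        if first[key] is None or first[key].ts < ts:
            result[key] = store.get(key, ts)
    return result


store = Store()
store.prepare("status", Version("talking", 10, frozenset({"loc"})))
store.prepare("loc", Version("seattle", 10, frozenset({"status"})))
store.commit("status", 10)                       # commit has not reached "loc" yet

seen = read_all(store, ["status", "loc"])
print(seen["status"].value, seen["loc"].value)   # talking seattle
```

Even though only one of the two records has been marked committed, the reader uses the sibling metadata to fetch the matching version of the other record instead of stalling or returning a partial update.
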
  132. TPC-C

  133. TPC-C: 14/16 INVARIANTS PASS ICT
  134. TPC-C: 14/16 INVARIANTS PASS ICT; scale to over 25x best listed result; 6-11x faster than ACID/serializability. [Figures: Total Throughput (txn/s) and Throughput (txn/s/server) versus Number of Servers; Throughput (txns/s) versus Number of Warehouses for Coordination-Avoiding and Serializable (2PL)]
  135. Everything Happens At Once Key Design Patterns

  138. Everything Happens At Once. Key Design Patterns
    • Datatype libraries can automatically merge operations, e.g., Bloom^L, CRDTs
    • Multi-versioning can prevent stalls during partial updates, e.g., RAMP, COPS, SwiftCloud
    • When you must coordinate, distribute as little as possible, e.g., Transaction Chopping
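
As an illustration of the first pattern, here is a minimal grow-only counter (G-Counter), one of the standard CRDTs. It is a sketch of the general datatype, not code from any library named on the slide, and the class name and API are hypothetical.

```python
class GCounter:
    """Grow-only counter: one slot per replica; merge takes the per-slot maximum."""

    def __init__(self, replica_id, counts=None):
        self.replica_id = replica_id
        self.counts = dict(counts or {})

    def increment(self, amount=1):
        # A replica only ever updates its own slot, so no coordination is needed.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Per-slot max is commutative, associative, and idempotent.
        replicas = self.counts.keys() | other.counts.keys()
        merged = {r: max(self.counts.get(r, 0), other.counts.get(r, 0)) for r in replicas}
        return GCounter(self.replica_id, merged)


a = GCounter("a")
b = GCounter("b")
a.increment()   # each replica counts independently
b.increment()

print(a.merge(b).value())            # 2, whichever order the merge happens in
print(b.merge(a).value())            # 2
print(a.merge(b).merge(b).value())   # 2, merging the same state twice changes nothing
```

Because the merge is order-insensitive and idempotent, replicas can exchange state whenever they like and still converge on the same total.
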
  139. Rethink The API

  141. Rethink The API: Read/Write Transaction, Distributed Log, Consensus Object are too low level!
  142. The Far Side, Gary Larson

  143. WHAT THE APPLICATION SAYS “post on timeline” “accept friend request”

  144. WHAT THE APPLICATION SAYS: “post on timeline”, “accept friend request”. WHAT THE SYSTEM HEARS: [illustration: an undifferentiated stream of individual read and write operations]
  145. WHAT THE APPLICATION SAYS: “post on timeline”, “accept friend request”. WHAT THE SYSTEM HEARS: [illustration: the same stream of reads and writes, with “post on timeline” and “accept friend request” marked within it]
  146. The Good Stuff (Papers)
    • ICT in theory and practice: SIGMOD 2015, VLDB 2015
    • Coordination-avoiding analytics: CIDR 2015
    • Index, graph, and view maintenance: SIGMOD 2014
    • Transaction isolation: VLDB 2014
    • Upgrading existing stores: SIGMOD 2013
    • Quantifying visibility: VLDB 2012, VLDBJ 2014
  148. To avoid coordination, maximize composability of operations. Scalable systems can comfortably share silence. Joint work with Ali Ghodsi, Alan Fekete, Joe Hellerstein, Ion Stoica, and many others (see bailis.org)
  149. To avoid coordination, maximize composability of operations. Scalable systems can comfortably share silence. @pbailis
  150. Many illustrations by the Noun Project (CC-Attribution): surprised by Julian Derveaux; world by Wayne Tyler Sall; database by Austin Condiff; earth by Martin Vanco; Woman by Simon Child; Man by Simon Child; Doctor by Simon Child; David-Hockney by Simon Child; Server by Simon Child; clock by christoph robausch