Slide 1

Slide 1 text

SILENCE IS GOLDEN: COORDINATION-AVOIDING SYSTEMS DESIGN. Peter Bailis, @pbailis. MesosCon 2015 Keynote, 21 August, Seattle, WA

Slide 2

Slide 2 text

Attendee Login Room Reservations Social Media Monitoring Database Reasoning about Distribution is Hard

Slide 5

Slide 5 text

Reasoning about Distribution is Hard Attendee Login Room Reservations Social Media Monitoring Database • Should you and I be able to simultaneously reserve rooms? • Can you reserve a room while I log in? • Can you tweet while I change my username?

Slide 6

Slide 6 text

Simple, classic strategy: Hide concurrency by coordinating

Slide 7

Slide 7 text

Simple, classic strategy: Hide concurrency by coordinating Abstraction: Serial access to state Replicated State Machines Mechanisms: Consensus (Paxos, VR, Raft); ZooKeeper, etcd, Doozer; ACID transactions

Slide 8

Slide 8 text

Coordination is expensive: processes cannot make progress independently

Slide 9

Slide 9 text

Coordination is expensive: processes cannot make progress independently. This limits: 1.) Scalability 2.) Throughput 3.) Low Latency 4.) Availability

Slide 18

Slide 18 text

[Figure: DISTRIBUTED TRANSACTIONS (EC2), IN-MEMORY LOCKING. Throughput (txns/s) vs. Number of Servers (Items) Accessed per Transaction, 1 to 7]

Slide 19

Slide 19 text

[Figure: DISTRIBUTED TRANSACTIONS (EC2), IN-MEMORY LOCKING. Throughput (txns/s) vs. Number of Servers (Items) Accessed per Transaction, 1 to 7. Series: COORDINATED]

Slide 20

Slide 20 text

[Figure: DISTRIBUTED TRANSACTIONS (EC2), IN-MEMORY LOCKING. Throughput (txns/s) vs. Number of Servers (Items) Accessed per Transaction, 1 to 7. Series: COORDINATED. LOG SCALE! -398x]

Slide 21

Slide 21 text

Coordination is expensive: processes cannot make progress independently. This limits: 1.) Scalability 2.) Throughput 3.) Low Latency 4.) Availability

Slide 25

Slide 25 text

133.7+ ms RTT

Slide 27

Slide 27 text

133.7+ ms RTT 85.1+ ms RTT
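
A rough back-of-the-envelope sketch of why these round-trip times matter. It rests on an assumption that is not stated on the slides: every coordinated operation on a contended item must wait at least one full round trip before the next one can start. Under that assumption, per-item throughput is capped at about 1/RTT. The function name is hypothetical.

```python
def max_serial_ops_per_second(rtt_ms: float) -> float:
    """Upper bound on back-to-back coordinated operations per second,
    assuming each one costs at least one full round trip (an assumption)."""
    return 1000.0 / rtt_ms


for rtt_ms in (133.7, 85.1):
    bound = max_serial_ops_per_second(rtt_ms)
    print(f"{rtt_ms} ms RTT: at most {bound:.1f} ops/s per contended item")
```

With the two RTTs shown, the bound works out to roughly 7.5 and 11.8 operations per second, regardless of how many servers are added.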

Slide 28

Slide 28 text

Coordination is expensive: processes cannot make progress independently. This limits: 1.) Scalability 2.) Throughput 3.) Low Latency 4.) Availability

Slide 31

Slide 31 text

Simple, classic strategy: Hide concurrency by coordinating Abstraction: Serial access to state High cost! Fundamental penalties to Scalability, Throughput, Latency, Availability

Slide 32

Slide 32 text

Surely there’s a better way to build systems!

Slide 34

Slide 34 text

Why do we feel it's necessary to yak in order to be comfortable? That's when you know you've found somebody really special: when you can just shut up for a minute and comfortably share silence.

Slide 36

Slide 36 text

Scalable systems can just shut up and comfortably share silence

Slide 37

Slide 37 text

Scalable systems can just shut up and comfortably share silence This talk: 1.) Why is shutting up good for systems? 2.) When can systems comfortably share silence?

Slide 39

Slide 39 text

Why is shutting up good?

Slide 40

Slide 40 text

Why is shutting up good? Coordination-free systems:

Slide 44

Slide 44 text

Why is shutting up good? Coordination-free systems: 1.) Enable infinite scale-out

Slide 46

Slide 46 text

[Figure: DISTRIBUTED TRANSACTIONS (EC2), IN-MEMORY LOCKING. Throughput (txns/s) vs. Number of Servers (Items) Accessed per Transaction, 1 to 7. Series: COORDINATED. -398x]

Slide 47

Slide 47 text

[Figure: DISTRIBUTED TRANSACTIONS (EC2), IN-MEMORY LOCKING. Throughput (txns/s) vs. Number of Servers (Items) Accessed per Transaction, 1 to 7. Series: COORDINATED, COORDINATION-FREE. -398x]

Slide 48

Slide 48 text

Why is shutting up good? Coordination-free systems: 1.) Enable infinite scale-out 2.) Improve throughput 3.) Ensure low latency

Slide 52

Slide 52 text

Why is shutting up good? Coordination-free systems: 1.) Enable infinite scale-out 2.) Improve throughput 3.) Ensure low latency 4.) Improve availability

Slide 53

Slide 53 text

“Always on” Availability: any replica can respond to any request

Slide 58

Slide 58 text

Why is shutting up good? Coordination-free systems: 1.) Enable infinite scale-out 2.) Improve throughput 3.) Ensure low latency 4.) Guarantee “always on” response

Slide 61

Slide 61 text

Why is shutting up good? Coordination-free systems: 1.) Enable infinite scale-out 2.) Improve throughput 3.) Ensure low latency 4.) Guarantee “always on” response Silence is key to scalability!

Slide 62

Slide 62 text

Scalable systems can just shut up and comfortably share silence This talk: 1.) Why is shutting up good for systems? 2.) When can systems comfortably share silence?

Slide 64

Slide 64 text

Attendee Login Room Reservations Social Media Monitoring Database Reasoning about Distribution is Hard

Slide 65

Slide 65 text

Reasoning about Distribution is Hard Attendee Login Room Reservations Social Media Monitoring Database • Should you and I be able to simultaneously reserve rooms? • Can you reserve a room while I log in? • Can you tweet while I change my username?

Slide 66

Slide 66 text

THOSE LIGHT CONES If operations happen concurrently… …ensure their side-effects can be COMPOSED

Slide 67

Slide 67 text

THOSE LIGHT CONES If operations happen concurrently… …ensure their side-effects can be COMPOSED IN A WAY THAT MAKES “SENSE”

Slide 68

Slide 68 text

COMPOSED IN A WAY THAT MAKES “SENSE”

Slide 69

Slide 69 text

COMPOSED (“merged”) IN A WAY THAT MAKES “SENSE”

Slide 70

Slide 70 text

COMPOSED (“merged”): 1+1=2, {“a”}+{“b”}={“a”, “b”} IN A WAY THAT MAKES “SENSE”

Slide 71

Slide 71 text

COMPOSED (“merged”): 1+1=2, {“a”}+{“b”}={“a”, “b”} IN A WAY THAT MAKES “SENSE” (invariants over state will hold)

Slide 72

Slide 72 text

COMPOSED (“merged”): 1+1=2, {“a”}+{“b”}={“a”, “b”} IN A WAY THAT MAKES “SENSE” (invariants over state will hold): Counters are positive; No two talks share a timeslot; No NULL values; Usernames are unique
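
The two merge examples on this slide can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the counter is modeled as a starting value plus the increments applied concurrently on two replicas.

```python
def merge_counter(base: int, delta_a: int, delta_b: int) -> int:
    """Merge two concurrent increments by applying both deltas."""
    return base + delta_a + delta_b


def merge_sets(a: frozenset, b: frozenset) -> frozenset:
    """Merge two concurrently updated sets by taking their union."""
    return a | b


# 1 + 1 = 2: two replicas each add 1 to a counter that started at 0.
counter = merge_counter(0, 1, 1)

# {"a"} + {"b"} = {"a", "b"}, whichever side is merged first.
left, right = frozenset({"a"}), frozenset({"b"})
merged = merge_sets(left, right)
merged_reversed = merge_sets(right, left)
```

Both merges give the same answer in either order, which is what lets replicas apply updates without talking to each other first.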

Slide 73

Slide 73 text

Key question: Can invariants be violated by merging independent operations?

Slide 74

Slide 74 text

Key question: Can invariants be violated by merging independent operations? ICT: Invariant Confluence Test [VLDB 2015]

Slide 75

Slide 75 text

Key question: Can invariants be violated by merging independent operations? ICT: Invariant Confluence Test [VLDB 2015] INVARIANT: User IDs are positive OPERATION: Save new user MERGE: Add both records to DB

Slide 76

Slide 76 text

Key question: Can invariants be violated by merging independent operations? ICT: Invariant Confluence Test [VLDB 2015] INVARIANT: User IDs are positive OPERATION: Save new user MERGE: Add both records to DB {}

Slide 77

Slide 77 text

Key question: Can invariants be violated by merging independent operations? ICT: Invariant Confluence Test [VLDB 2015] INVARIANT: User IDs are positive OPERATION: Save new user MERGE: Add both records to DB {} add {Stu,ID=1}

Slide 78

Slide 78 text

Key question: Can invariants be violated by merging independent operations? ICT: Invariant Confluence Test [VLDB 2015] INVARIANT: User IDs are positive OPERATION: Save new user MERGE: Add both records to DB {} add {Stu,ID=1} add {Ann,ID=1}

Slide 79

Slide 79 text

Key question: Can invariants be violated by merging independent operations? ICT: Invariant Confluence Test [VLDB 2015] INVARIANT: User IDs are positive OPERATION: Save new user MERGE: Add both records to DB {} add {Stu,ID=1} add {Ann,ID=1} MERGE {{Stu,ID=1}, {Ann,ID=1}}

Slide 80

Slide 80 text

Key question: Can invariants be violated by merging independent operations? ICT: Invariant Confluence Test [VLDB 2015] INVARIANT: User IDs are positive OPERATION: Save new user MERGE: Add both records to DB {} add {Stu,ID=1} add {Ann,ID=1} MERGE {{Stu,ID=1}, {Ann,ID=1}} Invariant holds!

Slide 81

Slide 81 text

Key question: Can invariants be violated by merging independent operations? ICT: Invariant Confluence Test [VLDB 2015] INVARIANT: User IDs are unique OPERATION: Save new user MERGE: Add both records to DB

Slide 83

Slide 83 text

Key question: Can invariants be violated by merging independent operations? ICT: Invariant Confluence Test [VLDB 2015] INVARIANT: User IDs are unique OPERATION: Save new user MERGE: Add both records to DB {} add {Stu,ID=1} add {Ann,ID=1} MERGE {{Stu,ID=1}, {Ann,ID=1}} Invariant broken!

Slide 84

Slide 84 text

Key question: Can invariants be violated by merging independent operations? ICT: Invariant Confluence Test [VLDB 2015]

Slide 85

Slide 85 text

Key question: Can invariants be violated by merging independent operations? ICT: Invariant Confluence Test [VLDB 2015] ICT passes? Coordination not required.

Slide 86

Slide 86 text

Key question: Can invariants be violated by merging independent operations? ICT: Invariant Confluence Test [VLDB 2015] ICT passes? Coordination not required. ICT fails? Coordination required.
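
The Stu/Ann example can be replayed as a small check. This is a minimal sketch, not the formal test from the VLDB 2015 paper: it examines one pair of divergent states rather than all reachable ones, and the type and function names are hypothetical.

```python
from typing import Callable, FrozenSet, Tuple

Record = Tuple[str, int]            # (name, user ID)
State = FrozenSet[Record]
Invariant = Callable[[State], bool]


def ids_positive(state: State) -> bool:
    """INVARIANT: User IDs are positive."""
    return all(user_id > 0 for _, user_id in state)


def ids_unique(state: State) -> bool:
    """INVARIANT: User IDs are unique."""
    ids = [user_id for _, user_id in state]
    return len(ids) == len(set(ids))


def merge(a: State, b: State) -> State:
    """MERGE: add both records to the DB (set union)."""
    return a | b


def merge_preserves(invariant: Invariant, a: State, b: State) -> bool:
    """Both replicas must be valid on their own; report whether the
    merged state is still valid."""
    assert invariant(a) and invariant(b), "each replica must be locally valid"
    return invariant(merge(a, b))


start: State = frozenset()
stu = start | {("Stu", 1)}          # one replica: add {Stu,ID=1}
ann = start | {("Ann", 1)}          # another replica: add {Ann,ID=1}

positive_ok = merge_preserves(ids_positive, stu, ann)   # invariant holds
unique_ok = merge_preserves(ids_unique, stu, ann)       # invariant broken
```

A single failing pair like the second one is enough to show that coordination is needed; showing that an invariant always survives merging takes an argument over every pair of valid states.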

Slide 87

Slide 87 text

THOSE LIGHT CONES If operations happen concurrently… …ensure their side-effects can be COMPOSED IN A WAY THAT MAKES “SENSE”

Slide 88

Slide 88 text

THOSE LIGHT CONES If operations happen concurrently… …ensure their side-effects can be COMPOSED IN A WAY THAT MAKES “SENSE” formalized by ICT

Slide 89

Slide 89 text

Attendee Login Room Reservations Social Media Monitoring Database When can we comfortably share silence?

Slide 90

Slide 90 text

When can we comfortably share silence? Attendee Login Room Reservations Social Media Monitoring Database Can we simultaneously reserve rooms? Can I log in while you reserve a room? Can I tweet while you change your username?

Slide 92

Slide 92 text

When can we comfortably share silence? Attendee Login Room Reservations Social Media Monitoring Database Can we simultaneously reserve rooms? Can I log in while you reserve a room? Can I tweet while you change your username? When operations are composable

Slide 93

Slide 93 text

Typical database constraints and operations (SQL) [VLDB 2015]
Constraint            Operation  Passes ICT?
Equality, Inequality  Any        ???
Generate unique ID    Any        ???
Specify unique ID     Insert     ???
>                     Increment  ???
>                     Decrement  ???
<                     Decrement  ???
<                     Increment  ???
Foreign Key           Insert     ???
Foreign Key           Delete     ???
Secondary Indexing    Any        ???
Materialized Views    Any        ???
AUTO_INCREMENT        Insert     ???

Slide 94

Slide 94 text

Typical database constraints and operations (SQL) [VLDB 2015]
Constraint            Operation  Passes ICT?
Equality, Inequality  Any        Y
Generate unique ID    Any        Y
Specify unique ID     Insert     N
>                     Increment  Y
>                     Decrement  N
<                     Decrement  Y
<                     Increment  N
Foreign Key           Insert     Y
Foreign Key           Delete     Y*
Secondary Indexing    Any        Y
Materialized Views    Any        Y
AUTO_INCREMENT        Insert     N
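
The ">" rows can be illustrated with a counter that must stay above zero. The values and names below are made up for the sketch: each update is valid where it is applied, but only the increments stay valid once both replicas' updates are merged.

```python
from typing import Iterable


def merge_deltas(base: int, deltas: Iterable[int]) -> int:
    """Merge concurrent counter updates by applying every delta."""
    return base + sum(deltas)


def holds(value: int) -> bool:
    """Invariant: the counter stays greater than zero."""
    return value > 0


base = 100
increments = [+60, +60]     # one increment per replica
decrements = [-60, -60]     # one decrement per replica

# Every update is valid in isolation, on the replica that applied it.
locally_valid = all(holds(base + d) for d in increments + decrements)

after_increments = merge_deltas(base, increments)
after_decrements = merge_deltas(base, decrements)
```

Here the merged increments leave the counter at 220, while the merged decrements leave it at -20, which matches the Y and N entries for those two rows.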

Slide 95

Slide 95 text

adopt-a-hydrant alchemy_cms amahi bostonrb boxroom brevidy browsercms bucketwise calagator canvas-lms carter chiliproject citizenry comas comfortable-mexican-sofa communityengine copycopter-server danbooru diaspora discourse enki fat_free_crm fedena forem fulcrum gitlab-ci gitlabhq govsgo heaven inkwell insoshi jobsworth juvia kandan linuxfr.org lobsters lovd-by-less nimbleshop obtvse onebody opal opencongress opengovernment openproject piggybak publify radiant railscollab redmine refinerycms ror_ecommerce rucksack saasy salor-retail selfstarter sharetribe skyline spot-us spree sprintapp squaresquash sugar teambox tracks tryshoppe wallgig zena

Slide 96

Slide 96 text

67 projects 1.77M LoC 1957 tables 9986 total; avg. 5.1 per table

Slide 97

Slide 97 text

67 projects 1.77M LoC 1957 tables 9986 total; avg. 5.1 per table 86.9% PASS ICT [SIGMOD 2015]

Slide 98

Slide 98 text

Always coordinating is inefficient! 67 projects 1.77M LoC 1957 tables 9986 total; avg. 5.1 per table 86.9% PASS ICT [SIGMOD 2015]

Slide 99

Slide 99 text

Everything Happens At Once Legacy Implementations Overcoordinate

Slide 100

Slide 100 text

Everything Happens At Once Legacy Implementations Overcoordinate Read Committed RDBMS: Users never read intermediate data

Slide 101

Slide 101 text

Everything Happens At Once Legacy Implementations Overcoordinate Read Committed RDBMS: Users never read intermediate data w(name=“peter”); w(name=“pbailis”); commit;

Slide 102

Slide 102 text

Everything Happens At Once Legacy Implementations Overcoordinate Read Committed RDBMS: Users never read intermediate data w(name=“peter”); w(name=“pbailis”); commit; r(name=“peter”); commit;

Slide 103

Slide 103 text

Everything Happens At Once Legacy Implementations Overcoordinate Read Committed RDBMS: Users never read intermediate data w(name=“peter”); w(name=“pbailis”); commit; r(name=“peter”); commit; Classic implementation: lock records during access

Slide 104

Slide 104 text

Everything Happens At Once Legacy Implementations Overcoordinate Read Committed RDBMS: Users never read intermediate data w(name=“peter”); w(name=“pbailis”); commit; r(name=“peter”); commit; Classic implementation: lock records during access name record

Slide 106

Slide 106 text

Everything Happens At Once Legacy Implementations Overcoordinate Read Committed RDBMS: Users never read intermediate data w(name=“peter”); w(name=“pbailis”); commit; r(name=“peter”); commit; Classic implementation: lock records during access name record “peter”

Slide 108

Slide 108 text

Everything Happens At Once Legacy Implementations Overcoordinate Read Committed RDBMS: Users never read intermediate data w(name=“peter”); w(name=“pbailis”); commit; r(name=“peter”); commit; Classic implementation: lock records during access name record “pbailis”

Slide 110

Slide 110 text

Everything Happens At Once Legacy Implementations Overcoordinate Read Committed RDBMS w(name=“peter”); w(name=“pbailis”); commit; r(name=“peter”); commit; Classic implementation: lock records during access name record “pbailis”

Slide 111

Slide 111 text

Everything Happens At Once Legacy Implementations Overcoordinate Read Committed RDBMS w(name=“peter”); w(name=“pbailis”); commit; r(name=“peter”); commit; Classic implementation: lock records during access name record “pbailis” Better implementation: use multi-versioning, commit tag

Slide 112

Slide 112 text

Everything Happens At Once Legacy Implementations Overcoordinate Read Committed RDBMS w(name=“peter”); w(name=“pbailis”); commit; r(name=“peter”); commit; Classic implementation: lock records during access name record “pbailis” Better implementation: use multi-versioning, commit tag name record

Slide 113

Slide 113 text

Everything Happens At Once Legacy Implementations Overcoordinate Read Committed RDBMS w(name=“peter”); w(name=“pbailis”); commit; r(name=“peter”); commit; Classic implementation: lock records during access name record “pbailis” Better implementation: use multi-versioning, commit tag name record “peter”

Slide 114

Slide 114 text

Everything Happens At Once Legacy Implementations Overcoordinate Read Committed RDBMS w(name=“peter”); w(name=“pbailis”); commit; r(name=“peter”); commit; Classic implementation: lock records during access name record “pbailis” Better implementation: use multi-versioning, commit tag name record “peter” “pbailis”

Slide 115

Slide 115 text

Everything Happens At Once Legacy Implementations Overcoordinate Read Committed RDBMS w(name=“peter”); w(name=“pbailis”); commit; r(name=“peter”); commit; Classic implementation: lock records during access name record “pbailis” Better implementation: use multi-versioning, commit tag name record “peter” “pbailis” OK
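
One way to picture the "multi-versioning, commit tag" idea is the single-process sketch below. It is an illustration under assumptions, not the implementation from the talk: the class and method names are hypothetical, and a real store would also garbage-collect old versions and handle aborts.

```python
from typing import Dict, List, Optional


class Version:
    def __init__(self, value: str, txn_id: int) -> None:
        self.value = value
        self.txn_id = txn_id
        self.committed = False      # the "commit tag"


class MultiVersionStore:
    def __init__(self) -> None:
        self._versions: Dict[str, List[Version]] = {}

    def write(self, txn_id: int, key: str, value: str) -> None:
        """Append an uncommitted version; no lock is taken on the record."""
        self._versions.setdefault(key, []).append(Version(value, txn_id))

    def commit(self, txn_id: int) -> None:
        """Tag every version written by txn_id as committed."""
        for versions in self._versions.values():
            for version in versions:
                if version.txn_id == txn_id:
                    version.committed = True

    def read(self, key: str) -> Optional[str]:
        """Return the newest committed value, skipping in-flight writes."""
        for version in reversed(self._versions.get(key, [])):
            if version.committed:
                return version.value
        return None


store = MultiVersionStore()
store.write(1, "name", "peter")
store.write(1, "name", "pbailis")
before_commit = store.read("name")  # nothing committed yet
store.commit(1)
after_commit = store.read("name")   # last write of the committed transaction
```

The reader never waits on the writer: it sees no value before the commit and the final value afterwards, and the intermediate “peter” is never exposed.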

Slide 116

Slide 116 text

Everything Happens At Once Next Level Technique: RAMP Transactions

Slide 117

Slide 117 text

Everything Happens At Once Next Level Technique: RAMP Transactions Desired property: see all updates, or see none w(status=“talking”); w(loc=“seattle”); commit;

Slide 118

Slide 118 text

Everything Happens At Once Next Level Technique: RAMP Transactions Desired property: see all updates, or see none w(status=“talking”); w(loc=“seattle”); commit; used in indexing, materialized views, foreign keys

Slide 119

Slide 119 text

Everything Happens At Once Next Level Technique: RAMP Transactions Desired property: see all updates, or see none w(status=“talking”); w(loc=“seattle”); commit; used in indexing, materialized views, foreign keys Classic implementation: lock records

Slide 120

Slide 120 text

Everything Happens At Once Next Level Technique: RAMP Transactions Desired property: see all updates, or see none w(status=“talking”); w(loc=“seattle”); commit; used in indexing, materialized views, foreign keys Classic implementation: lock records Result: typically implemented incorrectly at scale

Slide 121

Slide 121 text

Everything Happens At Once Next Level Technique: RAMP Transactions Desired property: see all updates, or see none w(status=“talking”); w(loc=“seattle”); commit;

Slide 122

Slide 122 text

Everything Happens At Once Next Level Technique: RAMP Transactions Desired property: see all updates, or see none w(status=“talking”); w(loc=“seattle”); commit; RAMP: multi-versioning with intention metadata

Slide 123

Slide 123 text

Everything Happens At Once Next Level Technique: RAMP Transactions Desired property: see all updates, or see none w(status=“talking”); w(loc=“seattle”); commit; RAMP: multi-versioning with intention metadata status record

Slide 124

Slide 124 text

Everything Happens At Once Next Level Technique: RAMP Transactions Desired property: see all updates, or see none w(status=“talking”); w(loc=“seattle”); commit; RAMP: multi-versioning with intention metadata status record loc record

Slide 125

Slide 125 text

Everything Happens At Once Next Level Technique: RAMP Transactions Desired property: see all updates, or see none w(status=“talking”); w(loc=“seattle”); commit; RAMP: multi-versioning with intention metadata status record “talking” (@t=10, also loc) loc record

Slide 126

Slide 126 text

Everything Happens At Once Next Level Technique: RAMP Transactions Desired property: see all updates, or see none w(status=“talking”); w(loc=“seattle”); commit; RAMP: multi-versioning with intention metadata status record “talking” (@t=10, also loc) loc record “seattle” (@t=10, also status)

Slide 127

Slide 127 text

Everything Happens At Once Next Level Technique: RAMP Transactions Desired property: see all updates, or see none w(status=“talking”); w(loc=“seattle”); commit; RAMP: multi-versioning with intention metadata status record “talking” (@t=10, also loc) loc record “seattle” (@t=10, also status) OK

Slide 128

Slide 128 text

Everything Happens At Once Next Level Technique: RAMP Transactions Desired property: see all updates, or see none w(status=“talking”); w(loc=“seattle”); commit; RAMP: multi-versioning with intention metadata status record “talking” (@t=10, also loc) loc record “seattle” (@t=10, also status) OK OK

Slide 129

Slide 129 text

Everything Happens At Once Next Level Technique: RAMP Transactions Desired property: see all updates, or see none w(status=“talking”); w(loc=“seattle”); commit; RAMP: multi-versioning with intention metadata status record “talking” (@t=10, also loc) loc record “seattle” (@t=10, also status) OK

Slide 131

Slide 131 text

Everything Happens At Once Next Level Technique: RAMP Transactions Desired property: see all updates, or see none w(status=“talking”); w(loc=“seattle”); commit; RAMP: multi-versioning with intention metadata status record “talking” (@t=10, also loc) loc record “seattle” (@t=10, also status) OK Key: Prevent read stalls; Compact metadata [SIGMOD 2014]
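
The status/loc example can be walked through with a simplified, single-process sketch in the spirit of the RAMP read protocol. It is an assumption-laden illustration rather than the SIGMOD 2014 algorithm as published: all names are hypothetical, timestamps are assumed to be positive, and networking, failures, and garbage collection are left out.

```python
from typing import Dict, FrozenSet, Iterable, Optional


class Version:
    """One written value plus its intention metadata."""

    def __init__(self, value: str, ts: int, siblings: FrozenSet[str]) -> None:
        self.value = value
        self.ts = ts                # transaction timestamp, e.g. @t=10
        self.siblings = siblings    # other keys written by the same transaction


class Partition:
    """Storage for a single key (hypothetical, in-process)."""

    def __init__(self) -> None:
        self.versions: Dict[int, Version] = {}
        self.last_commit = 0        # 0 means nothing committed yet

    def prepare(self, version: Version) -> None:
        self.versions[version.ts] = version

    def commit(self, ts: int) -> None:
        self.last_commit = max(self.last_commit, ts)

    def latest_committed(self) -> Optional[Version]:
        return self.versions.get(self.last_commit)


Store = Dict[str, Partition]


def prepare_all(store: Store, writes: Dict[str, str], ts: int) -> None:
    """Write phase 1: place every version before committing any of them."""
    keys = frozenset(writes)
    for key, value in writes.items():
        store[key].prepare(Version(value, ts, keys - {key}))


def ramp_read(store: Store, keys: Iterable[str]) -> Dict[str, Optional[str]]:
    keys = list(keys)
    # Round 1: newest committed version of each key, without blocking.
    first = {k: store[k].latest_committed() for k in keys}
    # Sibling metadata tells us which timestamp each key must reach.
    required = {k: (v.ts if v else 0) for k, v in first.items()}
    for version in first.values():
        if version is None:
            continue
        for sibling in version.siblings:
            if sibling in required:
                required[sibling] = max(required[sibling], version.ts)
    # Round 2: fetch any missing sibling by its exact timestamp.
    result: Dict[str, Optional[str]] = {}
    for k in keys:
        version = first[k]
        if required[k] > (version.ts if version else 0):
            version = store[k].versions[required[k]]
        result[k] = version.value if version else None
    return result


store: Store = {"status": Partition(), "loc": Partition()}
prepare_all(store, {"status": "talking", "loc": "seattle"}, ts=10)
nothing_visible = ramp_read(store, ["status", "loc"])
store["status"].commit(10)          # the commit has reached only one partition
all_visible = ramp_read(store, ["status", "loc"])
```

In the second read the reader sees the committed status, notices from its metadata that loc was written at the same timestamp, and fetches that version directly instead of waiting, so it sees both updates or neither.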

Slide 132

Slide 132 text

TPC-C

Slide 133

Slide 133 text

TPC-C: 14/16 INVARIANTS PASS ICT

Slide 134

Slide 134 text

TPC-C: 14/16 INVARIANTS PASS ICT [Figure: Throughput (txns/s) vs. Number of Warehouses (8 to 64), Coordination-Avoiding vs. Serializable (2PL): 6-11x faster than ACID/serializability] [Figure: Total Throughput (txn/s) and Throughput (txn/s/server) vs. Number of Servers (0 to 200): scale to over 25x best listed result]

Slide 135

Slide 135 text

Everything Happens At Once Key Design Patterns

Slide 136

Slide 136 text

Everything Happens At Once Key Design Patterns • Datatype libraries can automatically merge operations e.g., Bloom^L, CRDTs

Slide 137

Slide 137 text

Everything Happens At Once Key Design Patterns • Datatype libraries can automatically merge operations e.g., Bloom^L, CRDTs • Multi-versioning can prevent stalls during partial updates e.g., RAMP, COPS, SwiftCloud

Slide 138

Slide 138 text

Everything Happens At Once Key Design Patterns • Datatype libraries can automatically merge operations e.g., Bloom^L, CRDTs • Multi-versioning can prevent stalls during partial updates e.g., RAMP, COPS, SwiftCloud • When you must coordinate, distribute as little as possible e.g., Transaction Chopping
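
As an illustration of the first pattern, here is a minimal grow-only counter, a standard CRDT. It is a sketch with hypothetical names, not code from any of the libraries named above: each replica counts its own increments, and merging takes the per-replica maximum.

```python
from typing import Dict


class GCounter:
    """Grow-only counter: one slot per replica, merged by element-wise max."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.counts: Dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a grow-only counter cannot be decremented")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        """Commutative, associative, and idempotent, so replicas can
        exchange state in any order and any number of times."""
        for replica_id, count in other.counts.items():
            self.counts[replica_id] = max(self.counts.get(replica_id, 0), count)


a, b = GCounter("a"), GCounter("b")
a.increment()           # replica a counts 1
b.increment(2)          # replica b counts 2, concurrently
a.merge(b)
b.merge(a)
a.merge(b)              # merging again changes nothing
```

After the exchange both replicas report 3 without either having waited on the other.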

Slide 139

Slide 139 text

Rethink The API

Slide 140

Slide 140 text

Rethink The API Read/Write Transaction Distributed Log Consensus Object

Slide 141

Slide 141 text

Rethink The API Read/Write Transaction Distributed Log Consensus Object Are too low level!

Slide 142

Slide 142 text

The Far Side, Gary Larson

Slide 143

Slide 143 text

WHAT THE APPLICATION SAYS “post on timeline” “accept friend request”

Slide 144

Slide 144 text

WHAT THE APPLICATION SAYS: “post on timeline” “accept friend request” WHAT THE SYSTEM HEARS: write read write read write write read write write write read write read read read read read read write write write read read write read write write

Slide 145

Slide 145 text

WHAT THE APPLICATION SAYS: “post on timeline” “accept friend request” WHAT THE SYSTEM HEARS: “post on timeline” “accept friend request” write read write read read write write read read read read read write write read read write read write write write write

Slide 146

Slide 146 text

The Good Stuff (Papers)
ICT in theory and practice            SIGMOD 2015, VLDB 2015
Coordination-avoiding analytics       CIDR 2015
Index, graph, and view maintenance    SIGMOD 2014
Transaction isolation                 VLDB 2014
Upgrading existing stores             SIGMOD 2013
Quantifying visibility                VLDB 2012, VLDBJ 2014

Slide 147

Slide 147 text

To avoid coordination, maximize composability of operations Scalable systems can comfortably share silence

Slide 148

Slide 148 text

To avoid coordination, maximize composability of operations Scalable systems can comfortably share silence Joint work with Ali Ghodsi, Alan Fekete, Joe Hellerstein, Ion Stoica, and many others (see bailis.org)

Slide 149

Slide 149 text

To avoid coordination, maximize composability of operations @pbailis Scalable systems can comfortably share silence

Slide 150

Slide 150 text

Many illustrations by the Noun Project (CC-Attribution): surprised by Julian Derveaux; world by Wayne Tyler Sall; database by Austin Condiff; earth by Martin Vanco; Woman by Simon Child; Man by Simon Child; Doctor by Simon Child; David-Hockney by Simon Child; Server by Simon Child; clock by christoph robausch