Slide 1

Bolt-on Causal Consistency
Peter Bailis, Ali Ghodsi, Joseph M. Hellerstein, Ion Stoica
UC Berkeley

Slide 2

Slides from the SIGMOD 2013 paper at http://bailis.org/papers/bolton-sigmod2013.pdf
[email protected]

Slides 3-5

July 2000: CAP Conjecture
A system facing network partitions must choose between either availability or strong consistency
Theorem

Slides 6-11

NoSQL
Strong consistency is out! “Partitions matter, and so does low latency” [cf. Abadi: PACELC]
...offer eventual consistency instead

Slides 12-14

Eventual Consistency: eventually, all replicas agree on the same value
An extremely weak consistency model: any value can be returned at any given time, as long as it’s eventually the same everywhere
Provides liveness but no safety guarantees
Liveness: something good eventually happens
Safety: nothing bad ever happens

Slides 15-19

Do we have to give up safety if we want availability?
No! There’s a spectrum of models.
UT Austin TR: no model stronger than causal consistency is achievable with HA

Slide 22

Why Causal Consistency?
Highly available, low-latency operation
Long-identified, useful “session” model
Natural fit for many modern apps
[Bayou Project, 1994-98] [UT Austin 2011 TR]

Slides 23-25

Dilemma!
Eventual consistency is the lowest common denominator across systems...
...yet eventual consistency is often insufficient for many applications...
...and no production-ready storage systems offer highly available causal consistency.

Slides 26-28

In this talk... show how to upgrade existing stores to provide HA causal consistency
Approach: bolt on a narrow shim layer to upgrade eventual consistency
Outcome: architecturally separate safety and liveness properties

Slides 29-34

Separation of Concerns
Shim handles: consistency/visibility
(consistency-related safety: mostly algorithmic, small code base)
Underlying store handles: messaging/propagation, durability/persistence, failure detection/handling
(liveness and replication: lots of engineering; reuse existing efforts!)
Guarantee the same (useful) semantics across systems!
Allows portability, modularity, comparisons

Slides 35-37

Bolt-on Architecture
Bolt-on shim layer upgrades the semantics of an eventually consistent data store
Clients only communicate with the shim
Shim communicates with one of many different eventually consistent stores (generic)
Treat the EC store as the “storage manager” of a distributed DBMS
For now, an extreme: an unmodified EC store

Slide 42

Bolt-on Causal Consistency

Slides 44-53

What is Causal Consistency?
(timeline figure: “First Tweet”, then “Reply to Alex”)

Slides 54-57

What is Causal Consistency?
Reads obey: 1.) writes follow reads (“happens-before”), 2.) program order, 3.) transitivity [Lamport 1978]
Here, applications explicitly define happens-before for each write (“explicit causality”) [Ladin et al. 1990, cf. Bailis et al. 2012]
“First Tweet” happens-before “Reply to Alex”
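
To make “explicit causality” concrete, here is a minimal sketch (an editor’s illustration in Java, not code from the talk; all names are hypothetical): the application, not the system, declares which prior writes happen-before each new write.

    import java.util.Set;

    // A particular version of a key, e.g. the tweet stored at "tweet:alex" @ 1092.
    record VersionedKey(String key, long timestamp) {}

    // A write tagged with the application-declared writes that happen-before it.
    record Write(String key, String value, long timestamp, Set<VersionedKey> happensBefore) {}

    class ExplicitCausalityExample {
        public static void main(String[] args) {
            Write first = new Write("tweet:alex", "First Tweet", 1092, Set.of());
            // The reply explicitly names the original tweet as its only dependency.
            Write reply = new Write("tweet:bob", "Reply to Alex", 1109,
                    Set.of(new VersionedKey("tweet:alex", 1092)));
            System.out.println(reply.happensBefore());
        }
    }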

Slides 58-59

First Tweet happens-before Reply to Alex
https://dev.twitter.com/docs/api/1.1/post/statuses/update

Slides 60-72

(replication animation between datacenters DC1 and DC2: “First Tweet” and its happens-before successor “Reply to Alex” propagate independently, so a reader in one datacenter may see the reply before the tweet it depends on)

Slides 73-76

Two Tasks:
1.) Representing Order: how do we efficiently store causal ordering in the EC system?
2.) Controlling Order: how do we control the visibility of new updates to the EC system?

Slides 77-80

Representing Order
Strawman: use vector clocks [e.g., Bayou, Causal Memory]
(figure: per-user vector clock entries attached to “First Tweet” and “Reply-to Alex”)
Problem? Given a missing dependency (from the vector), what key should we check?
If I have <3,1>, where is <2,1>? <1,1>? A write to the same key? A write to a different key? Which?

Slides 81-91

Representing Order
Strawman: use dependency pointers [e.g., Lazy Replication, COPS]
First Tweet: A @ timestamp 1092, dependencies = {}
Reply-to Alex: B @ timestamp 1109, dependencies = {A@1092}
Problem? Consider the chain A@1 → B@2 → C@3: if B@2 is overwritten by B@7, C@3’s pointer to B@2 dangles.
Single pointers can be overwritten!
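
A tiny runnable illustration of that failure (editor’s sketch; the last-writer-wins map stands in for an EC register store):

    import java.util.HashMap;
    import java.util.Map;

    class OverwrittenHistory {
        // One version per key, like a last-writer-wins EC register.
        record Write(long timestamp, String dependency) {}

        public static void main(String[] args) {
            Map<String, Write> store = new HashMap<>();
            store.put("A", new Write(1, null));
            store.put("B", new Write(2, "A@1"));
            store.put("C", new Write(3, "B@2")); // C@3's single pointer names B@2

            store.put("B", new Write(7, null));  // concurrent writer overwrites B

            // C@3 still points at B@2, but the store now holds only B@7:
            // the pointed-to version, and its dependency history, are gone.
            System.out.println("C depends on " + store.get("C").dependency()
                    + ", but B is now @" + store.get("B").timestamp());
        }
    }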

Slides 92-96

Representing Order
Strawman: use vector clocks (don’t know what items to check)
Strawman: use dependency pointers (single pointers can be overwritten: “overwritten histories”)
Strawman: use N² items for messaging (highly inefficient!)

Slides 97-102

Representing Order
Solution: store metadata about causal cuts
Short answer: a consistent cut applied to data items; not quite the transitive closure
A@1 → B@2 → C@3: causal cut for C@3 is {B@2, A@1}
A@6 → B@17 → C@20 and A@10 → B@12: causal cut for C@20 is {B@17, A@10}
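
Read as code, a causal cut summary can be a per-key map holding the single version required from each dependency; folding in a dependency’s own cut keeps the highest timestamp per key. A sketch (editor’s illustration; the class and method names are hypothetical):

    import java.util.HashMap;
    import java.util.Map;

    // One entry per key: the version a reader must observe. Keeping only the
    // newest required version per key is what makes this a cut rather than
    // the full transitive closure of dependencies.
    class CausalCutSummary {
        private final Map<String, Long> requiredVersion = new HashMap<>();

        void require(String key, long timestamp) {
            requiredVersion.merge(key, timestamp, Math::max);
        }

        // Fold a dependency's cut into this one: writing C@20 after B@17
        // (whose cut is {A:10}) yields the cut {B:17, A:10} from the slide.
        void absorb(CausalCutSummary dep) {
            dep.requiredVersion.forEach(this::require);
        }

        boolean satisfiedBy(Map<String, Long> visible) {
            return requiredVersion.entrySet().stream().allMatch(
                    e -> visible.getOrDefault(e.getKey(), -1L) >= e.getValue());
        }
    }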

Slides 103-105

Two Tasks:
1.) Representing Order: shim stores a causal cut summary along with every key, due to overwrites and “unreliable” delivery
2.) Controlling Order: how do we control the visibility of new updates to the EC system?

Slides 106-111

Controlling Order
Standard technique: reveal new writes to readers only when their dependencies have been revealed
Inductively guarantees clients read from a causal cut
In bolt-on causal consistency, two challenges:
Each shim has to check dependencies manually (the underlying store doesn’t notify clients of new writes)
The EC store may overwrite a “stable” cut (clients need to cache the relevant cut to prevent overwrites)

Slides 112-126

Each shim has to check dependencies manually; the EC store may overwrite a “stable” cut
(animation: a client issues read(B); the shim issues read(B) to the EC store and gets back B@1109, deps={A@1092}; the shim then issues read(A) and gets A@1092, deps={}; only then does it return B@1109 to the client)
Cache this value for A! The EC store might overwrite it with an “unresolved” write.
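
The read path just animated, written out as a sketch (editor’s illustration; the EcStore interface and DataItem shape are assumptions, not the paper’s API). Because a causal cut summary is transitively complete, one level of checking suffices:

    import java.util.HashMap;
    import java.util.Map;

    class PessimisticShim {
        interface EcStore { DataItem get(String key); }

        record DataItem(String key, long timestamp, String value,
                        Map<String, Long> causalCut) {}

        private final EcStore store;
        private final Map<String, DataItem> cache = new HashMap<>(); // client-side cache

        PessimisticShim(EcStore store) { this.store = store; }

        DataItem read(String key) {
            DataItem item = store.get(key);
            if (item == null) return cache.get(key); // fall back to the known-safe cut
            for (Map.Entry<String, Long> dep : item.causalCut().entrySet()) {
                DataItem d = store.get(dep.getKey());
                if (d == null || d.timestamp() < dep.getValue()) {
                    // A dependency isn't visible yet: don't reveal the new write;
                    // serve whatever older version is already on the client's cut.
                    return cache.get(key);
                }
                // Pin the dependency locally so a later overwrite in the EC
                // store can't take the cut away from us.
                cache.put(dep.getKey(), d);
            }
            cache.put(key, item);
            return item;
        }
    }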

Slides 127-128

Two Tasks:
1.) Representing Order: shim stores a causal cut summary along with every key, due to overwrites and “unreliable” delivery
2.) Controlling Order: shim performs dependency checks for the client and caches dependencies

Slides 130-131

Upgraded Cassandra to causal consistency
322 lines of Java for core safety
Custom serialization
Client-side caching
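
The “custom serialization” plausibly means packing the causal metadata next to the value; a toy write path under that assumption (editor’s sketch, not the authors’ 322-line implementation; a real shim would use a compact binary encoding, not strings):

    import java.nio.charset.StandardCharsets;
    import java.util.Map;
    import java.util.stream.Collectors;

    class ShimWriter {
        interface KeyValueStore { void put(String key, byte[] value); } // e.g. a Cassandra client

        private final KeyValueStore store;

        ShimWriter(KeyValueStore store) { this.store = store; }

        void put(String key, String value, long timestamp, Map<String, Long> causalCut) {
            // Encode "timestamp|dep1@v1,dep2@v2|value" so any shim that later
            // reads the key can run its own dependency checks.
            String cut = causalCut.entrySet().stream()
                    .map(e -> e.getKey() + "@" + e.getValue())
                    .collect(Collectors.joining(","));
            String payload = timestamp + "|" + cut + "|" + value;
            store.put(key, payload.getBytes(StandardCharsets.UTF_8));
        }
    }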

Slides 132-144

Dataset      Chain Length   Message Depth   Serialized Size (b)
Median:
Twitter      2              4               169
Flickr       3              5               201
Metafilter   6              18              525
TUAW         13             8               275
99th percentile:
Twitter      40             230             5407
Flickr       44             100             2447
Metafilter   170            870             19375
TUAW         62             100             2438

Most chains are small
Metadata often < 1KB
Power laws mean some chains are difficult

Slides 145-149

Strategy 1: Resolve dependencies at read time
Often (but not always) within 40% of eventual
Long chains hurt throughput
N.B. Locality in the YCSB workload greatly helps read performance; dependencies (or replacements) are often cached (we used 100x the default number of keys, but a concurrent write is still likely to be in the cache)

Slide 150

A thought...
Causal consistency trades visibility for safety
How far can we push this visibility?

Slides 151-158

What if we serve entirely from cache and fetch new data asynchronously?
(animation: the client’s read(B) is answered immediately from the shim’s cache, while the shim’s read(B) and read(A) against the EC store, returning B@1109 and A@1092, proceed in the background)
EC store reads are async

Slides 159-160

A thought...
Causal consistency trades visibility for safety. How far can we push this visibility?
What if we serve reads entirely from cache and fetch new data asynchronously?
Continuous trade-off space between dependency resolution depth and the fast-path latency hit

Slides 161-163

Strategy 2: Fetch dependencies asynchronously
Throughput exceeds the eventual configuration
Still causally consistent, but more stale reads
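
A sketch of Strategy 2 (editor’s illustration; Resolver stands in for the Strategy-1 read-time resolution logic): the client-facing read returns immediately from the causally consistent cache, and a background task refreshes the cache off the critical path.

    import java.util.Map;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    class AsyncShim {
        interface Resolver { String resolveCausally(String key); } // Strategy-1 logic

        private final Map<String, String> cache = new ConcurrentHashMap<>();
        private final ExecutorService pool = Executors.newFixedThreadPool(4);
        private final Resolver resolver;

        AsyncShim(Resolver resolver) { this.resolver = resolver; }

        String read(String key) {
            // Kick off dependency resolution in the background; future reads
            // will see the newer value once its whole cut has been checked.
            CompletableFuture.supplyAsync(() -> resolver.resolveCausally(key), pool)
                    .thenAccept(v -> { if (v != null) cache.put(key, v); });
            // Fast path: the cache already sits on a causal cut, so this is
            // causally consistent, just possibly stale.
            return cache.get(key);
        }
    }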

Slides 164-166

(throughput charts: Sync Reads vs. Async Reads)
Reading from cache is fast; linear speedup
...but not reading the most recent data...
...in this case, effectively a straw man.

Slides 167-168

Lessons
Causal consistency is achievable without modifications to existing stores
Represent and control ordering between updates; EC is “orderless” until convergence
Trade-off between visibility and ordering
Works well for workloads with small causal histories and good temporal locality

Slide 169

Rethinking the EC API
Uncontrolled overwrites increased metadata and local storage requirements
Clients had to check causal dependencies independently, with no aid from the EC store

Slides 170-171

Rethinking the EC API
What if we eliminated overwrites? (via multi-versioning, conditional updates, or immutability)
No more overwritten histories
Decreased metadata
Still have to check for dependency arrivals
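
Multi-versioning is the simplest of the three to sketch (editor’s illustration): keep every version of a key, so a dependency pointer such as B@2 stays resolvable even after B@7 arrives.

    import java.util.List;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;
    import java.util.concurrent.CopyOnWriteArrayList;

    class MultiVersionStore {
        record Versioned(long timestamp, String value) {}

        private final ConcurrentMap<String, List<Versioned>> map = new ConcurrentHashMap<>();

        void put(String key, Versioned v) {
            // Append, never overwrite: no more overwritten histories.
            map.computeIfAbsent(key, k -> new CopyOnWriteArrayList<>()).add(v);
        }

        // Resolve an exact dependency pointer, e.g. get("B", 2).
        Versioned get(String key, long timestamp) {
            return map.getOrDefault(key, List.of()).stream()
                    .filter(v -> v.timestamp() == timestamp)
                    .findFirst().orElse(null);
        }
    }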

Slides 172-182

Rethinking the EC API
What if the EC store notified us when dependencies converged (arrived everywhere)?
(animation: a put issued “after [dependency] converges” is held back until that dependency has reached every replica)
Wait to place writes in the shared EC store until dependencies have converged
No need for metadata
No need for additional checks
Ensure durability with client-local EC storage
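
Programming against such a convergence notification might look like this (editor’s sketch; onConverged is an invented API, which the table on the following slides shows no surveyed store currently offers, and the example tracks a single dependency for brevity):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    class StableCallbackShim {
        interface NotifyingStore {
            void put(String key, String value);
            // Invented API: run the callback once key@timestamp has arrived
            // at every replica.
            void onConverged(String key, long timestamp, Runnable callback);
        }

        private final NotifyingStore shared;
        private final Map<String, String> localDurable = new ConcurrentHashMap<>();

        StableCallbackShim(NotifyingStore shared) { this.shared = shared; }

        void put(String key, String value, String depKey, long depTimestamp) {
            // Durable immediately in client-local EC storage, visible locally.
            localDurable.put(key, value);
            // Publish to the shared store only after the dependency converges;
            // readers then need no metadata and no dependency checks.
            shared.onConverged(depKey, depTimestamp, () -> shared.put(key, value));
        }
    }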

Slides 183-186

                        Multi-versioning or
                        Conditional Update    Stable Callback
Reduces Metadata        YES                   YES
No Dependency Checks    NO                    YES

Data Store              Multi-versioning or
                        Conditional Update    Stable Callback
Amazon DynamoDB         YES                   NO
Amazon S3               NO                    NO
Amazon SimpleDB         YES                   NO
Amazon Dynamo           YES                   NO
Cloudant Data Layer     YES                   NO
Google App Engine       YES                   NO
Apache Cassandra        NO                    NO
Apache CouchDB          YES                   NO
Basho Riak              YES                   NO
LinkedIn Voldemort      YES                   NO
MongoDB                 YES                   NO
Yahoo! PNUTS            YES                   NO

...not (yet) common to all stores

Slide 187

Rethinking the EC API
Our extreme approach (an unmodified EC store) definitely impeded efficiency (but is portable)
Opportunities to better define surgical improvements to the API for future stores and shims!

Slides 188-189

Bolt-on Causal Consistency
Modular, “bolt-on” architecture cleanly separates safety and liveness
Upgraded EC (all liveness) to causal consistency, preserving HA, low latency, and liveness
Challenges: overwrites, managing causal order
Large design space: we took an extreme here, but there is room for exploration in the EC API
Bolt-on transactions?

Slide 190

(Some) Related Work
• S3 DB [SIGMOD 2008]: foundational prior work building on EC stores; not causally consistent, not HA (e.g., the RYW implementation), AWS-dependent (e.g., assumes queues)
• 28msec architecture [SIGMOD Record 2009]: like the SIGMOD 2008 work, treats EC stores as cheap storage
• Cloudy [VLDB 2010]: layered approach to data management; partitioning, load balancing, and messaging in middleware; larger focus: extensible query model, storage format, routing, etc.
• G-Store [SoCC 2010]: client and middleware implementation of entity-grouped linearizable transaction support
• Bermbach et al. middleware [IC2E 2013]: provides read-your-writes guarantees with caching
• Causal Consistency: Bayou [SOSP 1997], Lazy Replication [TOCS 1992], COPS [SOSP 2011], Eiger [NSDI 2013], ChainReaction [EuroSys 2013], Swift [INRIA] are all custom solutions for causal memory [Ga Tech 1993] (inspired by Lamport [CACM 1978])