Bolt-on Causal Consistency

pbailis
June 27, 2013


Transcript

  1. Bolt-on
    Causal
    Consistency
    Peter Bailis, Ali Ghodsi,
    Joseph M. Hellerstein, Ion Stoica
    UC Berkeley

  2. Slides from
    SIGMOD 2013
    paper at
    http://bailis.org/papers/bolton-sigmod2013.pdf
    [email protected]

  5. July 2000:
    CAP
    Conjecture
    A system facing network partitions
    must choose between either
    availability or strong consistency
    Theorem

  11. NoSQL
    Strong consistency
    is out!
    “Partitions matter, and
    so does low latency”
    [cf. Abadi: PACELC]
    ...offer eventual
    consistency instead

  14. Eventual Consistency
    Extremely weak consistency model:
    eventually, all replicas agree on the same value
    Any value can be returned at any given time
    ...as long as it’s eventually the same everywhere
    Provides liveness but no safety guarantees
    Liveness: something good eventually happens
    Safety: nothing bad ever happens

  19. Do we have to give up safety
    if we want availability?
    No! There’s a
    spectrum of models.
    UT Austin TR:
    No model stronger
    than Causal Consistency
    is achievable with HA

  22. Why Causal Consistency?
    Highly available, low latency operation
    [UT Austin 2011 TR]
    Long-identified useful “session” model
    [Bayou Project, 1994-98]
    Natural fit for many modern apps

  25. Dilemma!
    Eventual consistency is the
    lowest common denominator across systems...
    ...yet eventual consistency is often
    insufficient for many applications...
    ...and no production-ready storage systems
    offer highly available causal consistency.

  28. In this talk...
    show how to upgrade existing
    stores to provide HA causal consistency
    Approach: bolt on a narrow shim layer
    to upgrade eventual consistency
    Outcome: architecturally separate safety
    and liveness properties

  34. Separation of Concerns
    Shim handles:
    Consistency/visibility
    (consistency-related safety:
    mostly algorithmic, small code base)
    Underlying store handles:
    Messaging/propagation
    Durability/persistence
    Failure-detection/handling
    (liveness and replication:
    lots of engineering; reuse existing efforts!)
    Guarantee same (useful) semantics across systems!
    Allows portability, modularity, comparisons

  37. Bolt-on Architecture
    Bolt-on shim layer upgrades the semantics
    of an eventually consistent data store
    Clients only communicate with shim
    Shim communicates with one of many different
    eventually consistent stores (generic)
    Treat EC store as “storage manager”
    of distributed DBMS
    for now, an extreme: unmodified EC store

  42. Bolt-on causal consistency

  53. What is Causal Consistency?
    (timeline: First Tweet, then Reply to Alex)

  57. What is Causal Consistency?
    Reads obey:
    1.) Writes Follow Reads
    (“happens-before”)
    2.) Program order
    3.) Transitivity
    [Lamport 1978]
    Here, applications
    explicitly define
    happens-before
    for each write
    (“explicit causality”)
    [Ladin et al. 1990, cf. Bailis et al. 2012]
    First Tweet happens-before Reply to Alex
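
A minimal sketch of the client-facing interface that “explicit causality” implies (illustrative names only, not the paper’s actual API): the application names each write’s happens-before dependencies itself.

```java
import java.util.Set;

// Illustrative interface only: with explicit causality, the application
// declares each write's happens-before dependencies rather than having
// the system track them implicitly.
interface CausalStore {
    // Write `value` under `key`, declaring the writes it happens-after.
    void put(String key, String value, Set<String> dependencies);

    // Read a value; causal consistency guarantees that its declared
    // dependencies are also visible to this client.
    String get(String key);
}

class Example {
    static void demo(CausalStore store) {
        store.put("tweet:1", "First Tweet", Set.of());            // no deps
        store.put("tweet:2", "Reply to Alex", Set.of("tweet:1")); // happens-after tweet:1
    }
}
```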

  59. First Tweet
    happens-before
    Reply to Alex
    (the reply carries this dependency explicitly, e.g. via the
    in_reply_to_status_id parameter of Twitter’s statuses/update call)
    https://dev.twitter.com/docs/api/1.1/post/statuses/update

  61. First Tweet
    happens-before
    Reply to Alex
    DC1 DC2
    (replication between DC1 and DC2 can deliver the reply
    before the tweet it depends on, violating this order)

  73. Two Tasks:
    1.) Representing Order
    How do we efficiently store
    causal ordering in the EC system?
    2.) Controlling Order
    How do we control the visibility
    of new updates to the EC system?

  80. Representing Order
    Strawman: use vector clocks
    [e.g., Bayou, Causal Memory]
    First Tweet: { …:1, …:0 }
    Reply-to Alex: { …:1, …:1 }
    Problem? Given a missing dependency
    (from the vector), what key should we check?
    If I have <3,1>: where is <2,1>? <1,1>?
    Write to the same key?
    Write to a different key? Which?
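
A small sketch of why this strawman fails (illustrative code, assuming one counter per writing client): comparing clocks reveals that a dependency is missing, but never which key to read to find it.

```java
import java.util.Map;

// Strawman sketch: vector clocks count writes per client but carry no key
// names, so the shim can detect THAT a write is missing but not WHERE.
public class VectorClockStrawman {
    public static void main(String[] args) {
        Map<String, Integer> have = Map.of("client1", 1, "client2", 1);
        Map<String, Integer> need = Map.of("client1", 3, "client2", 1);

        need.forEach((client, count) -> {
            int seen = have.getOrDefault(client, 0);
            for (int i = seen + 1; i <= count; i++)
                // We know client1's writes #2 and #3 are missing, but the
                // vector gives no key to check: same key? another? which?
                System.out.println("missing " + client + "'s write #" + i
                                   + " -- to an unknown key");
        });
    }
}
```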

  91. Representing Order
    Strawman: use dependency pointers
    [e.g., Lazy Replication, COPS]
    First Tweet: A @ timestamp 1092, dependencies = {}
    Reply-to Alex: B @ timestamp 1109, dependencies = {A@1092}
    Problem? (figure: a happens-before chain of writes, where
    intermediate versions are replaced by newer, concurrent writes)
    single pointers can be overwritten!
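
A sketch of the overwritten-histories problem (illustrative, using a last-writer-wins map as a stand-in for the EC store): a register holds only the newest write per key, so a dependency pointer routed through an older version can simply vanish.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: a last-writer-wins register keeps only the newest write per key,
// so dependency metadata stored in older versions is silently lost.
public class OverwrittenHistory {
    record Write(String value, long ts, Map<String, Long> deps) {}

    public static void main(String[] args) {
        Map<String, Write> store = new HashMap<>();  // stand-in for the EC store
        store.put("A", new Write("first tweet", 1092, Map.of()));
        store.put("B", new Write("reply", 1109, Map.of("A", 1092L)));

        // A concurrent writer overwrites A; A@1092 (and anything reachable
        // only through its metadata) is gone -- B's pointer now dangles.
        store.put("A", new Write("unrelated", 2000, Map.of()));
        System.out.println("B needs A@1092; store has A@" + store.get("A").ts());
    }
}
```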

  96. Representing Order
    Strawman: use vector clocks
    don’t know what items to check
    Strawman: use dependency pointers
    single pointers can be overwritten
    (“overwritten histories”)
    Strawman: use N² items for messaging
    highly inefficient!

  102. Representing Order
    Solution: store metadata about causal cuts
    short answer: a consistent cut applied to data items;
    not quite the transitive closure
    (example: for a chain of writes C ← B ← A, the causal cut for C
    is {B, A}, each at the latest version C transitively depends on)
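
A sketch of one plausible representation of a causal cut summary (the concrete encoding here is assumed, not taken from the paper): per key, record the single version the cut requires, and merge cuts pointwise by keeping the later required version.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch (assumed encoding): a causal cut summary maps each data item to the
// one version of it the cut requires -- a consistent cut over items rather
// than the full transitive closure of individual writes.
public class CausalCutSummary {
    private final Map<String, Long> required = new HashMap<>(); // key -> timestamp

    void require(String key, long ts) {
        required.merge(key, ts, Math::max);  // keep the later required version
    }

    // A write's cut is its own dependencies merged with their cuts,
    // collapsed per key to the latest required version.
    void mergeWith(CausalCutSummary other) {
        other.required.forEach(this::require);
    }

    Map<String, Long> view() { return Map.copyOf(required); }
}
```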

  105. Two Tasks:
    1.) Representing Order
    Shim stores causal cut summary
    along with every key, due to
    overwrites and “unreliable” delivery
    2.) Controlling Order
    How do we control the visibility
    of new updates to the EC system?

  111. Controlling Order
    Standard technique: reveal new writes to readers
    only when their dependencies have been revealed
    Inductively guarantee clients read from a causal cut
    In bolt-on causal consistency, two challenges:
    Each shim has to check dependencies manually
    (the underlying store doesn’t notify clients of new writes)
    EC store may overwrite a “stable” cut
    (clients need to cache the relevant cut to prevent overwrites)
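
Putting the two challenges together, a simplified sketch of the read path (illustrative names and types, reusing the cut representation above; the real shim is more involved): a version is revealed only once every entry in its causal cut is available, the shim polls for dependencies since the store sends no notifications, and resolved values are pinned in a client-local cache because the store may overwrite them.

```java
import java.util.*;

// Simplified read-path sketch: reveal a write only when every item in its
// causal cut summary is visible; poll for dependencies manually; pin
// resolved values locally, since the EC store may overwrite them.
public class Shim {
    static class Write {
        final String value; final long ts;
        final Map<String, Long> causalCut;   // key -> required timestamp
        Write(String value, long ts, Map<String, Long> causalCut) {
            this.value = value; this.ts = ts; this.causalCut = causalCut;
        }
    }

    final Map<String, Write> ecStore = new HashMap<>(); // stand-in for the EC store
    final Map<String, Write> cache = new HashMap<>();   // client-local pinned cut

    String causalGet(String key) {
        Write w = ecStore.get(key);
        if (w == null) return null;
        // Check every dependency in w's causal cut before revealing w.
        for (Map.Entry<String, Long> dep : w.causalCut.entrySet()) {
            Write pinned = cache.get(dep.getKey());
            if (pinned != null && pinned.ts >= dep.getValue()) continue;
            Write dw = ecStore.get(dep.getKey());
            if (dw == null || dw.ts < dep.getValue()) {
                // Dependency not yet visible: fall back to the cached,
                // causally safe (possibly stale) value for `key`.
                Write safe = cache.get(key);
                return safe == null ? null : safe.value;
            }
            cache.put(dep.getKey(), dw);  // pin against later overwrites
        }
        cache.put(key, w);  // w joins the client's causal cut
        return w.value;
    }
}
```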

  126. Each shim has to check dependencies manually
    EC store may overwrite “stable” cut
    client → SHIM: read(B)
    SHIM → EC Store: read(B)
    EC Store → SHIM: B@1109, deps={A@1092}
    SHIM → EC Store: read(A)
    EC Store → SHIM: a stale A (ts < 1092), deps={}: dependency not yet satisfied
    ...until A@1092 arrives and the check passes
    Cache this value for A!
    EC store might overwrite it
    with an “unresolved” write

  128. Two Tasks:
    1.) Representing Order
    Shim stores causal cut summary
    along with every key, due to
    overwrites and “unreliable” delivery
    2.) Controlling Order
    Shim performs dependency checks
    for the client, caches dependencies

  131. Upgraded Cassandra
    to causal consistency
    322 lines of Java for core safety
    Custom serialization
    Client-side caching

  144. Dataset      Chain Length   Message Depth   Serialized Size (b)
    Median:
    Twitter             2               4              169
    Flickr              3               5              201
    Metafilter          6              18              525
    TUAW               13               8              275
    99th percentile:
    Twitter            40             230             5407
    Flickr             44             100             2447
    Metafilter        170             870            19375
    TUAW               62             100             2438
    Most chains are small
    Metadata often < 1KB
    Power laws mean some chains are difficult

  149. Strategy 1: Resolve dependencies at read time
    Often (but not always) within 40% of eventual
    Long chains hurt throughput
    N.B. Locality in the YCSB workload greatly helps read
    performance; dependencies (or replacements) are often cached
    (used 100x the default # of keys, but still likely to have a
    concurrent write in cache)

  150. A thought...
    Causal consistency trades visibility for safety
    How far can we push this visibility?


  158. What if we serve entirely from cache
    and fetch new data asynchronously?
    client → SHIM: read(B)
    SHIM → client: B (served from cache)
    meanwhile, asynchronously:
    SHIM → EC Store: read(B) → B@1109, deps=...
    SHIM → EC Store: read(A) → A@1092, deps={}
    EC store reads are async

  160. A thought...
    Causal consistency trades visibility for safety
    How far can we push this visibility?
    What if we serve reads entirely from cache
    and fetch new data asynchronously?
    Continuous trade-off space between dependency
    resolution depth and fast-path latency hit
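
A sketch of this strategy (illustrative): answer reads immediately from the client-local, causally consistent cache, and move dependency resolution off the critical path. Here `resolveCausally` is an assumed helper standing in for a read that runs the dependency checks sketched earlier.

```java
import java.util.concurrent.*;

// Sketch of Strategy 2: serve reads from the causally safe (possibly
// stale) cache; fetch and resolve fresher data on a background thread.
public class AsyncShimReads {
    private final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    String read(String key, Callable<String> resolveCausally) {
        // Fast path: causally safe but possibly stale.
        String cached = cache.get(key);

        // Slow path, asynchronous: resolve a newer causally consistent
        // value (dependency checks included) and install it for later reads.
        pool.submit(() -> {
            String fresh = resolveCausally.call();
            if (fresh != null) cache.put(key, fresh);
            return fresh;
        });
        return cached;
    }
}
```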

  163. Strategy 2: Fetch dependencies asynchronously
    Throughput exceeds the eventual configuration
    Still causally consistent, more stale reads

  166. Sync Reads vs. Async Reads
    Reading from cache is fast; linear speedup
    ...but not reading the most recent data...
    ...in this case, effectively a straw-man.

  168. Lessons
    Causal consistency is achievable without
    modifications to existing stores
    works well for workloads with small causal
    histories, good temporal locality
    represent and control ordering between updates
    EC is “orderless” until convergence
    trade-off between visibility and ordering

  169. Rethinking the EC API
    Uncontrolled overwrites increased metadata
    and local storage requirements
    Clients had to check causal dependencies
    independently, with no aid from EC store


  171. Rethinking the EC API
    What if we eliminated overwrites?
    via multi-versioning,
    conditional updates,
    or immutability
    No more overwritten histories
    Decrease metadata
    Still have to check for dependency arrivals
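
A sketch of the conditional-update primitive that would eliminate overwrites (an in-memory stand-in; real stores expose this in their own ways, e.g. version-checked or compare-and-set writes, and multi-versioning or immutability achieves the same end):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch: a put succeeds only against the version the writer read, so no
// write (or its dependency metadata) is ever silently lost to an overwrite.
public class ConditionalStore {
    record Versioned(long version, String value) {}

    private final ConcurrentMap<String, Versioned> map = new ConcurrentHashMap<>();

    boolean putIfVersion(String key, String value, long expectedVersion) {
        Versioned cur = map.get(key);
        long curVersion = (cur == null) ? 0 : cur.version();
        if (curVersion != expectedVersion) return false; // lost a race: caller retries
        Versioned next = new Versioned(curVersion + 1, value);
        return (cur == null)
            ? map.putIfAbsent(key, next) == null
            : map.replace(key, cur, next);
    }
}
```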

  173. Rethinking the EC API
    What if the EC store notified us when
    dependencies converged (arrived everywhere)?
    put(write, once its dependencies converge)

  182. Rethinking the EC API
    What if the EC store notified us when
    dependencies converged (arrived everywhere)?
    Wait to place writes in shared EC store until
    dependencies have converged
    No need for metadata
    No need for additional checks
    Ensure durability with client-local EC storage

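
A sketch of what such a "stable callback" might look like (a hypothetical API; per the table below, no surveyed store offers it): the client logs the write locally for durability, and the shared EC store fires a callback once the write's dependencies have converged, at which point the write is published with no metadata and no shim-side checks.

```java
import java.util.*;
import java.util.concurrent.CompletableFuture;

// Sketch of a hypothetical stable-callback API: publish a write to the
// shared EC store only after its dependencies have converged everywhere.
public class StableCallbackSketch {
    interface EcStore {
        CompletableFuture<Void> whenConverged(Set<String> keys); // assumed API
        void put(String key, String value);
    }

    void putWhenStable(EcStore store, List<String> localLog,
                       String key, String value, Set<String> deps) {
        localLog.add(key + "=" + value);             // client-local durability
        store.whenConverged(deps)                    // stable callback
             .thenRun(() -> store.put(key, value));  // publish, metadata-free
    }
}
```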

  186.                        Multi-versioning or    Stable Callback
                              Conditional Update
    Reduces Metadata               YES                   YES
    No Dependency Checks           NO                    YES

    Data Store             Multi-versioning or    Stable Callback
                           Conditional Update
    Amazon DynamoDB             YES                    NO
    Amazon S3                   NO                     NO
    Amazon SimpleDB             YES                    NO
    Amazon Dynamo               YES                    NO
    Cloudant Data Layer         YES                    NO
    Google App Engine           YES                    NO
    Apache Cassandra            NO                     NO
    Apache CouchDB              YES                    NO
    Basho Riak                  YES                    NO
    LinkedIn Voldemort          YES                    NO
    MongoDB                     YES                    NO
    Yahoo! PNUTS                YES                    NO
    ...not (yet) common to all stores

  187. Rethinking the EC API
    Our extreme approach (unmodified EC store)
    definitely impeded efficiency (but is portable)
    Opportunities to better define surgical
    improvements to API for future stores/shims!


  189. Bolt-on Causal Consistency
    Modular, “bolt-on” architecture cleanly separates
    safety and liveness
    upgraded EC (all liveness) to causal consistency,
    preserving HA, low latency, liveness
    Challenges: overwrites, managing causal order
    large design space:
    took an extreme here, but:
    room for exploration in EC API
    bolt-on transactions?

  190. (Some) Related Work
    • S3 DB [SIGMOD 2008]: foundational prior work building on EC stores;
    not causally consistent, not HA (e.g., RYW implementation), AWS-dependent
    (e.g., assumes queues)
    • 28msec architecture [SIGMOD Record 2009]: like the SIGMOD 2008 work,
    treats EC stores as cheap storage
    • Cloudy [VLDB 2010]: layered approach to data management, partitioning,
    load balancing, and messaging in middleware; larger focus: extensible
    query model, storage format, routing, etc.
    • G-Store [SoCC 2010]: client and middleware implementation of
    entity-grouped linearizable transaction support
    • Bermbach et al. middleware [IC2E 2013]: provides read-your-writes
    guarantees with caching
    • Causal Consistency: Bayou [SOSP 1997], Lazy Replication [TOCS 1992],
    COPS [SOSP 2011], Eiger [NSDI 2013], ChainReaction [EuroSys 2013], and
    Swift [INRIA] are all custom solutions for causal memory [Ga Tech 1993]
    (inspired by Lamport [CACM 1978])