Coordination Avoidance In Distributed Databases

pbailis
January 01, 2015

Job talk from early 2015

The rise of Internet-scale geo-replicated services has led to considerable upheaval in the design of modern data management systems. Namely, given the availability, latency, and throughput penalties associated with classic mechanisms such as serializable transactions, a broad class of systems (e.g., “NoSQL”) has sought weaker alternatives that reduce the use of expensive coordination during system operation, often at the cost of application integrity. When can we safely forgo the cost of this expensive coordination, and when must we pay the price?

In this talk, I will discuss the potential for coordination avoidance — the use of as little coordination as possible while ensuring application integrity — in several modern data-intensive domains. Specifically, I will demonstrate how to leverage the semantic requirements of applications in data serving, transaction processing, and statistical analytics to enable more efficient distributed algorithms and system designs. The prototype systems I have built demonstrate order-of-magnitude speedups compared to their traditional, coordinated counterparts on a variety of tasks, including referential integrity and index maintenance, transaction execution under common isolation models, and asynchronous convex optimization. I will also discuss our experiences studying and optimizing a range of open source applications and systems, which exhibit similar results.

Transcript

  1. COORDINATION
    AVOIDANCE

    IN

    DISTRIBUTED

    DATABASES
    PETER BAILIS
    UC Berkeley

    View Slide

  2. View Slide

  3. DATA TODAY:

    View Slide

  4. SCALE
    DATA TODAY:
    UNPRECEDENTED

    View Slide

  5. SCALE
    Billion-user Internet services
    3B Internet users in 2014
    2.3B Mobile broadband users
    DATA TODAY:
    UNPRECEDENTED
    Ericsson Mobility Report,
    UN International Telecommunication Union, Facebook, Google, NSA

    View Slide

  6. SCALE
    VOLUME
    Billion-user Internet services
    3B Internet users in 2014
    2.3B Mobile broadband users
    Facebook RocksDB: 9B ops/sec
    Google BigTable: 600M ops/sec
    LinkedIn Kafka: 2.5M ops/sec
    DATA TODAY:
    UNPRECEDENTED
    Ericsson Mobility Report,
    UN International Telecommunication Union, Facebook, Google, NSA, @RocksDB, @AKPurtell, Martin Kleppmann

    View Slide

  7. SCALE
    VOLUME
    INTERACTIVITY
    Billion-user Internet services
    3B Internet users in 2014
    2.3B Mobile broadband users
    Facebook RocksDB: 9B ops/sec
    Google BigTable: 600M ops/sec
    LinkedIn Kafka: 2.5M ops/sec
    Impatient users want low latency
    Always-on responsiveness
    Personalized user experiences
    DATA TODAY:
    UNPRECEDENTED
    Ericsson Mobility Report,
    UN International Telecommunication Union, Facebook, Google, NSA, @RocksDB, @AKPurtell, Martin Kleppmann

    View Slide

  8. SCALE
    VOLUME
    INTERACTIVITY
    DATA TODAY:
    UNPRECEDENTED

    View Slide

  9. SCALE
    VOLUME
    INTERACTIVITY
    AND GROWING!
    DATA TODAY:
    UNPRECEDENTED

    View Slide

  10. View Slide

  11. View Slide

  12. “post
    on
    timeline”
    “accept
    friend
    request”

    View Slide

  13. How should we design database systems
    that enable applications to scale?
    “post
    on
    timeline”
    “accept
    friend
    request”

    View Slide

  14. View Slide

  15. CLASSIC:

    ACID

    View Slide

  16. CLASSIC:

    ACID
    serializable transactions
    “accept
    friend
    request”
    “post
    on
    timeline”

    View Slide

  17. CLASSIC:

    ACID
    serializable transactions
    “accept
    friend
    request”
    “post
    on
    timeline”

    View Slide

  18. CLASSIC:

    ACID
    serializable transactions

    View Slide

  19. serializability: equivalence to some serial execution

    View Slide

  20. “post
    on
    timeline”
    serializability: equivalence to some serial execution

    View Slide

  21. “post
    on
    timeline”
    “accept
    friend
    request”
    serializability: equivalence to some serial execution

    View Slide

  22. “post
    on
    timeline”
    “accept
    friend
    request”
    serializability: equivalence to some serial execution
    very general!

    View Slide

  23. r(y)
    w(x←1)
    r(x)
    w(y←1)
    very general!
    serializability: equivalence to some serial execution

    View Slide

  24. r(y)
    w(x←1)
    r(x)
    w(y←1)
    very general!
    …but restricts concurrency
    serializability: equivalence to some serial execution

    View Slide

  25. serializability: equivalence to some serial execution
    very general!
    …but restricts concurrency

    View Slide

  26. serializability: equivalence to some serial execution
    very general!
    …but restricts concurrency
    CONCURRENT EXECUTION

    View Slide

  27. serializability: equivalence to some serial execution
    r(x)=0
    very general!
    …but restricts concurrency
    CONCURRENT EXECUTION

    View Slide

  28. serializability: equivalence to some serial execution
    r(x)=0
    r(y)=0
    very general!
    …but restricts concurrency
    CONCURRENT EXECUTION

    View Slide

  29. serializability: equivalence to some serial execution
    r(x)=0
    w(y←1)
    r(y)=0
    very general!
    …but restricts concurrency
    CONCURRENT EXECUTION

    View Slide

  30. serializability: equivalence to some serial execution
    r(x)=0
    w(x←1)
    w(y←1)
    r(y)=0
    very general!
    …but restricts concurrency
    CONCURRENT EXECUTION

    View Slide

  31. serializability: equivalence to some serial execution
    r(x)=0
    w(x←1)
    w(y←1)
    r(y)=0
    very general!
    …but restricts concurrency
    r(y)=0
    w(x←1)
    2
    r(x)=0
    w(y←1)
    1
    CONCURRENT EXECUTION

    View Slide

  32. serializability: equivalence to some serial execution
    r(x)=0
    w(x←1)
    w(y←1)
    r(y)=0
    very general!
    …but restricts concurrency
    Should have r(y)→1
    r(y)=0
    w(x←1)
    2
    r(x)=0
    w(y←1)
    1
    CONCURRENT EXECUTION

    View Slide

  33. serializability: equivalence to some serial execution
    r(x)=0
    w(x←1)
    w(y←1)
    r(y)=0
    very general!
    …but restricts concurrency
    Should have r(y)→1
    r(y)=0
    w(x←1)
    2
    r(x)=0
    w(y←1)
    1
    CONCURRENT EXECUTION

    View Slide

  34. serializability: equivalence to some serial execution
    r(x)=0
    w(x←1)
    w(y←1)
    r(y)=0
    very general!
    …but restricts concurrency
    Should have r(y)→1
    r(y)=0
    w(x←1)
    2
    r(x)=0
    w(y←1)
    1
    r(y)=0
    w(x←1)
    1
    r(x)=0
    w(y←1)
    2
    CONCURRENT EXECUTION

    View Slide

  35. serializability: equivalence to some serial execution
    r(x)=0
    w(x←1)
    w(y←1)
    r(y)=0
    very general!
    …but restricts concurrency
    Should have r(y)→1
    r(y)=0
    w(x←1)
    2
    r(x)=0
    w(y←1)
    1
    Should have r(x)→1
    r(y)=0
    w(x←1)
    1
    r(x)=0
    w(y←1)
    2
    CONCURRENT EXECUTION

    View Slide

  36. serializability: equivalence to some serial execution
    r(x)=0
    w(x←1)
    w(y←1)
    r(y)=0
    very general!
    …but restricts concurrency
    Should have r(y)→1
    r(y)=0
    w(x←1)
    2
    r(x)=0
    w(y←1)
    1
    Should have r(x)→1
    r(y)=0
    w(x←1)
    1
    r(x)=0
    w(y←1)
    2
    CONCURRENT EXECUTION

    View Slide

  37. serializability: equivalence to some serial execution
    r(x)=0
    w(x←1)
    w(y←1)
    r(y)=0
    very general!
    …but restricts concurrency
    Should have r(y)→1
    r(y)=0
    w(x←1)
    2
    r(x)=0
    w(y←1)
    1
    Should have r(x)→1
    r(y)=0
    w(x←1)
    1
    r(x)=0
    w(y←1)
    2
    CONCURRENT EXECUTION
    IS NOT SERIALIZABLE!

    View Slide

  38. serializability: equivalence to some serial execution
    r(x)=0
    w(x←1)
    w(y←1)
    r(y)=0
    very general!
    …but restricts concurrency
    transactions cannot make progress independently
    Serializability requires Coordination
    Should have r(y)→1
    r(y)=0
    w(x←1)
    2
    r(x)=0
    w(y←1)
    1
    Should have r(x)→1
    r(y)=0
    w(x←1)
    1
    r(x)=0
    w(y←1)
    2
    CONCURRENT EXECUTION
    IS NOT SERIALIZABLE!

    View Slide
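
The argument on the preceding slides can be checked mechanically. The following is a minimal sketch, assuming the two transactions are T1 = {r(y), w(x←1)} and T2 = {r(x), w(y←1)} and that both items start at 0. The names `run_serially` and `is_serializable` are hypothetical helpers, and the check compares only the values each transaction read, which is a simplification of the full definition.

```python
from itertools import permutations

# Each operation is ("r", item) or ("w", item, value).
T1 = [("r", "y"), ("w", "x", 1)]
T2 = [("r", "x"), ("w", "y", 1)]

def run_serially(order, initial):
    """Run whole transactions one after another; return the value each read saw."""
    db = dict(initial)
    reads = {}
    for name, ops in order:
        for op in ops:
            if op[0] == "r":
                reads[(name, op[1])] = db[op[1]]
            else:
                db[op[1]] = op[2]
    return reads

def is_serializable(observed_reads, transactions, initial):
    """True if some serial order produces exactly the observed reads."""
    return any(run_serially(order, initial) == observed_reads
               for order in permutations(transactions))

transactions = [("T1", T1), ("T2", T2)]
initial = {"x": 0, "y": 0}

# The concurrent execution from the slides: both reads return 0.
concurrent = {("T1", "y"): 0, ("T2", "x"): 0}
print(is_serializable(concurrent, transactions, initial))
```

Either serial order lets the second transaction observe the first one's write, so one of the two reads would have to return 1. The concurrent outcome matches neither order.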

  39. transactions cannot make progress independently
    Serializability requires Coordination

    View Slide

  40. transactions cannot make progress independently
    Serializability requires Coordination
    Two-Phase Locking
    Optimistic Concurrency Control
    Pre-Scheduling
    Multi-Version Concurrency Control

    View Slide

  41. transactions cannot make progress independently
    Serializability requires Coordination
    Two-Phase Locking
    Optimistic Concurrency Control
    Pre-Scheduling
    Multi-Version Concurrency Control
    Blocking
    Waiting
    Aborts

    View Slide

  42. transactions cannot make progress independently
    Serializability requires Coordination
    Two-Phase Locking
    Optimistic Concurrency Control
    Pre-Scheduling
    Multi-Version Concurrency Control
    Blocking
    Waiting
    Aborts
    Costs of Coordination
    Between Concurrent Transactions

    View Slide

  43. 1. Decreased performance
    transactions cannot make progress independently
    Serializability requires Coordination
    Two-Phase Locking
    Optimistic Concurrency Control
    Pre-Scheduling
    Multi-Version Concurrency Control
    Blocking
    Waiting
    Aborts
    Costs of Coordination
    Between Concurrent Transactions

    View Slide

  44. View Slide

  45. View Slide

  46. View Slide

  47. [Plot: Maximum Throughput (txns/s), axis 0 to 1200, vs. Number of Servers in Transaction, 2 to 8]
    Local datacenter (Amazon EC2)
    Based on [Bobtail, Xu et al., NSDI 13]
    For conflicting transactions

    View Slide

  48. [Plot: Maximum Throughput (txns/s), axis 0 to 1200, vs. Number of Servers in Transaction, 2 to 8]
    Local datacenter (Amazon EC2)
    Based on [Bobtail, Xu et al., NSDI 13]
    For conflicting transactions

    View Slide

  49. [Plot: Maximum Throughput (txns/s), axis 0 to 1200, vs. Number of Servers in Transaction, 2 to 8]
    Local datacenter (Amazon EC2)
    Based on [Bobtail, Xu et al., NSDI 13]
    [Plot: Maximum Throughput (txn/s), axis 2 to 12, vs. Participating Datacenters (+VA): +OR, +CA, +IR, +SP, +TO, +SI, +SY]
    Multi-datacenter (Amazon EC2)
    Based on [HAT, Bailis et al., VLDB 14]
    For conflicting transactions

    View Slide

  50. [Plot: Maximum Throughput (txns/s), axis 0 to 1200, vs. Number of Servers in Transaction, 2 to 8]
    Local datacenter (Amazon EC2)
    Based on [Bobtail, Xu et al., NSDI 13]
    [Plot: Maximum Throughput (txn/s), axis 2 to 12, vs. Participating Datacenters (+VA): +OR, +CA, +IR, +SP, +TO, +SI, +SY]
    Multi-datacenter (Amazon EC2)
    Based on [HAT, Bailis et al., VLDB 14]
    For conflicting transactions

    View Slide

  51. [Plot: Maximum Throughput (txns/s), axis 0 to 1200, vs. Number of Servers in Transaction, 2 to 8]
    Local datacenter (Amazon EC2)
    Based on [Bobtail, Xu et al., NSDI 13]
    [Plot: Maximum Throughput (txn/s), axis 2 to 12, vs. Participating Datacenters (+VA): +OR, +CA, +IR, +SP, +TO, +SI, +SY]
    Multi-datacenter (Amazon EC2)
    Based on [HAT, Bailis et al., VLDB 14]
    For conflicting transactions

    View Slide
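
A back-of-the-envelope sketch of why throughput for conflicting transactions is so sensitive to network distance. It assumes, as a simplification not stated on the slides, that conflicting transactions run strictly one at a time and that each occupies the contended item for one full commit round. The function name and the two latencies are hypothetical illustration values, not measurements from the plots.

```python
def max_conflicting_throughput(commit_latency_ms):
    """Upper bound on txns/s when conflicting transactions execute one at a time."""
    return 1000.0 / commit_latency_ms

# Hypothetical commit-round latencies, chosen only to show the scaling.
for label, latency_ms in [("same datacenter", 1.0), ("across continents", 100.0)]:
    print(label, max_conflicting_throughput(latency_ms), "txns/s")
```

Under this model, a hundredfold increase in commit latency caps throughput a hundredfold lower, regardless of how many servers are added.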

  52. 1. Decreased performance
    » due to waiting, communication delays, aborts
    » exacerbated in distributed environment!
    2. Decreased availability during failures
    transactions cannot make progress independently
    Serializability requires Coordination
    Costs of Coordination
    Between Concurrent Transactions

    View Slide

  53. 1. Decreased performance
    » due to waiting, communication delays, aborts
    » exacerbated in distributed environment!
    2. Decreased availability during failures
    transactions cannot make progress independently
    Serializability requires Coordination
    Costs of Coordination
    Between Concurrent Transactions

    View Slide

  54. 1. Decreased performance
    » due to waiting, communication delays, aborts
    » exacerbated in distributed environment!
    2. Decreased availability during failures
    transactions cannot make progress independently
    Serializability requires Coordination
    Costs of Coordination
    Between Concurrent Transactions

    View Slide

  55. 1. Decreased performance
    » due to waiting, communication delays, aborts
    » exacerbated in distributed environment!
    2. Decreased availability during failures
    transactions cannot make progress independently
    Serializability requires Coordination
    Costs of Coordination
    Between Concurrent Transactions

    View Slide

  56. 1. Decreased performance
    » due to waiting, communication delays, aborts
    » exacerbated in distributed environment!
    2. Decreased availability during failures
    transactions cannot make progress independently
    Serializability requires Coordination
    Well-known for decades; cf. “CAP”
    Costs of Coordination
    Between Concurrent Transactions

    View Slide

  57. How should we design database systems
    that enable applications to scale?

    View Slide

  58. Serializability
    COORDINATION
    REQUIRED
    How should we design database systems
    that enable applications to scale?

    View Slide

  59. Serializability
    COORDINATION
    REQUIRED
    “NoSQL”
    COORDINATION
    FREE
    How should we design database systems
    that enable applications to scale?

    View Slide

  60. NoSQL

    View Slide

  61. NoSQL

    View Slide

  62. View Slide

  63. Eventual Consistency
    “if no new updates are made to the [database],
    eventually all accesses will return the last updated value[s]”
    — Werner Vogels, Amazon CTO

    View Slide

  64. Eventual Consistency
    “if no new updates are made to the [database],
    eventually all accesses will return the last updated value[s]”
    — Werner Vogels, Amazon CTO

    View Slide

  65. Eventual Consistency
    “if no new updates are made to the [database],
    eventually all accesses will return the last updated value[s]”
    — Werner Vogels, Amazon CTO

    View Slide

  66. Eventual Consistency
    “if no new updates are made to the [database],
    eventually all accesses will return the last updated value[s]”
    — Werner Vogels, Amazon CTO

    View Slide

  67. Eventual Consistency
    “if no new updates are made to the [database],
    eventually all accesses will return the last updated value[s]”
    — Werner Vogels, Amazon CTO
    provides no safety: what happens in the meantime?

    View Slide

  68. [VLDB 2012, VLDB Journal 2014 “Best of VLDB 2012”,
    SIGMOD 2013 (Demo), CACM Research Highlight]
    Probabilistically Bounded Staleness (PBS)

    View Slide

  69. [VLDB 2012, VLDB Journal 2014 “Best of VLDB 2012”,
    SIGMOD 2013 (Demo), CACM Research Highlight]
    Probabilistically Bounded Staleness (PBS)
    » Monte Carlo analysis of protocol behavior

    View Slide

  70. [VLDB 2012, VLDB Journal 2014 “Best of VLDB 2012”,
    SIGMOD 2013 (Demo), CACM Research Highlight]
    Probabilistically Bounded Staleness (PBS)
    » Monte Carlo analysis of protocol behavior
    » Key finding: frequently “correct” results…

    View Slide

  71. [VLDB 2012, VLDB Journal 2014 “Best of VLDB 2012”,
    SIGMOD 2013 (Demo), CACM Research Highlight]
    Probabilistically Bounded Staleness (PBS)
    » Monte Carlo analysis of protocol behavior
    » Key finding: frequently “correct” results…
    PBS: Voldemort Database at LinkedIn
    99% of reads return the last update 23ms after write

    View Slide

  72. [VLDB 2012, VLDB Journal 2014 “Best of VLDB 2012”,
    SIGMOD 2013 (Demo), CACM Research Highlight]
    Probabilistically Bounded Staleness (PBS)
    » Monte Carlo analysis of protocol behavior
    » Key finding: frequently “correct” results…
    PBS: Voldemort Database at LinkedIn
    99% of reads return the last update 23ms after write
    32-90% decrease in 99.9th percentile latency

    View Slide

  73. [VLDB 2012, VLDB Journal 2014 “Best of VLDB 2012”,
    SIGMOD 2013 (Demo), CACM Research Highlight]
    Probabilistically Bounded Staleness (PBS)
    » Monte Carlo analysis of protocol behavior
    » Key finding: frequently “correct” results…
    PBS: Voldemort Database at LinkedIn
    99% of reads return the last update 23ms after write
    32-90% decrease in 99.9th percentile latency

    View Slide

  74. [VLDB 2012, VLDB Journal 2014 “Best of VLDB 2012”,
    SIGMOD 2013 (Demo), CACM Research Highlight]
    Probabilistically Bounded Staleness (PBS)
    » Monte Carlo analysis of protocol behavior
    » Key finding: frequently “correct” results…
    PBS: Voldemort Database at LinkedIn
    99% of reads return the last update 23ms after write
    32-90% decrease in 99.9th percentile latency
    …BUT NO GUARANTEES!
    ⇒ DIFFICULT TO PROGRAM

    View Slide
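
To give a flavor of what a Monte Carlo staleness analysis looks like, here is a minimal sketch under a deliberately simplified model; it is not the model used in the PBS work. It assumes N replicas, a write that commits once W replicas have received it, exponentially distributed propagation delays, and a read that instantly polls R replicas chosen at random. The function name, parameters, and delay distribution are all assumptions.

```python
import random

def prob_read_sees_write(n, r, w, t_ms, mean_delay_ms=5.0, trials=20000, seed=0):
    """Estimate P(read issued t_ms after commit returns the latest write)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Time at which the write reaches each replica (assumed exponential).
        arrivals = sorted(rng.expovariate(1.0 / mean_delay_ms) for _ in range(n))
        commit_time = arrivals[w - 1]      # the w-th replica has the write
        read_time = commit_time + t_ms
        contacted = rng.sample(arrivals, r)  # read polls r random replicas
        if any(arrival <= read_time for arrival in contacted):
            hits += 1
    return hits / trials

# Partial quorums (R + W <= N): staleness is possible but shrinks quickly with t.
for t in (0, 5, 20, 50):
    print(t, "ms:", prob_read_sees_write(n=3, r=1, w=1, t_ms=t))
```

With overlapping quorums (R + W > N) every read in this model sees the write. With partial quorums the estimate rises toward 1 as the wait grows, which is a probabilistic expectation and not a guarantee.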

  75. View Slide

  76. View Slide

  77. “…sometimes the [write] is
    retrieved from the datastore and
    sometimes it is not.”

    View Slide

  78. Serializability
    COORDINATION
    REQUIRED
    GUARANTEED
    SAFETY
    Eventual
    Consistency
    COORDINATION
    FREE
    NO SAFETY

    View Slide

  79. Serializability
    COORDINATION
    REQUIRED
    GUARANTEED
    SAFETY
    Eventual
    Consistency
    COORDINATION
    FREE
    NO SAFETY
    PBS
    VLDB12, VLDBJ14,
    SIGMOD13, CACM14

    View Slide

  80. Serializability
    COORDINATION
    REQUIRED
    GUARANTEED
    SAFETY
    Eventual
    Consistency
    COORDINATION
    FREE
    NO SAFETY
    COORDINATION AVOIDANCE
    PBS
    VLDB12, VLDBJ14,
    SIGMOD13, CACM14
    MY WORK:

    View Slide

  81. Serializability
    COORDINATION
    REQUIRED
    GUARANTEED
    SAFETY
    Eventual
    Consistency
    COORDINATION
    FREE
    NO SAFETY
    COORDINATION AVOIDANCE
    GUARANTEED SAFETY WITHOUT COORDINATION
    PBS
    VLDB12, VLDBJ14,
    SIGMOD13, CACM14
    MY WORK:

    View Slide

  82. The Far Side,
    Gary Larson

    View Slide

  83. View Slide

  84. WHAT THE APPLICATION SAYS
    “post
    on
    timeline”
    “accept
    friend
    request”

    View Slide

  85. WHAT THE APPLICATION SAYS
    “post
    on
    timeline”
    “accept
    friend
    request”
    write read
    write
    read
    write
    write
    read
    write
    write
    write
    read
    write
    WHAT THE DATABASE HEARS
    read
    read
    read
    read
    read
    read

    View Slide

  86. View Slide

  87. DESIGN DATABASE SYSTEMS
    THAT EXPLOIT SEMANTICS OF
    HIGH-VALUE USE CASES
    MY APPROACH:

    View Slide

  88. DESIGN DATABASE SYSTEMS
    THAT EXPLOIT SEMANTICS OF
    HIGH-VALUE USE CASES
    MY APPROACH:
    Study practical database use cases

    View Slide

  89. DESIGN DATABASE SYSTEMS
    THAT EXPLOIT SEMANTICS OF
    HIGH-VALUE USE CASES
    MY APPROACH:
    Study practical database use cases
    Derive principles and algorithms

    View Slide

  90. DESIGN DATABASE SYSTEMS
    THAT EXPLOIT SEMANTICS OF
    HIGH-VALUE USE CASES
    MY APPROACH:
    Study practical database use cases
    Derive principles and algorithms
    Build systems to realize the benefits

    View Slide

  91. DESIGN DATABASE SYSTEMS
    THAT EXPLOIT SEMANTICS OF
    HIGH-VALUE USE CASES
    MY APPROACH:
    Study practical database use cases
    Derive principles and algorithms
    Build systems to realize the benefits

    View Slide

  92. Serializability
    COORDINATION
    REQUIRED
    GUARANTEED
    SAFETY
    Eventual
    Consistency
    COORDINATION
    FREE
    NO SAFETY
    COORDINATION AVOIDANCE
    GUARANTEED SAFETY WITHOUT COORDINATION
    PBS
    VLDB12, VLDBJ14,
    SIGMOD13, CACM14

    View Slide

  93. Serializability
    COORDINATION
    REQUIRED
    GUARANTEED
    SAFETY
    Eventual
    Consistency
    COORDINATION
    FREE
    NO SAFETY
    COORDINATION AVOIDANCE
    GUARANTEED SAFETY WITHOUT COORDINATION
    MORE SAFETY
    PBS
    VLDB12, VLDBJ14,
    SIGMOD13, CACM14

    View Slide

  94. Serializability
    COORDINATION
    REQUIRED
    GUARANTEED
    SAFETY
    Eventual
    Consistency
    COORDINATION
    FREE
    NO SAFETY
    COORDINATION AVOIDANCE
    GUARANTEED SAFETY WITHOUT COORDINATION
    MORE SEMANTICS
    MORE SAFETY
    PBS
    VLDB12, VLDBJ14,
    SIGMOD13, CACM14

    View Slide

  95. Causality
    SOCC12, SIGMOD13
    Serializability
    COORDINATION
    REQUIRED
    GUARANTEED
    SAFETY
    Eventual
    Consistency
    COORDINATION
    FREE
    NO SAFETY
    COORDINATION AVOIDANCE
    GUARANTEED SAFETY WITHOUT COORDINATION
    MORE SEMANTICS
    MORE SAFETY
    PBS
    VLDB12, VLDBJ14,
    SIGMOD13, CACM14

    View Slide

  96. Weak Isolation
    HotOS13, VLDB14
    Causality
    SOCC12, SIGMOD13
    Serializability
    COORDINATION
    REQUIRED
    GUARANTEED
    SAFETY
    Eventual
    Consistency
    COORDINATION
    FREE
    NO SAFETY
    COORDINATION AVOIDANCE
    GUARANTEED SAFETY WITHOUT COORDINATION
    MORE SEMANTICS
    MORE SAFETY
    PBS
    VLDB12, VLDBJ14,
    SIGMOD13, CACM14

    View Slide

  97. Atomic Visibility
    SIGMOD14
    Weak Isolation
    HotOS13, VLDB14
    Causality
    SOCC12, SIGMOD13
    Serializability
    COORDINATION
    REQUIRED
    GUARANTEED
    SAFETY
    Eventual
    Consistency
    COORDINATION
    FREE
    NO SAFETY
    COORDINATION AVOIDANCE
    GUARANTEED SAFETY WITHOUT COORDINATION
    MORE SEMANTICS
    MORE SAFETY
    PBS
    VLDB12, VLDBJ14,
    SIGMOD13, CACM14

    View Slide

  98. Atomic Visibility
    SIGMOD14
    Database
    Constraints
    VLDB15, SIGMOD15
    Weak Isolation
    HotOS13, VLDB14
    Causality
    SOCC12, SIGMOD13
    Serializability
    COORDINATION
    REQUIRED
    GUARANTEED
    SAFETY
    Eventual
    Consistency
    COORDINATION
    FREE
    NO SAFETY
    COORDINATION AVOIDANCE
    GUARANTEED SAFETY WITHOUT COORDINATION
    MORE SEMANTICS
    MORE SAFETY
    PBS
    VLDB12, VLDBJ14,
    SIGMOD13, CACM14

    View Slide

  99. Atomic Visibility
    SIGMOD14
    Database
    Constraints
    VLDB15, SIGMOD15
    Weak Isolation
    HotOS13, VLDB14
    Causality
    SOCC12, SIGMOD13
    COORDINATION AVOIDANCE
    GUARANTEED SAFETY WITHOUT COORDINATION

    View Slide

  100. Atomic Visibility
    SIGMOD14
    Database
    Constraints
    VLDB15, SIGMOD15
    Weak Isolation
    HotOS13, VLDB14
    Causality
    SOCC12, SIGMOD13
    COORDINATION AVOIDANCE
    GUARANTEED SAFETY WITHOUT COORDINATION
    Data Serving and Transactions

    View Slide

  101. Atomic Visibility
    SIGMOD14
    Database
    Constraints
    VLDB15, SIGMOD15
    Weak Isolation
    HotOS13, VLDB14
    Causality
    SOCC12, SIGMOD13
    COORDINATION AVOIDANCE
    GUARANTEED SAFETY WITHOUT COORDINATION
    Data Serving and Transactions
    Model Prediction
    and Training
    CIDR15, TBA
    Analytics

    View Slide

  102. Atomic Visibility
    SIGMOD14
    Database
    Constraints
    VLDB15, SIGMOD15
    Model Prediction
    and Training
    CIDR15, TBA
    Weak Isolation
    HotOS13, VLDB14
    Causality
    SOCC12, SIGMOD13
    Serializability
    COORDINATION
    REQUIRED
    GUARANTEED
    SAFETY
    Eventual
    Consistency
    COORDINATION
    FREE
    NO SAFETY
    COORDINATION AVOIDANCE
    GUARANTEED SAFETY WITHOUT COORDINATION
    MORE SEMANTICS
    MORE SAFETY
    PBS
    VLDB12, VLDBJ14,
    SIGMOD13, CACM14

    View Slide

  103. Atomic Visibility
    SIGMOD14
    Database
    Constraints
    VLDB15, SIGMOD15
    Model Prediction
    and Training
    CIDR15, TBA
    Weak Isolation
    HotOS13, VLDB14
    Causality
    SOCC12, SIGMOD13
    Serializability
    COORDINATION
    REQUIRED
    GUARANTEED
    SAFETY
    Eventual
    Consistency
    COORDINATION
    FREE
    NO SAFETY
    COORDINATION AVOIDANCE
    GUARANTEED SAFETY WITHOUT COORDINATION
    MORE SEMANTICS
    MORE SAFETY
    PBS
    VLDB12, VLDBJ14,
    SIGMOD13, CACM14

    View Slide

  104. Atomic Visibility
    SIGMOD14
    Database
    Constraints
    VLDB15, SIGMOD15
    Model Prediction
    and Training
    CIDR15, TBA
    Weak Isolation
    HotOS13, VLDB14
    Causality
    SOCC12, SIGMOD13
    Serializability
    COORDINATION
    REQUIRED
    GUARANTEED
    SAFETY
    Eventual
    Consistency
    COORDINATION
    FREE
    NO SAFETY
    COORDINATION AVOIDANCE
    GUARANTEED SAFETY WITHOUT COORDINATION
    MORE SEMANTICS
    MORE SAFETY
    PBS
    VLDB12, VLDBJ14,
    SIGMOD13, CACM14
    COORDINATION FREE

    View Slide

  105. (Abridged) Related Work

    View Slide

  106. (Abridged) Related Work
    » Semantics-based concurrency control: esp.
    commutativity and CALM analysis, laws of order
    » Available storage systems: optimistic replication,
    causal memory, CRDTs, eventually consistent transactions
    » Distributed computing: CAP, FLP, NBAC, quorums

    View Slide

  107. (Abridged) Related Work
    » Semantics-based concurrency control: esp.
    commutativity and CALM analysis, laws of order
    » Available storage systems: optimistic replication,
    causal memory, CRDTs, eventually consistent transactions
    » Distributed computing: CAP, FLP, NBAC, quorums
    » Here: focus on necessary coordination for
    common, modern data-intensive apps

    View Slide

  108. Serializability
    COORDINATION
    REQUIRED
    GUARANTEED
    SAFETY
    Eventual
    Consistency
    COORDINATION
    FREE
    NO SAFETY
    Atomic Visibility
    SIGMOD14
    Database
    Constraints
    VLDB15, SIGMOD15
    Model Prediction
    and Training
    CIDR15, TBA
    Weak Isolation
    HotOS13, VLDB14
    Causality
    SOCC12, SIGMOD13
    COORDINATION AVOIDANCE
    GUARANTEED SAFETY WITHOUT COORDINATION
    MORE SEMANTICS
    MORE SAFETY
    PBS
    VLDB12, VLDBJ14,
    SIGMOD13, CACM14
    COORDINATION FREE

    View Slide

  109. Serializability
    COORDINATION
    REQUIRED
    GUARANTEED
    SAFETY
    Eventual
    Consistency
    COORDINATION
    FREE
    NO SAFETY
    Atomic Visibility
    SIGMOD14
    Database
    Constraints
    VLDB15, SIGMOD15
    Model Prediction
    and Training
    CIDR15, TBA
    Weak Isolation
    HotOS13, VLDB14
    Causality
    SOCC12, SIGMOD13
    COORDINATION AVOIDANCE
    GUARANTEED SAFETY WITHOUT COORDINATION
    MORE SEMANTICS
    MORE SAFETY
    PBS
    VLDB12, VLDBJ14,
    SIGMOD13, CACM14
    COORDINATION FREE
    1

    View Slide

  110. Serializability
    COORDINATION
    REQUIRED
    GUARANTEED
    SAFETY
    Eventual
    Consistency
    COORDINATION
    FREE
    NO SAFETY
    Atomic Visibility
    SIGMOD14
    Database
    Constraints
    VLDB15, SIGMOD15
    Model Prediction
    and Training
    CIDR15, TBA
    Weak Isolation
    HotOS13, VLDB14
    Causality
    SOCC12, SIGMOD13
    COORDINATION AVOIDANCE
    GUARANTEED SAFETY WITHOUT COORDINATION
    MORE SEMANTICS
    MORE SAFETY
    PBS
    VLDB12, VLDBJ14,
    SIGMOD13, CACM14
    COORDINATION FREE
    1
    2

    View Slide

  111. Serializability
    COORDINATION
    REQUIRED
    GUARANTEED
    SAFETY
    Eventual
    Consistency
    COORDINATION
    FREE
    NO SAFETY
    Atomic Visibility
    SIGMOD14
    Database
    Constraints
    VLDB15, SIGMOD15
    Model Prediction
    and Training
    CIDR15, TBA
    Weak Isolation
    HotOS13, VLDB14
    Causality
    SOCC12, SIGMOD13
    COORDINATION AVOIDANCE
    GUARANTEED SAFETY WITHOUT COORDINATION
    MORE SEMANTICS
    MORE SAFETY
    PBS
    VLDB12, VLDBJ14,
    SIGMOD13, CACM14
    COORDINATION FREE
    1
    2 3

    View Slide

  112. Serializability
    COORDINATION
    REQUIRED
    GUARANTEED
    SAFETY
    Eventual
    Consistency
    COORDINATION
    FREE
    NO SAFETY
    Atomic Visibility
    SIGMOD14
    Database
    Constraints
    VLDB15, SIGMOD15
    Model Prediction
    and Training
    CIDR15, TBA
    Weak Isolation
    HotOS13, VLDB14
    Causality
    SOCC12, SIGMOD13
    COORDINATION AVOIDANCE
    GUARANTEED SAFETY WITHOUT COORDINATION
    MORE SEMANTICS
    MORE SAFETY
    PBS
    VLDB12, VLDBJ14,
    SIGMOD13, CACM14
    COORDINATION FREE
    1

    View Slide

  113. Social Graph

    View Slide

  114. Social Graph

    View Slide

  115. Social Graph
    Facebook

    View Slide

  116. Social Graph
    1.2B+ vertices
    Facebook

    View Slide

  117. Social Graph
    1.2B+ vertices
    420B+ edges
    Facebook

    View Slide

  118. Social Graph
    1.2B+ vertices
    420B+ edges
    Facebook

    View Slide

  119. Social Graph
    1
    2
    3
    4
    5
    6
    User
    Facebook
    1.2B+ vertices
    420B+ edges

    View Slide

  120. Social Graph
    1
    2
    3
    4
    5
    6
    2, 3, 5
    User Adjacency List
    1, 3, 5
    1, 5, 6
    6
    1, 2, 3, 6
    3, 4, 5
    Facebook
    1.2B+ vertices
    420B+ edges

    View Slide

  121. Social Graph
    1 2, 3, 5
    User Adjacency List
    2 1, 3, 5
    3 1, 5, 6
    4 6
    5 1, 2, 3, 6
    6 3, 4, 5
    1.2B+ vertices
    420B+ edges
    Facebook

    View Slide

  122. 1 2, 3, 5 6 3, 4, 5

    View Slide

  123. 1 2, 3, 5 6 3, 4, 5

    View Slide

  124. 1 2, 3, 5 6 3, 4, 5
    ,6 ,1

    View Slide

  125. 1 2, 3, 5 6 3, 4, 5
    ,6 ,1
    To preserve graph,
    should observe either:
    » Both links
    » Neither link

    View Slide

  126. 1 2, 3, 5 6 3, 4, 5
    ,6 ,1
    To preserve graph,
    should observe either:
    » Both links
    » Neither link
    Atomic Visibility

    View Slide
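
The "both links or neither link" requirement can be sketched with the two adjacency lists shown on the slide. This is an illustrative model only: `befriend` and `observes_atomically` are hypothetical helper names, and the snapshots stand in for whatever a concurrent reader might see.

```python
# Adjacency lists for users 1 and 6, as on the slide.
before = {1: [2, 3, 5], 6: [3, 4, 5]}

def befriend(adjacency, a, b):
    """Return the two writes made by one 'accept friend request' transaction."""
    return {a: adjacency[a] + [b], b: adjacency[b] + [a]}

def observes_atomically(snapshot, a, b):
    """True if the reader sees both directions of the a-b link, or neither."""
    return (b in snapshot[a]) == (a in snapshot[b])

writes = befriend(before, 1, 6)

both = {**before, **writes}            # reader sees both updates
neither = dict(before)                 # reader sees no update
fractured = {**before, 1: writes[1]}   # reader sees only user 1's update

for name, snap in [("both", both), ("neither", neither), ("fractured", fractured)]:
    print(name, observes_atomically(snap, 1, 6))
```

Only the last snapshot breaks the graph: user 1 lists user 6 as a friend while user 6 does not list user 1.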

  127. Atomic Visibility

    View Slide

  128. Atomic Visibility
    either all or none of each transaction’s updates
    should be visible to other transactions

    View Slide

  129. Atomic Visibility
    either all or none of each transaction’s updates
    should be visible to other transactions

    View Slide

  130. Atomic Visibility
    either all or none of each transaction’s updates
    should be visible to other transactions
    WRITE X = 1
    WRITE Y = 1

    View Slide

  131. Atomic Visibility
    either all or none of each transaction’s updates
    should be visible to other transactions
    WRITE X = 1
    WRITE Y = 1
    READ X = 1
    READ Y = 1
    OR
    READ X =
    READ Y =

    View Slide

  132. Atomic Visibility
    either all or none of each transaction’s updates
    should be visible to other transactions
    WRITE X = 1
    WRITE Y = 1
    READ X = 1
    READ Y = 1
    OR
    READ X =
    READ Y =

    View Slide

  133. Atomic Visibility
    either all or none of each transaction’s updates
    should be visible to other transactions
    READ X = 1
    READ Y = 1
    OR
    READ X =
    READ Y =

    View Slide

  134. Atomic Visibility
    either all or none of each transaction’s updates
    should be visible to other transactions
    READ X = 1
    READ Y = 1
    OR
    READ X =
    READ Y =
    BUT NOT
    READ X = 1
    READ Y =
    OR
    READ X =
    READ Y = 1

    View Slide

  135. Atomic Visibility
    either all or none of each transaction’s updates
    should be visible to other transactions
    READ X = 1
    READ Y = 1
    OR
    READ X =
    READ Y =
    BUT NOT (“FRACTURED READS”)
    READ X = 1
    READ Y =
    OR
    READ X =
    READ Y = 1

    View Slide
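
The "both or neither" rule on the slides above can be checked mechanically. The following is a minimal sketch, not code from the talk: `is_fractured` and the dict-based representation of a write set and of a reader's observations are assumptions made for illustration, and the check assumes every written value differs from the value it overwrites.

```python
def is_fractured(observed, txn_writes):
    """Return True if a reader saw some, but not all, of one transaction's writes.

    observed:   key -> value returned to the reading transaction
    txn_writes: key -> value written by a single writing transaction
    Hypothetical helper; assumes each written value differs from the old value.
    """
    # For every key the reader touched, did it see the transaction's write?
    visible = [observed[k] == v for k, v in txn_writes.items() if k in observed]
    # Atomic visibility allows all-visible or none-visible, nothing in between.
    return any(visible) and not all(visible)


t1 = {"x": 1, "y": 1}                                # WRITE X = 1, WRITE Y = 1
both = is_fractured({"x": 1, "y": 1}, t1)            # reader saw both writes
neither = is_fractured({"x": None, "y": None}, t1)   # reader saw neither write
mixed = is_fractured({"x": 1, "y": None}, t1)        # the "BUT NOT" case
```

Only the `mixed` observation is a fractured read; the other two satisfy atomic visibility.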

  136. Atomic Visibility
    is sufficient to correctly maintain:
    social graph structure

    View Slide

  137. r(x)=0
    w(x←1)
    w(y←1)
    r(y)=0
    Should have
    r(y)→1
    r(y)=0
    w(x←1)
    2
    r(x)=0
    w(y←1)
    1
    Should have
    r(x)→1
    r(y)=0
    w(x←1)
    1
    r(x)=0
    w(y←1)
    2
    CONCURRENT EXECUTION
    IS NOT SERIALIZABLE!
    Atomic Visibility
    is not serializability!

    View Slide

  138. r(x)=0
    w(x←1)
    w(y←1)
    r(y)=0
    Should have
    r(y)→1
    r(y)=0
    w(x←1)
    2
    r(x)=0
    w(y←1)
    1
    Should have
    r(x)→1
    r(y)=0
    w(x←1)
    1
    r(x)=0
    w(y←1)
    2
    CONCURRENT EXECUTION
    IS NOT SERIALIZABLE!
    Atomic Visibility
    is not serializability!
    …but respects
    Atomic Visibility!

    View Slide

  139. Atomic Visibility compared

    | Model | Fractured Reads | Item Anti-Dependency Cycles | Anti-Dependency Cycles |
    |---|---|---|---|
    | Serializability | Prevents | Prevents | Prevents |
    | Snapshot Isolation | Prevents | Prevents | Doesn’t prevent |
    | Atomic Visibility via Read Atomic | Prevents | Doesn’t prevent | Doesn’t prevent |
    | Eventual Consistency | Doesn’t prevent | Doesn’t prevent | Doesn’t prevent |

    View Slide

  140. Atomic Visibility compared

    | Model | Fractured Reads | Item Anti-Dependency Cycles | Anti-Dependency Cycles |
    |---|---|---|---|
    | Serializability | Prevents | Prevents | Prevents |
    | Snapshot Isolation | Prevents | Prevents | Doesn’t prevent |
    | Atomic Visibility via Read Atomic | Prevents | Doesn’t prevent | Doesn’t prevent |
    | Eventual Consistency | Doesn’t prevent | Doesn’t prevent | Doesn’t prevent |

    WANT TO PREVENT

    View Slide

  146. Atomic Visibility compared

    | Model | Fractured Reads | Item Anti-Dependency Cycles | Anti-Dependency Cycles |
    |---|---|---|---|
    | Serializability | Prevents | Prevents | Prevents |
    | Snapshot Isolation | Prevents | Prevents | Doesn’t prevent |
    | Atomic Visibility via Read Atomic | Prevents | Doesn’t prevent | Doesn’t prevent |
    | Eventual Consistency | Doesn’t prevent | Doesn’t prevent | Doesn’t prevent |

    Require coordination to prevent! [VLDB 2014]
    WANT TO PREVENT

    View Slide

  149. Atomic Visibility
    is sufficient to correctly maintain:
    social graph structure

    View Slide

  150. Also applies to other
    relationships

    View Slide

  151. Also applies to other
    relationships
    an attending
    doctor
    should
    have
    each
    patient

    View Slide

  152. Atomic Visibility
    is sufficient to correctly maintain:
    social graph structure

    View Slide

  153. Atomic Visibility
    is sufficient to correctly maintain:
    referential integrity
    secondary indexes
    materialized views
    social graph structure

    View Slide

  154. Atomic Visibility
    is sufficient to correctly maintain:
    referential integrity
    secondary indexes
    materialized views
    despite being weaker than serializability
    social graph structure

    View Slide

  155. Atomic Visibility via Locking

    View Slide

  156. Atomic Visibility via Locking
    X=0 Y=0
    X = 1
    W
    Y = 1
    W

    View Slide

  157. Atomic Visibility via Locking
    X = 1
    W
    Y = 1
    W
    X=1 Y=1

    View Slide

  158. Atomic Visibility via Locking
    X = 1
    R
    Y = 1
    R
    X = 1
    W
    Y = 1
    W
    X=1 Y=1

    View Slide

  159. Atomic Visibility via Locking
    X = 1
    W
    Y = 1
    W
    Y=0
    X=1

    View Slide

  160. Atomic Visibility via Locking
    X = ?
    R
    X = 1
    W
    Y = 1
    W
    Y=0
    Y = ?
    R
    X=1

    View Slide

  161. Atomic Visibility via Locking
    X = ?
    R
    X = 1
    W
    Y = 1
    W
    Y=0
    Y = ?
    R
    X=1
    Server 1001 Server 1002

    View Slide

  164. View Slide

  165. T
    I
    M
    E

    View Slide

  166. LOCKING
    W(Y)
    R(X)
    R(Y)
    W(X)
    T
    I
    M
    E

    View Slide

  167. LOCKING
    W(Y)
    R(X)
    R(Y)
    W(X)
    ATOMICITY
    VIOLATED!
    T
    I
    M
    E

    View Slide

  169. LOCKING
    W(Y)
    R(X)
    R(Y)
    W(X)
    W(Y)
    R(X)
    R(Y)
    W(X)
    ATOMICITY
    VIOLATED!
    T
    I
    M
    E
    OPTIMISTIC

    View Slide

  170. Y
    X
    LOCKING
    W(Y)
    R(X)
    R(Y)
    W(X)
    W(Y)
    R(X)
    R(Y)
    W(X)
    ATOMICITY
    VIOLATED!
    T
    I
    M
    E
    OPTIMISTIC
    VALIDATE
    ATOMICITY

    View Slide

  171. Y
    X
    LOCKING
    VIOLATED?
    ABORT
    W(Y)
    R(X)
    R(Y)
    W(X)
    W(Y)
    R(X)
    R(Y)
    W(X)
    ATOMICITY
    VIOLATED!
    T
    I
    M
    E
    OPTIMISTIC
    VALIDATE
    ATOMICITY

    View Slide

  173. Y
    X
    LOCKING
    VIOLATED?
    ABORT
    W(Y)
    R(X)
    R(Y)
    W(X)
    W(Y)
    R(X)
    R(Y)
    W(X)
    ATOMICITY
    VIOLATED!
    T
    I
    M
    E
    OPTIMISTIC
    VALIDATE
    ATOMICITY
    BOTH RELY
    ON
    COORDINATION

    View Slide

  174. Due to coordination overheads…

    View Slide

  175. Facebook Tao
    Google Megastore
    LinkedIn Espresso
    Due to coordination overheads…
    Amazon DynamoDB
    Apache Cassandra
    Basho Riak
    Yahoo! PNUTS
    Google App Engine

    View Slide

  176. Facebook Tao
    Google Megastore
    LinkedIn Espresso
    Due to coordination overheads…
    Amazon DynamoDB
    Apache Cassandra
    Basho Riak
    Yahoo! PNUTS
    …consciously choose to
    violate atomic visibility
    Google App Engine

    View Slide

  177. Facebook Tao
    Google Megastore
    LinkedIn Espresso
    Due to coordination overheads…
    Amazon DynamoDB
    Apache Cassandra
    Basho Riak
    Yahoo! PNUTS
    …consciously choose to
    violate atomic visibility
    “[Tao] explicitly favors
    efficiency and availability over
    consistency…[an edge] may
    exist without an inverse; these
    hanging associations are
    scheduled for repair by an
    asynchronous job.”
    Google App Engine

    View Slide

  178. Our contributions:
    to maintain
    social graph structure
    referential integrity
    [SIGMOD 2014, selected for “Best of SIGMOD” ACM TODS]
    secondary indexes
    materialized views

    View Slide

  179. Our contributions:
    to maintain
    1. A new model: atomic visibility (via Read
    Atomic isolation) is (provably) sufficient
    social graph structure
    referential integrity
    [SIGMOD 2014, selected for “Best of SIGMOD” ACM TODS]
    secondary indexes
    materialized views

    View Slide

  180. Our contributions:
    to maintain
    1. A new model: atomic visibility (via Read
    Atomic isolation) is (provably) sufficient
    2. Efficient protocols: RAMP transactions
    enforce atomic visibility without coordination
    social graph structure
    referential integrity
    [SIGMOD 2014, selected for “Best of SIGMOD” ACM TODS]
    secondary indexes
    materialized views

    View Slide

  181. WHAT THE APPLICATION SAYS
    “accept
    friend
    request”
    “update
    index
    entry”
    write
    write
    read
    write
    read
    write
    read
    read
    read
    read
    read
    write
    write
    read
    WHAT THE DATABASE HEARS
    read
    read read write
    read
    write

    View Slide

  182. “accept
    friend
    request”
    “update
    index
    entry”
    write
    write
    read
    write
    read
    write
    read
    read
    read
    read
    read
    write
    write
    write
    read

    View Slide

  183. “accept
    friend
    request”
    “update
    index
    entry”
    ATOMIC VISIBILITY
    write
    write
    read
    write
    read
    write
    read
    read
    read
    read
    read
    write
    write
    write
    read

    View Slide

  184. “accept
    friend
    request”
    “update
    index
    entry”
    RAMP
    TRANSACTION
    ATOMIC VISIBILITY
    write
    write
    read
    write
    read
    write
    read
    read
    read
    read
    read
    write
    write
    write
    read

    View Slide

  185. “accept
    friend
    request”
    “update
    index
    entry”
    RAMP
    TRANSACTION
    RAMP
    TRANSACTION
    ATOMIC VISIBILITY
    write
    write
    read
    write
    read
    write
    read
    read
    read
    read
    read
    write
    write
    write
    read

    View Slide

  186. ATOMICITY
    VIOLATED!
    Y
    X
    LOCKING
    W(Y)
    R(X)
    R(Y)
    W(X)
    W(Y)
    R(X)
    R(Y)
    W(X)
    OPTIMISTIC
    T
    I
    M
    E
    VIOLATED?
    ABORT
    VALIDATE
    ATOMICITY

    View Slide

  187. ATOMICITY
    VIOLATED!
    Y
    X
    LOCKING
    W(Y)
    R(X)
    R(Y)
    W(X)
    W(Y)
    R(X)
    R(Y)
    W(X)
    OPTIMISTIC RAMP TRANSACTIONS
    T
    I
    M
    E
    VIOLATED?
    ABORT
    VALIDATE
    ATOMICITY

    View Slide

  188. ATOMICITY
    VIOLATED!
    Y
    X
    LOCKING
    W(Y)
    R(X)
    R(Y)
    W(X)
    W(Y)
    R(X)
    R(Y)
    W(X)
    OPTIMISTIC RAMP TRANSACTIONS
    T
    I
    M
    E
    Without
    coordination,
    atomicity
    violations will
    (initially)
    occur!
    VIOLATED?
    ABORT
    VALIDATE
    ATOMICITY

    View Slide

  189. ATOMICITY
    VIOLATED!
    Y
    X
    LOCKING
    W(Y)
    R(X)
    R(Y)
    W(X)
    W(Y)
    R(X)
    R(Y)
    W(X)
    OPTIMISTIC RAMP TRANSACTIONS
    W(Y)
    R(X)
    R(Y)
    W(X)
    T
    I
    M
    E
    Without
    coordination,
    atomicity
    violations will
    (initially)
    occur!
    VIOLATED?
    ABORT
    VALIDATE
    ATOMICITY

    View Slide

  190. ATOMICITY
    VIOLATED!
    Y
    X
    LOCKING
    W(Y)
    R(X)
    R(Y)
    W(X)
    W(Y)
    R(X)
    R(Y)
    W(X)
    OPTIMISTIC RAMP TRANSACTIONS
    W(Y)
    R(X)
    R(Y)
    W(X)
    T
    I
    M
    E
    Without
    coordination,
    atomicity
    violations will
    (initially)
    occur!
    Don’t
    panic!
    Don’t
    abort!
    VIOLATED?
    ABORT
    VALIDATE
    ATOMICITY

    View Slide

  191. ATOMICITY
    VIOLATED!
    Y
    X
    LOCKING
    W(Y)
    R(X)
    R(Y)
    W(X)
    W(Y)
    R(X)
    R(Y)
    W(X)
    OPTIMISTIC RAMP TRANSACTIONS
    W(Y)
    R(X)
    R(Y)
    W(X)
    DETECT
    RACES
    T
    I
    M
    E
    Without
    coordination,
    atomicity
    violations will
    (initially)
    occur!
    Don’t
    panic!
    Don’t
    abort!
    VIOLATED?
    ABORT
    VALIDATE
    ATOMICITY

    View Slide

  192. ATOMICITY
    VIOLATED!
    Y
    X
    LOCKING
    W(Y)
    R(X)
    R(Y)
    W(X)
    W(Y)
    R(X)
    R(Y)
    W(X)
    OPTIMISTIC RAMP TRANSACTIONS
    W(Y)
    R(X)
    R(Y)
    W(X)
    REPAIR
    ATOMICITY
    DETECT
    RACES
    T
    I
    M
    E
    Without
    coordination,
    atomicity
    violations will
    (initially)
    occur!
    Don’t
    panic!
    Don’t
    abort!
    VIOLATED?
    ABORT
    VALIDATE
    ATOMICITY

    View Slide

  193. ATOMICITY
    VIOLATED!
    Y
    X
    LOCKING
    W(Y)
    R(X)
    R(Y)
    W(X)
    W(Y)
    R(X)
    R(Y)
    W(X)
    OPTIMISTIC RAMP TRANSACTIONS
    W(Y)
    R(X)
    R(Y)
    W(X)
    REPAIR
    ATOMICITY
    DETECT
    RACES
    R(Y)
    T
    I
    M
    E
    Without
    coordination,
    atomicity
    violations will
    (initially)
    occur!
    Don’t
    panic!
    Don’t
    abort!
    VIOLATED?
    ABORT
    VALIDATE
    ATOMICITY

    View Slide

  194. RAMP
    TRANSACTIONS
    REPAIR
    ATOMICITY
    DETECT
    RACES

    View Slide

  195. Atomic Visibility via RAMP Transactions
    REPAIR
    ATOMICITY
    DETECT
    RACES

    View Slide

  197. Atomic Visibility via RAMP Transactions
    REPAIR
    ATOMICITY
    DETECT
    RACES
    X = 1
    W
    Y = 1
    W
    Server 1001
    X=0 Y=0
    Server 1002

    View Slide

  198. Atomic Visibility via RAMP Transactions
    REPAIR
    ATOMICITY
    DETECT
    RACES
    X = 1
    W
    Y = 1
    W
    Server 1001
    X=0 Y=0
    Server 1002
    X=1

    View Slide

  199. Atomic Visibility via RAMP Transactions
    REPAIR
    ATOMICITY
    DETECT
    RACES
    X = 1
    W
    Y = 1
    W
    Server 1001
    X=0 Y=0
    Server 1002
    X=1
    X = ?
    R
    Y = ?
    R
    X = 1
    Y = 0

    View Slide

  202. Atomic Visibility via RAMP Transactions
    REPAIR
    ATOMICITY
    DETECT
    RACES
    X = 1
    W
    Y = 1
    W
    Server 1001
    X=0 Y=0
    Server 1002
    X=1
    X = ?
    R
    Y = ?
    R
    X = 1
    Y = 0
    via intention
    metadata

    View Slide

  203. Atomic Visibility via RAMP Transactions
    REPAIR
    ATOMICITY
    DETECT
    RACES
    X = 1
    W
    Y = 1
    W
    Server 1001
    Y=0
    Server 1002
    X=1
    via intention
    metadata

    View Slide

  204. Y=0 T0 {}
    intention
    ·
    Atomic Visibility via RAMP Transactions
    REPAIR
    ATOMICITY
    DETECT
    RACES
    X = 1
    W
    Y = 1
    W
    X=1 T1 {Y}
    intention
    · T0
    intention
    ·
    via intention
    metadata

    View Slide

  205. value
    Y=0 T0 {}
    intention
    ·
    Atomic Visibility via RAMP Transactions
    REPAIR
    ATOMICITY
    DETECT
    RACES
    X = 1
    W
    Y = 1
    W
    value
    X=1 T1 {Y}
    intention
    · T0
    intention
    ·
    via intention
    metadata

    View Slide

  207. value
    Y=0 T0 {}
    intention
    ·
    Atomic Visibility via RAMP Transactions
    REPAIR
    ATOMICITY
    DETECT
    RACES
    X = 1
    W
    Y = 1
    W
    value
    X=1 T1 {Y}
    intention
    · T0
    intention
    ·
    via intention
    metadata
    “A transaction called T1 wrote this and also wrote to Y”

    View Slide

  208. Atomic Visibility via RAMP Transactions
    REPAIR
    ATOMICITY
    DETECT
    RACES
    X = 1
    W
    Y = 1
    W
    value
    X=1 T1 {Y}
    intention
    · value
    Y=0 T0 {}
    intention
    ·
    via intention
    metadata

    View Slide

  210. Atomic Visibility via RAMP Transactions
    REPAIR
    ATOMICITY
    DETECT
    RACES
    X = 1
    W
    Y = 1
    W
    value
    X=1 T1 {Y}
    intention
    · value
    Y=0 T0 {}
    intention
    ·
    via intention
    metadata
    X = ?
    R
    Y = ?
    R

    View Slide

  211. Atomic Visibility via RAMP Transactions
    REPAIR
    ATOMICITY
    DETECT
    RACES
    value
    X=1 T1 {Y}
    intention
    ·
    via intention
    metadata
    X = ?
    R
    Y = ?
    R
    X = 1
    W
    Y = 1
    W
    value
    Y=0 T0 {}
    intention
    ·

    View Slide

  212. Atomic Visibility via RAMP Transactions
    REPAIR
    ATOMICITY
    DETECT
    RACES
    value
    X=1 T1 {Y}
    intention
    ·
    via intention
    metadata
    X = ?
    R
    R
    X = 1
    W
    Y = 1
    W
    X = 1
    Y = 0
    value
    Y=0 T0 {}
    intention
    ·
    “A transaction called T1 wrote this and also wrote to Y”

    View Slide

  214. Atomic Visibility via RAMP Transactions
    REPAIR
    ATOMICITY
    DETECT
    RACES
    value
    X=1 T1 {Y}
    intention
    ·
    via intention
    metadata
    X = ?
    R
    R
    X = 1
    W
    Y = 1
    W
    X = 1
    Y = 0
    Where is T1’s write to Y?
    value
    Y=0 T0 {}
    intention
    ·

    View Slide

  217. Atomic Visibility via RAMP Transactions
    REPAIR
    ATOMICITY
    DETECT
    RACES
    value
    X=1 T1 {Y}
    intention
    ·
    via intention
    metadata
    X = ?
    R
    R
    X = 1
    W
    Y = 1
    W
    X = 1
    Y = 0
    Where is T1’s write to Y?
    value
    Y=0 T0 {}
    intention
    ·
    via
    multi-versioning,
    ready bit

    View Slide

  218. Atomic Visibility via RAMP Transactions
    REPAIR
    ATOMICITY
    DETECT
    RACES
    X = 1
    W
    Y = 1
    W
    value
    X=1 T1 {Y}
    intention
    ·
    via intention
    metadata
    via
    multi-versioning,
    ready bit
    value
    Y=0 T0 {}
    intention
    ·

    View Slide

  219. Atomic Visibility via RAMP Transactions
    REPAIR
    ATOMICITY
    DETECT
    RACES
    via intention
    metadata
    value intention
    X=0 T0 {}
    · value intention
    Y=0 T0 {}
    ·
    X = 1
    W
    Y = 1
    W
    via
    multi-versioning,
    ready bit

    View Slide

  220. Atomic Visibility via RAMP Transactions
    REPAIR
    ATOMICITY
    DETECT
    RACES
    via intention
    metadata
    value intention
    X=0 T0 {}
    · value intention
    Y=0 T0 {}
    ·
    X = 1
    W
    Y = 1
    W
    ready
    ready
    via
    multi-versioning,
    ready bit

    View Slide

  221. Y=1 T1 {X}
    ·
    X=1 T1 {Y}
    ·
    Atomic Visibility via RAMP Transactions
    REPAIR
    ATOMICITY
    DETECT
    RACES
    via intention
    metadata
    value intention
    X=0 T0 {}
    · value intention
    Y=0 T0 {}
    ·
    X = 1
    W
    Y = 1
    W
    ready
    ready
    1.) Place write on each server.
    via
    multi-versioning,
    ready bit

    View Slide

  222. Y=1 T1 {X}
    ·
    X=1 T1 {Y}
    ·
    Atomic Visibility via RAMP Transactions
    REPAIR
    ATOMICITY
    DETECT
    RACES
    via intention
    metadata
    value intention
    X=0 T0 {}
    · value intention
    Y=0 T0 {}
    ·
    X = 1
    W
    Y = 1
    W
    ready
    ready
    1.) Place write on each server.
    2.) Set ready bit on each
    write on server.
    via
    multi-versioning,
    ready bit

    View Slide

  223. Y=1 T1 {X}
    ·
    X=1 T1 {Y}
    ·
    Atomic Visibility via RAMP Transactions
    REPAIR
    ATOMICITY
    DETECT
    RACES
    via intention
    metadata
    value intention
    X=0 T0 {}
    · value intention
    Y=0 T0 {}
    ·
    X = 1
    W
    Y = 1
    W
    ready
    ready
    1.) Place write on each server.
    2.) Set ready bit on each
    write on server.
    via
    multi-versioning,
    ready bit
    Ready bit invariant: if ready bit is set, all writes in transaction
    are present on their respective servers

    View Slide
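
The two write steps and the ready bit invariant above can be sketched as follows. This is an illustrative, single-process sketch, not the talk's implementation: `Version`, `Server`, `ramp_write`, and the one-server-per-key layout are hypothetical names and assumptions, and integer transaction IDs stand in for T0 and T1.

```python
from dataclasses import dataclass


@dataclass
class Version:
    value: object
    txn_id: int           # e.g. 1 for T1 (illustrative encoding)
    intention: frozenset  # the other keys written by the same transaction
    ready: bool = False


class Server:
    """Hypothetical multi-versioned store: key -> {txn_id: Version}."""

    def __init__(self):
        self.versions = {}

    def place(self, key, version):
        self.versions.setdefault(key, {})[version.txn_id] = version

    def mark_ready(self, key, txn_id):
        self.versions[key][txn_id].ready = True


def ramp_write(servers, txn_id, writes):
    """servers: key -> Server holding that key; writes: key -> new value."""
    keys = set(writes)
    # 1.) Place each write on its server, tagged with intention metadata.
    for key, value in writes.items():
        servers[key].place(key, Version(value, txn_id, frozenset(keys - {key})))
    # 2.) Only after every write is placed, set the ready bits. This ordering
    #     is what makes the ready bit invariant hold.
    for key in writes:
        servers[key].mark_ready(key, txn_id)


servers = {"x": Server(), "y": Server()}
ramp_write(servers, 1, {"x": 1, "y": 1})
```

In a real deployment each loop would be one round of messages, which is where the two write round trips come from.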

  224. Y=1 T1 {X}
    ·
    X=1 T1 {Y}
    ·
    Atomic Visibility via RAMP Transactions
    REPAIR
    ATOMICITY
    DETECT
    RACES
    via intention
    metadata
    via
    multi-versioning
    value intention
    X=0 T0 {}
    · value intention
    Y=0 T0 {}
    ·
    X = 1
    W
    Y = 1
    W
    ready
    ready
    Ready bit invariant: if ready bit is set, all writes in transaction
    are present on their respective servers

    View Slide

  226. Y=1 T1 {X}
    ·
    X=1 T1 {Y}
    ·
    Atomic Visibility via RAMP Transactions
    REPAIR
    ATOMICITY
    DETECT
    RACES
    via intention
    metadata
    via
    multi-versioning
    value intention
    X=0 T0 {}
    · value intention
    Y=0 T0 {}
    ·
    X = 1
    W
    Y = 1
    W
    ready
    ready
    X = ?
    R
    Y = ?
    R
    Ready bit invariant: if ready bit is set, all writes in transaction
    are present on their respective servers

    View Slide

  227. Y=1 T1 {X}
    ·
    X=1 T1 {Y}
    ·
    Atomic Visibility via RAMP Transactions
    REPAIR
    ATOMICITY
    DETECT
    RACES
    via intention
    metadata
    via
    multi-versioning
    value intention
    X=0 T0 {}
    · value intention
    Y=0 T0 {}
    ·
    ready
    ready
    X = ?
    R
    Y = ?
    R
    Ready bit invariant: if ready bit is set, all writes in transaction
    are present on their respective servers

    View Slide

  228. Y=1 T1 {X}
    ·
    X=1 T1 {Y}
    ·
    Atomic Visibility via RAMP Transactions
    REPAIR
    ATOMICITY
    DETECT
    RACES
    via intention
    metadata
    via
    multi-versioning
    value intention
    X=0 T0 {}
    · value intention
    Y=0 T0 {}
    ·
    ready
    ready
    X = ?
    R
    Y = ?
    R
    1.) Fetch “highest” ready versions.
    Ready bit invariant: if ready bit is set, all writes in transaction
    are present on their respective servers

    View Slide

  229. Y=1 T1 {X}
    ·
    X=1 T1 {Y}
    ·
    Atomic Visibility via RAMP Transactions
    REPAIR
    ATOMICITY
    DETECT
    RACES
    via intention
    metadata
    via
    multi-versioning
    value intention
    X=0 T0 {}
    · value intention
    Y=0 T0 {}
    ·
    ready
    ready
    X = ?
    R
    Y = ?
    R
    1.) Fetch “highest” ready versions.
    X = 1
    Ready bit invariant: if ready bit is set, all writes in transaction
    are present on their respective servers

    View Slide

  230. Y=1 T1 {X}
    ·
    X=1 T1 {Y}
    ·
    Atomic Visibility via RAMP Transactions
    REPAIR
    ATOMICITY
    DETECT
    RACES
    via intention
    metadata
    via
    multi-versioning
    value intention
    X=0 T0 {}
    · value intention
    Y=0 T0 {}
    ·
    ready
    ready
    X = ?
    R
    Y = ?
    R
    1.) Fetch “highest” ready versions.
    X = 1
    Y = 0
    Ready bit invariant: if ready bit is set, all writes in transaction
    are present on their respective servers

    View Slide

  231. Y=1 T1 {X}
    ·
    X=1 T1 {Y}
    ·
    Atomic Visibility via RAMP Transactions
    REPAIR
    ATOMICITY
    DETECT
    RACES
    via intention
    metadata
    via
    multi-versioning
    value intention
    X=0 T0 {}
    · value intention
    Y=0 T0 {}
    ·
    ready
    ready
    X = ?
    R
    Y = ?
    R
    1.) Fetch “highest” ready versions.
    2.) Fetch any missing writes
    using metadata.
    X = 1
    Y = 0
    Ready bit invariant: if ready bit is set, all writes in transaction
    are present on their respective servers

    View Slide

  233. Y=1 T1 {X}
    ·
    X=1 T1 {Y}
    ·
    Atomic Visibility via RAMP Transactions
    REPAIR
    ATOMICITY
    DETECT
    RACES
    via intention
    metadata
    via
    multi-versioning
    value intention
    X=0 T0 {}
    · value intention
    Y=0 T0 {}
    ·
    ready
    ready
    X = ?
    R
    Y = ?
    R
    1.) Fetch “highest” ready versions.
    2.) Fetch any missing writes
    using metadata.
    X = 1
    Y = 0
    Y = 1
    Ready bit invariant: if ready bit is set, all writes in transaction
    are present on their respective servers

    View Slide
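
The two read steps above can be sketched in the same spirit. This is a hedged illustration rather than the talk's code: `highest_ready`, `ramp_read`, and the dict-based `store` are hypothetical, transaction IDs are plain integers, and the store is set up to mirror the slide, where T1's write to Y has been placed but not yet marked ready.

```python
def highest_ready(store, key):
    """Step 1: newest version of `key` whose ready bit is set."""
    ready = [v for v in store[key] if v["ready"]]
    return max(ready, key=lambda v: v["txn"])


def ramp_read(store, keys):
    # 1.) Fetch the "highest" ready version of every requested key.
    result = {k: highest_ready(store, k) for k in keys}
    # Detect races: intention metadata names sibling keys that the same
    # transaction wrote, so each sibling must be at least that new.
    required = {}
    for v in result.values():
        for sibling in v["intention"]:
            if sibling in result:
                required[sibling] = max(required.get(sibling, -1), v["txn"])
    # 2.) Repair: fetch any missing writes by transaction ID. The ready bit
    #     invariant guarantees they are already present on their servers.
    for k, txn in required.items():
        if txn > result[k]["txn"]:
            result[k] = next(v for v in store[k] if v["txn"] == txn)
    return {k: v["value"] for k, v in result.items()}


store = {
    "x": [
        {"value": 0, "txn": 0, "intention": frozenset(), "ready": True},
        {"value": 1, "txn": 1, "intention": frozenset({"y"}), "ready": True},
    ],
    "y": [
        {"value": 0, "txn": 0, "intention": frozenset(), "ready": True},
        {"value": 1, "txn": 1, "intention": frozenset({"x"}), "ready": False},
    ],
}
snapshot = ramp_read(store, ["x", "y"])
```

The first round alone would return X = 1 and Y = 0, a fractured read; the repair step replaces Y = 0 with T1's Y = 1 without blocking or aborting.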

  234. RAMP Details

    | Write RTT | Read RTT (best case) | Read RTT (worst case) | Metadata |
    |---|---|---|---|
    | 2 | 1 | 2 | O(txn len) write set summary |

    DETECT RACES via intention metadata
    REPAIR ATOMICITY via multi-versioning, ready bit

    View Slide

  236. RAMP Details

    | Write RTT | Read RTT (best case) | Read RTT (worst case) | Metadata |
    |---|---|---|---|
    | 2 | 1 | 2 | O(txn len) write set summary |

    DETECT RACES via intention metadata
    REPAIR ATOMICITY via multi-versioning, ready bit

    Ensures that readers never have to wait

    View Slide

  239. RAMP Details

    | Write RTT | Read RTT (best case) | Read RTT (worst case) | Metadata |
    |---|---|---|---|
    | 2 | 1 | 2 | O(txn len) write set summary |

    DETECT RACES via intention metadata
    REPAIR ATOMICITY via multi-versioning, ready bit

    Ensures that readers never have to wait
    2nd RTT for repair, in the event of a race

    View Slide

  242. RAMP Details

    | Write RTT | Read RTT (best case) | Read RTT (worst case) | Metadata |
    |---|---|---|---|
    | 2 | 1 | 2 | O(txn len) write set summary |

    DETECT RACES via intention metadata
    REPAIR ATOMICITY via multi-versioning, ready bit

    Transaction IDs: sequence number and client ID
    » Also use to order overwrites!

    View Slide

  243. RAMP Details
    Write RTT | Read RTT (best case) | Read RTT (worst case) | Metadata
    2         | 1                    | 2                     | O(txn len): write set summary
    DETECT RACES: via intention metadata
    REPAIR ATOMICITY: via multi-versioning, ready bit
    Transaction IDs: sequence number and client ID
    » Also use to order overwrites!
    Garbage collection of old versions:
    » Set timeout (TTL) for overwritten versions
    » Limit read transaction duration to TTL

    View Slide
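
A small sketch of the two details on this slide, under stated assumptions: `TxnId` and `VersionStore` are hypothetical names, time is passed in explicitly as a number, and the store covers a single key. Transaction IDs are (sequence number, client ID) pairs compared lexicographically, which also decides which write wins an overwrite; an overwritten version is kept for a TTL so that in-flight readers can still fetch it, then collected.

```python
from typing import NamedTuple


class TxnId(NamedTuple):
    seq: int         # per-client sequence number
    client_id: int   # breaks ties between clients using the same seq


class VersionStore:
    """Hypothetical single-key store with TTL-based version collection."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.versions = {}   # TxnId -> (value, time overwritten or None)
        self.latest = None   # TxnId of the current winning version

    def put(self, txn_id, value, now):
        if self.latest is None or txn_id > self.latest:
            if self.latest is not None:
                old_value, _ = self.versions[self.latest]
                self.versions[self.latest] = (old_value, now)  # start TTL
            self.versions[txn_id] = (value, None)
            self.latest = txn_id
        else:
            # A late-arriving older write is already overwritten on arrival.
            self.versions[txn_id] = (value, now)

    def gc(self, now):
        """Drop versions overwritten at least ttl ago; return their IDs."""
        expired = [
            txn_id for txn_id, (_, overwritten_at) in self.versions.items()
            if overwritten_at is not None and now - overwritten_at >= self.ttl
        ]
        for txn_id in expired:
            del self.versions[txn_id]
        return expired
```

Bounding read transactions by the same TTL, as the slide suggests, is what makes it safe to drop a version: no reader that could still ask for it is allowed to be running.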

  246. RAMP Details
    Write RTT | Read RTT (best case) | Read RTT (worst case) | Metadata
    2         | 1                    | 2                     | O(txn len): write set summary
    DETECT RACES: via intention metadata
    REPAIR ATOMICITY: via multi-versioning, ready bit
    Can we use less metadata for intent?

  247. RAMP Variants
    Algorithm   | Write RTT | Read RTT (best case) | Read RTT (worst case) | Metadata
    RAMP-Fast   | 2         | 1                    | 2                     | O(txn len): write set summary
    RAMP-Small  | 2         | 2                    | 2                     | O(1): timestamp
    RAMP-Hybrid | 2         | 1+ε                  | 2                     | O(1): Bloom filter
    DETECT RACES: via intention metadata
    REPAIR ATOMICITY: via multi-versioning, ready bit

  250. RAMP Variants
    Algorithm   | Write RTT | Read RTT (best case) | Read RTT (worst case) | Metadata
    RAMP-Fast   | 2         | 1                    | 2                     | O(txn len): write set summary
    RAMP-Small  | 2         | 2                    | 2                     | O(1): timestamp
    RAMP-Hybrid | 2         | 1+ε                  | 2                     | O(1): Bloom filter
    DETECT RACES: via intention metadata
    REPAIR ATOMICITY: via multi-versioning, ready bit
    Always attempt to repair… no metadata needed!

  252. RAMP Variants
    Algorithm   | Write RTT | Read RTT (best case) | Read RTT (worst case) | Metadata
    RAMP-Fast   | 2         | 1                    | 2                     | O(txn len): write set summary
    RAMP-Small  | 2         | 2                    | 2                     | O(1): timestamp
    RAMP-Hybrid | 2         | 1+ε                  | 2                     | O(B(ε)): Bloom filter
    DETECT RACES: via intention metadata
    REPAIR ATOMICITY: via multi-versioning, ready bit
    Bloom filter summarizes intent
    False positives: extra read RTTs

    View Slide
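
To illustrate how a Bloom filter can stand in for the full write set, here is a minimal sketch. The `BloomFilter` class, its sizing defaults, and `keys_needing_second_round` are assumptions made for illustration, not the talk's code. A Bloom filter has no false negatives, so a missing sibling write is never overlooked; a false positive only triggers an unnecessary second-round read, which is the "1+ε" in the table.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter over strings (illustrative sizing only)."""

    def __init__(self, num_bits=64, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def keys_needing_second_round(first_round):
    """first_round maps key -> (ts, filter over the writer's write set).

    A key must be re-read if some other returned version is newer and its
    filter says the same transaction may also have written this key.
    """
    recheck = []
    for key, (ts, _) in first_round.items():
        for other_key, (other_ts, other_filter) in first_round.items():
            if other_key != key and other_ts > ts \
                    and other_filter.might_contain(key):
                recheck.append(key)
                break
    return sorted(recheck)
```

The filter size is the knob behind O(B(ε)): more bits mean fewer false positives and fewer wasted round trips, at the cost of more metadata per version.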

  256. RAMP Overview
    SYSTEM KNOWS SEMANTICS ⇒ CLIENTS CAN COOPERATE WITHOUT WAITING FOR EACH OTHER
    KEY IDEA: DETECT RACES
    Storing intention in metadata allows readers to check for missing writes
    KEY IDEA: REPAIR ATOMICITY
    Transactions “hide” writes until others can reliably complete them (ready bit)
    Coordination free: transactions do not wait for any others to complete

  262. RAMP Evaluation
    1. What is the overhead of the RAMP protocols?
    2. What is the benefit of coordination-free execution?
    3. How do the RAMP protocols scale?
    Evaluated on Amazon EC2 cr1.8xlarge servers (1-100 servers; default: 5)

  263-272. [Figure, built up over ten slides: throughput (txn/s, 0 to 180K) vs. concurrent clients (0 to 10000)]
    YCSB: WorkloadA, 95% reads, 1M items, 4 items/txn
    Legend codes: RAMP-F, RAMP-S, RAMP-H, NWNR, LWNR, LWSR, LWLR, E-PCI
    Series, in the order they are introduced:
    » No Concurrency Control: doesn’t enforce atomic visibility
    » Serializable 2PL
    » Write Locks Only
    » RAMP-Fast: within 5% of baseline
    » RAMP-Small: always needs 2 RTT reads
    » RAMP-Hybrid

  273-275. [Figure, built up over three slides: throughput (ops/s, 0 to 8M) vs. number of servers (0 to 100)]
    YCSB: uniform access, 1M items, 4 items/txn, 95% reads
    Legend codes: RAMP-F, RAMP-S, RAMP-H, NWNR, LWNR, LWSR, LWLR, E-PCI
    Series shown: No Concurrency Control, RAMP-Fast, RAMP-Small, RAMP-Hybrid

  276. ATOMIC VISIBILITY
    RAMP TRANSACTION: “accept friend request”
    RAMP TRANSACTION: “update index entry”
    [diagram: each transaction groups a set of read and write operations]

  277. COORDINATION AVOIDANCE: GUARANTEED SAFETY WITHOUT COORDINATION
    Serializability: COORDINATION REQUIRED, GUARANTEED SAFETY
    Eventual Consistency: COORDINATION FREE, NO SAFETY
    » Atomic Visibility (SIGMOD14)
    » Database Constraints (VLDB15, SIGMOD15)
    » Model Prediction and Training (CIDR15, TBA)
    » Weak Isolation (HotOS13, VLDB14)
    » Causality (SOCC12, SIGMOD13)
    » PBS (VLDB12, VLDBJ14, SIGMOD13, CACM14)
    MORE SEMANTICS, MORE SAFETY
    COORDINATION FREE

  280. WHAT THE APPLICATION SAYS
    my billing application is “correct”
    my new social app “does the right thing”
    WHAT THE DATABASE HEARS
    [diagram: an undifferentiated stream of read and write operations]

  281. View Slide

  282. Database users express
    correctness criteria
    via database constraints

    View Slide

  283. Database users express correctness criteria via database constraints
    “usernames should be unique”
    “account balances should remain positive”
    “there should only be one administrator”

  284. Typical database constraints and operations (SQL)
    Constraint           | Operation
    Equality, Inequality | Any
    Generate unique ID   | Any
    Specify unique ID    | Insert
    >                    | Increment
    >                    | Decrement
    <                    | Decrement
    <                    | Increment
    Foreign Key          | Insert
    Foreign Key          | Delete
    Secondary Indexing   | Any
    Materialized Views   | Any
    AUTO_INCREMENT       | Insert

  285. View Slide

  290. CONSTRAINTS 37x MORE COMMON
    67 projects, 1.77M LoC, 1957 tables [SIGMOD 2015]
    9986 total; avg. 5.1 per table
    259 total; avg. 0.13 per table
    Projects: adopt-a-hydrant, alchemy_cms, amahi, bostonrb, boxroom, brevidy,
    browsercms, bucketwise, calagator, canvas-lms, carter, chiliproject,
    citizenry, comas, comfortable-mexican-sofa, communityengine,
    copycopter-server, danbooru, diaspora, discourse, enki, fat_free_crm,
    fedena, forem, fulcrum, gitlab-ci, gitlabhq, govsgo, heaven, inkwell,
    insoshi, jobsworth, juvia, kandan, linuxfr.org, lobsters, lovd-by-less,
    nimbleshop, obtvse, onebody, opal, opencongress, opengovernment,
    openproject, piggybak, publify, radiant, railscollab, redmine,
    refinerycms, ror_ecommerce, rucksack, saasy, salor-retail, selfstarter,
    sharetribe, skyline, spot-us, spree, sprintapp, squaresquash, sugar,
    teambox, tracks, tryshoppe, wallgig, zena

  292. WHAT THE APPLICATION SAYS
    “no duplicate users”
    WHAT THE DATABASE HEARS
    [diagram: an undifferentiated stream of read and write operations]
    TODAY: ENFORCEMENT VIA COORDINATION

  293. WHAT THE APPLICATION SAYS
    “no duplicate users”
    WHAT THE DATABASE HEARS
    [diagram: an undifferentiated stream of read and write operations]
    CAN WE USE CONSTRAINTS TO AVOID COORDINATION?

  294. WHAT THE APPLICATION SAYS
    “no duplicate users”
    WHAT THE DATABASE HEARS
    “no duplicate users” [diagram: the operation stream, now labeled as constraints]
    CAN WE USE CONSTRAINTS TO AVOID COORDINATION?

  295. Key idea: Check if constraints can be violated by
    “merging” independent operations

    View Slide

  296. Key idea: Check if constraints can be violated by
    “merging” independent operations
    ICT: Invariant Confluence Test

    View Slide

  297. CONSTRAINT: User IDs are unique
    OPERATION: Add users
    MERGE: Set union
    Key idea: Check if constraints can be violated by
    “merging” independent operations
    ICT: Invariant Confluence Test

    View Slide

  298. ICT: Invariant Confluence Test
    Key idea: Check if constraints can be violated by “merging” independent operations
    CONSTRAINT: User IDs are unique
    OPERATION: Add users
    MERGE: Set union
    Start: {}
    Applied independently: add {Stu,ID=1}; add {Ann,ID=1}
    After MERGE: {{Stu,ID=1}, {Ann,ID=1}}
    Constraint violated!

  299. Key idea: Check if constraints can be violated by
    “merging” independent operations
    CONSTRAINT: User IDs are positive
    OPERATION: Add users
    MERGE: Set union
    ICT: Invariant Confluence Test

    View Slide

  300. ICT: Invariant Confluence Test
    Key idea: Check if constraints can be violated by “merging” independent operations
    CONSTRAINT: User IDs are positive
    OPERATION: Add users
    MERGE: Set union
    Start: {}
    Applied independently: add {Stu,ID=1}; add {Ann,ID=1}
    After MERGE: {{Stu,ID=1}, {Ann,ID=1}}
    Constraint holds!

    View Slide
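
The two examples above (unique IDs versus positive IDs, both under set-union merge) can be replayed in a few lines. This is only a sketch: the function names are hypothetical, and it checks one pair of divergent executions rather than every reachable state, which is what the full test would have to consider.

```python
def merge(left, right):
    """Merge two replica states by set union, as on the slides."""
    return left | right


def ids_unique(db):
    ids = [user_id for _, user_id in db]
    return len(ids) == len(set(ids))


def ids_positive(db):
    return all(user_id > 0 for _, user_id in db)


def add_user(name, user_id):
    """Return an operation that adds one (name, id) record to a state."""
    return lambda db: db | {(name, user_id)}


def violated_after_merge(invariant, start, op_a, op_b):
    """Apply op_a and op_b independently to start, then merge the results.

    Both branches must be valid on their own; the question is whether the
    merged state still satisfies the invariant.
    """
    left, right = op_a(start), op_b(start)
    if not (invariant(left) and invariant(right)):
        raise ValueError("each branch must satisfy the invariant locally")
    return not invariant(merge(left, right))


start = frozenset()
add_stu = add_user("Stu", 1)
add_ann = add_user("Ann", 1)

unique_breaks = violated_after_merge(ids_unique, start, add_stu, add_ann)
positive_breaks = violated_after_merge(ids_positive, start, add_stu, add_ann)
```

Here `unique_breaks` is true and `positive_breaks` is false: each replica sees a valid state on its own, but only the positivity constraint survives the merge without coordination.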

  301. Key idea: Check if constraints can be violated by
    “merging” independent operations
    ICT: Invariant Confluence Test

    View Slide

  302. Key idea: Check if constraints can be violated by
    “merging” independent operations
    OUR CONTRIBUTION:
    [VLDB 2015]
    ICT: Invariant Confluence Test

    View Slide

  303. Key idea: Check if constraints can be violated by
    “merging” independent operations
    OUR CONTRIBUTION:
    Theorem. A globally I-valid system can execute a set of
    transactions T with coordination-freedom, transactional availability,
    and convergence if and only if T are I-confluent with respect to I.
    [VLDB 2015]
    ICT ⟺ safe, coordination-free execution possible
    ICT: Invariant Confluence Test

    View Slide

  304. Key idea: Check if constraints can be violated by
    “merging” independent operations
    OUR CONTRIBUTION:
    Generalizes classic partitioning-based indistinguishability arguments
    Theorem. A globally I-valid system can execute a set of
    transactions T with coordination-freedom, transactional availability,
    and convergence if and only if T are I-confluent with respect to I.
    [VLDB 2015]
    ICT ⟺ safe, coordination-free execution possible
    ICT: Invariant Confluence Test

    View Slide
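
One way to write down the property the theorem refers to, consistent with the merge examples on the earlier ICT slides. The notation is an assumption for illustration: D_i and D_j are database states reached from a common ancestor by running transactions from T with I holding at every step, and ⊔ is the merge operator (set union in the slides' examples).

```latex
% Sketch of I-confluence under the assumed notation described above.
\[
T \text{ is } I\text{-confluent}
\iff
\forall D_i, D_j \text{ reachable from a common ancestor under } T
\text{ with } I \text{ preserved}:\quad
I(D_i \sqcup D_j) = \text{true}
\]
```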

  305. Typical database constraints and operations (SQL), under set merge
    Constraint           | Operation | OK?
    Equality, Inequality | Any       | ???
    Generate unique ID   | Any       | ???
    Specify unique ID    | Insert    | ???
    >                    | Increment | ???
    >                    | Decrement | ???
    <                    | Decrement | ???
    <                    | Increment | ???
    Foreign Key          | Insert    | ???
    Foreign Key          | Delete    | ???
    Secondary Indexing   | Any       | ???
    Materialized Views   | Any       | ???
    AUTO_INCREMENT       | Insert    | ???

  306. Typical database constraints and operations (SQL), under set merge [VLDB 2015]
    Constraint           | Operation | OK?
    Equality, Inequality | Any       | Y
    Generate unique ID   | Any       | Y
    Specify unique ID    | Insert    | N
    >                    | Increment | Y
    >                    | Decrement | N
    <                    | Decrement | Y
    <                    | Increment | N
    Foreign Key          | Insert    | Y
    Foreign Key          | Delete    | Y*
    Secondary Indexing   | Any       | Y
    Materialized Views   | Any       | Y
    AUTO_INCREMENT       | Insert    | N

    View Slide
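
The increment and decrement rows can be checked with a small numeric example. This sketch assumes a counter whose merge applies both replicas' deltas to the shared starting value; the slide's table is stated under set merge, so the counter merge, the function names, and the threshold of zero are all illustrative assumptions.

```python
def merge_counter(base, left, right):
    """Combine two divergent counter values by applying both deltas to base."""
    return base + (left - base) + (right - base)


def stays_above(threshold, base, delta_a, delta_b):
    """Report whether 'value > threshold' holds locally and after merging."""
    left, right = base + delta_a, base + delta_b
    merged = merge_counter(base, left, right)
    return {
        "left_ok": left > threshold,
        "right_ok": right > threshold,
        "merged_ok": merged > threshold,
        "merged": merged,
    }


# Two replicas each withdraw 60 from a balance of 100: both are fine
# locally, but the merged balance breaks "balance > 0".
decrements = stays_above(0, 100, -60, -60)

# Two replicas each deposit: a lower bound cannot be violated by merging.
increments = stays_above(0, 100, +10, +20)
```

This matches the ">" rows of the table: concurrent increments keep a lower bound intact, while concurrent decrements can each look safe and still violate it once merged, so they need coordination.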

  307. Typical database constraints and operations (SQL), under set merge [VLDB 2015]
    Constraint           | Operation | OK?
    Equality, Inequality | Any       | Y
    Generate unique ID   | Any       | Y
    Specify unique ID    | Insert    | N
    >                    | Increment | Y
    >                    | Decrement | N
    <                    | Decrement | Y
    <                    | Increment | N
    Foreign Key          | Insert    | Y
    Foreign Key          | Delete    | Y*
    Secondary Indexing   | Any       | Y
    Materialized Views   | Any       | Y
    AUTO_INCREMENT       | Insert    | N
    RAMP [vertical label alongside the table]

  309. 86.9% PASS ICT
    67 projects, 1.77M LoC, 1957 tables [SIGMOD 2015]
    9986 total; avg. 5.1 per table
    259 total; avg. 0.13 per table
    Projects: the same 67-project list as on slide 290

  310. View Slide

  311. TPC-C

    View Slide

  312. 14/16 CONSTRAINTS PASS ICT
    TPC-C

    View Slide

  313. TPC-C
    14/16 CONSTRAINTS PASS ICT
    6-11x faster than ACID/serializability
    [Figure: throughput (txns/s; ticks at 40K, 100K, 600K) vs. number of warehouses (8 to 64); series: Coordination-Avoiding, Serializable (2PL)]

  314. TPC-C
    14/16 CONSTRAINTS PASS ICT
    6-11x faster than ACID/serializability
    [Figure: throughput (txns/s; ticks at 40K, 100K, 600K) vs. number of warehouses (8 to 64); series: Coordination-Avoiding, Serializable (2PL)]
    Scale to over 25x best listed result
    [Figure: total throughput (txn/s, 2M to 14M) vs. number of servers (0 to 200)]
    [Figure: throughput per server (txn/s/server, 0 to 80K) vs. number of servers (0 to 200)]

  326. VELOX: FAST ONLINE PREDICTIONS
    [CIDR 2015]
    Fast
    incremental
    personalization
    Batch
    retrain
    shared
    features
    PLASMA: ASYNCHRONOUS LEARNING
    [Ongoing]
    ML task: Express algorithms via async iterator (e.g., ADMM)
    Bulk
    Async
    Parallel
    TIME
    TIME
    Bulk
    Synch
    Parallel
    Key idea: Exploit statistical robustness in system designs
    Prioritize model
    maintenance by
    robustness
    ML task: Split models according to robustness
    Break dataflow
    barriers using new
    iterator model

  327. Serializability: COORDINATION REQUIRED, GUARANTEED SAFETY
    Eventual Consistency: COORDINATION FREE, NO SAFETY
    COORDINATION AVOIDANCE: GUARANTEED SAFETY WITHOUT COORDINATION
    MORE SEMANTICS, MORE SAFETY
    Atomic Visibility (SIGMOD14)
    Database Constraints (VLDB15, SIGMOD15)
    Model Prediction and Training (CIDR15, TBA)
    Weak Isolation (HotOS13, VLDB14)
    Causality (SoCC12, SIGMOD13)
    PBS (VLDB12, VLDBJ14, SIGMOD13, CACM14)
    COORDINATION FREE

  329. MY APPROACH: DESIGN DATABASE SYSTEMS THAT EXPLOIT SEMANTICS OF HIGH-VALUE USE CASES
    Study practical database use cases
    Derive principles and algorithms
    Build systems to realize the benefits

  334. PBS: Integrated into Cassandra 1.2 release
    + recent extensions at a major Internet company
    RAMP: Proposed feature in Cassandra 3.0
    (Reportedly) on roadmap for Facebook Apollo, IBM Cloudant
    HAT Isolation: part of Martin Kleppmann’s Hermitage testing suite
    Active dialogue with developer and NoSQL communities
    via invited talks, blogging, social media

  335. MY WORK: COORDINATION AVOIDANCE
    Current Practice
      PBS (VLDB12, SIGMOD13, VLDBJ14, CACM14)
      EC Today (CACM/Queue13)
      Consistency without Borders (SoCC13)
      Network Partitions (CACM/Queue14)
      Feral Concurrency Control (SIGMOD15)
    Principles
      I-Confluence (VLDB15)
      HATs (HotOS13, VLDB14)
      Explicit Causality (SoCC12)
    Systems
      Bolt-On (SIGMOD13)
      RAMP + Indexing (SIGMOD14)
      Velox (CIDR15)
      Plasma + BAP (Ongoing)

  347. FUTURE WORK
    Automatically coordinated applications
      Bespoke analysis and coordination synthesis
      “Query optimization” for transaction execution
    DB meets “Big Data” Learning
      View materialization and selection for model maintenance
      Bounded divergence control for coordinating learners
    Next-Generation Data Applications
      Next 10-100x growth in data volume due to sensors, apps
      New interfaces for increased coordination costs, heterogeneity

  348. WHAT THE APPLICATION SAYS
    “post on timeline”
    “accept friend request”
    WHAT THE DATABASE HEARS
    [Figure: an undifferentiated sequence of read and write operations]

  350. Eventual Consistency: COORDINATION FREE, NO SAFETY
    COORDINATION AVOIDANCE: GUARANTEED SAFETY WITHOUT COORDINATION
    MORE SEMANTICS, MORE SAFETY
    Atomic Visibility (SIGMOD14)
    Database Constraints (VLDB15, SIGMOD15)
    Model Prediction and Training (CIDR15, TBA)
    Weak Isolation (HotOS13, VLDB14)
    Causality (SoCC12, SIGMOD13)
    PBS (VLDB12, VLDBJ14, SIGMOD13, CACM14)
    COORDINATION FREE
    Joint work with Ali Ghodsi, Joe Hellerstein,
    Ion Stoica, Mike Franklin, Michael Jordan,
    Alan Fekete, Dan Crankshaw, Shivaram
    Venkataraman, Neil Conway, Peter Alvaro,
    Aaron Davidson, Joey Gonzalez, Kyle Kingsbury,
    Haoyuan Li, and Zhao Zhang

  352. Many illustrations by the Noun Project (CC-Attribution):
    surprised by Julian Derveaux
    world by Wayne Tyler Sall
    database by Austin Condiff
    earth by Martin Vanco
    Woman by Simon Child
    Man by Simon Child
    Doctor by Simon Child
    David-Hockney by Simon Child
    Server by Simon Child
    clock by christoph robausch
