Coordination Avoidance In Distributed Databases

pbailis
January 01, 2015

Job talk from early 2015

The rise of Internet-scale geo-replicated services has led to considerable upheaval in the design of modern data management systems. Namely, given the availability, latency, and throughput penalties associated with classic mechanisms such as serializable transactions, a broad class of systems (e.g., “NoSQL”) has sought weaker alternatives that reduce the use of expensive coordination during system operation, often at the cost of application integrity. When can we safely forgo the cost of this expensive coordination, and when must we pay the price?

In this talk, I will discuss the potential for coordination avoidance — the use of as little coordination as possible while ensuring application integrity — in several modern data-intensive domains. Specifically, I will demonstrate how to leverage the semantic requirements of applications in data serving, transaction processing, and statistical analytics to enable more efficient distributed algorithms and system designs. The prototype systems I have built demonstrate order-of-magnitude speedups compared to their traditional, coordinated counterparts on a variety of tasks, including referential integrity and index maintenance, transaction execution under common isolation models, and asynchronous convex optimization. I will also discuss our experiences studying and optimizing a range of open source applications and systems, which exhibit similar results.

Transcript

  1. DATA TODAY: UNPRECEDENTED

    SCALE: Billion-user Internet services; 3B Internet users in 2014; 2.3B Mobile broadband users
    Sources: Ericsson Mobility Report, UN International Telecommunication Union, Facebook, Google, NSA
  2. DATA TODAY: UNPRECEDENTED

    SCALE: Billion-user Internet services; 3B Internet users in 2014; 2.3B Mobile broadband users
    VOLUME: Facebook RocksDB: 9B ops/sec; Google BigTable: 600M ops/sec; LinkedIn Kafka: 2.5M ops/sec
    Sources: Ericsson Mobility Report, UN International Telecommunication Union, Facebook, Google, NSA, @RocksDB, @AKPurtell, Martin Kleppmann
  3. DATA TODAY: UNPRECEDENTED

    SCALE: Billion-user Internet services; 3B Internet users in 2014; 2.3B Mobile broadband users
    VOLUME: Facebook RocksDB: 9B ops/sec; Google BigTable: 600M ops/sec; LinkedIn Kafka: 2.5M ops/sec
    INTERACTIVITY: Impatient users want low latency; Always-on responsiveness; Personalized user experiences
    Sources: Ericsson Mobility Report, UN International Telecommunication Union, Facebook, Google, NSA, @RocksDB, @AKPurtell, Martin Kleppmann
  4. How should we design database systems that enable applications to scale?

    “post on timeline” “accept friend request”
  5. serializability: equivalence to some serial execution

    very general! …but restricts concurrency
    CONCURRENT EXECUTION: r(x)=0 w(y←1) || r(y)=0
  6. serializability: equivalence to some serial execution

    very general! …but restricts concurrency
    CONCURRENT EXECUTION: r(x)=0 w(y←1) || r(y)=0 w(x←1)
  7. serializability: equivalence to some serial execution

    very general! …but restricts concurrency
    CONCURRENT EXECUTION: r(x)=0 w(y←1) || r(y)=0 w(x←1)
    Serial order: (1) r(x)=0 w(y←1), (2) r(y)=0 w(x←1)
  8. serializability: equivalence to some serial execution

    very general! …but restricts concurrency
    CONCURRENT EXECUTION: r(x)=0 w(y←1) || r(y)=0 w(x←1)
    Serial order: (1) r(x)=0 w(y←1), (2) r(y)=0 w(x←1). Should have r(y)→1
  10. serializability: equivalence to some serial execution

    very general! …but restricts concurrency
    CONCURRENT EXECUTION: r(x)=0 w(y←1) || r(y)=0 w(x←1)
    Serial order: (1) r(x)=0 w(y←1), (2) r(y)=0 w(x←1). Should have r(y)→1
    Serial order: (1) r(y)=0 w(x←1), (2) r(x)=0 w(y←1)
  11. serializability: equivalence to some serial execution

    very general! …but restricts concurrency
    CONCURRENT EXECUTION: r(x)=0 w(y←1) || r(y)=0 w(x←1)
    Serial order: (1) r(x)=0 w(y←1), (2) r(y)=0 w(x←1). Should have r(y)→1
    Serial order: (1) r(y)=0 w(x←1), (2) r(x)=0 w(y←1). Should have r(x)→1
  13. serializability: equivalence to some serial execution

    very general! …but restricts concurrency
    CONCURRENT EXECUTION: r(x)=0 w(y←1) || r(y)=0 w(x←1)
    Serial order: (1) r(x)=0 w(y←1), (2) r(y)=0 w(x←1). Should have r(y)→1
    Serial order: (1) r(y)=0 w(x←1), (2) r(x)=0 w(y←1). Should have r(x)→1
    CONCURRENT EXECUTION IS NOT SERIALIZABLE!
  14. serializability: equivalence to some serial execution

    very general! …but restricts concurrency
    CONCURRENT EXECUTION: r(x)=0 w(y←1) || r(y)=0 w(x←1)
    Serial order: (1) r(x)=0 w(y←1), (2) r(y)=0 w(x←1). Should have r(y)→1
    Serial order: (1) r(y)=0 w(x←1), (2) r(x)=0 w(y←1). Should have r(x)→1
    CONCURRENT EXECUTION IS NOT SERIALIZABLE!
    Serializability requires Coordination: transactions cannot make progress independently
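The argument on these slides (neither serial order reproduces the reads seen in the concurrent execution) can be checked mechanically. The following is a minimal sketch, not taken from the talk: the tuple encoding of operations and the names `serial_order_explains` and `is_serializable` are assumptions made for illustration, and the check only compares read values, which is enough for this two-transaction example.

```python
from itertools import permutations

# Each transaction is a list of operations:
# ("r", key, observed_value) or ("w", key, written_value).
T1 = [("r", "x", 0), ("w", "y", 1)]
T2 = [("r", "y", 0), ("w", "x", 1)]


def serial_order_explains(order, initial):
    """Replay the transactions one after another and report whether every
    read would observe the value recorded in the concurrent execution."""
    db = dict(initial)
    for txn in order:
        for op, key, value in txn:
            if op == "r" and db[key] != value:
                return False
            if op == "w":
                db[key] = value
    return True


def is_serializable(txns, initial):
    """Brute force: does some serial order reproduce all recorded reads?"""
    return any(serial_order_explains(order, initial)
               for order in permutations(txns))


# Both transactions read 0, so neither serial order matches.
print(is_serializable([T1, T2], {"x": 0, "y": 0}))  # prints False
```

Brute-force enumeration is only practical for a handful of transactions; it is used here because it mirrors the slide's "try order 1-2, then order 2-1" reasoning directly.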
  15. Serializability requires Coordination: transactions cannot make progress independently

    Two-Phase Locking, Optimistic Concurrency Control, Pre-Scheduling, Multi-Version Concurrency Control
  16. Serializability requires Coordination: transactions cannot make progress independently

    Two-Phase Locking, Optimistic Concurrency Control, Pre-Scheduling, Multi-Version Concurrency Control
    Blocking, Waiting, Aborts
  17. Serializability requires Coordination: transactions cannot make progress independently

    Two-Phase Locking, Optimistic Concurrency Control, Pre-Scheduling, Multi-Version Concurrency Control
    Blocking, Waiting, Aborts
    Costs of Coordination Between Concurrent Transactions
  18. Serializability requires Coordination: transactions cannot make progress independently

    Two-Phase Locking, Optimistic Concurrency Control, Pre-Scheduling, Multi-Version Concurrency Control
    Blocking, Waiting, Aborts
    Costs of Coordination Between Concurrent Transactions
    1. Decreased performance
  19. [Figure: Maximum Throughput (txns/s) vs. Number of Servers in Transaction (2 to 8), for conflicting transactions]

    Local datacenter (Amazon EC2). Based on [Bobtail, Xu et al., NSDI 13]
  21. [Figure, two panels: Maximum Throughput (txns/s) for conflicting transactions]

    Local datacenter (Amazon EC2): throughput vs. Number of Servers in Transaction (2 to 8). Based on [Bobtail, Xu et al., NSDI 13]
    Multi-datacenter (Amazon EC2): throughput vs. Participating Datacenters (+VA): +OR, +CA, +IR, +SP, +TO, +SI, +SY. Based on [HAT, Bailis et al., VLDB 14]
  24. Costs of Coordination Between Concurrent Transactions

    Serializability requires Coordination: transactions cannot make progress independently
    1. Decreased performance
       » due to waiting, communication delays, aborts
       » exacerbated in distributed environment!
    2. Decreased availability during failures
  28. Costs of Coordination Between Concurrent Transactions

    Serializability requires Coordination: transactions cannot make progress independently
    1. Decreased performance
       » due to waiting, communication delays, aborts
       » exacerbated in distributed environment!
    2. Decreased availability during failures
    Well-known for decades; cf. “CAP”
  29. Eventual Consistency

    “if no new updates are made to the [database], eventually all accesses will return the last updated value[s]”
    Werner Vogels, Amazon CTO
  33. Eventual Consistency

    “if no new updates are made to the [database], eventually all accesses will return the last updated value[s]”
    Werner Vogels, Amazon CTO
    provides no safety: what happens in the meantime?
  34. Probabilistically Bounded Staleness (PBS)

    [VLDB 2012, VLDB Journal 2014 “Best of VLDB 2012”, SIGMOD 2013 (Demo), CACM Research Highlight]
  35. Probabilistically Bounded Staleness (PBS)

    [VLDB 2012, VLDB Journal 2014 “Best of VLDB 2012”, SIGMOD 2013 (Demo), CACM Research Highlight]
    » Monte Carlo analysis of protocol behavior
  36. Probabilistically Bounded Staleness (PBS)

    [VLDB 2012, VLDB Journal 2014 “Best of VLDB 2012”, SIGMOD 2013 (Demo), CACM Research Highlight]
    » Monte Carlo analysis of protocol behavior
    » Key finding: frequently “correct” results…
  37. Probabilistically Bounded Staleness (PBS)

    [VLDB 2012, VLDB Journal 2014 “Best of VLDB 2012”, SIGMOD 2013 (Demo), CACM Research Highlight]
    » Monte Carlo analysis of protocol behavior
    » Key finding: frequently “correct” results…
    PBS: Voldemort Database at LinkedIn: 99% of reads return the last update 23ms after write
  38. Probabilistically Bounded Staleness (PBS)

    [VLDB 2012, VLDB Journal 2014 “Best of VLDB 2012”, SIGMOD 2013 (Demo), CACM Research Highlight]
    » Monte Carlo analysis of protocol behavior
    » Key finding: frequently “correct” results…
    PBS: Voldemort Database at LinkedIn: 99% of reads return the last update 23ms after write; 32-90% decrease in 99.9th percentile latency
  40. Probabilistically Bounded Staleness (PBS)

    [VLDB 2012, VLDB Journal 2014 “Best of VLDB 2012”, SIGMOD 2013 (Demo), CACM Research Highlight]
    » Monte Carlo analysis of protocol behavior
    » Key finding: frequently “correct” results…
    PBS: Voldemort Database at LinkedIn: 99% of reads return the last update 23ms after write; 32-90% decrease in 99.9th percentile latency
    …BUT NO GUARANTEES! ⇒ DIFFICULT TO PROGRAM
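To give a feel for what a Monte Carlo analysis of staleness can look like, here is a toy simulation. It is only a sketch under stated assumptions and not the PBS implementation. Every message delay is drawn from the same exponential distribution with a made-up mean (`mean_delay_ms`), the replica count `n` and quorum sizes `r` and `w` are free parameters, and the function names are hypothetical. The numbers it produces are illustrative and unrelated to the LinkedIn measurements quoted on the slides.

```python
import random


def simulate_once(rng, n, r, w, t, mean_delay_ms):
    """One trial: a write to n replicas, then a read issued t ms after the
    write is acknowledged. Returns True if the read sees the write."""
    rate = 1 / mean_delay_ms
    # Time at which each replica receives the write, and at which its ack returns.
    write_arrival = [rng.expovariate(rate) for _ in range(n)]
    ack_arrival = [wa + rng.expovariate(rate) for wa in write_arrival]
    # The client considers the write done once w acks have arrived.
    commit_time = sorted(ack_arrival)[w - 1]

    read_start = commit_time + t
    responses = []
    for i in range(n):
        read_arrival = read_start + rng.expovariate(rate)
        reply_arrival = read_arrival + rng.expovariate(rate)
        has_write = write_arrival[i] <= read_arrival
        responses.append((reply_arrival, has_write))
    # The reader uses the first r replies and keeps the newest value among them.
    first_r = sorted(responses)[:r]
    return any(has_write for _, has_write in first_r)


def estimate_fresh_read_probability(n, r, w, t, trials=20000,
                                    mean_delay_ms=5.0, seed=0):
    """Fraction of trials in which the read returned the latest write."""
    rng = random.Random(seed)
    hits = sum(simulate_once(rng, n, r, w, t, mean_delay_ms)
               for _ in range(trials))
    return hits / trials


for t in (0.0, 5.0, 20.0):
    print(t, estimate_fresh_read_probability(n=3, r=1, w=1, t=t))
```

With `r + w > n` the estimate is exactly 1.0 in this model, because the first `r` repliers must overlap the `w` replicas that acknowledged the write. With smaller quorums the estimate says how often a read is fresh, but, as the slide notes, that is a probability and not a guarantee.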
  41. Serializability: COORDINATION REQUIRED, GUARANTEED SAFETY

    Eventual Consistency: COORDINATION FREE, NO SAFETY
    PBS (VLDB12, VLDBJ14, SIGMOD13, CACM14)
    MY WORK: COORDINATION AVOIDANCE
  42. Serializability: COORDINATION REQUIRED, GUARANTEED SAFETY

    Eventual Consistency: COORDINATION FREE, NO SAFETY
    PBS (VLDB12, VLDBJ14, SIGMOD13, CACM14)
    MY WORK: COORDINATION AVOIDANCE (GUARANTEED SAFETY WITHOUT COORDINATION)
  43. WHAT THE APPLICATION SAYS: “post on timeline”, “accept friend request”

    WHAT THE DATABASE HEARS: write read write read write write read write write write read write read read read read read read
  44. MY APPROACH: DESIGN DATABASE SYSTEMS THAT EXPLOIT SEMANTICS OF HIGH-VALUE USE CASES

    Study practical database use cases
  45. MY APPROACH: DESIGN DATABASE SYSTEMS THAT EXPLOIT SEMANTICS OF HIGH-VALUE USE CASES

    Study practical database use cases
    Derive principles and algorithms
  46. MY APPROACH: DESIGN DATABASE SYSTEMS THAT EXPLOIT SEMANTICS OF HIGH-VALUE USE CASES

    Study practical database use cases
    Derive principles and algorithms
    Build systems to realize the benefits
  48. Serializability: COORDINATION REQUIRED, GUARANTEED SAFETY

    Eventual Consistency: COORDINATION FREE, NO SAFETY
    PBS (VLDB12, VLDBJ14, SIGMOD13, CACM14)
    COORDINATION AVOIDANCE: GUARANTEED SAFETY WITHOUT COORDINATION
  49. Serializability: COORDINATION REQUIRED, GUARANTEED SAFETY

    Eventual Consistency: COORDINATION FREE, NO SAFETY
    PBS (VLDB12, VLDBJ14, SIGMOD13, CACM14)
    COORDINATION AVOIDANCE: GUARANTEED SAFETY WITHOUT COORDINATION
    MORE SAFETY
  50. Serializability: COORDINATION REQUIRED, GUARANTEED SAFETY

    Eventual Consistency: COORDINATION FREE, NO SAFETY
    PBS (VLDB12, VLDBJ14, SIGMOD13, CACM14)
    COORDINATION AVOIDANCE: GUARANTEED SAFETY WITHOUT COORDINATION
    MORE SEMANTICS, MORE SAFETY
  51. Serializability: COORDINATION REQUIRED, GUARANTEED SAFETY

    Eventual Consistency: COORDINATION FREE, NO SAFETY
    PBS (VLDB12, VLDBJ14, SIGMOD13, CACM14)
    COORDINATION AVOIDANCE: GUARANTEED SAFETY WITHOUT COORDINATION
    MORE SEMANTICS, MORE SAFETY
    Causality (SOCC12, SIGMOD13)
  52. Serializability: COORDINATION REQUIRED, GUARANTEED SAFETY

    Eventual Consistency: COORDINATION FREE, NO SAFETY
    PBS (VLDB12, VLDBJ14, SIGMOD13, CACM14)
    COORDINATION AVOIDANCE: GUARANTEED SAFETY WITHOUT COORDINATION
    MORE SEMANTICS, MORE SAFETY
    Causality (SOCC12, SIGMOD13); Weak Isolation (HotOS13, VLDB14)
  53. Serializability: COORDINATION REQUIRED, GUARANTEED SAFETY

    Eventual Consistency: COORDINATION FREE, NO SAFETY
    PBS (VLDB12, VLDBJ14, SIGMOD13, CACM14)
    COORDINATION AVOIDANCE: GUARANTEED SAFETY WITHOUT COORDINATION
    MORE SEMANTICS, MORE SAFETY
    Causality (SOCC12, SIGMOD13); Weak Isolation (HotOS13, VLDB14); Atomic Visibility (SIGMOD14)
  54. Serializability: COORDINATION REQUIRED, GUARANTEED SAFETY

    Eventual Consistency: COORDINATION FREE, NO SAFETY
    PBS (VLDB12, VLDBJ14, SIGMOD13, CACM14)
    COORDINATION AVOIDANCE: GUARANTEED SAFETY WITHOUT COORDINATION
    MORE SEMANTICS, MORE SAFETY
    Causality (SOCC12, SIGMOD13); Weak Isolation (HotOS13, VLDB14); Atomic Visibility (SIGMOD14); Database Constraints (VLDB15, SIGMOD15)
  55. COORDINATION AVOIDANCE: GUARANTEED SAFETY WITHOUT COORDINATION

    Causality (SOCC12, SIGMOD13); Weak Isolation (HotOS13, VLDB14); Atomic Visibility (SIGMOD14); Database Constraints (VLDB15, SIGMOD15)
  56. COORDINATION AVOIDANCE: GUARANTEED SAFETY WITHOUT COORDINATION

    Data Serving and Transactions: Causality (SOCC12, SIGMOD13); Weak Isolation (HotOS13, VLDB14); Atomic Visibility (SIGMOD14); Database Constraints (VLDB15, SIGMOD15)
  57. COORDINATION AVOIDANCE: GUARANTEED SAFETY WITHOUT COORDINATION

    Data Serving and Transactions: Causality (SOCC12, SIGMOD13); Weak Isolation (HotOS13, VLDB14); Atomic Visibility (SIGMOD14); Database Constraints (VLDB15, SIGMOD15)
    Analytics: Model Prediction and Training (CIDR15, TBA)
  58. Serializability: COORDINATION REQUIRED, GUARANTEED SAFETY

    Eventual Consistency: COORDINATION FREE, NO SAFETY
    PBS (VLDB12, VLDBJ14, SIGMOD13, CACM14)
    COORDINATION AVOIDANCE: GUARANTEED SAFETY WITHOUT COORDINATION
    MORE SEMANTICS, MORE SAFETY
    Causality (SOCC12, SIGMOD13); Weak Isolation (HotOS13, VLDB14); Atomic Visibility (SIGMOD14); Database Constraints (VLDB15, SIGMOD15); Model Prediction and Training (CIDR15, TBA)
  60. Serializability: COORDINATION REQUIRED, GUARANTEED SAFETY

    Eventual Consistency: COORDINATION FREE, NO SAFETY
    PBS (VLDB12, VLDBJ14, SIGMOD13, CACM14)
    COORDINATION AVOIDANCE: GUARANTEED SAFETY WITHOUT COORDINATION
    MORE SEMANTICS, MORE SAFETY
    Causality (SOCC12, SIGMOD13); Weak Isolation (HotOS13, VLDB14); Atomic Visibility (SIGMOD14); Database Constraints (VLDB15, SIGMOD15); Model Prediction and Training (CIDR15, TBA)
    COORDINATION FREE
  61. (Abridged) Related Work

    » Semantics-based concurrency control: esp. commutativity and CALM analysis, laws of order
    » Available storage systems: optimistic replication, causal memory, CRDTs, eventually consistent transactions
    » Distributed computing: CAP, FLP, NBAC, quorums
  62. (Abridged) Related Work

    » Semantics-based concurrency control: esp. commutativity and CALM analysis, laws of order
    » Available storage systems: optimistic replication, causal memory, CRDTs, eventually consistent transactions
    » Distributed computing: CAP, FLP, NBAC, quorums
    » Here: focus on necessary coordination for common, modern data-intensive apps
  63. Serializability: COORDINATION REQUIRED, GUARANTEED SAFETY

    Eventual Consistency: COORDINATION FREE, NO SAFETY
    PBS (VLDB12, VLDBJ14, SIGMOD13, CACM14)
    COORDINATION AVOIDANCE: GUARANTEED SAFETY WITHOUT COORDINATION
    MORE SEMANTICS, MORE SAFETY
    Causality (SOCC12, SIGMOD13); Weak Isolation (HotOS13, VLDB14); Atomic Visibility (SIGMOD14); Database Constraints (VLDB15, SIGMOD15); Model Prediction and Training (CIDR15, TBA)
    COORDINATION FREE
  64. Serializability: COORDINATION REQUIRED, GUARANTEED SAFETY

    Eventual Consistency: COORDINATION FREE, NO SAFETY
    PBS (VLDB12, VLDBJ14, SIGMOD13, CACM14)
    COORDINATION AVOIDANCE: GUARANTEED SAFETY WITHOUT COORDINATION
    MORE SEMANTICS, MORE SAFETY
    Causality (SOCC12, SIGMOD13); Weak Isolation (HotOS13, VLDB14); Atomic Visibility (SIGMOD14); Database Constraints (VLDB15, SIGMOD15); Model Prediction and Training (CIDR15, TBA)
    COORDINATION FREE
    1
  65. Serializability: COORDINATION REQUIRED, GUARANTEED SAFETY

    Eventual Consistency: COORDINATION FREE, NO SAFETY
    PBS (VLDB12, VLDBJ14, SIGMOD13, CACM14)
    COORDINATION AVOIDANCE: GUARANTEED SAFETY WITHOUT COORDINATION
    MORE SEMANTICS, MORE SAFETY
    Causality (SOCC12, SIGMOD13); Weak Isolation (HotOS13, VLDB14); Atomic Visibility (SIGMOD14); Database Constraints (VLDB15, SIGMOD15); Model Prediction and Training (CIDR15, TBA)
    COORDINATION FREE
    1 2
  66. Serializability: COORDINATION REQUIRED, GUARANTEED SAFETY

    Eventual Consistency: COORDINATION FREE, NO SAFETY
    PBS (VLDB12, VLDBJ14, SIGMOD13, CACM14)
    COORDINATION AVOIDANCE: GUARANTEED SAFETY WITHOUT COORDINATION
    MORE SEMANTICS, MORE SAFETY
    Causality (SOCC12, SIGMOD13); Weak Isolation (HotOS13, VLDB14); Atomic Visibility (SIGMOD14); Database Constraints (VLDB15, SIGMOD15); Model Prediction and Training (CIDR15, TBA)
    COORDINATION FREE
    1 2 3
  67. Serializability: COORDINATION REQUIRED, GUARANTEED SAFETY

    Eventual Consistency: COORDINATION FREE, NO SAFETY
    PBS (VLDB12, VLDBJ14, SIGMOD13, CACM14)
    COORDINATION AVOIDANCE: GUARANTEED SAFETY WITHOUT COORDINATION
    MORE SEMANTICS, MORE SAFETY
    Causality (SOCC12, SIGMOD13); Weak Isolation (HotOS13, VLDB14); Atomic Visibility (SIGMOD14); Database Constraints (VLDB15, SIGMOD15); Model Prediction and Training (CIDR15, TBA)
    COORDINATION FREE
    1
  68. Social Graph

    User: 1, 2, 3, 4, 5, 6
    Facebook: 1.2B+ vertices, 420B+ edges
  69. Social Graph

    User | Adjacency List
    1 | 2, 3, 5
    2 | 1, 3, 5
    3 | 1, 5, 6
    4 | 6
    5 | 1, 2, 3, 6
    6 | 3, 4, 5
    Facebook: 1.2B+ vertices, 420B+ edges
  70. Social Graph

    User | Adjacency List
    1 | 2, 3, 5
    2 | 1, 3, 5
    3 | 1, 5, 6
    4 | 6
    5 | 1, 2, 3, 6
    6 | 3, 4, 5
    Facebook: 1.2B+ vertices, 420B+ edges
  71. 1 | 2, 3, 5 (+ 6); 6 | 3, 4, 5 (+ 1)

    To preserve graph, should observe either:
    » Both links
    » Neither link
  72. 1 | 2, 3, 5 (+ 6); 6 | 3, 4, 5 (+ 1)

    To preserve graph, should observe either:
    » Both links
    » Neither link
    Atomic Visibility
  73. Atomic Visibility

    either all or none of each transaction’s updates should be visible to other transactions
    WRITE X = 1, WRITE Y = 1
  74. Atomic Visibility

    either all or none of each transaction’s updates should be visible to other transactions
    WRITE X = 1, WRITE Y = 1
    READ X = 1, READ Y = 1 OR READ X =, READ Y =
  76. Atomic Visibility

    either all or none of each transaction’s updates should be visible to other transactions
    READ X = 1, READ Y = 1 OR READ X =, READ Y =
  77. Atomic Visibility

    either all or none of each transaction’s updates should be visible to other transactions
    READ X = 1, READ Y = 1 OR READ X =, READ Y =
    BUT NOT: READ X = 1, READ Y = OR READ X =, READ Y = 1
  78. Atomic Visibility

    either all or none of each transaction’s updates should be visible to other transactions
    READ X = 1, READ Y = 1 OR READ X =, READ Y =
    BUT NOT: READ X = 1, READ Y = OR READ X =, READ Y = 1 (“FRACTURED READS”)
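The "all or none" rule can be phrased as a check over what a single reader observed. The sketch below is an illustration only and does not come from the deck. It assumes each returned version carries the timestamp of the transaction that wrote it and the set of other keys that transaction wrote; `Version`, `ts`, `siblings` and `has_fractured_read` are hypothetical names, and the initial value 0 is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    key: str
    value: object
    ts: int               # timestamp of the writing transaction (assumed)
    siblings: frozenset   # other keys written by the same transaction


def has_fractured_read(read_set):
    """read_set maps key -> Version returned to one reading transaction."""
    for version in read_set.values():
        for other_key in version.siblings:
            other = read_set.get(other_key)
            # The reader saw this transaction's write to one key but an older
            # version of a sibling key: only part of the transaction is visible.
            if other is not None and other.ts < version.ts:
                return True
    return False


old_x = Version("X", 0, 0, frozenset())
old_y = Version("Y", 0, 0, frozenset())
new_x = Version("X", 1, 1, frozenset({"Y"}))
new_y = Version("Y", 1, 1, frozenset({"X"}))

print(has_fractured_read({"X": new_x, "Y": new_y}))  # prints False (both updates)
print(has_fractured_read({"X": old_x, "Y": old_y}))  # prints False (neither update)
print(has_fractured_read({"X": new_x, "Y": old_y}))  # prints True (fractured)
```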
  79. Atomic Visibility is not serializability!

    CONCURRENT EXECUTION: r(x)=0 w(y←1) || r(y)=0 w(x←1)
    Serial order: (1) r(x)=0 w(y←1), (2) r(y)=0 w(x←1). Should have r(y)→1
    Serial order: (1) r(y)=0 w(x←1), (2) r(x)=0 w(y←1). Should have r(x)→1
    CONCURRENT EXECUTION IS NOT SERIALIZABLE!
  80. Atomic Visibility is not serializability!

    CONCURRENT EXECUTION: r(x)=0 w(y←1) || r(y)=0 w(x←1)
    Serial order: (1) r(x)=0 w(y←1), (2) r(y)=0 w(x←1). Should have r(y)→1
    Serial order: (1) r(y)=0 w(x←1), (2) r(x)=0 w(y←1). Should have r(x)→1
    CONCURRENT EXECUTION IS NOT SERIALIZABLE!
    …but respects Atomic Visibility!
  81. Atomic Visibility compared

    | | Fractured Reads | Item Anti-Dependency Cycles | Anti-Dependency Cycles |
    |---|---|---|---|
    | Serializability | Prevents | Prevents | Prevents |
    | Snapshot Isolation | Prevents | Prevents | Doesn’t prevent |
    | Atomic Visibility via Read Atomic | Prevents | Doesn’t prevent | Doesn’t prevent |
    | Eventual Consistency | Doesn’t prevent | Doesn’t prevent | Doesn’t prevent |
  82. Atomic Visibility compared

    | | Fractured Reads | Item Anti-Dependency Cycles | Anti-Dependency Cycles |
    |---|---|---|---|
    | Serializability | Prevents | Prevents | Prevents |
    | Snapshot Isolation | Prevents | Prevents | Doesn’t prevent |
    | Atomic Visibility via Read Atomic | Prevents | Doesn’t prevent | Doesn’t prevent |
    | Eventual Consistency | Doesn’t prevent | Doesn’t prevent | Doesn’t prevent |

    WANT TO PREVENT
  88. Atomic Visibility compared

    | | Fractured Reads | Item Anti-Dependency Cycles | Anti-Dependency Cycles |
    |---|---|---|---|
    | Serializability | Prevents | Prevents | Prevents |
    | Snapshot Isolation | Prevents | Prevents | Doesn’t prevent |
    | Atomic Visibility via Read Atomic | Prevents | Doesn’t prevent | Doesn’t prevent |
    | Eventual Consistency | Doesn’t prevent | Doesn’t prevent | Doesn’t prevent |

    WANT TO PREVENT
    Require coordination to prevent! [VLDB 2014]
  91. Atomic Visibility is sufficient to correctly maintain:

    social graph structure, referential integrity, secondary indexes, materialized views
    despite being weaker than serializability
  92. Atomic Visibility via Locking

    W: X = 1, Y = 1
    R: X = 1, Y = 1
    stored values: X=1, Y=1
  93. Atomic Visibility via Locking

    W: X = 1, Y = 1
    R: X = ?, Y = ?
    stored values: X=1, Y=0
  94. Atomic Visibility via Locking

    W: X = 1, Y = 1
    R: X = ?, Y = ?
    Server 1001, Server 1002; stored values: X=1, Y=0
  97. [Diagram: operations on X and Y over TIME]

    LOCKING: W(Y) R(X) R(Y) W(X)
    OPTIMISTIC: W(Y) R(X) R(Y) W(X), VALIDATE ATOMICITY
    ATOMICITY VIOLATED!
  98. [Diagram: operations on X and Y over TIME]

    LOCKING: W(Y) R(X) R(Y) W(X)
    OPTIMISTIC: W(Y) R(X) R(Y) W(X), VALIDATE ATOMICITY, VIOLATED? ABORT
    ATOMICITY VIOLATED!
  100. [Diagram: operations on X and Y over TIME]

    LOCKING: W(Y) R(X) R(Y) W(X)
    OPTIMISTIC: W(Y) R(X) R(Y) W(X), VALIDATE ATOMICITY, VIOLATED? ABORT
    ATOMICITY VIOLATED!
    BOTH RELY ON COORDINATION
  101. Due to coordination overheads…

    Facebook Tao, Google Megastore, LinkedIn Espresso, Amazon DynamoDB, Apache Cassandra, Basho Riak, Yahoo! PNUTS, Google App Engine
  102. Due to coordination overheads…

    Facebook Tao, Google Megastore, LinkedIn Espresso, Amazon DynamoDB, Apache Cassandra, Basho Riak, Yahoo! PNUTS, Google App Engine
    …consciously choose to violate atomic visibility
  103. Due to coordination overheads…

    Facebook Tao, Google Megastore, LinkedIn Espresso, Amazon DynamoDB, Apache Cassandra, Basho Riak, Yahoo! PNUTS, Google App Engine
    …consciously choose to violate atomic visibility
    “[Tao] explicitly favors efficiency and availability over consistency…[an edge] may exist without an inverse; these hanging associations are scheduled for repair by an asynchronous job.”
  104. Our contributions [SIGMOD 2014, selected for “Best of SIGMOD” ACM TODS]

    to maintain social graph structure, referential integrity, secondary indexes, materialized views
  105. Our contributions [SIGMOD 2014, selected for “Best of SIGMOD” ACM TODS]

    1. A new model: atomic visibility (via Read Atomic isolation) is (provably) sufficient to maintain social graph structure, referential integrity, secondary indexes, materialized views
  106. Our contributions [SIGMOD 2014, selected for “Best of SIGMOD” ACM TODS]

    1. A new model: atomic visibility (via Read Atomic isolation) is (provably) sufficient to maintain social graph structure, referential integrity, secondary indexes, materialized views
    2. Efficient protocols: RAMP transactions enforce atomic visibility without coordination
  107. WHAT THE APPLICATION SAYS: “accept friend request”, “update index entry”

    WHAT THE DATABASE HEARS: write write read write read write read read read read read write write read read read read write read write
  108. “accept friend request”, “update index entry”

    write write read write read write read read read read read write write write read
  109. “accept friend request”, “update index entry”

    ATOMIC VISIBILITY
    write write read write read write read read read read read write write write read
  110. “accept friend request”, “update index entry”

    RAMP TRANSACTION; ATOMIC VISIBILITY
    write write read write read write read read read read read write write write read
  111. “accept friend request”, “update index entry”

    RAMP TRANSACTION, RAMP TRANSACTION; ATOMIC VISIBILITY
    write write read write read write read read read read read write write write read
  112. [Diagram: operations on X and Y over TIME]

    LOCKING: W(Y) R(X) R(Y) W(X)
    OPTIMISTIC: W(Y) R(X) R(Y) W(X), VALIDATE ATOMICITY, VIOLATED? ABORT
    ATOMICITY VIOLATED!
  113. [Diagram: operations on X and Y over TIME]

    LOCKING: W(Y) R(X) R(Y) W(X)
    OPTIMISTIC: W(Y) R(X) R(Y) W(X), VALIDATE ATOMICITY, VIOLATED? ABORT
    RAMP TRANSACTIONS
    ATOMICITY VIOLATED!
  114. [Diagram: operations on X and Y over TIME]

    LOCKING: W(Y) R(X) R(Y) W(X)
    OPTIMISTIC: W(Y) R(X) R(Y) W(X), VALIDATE ATOMICITY, VIOLATED? ABORT
    RAMP TRANSACTIONS
    ATOMICITY VIOLATED!
    Without coordination, atomicity violations will (initially) occur!
  115. [Diagram: operations on X and Y over TIME]

    LOCKING: W(Y) R(X) R(Y) W(X)
    OPTIMISTIC: W(Y) R(X) R(Y) W(X), VALIDATE ATOMICITY, VIOLATED? ABORT
    RAMP TRANSACTIONS: W(Y) R(X) R(Y) W(X)
    ATOMICITY VIOLATED!
    Without coordination, atomicity violations will (initially) occur!
  116. [Diagram: operations on X and Y over TIME]

    LOCKING: W(Y) R(X) R(Y) W(X)
    OPTIMISTIC: W(Y) R(X) R(Y) W(X), VALIDATE ATOMICITY, VIOLATED? ABORT
    RAMP TRANSACTIONS: W(Y) R(X) R(Y) W(X)
    ATOMICITY VIOLATED!
    Without coordination, atomicity violations will (initially) occur! Don’t panic! Don’t abort!
  117. [Diagram: operations on X and Y over TIME]

    LOCKING: W(Y) R(X) R(Y) W(X)
    OPTIMISTIC: W(Y) R(X) R(Y) W(X), VALIDATE ATOMICITY, VIOLATED? ABORT
    RAMP TRANSACTIONS: W(Y) R(X) R(Y) W(X), DETECT RACES
    ATOMICITY VIOLATED!
    Without coordination, atomicity violations will (initially) occur! Don’t panic! Don’t abort!
  118. [Diagram: operations on X and Y over TIME]

    LOCKING: W(Y) R(X) R(Y) W(X)
    OPTIMISTIC: W(Y) R(X) R(Y) W(X), VALIDATE ATOMICITY, VIOLATED? ABORT
    RAMP TRANSACTIONS: W(Y) R(X) R(Y) W(X), DETECT RACES, REPAIR ATOMICITY
    ATOMICITY VIOLATED!
    Without coordination, atomicity violations will (initially) occur! Don’t panic! Don’t abort!
  119. [Diagram: operations on X and Y over TIME]

    LOCKING: W(Y) R(X) R(Y) W(X)
    OPTIMISTIC: W(Y) R(X) R(Y) W(X), VALIDATE ATOMICITY, VIOLATED? ABORT
    RAMP TRANSACTIONS: W(Y) R(X) R(Y) W(X), DETECT RACES, REPAIR ATOMICITY, R(Y)
    ATOMICITY VIOLATED!
    Without coordination, atomicity violations will (initially) occur! Don’t panic! Don’t abort!
  120. Atomic Visibility via RAMP Transactions

    DETECT RACES, REPAIR ATOMICITY
    W: X = 1, Y = 1
    Server 1001, Server 1002; stored values: X=0, Y=0
  121. Atomic Visibility via RAMP Transactions

    DETECT RACES, REPAIR ATOMICITY
    W: X = 1, Y = 1
    Server 1001, Server 1002; stored values: X=0, Y=0, X=1
  122. Atomic Visibility via RAMP Transactions

    DETECT RACES, REPAIR ATOMICITY
    W: X = 1, Y = 1
    Server 1001, Server 1002; stored values: X=0, Y=0, X=1
    R: X = ?, Y = ?; observed: X = 1, Y = 0
  123. Atomic Visibility via RAMP Transactions REPAIR ATOMICITY DETECT RACES X

    = 1 W Y = 1 W Server 1001 X=0 Y=0 Server 1002 X=1 X = ? R Y = ? R X = 1 Y = 0
  124. Atomic Visibility via RAMP Transactions REPAIR ATOMICITY DETECT RACES X

    = 1 W Y = 1 W Server 1001 X=0 Y=0 Server 1002 X=1 X = ? R Y = ? R X = 1 Y = 0
  125. Atomic Visibility via RAMP Transactions REPAIR ATOMICITY DETECT RACES X

    = 1 W Y = 1 W Server 1001 X=0 Y=0 Server 1002 X=1 X = ? R Y = ? R X = 1 Y = 0 via intention metadata
  126. Atomic Visibility via RAMP Transactions REPAIR ATOMICITY DETECT RACES X

    = 1 W Y = 1 W Server 1001 Y=0 Server 1002 X=1 via intention metadata
  127. Y=0 T0 {} intention · Atomic Visibility via RAMP Transactions

    REPAIR ATOMICITY DETECT RACES X = 1 W Y = 1 W X=1 T1 {Y} intention · T0 intention · via intention metadata
  128. value Y=0 T0 {} intention · Atomic Visibility via RAMP

    Transactions REPAIR ATOMICITY DETECT RACES X = 1 W Y = 1 W value X=1 T1 {Y} intention · T0 intention · via intention metadata
  129. value Y=0 T0 {} intention · Atomic Visibility via RAMP

    Transactions REPAIR ATOMICITY DETECT RACES X = 1 W Y = 1 W value X=1 T1 {Y} intention · T0 intention · via intention metadata
  130. value Y=0 T0 {} intention · Atomic Visibility via RAMP

    Transactions REPAIR ATOMICITY DETECT RACES X = 1 W Y = 1 W value X=1 T1 {Y} intention · T0 intention · via intention metadata “A transaction called T1 wrote this and also wrote to Y”
  131. Atomic Visibility via RAMP Transactions REPAIR ATOMICITY DETECT RACES X

    = 1 W Y = 1 W value X=1 T1 {Y} intention · value Y=0 T0 {} intention · via intention metadata
  132. Atomic Visibility via RAMP Transactions REPAIR ATOMICITY DETECT RACES X

    = 1 W Y = 1 W value X=1 T1 {Y} intention · value Y=0 T0 {} intention · via intention metadata
  133. Atomic Visibility via RAMP Transactions REPAIR ATOMICITY DETECT RACES X

    = 1 W Y = 1 W value X=1 T1 {Y} intention · value Y=0 T0 {} intention · via intention metadata X = ? R Y = ? R
  134. Atomic Visibility via RAMP Transactions REPAIR ATOMICITY DETECT RACES value

    X=1 T1 {Y} intention · via intention metadata X = ? R Y = ? R X = 1 W Y = 1 W value Y=0 T0 {} intention ·
  135. Atomic Visibility via RAMP Transactions REPAIR ATOMICITY DETECT RACES value

    X=1 T1 {Y} intention · via intention metadata X = ? R R X = 1 W Y = 1 W X = 1 Y = 0 value Y=0 T0 {} intention · “A transaction called T1 wrote this and also wrote to Y”
  136. Atomic Visibility via RAMP Transactions REPAIR ATOMICITY DETECT RACES value

    X=1 T1 {Y} intention · via intention metadata X = ? R R X = 1 W Y = 1 W X = 1 Y = 0 value Y=0 T0 {} intention · “A transaction called T1 wrote this and also wrote to Y”
  137. Atomic Visibility via RAMP Transactions REPAIR ATOMICITY DETECT RACES value

    X=1 T1 {Y} intention · via intention metadata X = ? R R X = 1 W Y = 1 W X = 1 Y = 0 Where is T1’s write to Y? value Y=0 T0 {} intention ·
  138. Atomic Visibility via RAMP Transactions REPAIR ATOMICITY DETECT RACES value

    X=1 T1 {Y} intention · via intention metadata X = ? R R X = 1 W Y = 1 W X = 1 Y = 0 Where is T1’s write to Y? value Y=0 T0 {} intention ·
  139. Atomic Visibility via RAMP Transactions REPAIR ATOMICITY DETECT RACES value

    X=1 T1 {Y} intention · via intention metadata X = ? R R X = 1 W Y = 1 W X = 1 Y = 0 Where is T1’s write to Y? value Y=0 T0 {} intention ·
  140. Atomic Visibility via RAMP Transactions REPAIR ATOMICITY DETECT RACES value

    X=1 T1 {Y} intention · via intention metadata X = ? R R X = 1 W Y = 1 W X = 1 Y = 0 Where is T1’s write to Y? value Y=0 T0 {} intention · via multi-versioning, ready bit
  141. Atomic Visibility via RAMP Transactions REPAIR ATOMICITY DETECT RACES X

    = 1 W Y = 1 W value X=1 T1 {Y} intention · via intention metadata via multi-versioning, ready bit value Y=0 T0 {} intention ·
  142. Atomic Visibility via RAMP Transactions REPAIR ATOMICITY DETECT RACES via

    intention metadata value intention X=0 T0 {} · value intention Y=0 T0 {} · X = 1 W Y = 1 W via multi-versioning, ready bit
  143. Atomic Visibility via RAMP Transactions REPAIR ATOMICITY DETECT RACES via

    intention metadata value intention X=0 T0 {} · value intention Y=0 T0 {} · X = 1 W Y = 1 W ready ready via multi-versioning, ready bit
  144. Y=1 T1 {X} · X=1 T1 {Y} · Atomic Visibility

    via RAMP Transactions REPAIR ATOMICITY DETECT RACES via intention metadata value intention X=0 T0 {} · value intention Y=0 T0 {} · X = 1 W Y = 1 W ready ready 1.) Place write on each server. via multi-versioning, ready bit
  145. Y=1 T1 {X} · X=1 T1 {Y} · Atomic Visibility

    via RAMP Transactions REPAIR ATOMICITY DETECT RACES via intention metadata value intention X=0 T0 {} · value intention Y=0 T0 {} · X = 1 W Y = 1 W ready ready 1.) Place write on each server. 2.) Set ready bit on each write on server. via multi-versioning, ready bit
  146. Y=1 T1 {X} · X=1 T1 {Y} · Atomic Visibility

    via RAMP Transactions REPAIR ATOMICITY DETECT RACES via intention metadata value intention X=0 T0 {} · value intention Y=0 T0 {} · X = 1 W Y = 1 W ready ready 1.) Place write on each server. 2.) Set ready bit on each write on server. via multi-versioning, ready bit Ready bit invariant: if ready bit is set, all writes in transaction are present on their respective servers
  147. Y=1 T1 {X} · X=1 T1 {Y} · Atomic Visibility

    via RAMP Transactions REPAIR ATOMICITY DETECT RACES via intention metadata via multi-versioning value intention X=0 T0 {} · value intention Y=0 T0 {} · X = 1 W Y = 1 W ready ready Ready bit invariant: if ready bit is set, all writes in transaction are present on their respective servers
  148. Y=1 T1 {X} · X=1 T1 {Y} · Atomic Visibility

    via RAMP Transactions REPAIR ATOMICITY DETECT RACES via intention metadata via multi-versioning value intention X=0 T0 {} · value intention Y=0 T0 {} · X = 1 W Y = 1 W ready ready Ready bit invariant: if ready bit is set, all writes in transaction are present on their respective servers
  149. Y=1 T1 {X} · X=1 T1 {Y} · Atomic Visibility

    via RAMP Transactions REPAIR ATOMICITY DETECT RACES via intention metadata via multi-versioning value intention X=0 T0 {} · value intention Y=0 T0 {} · X = 1 W Y = 1 W ready ready X = ? R Y = ? R Ready bit invariant: if ready bit is set, all writes in transaction are present on their respective servers
  150. Y=1 T1 {X} · X=1 T1 {Y} · Atomic Visibility

    via RAMP Transactions REPAIR ATOMICITY DETECT RACES via intention metadata via multi-versioning value intention X=0 T0 {} · value intention Y=0 T0 {} · ready ready X = ? R Y = ? R Ready bit invariant: if ready bit is set, all writes in transaction are present on their respective servers
  151. Y=1 T1 {X} · X=1 T1 {Y} · Atomic Visibility

    via RAMP Transactions REPAIR ATOMICITY DETECT RACES via intention metadata via multi-versioning value intention X=0 T0 {} · value intention Y=0 T0 {} · ready ready X = ? R Y = ? R 1.) Fetch “highest” ready versions. Ready bit invariant: if ready bit is set, all writes in transaction are present on their respective servers
  152. Y=1 T1 {X} · X=1 T1 {Y} · Atomic Visibility

    via RAMP Transactions REPAIR ATOMICITY DETECT RACES via intention metadata via multi-versioning value intention X=0 T0 {} · value intention Y=0 T0 {} · ready ready X = ? R Y = ? R 1.) Fetch “highest” ready versions. X = 1 Ready bit invariant: if ready bit is set, all writes in transaction are present on their respective servers
  153. Y=1 T1 {X} · X=1 T1 {Y} · Atomic Visibility

    via RAMP Transactions REPAIR ATOMICITY DETECT RACES via intention metadata via multi-versioning value intention X=0 T0 {} · value intention Y=0 T0 {} · ready ready X = ? R Y = ? R 1.) Fetch “highest” ready versions. X = 1 Y = 0 Ready bit invariant: if ready bit is set, all writes in transaction are present on their respective servers
  154. Y=1 T1 {X} · X=1 T1 {Y} · Atomic Visibility

    via RAMP Transactions REPAIR ATOMICITY DETECT RACES via intention metadata via multi-versioning value intention X=0 T0 {} · value intention Y=0 T0 {} · ready ready X = ? R Y = ? R 1.) Fetch “highest” ready versions. 2.) Fetch any missing writes using metadata. X = 1 Y = 0 Ready bit invariant: if ready bit is set, all writes in transaction are present on their respective servers
  155. Y=1 T1 {X} · X=1 T1 {Y} · Atomic Visibility

    via RAMP Transactions REPAIR ATOMICITY DETECT RACES via intention metadata via multi-versioning value intention X=0 T0 {} · value intention Y=0 T0 {} · ready ready X = ? R Y = ? R 1.) Fetch “highest” ready versions. 2.) Fetch any missing writes using metadata. X = 1 Y = 0 Ready bit invariant: if ready bit is set, all writes in transaction are present on their respective servers
  156. Y=1 T1 {X} · X=1 T1 {Y} · Atomic Visibility

    via RAMP Transactions REPAIR ATOMICITY DETECT RACES via intention metadata via multi-versioning value intention X=0 T0 {} · value intention Y=0 T0 {} · ready ready X = ? R Y = ? R 1.) Fetch “highest” ready versions. 2.) Fetch any missing writes using metadata. X = 1 Y = 0 Y = 1 Ready bit invariant: if ready bit is set, all writes in transaction are present on their respective servers
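The write and read steps on the slides above can be sketched in a few lines. This is a minimal in-memory illustration, not the author's implementation: the `Server` class, the integer timestamps, and the function names `ramp_write` and `ramp_read` are all assumptions made for the example. Each version carries the set of sibling keys written by the same transaction (the "intention" metadata), and a reader uses that set to notice and fetch a missing write.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    # (key, ts) -> (value, sibling keys written by the same transaction)
    versions: dict = field(default_factory=dict)
    # key -> highest timestamp whose ready bit is set
    ready: dict = field(default_factory=dict)

    def prepare(self, key, ts, value, siblings):
        """Write step 1: place the write on the server (not yet visible)."""
        self.versions[(key, ts)] = (value, frozenset(siblings))

    def commit(self, key, ts):
        """Write step 2: set the ready bit for this write."""
        self.ready[key] = max(self.ready.get(key, 0), ts)

    def get(self, key, ts=None):
        """Return the highest ready version, or a specific version by timestamp."""
        if ts is None:
            ts = self.ready.get(key, 0)
        value, siblings = self.versions[(key, ts)]
        return ts, value, siblings


def ramp_write(servers, ts, writes):
    keys = set(writes)
    for k, v in writes.items():          # round 1: place writes everywhere
        servers[k].prepare(k, ts, v, keys - {k})
    for k in writes:                     # round 2: set ready bits
        servers[k].commit(k, ts)


def ramp_read(servers, keys):
    first = {k: servers[k].get(k) for k in keys}       # round 1
    required = {k: ts for k, (ts, _, _) in first.items()}
    for ts, _, siblings in first.values():             # detect races
        for s in siblings:
            if s in required and ts > required[s]:
                required[s] = ts
    result = {}
    for k in keys:
        ts, value, _ = first[k]
        if required[k] > ts:                           # round 2: repair
            ts, value, _ = servers[k].get(k, required[k])
        result[k] = value
    return result


# Scenario from the slides: T0 wrote X=0 and Y=0 with empty metadata.
servers = {"X": Server(), "Y": Server()}
for key, server in servers.items():
    server.prepare(key, 0, 0, set())
    server.commit(key, 0)

# T1 (timestamp 1) has placed both writes but only X's ready bit is set.
servers["X"].prepare("X", 1, 1, {"Y"})
servers["Y"].prepare("Y", 1, 1, {"X"})
servers["X"].commit("X", 1)

snapshot = ramp_read(servers, ["X", "Y"])   # sees X=1, then repairs Y to 1
```

The repair in round 2 relies on the ready bit invariant: because X's ready bit is only set after every write of T1 has been placed, the version of Y at timestamp 1 is guaranteed to be present when the reader asks for it.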
  157. RAMP Details

    Write RTT: 2 | Read RTT (best case): 1 | Read RTT (worst case): 2 | Metadata: O(txn len) write set summary. DETECT RACES via intention metadata. REPAIR ATOMICITY via multi-versioning, ready bit.
  158. RAMP Details

    Write RTT: 2 | Read RTT (best case): 1 | Read RTT (worst case): 2 | Metadata: O(txn len) write set summary. DETECT RACES via intention metadata. REPAIR ATOMICITY via multi-versioning, ready bit.
  159. RAMP Details

    Write RTT: 2 | Read RTT (best case): 1 | Read RTT (worst case): 2 | Metadata: O(txn len) write set summary. DETECT RACES via intention metadata. REPAIR ATOMICITY via multi-versioning, ready bit. Ensures that readers never have to wait.
  160. RAMP Details

    Write RTT: 2 | Read RTT (best case): 1 | Read RTT (worst case): 2 | Metadata: O(txn len) write set summary. DETECT RACES via intention metadata. REPAIR ATOMICITY via multi-versioning, ready bit. Ensures that readers never have to wait.
  161. RAMP Details

    Write RTT: 2 | Read RTT (best case): 1 | Read RTT (worst case): 2 | Metadata: O(txn len) write set summary. DETECT RACES via intention metadata. REPAIR ATOMICITY via multi-versioning, ready bit. Ensures that readers never have to wait.
  162. RAMP Details

    Write RTT: 2 | Read RTT (best case): 1 | Read RTT (worst case): 2 | Metadata: O(txn len) write set summary. DETECT RACES via intention metadata. REPAIR ATOMICITY via multi-versioning, ready bit. Ensures that readers never have to wait. 2nd RTT for repair, in the event of a race.
  163. RAMP Details

    Write RTT: 2 | Read RTT (best case): 1 | Read RTT (worst case): 2 | Metadata: O(txn len) write set summary. DETECT RACES via intention metadata. REPAIR ATOMICITY via multi-versioning, ready bit. Ensures that readers never have to wait. 2nd RTT for repair, in the event of a race.
  164. RAMP Details

    Write RTT: 2 | Read RTT (best case): 1 | Read RTT (worst case): 2 | Metadata: O(txn len) write set summary. DETECT RACES via intention metadata. REPAIR ATOMICITY via multi-versioning, ready bit.
  165. RAMP Details

    Write RTT: 2 | Read RTT (best case): 1 | Read RTT (worst case): 2 | Metadata: O(txn len) write set summary. DETECT RACES via intention metadata. REPAIR ATOMICITY via multi-versioning, ready bit. Transaction IDs: sequence number and client ID » Also use to order overwrites!
  166. RAMP Details

    Write RTT: 2 | Read RTT (best case): 1 | Read RTT (worst case): 2 | Metadata: O(txn len) write set summary. DETECT RACES via intention metadata. REPAIR ATOMICITY via multi-versioning, ready bit. Garbage collection of old versions: » Set timeout (TTL) for overwritten versions » Limit read transaction duration to TTL. Transaction IDs: sequence number and client ID » Also use to order overwrites!
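A small sketch may help make the two details on this slide concrete. It assumes a transaction ID is a `(sequence number, client ID)` pair compared lexicographically, so the client ID only breaks ties; that comparison order, the `TxnId` type, and the helper names `latest` and `collect` are illustrative assumptions rather than anything stated in the talk.

```python
from typing import NamedTuple


class TxnId(NamedTuple):
    seq: int      # per-client sequence number
    client: int   # client ID, breaks ties between clients


def latest(versions):
    """Order overwrites: the version with the highest TxnId wins."""
    return max(versions, key=lambda v: v[0])


def collect(versions, overwritten_at, now, ttl):
    """Keep live versions and those overwritten no more than ttl seconds ago."""
    return [
        v for v in versions
        if v[0] not in overwritten_at or now - overwritten_at[v[0]] <= ttl
    ]


versions = [
    (TxnId(7, 1), "a"),
    (TxnId(7, 2), "b"),
    (TxnId(6, 9), "c"),
]
winner = latest(versions)

# Hypothetical clock values: the time at which each old version was overwritten.
overwritten_at = {TxnId(6, 9): 100.0, TxnId(7, 1): 160.0}
kept = collect(versions, overwritten_at, now=200.0, ttl=60.0)
```

Bounding read transactions by the same TTL is what makes the collection safe in this sketch: a reader that finishes within the TTL can still find any overwritten version it might need for repair.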
  167. RAMP Details

    Write RTT: 2 | Read RTT (best case): 1 | Read RTT (worst case): 2 | Metadata: O(txn len) write set summary. DETECT RACES via intention metadata. REPAIR ATOMICITY via multi-versioning, ready bit.
  168. RAMP Details

    Write RTT: 2 | Read RTT (best case): 1 | Read RTT (worst case): 2 | Metadata: O(txn len) write set summary. DETECT RACES via intention metadata. REPAIR ATOMICITY via multi-versioning, ready bit.
  169. RAMP Details

    Write RTT: 2 | Read RTT (best case): 1 | Read RTT (worst case): 2 | Metadata: O(txn len) write set summary. DETECT RACES via intention metadata. REPAIR ATOMICITY via multi-versioning, ready bit. Can we use less metadata for intent?
  170. RAMP Variants

    Algorithm | Write RTT | Read RTT (best case) | Read RTT (worst case) | Metadata
    RAMP-Fast | 2 | 1 | 2 | O(txn len) write set summary
    RAMP-Small | 2 | 2 | 2 | O(1) timestamp
    RAMP-Hybrid | 2 | 1+ε | 2 | O(1) Bloom filter
    DETECT RACES via intention metadata. REPAIR ATOMICITY via multi-versioning, ready bit.
  171. RAMP Variants

    Algorithm | Write RTT | Read RTT (best case) | Read RTT (worst case) | Metadata
    RAMP-Fast | 2 | 1 | 2 | O(txn len) write set summary
    RAMP-Small | 2 | 2 | 2 | O(1) timestamp
    RAMP-Hybrid | 2 | 1+ε | 2 | O(1) Bloom filter
    DETECT RACES via intention metadata. REPAIR ATOMICITY via multi-versioning, ready bit.
  172. RAMP Variants

    Algorithm | Write RTT | Read RTT (best case) | Read RTT (worst case) | Metadata
    RAMP-Fast | 2 | 1 | 2 | O(txn len) write set summary
    RAMP-Small | 2 | 2 | 2 | O(1) timestamp
    RAMP-Hybrid | 2 | 1+ε | 2 | O(1) Bloom filter
    DETECT RACES via intention metadata. REPAIR ATOMICITY via multi-versioning, ready bit.
  173. RAMP Variants

    Algorithm | Write RTT | Read RTT (best case) | Read RTT (worst case) | Metadata
    RAMP-Fast | 2 | 1 | 2 | O(txn len) write set summary
    RAMP-Small | 2 | 2 | 2 | O(1) timestamp
    RAMP-Hybrid | 2 | 1+ε | 2 | O(1) Bloom filter
    DETECT RACES via intention metadata. REPAIR ATOMICITY via multi-versioning, ready bit. Always attempt to repair… …no metadata needed!
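One way to read "always attempt to repair" is sketched below, under the assumption that each version carries only a timestamp. The reader first collects the highest ready timestamp per key, then sends the whole set of timestamps back and asks each server for its newest version whose timestamp appears in that set. The `SmallServer` class and `small_read` function are hypothetical names for this illustration, and the details may differ from the published protocol.

```python
class SmallServer:
    def __init__(self):
        self.versions = {}   # key -> {ts: value}, includes writes not yet ready
        self.ready = {}      # key -> highest timestamp with the ready bit set

    def prepare(self, key, ts, value):
        self.versions.setdefault(key, {})[ts] = value

    def commit(self, key, ts):
        self.ready[key] = max(self.ready.get(key, 0), ts)

    def latest_ready_ts(self, key):
        return self.ready[key]

    def get_matching(self, key, ts_set):
        """Newest stored version of key whose timestamp is in ts_set."""
        ts = max(t for t in self.versions[key] if t in ts_set)
        return self.versions[key][ts]


def small_read(servers, keys):
    # Round 1: only timestamps come back, no values and no write sets.
    ts_set = {servers[k].latest_ready_ts(k) for k in keys}
    # Round 2: always issued, which is why the best case is 2 RTTs.
    return {k: servers[k].get_matching(k, ts_set) for k in keys}


small = {"X": SmallServer(), "Y": SmallServer()}
for key, server in small.items():
    server.prepare(key, 0, 0)
    server.commit(key, 0)

# T1 (timestamp 1) has placed both writes, but only X's ready bit is set.
small["X"].prepare("X", 1, 1)
small["Y"].prepare("Y", 1, 1)
small["X"].commit("X", 1)

small_snapshot = small_read(small, ["X", "Y"])
```

The trade is visible in the table: metadata shrinks to a single timestamp, but every read pays the second round trip.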
  174. RAMP Variants

    Algorithm | Write RTT | Read RTT (best case) | Read RTT (worst case) | Metadata
    RAMP-Fast | 2 | 1 | 2 | O(txn len) write set summary
    RAMP-Small | 2 | 2 | 2 | O(1) timestamp
    RAMP-Hybrid | 2 | 1+ε | 2 | O(B(ε)) Bloom filter
    DETECT RACES via intention metadata. REPAIR ATOMICITY via multi-versioning, ready bit.
  175. RAMP Variants

    Algorithm | Write RTT | Read RTT (best case) | Read RTT (worst case) | Metadata
    RAMP-Fast | 2 | 1 | 2 | O(txn len) write set summary
    RAMP-Small | 2 | 2 | 2 | O(1) timestamp
    RAMP-Hybrid | 2 | 1+ε | 2 | O(B(ε)) Bloom filter
    DETECT RACES via intention metadata. REPAIR ATOMICITY via multi-versioning, ready bit. Bloom filter summarizes intent. False positives: extra read RTTs.
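To illustrate why a Bloom filter can stand in for the write set, here is a minimal filter built on `hashlib`. The sizes (64 bits, 3 hash functions) and the class itself are assumptions chosen for the example. The property that matters is that a Bloom filter never reports a false negative, so a reader never misses a sibling write; a false positive only triggers a second read round trip that turns out to be unnecessary.

```python
import hashlib


class BloomFilter:
    def __init__(self, bits=64, hashes=3):
        self.bits = bits
        self.hashes = hashes
        self.field = 0          # bit field stored as a Python int

    def _positions(self, item):
        # Derive several bit positions by salting one hash function.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, item):
        for p in self._positions(item):
            self.field |= 1 << p

    def might_contain(self, item):
        return all((self.field >> p) & 1 for p in self._positions(item))


# Metadata attached to T1's write of X: a summary of its sibling keys.
siblings_of_x = BloomFilter()
siblings_of_x.add("Y")


def needs_second_round(sibling_filter, other_key, writer_ts, other_ts):
    """Repair only if the filter says other_key may be a sibling and it lags."""
    return sibling_filter.might_contain(other_key) and other_ts < writer_ts


repair_y = needs_second_round(siblings_of_x, "Y", writer_ts=1, other_ts=0)
```

The filter size is constant per version regardless of transaction length, and a larger filter lowers the false positive rate ε and therefore the expected number of extra round trips.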
  176. SYSTEM KNOWS SEMANTICS ⇒ CLIENTS CAN COOPERATE WITHOUT WAITING FOR

    EACH OTHER KEY IDEA: DETECT RACES Storing intention in metadata allows readers to check for missing writes RAMP Overview
  177. SYSTEM KNOWS SEMANTICS ⇒ CLIENTS CAN COOPERATE WITHOUT WAITING FOR

    EACH OTHER KEY IDEA: DETECT RACES Storing intention in metadata allows readers to check for missing writes KEY IDEA: REPAIR ATOMICITY Transactions “hide” writes until others can reliably complete them (ready bit) RAMP Overview
  178. SYSTEM KNOWS SEMANTICS ⇒ CLIENTS CAN COOPERATE WITHOUT WAITING FOR

    EACH OTHER KEY IDEA: DETECT RACES Storing intention in metadata allows readers to check for missing writes KEY IDEA: REPAIR ATOMICITY Transactions “hide” writes until others can reliably complete them (ready bit) coordination free: transactions do not wait for any others to complete RAMP Overview
  179. RAMP Evaluation 1. What is the overhead of the RAMP

    protocols? 2. What is the benefit of coordination-free execution?
  180. RAMP Evaluation 1. What is the overhead of the RAMP

    protocols? 2. What is the benefit of coordination-free execution? 3. How do the RAMP protocols scale?
  181. RAMP Evaluation evaluated on Amazon EC2 cr1.8xlarge servers (1-100 servers;

    default: 5) 1. What is the overhead of the RAMP protocols? 2. What is the benefit of coordination-free execution? 3. How do the RAMP protocols scale?
  182. YCSB: WorkloadA, 95% reads, 1M items, 4 items/txn

    [Plot: throughput (txn/s) vs. concurrent clients]
  183. YCSB: WorkloadA, 95% reads, 1M items, 4 items/txn

    [Plot: throughput (txn/s) vs. concurrent clients] Series: No Concurrency Control. Legend abbreviations: RAMP-H, NWNR, LWNR, LWSR, LWLR, E-PCI.
  184. YCSB: WorkloadA, 95% reads, 1M items, 4 items/txn

    [Plot: throughput (txn/s) vs. concurrent clients] Series: No Concurrency Control (doesn’t enforce atomic visibility). Legend abbreviations: RAMP-H, NWNR, LWNR, LWSR, LWLR, E-PCI.
  185. YCSB: WorkloadA, 95% reads, 1M items, 4 items/txn

    [Plot: throughput (txn/s) vs. concurrent clients] Series: No Concurrency Control; Serializable 2PL. Legend abbreviations: RAMP-H, NWNR, LWNR, LWSR, LWLR, E-PCI.
  186. YCSB: WorkloadA, 95% reads, 1M items, 4 items/txn

    [Plot: throughput (txn/s) vs. concurrent clients] Series: No Concurrency Control; Serializable 2PL; Write Locks Only. Legend abbreviations: RAMP-H, NWNR, LWNR, LWSR, LWLR, E-PCI.
  187. YCSB: WorkloadA, 95% reads, 1M items, 4 items/txn

    [Plot: throughput (txn/s) vs. concurrent clients] Series: No Concurrency Control; Serializable 2PL; Write Locks Only; RAMP-Fast. Legend abbreviations: RAMP-F, RAMP-S, RAMP-H, NWNR, LWNR, LWSR, LWLR, E-PCI.
  188. YCSB: WorkloadA, 95% reads, 1M items, 4 items/txn

    [Plot: throughput (txn/s) vs. concurrent clients] Series: No Concurrency Control; Serializable 2PL; Write Locks Only; RAMP-Fast (within 5% of baseline). Legend abbreviations: RAMP-F, RAMP-S, RAMP-H, NWNR, LWNR, LWSR, LWLR, E-PCI.
  189. YCSB: WorkloadA, 95% reads, 1M items, 4 items/txn

    [Plot: throughput (txn/s) vs. concurrent clients] Series: No Concurrency Control; Serializable 2PL; Write Locks Only; RAMP-Fast; RAMP-Small. Legend abbreviations: RAMP-F, RAMP-S, RAMP-H, NWNR, LWNR, LWSR, LWLR, E-PCI.
  190. YCSB: WorkloadA, 95% reads, 1M items, 4 items/txn

    [Plot: throughput (txn/s) vs. concurrent clients] Series: No Concurrency Control; Serializable 2PL; Write Locks Only; RAMP-Fast; RAMP-Small (always needs 2RTT reads). Legend abbreviations: RAMP-F, RAMP-S, RAMP-H, NWNR, LWNR, LWSR, LWLR, E-PCI.
  191. YCSB: WorkloadA, 95% reads, 1M items, 4 items/txn

    [Plot: throughput (txn/s) vs. concurrent clients] Series: No Concurrency Control; Serializable 2PL; Write Locks Only; RAMP-Fast; RAMP-Small; RAMP-Hybrid. Legend abbreviations: RAMP-F, RAMP-S, RAMP-H, NWNR, LWNR, LWSR, LWLR, E-PCI.
  192. YCSB: uniform access, 1M items, 4 items/txn, 95% reads

    [Plot: throughput (ops/s) vs. number of servers]
  193. YCSB: uniform access, 1M items, 4 items/txn, 95% reads

    [Plot: throughput (ops/s) vs. number of servers] Series: No Concurrency Control. Legend abbreviations: RAMP-H, NWNR, LWNR, LWSR, LWLR, E-PCI.
  194. YCSB: uniform access, 1M items, 4 items/txn, 95% reads

    [Plot: throughput (ops/s) vs. number of servers] Series: No Concurrency Control; RAMP-Fast; RAMP-Small; RAMP-Hybrid. Legend abbreviations: RAMP-F, RAMP-S, RAMP-H, NWNR, LWNR, LWSR, LWLR, E-PCI.
  195. “accept friend request” “update index entry” RAMP TRANSACTION RAMP TRANSACTION

    ATOMIC VISIBILITY write write read write read write read read read read read write write write read
  196. Serializability COORDINATION REQUIRED GUARANTEED SAFETY Eventual Consistency COORDINATION FREE NO

    SAFETY Atomic Visibility SIGMOD14 Database Constraints VLDB15, SIGMOD15 Model Prediction and Training CIDR15, TBA Weak Isolation HotOS13, VLDB14 Causality SOCC12, SIGMOD13 COORDINATION AVOIDANCE GUARANTEED SAFETY WITHOUT COORDINATION MORE SEMANTICS MORE SAFETY PBS VLDB12, VLDBJ14, SIGMOD13, CACM14 COORDINATION FREE
  197. Serializability COORDINATION REQUIRED GUARANTEED SAFETY Eventual Consistency COORDINATION FREE NO

    SAFETY Atomic Visibility SIGMOD14 Database Constraints VLDB15, SIGMOD15 Model Prediction and Training CIDR15, TBA Weak Isolation HotOS13, VLDB14 Causality SOCC12, SIGMOD13 COORDINATION AVOIDANCE GUARANTEED SAFETY WITHOUT COORDINATION MORE SEMANTICS MORE SAFETY PBS VLDB12, VLDBJ14, SIGMOD13, CACM14 COORDINATION FREE
  198. Serializability COORDINATION REQUIRED GUARANTEED SAFETY Eventual Consistency COORDINATION FREE NO

    SAFETY Atomic Visibility SIGMOD14 Database Constraints VLDB15, SIGMOD15 Model Prediction and Training CIDR15, TBA Weak Isolation HotOS13, VLDB14 Causality SOCC12, SIGMOD13 COORDINATION AVOIDANCE GUARANTEED SAFETY WITHOUT COORDINATION MORE SEMANTICS MORE SAFETY PBS VLDB12, VLDBJ14, SIGMOD13, CACM14 COORDINATION FREE
  199. write read write read write write read write write write

    read write WHAT THE DATABASE HEARS read read read read read read WHAT THE APPLICATION SAYS my billing application is “correct” my new social app “does the right thing”
  200. “usernames should be unique” “account balances should remain positive” “there

    should only be one administrator” Database users express correctness criteria via database constraints
  201. Typical database constraints and operations (SQL)

    Constraint | Operation
    Equality, Inequality | Any
    Generate unique ID | Any
    Specify unique ID | Insert
    > | Increment
    > | Decrement
    < | Decrement
    < | Increment
    Foreign Key | Insert
    Foreign Key | Delete
    Secondary Indexing | Any
    Materialized Views | Any
    AUTO_INCREMENT | Insert
  202. adopt-a-hydrant alchemy_cms amahi bostonrb boxroom brevidy browsercms bucketwise calagator canvas-lms

    carter chiliproject citizenry comas comfortable-mexican-sofa communityengine copycopter-server danbooru diaspora discourse enki fat_free_crm fedena forem fulcrum gitlab-ci gitlabhq govsgo heaven inkwell insoshi jobsworth juvia kandan linuxfr.org lobsters lovd-by-less nimbleshop obtvse onebody opal opencongress opengovernment openproject piggybak publify radiant railscollab redmine refinerycms ror_ecommerce rucksack saasy salor-retail selfstarter sharetribe skyline spot-us spree sprintapp squaresquash sugar teambox tracks tryshoppe wallgig
  203. adopt-a-hydrant alchemy_cms amahi bostonrb boxroom brevidy browsercms bucketwise calagator canvas-lms

    carter chiliproject citizenry comas comfortable-mexican-sofa communityengine copycopter-server danbooru diaspora discourse enki fat_free_crm fedena forem fulcrum gitlab-ci gitlabhq govsgo heaven inkwell insoshi jobsworth juvia kandan linuxfr.org lobsters lovd-by-less nimbleshop obtvse onebody opal opencongress opengovernment openproject piggybak publify radiant railscollab redmine refinerycms ror_ecommerce rucksack saasy salor-retail selfstarter sharetribe skyline spot-us spree sprintapp squaresquash sugar teambox tracks tryshoppe wallgig zena 67 projects 1.77M LoC 1957 tables [SIGMOD 2015]
  204. adopt-a-hydrant alchemy_cms amahi bostonrb boxroom brevidy browsercms bucketwise calagator canvas-lms

    carter chiliproject citizenry comas comfortable-mexican-sofa communityengine copycopter-server danbooru diaspora discourse enki fat_free_crm fedena forem fulcrum gitlab-ci gitlabhq govsgo heaven inkwell insoshi jobsworth juvia kandan linuxfr.org lobsters lovd-by-less nimbleshop obtvse onebody opal opencongress opengovernment openproject piggybak publify radiant railscollab redmine refinerycms ror_ecommerce rucksack saasy salor-retail selfstarter sharetribe skyline spot-us spree sprintapp squaresquash sugar teambox tracks tryshoppe wallgig zena 67 projects 1.77M LoC 1957 tables 259 total; avg. 0.13 per table [SIGMOD 2015]
  205. adopt-a-hydrant alchemy_cms amahi bostonrb boxroom brevidy browsercms bucketwise calagator canvas-lms

    carter chiliproject citizenry comas comfortable-mexican-sofa communityengine copycopter-server danbooru diaspora discourse enki fat_free_crm fedena forem fulcrum gitlab-ci gitlabhq govsgo heaven inkwell insoshi jobsworth juvia kandan linuxfr.org lobsters lovd-by-less nimbleshop obtvse onebody opal opencongress opengovernment openproject piggybak publify radiant railscollab redmine refinerycms ror_ecommerce rucksack saasy salor-retail selfstarter sharetribe skyline spot-us spree sprintapp squaresquash sugar teambox tracks tryshoppe wallgig zena 67 projects 1.77M LoC 1957 tables 9986 total; avg. 5.1 per table 259 total; avg. 0.13 per table [SIGMOD 2015]
  206. CONSTRAINTS MORE COMMON 37x adopt-a-hydrant alchemy_cms amahi bostonrb boxroom brevidy

    browsercms bucketwise calagator canvas-lms carter chiliproject citizenry comas comfortable-mexican-sofa communityengine copycopter-server danbooru diaspora discourse enki fat_free_crm fedena forem fulcrum gitlab-ci gitlabhq govsgo heaven inkwell insoshi jobsworth juvia kandan linuxfr.org lobsters lovd-by-less nimbleshop obtvse onebody opal opencongress opengovernment openproject piggybak publify radiant railscollab redmine refinerycms ror_ecommerce rucksack saasy salor-retail selfstarter sharetribe skyline spot-us spree sprintapp squaresquash sugar teambox tracks tryshoppe wallgig zena 67 projects 1.77M LoC 1957 tables 9986 total; avg. 5.1 per table 259 total; avg. 0.13 per table [SIGMOD 2015]
  207. write read write read write write read write write write

    read write WHAT THE DATABASE HEARS read read read read read read WHAT THE APPLICATION SAYS “no duplicate users”
  208. write read write read write write read write write write

    read write WHAT THE DATABASE HEARS read read read read read read WHAT THE APPLICATION SAYS “no duplicate users” TODAY: ENFORCEMENT VIA COORDINATION
  209. write read write read write write read write write write

    read write WHAT THE DATABASE HEARS read read read read read read WHAT THE APPLICATION SAYS “no duplicate users” CAN WE USE CONSTRAINTS TO AVOID COORDINATION?
  210. WHAT THE APPLICATION SAYS “no duplicate users” constraint WHAT THE

    DATABASE HEARS constraint constraint constraint constraint constraint constraint constraint “no duplicate users” CAN WE USE CONSTRAINTS TO AVOID COORDINATION?
  211. Key idea: Check if constraints can be violated by “merging”

    independent operations ICT: Invariant Confluence Test
  212. CONSTRAINT: User IDs are unique OPERATION: Add users MERGE: Set

    union Key idea: Check if constraints can be violated by “merging” independent operations ICT: Invariant Confluence Test
  213. CONSTRAINT: User IDs are unique OPERATION: Add users MERGE: Set

    union {{Stu,ID=1}, {Ann,ID=1}} Constraint violated! {} MERGE add {Stu,ID=1} add {Ann,ID=1} Key idea: Check if constraints can be violated by “merging” independent operations ICT: Invariant Confluence Test
  214. Key idea: Check if constraints can be violated by “merging”

    independent operations CONSTRAINT: User IDs are positive OPERATION: Add users MERGE: Set union ICT: Invariant Confluence Test
  215. Key idea: Check if constraints can be violated by “merging”

    independent operations CONSTRAINT: User IDs are positive OPERATION: Add users MERGE: Set union {{Stu,ID=1}, {Ann,ID=1}} Constraint holds! {} MERGE add {Stu,ID=1} add {Ann,ID=1} ICT: Invariant Confluence Test
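The two examples above (unique IDs versus positive IDs under set union) can be replayed directly. The sketch below is only a counterexample check for one pair of operations, not a proof of invariant confluence, and the helper names are assumptions made for illustration: it runs two operations independently from the same starting state, confirms each replica is valid on its own, then checks whether the merged state still satisfies the constraint.

```python
def merge(a, b):
    """MERGE: set union of the two replica states."""
    return a | b


def unique_ids(db):
    ids = [uid for _, uid in db]
    return len(ids) == len(set(ids))


def positive_ids(db):
    return all(uid > 0 for _, uid in db)


def holds_after_merge(invariant, start, op_a, op_b):
    """Apply op_a and op_b on independent replicas, then test the merge."""
    left, right = op_a(start), op_b(start)
    # Each replica must be valid locally before the merge is interesting.
    assert invariant(left) and invariant(right)
    return invariant(merge(left, right))


def add_stu(db):
    return db | {("Stu", 1)}


def add_ann(db):
    return db | {("Ann", 1)}


start = frozenset()
uniqueness_survives = holds_after_merge(unique_ids, start, add_stu, add_ann)
positivity_survives = holds_after_merge(positive_ids, start, add_stu, add_ann)
```

Uniqueness fails because neither replica can see the other's insert of ID 1, while positivity is a property of each record on its own and so cannot be broken by combining valid states.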
  216. Key idea: Check if constraints can be violated by “merging”

    independent operations ICT: Invariant Confluence Test
  217. Key idea: Check if constraints can be violated by “merging”

    independent operations OUR CONTRIBUTION: [VLDB 2015] ICT: Invariant Confluence Test
  218. Key idea: Check if constraints can be violated by “merging”

    independent operations OUR CONTRIBUTION: Theorem. A globally I-valid system can execute a set of transactions T with coordination-freedom, transactional availability, and convergence if and only if T are I-confluent with respect to I. [VLDB 2015] ICT ⟺ safe, coordination-free execution possible ICT: Invariant Confluence Test
  219. Key idea: Check if constraints can be violated by “merging”

    independent operations OUR CONTRIBUTION: Generalizes classic partitioning-based indistinguishability arguments Theorem. A globally I-valid system can execute a set of transactions T with coordination-freedom, transactional availability, and convergence if and only if T are I-confluent with respect to I. [VLDB 2015] ICT ⟺ safe, coordination-free execution possible ICT: Invariant Confluence Test
  220. Typical database constraints and operations (SQL). Under set merge

    Constraint | Operation | OK?
    Equality, Inequality | Any | ???
    Generate unique ID | Any | ???
    Specify unique ID | Insert | ???
    > | Increment | ???
    > | Decrement | ???
    < | Decrement | ???
    < | Increment | ???
    Foreign Key | Insert | ???
    Foreign Key | Delete | ???
    Secondary Indexing | Any | ???
    Materialized Views | Any | ???
    AUTO_INCREMENT | Insert | ???
  221. Typical database constraints and operations (SQL). Under set merge [VLDB 2015]

    Constraint | Operation | OK?
    Equality, Inequality | Any | Y
    Generate unique ID | Any | Y
    Specify unique ID | Insert | N
    > | Increment | Y
    > | Decrement | N
    < | Decrement | Y
    < | Increment | N
    Foreign Key | Insert | Y
    Foreign Key | Delete | Y*
    Secondary Indexing | Any | Y
    Materialized Views | Any | Y
    AUTO_INCREMENT | Insert | N
  222. Typical database constraints and operations (SQL). Under set merge [VLDB 2015]

    Constraint | Operation | OK?
    Equality, Inequality | Any | Y
    Generate unique ID | Any | Y
    Specify unique ID | Insert | N
    > | Increment | Y
    > | Decrement | N
    < | Decrement | Y
    < | Increment | N
    Foreign Key | Insert | Y
    Foreign Key | Delete | Y*
    Secondary Indexing | Any | Y
    Materialized Views | Any | Y
    AUTO_INCREMENT | Insert | N
    RAMP
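The `>` rows of the table can be checked with a small counter example. The sketch assumes a lower-bound constraint (value must stay above a floor of 0, as in "account balances should remain positive") and models merge as applying every operation from both replicas; both the modeling choice and the function names are assumptions for illustration.

```python
def stays_above(start, deltas, floor=0):
    """True if applying deltas in order never takes the value to floor or below."""
    value = start
    for d in deltas:
        value += d
        if value <= floor:
            return False
    return True


def merged(start, deltas_a, deltas_b):
    """Merged state reflects every operation from both replicas."""
    return start + sum(deltas_a) + sum(deltas_b)


def survives_merge(start, deltas_a, deltas_b, floor=0):
    # Both replicas must be valid on their own before merging.
    assert stays_above(start, deltas_a, floor)
    assert stays_above(start, deltas_b, floor)
    return merged(start, deltas_a, deltas_b) > floor


# Increments can only move the value away from a lower bound.
increments_ok = survives_merge(100, [+5], [+10])

# Each replica sees 100 - 70 = 30, but together they withdraw 140.
decrements_ok = survives_merge(100, [-70], [-70])
```

By symmetry, the `<` rows flip: decrements are safe against an upper bound and increments are not.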
  223. adopt-a-hydrant alchemy_cms amahi bostonrb boxroom brevidy browsercms bucketwise calagator canvas-lms

    carter chiliproject citizenry comas comfortable-mexican-sofa communityengine copycopter-server danbooru diaspora discourse enki fat_free_crm fedena forem fulcrum gitlab-ci gitlabhq govsgo heaven inkwell insoshi jobsworth juvia kandan linuxfr.org lobsters lovd-by-less nimbleshop obtvse onebody opal opencongress opengovernment openproject piggybak publify radiant railscollab redmine refinerycms ror_ecommerce rucksack saasy salor-retail selfstarter sharetribe skyline spot-us spree sprintapp squaresquash sugar teambox tracks tryshoppe wallgig zena 67 projects 1.77M LoC 1957 tables 9986 total; avg. 5.1 per table 259 total; avg. 0.13 per table [SIGMOD 2015]
  224. adopt-a-hydrant alchemy_cms amahi bostonrb boxroom brevidy browsercms bucketwise calagator canvas-lms

    carter chiliproject citizenry comas comfortable-mexican-sofa communityengine copycopter-server danbooru diaspora discourse enki fat_free_crm fedena forem fulcrum gitlab-ci gitlabhq govsgo heaven inkwell insoshi jobsworth juvia kandan linuxfr.org lobsters lovd-by-less nimbleshop obtvse onebody opal opencongress opengovernment openproject piggybak publify radiant railscollab redmine refinerycms ror_ecommerce rucksack saasy salor-retail selfstarter sharetribe skyline spot-us spree sprintapp squaresquash sugar teambox tracks tryshoppe wallgig zena 67 projects 1.77M LoC 1957 tables 9986 total; avg. 5.1 per table 259 total; avg. 0.13 per table 86.9% PASS ICT [SIGMOD 2015]
  225. TPC-C: 14/16 CONSTRAINTS PASS ICT

    6-11x faster than ACID/serializability. [Plot: throughput (txns/s) vs. number of warehouses; series: Coordination-Avoiding, Serializable (2PL)]
  226. TPC-C: 14/16 CONSTRAINTS PASS ICT

    6-11x faster than ACID/serializability. [Plot: throughput (txns/s) vs. number of warehouses; series: Coordination-Avoiding, Serializable (2PL)] Scale to over 25x best listed result. [Plots: total throughput (txn/s) and per-server throughput (txn/s/server) vs. number of servers]
  227. WHAT THE APPLICATION SAYS “no duplicate users” constraint WHAT THE

    DATABASE HEARS constraint constraint constraint constraint constraint constraint constraint “no duplicate users” CAN WE USE CONSTRAINTS TO AVOID COORDINATION?
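
As a rough illustration of the question on this slide, the sketch below checks whether merging two independently valid replica states can violate a constraint. Everything in it is an assumption made for illustration: the set-union `merge`, the `no_duplicate_users` predicate, and the `(user_id, username)` record shape are hypothetical and are not taken from the talk or its prototype systems.

```python
from typing import Callable, FrozenSet, Tuple

User = Tuple[int, str]        # hypothetical record shape: (user_id, username)
State = FrozenSet[User]


def merge(a: State, b: State) -> State:
    """Coordination-free merge, assumed here to be plain set union."""
    return a | b


def no_duplicate_users(state: State) -> bool:
    """The application's constraint: no two records share a username."""
    names = [name for _, name in state]
    return len(names) == len(set(names))


def merge_preserves(invariant: Callable[[State], bool], a: State, b: State) -> bool:
    """True if two locally valid states are still valid after merging."""
    assert invariant(a) and invariant(b), "both replicas must be valid locally"
    return invariant(merge(a, b))


base: State = frozenset({(1, "alice")})
replica_a = base | {(2, "bob")}      # one replica registers "bob"
replica_b = base | {(3, "bob")}      # another replica registers "bob" concurrently
replica_c = base | {(4, "carol")}    # a third replica registers a distinct name

conflict_safe = merge_preserves(no_duplicate_users, replica_a, replica_b)
disjoint_safe = merge_preserves(no_duplicate_users, replica_a, replica_c)
```

In this toy setting, `conflict_safe` is false and `disjoint_safe` is true: concurrent inserts of the same username can break the constraint after merging, while inserts of distinct usernames cannot, which is the kind of distinction a constraint-aware database could use to decide when coordination is needed.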
  228. Serializability: COORDINATION REQUIRED, GUARANTEED SAFETY. Eventual Consistency: COORDINATION FREE, NO SAFETY

    Atomic Visibility (SIGMOD14); Database Constraints (VLDB15, SIGMOD15); Model Prediction and Training (CIDR15, TBA); Weak Isolation (HotOS13, VLDB14); Causality (SOCC12, SIGMOD13). COORDINATION AVOIDANCE: GUARANTEED SAFETY WITHOUT COORDINATION. MORE SEMANTICS, MORE SAFETY. PBS (VLDB12, VLDBJ14, SIGMOD13, CACM14). COORDINATION FREE
  231. PLASMA: ASYNCHRONOUS LEARNING [Ongoing] TIME Bulk Synch Parallel Key idea:

    Exploit statistical robustness in system designs
  232. PLASMA: ASYNCHRONOUS LEARNING [Ongoing] ML task: Express algorithms via async

    iterator (e.g., ADMM) Bulk Async Parallel TIME TIME Bulk Synch Parallel Key idea: Exploit statistical robustness in system designs Break dataflow barriers using new iterator model
  233. VELOX: FAST ONLINE PREDICTIONS [CIDR 2015] PLASMA: ASYNCHRONOUS LEARNING [Ongoing]

    ML task: Express algorithms via async iterator (e.g., ADMM) Bulk Async Parallel TIME TIME Bulk Synch Parallel Key idea: Exploit statistical robustness in system designs Break dataflow barriers using new iterator model
  234. VELOX: FAST ONLINE PREDICTIONS [CIDR 2015] Fast incremental personalization Batch

    retrain shared features PLASMA: ASYNCHRONOUS LEARNING [Ongoing] ML task: Express algorithms via async iterator (e.g., ADMM) Bulk Async Parallel TIME TIME Bulk Synch Parallel Key idea: Exploit statistical robustness in system designs Break dataflow barriers using new iterator model
  235. VELOX: FAST ONLINE PREDICTIONS [CIDR 2015] Fast incremental personalization Batch

    retrain shared features PLASMA: ASYNCHRONOUS LEARNING [Ongoing] ML task: Express algorithms via async iterator (e.g., ADMM) Bulk Async Parallel TIME TIME Bulk Synch Parallel Key idea: Exploit statistical robustness in system designs Prioritize model maintenance by robustness Break dataflow barriers using new iterator model
  236. VELOX: FAST ONLINE PREDICTIONS [CIDR 2015] Fast incremental personalization Batch

    retrain shared features PLASMA: ASYNCHRONOUS LEARNING [Ongoing] ML task: Express algorithms via async iterator (e.g., ADMM) Bulk Async Parallel TIME TIME Bulk Synch Parallel Key idea: Exploit statistical robustness in system designs Prioritize model maintenance by robustness ML task: Split models according to robustness Break dataflow barriers using new iterator model
  237. Serializability: COORDINATION REQUIRED, GUARANTEED SAFETY. Eventual Consistency: COORDINATION FREE, NO SAFETY

    Atomic Visibility (SIGMOD14); Database Constraints (VLDB15, SIGMOD15); Model Prediction and Training (CIDR15, TBA); Weak Isolation (HotOS13, VLDB14); Causality (SOCC12, SIGMOD13). COORDINATION AVOIDANCE: GUARANTEED SAFETY WITHOUT COORDINATION. MORE SEMANTICS, MORE SAFETY. PBS (VLDB12, VLDBJ14, SIGMOD13, CACM14). COORDINATION FREE
  239. DESIGN DATABASE SYSTEMS THAT EXPLOIT SEMANTICS OF HIGH-VALUE USE CASES

    MY APPROACH: Study practical database use cases; derive principles and algorithms; build systems to realize the benefits
  240. PBS: Integrated into Cassandra 1.2 release

    RAMP: Proposed feature in Cassandra 3.0; (reportedly) on roadmap for Facebook Apollo, IBM Cloudant; + recent extensions at a major Internet company
  241. PBS: Integrated into Cassandra 1.2 release

    RAMP: Proposed feature in Cassandra 3.0; (reportedly) on roadmap for Facebook Apollo, IBM Cloudant; + recent extensions at a major Internet company. HAT Isolation: part of Kleppmann@LinkedIn’s Hermitage testing suite
  242. PBS: Integrated into Cassandra 1.2 release

    RAMP: Proposed feature in Cassandra 3.0; (reportedly) on roadmap for Facebook Apollo, IBM Cloudant; + recent extensions at a major Internet company. HAT Isolation: part of Kleppmann@LinkedIn’s Hermitage testing suite. Active dialogue with developer, NoSQL community via invited talks, blogging, social media
  243. MY WORK: COORDINATION AVOIDANCE

    Current Practice: PBS (VLDB12, SIGMOD13, VLDBJ14, CACM14); EC Today (CACM/Queue13); Consistency without Borders (SoCC13); Network Partitions (CACM/Queue14); Feral Concurrency Control (SIGMOD15). Principles: I-Confluence (VLDB15); HATs (HotOS13, VLDB14); Explicit Causality (SoCC12). Systems: Bolt-On (SIGMOD13); RAMP + Indexing (SIGMOD14); Velox (CIDR15); Plasma + BAP (Ongoing)
  245. FUTURE WORK

    Automatically coordinated applications: Bespoke analysis and coordination synthesis; “Query optimization” for transaction execution. DB meets “Big Data” Learning
  246. FUTURE WORK

    Automatically coordinated applications: Bespoke analysis and coordination synthesis; “Query optimization” for transaction execution. DB meets “Big Data” Learning: View materialization and selection for model maintenance
  247. FUTURE WORK

    Automatically coordinated applications: Bespoke analysis and coordination synthesis; “Query optimization” for transaction execution. DB meets “Big Data” Learning: View materialization and selection for model maintenance; Bounded divergence control for coordinating learners
  248. FUTURE WORK

    Automatically coordinated applications: Bespoke analysis and coordination synthesis; “Query optimization” for transaction execution. DB meets “Big Data” Learning: View materialization and selection for model maintenance; Bounded divergence control for coordinating learners. Next-Generation Data Applications
  249. FUTURE WORK

    Automatically coordinated applications: Bespoke analysis and coordination synthesis; “Query optimization” for transaction execution. DB meets “Big Data” Learning: View materialization and selection for model maintenance; Bounded divergence control for coordinating learners. Next-Generation Data Applications: Next 10-100x growth in data volume due to sensors, apps
  250. FUTURE WORK

    Automatically coordinated applications: Bespoke analysis and coordination synthesis; “Query optimization” for transaction execution. DB meets “Big Data” Learning: View materialization and selection for model maintenance; Bounded divergence control for coordinating learners. Next-Generation Data Applications: Next 10-100x growth in data volume due to sensors, apps; New interfaces for increased coordination costs, heterogeneity
  251. WHAT THE APPLICATION SAYS “post on timeline” “accept friend request”

    write read write read write write read write write write read write WHAT THE DATABASE HEARS read read read read read read
  252. Eventual Consistency: COORDINATION FREE, NO SAFETY

    Atomic Visibility (SIGMOD14); Database Constraints (VLDB15, SIGMOD15); Model Prediction and Training (CIDR15, TBA); Weak Isolation (HotOS13, VLDB14); Causality (SOCC12, SIGMOD13). COORDINATION AVOIDANCE: GUARANTEED SAFETY WITHOUT COORDINATION. MORE SEMANTICS, MORE SAFETY. PBS (VLDB12, VLDBJ14, SIGMOD13, CACM14). COORDINATION FREE
  253. Eventual Consistency: COORDINATION FREE, NO SAFETY

    Atomic Visibility (SIGMOD14); Database Constraints (VLDB15, SIGMOD15); Model Prediction and Training (CIDR15, TBA); Weak Isolation (HotOS13, VLDB14); Causality (SOCC12, SIGMOD13). COORDINATION AVOIDANCE: GUARANTEED SAFETY WITHOUT COORDINATION. MORE SEMANTICS, MORE SAFETY. PBS (VLDB12, VLDBJ14, SIGMOD13, CACM14). COORDINATION FREE

    Joint work with Ali Ghodsi, Joe Hellerstein, Ion Stoica, Mike Franklin, Michael Jordan, Alan Fekete, Dan Crankshaw, Shivaram Venkataraman, Neil Conway, Peter Alvaro, Aaron Davidson, Joey Gonzalez, Kyle Kingsbury, Haoyuan Li, and Zhao Zhang
  255. Many illustrations by the Noun Project (CC-Attribution):

    surprised by Julian Derveaux; world by Wayne Tyler Sall; database by Austin Condiff; earth by Martin Vanco; Woman by Simon Child; Man by Simon Child; Doctor by Simon Child; David-Hockney by Simon Child; Server by Simon Child; clock by christoph robausch