Designing safe and highly available distributed applications

Sreeja S Nair
July 02, 2021

Transcript

  1. Designing safe and highly-available distributed applications

    Sreeja S. Nair

    Thesis defended on 1 July 2021 before a defense committee composed of:
    • Carlos Baquero, Associate professor, Universidade do Minho (Reviewer)
    • Béatrice Bérard, Professor, Sorbonne Université (Examiner)
    • Carla Ferreira, Associate professor, Universidade Nova de Lisboa (Examiner)
    • Éric Gressier-Soudan, Professor, Conservatoire National des Arts et Métiers (Reviewer)
    • Bradley King, Co-founder & Field CTO, Scality (Examiner)
    • Martin Kleppmann, Senior Research Associate, University of Cambridge (Examiner)
    • Gustavo Petri, Researcher, Arm Cambridge (Examiner)
    • Marc Shapiro, Distinguished Research Scholar, Sorbonne Université-Inria (Advisor)
  5. Trade-offs

    ‣ High availability ‣ Strong Consistency
    ‣ High availability ‣ Eventual Consistency
    ‣ High availability ‣ Safety

    CAP Theorem: Consistency, Availability, Partition Tolerance [Gilbert&Lynch’02]
  6. Designing distributed applications

    [Flowchart: the design workflow and the contributions attached to each step.]
    • Inputs: Specification and invariants; Workload characteristics
    • Static Analysis (1. New Proof rule + Tool)
    • Design conflict resolution policies (2. CRDT Tree case study)
    • Synthesize concurrency control (3. Coordination lattice + metrics)
    • Decisions (Y/N): Conflicts? Can the application afford anomalies?
    • Outcome: Safe
  7. Contribution Part I

    Proof rule and tool for verifying safety of highly-available distributed objects
    ‣ Modular
    ‣ Automated Verification

    Workflow step: Static Analysis (input: Specification and invariants)
  8. Invariants of Auction

    • Bids can be placed only when the status is active
    • When auction is closed, there is a winner
    • Winner is the highest bid
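The three invariants above can be written as an executable check on a single auction state. The Python sketch below is an illustration only: the `Auction` class, its field names, and the `Status.INVALID` value for a not-yet-started auction are assumptions and do not come from the slides.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Set


class Status(Enum):
    INVALID = 0  # assumed status before the auction starts
    ACTIVE = 1
    CLOSED = 2


@dataclass
class Auction:
    status: Status = Status.INVALID
    bids: Set[int] = field(default_factory=set)
    winner: Optional[int] = None


def invariant(a: Auction) -> bool:
    """Check the three auction invariants on a single state."""
    # Bids can be placed only when the status is active, so an
    # auction that has not started must not hold any bid.
    if a.status is Status.INVALID and a.bids:
        return False
    if a.status is Status.CLOSED:
        # When the auction is closed there is a winner,
        # and the winner is the highest bid.
        return (a.winner is not None
                and a.winner in a.bids
                and a.winner == max(a.bids))
    return True
```

For example, a closed auction with bids 100 and 105 satisfies the invariant only if the winner is 105.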
  9. Sequentially Safe Operations with Preconditions

    [Figure: sequential runs of Place bid and Close auction over the bids 100 and 105.]

    $Pre_{place\_bid} \triangleq status = ACTIVE$
    $Pre_{close\_auction} \triangleq \nexists\, b \in Bids \cdot b > winner$
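A minimal sketch of how the two preconditions guard the sequential operations. The class layout, the string status values, and the rule that closing an auction without bids is rejected are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

ACTIVE, CLOSED = "ACTIVE", "CLOSED"


@dataclass
class Auction:
    status: str = ACTIVE
    bids: Set[int] = field(default_factory=set)
    winner: Optional[int] = None


def place_bid(a: Auction, value: int) -> None:
    # Pre_place_bid: status = ACTIVE
    if a.status != ACTIVE:
        raise ValueError("precondition violated: auction is not active")
    a.bids.add(value)


def close_auction(a: Auction) -> None:
    # Assumption of this sketch: an auction without bids cannot be closed.
    if not a.bids:
        raise ValueError("precondition violated: no bids to pick a winner from")
    winner = max(a.bids)
    # Pre_close_auction: there is no bid b in Bids with b > winner
    if any(b > winner for b in a.bids):
        raise ValueError("precondition violated: a bid exceeds the winner")
    a.status = CLOSED
    a.winner = winner
```

Running Place bid (100), Place bid (105), Close auction in sequence yields winner 105, and any later Place bid is rejected by its precondition.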
  10. Concurrency

    [Figure: replicas holding the bids 100 and 105 while Place bid runs and states are merged.]

    A state in a replica evolves with
    • A local update by operation
    • Merge

    Merge is the only point of observable concurrency
  11. Safe Merge with Preconditions

    [Figure: merging replica states holding the bids 100, 105 and 100, 95.]

    $Pre_{merge} \triangleq (status = CLOSED \implies highest(Bids, w) \land highest(Bids', w)) \land (status' = CLOSED \implies highest(Bids', w') \land highest(Bids, w'))$
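The merge precondition can be sketched as a predicate over a pair of replica states. In this illustration, `highest(bids, w)` is assumed to mean that no bid in `bids` exceeds `w`, and the `merge` function (union of bids, closed status wins) is an assumed state-based merge, not the one from the thesis.

```python
from typing import FrozenSet, NamedTuple, Optional


class State(NamedTuple):
    status: str                # "ACTIVE" or "CLOSED"
    bids: FrozenSet[int]
    winner: Optional[int]


def highest(bids: FrozenSet[int], w: Optional[int]) -> bool:
    # Assumed reading: w exists and no bid in `bids` exceeds it.
    return w is not None and all(b <= w for b in bids)


def pre_merge(s: State, t: State) -> bool:
    """A closed replica's winner must be the highest bid in both states."""
    ok_s = s.status != "CLOSED" or (highest(s.bids, s.winner) and highest(t.bids, s.winner))
    ok_t = t.status != "CLOSED" or (highest(t.bids, t.winner) and highest(s.bids, t.winner))
    return ok_s and ok_t


def merge(s: State, t: State) -> State:
    # Hypothetical merge: union the bids, keep CLOSED if either side closed.
    status = "CLOSED" if "CLOSED" in (s.status, t.status) else "ACTIVE"
    winner = s.winner if s.winner is not None else t.winner
    return State(status, s.bids | t.bids, winner)
```

A closed state with winner 105 can be merged with an active state holding the bids 100 and 95, but not with one holding a bid of 110.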
  12. Precondition for merge?

    • Never block any merge!! It must hold true since merge can happen at any time
    • Can also be called Concurrency Invariant
    • The weakest precondition to be upheld for the resulting state of merge to uphold the data invariant
    • Ensures all concurrent operations are still safe

    $Inv_{global} = Inv_{data} \land Inv_{conc}$
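  13. Sufficient Condition for Safe Distributed Objects

    • Initial state satisfies the global invariant
    • Each update and merge preserves the global invariant

    $$\frac{\sigma \models Inv_{data} \land Inv_{conc} \land Pre_{op} \qquad \sigma_{new} = op(\sigma)}{\sigma_{new} \models Inv_{data} \land Inv_{conc}}$$

    $$\frac{\sigma \models Inv_{data} \land \sigma' \models Inv_{data} \land (\sigma, \sigma') \models Inv_{conc} \qquad \sigma_{new} = merge(\sigma, \sigma')}{\sigma_{new} \models Inv_{data} \land Inv_{conc}}$$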
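The two obligations can be illustrated by a brute-force check over a small finite state space. This is a simplified sketch, not the algorithm of the tool. The replicated object is a hypothetical grow-only set over three elements whose data invariant allows at most two elements. It checks only that updates and merges preserve the data invariant, and it treats the concurrency invariant as a predicate on pairs of states.

```python
from itertools import combinations, product

UNIVERSE = (0, 1, 2)
# Every subset of the universe is a candidate replica state.
STATES = [frozenset(c) for n in range(len(UNIVERSE) + 1)
          for c in combinations(UNIVERSE, n)]


def inv_data(s):
    return len(s) <= 2          # toy data invariant


def inv_conc(s, t):
    return len(s | t) <= 2      # toy concurrency invariant on a pair of states


def merge(s, t):
    return s | t                # state-based merge of a grow-only set


def make_add(x):
    """Return (precondition, effect) of the operation add(x)."""
    return (lambda s: len(s | {x}) <= 2), (lambda s: s | {x})


OPS = [make_add(x) for x in UNIVERSE]


def updates_safe(states, ops, inv):
    # Every operation run from a safe state under its precondition stays safe.
    return all(inv(op(s))
               for s in states if inv(s)
               for pre, op in ops if pre(s))


def merges_safe(states, inv, conc):
    # Every merge of two safe states that satisfy `conc` stays safe.
    return all(inv(merge(s, t))
               for s, t in product(states, repeat=2)
               if inv(s) and inv(t) and conc(s, t))
```

With the concurrency invariant in place both checks pass. Replacing it with a predicate that is always true makes the merge check fail, because `{0, 1}` merged with `{2}` has three elements.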
  14. Tool Support

        // state
        var status:bv2;
        var winner:BidId;
        var bids:Bid;
        var token:[ReplicaId]bool;

        @invariant
        function inv(status:bv2, winner:BidId, bids:Bid, token:[ReplicaId]bool) returns(bool)

        @gteq
        function gteq(status1:bv2, winner1:BidId, bids1:Bid, token1:[ReplicaId]bool, status2:bv2, winner2:BidId, bids2:Bid, token2:[ReplicaId]bool) returns(bool)

        @merge
        procedure merge(status1:bv2, winner1:BidId, bids1:Bid, token1:[ReplicaId]bool)

        procedure startAuction()
        procedure placeBid(bid_identifier:BidId, value:int)
        procedure closeAuction()
        procedure releaseToken()

    https://github.com/sreeja/soteria_tool
  15. Future work for Part I

    • An equivalent for concurrency invariant for the class of distributed applications that propagate operations
    • Proof rule for distributed applications regardless of the update propagation mechanism
    • Improving usability of the tool

    Recap: Proof rule and tool for verifying safety of highly-available distributed objects
    ‣ Modular
    ‣ Automated Verification
  16. Designing Safe and Highly Available Distributed Applications

    [Flowchart: the design workflow, now highlighting the conflict resolution step.]
    • Inputs: Specification and invariants; Workload characteristics
    • Static Analysis
    • Design conflict resolution policies (Ex: Tree CRDT)
    • Synthesize concurrency control
    • Decisions (Y/N): Conflicts? Can the application afford anomalies?
    • Outcome: Safe
  20. Contribution Part II

    Maram - Coordination-free safe and highly-available replicated tree
    • Trade-off: lost move

    Workflow step: Design conflict resolution policies
  22. Concurrent move in a replicated tree

    [Figure: two replicas start from a tree with root / and nodes a and b. One replica runs "move a under b", the other concurrently runs "move b under a".]

    Conflict resolution
    • Skip one move among the pair of conflicts deterministically
  24. Impact of conflict resolution

    [Figure: a tree with root / and nodes a and b. Move 1: move b under /. Move 2: move a under b.]

    Skipping 1 leads to cycle!

    Independence Analysis
    • The conditions under which a historical move might impact the current move - Historical Enabler Move
  25. Coordination-free move

        move (n, p'):
          if historical enabler move skipped: skip
          if conflict: skip
          else: apply
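The decision procedure above can be sketched in Python as follows. The representation is hypothetical: the tree is a child-to-parent dictionary, `enablers` returns the ids of a move's historical enabler moves, and `must_skip` stands in for the deterministic conflict resolution policy. None of these names come from the thesis.

```python
from typing import Callable, Dict, Set, Tuple

Move = Tuple[str, str, str]  # (move id, node n, new parent p')


def apply_move(parent: Dict[str, str], move: Move, skipped: Set[str],
               enablers: Callable[[Move], Set[str]],
               must_skip: Callable[[Move], bool]) -> bool:
    """Apply or skip a move without coordination; return True if applied."""
    move_id, n, new_parent = move
    # If a historical enabler move was skipped, this move must be skipped too.
    if enablers(move) & skipped:
        skipped.add(move_id)
        return False
    # If the move is the one to skip within a conflicting pair, skip it.
    if must_skip(move):
        skipped.add(move_id)
        return False
    parent[n] = new_parent
    return True
```

In a scenario where b starts under a, skipping "move b under /" also skips the dependent "move a under b", so the tree keeps its shape and no cycle appears.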
  26. Future Work for Part II

    • Design a state-based replicated tree with the same conflict resolution semantics
    • Prove the soundness of independence analysis for safe distributed applications that have tentative effectors
    • Optimize metadata required for conflict resolution policies

    Recap: Coordination-free safe and highly-available replicated tree
    • Trade-off: lost move
  27. Designing Safe and Highly Available Distributed Applications

    [Flowchart: the design workflow, now highlighting the concurrency control step.]
    • Inputs: Specification and invariants; Workload characteristics
    • Static Analysis
    • Design conflict resolution policies (Ex: Tree CRDT)
    • Synthesize concurrency control
    • Decisions (Y/N): Conflicts? Can the application afford anomalies?
    • Outcome: Safe
  30. Contribution Part III

    Selecting optimal distributed lock configuration
    • Coordination lattice
    • Metrics

    Workflow step: Synthesize concurrency control (input: Workload characteristics)
  31. Selecting Efficient Concurrency Control for Safety

    Constraints imposed by the system and application
    • Coordination to ensure safety
    • Skewed workload in locality and frequency
    • Network latency
    • Cost of locking

    Variables that can be controlled
    • Granularity
    • Type of lock
      - Mutual exclusion
      - Shared/exclusive lock
    • Placement of lock
      - Distance between client and lock
  32. Safety constraint

    Coordination needed to ensure safety using locks

    [Figure: the operations Place bid, Close auction, Unregister buyer and Remove bid, linked by the locks auction, auction and buyer.]
  33. Workload constraint

    Skewed in frequency per operation
    • Place bid: 90%
    • Close auction: 1%
    • Unregister buyer: 7%
    • Remove bid: 2%

    Skewed in frequency per replica (frequency of Place bid operation for a single auction)
    • Paris: 95%
    • Houston: 5%
    • Singapore: 0%
  34. Lock cost constraint

    • Function of network latency and computation
    • Different for different placement; same for both lock modes
  35. Constraints

    Safety constraint
    • Locks to ensure safety

    Workload constraint
    • Skewed workload in locality and frequency

    Network constraint
    • Latency between replicas

    Lock cost constraint
    • Cost of locking as a function of computation and network latency
  36. Granularity variable

    [Figure: lock configurations for the operations Place bid, Close auction, Unregister buyer and Remove bid. Successive coarsening steps go from separate auction and buyer locks to a single buyer + auction lock.]
  37. Effect of granularity

    Coarsening affects lock acquisition time and contention.

    [Figure: three configurations for Place bid, Close auction, Unregister buyer and Remove bid: separate auction and buyer locks (2 locks), an auction lock combined with a buyer + auction lock (1 lock), and a single buyer + auction lock (1 lock).]
  38. Mode of locking variable

    Mutual exclusion
    • Place bid, Close auction and Remove bid each take the auction lock as a Mutex

    Shared/exclusive lock
    • Exclusive mode introduces contention
    • Shared mode allows parallel execution

    [Figure: two shared/exclusive assignments on the auction lock for Place bid, Close auction and Remove bid, labelled "Shared, Shared, Exclusive" and "Exclusive, Exclusive, Shared".]
  40. Lock placement variable

    Distance between client and lock affects lock acquisition time

    Frequency of Place bid operation for a single auction
    • Paris: 95%
    • Houston: 5%
    • Singapore: 0%
  41. Selecting Coordination Configuration

    • Placement - Minimize acquisition cost
    • Mode - Maximize parallelism allowed
    • Granularity - Minimize execution time
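  42. The Coordination Lattice

    [Figure: the operations Place bid, Close auction, Unregister buyer and Remove bid with the locks a1, a2 and b. The lattice orders the granularity choices from the finest (b, a1, a2) through partially merged locks (b with a1a2; ba1 with a2) to the coarsest (ba1a2). Each node of the lattice is a Coordination Element.]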
  43. Coordination Element

    Example element: locks b and a1a2

    Mode for lock b
    | Mode | unregister buyer | place bid |
    | MX   | EX               | EX        |
    | SX1  | EX               | SH        |
    | SX2  | SH               | EX        |

    Mode for lock a1a2
    | Mode | place bid | close auction | remove bid |
    | MX   | EX        | EX            | EX         |
    | SX1  | EX        | SH            | EX         |
    | SX2  | SH        | EX            | SH         |

    Coordination configuration: for each lock (b, a1a2), choose
    • mode: MX, SX1 or SX2
    • placement: Houston, Paris or Singapore
  44. Navigating the coordination lattice

    Minimize
    $$CcRepExecTime = \frac{\sum_{r \in R} \left( CcOpSerial(r) + CcRepSerial(r) + CcOpParallel(r) \right)}{|R|}$$

    • CcOpSerial(r) = Impact of serialization due to exclusive mode of locking
    • CcRepSerial(r) = Impact of serialization due to serialization inside a replica
    • CcOpParallel(r) = Parallelism allowed by shared mode of locking

    Cost of locking is impacted by lock placement, a component inside all the metrics
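The metric is an average over replicas, so comparing configurations amounts to evaluating it for each candidate and keeping the minimum. The sketch below assumes the three per-replica components are already known. The function name, the configuration names, and all numbers are made up for illustration and are not measurements from the thesis.

```python
from typing import Dict, Iterable


def cc_rep_exec_time(replicas: Iterable[str],
                     op_serial: Dict[str, float],
                     rep_serial: Dict[str, float],
                     op_parallel: Dict[str, float]) -> float:
    """Average over replicas of CcOpSerial + CcRepSerial + CcOpParallel."""
    replicas = list(replicas)
    total = sum(op_serial[r] + rep_serial[r] + op_parallel[r] for r in replicas)
    return total / len(replicas)


REPLICAS = ["Houston", "Paris", "Singapore"]

# Hypothetical per-replica component values (ms) for two configurations.
CONFIGS = {
    "config_a": ({"Houston": 30, "Paris": 10, "Singapore": 20},
                 {"Houston": 5, "Paris": 5, "Singapore": 5},
                 {"Houston": 1, "Paris": 2, "Singapore": 3}),
    "config_b": ({"Houston": 10, "Paris": 10, "Singapore": 10},
                 {"Houston": 5, "Paris": 5, "Singapore": 5},
                 {"Houston": 3, "Paris": 3, "Singapore": 3}),
}

# Pick the configuration with the smallest metric value.
best = min(CONFIGS, key=lambda name: cc_rep_exec_time(REPLICAS, *CONFIGS[name]))
```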
  46. Two operation conflict

    Operations x and y conflict on lock a.

    | Placement | Location  |
    | 1         | Houston   |
    | 2         | Paris     |
    | 3         | Singapore |

    | Mode | x  | y  |
    | 1    | EX | EX |
    | 2    | EX | SH |
    | 3    | SH | EX |

    | Workload | H   | P   | S   |
    | x        | 0   | 500 | 0   |
    | y        | 250 | 0   | 250 |

    [Chart: execution time (ms) per granularity-mode-placement configuration.]

    1. Colocate lock with the workload
    2. Choose shared mode for the most frequently distributed operation
  47. Three operations conflict

    Operations x, y and z; locks a1 and a2, either kept separate or merged into one coarse lock (labelled abc in the tables).

    | Placement | a1 | a2 |
    | 1         | H  | H  |
    | 2         | P  | H  |
    | 3         | S  | H  |
    | 4         | H  | P  |
    | 5         | P  | P  |
    | 6         | S  | P  |
    | 7         | H  | S  |
    | 8         | P  | S  |
    | 9         | S  | S  |

    | Mode | a1: x | a1: y | a2: y | a2: z |
    | 1    | X     | X     | X     | X     |
    | 2    | X     | S     | X     | X     |
    | 3    | S     | X     | X     | X     |
    | 4    | X     | X     | X     | S     |
    | 5    | X     | S     | X     | S     |
    | 6    | S     | X     | X     | S     |
    | 7    | X     | X     | S     | X     |
    | 8    | X     | S     | S     | X     |
    | 9    | S     | X     | S     | X     |

    | Placement | abc |
    | 1         | H   |
    | 2         | P   |
    | 3         | S   |

    | Mode | abc: x | abc: y | abc: z |
    | 1    | X      | X      | X      |
    | 2    | X      | S      | X      |
    | 3    | S      | X      | S      |

    | Workload | H   | P   | S   |
    | x        | 0   | 1   | 0   |
    | y        | 100 | 100 | 100 |
    | z        | 0   | 1   | 0   |
  48. Three operations conflict

    [Chart: execution time (ms) per granularity-mode-placement configuration.]

    Same placement and mode tables as slide 47, with the locks labelled ab and bc (fine-grained) and abc (coarse), and the operations labelled place, close and remove.

    | Workload | H   | P   | S   |
    | x        | 0   | 1   | 0   |
    | y        | 100 | 100 | 100 |
    | z        | 0   | 1   | 0   |
  50. Three operations conflict

    [Chart: execution time (ms) per granularity-mode-placement configuration.]

    Same placement and mode tables as slide 48.

    | Workload | H   | P   | S  |
    | x        | 100 | 100 | 0  |
    | y        | 0   | 0   | 0  |
    | z        | 0   | 0   | 50 |

    Coarsening is a trade-off between contention and acquisition cost
  51. Future Work for Part III

    • Probabilistic constraint model
    • Support dynamic reconfiguration of coordination configuration
    • More concurrency control mechanisms

    Recap: Selecting optimal distributed lock configuration
    • Coordination lattice
    • Metrics
  52. Conclusion

    • Proof rule of Part I verifies safety of distributed applications
    • Replicated tree of Part II illustrates conflict resolution policy design
    • Coordination lattice of Part III is a first step to systematically explore performance implications of concurrency control
    • A step towards engineering Just-Right-Consistent distributed applications
  60. Propagating updates in Distributed Applications

    [Figure: replicas Sreeja, Marc and Gustavo. A Start auction update issued at one replica reaches the others, shown once for operation-based update propagation and once for state-based update propagation.]

    Proof rule and tool for verifying safety of highly-available distributed objects
    ‣ Modular
    ‣ Automated Verification
  64. Evolution of Auction Object

    [Figure: timeline of replicas Sreeja, Marc and Gustavo running Start auction, Place bid, Place bid and Close auction, with the bids 100 and 105 spreading across the replicas.]
  72. Safety of Auction Object

    Object Invariant
    • Bids can be placed only when the status is active
    • When auction is closed, there is a winner
    • Winner is the highest bid

    Concurrency Invariant
    • Winner in either state is the highest bid in both states

    Start auction

    Place bid
    • If auction concurrently closed in other replica, precondition of merge violated!

    Close auction
    • Similar to place bid
  73. Safety of Auction Object

    [Figure: timeline of replicas Sreeja, Marc and Gustavo running Start auction, Place bid, Place bid and Close auction with the bids 100 and 105. One point of the execution is marked "Conflict".]
  75. Issues with Tree CRDT

    • Tree CRDTs have been implemented with add/remove operations using set CRDTs [0]
    • Move operations are known to be unsafe in concurrent execution [1]
    • A fix for concurrent move operations is to form a total order for operations [2], but it is expensive
    • Another solution is to convert a move operation into a copy-delete pair [3], but this ends up with multiple copies of the node

    [0] Stéphane Martin, Mehdi Ahmed-Nacer, Pascal Urso. Abstract unordered and ordered trees CRDT.
    [1] M. Najafzadeh, M. Shapiro, P. Eugster. Co-Design and Verification of an Available File System. VMCAI 2018.
    [2] M. Kleppmann, V. B. Gomes, D. P. Mulligan, A. R. Beresford. OpSets: Sequential Specifications for Replicated Datatypes.
    [3] Vinh Tao, Marc Shapiro, Vianney Rancurel. Merging semantics for conflict updates in geo-distributed file systems.
  76. Sequential specification of a tree

    State - Set of nodes
    • Node - (id, parent)

    Operations
    • add_node(id, parent)
    • remove_node(id, parent)
    • move(from_parent, id, to_parent)

    Invariant
    • Maintain tree structure: all nodes reachable from root (This implies no cycles since every node has a single parent)
  77. Trees

    • A tree is a set of nodes with parent-child relations and no cycles
    • A special node called Root has no parent and is created at tree initialisation
    • All non-root nodes have exactly one parent and must be reachable from root
    • Operations supported: add, remove, move
    • Used to store file system structure, XML, JSON etc.
  78. add_node

    add(n, p)

    Precondition
      ∧ Tree invariant
      ∧ n is unique and not already present
      ∧ p already exists and reachable from root

    Postcondition
      ∧ Tree invariant
      ∧ (n, p) added to set of nodes

    [Figure: node n added under parent p.]
  81. remove_node

    remove_node(n, p)

    Precondition
      ∧ Tree invariant
      ∧ n is reachable from root
      ∧ n has no children

    Postcondition
      ∧ Tree invariant
      ∧ (n, p) is removed

    [Figure: node n removed from under parent p.]
  82. move

    move(p, c, p')

    Precondition
      ∧ Tree invariant
      ∧ p reachable from root
      ∧ p' reachable from root
      ∧ p parent of c
      ∧ c is not ancestor of p'

    Postcondition
      ∧ Tree invariant
      ∧ (c, p')

    [Figure: node c moved from parent p to parent p'.]
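The sequential specification of add, remove and move can be sketched as guarded operations on a child-to-parent map. The representation, the `ROOT` constant, and the `require` helper are assumptions made for the example. Each operation raises `ValueError` when its precondition does not hold.

```python
from typing import Dict

ROOT = "/"
Tree = Dict[str, str]  # maps every non-root node to its parent


def require(cond: bool, msg: str) -> None:
    if not cond:
        raise ValueError("precondition violated: " + msg)


def reachable(tree: Tree, n: str) -> bool:
    """True if following parent links from n ends at the root."""
    seen = set()
    while n != ROOT:
        if n in seen or n not in tree:
            return False
        seen.add(n)
        n = tree[n]
    return True


def invariant(tree: Tree) -> bool:
    return all(reachable(tree, n) for n in tree)


def is_ancestor(tree: Tree, a: str, n: str) -> bool:
    """True if a is n itself or an ancestor of n (a ->* n)."""
    seen = set()
    while n != a:
        if n == ROOT or n in seen or n not in tree:
            return False
        seen.add(n)
        n = tree[n]
    return True


def add_node(tree: Tree, n: str, p: str) -> None:
    require(n != ROOT and n not in tree, "n is unique and not already present")
    require(reachable(tree, p), "p exists and is reachable from root")
    tree[n] = p


def remove_node(tree: Tree, n: str, p: str) -> None:
    require(tree.get(n) == p and reachable(tree, n), "n is reachable from root")
    require(all(parent != n for parent in tree.values()), "n has no children")
    del tree[n]


def move(tree: Tree, p: str, c: str, p_new: str) -> None:
    require(reachable(tree, p), "p reachable from root")
    require(reachable(tree, p_new), "p' reachable from root")
    require(tree.get(c) == p, "p parent of c")
    require(not is_ancestor(tree, c, p_new), "c is not ancestor of p'")
    tree[c] = p_new
```

Because the last precondition of `move` rejects any target below the moved node, a sequential execution can never create a cycle.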
  86. Move causing cycle

    move(p, c, p')

    Precondition
      ∧ Tree invariant
      ∧ p reachable from root
      ∧ p' reachable from root
      ∧ p parent of c
      ∧ c is not ancestor of p'

    Postcondition
      ∧ Tree invariant
      ∧ (c, p')

    [Figure: a move of c from p to p' that causes a cycle.]
  89. Move in concurrent execution

        move (p, c, p'):
          Preconditions:
              Root →* p       // RP               Stable (No recursive remove)
            ∧ Root →* p'      // RP'              Unstable against remove(p')
            ∧ p → c           // PC
            ∧ ¬ (c →* p')     // NOT_UNDER_SELF
  94. Unstable <Root →* p'>

    [Figure: a tree with nodes p, p', c and a. One replica runs remove(p', a) while another runs move(p, c, p').]

    • After remove(p', a), p' is not reachable!
    • With p' tombstone marked by remove(p', a), move(p, c, p') can still be applied afterwards
  96. Move in concurrent execution

        move (p, c, p'):
          Preconditions:
              Root →* p       // RP               Stable (No recursive remove)
            ∧ Root →* p'      // RP'              Unstable against remove(p'); fixed with tombstones
            ∧ p → c           // PC               Unstable against move(p, c, p'')
            ∧ ¬ (c →* p')     // NOT_UNDER_SELF
  102. Unstable p → c

    [Figure: a tree with nodes p, p', p'' and c. m1 = move(p, c, p') runs concurrently with m2 = move(p, c, p'').]

    • After one of the moves is applied, p is not a parent of c!
    • Conflict resolution: if high priority move: apply, else: skip
    • Example: priority(m2) > priority(m1)
  109. Move in concurrent execution

        move (p, c, p'):
          Preconditions:
              Root →* p       // RP               Stable (No recursive remove)
            ∧ Root →* p'      // RP'              Unstable against remove(p'); fixed with tombstones
            ∧ p → c           // PC               Unstable against move(p, c, p''); conflict resolution applied
            ∧ ¬ (c →* p')     // NOT_UNDER_SELF   Unstable against concurrent move
  110. Unstable NOT_UNDER_SELF

    • Consider move(p, c, p')
    • In a sequential execution, NOT_UNDER_SELF forbids moving p' under c
    • However, a concurrent move of p' under c would cause a cycle
    • NOT_UNDER_SELF is not stable!
    • Generalises to moving any node p'' ∈ [p', LCA)

    Critical ancestors: ∀n . n →* p' ∧ n ↛* c
    Critical descendants: ∀n . c →* n

    [Figure: a tree with nodes LCA, p, c and p'.]
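The two sets can be computed directly from a child-to-parent map. The sketch below is an illustration under assumed names: a move is a tuple `(p, c, p_new)`, `p2` stands for p', and the `conflict` helper applies the membership test in both directions, which is one reading of the definition.

```python
from typing import Dict, List, Set, Tuple

ROOT = "/"
Move = Tuple[str, str, str]  # (old parent p, moved node c, new parent p')


def ancestors_or_self(parent: Dict[str, str], n: str) -> List[str]:
    """n followed by its ancestors up to the root (assumes a valid tree)."""
    chain = [n]
    while n != ROOT:
        n = parent[n]
        chain.append(n)
    return chain


def critical_ancestors(parent: Dict[str, str], c: str, p_new: str) -> List[str]:
    # Nodes n with n ->* p' that are not above c: the interval [p', LCA).
    above_c = set(ancestors_or_self(parent, c))
    return [n for n in ancestors_or_self(parent, p_new) if n not in above_c]


def critical_descendants(parent: Dict[str, str], c: str) -> Set[str]:
    # Nodes n with c ->* n, including c itself.
    return {n for n in parent if c in ancestors_or_self(parent, n)}


def conflict(parent: Dict[str, str], m1: Move, m2: Move) -> bool:
    # A moved node of one move is a critical ancestor of the other move.
    return (m1[1] in critical_ancestors(parent, m2[1], m2[2])
            or m2[1] in critical_ancestors(parent, m1[1], m1[2]))
```

For a tree where p' sits under a and c sits under p, the critical ancestors of move(p, c, p') are p' and a, so a concurrent move(a, p', c) is flagged as a conflict.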
  116. Conflict condition on NOT_UNDER_SELF

    move(p1, c1, p1') and move(p2, c2, p2') conflict when a node c1 that is being moved is also a critical ancestor of another move:

    c1 ∈ [ p2', LCA(c2, p2') )

    [Figure: a tree with nodes p, p', c and a, highlighting the critical ancestors of move(p, c, p') and the critical ancestors of move(a, p', c).]

    Move of critical ancestor causes a cycle!
  121. Comparison to SoA0

    [Figure: a tree with nodes p, p', c and a. The critical ancestors of move(p, c, p') take read locks, and the move takes a write lock.]

    • move(a, p', c) cannot happen since p' → c
    • Same result achieved without a lock!
  122. Concurrent up-moves are safe

    • move(p, c, p') is up-move if rank(c) > rank(p')
    • NOT_UNDER_SELF is unstable if any critical ancestors move under critical descendants
    • ∀ d ∈ critical descendants, a ∈ critical ancestors, rank(d) > rank(a)
    • This implies that only a down-move can cause c →* p'
  123. Classifying moves

    • Up-move - moving nearer to the root, or to the same distance
    • Down-move - moving away from the root

    Concurrent moves on critical ancestors
    |           | Up-move     | Down-move |
    | Up-move   | No Conflict | Conflict  |
    | Down-move | Conflict    | Conflict  |
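A small sketch of the classification, assuming that rank is the distance of a node from the root and that the tree is a child-to-parent map. The function names are illustrative. `may_conflict` encodes the table for concurrent moves on critical ancestors.

```python
from typing import Dict

ROOT = "/"


def rank(parent: Dict[str, str], n: str) -> int:
    """Distance of n from the root (assumed meaning of rank)."""
    depth = 0
    while n != ROOT:
        n = parent[n]
        depth += 1
    return depth


def is_up_move(parent: Dict[str, str], c: str, p_new: str) -> bool:
    # move(p, c, p') is an up-move if rank(c) > rank(p').
    return rank(parent, c) > rank(parent, p_new)


def may_conflict(first_is_up: bool, second_is_up: bool) -> bool:
    # Only a pair of concurrent up-moves is conflict-free.
    return not (first_is_up and second_is_up)
```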
  124. Conflict resolution for NOT_UNDER_SELF

    [Flowchart with Y/N branches. Decision nodes: Up-move; Concurrent move on critical ancestor; Concurrent move is up; High priority move. Outcomes: Apply; Skip.]
  129. Impact of conflict resolution

    [Figure: tree over nodes /, a, b, c, d at replicas Lisbon and Paris, before and after each move]
    1: move b under d (down-move)
    2: move d under b (down-move)
    3: move a under b (down-move)
    1;2 concurrent with 3
    1 conflicts with 3, 2 doesn’t conflict with 3
    3 has higher priority than 1: 1 skips
    Both replicas apply 3, then 2
  134. Impact of conflict resolution

    [Figure: tree over nodes /, a, b, c, d at Paris, before and after moves 2 and 3]
    2: move d under b (down-move)
    3: move a under b (down-move)
    Both replicas apply 3, then 2
    CYCLE!!
    Reason: 2 is dependent on the skipped 1
    Independence Analysis
  135. Historic enabler move Dependency

    Rows: the move (n, p’). Columns: the historic enabler move (n_en, p_en’).

                        up-move (n_en, p_en’)                         down-move (n_en, p_en’)
      up-move (n, p’)   p’ ∈ D(n_en)                                  (n_en ∈ D(n) ∧ p’ ∈ D(n_en)) ∨ n ∈ D(n_en)
      down-move (n, p’) (n_en ∈ D(n) ∧ p’ ∈ D(n_en)) ∨ n ∈ D(n_en)    p’ ∈ D(n_en)
  136. Metadata for move operation

    • To determine concurrent moves
    • To determine concurrent moves that will result in a cycle (critical ancestors)
    • To determine the type of move
    • To determine skipped historical enabler moves (critical descendants)
  137. Coordination-free solution

    move (p, c, p’):
      if historical enabler move skipped: skip
      if concurrent move of same node: highest priority move wins
      if up-move: apply
      if concurrent move operations on critical ancestors:
        if concurrent up-move present: skip
        else: highest priority move wins (concurrent down-move)
      else: apply
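
One way to read the decision procedure above is as a chain of early returns. The sketch below follows that reading. The `MoveContext` fields and the `Decision` names are hypothetical, and computing the flags (from the metadata listed on slide 136) is left out.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    APPLY = "apply"
    SKIP = "skip"
    HIGHEST_PRIORITY_WINS = "highest priority move wins"


@dataclass(frozen=True)
class MoveContext:
    """Facts about one move, assumed to be derived from its metadata."""
    enabler_skipped: bool = False                   # a historical enabler move was skipped
    concurrent_same_node: bool = False              # another replica moved the same node
    is_up_move: bool = False                        # rank(c) > rank(p')
    concurrent_on_critical_ancestors: bool = False  # concurrent move of a critical ancestor
    concurrent_up_move_present: bool = False        # at least one of those is an up-move


def resolve(ctx: MoveContext) -> Decision:
    """Evaluate the checks in the order given on the slide."""
    if ctx.enabler_skipped:
        return Decision.SKIP
    if ctx.concurrent_same_node:
        return Decision.HIGHEST_PRIORITY_WINS
    if ctx.is_up_move:
        return Decision.APPLY
    if ctx.concurrent_on_critical_ancestors:
        if ctx.concurrent_up_move_present:
            return Decision.SKIP
        return Decision.HIGHEST_PRIORITY_WINS  # only concurrent down-moves
    return Decision.APPLY


print(resolve(MoveContext(is_up_move=True)).value)  # apply
print(resolve(MoveContext(concurrent_on_critical_ancestors=True,
                          concurrent_up_move_present=True)).value)  # skip
```

No step in this reading requires communication with another replica at decision time, which is the sense in which the solution is coordination-free.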
  138. Auction: Static Analysis

    Conflicts detected
    • Add product - unregister seller
    • Create auction - unregister seller
    • Add to lot - add to lot
    • Add to lot - start auction
    • Add to lot - remove auction
    • Add to lot - remove product
    • Start auction - remove from lot
    • Place bid - close auction
    • Place bid - unregister buyer
    • Close auction - remove bid
  139. Auction: Synchronisation Condition Generation

    Conflicting pair: synchronisation condition
    • Add product - unregister seller: seller
    • Create auction - unregister seller: seller
    • Add to lot - add to lot: seller, (product, seller)
    • Add to lot - start auction: auction
    • Add to lot - remove auction: auction
    • Add to lot - remove product: (product, seller)
    • Start auction - remove from lot: auction
    • Place bid - close auction: auction
    • Place bid - unregister buyer: buyer
    • Close auction - remove bid: auction
  140. Auction: Grouping

    Add product, Unregister seller, Create auction, Place bid, Close auction, Unregister buyer, Remove bid, Register seller, Register buyer, Add to lot, Start auction, Remove from lot, Remove auction, Remove product
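
The transcript does not state the grouping rule. One plausible reading, assumed here, is that operations are grouped by connected components of the conflict graph from the static analysis, and that filtering then drops the groups with no conflict at all. The sketch below applies that reading to the ten detected conflicts; function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Conflicting pairs as listed under "Conflicts detected".
CONFLICTS: List[Tuple[str, str]] = [
    ("add product", "unregister seller"),
    ("create auction", "unregister seller"),
    ("add to lot", "add to lot"),
    ("add to lot", "start auction"),
    ("add to lot", "remove auction"),
    ("add to lot", "remove product"),
    ("start auction", "remove from lot"),
    ("place bid", "close auction"),
    ("place bid", "unregister buyer"),
    ("close auction", "remove bid"),
]

OPERATIONS: List[str] = [
    "add product", "unregister seller", "create auction", "place bid",
    "close auction", "unregister buyer", "remove bid", "register seller",
    "register buyer", "add to lot", "start auction", "remove from lot",
    "remove auction", "remove product",
]


def group_operations(ops: Iterable[str],
                     conflicts: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Connected components of the conflict graph (assumed grouping rule)."""
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for first, second in conflicts:
        adjacency[first].add(second)
        adjacency[second].add(first)
    seen: Set[str] = set()
    groups: List[Set[str]] = []
    for op in ops:
        if op in seen:
            continue
        component: Set[str] = set()
        stack = [op]
        while stack:  # depth-first traversal from op
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(adjacency[current] - component)
        seen |= component
        groups.append(component)
    return groups


def filter_groups(groups: List[Set[str]],
                  conflicts: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Keep only groups that contain at least one conflicting operation."""
    conflicting = {op for pair in conflicts for op in pair}
    return [group for group in groups if group & conflicting]


groups = group_operations(OPERATIONS, CONFLICTS)
kept = filter_groups(groups, CONFLICTS)
print(len(groups), len(kept))  # 5 3
```

Under this reading, register seller and register buyer form singleton groups with no conflicts, which is consistent with their absence from the filtered slide.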
  142. Auction: Filtering

    Add product, Unregister seller, Create auction, Place bid, Close auction, Unregister buyer, Remove bid, Add to lot, Start auction, Remove from lot, Remove auction, Remove product
  143. Auction

    [Diagram labels: Place bid, Close auction, auction, a, b, l_ab−auction]

    Placement
      1  Houston
      2  Paris
      3  Singapore

    Mode  place bid  close auction
      1   X          X
      2   X          S
      3   S          X

    [Chart: execution time (ms) per granularity-mode-placement configuration]

    Workload  H  P     S
      place   0  1000  0
      close   0  0     0
  144. Auction

    [Diagram labels: Place bid, Close auction, auction, a, b, l_ab−auction]

    Placement
      1  Houston
      2  Paris
      3  Singapore

    Mode  place bid  close auction
      1   X          X
      2   X          S
      3   S          X

    [Chart: execution time (ms) per granularity-mode-placement configuration]

    Workload  H    P    S
      place   333  333  333
      close   0    0    0
  145. Auction

    [Diagram labels: place bid, close auction, remove bid; a1, a2]

    Two locks (ab, bc)

      Placement  ab  bc
        1        H   H
        2        P   H
        3        S   H
        4        H   P
        5        P   P
        6        S   P
        7        H   S
        8        P   S
        9        S   S

      Mode  ab: place  ab: close  bc: close  bc: remove
        1   X          X          X          X
        2   X          S          X          X
        3   S          X          X          X
        4   X          X          X          S
        5   X          S          X          S
        6   S          X          X          S
        7   X          X          S          X
        8   X          S          S          X
        9   S          X          S          X

    Single lock (abc)

      Placement  abc
        1        H
        2        P
        3        S

      Mode  place  close  remove
        1   X      X      X
        2   X      S      X
        3   S      X      S

    Workload  H    P    S
      place   455  455  0
      close   37   37   0
      remove  8    8    0
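
The placement and mode tables above can be regenerated mechanically, which makes the size of the configuration space explicit. The sketch assumes that X and S stand for exclusive and shared lock modes and that H, P, S in the placement columns are the Houston, Paris and Singapore sites; the per-pair and single-lock mode lists are copied from the tables, not derived.

```python
from itertools import product

SITES = ("H", "P", "S")  # Houston, Paris, Singapore

# Modes listed for a lock guarding one conflicting pair of operations.
PAIR_MODES = (("X", "X"), ("X", "S"), ("S", "X"))

# Modes listed for the single lock abc: (place, close, remove).
SINGLE_LOCK_MODES = (("X", "X", "X"), ("X", "S", "X"), ("S", "X", "S"))


def two_lock_configurations():
    """All (placement, mode) combinations for locks ab and bc."""
    placements = list(product(SITES, repeat=2))  # (site of ab, site of bc)
    modes = list(product(PAIR_MODES, repeat=2))  # ((place, close), (close, remove))
    return list(product(placements, modes))


def single_lock_configurations():
    """All (placement, mode) combinations for the single lock abc."""
    return list(product(SITES, SINGLE_LOCK_MODES))


two = two_lock_configurations()
one = single_lock_configurations()
print(len(two), len(one))  # 81 9
```

Nine placements times nine modes gives 81 two-lock configurations, against nine for the single lock. Whether every combination was measured is not stated in the transcript.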
  147. Auction

    Placement and mode tables for two locks (ab, bc) and for the single lock (abc), as on slide 145.

    [Chart: execution time (ms) per g-m-p (granularity-mode-placement) configuration]

    Workload  H    P    S
      place   455  455  0
      close   37   37   0
      remove  8    8    0

    1. Colocate lock with the workload
    2. Choose shared mode for the most frequently distributed operation
    3. Coarsening is a trade-off between contention and acquisition cost
  148. Auction

    Placement and mode tables for two locks (ab, bc) and for the single lock (abc), as on slide 145.

    [Chart: execution time (ms) per g-m-p (granularity-mode-placement) configuration]

    Workload  H  P    S
      place   0  909  0
      close   0  75   0
      remove  0  15   0

    Granularity 1 - 2 locks
    Granularity 2 - single lock
  149. Auction

    Placement and mode tables for two locks (ab, bc) and for the single lock (abc), as on slide 145.

    [Chart: execution time (ms) per g-m-p (granularity-mode-placement) configuration]

    Workload  H    P    S
      place   303  303  303
      close   25   25   25
      remove  5    5    5

    Granularity 1 - 2 locks
    Granularity 2 - single lock