Slide 1

Slide 1 text

Lithium
A split-brain resolver for Akka-Cluster
Dennis van der Bij
@MrDnx DennisVDB

Slide 2

Slide 2 text

OMS
• SwissBorg’s OMS (order management system)
• Aggregates the prices of 4 crypto-exchanges
• Best-execution

Slide 3

Slide 3 text

OMS’ objectives
• Best-execution
• High availability

Slide 4

Slide 4 text

OMS cluster
• Persistent actors
• Singleton actors
• …
[Diagram: cluster of Node-1 to Node-5; "You are here" marker; S = super-important singleton]

Slide 5

Slide 5 text

Unreachable nodes
• S cannot be reached
• Need to start S on a reachable node
• Singleton actors are not migrated when nodes are unreachable
[Diagram: Node-1 to Node-5 split into Partition A and Partition B; S; caption "Dead or alive?"]

Slide 6

Slide 6 text

Membership state
• Leader chosen deterministically
• Leader manages state transitions on convergence
• Convergence cannot be reached with unreachable nodes
• Eventually-perfect FD*
• Nodes cannot become fully-fledged members or gracefully leave the cluster
[Diagram: member states Joining, Up, Leaving, Exiting, Removed, Down; four transitions labelled "Leader"]
*Hayashibara, Naohiro, et al. "The φ accrual failure detector." Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004

Slide 7

Slide 7 text

Remove from membership state
[Diagram: Node-1, Node-2, Node-3, shown twice; S]

Slide 8

Slide 8 text

Remove from membership state
[Diagram: Node-4, Node-5; S]

Slide 9

Slide 9 text

Split-brain
Network partition
One cluster becomes two clusters
[Diagram: Node-1 to Node-5, with S appearing twice]

Slide 10

Slide 10 text

Split-brain resolver
• Prevent split-brains from happening in the first place
• Pick only one partition that will survive
  - Survivor will down the unreachable nodes
  - Non-survivors will down themselves
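
The two outcomes on this slide can be sketched as a small decision type. This is only an illustration, not Lithium's actual API: `Decision`, `DownUnreachable`, `DownSelf`, and `resolve` are hypothetical names, and the `survives` flag stands in for whatever strategy picks the surviving partition.

```scala
object ResolverSketch {
  // Hypothetical outcome of a resolution, as seen from a single partition.
  sealed trait Decision
  final case class DownUnreachable(nodes: Set[String]) extends Decision
  final case class DownSelf(nodes: Set[String]) extends Decision

  // The survivor downs the nodes it cannot reach;
  // a non-survivor downs its own (reachable) side.
  def resolve(reachable: Set[String], unreachable: Set[String], survives: Boolean): Decision =
    if (survives) DownUnreachable(unreachable) else DownSelf(reachable)
}
```

Every partition runs the same logic on its own view, so a correct strategy must let at most one of them see `survives = true`.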

Slide 11

Slide 11 text

Existing solutions
• Lightbend SBR
  - Multiple strategies
  - Multi-DC
  - Starting at $50’000 per year
• Four OSS SBRs
  - Two used in production, single strategy (MOIA)
  - Two others, multiple strategies (fail my tests)

Slide 12

Slide 12 text

Lithium
• Strategies
  - Static-quorum, keep-majority, keep-oldest, and keep-referee
• Multi-datacenter support
• Tests, tests, tests

Slide 13

Slide 13 text

Static-quorum
• Pick partition with at least N nodes
• Downs the cluster: more than 2N − 1 nodes, no partition with at least N nodes
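
A minimal sketch of the static-quorum rule, written from the point of view of one partition. The names (`decide`, `DownUnreachable`, `DownSelf`) are hypothetical and node counts stand in for real cluster members; it is not Lithium's implementation.

```scala
object StaticQuorumSketch {
  sealed trait Decision
  case object DownUnreachable extends Decision // this partition survives
  case object DownSelf extends Decision        // this partition downs itself

  // reachable:   nodes on this side of the split
  // unreachable: nodes on the other side(s)
  // quorumSize:  the N from the slide
  def decide(reachable: Int, unreachable: Int, quorumSize: Int): Decision =
    if (reachable + unreachable > 2 * quorumSize - 1) DownSelf // two quorums could coexist
    else if (reachable >= quorumSize) DownUnreachable
    else DownSelf
}
```

With N = 3 the bound is 2N − 1 = 5 nodes. A 3|2 split keeps the side with three nodes, and a six-node cluster downs itself because a 3|3 split would give two quorums. When no side reaches N, every side picks `DownSelf`, which is how the whole cluster ends up downed.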

Slide 14

Slide 14 text

Static-quorum
[Diagram: Node-1 to Node-5; N = 3]

Slide 15

Slide 15 text

Static-quorum
[Diagram: Node-1 to Node-5; Node-1, Node-2, Node-3; N = 3]

Slide 16

Slide 16 text

Keep-majority
• Pick partition with a majority of nodes (or lowest address)
• Downs the cluster: no partition with a majority
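
The keep-majority rule, including the lowest-address tie-break, might look like the sketch below. It assumes addresses are plain strings compared lexicographically and uses hypothetical names throughout; the real address ordering in Akka is more involved.

```scala
object KeepMajoritySketch {
  sealed trait Decision
  case object DownUnreachable extends Decision // this partition survives
  case object DownSelf extends Decision        // this partition downs itself

  def decide(reachable: Set[String], unreachable: Set[String]): Decision = {
    val total = reachable.size + unreachable.size
    if (reachable.size * 2 > total) DownUnreachable // strict majority
    else if (reachable.size * 2 == total && reachable.nonEmpty &&
             reachable.contains((reachable ++ unreachable).min))
      DownUnreachable                               // tie: side holding the lowest address wins
    else DownSelf
  }
}
```

In a three-way split such as 2|2|1, every side sees fewer than half of the nodes as reachable, so all of them choose `DownSelf` and the cluster is downed.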

Slide 17

Slide 17 text

Keep-majority
[Diagram: Node-1 to Node-5]

Slide 18

Slide 18 text

Keep-majority
[Diagram: Node-1 to Node-5; Node-1, Node-2, Node-3]

Slide 19

Slide 19 text

Keep-oldest
• Pick partition containing the oldest member
• Oldest member hosts the singleton instance
• Nearly the entire cluster is downed when the oldest member is alone
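
As a rough sketch, keep-oldest only needs to know which side holds the oldest member. The `Member` type and its `upNumber` field (lower means older) are assumptions made for this example, as are the other names.

```scala
object KeepOldestSketch {
  // Hypothetical member: a lower upNumber means the member joined earlier.
  final case class Member(address: String, upNumber: Int)

  sealed trait Decision
  case object DownUnreachable extends Decision // this partition survives
  case object DownSelf extends Decision        // this partition downs itself

  def decide(reachable: Set[Member], unreachable: Set[Member]): Decision = {
    val all = reachable ++ unreachable
    if (all.nonEmpty && reachable.contains(all.minBy(_.upNumber))) DownUnreachable
    else DownSelf
  }
}
```

If the oldest member is cut off on its own, its one-node partition survives and the other four nodes of a five-node cluster down themselves, which is the "nearly the entire cluster" case from the slide.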

Slide 20

Slide 20 text

Keep-oldest
[Diagram: Node-1 to Node-5; one node marked "Oldest"]

Slide 21

Slide 21 text

Keep-oldest
[Diagram: Node-1 to Node-5; Node-4, Node-5; one node marked "Oldest"]

Slide 22

Slide 22 text

Keep-referee
• Pick the partition containing the “referee” node
• Downs most of the cluster when the referee is alone
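
Keep-referee reduces to a membership check on the reachable side. The sketch below uses hypothetical names and string addresses; how the referee is actually configured is not shown on the slide.

```scala
object KeepRefereeSketch {
  sealed trait Decision
  case object DownUnreachable extends Decision // this partition survives
  case object DownSelf extends Decision        // this partition downs itself

  // `referee` is the address of the designated referee node.
  def decide(reachable: Set[String], referee: String): Decision =
    if (reachable.contains(referee)) DownUnreachable else DownSelf
}
```

The rule is simple, but it makes the referee a single point of failure: a partition that cannot reach it always downs itself, however large it is.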

Slide 23

Slide 23 text

Keep-referee
[Diagram: Node-1 to Node-5; one node marked "Referee"]

Slide 24

Slide 24 text

Keep-referee
[Diagram: Node-1 to Node-5; Node-4, Node-5; one node marked "Referee"]

Slide 25

Slide 25 text

Choosing a strategy
Use “role” to take into account only certain members
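
One way to picture the “role” option is as a filter applied before the strategy counts members. The sketch assumes an empty role means "consider everyone"; the `Member` type, the function names, and the `oms` role in the usage note are made up for illustration.

```scala
object RoleFilterSketch {
  // Hypothetical member carrying the roles it was started with.
  final case class Member(address: String, roles: Set[String])

  // Members the strategy takes into account; an empty role keeps all of them (assumption).
  def considered(members: Set[Member], role: String): Set[Member] =
    if (role.isEmpty) members else members.filter(_.roles.contains(role))

  // Majority check restricted to members with the given role.
  def hasMajority(reachable: Set[Member], unreachable: Set[Member], role: String): Boolean =
    considered(reachable, role).size > considered(unreachable, role).size
}
```

A two-node partition facing three unreachable nodes has no majority overall, but it does if both of its nodes carry the `oms` role and only one node on the other side does.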

Slide 26

Slide 26 text

How it works
• Provide an instance of DowningProvider
• Each cluster member runs an instance of Lithium

Slide 27

Slide 27 text

Demo

Slide 28

Slide 28 text

Tests, tests, tests
• ~70% of LOC are tests
• Unit tests + property-based tests
• “multi-jvm” tests

Slide 29

Slide 29 text

Scenarios
• Use property-based tests to detect edge-cases
• Splits during membership changes
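
To give a flavour of what such a property looks like, the sketch below checks that a static-quorum rule never lets both sides of a two-way split survive. It is a stand-in only: it uses `scala.util.Random` instead of a property-based testing library to stay dependency-free, all names are hypothetical, and it does not model splits during membership changes.

```scala
import scala.util.Random

object SplitPropertySketch {
  // Static-quorum survival rule for a partition of `reachable` nodes (sketch).
  def survives(reachable: Int, total: Int, quorumSize: Int): Boolean =
    total <= 2 * quorumSize - 1 && reachable >= quorumSize

  // Generates `runs` random two-way splits and checks that
  // no split leaves two surviving partitions.
  def atMostOneSurvivor(runs: Int, seed: Long): Boolean = {
    val rng = new Random(seed)
    (1 to runs).forall { _ =>
      val quorumSize = 1 + rng.nextInt(10)        // N in 1..10
      val total = 1 + rng.nextInt(3 * quorumSize) // 1..3N, includes oversized clusters
      val sideA = rng.nextInt(total + 1)          // 0..total
      val sideB = total - sideA
      !(survives(sideA, total, quorumSize) && survives(sideB, total, quorumSize))
    }
  }
}
```

The property holds because two sides with at least N nodes each would need at least 2N nodes in total, which the 2N − 1 bound rules out.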

Slide 30

Slide 30 text

Multi-JVM tests
• Simulate a cluster locally
• Split links between members programmatically
• Observe how it gets resolved

Slide 31

Slide 31 text

Demo

Slide 32

Slide 32 text

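Comparison
Columns: Static-quorum, Keep-majority, Keep-oldest, Keep-referee, Multi-DC
Rows: Lithium, ARD, SAD, ACCD, ADR
[Table: only the "N/A" cells survive in the slide text (ARD: 4, SAD: 1, ACCD: 1, ADR: 4, Lithium: none); their columns are not recoverable]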

Slide 33

Slide 33 text

https://github.com/SwissBorg/lithium/
@MrDnx DennisVDB