Lithium: a split-brain resolver for Akka-Cluster

In Akka-Cluster, as soon as some nodes become unreachable, no node can join or even leave the cluster. To bring the cluster back to a fully working state, the unreachable nodes must be downed. However, there is no way of knowing whether an unreachable node has crashed or is the victim of a network partition, so downing done incorrectly can lead to a split-brain, data corruption, and a headache fixing it.

To recover from unreachable nodes automatically and correctly, Lightbend provides a split-brain resolver through its subscription. For individuals and companies that cannot afford the subscription, some open-source solutions exist, but they do not come near it in terms of features and correctness. To close that gap, I developed an open-source split-brain resolver called Lithium as part of my EPFL master's project.

In this talk I will introduce Lithium, explain how it helps recover the cluster from unreachable nodes, describe its internals, and cover everything you need to know to set it up.

Dennis van der Bij

October 09, 2019

Transcript

  1. Lithium: a split-brain resolver for Akka-Cluster. Dennis van der Bij (@MrDnx, DennisVDB)
  2. OMS • SwissBorg’s OMS (order management system) • Aggregates the prices of 4 crypto-exchanges • Best-execution
  3. OMS’ objectives • Best-execution • High availability
  4. OMS cluster • Persistent actors • Singleton actors • … [Diagram: Node-2, Node-3, Node-1, Node-4, Node-5; labels: “You are here”, S = “Super-important singleton”]
  5. Unreachable nodes • S cannot be reached • Need to start S on a reachable node • Singleton actors are not migrated when nodes are unreachable [Diagram: Node-2, Node-3, Node-1, Node-4, Node-5, S; labels: Partition A, Partition B, “Dead or alive?”]
  6. Membership state • Leader chosen deterministically • Leader manages state transitions on convergence • Convergence cannot be reached with unreachable nodes • Eventually-perfect FD* • Nodes cannot become fully-fledged members or gracefully leave the cluster [Diagram: member states Joining, Up, Leaving, Exiting, Removed, Down; four transitions labelled “Leader”] *Hayashibara, Naohiro, et al. "The φ accrual failure detector." Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004
  7. Remove from membership state [Diagram: Node-2, Node-3, Node-1; Node-2, Node-3, Node-1, S]
  8. Remove from membership state [Diagram: Node-4, Node-5, S]
  9. Split-brain • One cluster becomes two clusters [Diagram: Node-2, Node-3, Node-1, Node-4, Node-5; label: “Network partition”; S appears twice]
  10. Split-brain resolver • Prevent split-brains from happening in the 1st place • Pick only one partition that will survive - Survivor will down the unreachable nodes - Non-survivors will down themselves
  11. Existing solutions • Lightbend SBR - Multiple strategies - Multi-DC - Starting at $50’000 per year • Four OSS SBRs - Two used in production, single strategy (MOIA) - Two others, multiple strategies (fail my tests)
  12. Lithium • Strategies - Static-quorum, keep-majority, keep-oldest, and keep-referee • Multi-datacenter support • Tests, tests, tests
  13. Static-quorum • Pick partition with at least N nodes • Downs the cluster: more than 2N − 1 nodes, no partition with at least N nodes.
  14. Static-quorum [Diagram: Node-2, Node-3, Node-1, Node-4, Node-5; N = 3]
  15. Static-quorum [Diagram: Node-2, Node-3, Node-1, Node-4, Node-5; Node-2, Node-3, Node-1; N = 3]
  16. Keep-majority • Pick partition with a majority of nodes (or lowest address) • Downs the cluster: no partition with a majority
  17. Keep-majority [Diagram: Node-2, Node-3, Node-1, Node-4, Node-5]
  18. Keep-majority [Diagram: Node-2, Node-3, Node-1, Node-4, Node-5; Node-2, Node-3, Node-1]
  19. Keep-oldest • Pick partition containing the oldest member • Oldest member hosts the singleton instance • Nearly entire cluster is downed when oldest is alone
  20. Keep-oldest [Diagram: Node-2, Node-3, Node-1, Node-4, Node-5; label: Oldest]
  21. Keep-oldest [Diagram: Node-2, Node-3, Node-1, Node-4, Node-5; Node-4, Node-5; label: Oldest]
  22. Keep-referee • Pick the partition containing the “referee” node • Downs most of the cluster when the referee is alone
  23. Keep-referee [Diagram: Node-2, Node-3, Node-1, Node-4, Node-5; label: Referee]
  24. Keep-referee [Diagram: Node-2, Node-3, Node-1, Node-4, Node-5; Node-4, Node-5; label: Referee]
  25. Choosing a strategy • Use “role” to take only certain members into account
  26. How it works • Provide instance of DowningProvider • Each cluster member runs an instance of Lithium
  27. Demo
  28. Tests, tests, tests • ~70% of LOC are tests • Unit tests + property-based tests • “multi-jvm” tests
  29. Scenarios • Use property-based tests to detect edge-cases • Splits during membership changes
  30. Multi-JVM tests • Simulate a cluster locally • Split links between members programmatically • Observe how it gets resolved
  31. Demo
  32. Comparison Static-quorum Keep-majority Keep-oldest Keep-referee Multi-DC Lithium ARD N/A N/A N/A N/A SAD N/A ACCD N/A ADR N/A N/A N/A N/A
  33. https://github.com/SwissBorg/lithium/ @MrDnx DennisVDB
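
The four strategies on slides 13 to 24 can be read as decision functions that each partition evaluates on its own view of the cluster. The Python below is a minimal, language-neutral sketch of those rules as the slides state them. It is not Lithium's implementation (Lithium itself runs inside Akka-Cluster), and the names Member, Decision, static_quorum, keep_majority, keep_oldest, and keep_referee are hypothetical. It also assumes that every side agrees on the full member list, and it ignores roles and multi-datacenter setups.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    DOWN_UNREACHABLE = "survive: down the unreachable nodes"
    DOWN_SELF = "give up: down the reachable nodes, including this one"
    DOWN_ALL = "down every node in the cluster"


@dataclass(frozen=True)
class Member:
    address: str   # hypothetical stand-in for a node address
    age_rank: int  # assumption: 0 is the oldest member, larger is younger


def static_quorum(reachable, unreachable, quorum_size):
    """Survive only if this side has at least quorum_size (N) nodes."""
    total = len(reachable) + len(unreachable)
    if total > 2 * quorum_size - 1:
        # Two sides could both hold N nodes, so no side is safe to keep.
        return Decision.DOWN_ALL
    if len(reachable) >= quorum_size:
        return Decision.DOWN_UNREACHABLE
    return Decision.DOWN_SELF


def keep_majority(reachable, unreachable):
    """Survive with a strict majority; on a tie, the lowest address wins."""
    if len(reachable) > len(unreachable):
        return Decision.DOWN_UNREACHABLE
    if len(reachable) == len(unreachable):
        lowest = min(reachable | unreachable, key=lambda m: m.address)
        if lowest in reachable:
            return Decision.DOWN_UNREACHABLE
    return Decision.DOWN_SELF


def keep_oldest(reachable, unreachable):
    """Survive only if this side contains the oldest member."""
    oldest = min(reachable | unreachable, key=lambda m: m.age_rank)
    if oldest in reachable:
        return Decision.DOWN_UNREACHABLE
    return Decision.DOWN_SELF


def keep_referee(reachable, unreachable, referee_address):
    """Survive only if this side can reach the referee node.

    unreachable is unused; it is kept so all strategies share one shape.
    """
    if any(m.address == referee_address for m in reachable):
        return Decision.DOWN_UNREACHABLE
    return Decision.DOWN_SELF


# The 3/2 split from the slides, with Node-4 assumed to be the oldest.
ranks = {"node-1": 1, "node-2": 2, "node-3": 3, "node-4": 0, "node-5": 4}
side_a = {Member(a, ranks[a]) for a in ("node-1", "node-2", "node-3")}
side_b = {Member(a, ranks[a]) for a in ("node-4", "node-5")}

print(static_quorum(side_a, side_b, quorum_size=3))  # Decision.DOWN_UNREACHABLE
print(keep_majority(side_b, side_a))                 # Decision.DOWN_SELF
print(keep_oldest(side_b, side_a))                   # Decision.DOWN_UNREACHABLE
```

Each side runs the same function with reachable and unreachable swapped, so at most one side decides to survive. In a three-way split where no side holds a majority, every side returns DOWN_SELF under keep_majority, which corresponds to the "downs the cluster" case on slide 16.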