
Consistent, Distributed Elixir

An explanation of consistency guarantees in Erlang, and what we might be able to do with a consensus protocol like Raft.

Chris Keathley

February 23, 2018

Transcript

  1. Consistent, Distributed Elixir Chris Keathley / @ChrisKeathley / c@keathley.io

  2. I have a problem…

  3. None
  4. “Phoenix is not your application”

  5. I don’t know what my application is any more

  6. Elixir is awesome

  7. Processes

  8. Problem: “We need to limit access to an external resource”

  9. Solution: “Let’s just use processes!”

  10. Global lock

  11. :unlocked Global lock

  12. :unlocked Global lock Client

  13. Global lock Client :lock :unlocked

  14. Global lock Client :lock :unlocked

  15. Global lock Client :locked

  16. Global lock Client :ok :locked

  17. Global lock Client :ok :locked

  18. Global lock Client :locked

  19. Global lock Client :unlock :locked

  20. :locked Global lock Client :unlock

  21. :locked Global lock Client

  22. :unlocked Global lock Client

  23. :unlocked Global lock Client :ok

  24. :unlocked Global lock Client :ok

  25. :unlocked Global lock Client

  26. :unlocked Global lock Client Client

  27. :unlocked Global lock Client Client

  28. :unlocked Global lock Client Client

  29. :unlocked Global lock Client Client

  30. Global lock Client Client :locked

  31. Global lock Client Client :locked

  32. Global lock Client Client :locked

  33. Global lock Client Client :locked

  34. Global lock Client Client :locked

  35. Global lock Client Client :locked

  36. Global lock Client Client :locked

  37. Global lock Client Client :locked

  38. Global lock Client Client :locked

  39. Global lock Client Client :locked

  40. Global lock Client Client :locked

  41. Global lock Client Client :locked

  42. Global lock Client Client :unlocked

  43. Global lock Client Client :unlocked

  44. Global lock Client Client :unlocked

  45. Global lock Client Client :unlocked

  46. Global lock Client Client :locked

  47. Global lock Client Client :locked

  48. Global lock Client Client :locked

  49. defmodule Demo.Lock do use GenServer end

  50. defmodule Demo.Lock do use GenServer def init(:ok) do state = {:unlocked, nil} {:ok, state} end end

  51. defmodule Demo.Lock do use GenServer def init(:ok) do state = {:unlocked, nil} {:ok, state} end def handle_call({:lock, client}, _from, {:unlocked, nil}) do {:reply, :ok, {:locked, client}} end end

  52. defmodule Demo.Lock do use GenServer def init(:ok) do state = {:unlocked, nil} {:ok, state} end def handle_call({:lock, client}, _from, {:unlocked, nil}) do {:reply, :ok, {:locked, client}} end def handle_call({:lock, client}, _from, {:locked, other_client}) do {:reply, :error, {:locked, other_client}} end end

  53. defmodule Demo.Lock do
        use GenServer

        def init(:ok) do
          state = {:unlocked, nil}
          {:ok, state}
        end

        def handle_call({:lock, client}, _from, {:unlocked, nil}) do
          {:reply, :ok, {:locked, client}}
        end

        def handle_call({:lock, _client}, _from, {:locked, other_client}) do
          {:reply, :error, {:locked, other_client}}
        end

        def handle_call({:unlock, client}, _from, {:locked, client}) do
          {:reply, :ok, {:unlocked, nil}}
        end

        def handle_call({:unlock, _client}, _from, {:locked, other_client}) do
          {:reply, :error, {:locked, other_client}}
        end
      end
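
For context, here is how a client might drive this lock once it is started; a minimal sketch, where the :client_a and :client_b atoms are illustrative stand-ins for real callers:

      {:ok, _pid} = GenServer.start_link(Demo.Lock, :ok, name: Demo.Lock)

      # The first client acquires the lock...
      :ok = GenServer.call(Demo.Lock, {:lock, :client_a})

      # ...and a second client is refused until the first unlocks.
      :error = GenServer.call(Demo.Lock, {:lock, :client_b})
      :ok = GenServer.call(Demo.Lock, {:unlock, :client_a})
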
  54. Global lock

  55. Problem: “We need to run multiple Nodes”

  56. Multiple Nodes

  57. Multiple Nodes Node

  58. Node Multiple Nodes

  59. Node Node Multiple Nodes

  60. Node Client Client Multiple Nodes Node

  61. Node Node Client Multiple Nodes Client

  62. Node Node Client Multiple Nodes Client

  63. Node Node Client Multiple Nodes Client

  64. Node Node Client Multiple Nodes Client

  65. Node Node Client Multiple Nodes Client

  66. Node Node Client Multiple Nodes Client

  67. Node Node Client Multiple Nodes Client This is bad

  68. Solution: “Let’s just use a global process”

  69. GenServer.start_link(Lock, :ok, name: Lock)

  70. GenServer.start_link(Lock, :ok, name: {:global, Lock})
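
With a :global name, any node in the cluster can reach the same process by addressing it as {:global, Lock}; this is standard OTP name registration, and the call below is an illustrative sketch:

      # Works from any connected node, not just the one that started the lock.
      :ok = GenServer.call({:global, Lock}, {:lock, :client_a})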

  71. Solution: Global Process Node Node Client Client

  72. Solution: Global Process Node Node Client Client Remove this

  73. Solution: Global Process Node Node Client Client Global

  74. Solution: Global Process Node Node Client Client

  75. Solution: Global Process Node Node Client Client

  76. Solution: Global Process Node Node Client Client

  77. Solution: Global Process Node Node Client Client

  78. Solution: Global Process Node Node Client Client

  79. Solution: Global Process Node Node Client Client

  80. Solution: Global Process Node Node Client Client

  81. Solution: Global Process Node Node Client Client

  82. Solution: Global Process Node Node Client Client What if this goes away?

  83. Solution: Global Process Node Client Client

  84. Solution: Global Process Node Client Client

  85. Solution: Global Process Node Client Client Start a new lock process

  86. Solution: Global Process Node Client Client Start a new lock process

  87. Solution: Global Process Node Client Client

  88. Solution: Global Process Node Client Client

  89. Solution: Global Process Node Client Client

  90. Solution: Global Process Node Client Client

  91. Problem: “What if the node isn’t really down?”

  92. Solution: Global Process Node Node Client Client

  93. Solution: Global Process Node Node Client Client

  94. Solution: Global Process Node Node Client Client Partition

  95. Solution: Global Process Node Node Client Client

  96. Solution: Global Process Node Node Client Client

  97. Solution: Global Process Node Node Client Client

  98. Solution: Global Process Node Node Client Client Guess it must be down

  99. Solution: Global Process Node Node Client Client Start a new lock process

  100. Solution: Global Process Node Node Client Client Start a new lock process

  101. Solution: Global Process Node Node Client Client

  102. Solution: Global Process Node Node Client Client

  103. Solution: Global Process Node Node Client Client

  104. Solution: Global Process Node Node Client Client

  105. Solution: Global Process Node Node Client Client This Lock still exists

  106. Solution: Global Process Node Node Client Client

  107. Solution: Global Process Node Node Client Client

  108. Solution: Global Process Node Node Client Client

  109. Solution: Global Process Node Node Client Client

  110. Solution: Global Process Node Node Client Client

  111. Solution: Global Process Node Node Client Client

  112. Solution: Global Process Node Node Client Client

  113. Solution: Global Process Node Node Client Client This is bad

  114. Problem: “What happens when the partition heals?”

  115. Solution: Global Process Node Node Client Client

  116. Solution: Global Process Node Node Client Client Heals

  117. Solution: Global Process Node Node Client Client

  118. Solution: Global Process Node Node Client Client Who should win?

  119. Solution: Global Process Node Node Client Client This is bad
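
Part of why this goes wrong: Erlang's failure detection can only report that a node stopped responding, and a crashed node is indistinguishable from a partitioned one. A minimal probe using the standard OTP API (the message handling is illustrative):

      # Ask the kernel to send us :nodeup / :nodedown messages.
      :ok = :net_kernel.monitor_nodes(true)

      receive do
        {:nodedown, node} ->
          # From here, a crash and a partition look identical.
          IO.puts("lost contact with #{node}")
      end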

  120. Solution: “Let’s just use The Database”

  121. Solution: Databases Node Node Client Client

  122. Solution: Databases Node Node Client Client

  123. Solution: Databases Node Node Client Client Redis

  124. Solution: “Let’s have a way to consistently manage state in Elixir”

  125. CAP Theorem: when a network is partitioned, you can either be available or consistent.

  126. Available: “Every request receives a response, without guarantee that it contains the most recent write.”

  127. Consistent: “Every read receives the most recent write or it errors.”

  128. AP: available during partitions. CP: consistent during partitions.

  129. Available

  130. Problem: “We need to keep track of counts”

  131. Counters Node Client Node Client

  132. Counters Node Client Node Client +0 +0

  133. Counters Node Client Node Client +0 +0 Store additions

  134. Counters Node Client Node Client +0 +0

  135. Counters Node Client Node Client +2 +0 +0

  136. Counters Node Client Node Client +2 +0 +0

  137. Counters Node Client Node Client +0 +0 +2

  138. Counters Node Client Node Client +2 +0 +0 +2

  139. Counters Node Client Node Client +2 +0 +0 +2

  140. Counters Node Client Node Client +0 +2 +0 +2

  141. Counters Node Client Node Client +0 +2 +0 +2 :read

  142. Counters Node Client Node Client +0 +2 +0 +2 :read

  143. Counters Node Client Node Client +0 +2 +0 +2 2

  144. Counters Node Client Node Client +0 +2 +0 +2 2

  145. Counters Node Client Node Client +0 +2 +0 +2

  146. Counters Node Client Node Client +0 +2 +0 +2

  147. Counters Node Client Node Client +0 +2 +0 +2 +3

  148. Counters Node Client Node Client +0 +2 +0 +2 +3

  149. Counters Node Client Node Client +0 +2 +0 +2 +3

  150. Counters Node Client Node Client +0 +2 +0 +2 +3 +3

  151. Counters Node Client Node Client +0 +2 +0 +2 +3 +3

  152. Counters Node Client Node Client +0 +2 +0 +2 +3 :read

  153. Counters Node Client Node Client +0 +2 +0 +2 +3 :read

  154. Counters Node Client Node Client +0 +2 +0 +2 +3 2

  155. Counters Node Client Node Client +0 +2 +0 +2 +3 2

  156. Counters Node Client Node Client +0 +2 +0 +2 +3

  157. Counters Node Client Node Client +0 +2 +0 +2 +3 +5

  158. Counters Node Client Node Client +0 +2 +0 +2 +3 +5

  159. Counters Node Client Node Client +0 +2 +0 +2 +3

  160. Counters Node Client Node Client +0 +2 +0 +2 +3 +5

  161. Counters Node Client Node Client +0 +2 +0 +2 +3 +5 Inconsistent

  162. Counters Node Client Node Client +0 +2 +0 +2 +3 +5

  163. Counters Node Client Node Client +0 +2 +0 +2 +3 +5

  164. Counters Node Client Node Client +0 +2 +0 +2 +3 +5

  165. Counters Node Client Node Client +0 +2 +0 +2 +3 +5

  166. Counters Node Client Node Client +0 +2 +0 +2 +3 +5 +5 +3
  167. Sometimes you’re going to be wrong (and that’s OK)

  168. Phoenix Presence

  169. Some problems need consistency: distributed locking, databases, distributed scheduling and coordination, configuration and metadata storage, transactions.

  170. Consistent

  171. Node Partitions in consistent systems Node Node

  172. Node Partitions in consistent systems Node Node

  173. Node Partitions in consistent systems Node Node

  174. Node Partitions in consistent systems Node Node

  175. Node Partitions in consistent systems Node Node

  176. Consensus

  177. Paxos

  178. Leslie Lamport

  179. Leslie Lamport

  180. Paxos (but simpler)

  181. Raft (this time a lot simpler)

  182. Raft (this time a lot simpler)

  183. Problem: “We need to limit access to an external resource”

  184. defmodule Demo.Lock do use GenServer def init(:ok) do {:ok, {:unlocked, nil}} end def handle_call({:lock, client}, _from, {:unlocked, nil}) do {:reply, :ok, {:locked, client}} end def handle_call({:lock, client}, _from, {:locked, other_client}) do {:reply, :error, {:locked, other_client}} end def handle_call({:unlock, client}, _from, {:locked, client}) do {:reply, :ok, {:unlocked, nil}} end def handle_call({:unlock, client}, _from, {:locked, other_client}) do {:reply, :error, {:locked, other_client}} end end

  185. defmodule Demo.Lock do use Raft.StateMachine def init(:ok) do {:ok, {:unlocked, nil}} end def handle_call({:lock, client}, _from, {:unlocked, nil}) do {:reply, :ok, {:locked, client}} end def handle_call({:lock, client}, _from, {:locked, other_client}) do {:reply, :error, {:locked, other_client}} end def handle_call({:unlock, client}, _from, {:locked, client}) do {:reply, :ok, {:unlocked, nil}} end def handle_call({:unlock, client}, _from, {:locked, other_client}) do {:reply, :error, {:locked, other_client}} end end

  186. defmodule Demo.Lock do use Raft.StateMachine def init(:ok) do {:unlocked, nil} end def handle_call({:lock, client}, _from, {:unlocked, nil}) do {:reply, :ok, {:locked, client}} end def handle_call({:lock, client}, _from, {:locked, other_client}) do {:reply, :error, {:locked, other_client}} end def handle_call({:unlock, client}, _from, {:locked, client}) do {:reply, :ok, {:unlocked, nil}} end def handle_call({:unlock, client}, _from, {:locked, other_client}) do {:reply, :error, {:locked, other_client}} end end

  187. defmodule Demo.Lock do use Raft.StateMachine def init(:ok) do {:unlocked, nil} end def handle_write({:lock, client}, _from, {:unlocked, nil}) do {:reply, :ok, {:locked, client}} end def handle_call({:lock, client}, _from, {:locked, other_client}) do {:reply, :error, {:locked, other_client}} end def handle_call({:unlock, client}, _from, {:locked, client}) do {:reply, :ok, {:unlocked, nil}} end def handle_call({:unlock, client}, _from, {:locked, other_client}) do {:reply, :error, {:locked, other_client}} end end

  188. defmodule Demo.Lock do use Raft.StateMachine def init(:ok) do {:unlocked, nil} end def handle_write({:lock, client}, {:unlocked, nil}) do {:reply, :ok, {:locked, client}} end def handle_call({:lock, client}, _from, {:locked, other_client}) do {:reply, :error, {:locked, other_client}} end def handle_call({:unlock, client}, _from, {:locked, client}) do {:reply, :ok, {:unlocked, nil}} end def handle_call({:unlock, client}, _from, {:locked, other_client}) do {:reply, :error, {:locked, other_client}} end end

  189. defmodule Demo.Lock do use Raft.StateMachine def init(:ok) do {:unlocked, nil} end def handle_write({:lock, client}, {:unlocked, nil}) do {:ok, {:locked, client}} end def handle_call({:lock, client}, _from, {:locked, other_client}) do {:reply, :error, {:locked, other_client}} end def handle_call({:unlock, client}, _from, {:locked, client}) do {:reply, :ok, {:unlocked, nil}} end def handle_call({:unlock, client}, _from, {:locked, other_client}) do {:reply, :error, {:locked, other_client}} end end

  190. defmodule Demo.Lock do use Raft.StateMachine def init(:ok) do {:unlocked, nil} end def handle_write({:lock, client}, {:unlocked, nil}) do {:ok, {:locked, client}} end def handle_write({:lock, client}, {:locked, other_client}) do {:error, {:locked, other_client}} end def handle_call({:unlock, client}, _from, {:locked, client}) do {:reply, :ok, {:unlocked, nil}} end def handle_call({:unlock, client}, _from, {:locked, other_client}) do {:reply, :error, {:locked, other_client}} end end

  191. defmodule Demo.Lock do
         use Raft.StateMachine

         def init(:ok) do
           {:unlocked, nil}
         end

         def handle_write({:lock, client}, {:unlocked, nil}) do
           {:ok, {:locked, client}}
         end

         def handle_write({:lock, _client}, {:locked, other_client}) do
           {:error, {:locked, other_client}}
         end

         def handle_write({:unlock, client}, {:locked, client}) do
           {:ok, {:unlocked, nil}}
         end

         def handle_write({:unlock, _client}, {:locked, other_client}) do
           {:error, {:locked, other_client}}
         end
       end

  192. None
  193. Raft.start_peer(Demo.Lock, name: :s1)
       Raft.start_peer(Demo.Lock, name: :s2)
       Raft.start_peer(Demo.Lock, name: :s3)

  194. Raft.start_peer(Demo.Lock, name: :s1)
       Raft.start_peer(Demo.Lock, name: :s2)
       Raft.start_peer(Demo.Lock, name: :s3)

       Raft.set_configuration(:s1, [:s1, :s2, :s3])

  195. Raft.start_peer(Demo.Lock, name: :s1)
       Raft.start_peer(Demo.Lock, name: :s2)
       Raft.start_peer(Demo.Lock, name: :s3)

       Raft.set_configuration(:s1, [:s1, :s2, :s3])

       :ok = Raft.write(:s1, {:lock, :s1})

  196. Raft.start_peer(Demo.Lock, name: :s1)
       Raft.start_peer(Demo.Lock, name: :s2)
       Raft.start_peer(Demo.Lock, name: :s3)

       Raft.set_configuration(:s1, [:s1, :s2, :s3])

       :ok = Raft.write(:s1, {:lock, :s1})
       :error = Raft.write(:s2, {:lock, :s2})
       :error = Raft.write(:s2, {:unlock, :s2})

  197. Raft.start_peer(Demo.Lock, name: :s1)
       Raft.start_peer(Demo.Lock, name: :s2)
       Raft.start_peer(Demo.Lock, name: :s3)

       Raft.set_configuration(:s1, [:s1, :s2, :s3])

       :ok = Raft.write(:s1, {:lock, :s1})
       :error = Raft.write(:s2, {:lock, :s2})
       :error = Raft.write(:s2, {:unlock, :s2})
       :ok = Raft.write(:s1, {:unlock, :s1})

  198. Raft.start_peer(Demo.Lock, name: :s1)
       Raft.start_peer(Demo.Lock, name: :s2)
       Raft.start_peer(Demo.Lock, name: :s3)

       Raft.set_configuration(:s1, [:s1, :s2, :s3])

       :ok = Raft.write(:s1, {:lock, :s1})
       :error = Raft.write(:s2, {:lock, :s2})
       :error = Raft.write(:s2, {:unlock, :s2})
       :ok = Raft.write(:s1, {:unlock, :s1})
       :ok = Raft.write(:s2, {:lock, :s2})

  199. Demo

  200. How does this work?

  201. Node Node Node Consensus & leader election

  202. Consensus & leader election Leader Follower Follower Client

  203. Consensus & leader election Leader Follower Follower Client

  204. Consensus & leader election Leader Follower Follower Client

  205. Consensus & leader election Leader Follower Follower Client Replicated

  206. Consensus & leader election Leader Follower Follower Client

  207. Consensus & leader election Leader Follower Follower Client

  208. Consensus & leader election Leader Follower Follower Client

  209. Consensus & leader election Leader Follower Follower Client Committed

  210. Consensus & leader election Leader Follower Follower Client

  211. Consensus & leader election Leader Follower Follower Client

  212. Consensus & leader election Leader Follower Follower Client

  213. Consensus & leader election Leader Follower Follower Client Heartbeats

  214. Consensus & leader election Leader Follower Follower Client

  215. Consensus & leader election Leader Follower Follower Client

  216. Consensus & leader election Leader Follower Follower Client

  217. Consensus & leader election Leader Follower Follower Client

  218. Consensus & leader election Leader Follower Client Follower

  219. Consensus & leader election Leader Follower Client Follower Starts a new election

  220. Consensus & leader election Leader Follower Client Follower

  221. Consensus & leader election Leader Candidate Client Follower

  222. Consensus & leader election Leader Candidate Client Follower

  223. Consensus & leader election Leader Candidate Client Follower

  224. Consensus & leader election Leader Candidate Client Follower

  225. Consensus & leader election Leader Leader Follower Client

  226. Consensus & leader election Leader Leader Follower Client

  227. Consensus & leader election Leader Leader Follower Client

  228. Consensus & leader election Leader Leader Follower Client

  229. Consensus & leader election Leader Leader Follower Client

  230. Consensus & leader election Leader Leader Follower Client

  231. Consensus & leader election Follower Leader Follower Client

  232. Consensus & leader election Follower Leader Follower Client

  233. Consensus & leader election Follower Leader Follower Client

  234. Consensus & leader election Follower Leader Follower Client

  235. Consensus & leader election Follower Leader Follower Client

  236. Consensus & leader election Follower Leader Follower Client

  237. Consensus & leader election Follower Leader Follower Client

  238. Consensus & leader election Follower Leader Follower Client

  239. Consensus & leader election Follower Leader Follower Client

  240. Consensus & leader election Follower Leader Follower Client

  241. Consensus & leader election Follower Leader Follower Client

  242. Consensus & leader election Follower Leader Follower Client

  243. Consensus & leader election Follower Leader Follower Client
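
The elections in this walkthrough are driven by randomized timeouts: a follower that hears no heartbeat before its timer fires becomes a candidate and asks its peers for votes. A rough sketch of that trigger, with illustrative names and timings rather than the library's actual internals:

      defmodule Demo.ElectionTimer do
        use GenServer

        # Raft randomizes the timeout so nodes rarely stand for election
        # at the same moment.
        @timeout_range 150..300

        def init(:ok), do: {:ok, reset(%{timer: nil})}

        # A heartbeat from the leader restarts the countdown.
        def handle_info(:heartbeat, state), do: {:noreply, reset(state)}

        # No heartbeat arrived in time: become a candidate and request votes.
        def handle_info(:election_timeout, state) do
          # (increment the term, vote for ourselves, ask peers for votes)
          {:noreply, state}
        end

        defp reset(%{timer: timer} = state) do
          if timer, do: Process.cancel_timer(timer)
          ref = Process.send_after(self(), :election_timeout, Enum.random(@timeout_range))
          %{state | timer: ref}
        end
      end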

  244. Logs: linearized writes, replicated across all nodes, RocksDB.

  245. Testing: property tests, Jepsen.

  246. What can we do now?

  247. KV store, service discovery, distributed lock manager, database transactions, configuration management.

  248. Toniq

  249. Links: https://github.com/toniqsystems/raft, https://github.com/toniqsystems/raft_demo, https://Toniq.sh, https://speakerdeck.com/keathley

  250. To do: more testing, dynamic node configurations, LMDB storage adapter.

  251. Now we can build applications and manage state safely without needing to leave Elixir.

  252. Thanks Chris Keathley / @ChrisKeathley / c@keathley.io