Knee-Deep Into P2P: A Tale of Fail (ElixirConf EU 2018 version)

Slides for my talk at ElixirConf EU 2018 (www.elixirconf.eu/elixirconf2018)

What happens when you don't like centralised things? You do P2P! I created a distributed smart office using Elixir that runs in a single P2P network. There are a lot of subtleties to this. How can we prevent someone from entering the network? How do we manage shared data? What topology do we use? What about nodes that are unreliable, where we are not sure when they will connect or disconnect? What if YOU want to add a node and we can't trust you to deliver messages? I will use the office as an example and show how different the reasoning is between a web-like centralised context and a P2P distributed system. We will go from simple P2P topologies (gossip, trees) to more complex ones (Gnutella2, HyParView, Plumtrees), analyse their problems and take a look at what CRDTs are and how awesome they can be for shared data.

Fernando Mendes

April 27, 2018

Transcript

  1. Knee-Deep Into P2P A Tale of Fail @fribmendes

  2. None
  3. None
  4. None
  5. None
  6. None
  7. None
  8. None
  9. None
  10. None
  11. I don’t know how to smart office

  12. I know how to web development

  13. I know how to web development … what now?

  14. @fribmendes me failing at photoshop

  15. None
  16. None
  17. I know how to web development … what now?

  18. None
  19. None
  20. None
  21. None
  22. Step 1: receive new connections

  23. Step 1: receive new connections
      Step 2: accept and send messages

  24. Step 1: receive new connections
      Step 2: accept and send messages
      Step 3: do a bunch of Steps 1 and 2
  25. Step 1: receive new connections

  26. None
  27. defp accept_loop(pid, server_socket) do
        {:ok, client} = :gen_tcp.accept(server_socket)
        :inet.setopts(client, [active: true])
        :gen_tcp.controlling_process(client, pid)
        Gossip.accept(pid, client)
        accept_loop(pid, server_socket)
      end

  28. defp accept_loop(pid, server_socket) do
        {:ok, client} = :gen_tcp.accept(server_socket)
        :inet.setopts(client, [active: true])
        :gen_tcp.controlling_process(client, pid)
        Gossip.accept(pid, client)
        accept_loop(pid, server_socket)
      end
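A minimal, runnable sketch of this accept loop follows. The talk hands each client socket to a Gossip worker; here a plain echo handler stands in for it (the `AcceptLoopSketch` module and handler names are my own, not the talk's code), so the example is self-contained:

```elixir
# Sketch of the accept loop, assuming an echo handler in place of Gossip.
defmodule AcceptLoopSketch do
  def listen(port) do
    {:ok, server_socket} =
      :gen_tcp.listen(port, [:binary, active: false, reuseaddr: true])

    spawn_link(fn -> accept_loop(server_socket) end)
    {:ok, server_socket}
  end

  defp accept_loop(server_socket) do
    {:ok, client} = :gen_tcp.accept(server_socket)

    # hand the socket over to a dedicated handler process,
    # then go back to accepting new connections
    handler = spawn(fn -> wait_and_echo(client) end)
    :gen_tcp.controlling_process(client, handler)
    send(handler, :go)

    accept_loop(server_socket)
  end

  defp wait_and_echo(client) do
    # wait until we own the socket before reading from it
    receive do
      :go -> echo_loop(client)
    end
  end

  defp echo_loop(client) do
    case :gen_tcp.recv(client, 0) do
      {:ok, data} ->
        :gen_tcp.send(client, data)
        echo_loop(client)

      {:error, :closed} ->
        :ok
    end
  end
end
```

Passing port `0` lets the OS pick a free port, which `:inet.port/1` can then report back.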
  29. Step 1: receive new connections
      Step 2: accept and send messages
  30. None
  31. def recv_loop(pid, socket) do
        receive do
          {:tcp, _port, msg} ->
            # process an incoming message
          {:tcp_closed, port} ->
            # close the sockets
          {:send, msg} ->
            # send an outgoing message
        end
      end
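One way to fill in this skeleton (an assumption on my part; the talk's actual clause bodies live in its Gossip worker) is to echo incoming data, forward `:send` requests, and report disconnects to a parent process:

```elixir
# A filled-in recv_loop sketch. `pid` is the parent process, which is
# told when the connection goes away; the echo behaviour is illustrative.
defmodule RecvLoopSketch do
  def recv_loop(pid, socket) do
    receive do
      {:tcp, _port, msg} ->
        # process an incoming message: here, echo it back
        :gen_tcp.send(socket, msg)
        recv_loop(pid, socket)

      {:tcp_closed, _port} ->
        # close the socket and tell the parent we are done
        :gen_tcp.close(socket)
        send(pid, {:disconnect, self()})

      {:send, msg} ->
        # send an outgoing message on our socket
        :gen_tcp.send(socket, msg)
        recv_loop(pid, socket)
    end
  end
end
```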
  32. Step 1: receive new connections
      Step 2: accept and send messages
      Step 3: do a bunch of Steps 1 and 2
  33. Raspberry Pi #1 Raspberry Pi #2

  34. None
  35. None
  36. None
  37. Testing

  38. Gossip Node A

  39. Gossip Server start_link() Node A

  40. Gossip Server Node A listen_loop() The Internet

  41. Gossip Server Node A Node B Worker

  42. Gossip Server {:accept, socket} Node A Node B Worker

  43. Gossip Server start_link(socket) Worker Node A Node B Worker

  44. Gossip Server recv_loop(socket) Worker Node A Node B Worker

  45. Gossip Server recv_loop(socket) Worker Node A Node B test this Worker
  46. def recv_loop(pid, socket) do
        receive do
          {:tcp, _port, msg} ->
            # echo the message
          {:tcp_closed, port} ->
            # close the sockets
          {:send, msg} ->
            # send the message
        end
      end
  47. describe "recv_loop/2" do
        test "echoes :tcp messages" do
        end

        test "disconnects on :tcp_closed messages" do
        end

        test "sends a message on :send messages" do
        end
      end
  48. def recv_loop(pid, socket) do
        receive do
          {:tcp, _port, msg} -> # ...
          {:tcp_closed, port} -> # ...
          {:send, msg} -> # ...
        end
      end
  49. gossip
      def recv_loop(pid, socket) do
        receive do
          {:tcp, _port, msg} -> # ...
          {:tcp_closed, port} -> # ...
          {:send, msg} -> # ...
        end
      end
  50. self()
      def recv_loop(pid, socket) do
        receive do
          {:tcp, _port, msg} -> # ...
          {:tcp_closed, port} -> # ...
          {:send, msg} -> # ...
        end
      end
  51. the test process
      def recv_loop(pid, socket) do
        receive do
          {:tcp, _port, msg} -> # ...
          {:tcp_closed, port} -> # ...
          {:send, msg} -> # ...
        end
      end
  52. Gossip Server Worker recv_loop(socket) Worker Node A Node B

  53. Gossip Server self() Worker Node A Node B

  54. self() Server self() Worker Node A Node B

  55. self() Server self() Node A Node B

  56. self() Server self() Node A Node B in_socket

  57. self() Server self() Node A Node B {:accept, out_socket}

  58. self() Server self() Worker Node A Node B start_link(socket)

  59. self() Server self() Worker Node A Node B out_socket

  60. self() Server self() Worker Node A Node B out_socket in_socket

  61. self() Server self() Worker Node A Node B assert on out_socket write to in_socket
  62. def recv_loop(pid, socket) do
        receive do
          {:tcp, _port, msg} -> # ...
          {:tcp_closed, port} -> # ...
          {:send, msg} -> # ...
        end
      end
  63. defp start_and_connect_to(port) do
      end

  64. defp start_and_connect_to(port) do
        Gossip.Server.start_link([self(), port])
      end

  65. defp start_and_connect_to(port) do
        Gossip.Server.start_link([self(), port])
      end

  66. defp start_and_connect_to(port) do
        Gossip.Server.start_link([self(), port])
        {:ok, in_socket} = :gen_tcp.connect('localhost', port, @socket_opts)
      end
  67. defp start_and_connect_to(port) do
        Gossip.Server.start_link([self(), port])
        {:ok, in_socket} = :gen_tcp.connect('localhost', port, @socket_opts)
        {:ok, out_socket} = receive_accept_msg()
      end
  68. defp receive_accept_msg do
        receive do
          {_, {:accept, out_socket}} -> {:ok, out_socket}
        after
          3_000 -> {:error, :timeout}
        end
      end
  69. defp start_and_connect_to(port) do
        Gossip.Server.start_link([self(), port])
        {:ok, in_socket} = :gen_tcp.connect('localhost', port, @socket_opts)
        {:ok, out_socket} = receive_accept_msg()
      end
  70. defp start_and_connect_to(port) do
        Gossip.Server.start_link([self(), port])
        {:ok, in_socket} = :gen_tcp.connect('localhost', port, @socket_opts)
        {:ok, out_socket} = receive_accept_msg()
        {in_socket, out_socket}
      end
  71. Mocking gives message control to your test process

  72. self() Server self() Worker Node A Node B assert on out_socket write to in_socket
  73. describe "recv_loop/2" do
        test "echoes :tcp messages" do
        end
      end

  74. describe "recv_loop/2" do
        test "echoes :tcp messages" do
          {in_socket, out_socket} = start_and_connect_to(3000)
        end
      end
  75. describe "recv_loop/2" do
        test "echoes :tcp messages" do
          {in_socket, out_socket} = start_and_connect_to(3000)
          {:ok, worker} = start_worker(self(), out_socket)
        end
      end
  76. describe "recv_loop/2" do
        test "echoes :tcp messages" do
          {in_socket, out_socket} = start_and_connect_to(3000)
          {:ok, worker} = start_worker(self(), out_socket)
          send worker, {:tcp, in_socket, "hello"}
        end
      end
  77. describe "recv_loop/2" do
        test "echoes :tcp messages" do
          {in_socket, out_socket} = start_and_connect_to(3000)
          {:ok, worker} = start_worker(self(), out_socket)
          send worker, {:tcp, in_socket, "hello"}
          assert {:ok, "hello"} = :gen_tcp.recv(in_socket, 0)
        end
      end
  78. describe "recv_loop/2" do
        test "disconnects on :tcp_closed messages" do
        end
      end
  79. describe "recv_loop/2" do
        test "disconnects on :tcp_closed messages" do
          {in_socket, out_socket} = start_and_connect_to(3000)
          {:ok, worker} = start_worker(self(), out_socket)
        end
      end
  80. describe "recv_loop/2" do
        test "disconnects on :tcp_closed messages" do
          {in_socket, out_socket} = start_and_connect_to(3000)
          {:ok, worker} = start_worker(self(), out_socket)
          send worker, {:tcp_closed, out_socket}
        end
      end
  81. describe "recv_loop/2" do
        test "disconnects on :tcp_closed messages" do
          {in_socket, out_socket} = start_and_connect_to(3000)
          {:ok, worker} = start_worker(self(), out_socket)
          send worker, {:tcp_closed, out_socket}
          # assert the sockets are closed
          assert {:error, :closed} = :gen_tcp.recv(in_socket, 0)
          assert {:error, :closed} = :gen_tcp.recv(out_socket, 0)
          assert_receive {_, {:disconnect, ^worker}}
        end
      end
  82. Avoid named processes

  83. Inject self() into any functions that send messages

  84. Test the invoked functions directly

  85. Test the handle_* functions

  86. Play around with messages
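The "inject self()" advice from these slides can be made concrete with a tiny example (the `Notifier` module and `notify/2` function are hypothetical names, not the talk's code): instead of sending to a named process, the function takes the destination pid as an argument, so a test can pass `self()` and assert on its own mailbox.

```elixir
# Instead of messaging a hard-coded named process, the destination pid
# is injected; in a test, passing self() routes the message into the
# test process's mailbox, where assert_receive (or a plain receive)
# can see it.
defmodule Notifier do
  def notify(pid, reading) do
    send(pid, {:reading, reading})
  end
end

Notifier.notify(self(), 42)

receive do
  {:reading, value} -> value
after
  1_000 -> :timeout
end
```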

  87. “Does it scale?”

  88. None
  89. None
  90. g

  91. None
  92. Gnutella

  93. Gnutella

  94. Gnutella

  95. Gnutella

  96. Gnutella

  97. g

  98. g (gnutella2)

  99. Gnutella

  100. G2/Gnutella2

  101. G2/Gnutella2

  102. G2/Gnutella2

  103. G2/Gnutella2

  104. None
  105. None
  106. None
  107. None
  108. None
  109. HyParView

  110. None
  111. None
  112. None
  113. None
  114. None
  115. None
  116. “Aha! It works on my computer!”

  117. “Aha! It works on my computer!”

  118. “Great but we need something to show”

  119. “Great but we need something to show” (aka Raspberry Pi time)
  120. “Guys… Is this a bomb? Are we going to die?”

    — @naps62
  121. “Hey, I can borrow™ someone else’s code”

  122. None
  123. None
  124. None
  125. you shall not pass!

  126. Stick everything on Raspberry Pis

  127. Things running on one Raspberry Pi

  128. Things running on one Raspberry Pi ✓BEAM

  129. Things running on one Raspberry Pi ✓BEAM ✓thebox (sensors)

  130. Things running on one Raspberry Pi ✓BEAM ✓thebox (sensors) ✓Phoenix app

  131. Things running on one Raspberry Pi ✓BEAM (x2) ✓thebox (sensors) ✓Phoenix app

  132. Things running on one Raspberry Pi ✓BEAM (x2) ✓thebox (sensors) ✓Phoenix app ✓Postgres

  133. Things running on one Raspberry Pi ✓BEAM (x2) ✓thebox (sensors) ✓Phoenix app ✓Postgres ✓Cassandra

  134. Things running on one Raspberry Pi ✓BEAM (x2) ✓thebox (sensors) ✓Phoenix app ✓Postgres ✓Cassandra it works!
  135. None
  136. None
  137. None
  138. “Looking good! Everything’s working!”

  139. lol, nope

  140. State of each node:

  141. State of each node: • Last sensor readings

  142. State of each node: • Last sensor readings • Network map (MAC-IP)

  143. State of each node: • Last sensor readings • Network map (MAC-IP) • Target values

  144. State of each node: • Last sensor readings • Network map (MAC-IP) • Target values
  145. None
  146. How do we handle concurrency?

  147. Vector Clocks

  148. None
  149. None
  150. None
  151. None
  152. None
  153. None
  154. None
  155. Vector = (1, 0) Vector = (0, 1)
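The (1, 0) / (0, 1) situation from this slide can be sketched in a few lines. A vector clock is modelled here as a map from node id to counter; this minimal merge/compare implementation and the `:node_a`/`:node_b` names are my own, not the talk's code:

```elixir
defmodule VectorClock do
  # bump this node's entry on a local event
  def increment(clock, node), do: Map.update(clock, node, 1, &(&1 + 1))

  # take the entry-wise maximum when receiving a remote clock
  def merge(a, b), do: Map.merge(a, b, fn _node, x, y -> max(x, y) end)

  # a dominates b if every entry in a is >= the matching entry in b
  def dominates?(a, b) do
    Enum.all?(b, fn {node, count} -> Map.get(a, node, 0) >= count end)
  end

  # concurrent updates: neither clock dominates the other
  def concurrent?(a, b), do: not dominates?(a, b) and not dominates?(b, a)
end

# Two nodes each record one local event, without seeing each other:
a = VectorClock.increment(%{}, :node_a)  # Vector = (1, 0)
b = VectorClock.increment(%{}, :node_b)  # Vector = (0, 1)
VectorClock.concurrent?(a, b)            # => true
```

Detecting concurrency is exactly what the clocks buy you; deciding what to do with two concurrent writes is the conflict-resolution problem the CRDT slides pick up next.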

  156. CAP Theorem

  157. CAP Theorem “you’re a programmer. you can’t have nice things.”

  158. consistency availability partitioning

  159. consistency availability partitioning

  160. Eventual Consistency

  161. CRDTs

  162. Operation-Based CRDT

  163. Operation-Based CRDT commutative but not idempotent update exactly once
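A counter makes the "commutative but not idempotent" point concrete. This `OpCounter` sketch is my own illustration, not the talk's code: applying increment operations in any order converges to the same value, but delivering the same operation twice does not, hence "update exactly once".

```elixir
defmodule OpCounter do
  # op-based CRDT counter: the state is an integer, operations are deltas
  def apply_op(state, {:increment, n}), do: state + n
end

ops = [{:increment, 1}, {:increment, 2}]

# commutative: any delivery order converges to the same state
a = Enum.reduce(ops, 0, &OpCounter.apply_op(&2, &1))                # 3
b = Enum.reduce(Enum.reverse(ops), 0, &OpCounter.apply_op(&2, &1))  # 3

# NOT idempotent: a duplicated delivery changes the result,
# so the transport must deliver each operation exactly once
c = Enum.reduce(ops ++ [{:increment, 2}], 0, &OpCounter.apply_op(&2, &1))  # 5
</code>
```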

  164. no CRDTs

  165. no CRDTs

  166. no CRDTs

  167. no CRDTs

  168. Op-based CRDTs

  169. Op-based CRDTs

  170. Op-based CRDTs

  171. Op-based CRDTs

  172. State-Based CRDT

  173. State-Based CRDT commutative and idempotent heavier on the network
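The state-based trade-off can be sketched with a grow-only counter (a G-Counter; this implementation and the `:node_a`/`:node_b` names are my own, not the talk's code): each node keeps its own entry and whole states are shipped around, which is heavier on the network, but merge is both commutative and idempotent, so duplicated or reordered gossip is harmless.

```elixir
defmodule GCounter do
  # state-based CRDT counter: each node increments only its own entry
  def increment(state, node), do: Map.update(state, node, 1, &(&1 + 1))

  # merge is an entry-wise maximum: commutative AND idempotent
  def merge(a, b), do: Map.merge(a, b, fn _node, x, y -> max(x, y) end)

  # the counter's value is the sum over all nodes' entries
  def value(state), do: state |> Map.values() |> Enum.sum()
end

a = GCounter.increment(%{}, :node_a)   # %{node_a: 1}
b = GCounter.increment(%{}, :node_b)   # %{node_b: 1}

merged = GCounter.merge(a, b)
GCounter.value(merged)                 # => 2

# idempotent: merging a state we have already seen changes nothing,
# so gossiping full states repeatedly is safe
GCounter.merge(merged, a) == merged    # => true
```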

  174. State-based CRDTs

  175. State-based CRDTs

  176. State-based CRDTs

  177. State-based CRDTs

  178. None
  179. None
  180. None
  181. None
  182. None
  183. None
  184. None
  185. None
  186. Wrapping up

  187. System resources matter

  188. System resources matter your algorithms should account for them

  189. There are models. Use them.

  190. Distributed System Checklist

  191. Distributed System Checklist •Is the number of processes known or finite?

  192. Distributed System Checklist •Is the number of processes known or finite? •Is there a global notion of time?

  193. Distributed System Checklist •Is the number of processes known or finite? •Is there a global notion of time? •Is the network reliable?

  194. Distributed System Checklist •Is the number of processes known or finite? •Is there a global notion of time? •Is the network reliable? •Is there full connectivity?

  195. Distributed System Checklist •Is the number of processes known or finite? •Is there a global notion of time? •Is the network reliable? •Is there full connectivity? •What happens when a process crashes?
  196. It really doesn’t change that much

  197. CRDTs aren’t a golden hammer

  198. Reinventing the wheel is stupid

  199. None
  200. Knee-Deep Into P2P A Tale of Fail @fribmendes