pg2 and You: Getting Distributed with Elixir

Eric Entin
September 01, 2016

Erlang's pg2 module provides distributed process groups and is a key technology underlying Phoenix's default PubSub adapter, which powers the Channels API that we know and love. Together, we'll take a guided tour of pg2's capabilities and see how easy it is to add distributed features to your own applications. Along the way, we'll dissect pg2's internals to see how the surprisingly short Erlang implementation can be accomplished in Elixir, while uncovering other primitives that make distributed Erlang so powerful. It's time to learn just how easy it is to tap into everything distributed Elixir has to offer!



  1. pg2 and You: Getting Distributed with Elixir Eric Entin @antipax

  2. Distributed Applications are awesome!

  3. Distributed Elixir is really awesome!

  4. Distributed Erlang/OTP makes Distributed Elixir really awesome!

  5. But…

  6. Distributed Applications Are Hard (really hard!)

  7. And…

  8. Erlang/OTP Isn’t Magic, but it does provide some really nice tools that make distributed applications a little easier!
  9. One of these tools is called…

  10. pg2: distributed named process groups

  11. pg2 allows us to create, join, and query groups of processes across a cluster.
  12. What does pg2 let us do?
    • We can access a group of processes by a common name. For example, there can be a set of processes (which can be located on different nodes) that are all members of the group :foobar.
    • We can send a message to one, some, or all group members.
    • If a member process terminates, it is automatically removed from the group.
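The three capabilities above can be sketched in a few lines. This is a minimal illustration, not code from the talk: `GroupDemo`, the `:ping`/`:pong` messages, and the one-second timeout are hypothetical. It also assumes an Erlang/OTP release that still ships `:pg2` (the module was removed in OTP 24), so the sketch reports `{:error, :pg2_unavailable}` on newer releases instead of crashing.

```elixir
defmodule GroupDemo do
  # Hypothetical demo module: create a group, join one member, message all members.
  def run(group \\ :foobar) do
    # :pg2 was removed in OTP 24, so check before calling into it.
    if Code.ensure_loaded?(:pg2), do: demo(group), else: {:error, :pg2_unavailable}
  end

  defp demo(group) do
    parent = self()

    # Creating an existing group is fine; create/1 still returns :ok.
    :ok = :pg2.create(group)

    # A member that answers a single ping and then terminates.
    member =
      spawn(fn ->
        receive do
          {:ping, from} -> send(from, {:pong, self()})
        end
      end)

    :ok = :pg2.join(group, member)

    # Look the members up by group name and message all of them.
    members = :pg2.get_members(group)
    Enum.each(members, &send(&1, {:ping, parent}))

    replies =
      for pid <- members do
        receive do
          {:pong, ^pid} -> pid
        after
          1_000 -> :timeout
        end
      end

    {:ok, replies}
  end
end
```

Because the member exits after replying, pg2 removes it from :foobar on its own; no explicit leave call is needed.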
  13. Let’s take a look at a real world example of pg2 in use…
  14. …in a chat app… (what else?)

  15. …using Phoenix Channels! “bidirectional communication for soft-realtime functionality”

  16. Our Phoenix Chat App (diagram: Phone, Tablet, and Browser clients; one Phoenix Server)

  17. Well, that was easy.

  18. But what about when we outgrow one server? (most likely for redundancy, given Phoenix’s performance characteristics)
  19. Our Distributed Phoenix Chat App (diagram: Phone, Tablet, and Browser clients; two Phoenix Servers)
  20. But how is this implemented in Phoenix?

  21. Phoenix PubSub (simplified) (diagram: Client 1, Socket 1, Channel 1, PubSub Server, Channel 2, Socket 2, Client 2; annotation: “This is a process running on each node.”)
  22. Our Distributed Phoenix Chat App (diagram: Phone, Tablet, and Browser clients; two Phoenix Servers) How do we find the PubSub server on all of our nodes?
  23. pg2!

  24. By having our Phoenix PubSub servers join a pg2 group, we can fan our messages out across the cluster. (disclaimer: in practice, PubSub is implemented via adapters, so we can use pg2, redis, or any other solution we can think of!)
  25. Let’s take a look at a simplified code example.

  26. pg2 example

        defmodule Phoenix.PubSub.PG2Server do
          use GenServer

          def init(server_name) do
            :ok = :pg2.create({:phx, server_name})
            :ok = :pg2.join({:phx, server_name}, self())
            {:ok, server_name}
          end
          …
        end

  27. pg2 example, cont.

        defmodule Phoenix.PubSub.PG2Server do
          …
          def broadcast(server_name, topic, message) do
            Enum.each :pg2.get_members({:phx, server_name}), fn pid ->
              send pid, {:broadcast, topic, message}
            end
          end

          def handle_info({:broadcast, topic, message}, server_name) do
            Local.broadcast(topic, message)
            {:noreply, server_name}
          end
        end

  28. In practice, Phoenix PubSub is more complicated, thanks to extensive optimization. But this is essentially how it works.
  29. But how does pg2 work?

  30. What can we learn about its characteristics from its implementation?

  31. In order to answer these questions and learn exactly how pg2 works, I translated the code to Elixir.
  32. Introducing: RePG2. A highly documented translation of the original Erlang pg2 implementation to Elixir for educational purposes.
  33. Why?

  34. The true specification of behavior is the code itself. By reading the code of our favorite software, we can gain a deeper understanding.
  35. Not everyone knows Erlang. Despite the high quality of the implementation, pg2's Erlang code is not necessarily easy to read, even if you know Erlang.
  36. For fun! Sometimes you’re just bored.

  37. In order to accomplish these goals, I set out some guiding principles for the translation.
  38. RePG2 Translation Principles
    • RePG2 code should be idiomatic, easy-to-read, fully (over?) documented Elixir.
    • RePG2 should be identical to pg2 in terms of functionality and performance characteristics, even if it has been refactored to increase clarity.
    • Code which exists purely for backwards compatibility may be eliminated in the interest of clarity.
  39. Tests were also written using ExUnit for full RePG2 code coverage. This includes a distributed suite which interacts with multiple nodes.
  40. RePG2 vs. pg2
    • RePG2 does not have the same backwards compatibility as pg2, and has only been tested on Erlang/OTP 18.3 and Elixir 1.2.4.
    • pg2 is started under the kernel_safe_sup, a special OTP kernel supervisor for important services that are considered safe to restart. RePG2 is implemented as a normal OTP application.
    • pg2 will start itself if it is not yet started. RePG2 expects to be added to :applications in mix.exs and will not start itself.
  41. How much work was the translation?

  42. Actually, not much, because pg2 is tiny!

  43. In fact, pg2 is only 333 lines of code. With comments, 390!
  44. pg2 is simple thanks to some other useful tools OTP provides. RePG2 uses all of these tools as well.
  45. pg2’s OTP Toolbox
    • GenServer
    • ETS
    • :global
    • Node and Process Monitoring
  46. Let’s take a look at each of these individually. All of these apply equally to RePG2.
  47. Additionally, by looking at how pg2 uses these tools, we will gain a high-level understanding of pg2’s implementation.
  48. GenServer
    • Each node (which is using pg2) has a pg2 server process running.
    • This server process serves as the central point of interaction for pg2 between (and within) each node.
  49. ETS
    • ETS is an in-memory concurrent storage solution for Elixir terms.
    • In pg2, ETS is used to store process groups and their memberships.
    • Reads can happen from any process, but to avoid race conditions, writes are serialized through the node’s pg2 server.
    • ETS is extremely useful in general, and this pattern in particular is very common.
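The "read from anywhere, write through one server" pattern can be sketched as follows. This is an illustration under assumptions, not pg2's actual table layout: `GroupTable`, the table name, and the `{group, pid}` row shape are all hypothetical.

```elixir
defmodule GroupTable do
  use GenServer

  # Hypothetical table name for this sketch.
  @table :group_table_demo

  def start_link(_opts \\ []) do
    GenServer.start_link(__MODULE__, nil, name: __MODULE__)
  end

  # Writes go through the server process, so they are applied one at a time.
  def join(group, pid), do: GenServer.call(__MODULE__, {:join, group, pid})

  # Reads hit ETS directly from the calling process, with no server round-trip.
  def members(group) do
    for {_group, pid} <- :ets.lookup(@table, group), do: pid
  end

  @impl true
  def init(nil) do
    # :protected means only the owning server may write; any process may read.
    :ets.new(@table, [:bag, :named_table, :protected, read_concurrency: true])
    {:ok, nil}
  end

  @impl true
  def handle_call({:join, group, pid}, _from, state) do
    :ets.insert(@table, {group, pid})
    {:reply, :ok, state}
  end
end
```

Making the table `:protected` turns the convention into a guarantee: a process other than the server that tries to write to the table fails instead of racing with it.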
  50. :global
    • The :global module provides a function called trans/2. This function acquires a lock across the entire cluster, using any Elixir term as a key, and then runs a provided function. After the function completes, the lock is released.
    • By combining trans/2 with GenServer’s multi_call/3, which allows us to call all processes registered with a given name within the cluster, pg2 ensures that only one process across the entire cluster can modify any given group at a time.
    • This pattern can be very useful in our own code!
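Here is a rough sketch of the lock-then-multi-call pattern in our own code. `LockedCounter` and its `:increment` request are hypothetical names chosen for illustration. The lock id passed to `:global.trans/2` is a `{resource, requester}` tuple.

```elixir
defmodule LockedCounter do
  use GenServer

  def start_link(_opts \\ []) do
    GenServer.start_link(__MODULE__, 0, name: __MODULE__)
  end

  # Increments the counter on every connected node while holding a cluster-wide lock.
  def increment_everywhere do
    nodes = [node() | Node.list()]

    # The lock id is {resource, requester}; the function runs once the lock is held.
    :global.trans({{__MODULE__, :counter}, self()}, fn ->
      # Call the process registered under this name on each node.
      {replies, _bad_nodes} = GenServer.multi_call(nodes, __MODULE__, :increment)
      replies
    end)
  end

  @impl true
  def init(count), do: {:ok, count}

  @impl true
  def handle_call(:increment, _from, count) do
    {:reply, count + 1, count + 1}
  end
end
```

On a single node, `LockedCounter.increment_everywhere()` returns a one-element list of `{node, count}` pairs. On a cluster, it returns one pair per node that runs the server.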
  51. Node and Process Monitoring
    • The :net_kernel function monitor_nodes/1 allows the calling process to register for notifications about nodes connecting and disconnecting from the cluster.
    • When the pg2 server receives a notification that a new node has connected, it merges the groups and memberships between itself and the new member’s pg2 server.
    • Additionally, pg2 registers a monitor for each process which joins a group. If this monitor reports that the process is down (which could be because the process died, or because its node disconnected), the process’s membership is removed from the local data.
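A simplified sketch of the monitoring side, assuming a single flat membership list rather than pg2's named groups. `MemberTracker` is a hypothetical module. The node notifications are only acknowledged here, where pg2 would exchange and merge group data.

```elixir
defmodule MemberTracker do
  use GenServer

  def start_link(_opts \\ []) do
    GenServer.start_link(__MODULE__, nil, name: __MODULE__)
  end

  def join(pid), do: GenServer.call(__MODULE__, {:join, pid})
  def members, do: GenServer.call(__MODULE__, :members)

  @impl true
  def init(nil) do
    # Ask for {:nodeup, node} and {:nodedown, node} messages. The result is
    # deliberately not matched, since this sketch may run on a non-distributed node.
    _ = :net_kernel.monitor_nodes(true)
    {:ok, %{}}
  end

  @impl true
  def handle_call({:join, pid}, _from, refs) do
    # One monitor per joined process; the state maps monitor ref to pid.
    ref = Process.monitor(pid)
    {:reply, :ok, Map.put(refs, ref, pid)}
  end

  def handle_call(:members, _from, refs) do
    {:reply, Map.values(refs), refs}
  end

  @impl true
  def handle_info({:DOWN, ref, :process, _pid, _reason}, refs) do
    # The member died or its node disconnected: drop the membership.
    {:noreply, Map.delete(refs, ref)}
  end

  # pg2 would merge groups with the new node's server here; this sketch does nothing.
  def handle_info({:nodeup, _node}, refs), do: {:noreply, refs}
  def handle_info({:nodedown, _node}, refs), do: {:noreply, refs}
end
```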
  52. That’s the pg2 implementation!

  53. What are some key insights we can take away from these pg2 implementation details?
  54. pg2 uses global locks
    • While reading group memberships is very fast, modifying them is a globally locked operation requiring multiple network round-trips.
    • We may run into problems with lock overhead if our groups contain a large number of memberships.
  55. pg2 is a distributed database
    • In terms of the CAP theorem, pg2 is AP: available, and partition-tolerant.
    • Cluster partitions will only see groups and memberships from nodes that are reachable.
    • However, pg2 is eventually consistent in that it automatically heals from any partitions.
    • Process groups are uniquely easy to distribute thanks to monitors, and the fact that conflicts can be easily resolved via merging.
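To see why merging resolves conflicts so easily, consider a toy model in which each node's view is a map from group name to a set of members. This is an assumption made for illustration (`MembershipMerge` is hypothetical, and pg2's real bookkeeping differs), but it shows the key property: a union-based merge gives the same answer regardless of which side starts it.

```elixir
defmodule MembershipMerge do
  # Merges two views of the form %{group_name => MapSet of members}.
  # Groups known to only one side are kept; shared groups take the union.
  def merge(local, remote) do
    Map.merge(local, remote, fn _group, a, b -> MapSet.union(a, b) end)
  end
end
```

With atoms standing in for pids, merging `%{foobar: MapSet.new([:a, :b])}` with `%{foobar: MapSet.new([:b, :c])}` yields a :foobar group containing :a, :b, and :c, whichever node's view is passed first.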
  56. Overall, pg2 is an amazingly powerful tool. Just be aware of the caveats!
  57. Check out RePG2 for more examples, including how to build a distributed test suite!
  58. Thanks! Questions? https://github.com/antipax/repg2 https://www.twitter.com/antipax