Multi-AZ Clustering for Elixir + Phoenix

A quick review of clustered deployment options on AWS + a conceptual framework for networking architecture

Evadne Wu

November 29, 2017

Transcript

  1. Scope
     In Scope
     ➤ Establishing and maintaining an Erlang Distribution cluster in AWS
     Out of Scope
     ➤ Using distribution to implement certain requirements
  2. Precepts
     ➤ Networking: how to ensure peers can talk with each other
     ➤ Peer Discovery: how to identify and connect to peers
     ➤ Meta-Clustering: how to avoid building a huge cluster when you can have small ones
  3. Networking
     A functional networking solution only ensures that potential peers in a cluster can communicate with each other; it is not responsible for the formation or maintenance of a cluster.
  4. Networking
     The solution is highly dependent on the deployment topology:
     ➤ a) direct deployment on hosts or virtual machines
     ➤ b) deployment with containers
     ➤ c) other forms of topology
  5. Networking (AWS)
     Direct deployment on EC2 (long-running releases)
     ➤ If hosts face the internet directly, consider plugging a separate ENI into the host, and run cluster control traffic through different subnets
     ➤ Augment Security Groups + Route Tables with on-host iptables rules (a port-pinning sketch follows this slide)
     ➤ If hosts are fronted by a load balancer, then this is less of a concern
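Such rules are easier to write when the Erlang distribution listener sits on a known port range rather than an ephemeral one. A minimal sketch, assuming a release with a vm.args file; the port numbers are arbitrary:

    ## vm.args: pin distribution to a fixed range, so Security Group or
    ## iptables rules only need to allow EPMD (TCP 4369) plus 9100-9105
    -kernel inet_dist_listen_min 9100
    -kernel inet_dist_listen_max 9105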
  6. Networking (AWS)
     Multi-site deployment
     ➤ Consider using AWS VPC Peering if deploying solely on EC2
     ➤ But using Erlang Distribution over WAN can be questionable
     ➤ See WhatsApp's wandist
     ➤ Alternatively, run a host-to-host VPN between the hosts
     ➤ This establishes a virtual switch for your applications
  7. Networking (AWS)
     ECS-based deployment
     ➤ Must review the Docker networking modes supported by ECS:
     ➤ None
     ➤ Bridged: container given a virtual IP
     ➤ Host: container uses ports on the host interface
     ➤ VPC (New!): each container gets an ENI
  8. Networking (AWS)
     ECS-based deployment
     ➤ Depending on the desired density (number of containers per host), you may be able to use the VPC networking mode, which gives each container its own ENI and therefore its own privately routable IP (a task definition fragment follows this slide)
     ➤ As fast as host networking, with more isolation
     ➤ There is a limit on how many ENIs can be attached to each container host, so this is not for you (currently) if you run many containers per host
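For reference, the mode is selected per task definition; a minimal fragment, with every value other than "networkMode": "awsvpc" being illustrative:

    {
      "family": "myapp",
      "networkMode": "awsvpc",
      "containerDefinitions": [
        {"name": "myapp", "image": "myorg/myapp:latest", "memory": 512}
      ]
    }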
  9. Networking (AWS)
     News: AWS Fargate is out today
     ➤ Think of this as ECS without the need to manage instances
     ➤ vCPU: $0.0506 per vCPU-hour; memory: $0.0127 per GB-hour
     ➤ This is, however, much more expensive than instances: a 2 vCPU + 8 GB Fargate task works out to 2 × $0.0506 + 8 × $0.0127 ≈ $0.20/h
     ➤ m4.large @ 2 vCPU + 8 GB ≈ $0.025/h (spot) or $0.10/h (on-demand)
     ➤ No EBS support yet
     https://news.ycombinator.com/item?id=15808416
 10. Networking (AWS)
     Kubernetes-based deployment
     ➤ Container-to-container communication within a pod is allowed by default without much hassle, but communication across hosts (which crosses pods) is an entirely different matter
     ➤ n.b. AWS EKS is now available in Preview
       https://aws.amazon.com/eks
     ➤ Kubernetes: Cluster Networking
       https://kubernetes.io/docs/concepts/cluster-administration/networking/
 11. Networking (AWS)
     Run a VPN between each container instance (ECS) / pod (Kubernetes)
     ➤ Weave Net/Scope does it for you, and there is a blog post about it
       https://aws.amazon.com/blogs/apn/architecting-microservices-using-weave-net-and-amazon-ec2-container-service/
     ➤ Each container is started with two network interfaces, instead of one
     ➤ TCP for control traffic and UDP for data traffic
     ➤ Queries the EC2 ASG for peer information at boot time
 12. root@cl-devel-01-default:/app# ifconfig
     eth0  Link encap:Ethernet  HWaddr 02:42:ac:11:00:04
           inet addr:172.17.0.4  Bcast:0.0.0.0  Mask:255.255.0.0
           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
           RX packets:62389 errors:0 dropped:0 overruns:0 frame:0
           TX packets:46229 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:0
           RX bytes:15343682 (14.6 MiB)  TX bytes:41305027 (39.3 MiB)

     ethwe Link encap:Ethernet  HWaddr 76:12:7a:ab:9e:49
           inet addr:10.32.0.4  Bcast:0.0.0.0  Mask:255.240.0.0
           UP BROADCAST RUNNING MULTICAST  MTU:1376  Metric:1
           RX packets:39800 errors:0 dropped:0 overruns:0 frame:0
           TX packets:39573 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:0
           RX bytes:2092168 (1.9 MiB)  TX bytes:2194128 (2.0 MiB)

     lo    Link encap:Local Loopback
           inet addr:127.0.0.1  Mask:255.0.0.0
           UP LOOPBACK RUNNING  MTU:65536  Metric:1
           RX packets:5787 errors:0 dropped:0 overruns:0 frame:0
           TX packets:5787 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:1
           RX bytes:509045 (497.1 KiB)  TX bytes:509045 (497.1 KiB)
 13. # hostname -I
     172.17.0.4 10.32.0.4

     # ip -f inet -o addr show dev ethwe \
         | grep -Po 'inet \K[\d.]+'
     10.32.0.4
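The extracted address can then be used to name the node. A minimal Elixir sketch, assuming the Weave interface is called ethwe; the module and node names are illustrative:

    defmodule MyApp.NodeName do
      # Derive a node name such as :"myapp@10.32.0.4" from the first
      # IPv4 address bound to the given interface.
      def from_interface(ifname \\ "ethwe") do
        {:ok, ifaddrs} = :inet.getifaddrs()
        {_name, opts} = List.keyfind(ifaddrs, String.to_charlist(ifname), 0)

        {a, b, c, d} =
          opts
          |> Keyword.get_values(:addr)
          |> Enum.find(&match?({_, _, _, _}, &1))

        :"myapp@#{a}.#{b}.#{c}.#{d}"
      end
    end

    # At boot, before clustering:
    # MyApp.NodeName.from_interface() |> Node.start(:longnames)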
 14. Peer Discovery
     Multiple viable strategies:
     ➤ Query AWS / Kubernetes for outstanding nodes / containers / pods
     ➤ However, the AWS IAM permission for DescribeInstances is not granular
     ➤ Multicast gossip (now that you have a private network, UDP works again)
     ➤ SELECT * FROM nodes; (a sketch follows this slide)
     ➤ If you have a database, you can keep peer information in there, so it is easy to expose that information elsewhere…
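A minimal sketch of the SELECT * FROM nodes; idea, assuming an Ecto SQL repo and a nodes table whose name column holds node names such as "myapp@10.32.0.4"; all module and table names are hypothetical:

    defmodule MyApp.PeerDiscovery do
      use GenServer

      @poll_interval :timer.seconds(5)

      def start_link(opts) do
        GenServer.start_link(__MODULE__, opts, name: __MODULE__)
      end

      @impl true
      def init(_opts) do
        send(self(), :poll)
        {:ok, nil}
      end

      @impl true
      def handle_info(:poll, state) do
        # Each node inserts its own name into `nodes` at boot; read the
        # list back and attempt a connection to every other node.
        %{rows: rows} = MyApp.Repo.query!("SELECT name FROM nodes", [])

        for [name] <- rows, peer = String.to_atom(name), peer != Node.self() do
          Node.connect(peer)
        end

        Process.send_after(self(), :poll, @poll_interval)
        {:noreply, state}
      end
    end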
 15. Peer Discovery
     What to do with the list of peers?
     ➤ Feed it into libcluster, for sure (example below)
     ➤ Write your own peer discovery strategy, if you desire
     ➤ bitwalker/libcluster
       https://github.com/bitwalker/libcluster
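For reference, a typical libcluster setup with the bundled gossip strategy; the topology and module names are illustrative, and the Cluster.Supervisor child reflects the libcluster 3.x convention (earlier versions started themselves from config alone):

    # config/config.exs
    config :libcluster,
      topologies: [
        myapp: [
          strategy: Cluster.Strategy.Gossip,
          config: [port: 45892]
        ]
      ]

    # lib/my_app/application.ex, inside start/2
    children = [
      {Cluster.Supervisor,
       [Application.get_env(:libcluster, :topologies), [name: MyApp.ClusterSupervisor]]}
    ]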
 16. Meta-Clustering
     Distributed Erlang
     http://www1.erlang.org/doc/reference_manual/distributed.html
     "Connections are by default transitive. If a node A connects to node B, and node B has a connection to node C, then node A will also try to connect to node C. This feature can be turned off by using the command line flag -connect_all false, see erl(1)."
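To see that flag in context: starting a node whose connections will not propagate transitively (the node name and cookie here are illustrative):

    iex --name a@10.0.0.1 --cookie secret --erl "-connect_all false"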
 17. Meta-Clustering
     a.k.a. the art of sharding. See "That's a Billion with a B"
     https://github.com/reedr/reedr/blob/master/slides/efsf2014-whatsapp-scaling.pdf
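One way to read "the art of sharding": route each key to one of several small clusters, instead of joining every node into a single mesh. A hypothetical Elixir sketch using :erlang.phash2/2; the cluster names are illustrative:

    defmodule MyApp.MetaCluster do
      # Four small clusters instead of one large one; every node maps a
      # given key to the same cluster, so routing needs no coordination.
      @clusters [:cluster_a, :cluster_b, :cluster_c, :cluster_d]

      def cluster_for(key) do
        Enum.at(@clusters, :erlang.phash2(key, length(@clusters)))
      end
    end

    # MyApp.MetaCluster.cluster_for({:user, 42})
    # #=> e.g. :cluster_b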
 18. [Diagrams excerpted from the WhatsApp deck above: "Routing" (a service routing clients to {last, 1..4} shards across Cluster 1 and Cluster 2, alongside other cluster-local services, wandist, and pg2); "Meta-clustering" (DC1/DC2 main clusters, DC1/DC2 mms clusters, and global clusters); "Topology" (DC1 main cluster, DC1/DC2 mms clusters, Acct cluster); and "System Overview" (chat and mms nodes; Account, Profile, Push, and Group services; phones; offline storage).]