Slide 1

Slide 1 text

Multi-AZ Clustering for Elixir + Phoenix
Evadne Wu
github.com/evadne
[email protected] // @evadne
last updated 29 November 2017

Slide 2

Slide 2 text

Scope
In Scope
➤ Establishing and maintaining an Erlang Distribution cluster in AWS
Out of Scope
➤ Using distribution to implement certain requirements

Slide 3

Slide 3 text

Precepts
Networking
 i.e. how to ensure peers can talk to each other
Peer Discovery
 i.e. how to identify and connect to peers
Meta-Clustering
 i.e. how to avoid building one huge cluster when several small ones will do

Slide 4

Slide 4 text

Networking
A functional networking solution only ensures that potential peers in a cluster can communicate with each other; it is not responsible for forming or maintaining the cluster.

Slide 5

Slide 5 text

Networking
The solution depends heavily on deployment topology:
➤ a) direct deployment on hosts or virtual machines;
➤ b) deployment with containers;
➤ c) other topologies.

Slide 6

Slide 6 text

Networking (AWS)
Direct deployment on EC2 (long-running releases)
➤ If hosts face the internet directly, consider attaching a separate ENI to each host and running cluster control traffic through a different subnet.
➤ Augment Security Groups and Route Tables with on-host iptables rules.
➤ If hosts sit behind a load balancer, this is less of a concern.
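The on-host iptables layer can be sketched as a shell function that prints rules restricting EPMD (TCP 4369) and the Erlang distribution port range to the cluster interface and subnet. This is a minimal sketch, not a hardened firewall: the interface name eth1, the subnet 10.0.1.0/24 and the port range 9100-9155 are hypothetical, and it assumes nodes are started with -kernel inet_dist_listen_min 9100 inet_dist_listen_max 9155 so that distribution listens on a fixed range instead of a random port.

```shell
#!/bin/sh
# Sketch: print (not apply) iptables rules that only accept Erlang
# distribution traffic from the cluster subnet on a dedicated interface.
# Assumptions (hypothetical, adjust to your setup): the cluster ENI is eth1,
# the cluster subnet is 10.0.1.0/24, and nodes are started with
#   -kernel inet_dist_listen_min 9100 inet_dist_listen_max 9155

CLUSTER_IF="${CLUSTER_IF:-eth1}"
CLUSTER_CIDR="${CLUSTER_CIDR:-10.0.1.0/24}"
DIST_MIN="${DIST_MIN:-9100}"
DIST_MAX="${DIST_MAX:-9155}"

# dist_rules: for EPMD (4369) and the distribution range, emit an ACCEPT rule
# for the cluster interface + subnet, followed by a DROP rule for everything else.
dist_rules() {
  for ports in 4369 "${DIST_MIN}:${DIST_MAX}"; do
    echo "iptables -A INPUT -i ${CLUSTER_IF} -p tcp -s ${CLUSTER_CIDR} --dport ${ports} -j ACCEPT"
    echo "iptables -A INPUT -p tcp --dport ${ports} -j DROP"
  done
}

# Usage: dist_rules          # review the rules first
#        dist_rules | sh     # then apply them as root
```

Printing the rules rather than applying them keeps the sketch safe to run anywhere; because the rules are appended, each ACCEPT lands ahead of the matching DROP.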

Slide 7

Slide 7 text

Networking (AWS)
Multi-site deployment
➤ Consider AWS VPC Peering if deploying solely on EC2
➤ However, running Erlang Distribution over a WAN is questionable
➤ See WhatsApp’s wandist
➤ Alternatively, run a host-to-host VPN mesh between all hosts
➤ This establishes a virtual switch for your applications

Slide 8

Slide 8 text

Networking (AWS)
ECS-based deployment
➤ Review the Docker networking modes supported by ECS:
➤ None
➤ Bridged: container is given a virtual IP
➤ Host: container uses ports on the host interface
➤ VPC (awsvpc, new!): each container gets its own ENI

Slide 9

Slide 9 text

Networking (AWS)
ECS-based deployment
➤ Depending on the desired density (number of containers per host), you may be able to use the VPC networking mode, which gives each container its own ENI and therefore its own privately routable IP
➤ As fast as host networking, with more isolation
➤ The number of ENIs that can be attached to each container instance is limited, so this mode is (currently) not for you if you run many containers per host

Slide 10

Slide 10 text

Networking (AWS)
News: AWS Fargate is out today
➤ Think of it as ECS without the need to manage instances
➤ vCPU: $0.0506/h per vCPU; Memory: $0.0127/h per GB
➤ However, this is much more expensive than running your own instances
➤ m4.large @ 2 vCPU + 8 GB = approx. $0.025/h (spot) or $0.1/h (on-demand)
➤ No EBS support yet
 https://news.ycombinator.com/item?id=15808416
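To make "much more expensive" concrete, the arithmetic for a Fargate task sized like an m4.large is sketched below, using the launch prices quoted above (per vCPU-hour and per GB-hour) and the approximate m4.large prices from the slide. These are November 2017 figures; current pricing will differ.

```shell
#!/bin/sh
# Sketch: compare the hourly cost of a 2 vCPU / 8 GB Fargate task with the
# m4.large prices quoted on the slide (launch prices, 29 November 2017).

# fargate_hourly VCPUS MEMORY_GB: hourly cost at $0.0506/vCPU-h + $0.0127/GB-h
fargate_hourly() {
  awk -v cpu="$1" -v mem="$2" 'BEGIN { printf "%.4f\n", cpu * 0.0506 + mem * 0.0127 }'
}

cost="$(fargate_hourly 2 8)"
echo "Fargate 2 vCPU + 8 GB: \$${cost}/h"
awk -v f="$cost" 'BEGIN {
  printf "vs m4.large on-demand ($0.10/h): %.1fx\n", f / 0.10
  printf "vs m4.large spot (~$0.025/h): %.1fx\n", f / 0.025
}'
```

This works out to 2 × 0.0506 + 8 × 0.0127 = $0.2028/h, roughly twice the on-demand m4.large price and about eight times the spot price.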

Slide 11

Slide 11 text

Networking (AWS)
Kubernetes-based deployment
➤ Containers within a pod can communicate with each other by default without much hassle, but communication across hosts (and therefore across pods) is an entirely different matter
➤ n.b. AWS EKS is now available in Preview
 https://aws.amazon.com/eks
➤ Kubernetes: Cluster Networking
 https://kubernetes.io/docs/concepts/cluster-administration/networking/

Slide 12

Slide 12 text

Networking (AWS)
Run a VPN between all container instances (ECS) / pods (Kube)
➤ Weave Net/Scope does this for you, and there is a blog post about it
 https://aws.amazon.com/blogs/apn/architecting-microservices-using-weave-net-and-amazon-ec2-container-service/
➤ Each container is started with two network interfaces instead of one
➤ TCP for control traffic and UDP for data traffic
➤ Queries the EC2 Auto Scaling Group (ASG) for peer information at boot time
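Because Weave Net uses TCP for control and UDP for data, the security group shared by the container instances has to admit both. A minimal sketch follows, assuming Weave Net's default ports (TCP 6783, UDP 6783-6784) and a self-referencing security group; the group ID is hypothetical, and the function only prints the AWS CLI commands so they can be reviewed before being run.

```shell
#!/bin/sh
# Sketch: print (not run) AWS CLI commands that let container instances sharing
# one security group exchange Weave Net traffic.
# Assumptions: Weave Net default ports (TCP 6783 control, UDP 6783-6784 data);
# the security group ID passed as $1 is hypothetical.

# weave_sg_rules SG_ID: one ingress rule per protocol, with SG_ID as both
# target and source so only members of the group can reach each other.
weave_sg_rules() {
  echo "aws ec2 authorize-security-group-ingress --group-id $1 --protocol tcp --port 6783 --source-group $1"
  echo "aws ec2 authorize-security-group-ingress --group-id $1 --protocol udp --port 6783-6784 --source-group $1"
}

# Usage: weave_sg_rules sg-0123456789abcdef0        # review
#        weave_sg_rules sg-0123456789abcdef0 | sh   # apply
```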

Slide 13

Slide 13 text

root@cl-devel-01-default:/app# ifconfig
eth0      Link encap:Ethernet  HWaddr 02:42:ac:11:00:04
          inet addr:172.17.0.4  Bcast:0.0.0.0  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:62389 errors:0 dropped:0 overruns:0 frame:0
          TX packets:46229 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:15343682 (14.6 MiB)  TX bytes:41305027 (39.3 MiB)

ethwe     Link encap:Ethernet  HWaddr 76:12:7a:ab:9e:49
          inet addr:10.32.0.4  Bcast:0.0.0.0  Mask:255.240.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1376  Metric:1
          RX packets:39800 errors:0 dropped:0 overruns:0 frame:0
          TX packets:39573 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2092168 (1.9 MiB)  TX bytes:2194128 (2.0 MiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:5787 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5787 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:509045 (497.1 KiB)  TX bytes:509045 (497.1 KiB)

Slide 14

Slide 14 text

# hostname -I
172.17.0.4 10.32.0.4
# ip -f inet -o addr show dev ethwe \
    | grep -Po 'inet \K[\d.]+'
10.32.0.4
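grep -P requires GNU grep built with PCRE support, which minimal container images (for example BusyBox-based ones) may lack. A portable alternative is sketched below: awk picks the address for a named interface out of ip -o output, and a small helper turns it into an Erlang long node name. The naming scheme "basename@overlay IP" is an assumption for illustration, not something the slides prescribe.

```shell
#!/bin/sh
# Sketch: find the IPv4 address on the Weave interface (ethwe) without grep -P
# and derive an Erlang long node name from it.
# Assumption (hypothetical): nodes are named "<basename>@<overlay IP>".

# iface_ipv4 IFACE: read `ip -o -f inet addr show` output on stdin and print
# the first IPv4 address bound to IFACE, with the /prefix length stripped.
iface_ipv4() {
  awk -v dev="$1" '$2 == dev && $3 == "inet" { split($4, a, "/"); print a[1]; exit }'
}

# node_name BASENAME [IFACE]: e.g. "app@10.32.0.4" inside a Weave-attached container.
node_name() {
  addr=$(ip -o -f inet addr show | iface_ipv4 "${2:-ethwe}")
  if [ -z "$addr" ]; then
    echo "no IPv4 address on ${2:-ethwe}" >&2
    return 1
  fi
  echo "$1@$addr"
}
```

Using the overlay address rather than hostname -I avoids picking up the Docker bridge address (172.17.0.4 above), which other hosts cannot reach.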

Slide 15

Slide 15 text

Peer Discovery
Multiple Viable Strategies
➤ Query AWS / Kube for running nodes / containers / pods
➤ However, AWS IAM permissions for ec2:DescribeInstances are not granular
➤ Multicast gossip (now that you have a private network, UDP works again)
➤ SELECT * FROM nodes;
➤ If you have a database, you can keep peer information there, which also makes it easy to expose that information elsewhere…
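The "query AWS" strategy can be sketched in shell: ask EC2 for the private IPs of running instances carrying a cluster tag, then turn each IP into a long node name that libcluster (or a manual connect) can consume. The tag key "cluster" and the node basename are hypothetical, and whatever role runs this needs ec2:DescribeInstances, which, as noted above, cannot be narrowed to particular instances.

```shell
#!/bin/sh
# Sketch: discover peers by querying EC2, then turn their private IPs into
# Erlang long node names.
# Assumptions (hypothetical): instances carry a tag "cluster" whose value names
# the cluster, and every node is named "<basename>@<private IP>".

# discover_ec2 CLUSTER: print private IPs of running instances tagged cluster=CLUSTER.
discover_ec2() {
  aws ec2 describe-instances \
    --filters "Name=tag:cluster,Values=$1" "Name=instance-state-name,Values=running" \
    --query 'Reservations[].Instances[].PrivateIpAddress' \
    --output text
}

# to_node_names BASENAME: read whitespace-separated IPs on stdin and print one
# node name per line, e.g. "app@10.0.1.5".
to_node_names() {
  tr -s '[:space:]' '\n' | while read -r ip || [ -n "$ip" ]; do
    if [ -n "$ip" ]; then
      echo "$1@$ip"
    fi
  done
}

# Usage: discover_ec2 my-cluster | to_node_names app
```

Keeping the AWS query and the name mapping separate means the same mapping works for IPs obtained from Kube or from a nodes table.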

Slide 16

Slide 16 text

Peer Discovery
What to do with the list of peers?
➤ Feed it into libcluster, for sure
➤ Write your own peer discovery strategy, if you prefer
➤ bitwalker/libcluster
 https://github.com/bitwalker/libcluster

Slide 17

Slide 17 text

Meta-Clustering
Distributed Erlang
 http://www1.erlang.org/doc/reference_manual/distributed.html
“Connections are by default transitive. If a node A connects to node B, and node B has a connection to node C, then node A will also try to connect to node C. This feature can be turned off by using the command line flag -connect_all false, see erl(1).”

Slide 18

Slide 18 text

Meta-Clustering
a.k.a. the art of sharding. See “That’s a Billion with a B”
 https://github.com/reedr/reedr/blob/master/slides/efsf2014-whatsapp-scaling.pdf

Slide 19

Slide 19 text

[Figure: diagrams from WhatsApp’s “That’s a Billion with a B” slides: System Overview (many chat and mms servers, Account, Profile, Push and Group services, Phones, Offline storage); Topology (DC1 main cluster, DC1 and DC2 mms clusters, Acct cluster, across DC1 and DC2); Meta-clustering (DC1 and DC2 main clusters, DC1 and DC2 mms clusters, global clusters); Routing (Cluster 1 and Cluster 2, each with {last,1} to {last,4}, a service client, wandist, pg2 and other cluster-local services)]