Investigating the Scalability Limits of Distributed Erlang

Amir Ghaffari

September 05, 2014

Transcript

  1. Investigating the Scalability Limits of Distributed Erlang
     Amir Ghaffari
     Thirteenth ACM SIGPLAN Erlang Workshop, Göteborg, September 05, 2014
  2. Outline
     • RELEASE Project
     • The Design and Implementation of DE-Bench
     • Benchmark Results
       o Global Commands
       o Data Size
       o Data Size & Computation Time
       o Server Processes
     • Conclusion & Future Work
  3. RELEASE Project
     RELEASE is a European project aiming to scale Erlang onto commodity architectures with 100,000 cores.
  4. Why Erlang?
     Commercial applications:
     • Facebook chat backend
     • T-Mobile advanced call control services
     • Ericsson AXD301 ATM switch
     • Riak and CouchDB NoSQL DBMSs
     This popularity is due to:
     • share-nothing concurrency
     • asynchronous message passing based on the actor model
     • process location transparency
     • fault tolerance
     • …
  5. RELEASE Project
     The RELEASE consortium works at the following levels:
     • Virtual machine
     • Language
     • Scalable computation model
     • Scalable in-memory data structures
     • Scalable persistent data structures
     • Infrastructure
     • Profiling and refactoring tools
  6. Requirement
     To scale distributed Erlang we needed to identify its scalability bottlenecks:
     • thus, the need for benchmarking was obvious
     • but there was no such tool, and nobody had done it before
     • we needed a scalable benchmarking tool for large-scale architectures (hundreds of nodes and thousands of cores)
  7. DE-Bench
     So, we have developed our own tool!
     • DE-Bench stands for “Distributed Erlang Benchmark”.
     • DE-Bench is based on Basho Bench, an open source benchmarking tool for Riak.
  8. DE-Bench
     For the sake of scalability, DE-Bench has a P2P design:
     • No central coordination or synchronisation
     • No single point of failure
     • All nodes perform the same role independently
  9. How DE-Bench Works
     • An Erlang supervision tree provides reliability and fault tolerance (a minimal sketch follows this slide).
     • Each node has 8 cores, and accordingly 40 worker processes run on each node.
     • CSV result files are stored on the local disk of each node to avoid disk access contention.
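
     For illustration, a minimal sketch, not DE-Bench's actual source, of a one_for_one supervision tree that keeps a fixed number of worker processes alive on a node; the module bench_sup and its placeholder worker loop are hypothetical:

        -module(bench_sup).
        -behaviour(supervisor).
        -export([start_link/1, init/1, start_worker/1]).

        %% Start a supervision tree with NumWorkers workers (e.g. 40 per node).
        start_link(NumWorkers) ->
            supervisor:start_link({local, ?MODULE}, ?MODULE, NumWorkers).

        init(NumWorkers) ->
            %% one_for_one: a crashed worker is restarted without touching the others.
            Children = [{N, {?MODULE, start_worker, [N]},
                         permanent, 5000, worker, [?MODULE]}
                        || N <- lists:seq(1, NumWorkers)],
            {ok, {{one_for_one, 10, 60}, Children}}.

        %% Hypothetical worker: a real benchmark worker would issue one random
        %% command per iteration and log its latency to a CSV file.
        start_worker(Id) ->
            {ok, spawn_link(fun() -> worker_loop(Id) end)}.

        worker_loop(Id) ->
            timer:sleep(1000),   %% placeholder for issuing one benchmark command
            worker_loop(Id).
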
  10. Distributed Erlang Commands
      P2P commands: a function with a tunable argument size and computation time is run on a remote node (see the sketch below).
      • A function is called with an argument of X bytes.
      • A non-tail-recursive function is then run on the target node for Y microseconds.
      • Finally, the argument is returned to the source node as the result.
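
     A minimal sketch of this P2P command pattern, assuming a modern OTP release (erlang:monotonic_time/1 did not exist in the R16B release used in the talk); the module name p2p_cmd and the 5-second receive timeout are made up for illustration:

        -module(p2p_cmd).
        -export([run/3, remote/3]).

        %% Ship ArgSizeBytes of data to Node, let it compute for roughly
        %% MicroSecs microseconds, then receive the argument back as the result.
        run(Node, ArgSizeBytes, MicroSecs) ->
            Arg = binary:copy(<<0>>, ArgSizeBytes),
            Self = self(),
            spawn(Node, ?MODULE, remote, [Self, Arg, MicroSecs]),
            receive
                {result, Arg} -> ok
            after 5000 ->
                {error, timeout}
            end.

        %% Runs on the target node: busy-work, then return the argument.
        remote(From, Arg, MicroSecs) ->
            Deadline = erlang:monotonic_time(microsecond) + MicroSecs,
            busy(Deadline),
            From ! {result, Arg}.

        %% Non-tail-recursive loop: each call stays on the stack until the deadline.
        busy(Deadline) ->
            case erlang:monotonic_time(microsecond) < Deadline of
                true  -> busy(Deadline), ok;
                false -> ok
            end.
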
  11. Distributed Erlang Commands
      Local commands:
      • Only the local node is involved; there is no communication with other nodes.
      Global commands:
      • All nodes in the cluster are involved.
      • The result is ready once the command has run successfully on all nodes.
  12. Distributed Erlang Commands
      P2P commands (sketched below):
      • spawn(Node, Fun): asynchronously calls a function on a remote node.
      • RPC(Node, Fun): synchronously calls a function on a remote node.
      • Server process call: a synchronous call to a generic server process (gen_server) or a finite state machine process (gen_fsm).
      With spawn, a new process is created on the target node; with RPC and server process calls, the process already exists on the target node.
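
     A hedged sketch of how the three flavours map onto standard Erlang/OTP calls; the node argument and the registered name my_server are hypothetical, and my_server is assumed to be a gen_server already running on the target node:

        -module(p2p_flavours).
        -export([demo/1]).

        %% Node is the name of a connected remote node, e.g. 'bench@host2'.
        demo(Node) ->
            %% spawn: asynchronous; a new process is created on the target node
            %% and the caller continues immediately.
            spawn(Node, fun() -> lists:seq(1, 10) end),

            %% rpc: synchronous; the caller blocks until the remote call returns.
            Seq = rpc:call(Node, lists, seq, [1, 10]),

            %% server process call: synchronous request to an existing registered
            %% gen_server on the target node (no new process is created).
            Reply = gen_server:call({my_server, Node}, ping),

            {Seq, Reply}.
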
  13. Distributed Erlang Commands
      Local commands (sketched below):
      • register_name(Name, Pid): associates a name with a process identifier (pid).
      • unregister_name(Name): removes a registered name associated with a pid.
      • whereis(Name): returns the pid registered under a specific name.
      • global:whereis_name(Name): returns the pid associated with a globally registered name; the lookup is still local because the global name table is replicated on every node.
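
     A shell-style sketch of the local commands; the process and the name my_proc are made up, and none of these calls generates network traffic:

        %% Local name registration: only this node's name table is touched.
        Pid = spawn(fun() -> receive stop -> ok end end),
        true = register(my_proc, Pid),        %% associate a name with a pid
        Pid  = whereis(my_proc),              %% look the pid up again
        true = unregister(my_proc),           %% remove the association
        %% A global lookup is also a local read of the replicated name table;
        %% here it returns undefined because the name was never globally registered.
        undefined = global:whereis_name(my_proc).
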
  14. Distributed Erlang Commands
      Global commands (sketched below):
      • global:register_name(Name, Pid): globally associates a name with a pid.
      • global:unregister_name(Name): removes a globally registered name from all nodes in the cluster.
      [Diagram: register and unregister operations update the global name table that is replicated in every Erlang VM.]
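
     A shell-style sketch of the global commands; the name my_global_proc is made up. Unlike the local variants, registering and unregistering block until every connected node has updated its copy of the global name table:

        %% Global name registration: every node in the cluster takes part.
        Pid = spawn(fun() -> receive stop -> ok end end),
        yes = global:register_name(my_global_proc, Pid),   %% waits for all nodes
        Pid = global:whereis_name(my_global_proc),          %% local read of the table
        _   = global:unregister_name(my_global_proc).       %% again involves every node
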
  15. Platform & Settings
      • The benchmark was carried out on the Kalkyl cluster at UPPMAX: 348 nodes with 2784 64-bit processor cores (8 cores per node), 24 GB of RAM and a 250 GB hard disk per node.
      • Red Hat Enterprise Linux 6.0
      • Erlang version R16B was used in all our experiments.
  16. Benchmarking Results
      • The benchmark is conducted on 10, 20, 30, 40, 50, 60, 70, 80, 90, and 100-node clusters.
      • All experiments run for 5 minutes.
      • There is one Erlang VM on each host and, as always, one DE-Bench instance on each VM.
      • CSV files from all participating nodes are aggregated to determine the total throughput and the number of failures.
  17. Global Commands
      Commands used in this benchmark:
      • spawn and RPC with a 10-byte argument and 10 microseconds of computation time
      • Global commands: global:register_name and global:unregister_name
      • Local command: global:whereis_name(Name)
  18. Data Size
      spawn and RPC with:
      • argument sizes of 10, 100, 1000, and 10,000 bytes
      • 10 microseconds of computation time
  19. Data Size & Computation Time
      spawn and RPC with:
      • an argument size of 1000 bytes
      • 1000 microseconds of computation time
      Scales linearly up to 150 nodes and 1200 cores!
  20. Latency of P2P Commands
      RPC latency rises as the cluster size grows.
      Why? How does RPC work? What is the difference between RPC and spawn?
  21. How does RPC work?
      • There is a gen_server process (called rex) on each Erlang VM.
      • In addition to user applications, RPC is also used by many built-in OTP modules.
      • So, as a shared service, it can become overloaded (see the sketch below).
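
     A sketch of the idea rather than the exact OTP implementation: rpc:call/4 behaves roughly like a synchronous gen_server:call to the single registered process rex on the target node, which is why an overloaded rex serialises all incoming RPCs on that node:

        -module(rpc_sketch).
        -export([call/4]).

        %% Sketch only: the real message format lives in OTP's rpc.erl.
        %% Every RPC to Node is funnelled through the one process registered
        %% as 'rex' on that node, so rex is a shared, potentially overloaded service.
        call(Node, Mod, Fun, Args) ->
            gen_server:call({rex, Node}, {call, Mod, Fun, Args, group_leader()}, infinity).
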
  22. Riak NoSQL DBMS
      Our experience with Riak 1.1.1 shows that an overloaded server process limits the scalability of Riak.
      Do server processes have scalability limitations?
  23. Server Processes
      • spawn and RPC with 10 bytes of argument size and 1 microsecond of computation time
      • gen_server and gen_fsm calls with 10 bytes of argument size and 1 microsecond of computation time (a minimal gen_server sketch follows below)
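
     A minimal gen_server along these lines (illustrative only, not DE-Bench's actual server): it accepts the argument, burns the requested number of microseconds in a non-tail-recursive loop as in the earlier p2p_cmd sketch, and replies with the argument:

        -module(bench_server).
        -behaviour(gen_server).
        -export([start_link/0, call/3]).
        -export([init/1, handle_call/3, handle_cast/2]).

        start_link() ->
            gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

        %% Synchronous benchmark call: ship Arg to the server on Node, let it
        %% compute for MicroSecs microseconds, and get Arg back as the reply.
        call(Node, Arg, MicroSecs) ->
            gen_server:call({?MODULE, Node}, {work, Arg, MicroSecs}).

        init([]) -> {ok, #{}}.

        handle_call({work, Arg, MicroSecs}, _From, State) ->
            Deadline = erlang:monotonic_time(microsecond) + MicroSecs,
            busy(Deadline),
            {reply, Arg, State};
        handle_call(_Other, _From, State) ->
            {reply, ignored, State}.

        handle_cast(_Msg, State) -> {noreply, State}.

        %% Non-tail-recursive busy loop, as in the spawn/RPC variants.
        busy(Deadline) ->
            case erlang:monotonic_time(microsecond) < Deadline of
                true  -> busy(Deadline), ok;
                false -> ok
            end.
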
  24. Conclusion
      • We have demonstrated that global commands are a bottleneck for the scalability of distributed Erlang. We mitigated this limitation by grouping nodes into smaller partitions.
      • Our results reveal that the latency of RPC calls rises as the cluster size grows.
      • We have shown that server processes scale well and have the lowest latency among all P2P commands.
      • We are currently developing other scalable benchmarking applications to run on a larger architecture (i.e. a Blue Gene/Q system).