Erlang: Building Blocks for Global Distributed Systems Sean Cribbs @seancribbs Chicago ACM 11 February 2015

About Me
- Senior Engineer at Basho, makers of Riak
- Erlanger since 2008
- Distributed Systems, Web Architecture, Compilers

Outline
- Why Erlang?
- Working in Erlang
  - Basics
  - OTP Runtime
- Building Distributed Systems in Erlang
- Erlang Distributed Systems in Industry
  - Open Source
  - Services and Proprietary

Section 1 Why Erlang?

Context
- Ericsson CS Lab, mid-late 1980s
- PLEX (proprietary) and C
- Complicated and error-prone

Requirements
- Isolate faults (bugs)
- Limit downtime
- Soft-realtime
- Simple


Requirements
Fault isolation
- All software will have bugs!
- Don't share memory, send messages
- Treat values as immutable
- "Let it crash"
Limit downtime
- Watch components
- Restart after failures
- Don't retry forever
- Live console interaction
- Load new code without restarting
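The "watch, restart, don't retry forever" idea can be sketched with a plain monitor. This is only an illustration: the module name `watch_demo` and the function `run/2` are hypothetical, and real systems would normally use an OTP supervisor, which bounds restarts through its restart intensity settings rather than a hand-rolled counter.

```erlang
-module(watch_demo).
-export([run/2]).

%% Run Fun in its own process and watch it with a monitor.
%% A crash in the worker does not take the caller down; the caller
%% only receives a 'DOWN' message describing what happened.
%% MaxRestarts (an assumption of this sketch) bounds the retries.
run(Fun, MaxRestarts) ->
    {Pid, Ref} = spawn_monitor(Fun),
    receive
        {'DOWN', Ref, process, Pid, normal} ->
            %% The worker finished without crashing.
            ok;
        {'DOWN', Ref, process, Pid, _Reason} when MaxRestarts > 0 ->
            %% The worker crashed: restart it, with one fewer retry left.
            run(Fun, MaxRestarts - 1);
        {'DOWN', Ref, process, Pid, Reason} ->
            %% Out of retries: give up and report the last failure.
            {error, {gave_up, Reason}}
    end.
```

Calling `watch_demo:run(fun() -> exit(boom) end, 2)` starts the worker three times in total before giving up.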


Requirements
Soft-realtime
- Low-latency
- Eager evaluation
- Prevent starvation via pre-emption
- Virtual machine and emulator
Simple
- High-level
- Functional / declarative
- Simple, composable data types
- Strong abstractions around edges


Erlang begins...
- Named after Agner Krarup Erlang and short for "Ericsson Language"
- Joe Armstrong, Mike Williams, Bjarne Däcker, Robert Virding
- Initial versions in Prolog, later reimplemented in C
- YouTube: "Erlang the Movie"

AXD-301: Erlang Success Story
- The AXE-N project was a colossal failure
- AXD-301 started in 1996
- Codebase: 3 MLoC Erlang, 500 KLoC C, 13 KLoC Java
- Separate user plane from control plane
- BT reports nine 9s of availability

Post AXD-301
- After the AXD-301 launch, Ericsson banned Erlang from internal projects.
- Released as open source shortly after (December 1998).
After open-source:
- Distributed Erlang
- Binaries & bit-syntax
- Async threads
- HiPE & Dialyzer
- SMP
- Native functions (NIFs)
- Maps
- Dirty schedulers

Section 2 Working in Erlang

Working in Erlang Live coding time!

Key Takeaways
- Simple datatypes
- Dynamic typing
- Pattern matching
- Immutable data
- Functional
- Cheap processes
- Message-passing
- Hot code-loading
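Since the live-coding session is not captured in the slides, here is a small sketch touching a few of the takeaways: pattern matching on simple tagged tuples, cheap processes, and message passing. The module `takeaways` and both functions are hypothetical names chosen for illustration, not code from the talk.

```erlang
-module(takeaways).
-export([area/1, parallel_map/2]).

%% Pattern matching: the shape of the tuple selects the clause.
area({circle, R}) -> math:pi() * R * R;
area({rect, W, H}) -> W * H.

%% Cheap processes and message passing: spawn one process per item,
%% have each send its result back, then collect the results in order.
parallel_map(F, Items) ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn(fun() -> Parent ! {Ref, F(Item)} end),
                Ref
            end || Item <- Items],
    %% Ref is already bound here, so each receive waits for the
    %% reply that belongs to that particular worker.
    [receive {Ref, Result} -> Result end || Ref <- Refs].
```

Because every reply is tagged with a unique reference, the results come back in the order of the input list even if the workers finish in a different order.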


OTP Runtime

    loop(V) ->
        receive
            {set, X} ->
                %% Replace the held value.
                loop(X);
            {get, Pid} ->
                %% Send the held value to the requester.
                Pid ! V,
                loop(V)
        end.

Problems
- Manual tail-call over receive
- Limited extensibility
- Difficult to inspect externally
- No interface for caller

OTP Runtime
Patterns
- Server
- Finite State Machine
- Event handler
- Supervisor
- Application
- Release

    -module(myserver).
    -behaviour(gen_server).
    -export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

    %% The server state is the stored value, starting at 0.
    init([]) -> {ok, 0}.

    handle_call(get, _From, Value) -> {reply, Value, Value}.

    handle_cast({set, New}, _Value) -> {noreply, New}.

    handle_info(_Msg, Value) -> {noreply, Value}.
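The gen_server version also addresses the "no interface for caller" problem, because the message protocol can be hidden behind ordinary functions. A minimal sketch, assuming a hypothetical module `value_server` with the same callbacks as the slide plus wrapper functions `start_link/0`, `get_value/1`, and `set_value/2` (names invented here; on releases before OTP 20 the compiler may warn about the omitted optional callbacks):

```erlang
-module(value_server).
-behaviour(gen_server).

%% Client API (hypothetical names for this sketch).
-export([start_link/0, get_value/1, set_value/2]).
%% gen_server callbacks.
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

%% Synchronous: blocks until the server replies (or the call times out).
get_value(Pid) ->
    gen_server:call(Pid, get).

%% Asynchronous: returns ok immediately without waiting for the server.
set_value(Pid, New) ->
    gen_server:cast(Pid, {set, New}).

init([]) -> {ok, 0}.

handle_call(get, _From, Value) -> {reply, Value, Value}.

handle_cast({set, New}, _Value) -> {noreply, New}.

handle_info(_Msg, Value) -> {noreply, Value}.
```

Callers never see the `get` and `{set, New}` messages, so the protocol can change without touching client code.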

OTP Runtime
Libraries
- Data structures
- Embedded databases (ets, dets, mnesia)
- Operating system services
- Input, output, encodings
- Monitoring and logging
- External interfaces (Java, C)
- Internet Services (HTTP, FTP, SSH)
- GUI toolkits (wx, gs)
- Tracing and debugging
- Unit and Functional Testing

Section 3 Building Distributed Systems in Erlang

Why Erlang for Distributed Systems?
"A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable." – Leslie Lamport


Why Erlang for Distributed Systems?
- Fault-tolerance from low-level to high-level
- Great networking support
- Distributed Erlang (location transparency)
- Uniform application structure and patterns
- Easy to express solutions to distributed problems
Caveats
- Queue management is hard (unbounded)
- Messages can be lost!
- Best components often third-party or RYO (roll your own)
- Ericsson is super-conservative

Pattern: Survey
[Diagrams: survey pattern; FSM stages]

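Pattern: Survey

    -module(survey_fsm).
    -behaviour(gen_fsm).

    -export([start_link/1, do_work/1]).
    -export([init/1, distribute/2, collect/2, finish/2]).

    -record(state, {caller, nodes=[], workers=[], replies=[]}).

    start_link(Nodes) ->
        gen_fsm:start_link(?MODULE, [self(), Nodes], []).

    %% A timeout of 0 fires the first state (distribute) immediately.
    init([Caller, Nodes]) ->
        {ok, distribute, #state{caller=Caller, nodes=Nodes}, 0}.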

Pattern: Survey

    distribute(timeout, #state{nodes=Nodes}=State) ->
        %% Start one worker per node; each reports back to this FSM.
        Workers = [spawn(Node, ?MODULE, do_work, [self()]) || Node <- Nodes],
        {next_state, collect, State#state{workers=Workers}}.

    do_work(FSM) ->
        %% Stand-in for real work: reply with 10 random bytes.
        gen_fsm:send_event(FSM, {reply, crypto:strong_rand_bytes(10)}).

Pattern: Survey

    collect({reply, Result}, #state{workers=Workers, replies=Replies0}=State0) ->
        Replies = [Result|Replies0],
        State = State0#state{replies=Replies},
        if
            length(Replies) == length(Workers) ->
                %% Every worker has answered; move to finish immediately.
                {next_state, finish, State, 0};
            true ->
                {next_state, collect, State}
        end.

Pattern: Survey

    finish(timeout, #state{caller=Caller, replies=Replies}=State) ->
        %% Deliver the collected replies to the caller, then stop.
        Caller ! {survey, Replies},
        {stop, normal, State}.
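The caveat that messages can be lost applies to this pattern: the collect stage waits until every worker has replied, so one missing reply stalls the survey. One possible way to bound the wait is an overall deadline, sketched here with plain processes rather than gen_fsm. The module `survey_sketch`, the function `survey/3`, and the `{ok, ...}` / `{partial, ...}` return shapes are assumptions of this sketch, not part of the talk; it also assumes OTP 19 or later for `erlang:monotonic_time(millisecond)`, and that the module defining `Fun` is loaded on every remote node.

```erlang
-module(survey_sketch).
-export([survey/3]).

%% Ask every node to run Fun, wait at most TimeoutMs in total,
%% and return whatever replies arrived before the deadline.
survey(Nodes, Fun, TimeoutMs) ->
    Caller = self(),
    Tag = make_ref(),
    lists:foreach(
      fun(Node) ->
              spawn(Node, fun() -> Caller ! {Tag, Node, Fun()} end)
      end, Nodes),
    Deadline = erlang:monotonic_time(millisecond) + TimeoutMs,
    collect(Tag, length(Nodes), Deadline, []).

%% All expected replies have arrived.
collect(_Tag, 0, _Deadline, Acc) ->
    {ok, lists:reverse(Acc)};
collect(Tag, Missing, Deadline, Acc) ->
    %% Time remaining until the shared deadline, never negative.
    Wait = max(0, Deadline - erlang:monotonic_time(millisecond)),
    receive
        {Tag, Node, Reply} ->
            collect(Tag, Missing - 1, Deadline, [{Node, Reply} | Acc])
    after Wait ->
        %% Deadline passed: return the replies collected so far.
        {partial, lists:reverse(Acc)}
    end.
```

Replies that arrive after the deadline stay in the caller's mailbox; because each carries a unique tag, a later survey will not mistake them for its own.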

Section 4 Erlang Distributed Systems in Industry

Riak & Riak Core
Riak
- Dynamo-like Key-Value store
- HA and SC modes
- CRDTs
- Search, Secondary Indexes, MapReduce
- Multi-Datacenter (license only)
Riak Core
- Dynamo, abstracted
- Cluster membership
- Partition ownership
- Virtual nodes (vnodes)
- Handoff
- Coverage planning
- Cluster metadata over gossip

Riak CS
- S3 and Swift interface
- Block storage over Riak
- Usage accounting
- Remote fetch (over licensed MDC)

Disco
- MapReduce system, like Hadoop
- Originally developed at Nokia
- Write jobs in Python, Erlang distributes them

Project FiFo
- Cloud orchestration for SmartOS (Illumos)
- "Private cloud" / IaaS
- Uses Riak Core for some components
- Includes LeoFS for storage (S3-like)
- Multi-datacenter capability

WhatsApp
- Mobile messaging app
- Recently purchased by Facebook for $19 billion
- Scaled to 2 million connections per machine!
- Originally based on ejabberd, but quickly became custom

OpenX ® Online Advertising Network Real-time Bidding Impressions Ad Delivery Monitoring Also use Riak

Chef (formerly Opscode)
- Configuration Management, much code in Ruby
- Web Services in Erlang: Chef Server 12, Analytics
- Other projects in the works

Thanks
- Francesco Cesarini, Erlang Solutions
- Heinz Gies, Project FiFo
- Rick Reed, WhatsApp
- Anthony Molinaro, OpenX
- Joe DeVivo, Chef