Building Applications with Distributed Erlang

BUILDING APPLICATIONS with DISTRIBUTED ERLANG
cmeik: the adventures of hash(<<"author">>, <<"Christopher Meiklejohn">>)

who am i CMEIK
• distributed systems engineer, basho technologies
• researcher with the syncfree project

what is THE AGENDA
1. (novice) what is distributed erlang?
2. (erlanger) where do i go from here?

what is DISTRIBUTED ERLANG
• an extension to help build distributed systems

what are the goals of a DISTRIBUTED SYSTEM
“A distributed system is a software system in which components located on networked computers communicate and coordinate their actions by passing messages. The components interact with each other in order to achieve a common goal. Three significant characteristics of distributed systems are: concurrency of components, lack of a global clock, and independent failure of components.”
Wikipedia, “Distributed Computing”

what are some examples of a DISTRIBUTED SYSTEM
• distributed databases: riak, cassandra, etc.
• master/slave in sql, multi-partition txns
• web services via rest, soap, etc.
• mobile clients, internet of things

why are distributed systems HARD
• the network is reliable
• latency is zero
• bandwidth is infinite
• the network is secure
• topology doesn’t change
• there is one administrator
• transport cost is zero
• the network is homogeneous
L. Peter Deutsch, “Fallacies of Distributed Computing”

what is DISTRIBUTED ERLANG
transparent
• message passing
• links
• monitors
transitive connections (except hidden nodes)
access control via cookies
[diagram: nodes 1, 2, 3, 4, 5; node 6 appears in the final build]

what is DISTRIBUTED ERLANG
• spawn functions on other nodes
• monitor or link on other nodes
[diagram: nodes 1 and 2 with processes p1 and p2; only p1 remains in the final build]

what distributed erlang gets RIGHT
• assumes unreliable asynchronous message passing

what libraries come with DISTRIBUTED ERLANG
• global - global name registration and locking
• pg2 - process group registry
• mnesia - distributed transactions
• net_kernel - erlang distributed networking kernel
• rpc - remote procedure call services

what is GLOBAL
• global name registration and locking service
• shared state, replicated locally
• races under network partitions
• provides ad-hoc resolution hook

what is PG2
• distributed process group registry
• usually used for work partitioning
• races under network partitions
• can lose values under network partitions
• originally isis inspired, pg descendant

what is MNESIA
• transactional database in erlang
• implemented using replicated ets tables
• global transactions are *expensive*
• network partitions can cause values to be lost

what is NET_KERNEL
• maintenance of network in distributed erlang
• responsible for detecting failures
• dropped tcp connections, network partitions
• poor mechanism for cluster management

what is RPC
• remote procedure call services
• serialized execution of requests
• call to a single node, or multi call
• synchronous programming pattern

what are the ANTI-PATTERNS*
• utilizing shared state
• reliance on physical time
• using predesignated masters
• treating the network as synchronous
* also: distributed objects, guaranteed delivery mechanisms, distributed serializable transactions, etc.

unfortunately these mechanisms are NAIVE
so, what can we do and what have we LEARNED

what about CLUSTER MEMBERSHIP
• fixed membership, a priori
• don’t tie to net_kernel, net_ticktime
• store information locally, gossip
• hyparview, plumtree, thicket
[diagram: nodes 1, 2, 3, 4]
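
The "store information locally, gossip" point can be illustrated with a minimal sketch in Python. This is a naive push gossip, not hyparview or plumtree. The view layout (node name mapped to a version and a status) and the function names are assumptions made for illustration only.

```python
import random


def merge_views(local, remote):
    """Merge two membership views, keeping the higher-versioned entry per node.

    A view maps node name -> (version, status). This layout is hypothetical.
    """
    merged = dict(local)
    for node, (version, status) in remote.items():
        if node not in merged or version > merged[node][0]:
            merged[node] = (version, status)
    return merged


def gossip_round(views, rng):
    """Each node pushes its locally stored view to one random peer.

    `views` maps node name -> that node's local view. Returns `views`
    after every node has pushed once.
    """
    nodes = sorted(views)
    for sender in nodes:
        peers = [n for n in nodes if n != sender]
        if not peers:
            continue  # a single-node cluster has nobody to gossip with
        peer = rng.choice(peers)
        views[peer] = merge_views(views[peer], views[sender])
    return views


if __name__ == "__main__":
    cluster = {
        "n1": {"n1": (1, "up")},
        "n2": {"n2": (1, "up")},
    }
    gossip_round(cluster, random.Random(0))
    print(cluster)
```

Each node only ever reads and writes its own view, so no shared state is needed. Convergence depends on how many rounds run and which peers are chosen.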

what about FAILURE DETECTION
• detection of delays vs. failed nodes
• net_kernel, net_ticktime
• the φ accrual failure detector
• swim: membership and failure detector
[diagram: nodes 1, 2, 3, 4]
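
A φ accrual failure detector reports a suspicion level that grows with the time since the last heartbeat, instead of a yes/no verdict. That is what lets a caller tell a delay from a likely failure. The Python sketch below is a simplification: it assumes exponentially distributed heartbeat intervals rather than the normal distribution used in the original formulation. The class and method names are hypothetical.

```python
import math
from collections import deque


class PhiAccrualDetector:
    """Simplified phi accrual detector (exponential inter-arrival assumption)."""

    def __init__(self, window=100):
        # Sliding window of observed intervals between heartbeats.
        self.intervals = deque(maxlen=window)
        self.last_heartbeat = None

    def heartbeat(self, now):
        """Record a heartbeat that arrived at time `now` (seconds)."""
        if self.last_heartbeat is not None:
            self.intervals.append(now - self.last_heartbeat)
        self.last_heartbeat = now

    def phi(self, now):
        """Suspicion level at time `now`; higher means more likely failed."""
        if not self.intervals:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        if mean <= 0:
            return 0.0
        elapsed = now - self.last_heartbeat
        # P(heartbeat still pending after `elapsed`) = exp(-elapsed / mean)
        # phi = -log10(P) = elapsed / (mean * ln 10)
        return elapsed / (mean * math.log(10))


if __name__ == "__main__":
    detector = PhiAccrualDetector()
    for t in (0.0, 1.0, 2.0, 3.0):
        detector.heartbeat(t)
    print(detector.phi(4.0), detector.phi(10.0))
```

The caller picks a threshold on φ. A low threshold reacts quickly but suspects slow nodes; a high threshold tolerates delays at the cost of slower detection.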

what about VALUE DIVERGENCE
• replicated data can diverge
• identify concurrency: lamport clock, vector clocks, version vectors, wall clock
• how to resolve? lww vs. siblings vs. crdt
[diagram: replicas holding differing values]
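
Identifying concurrency with vector clocks can be sketched briefly in Python. Clocks are plain dictionaries from node name to counter. The function names are illustrative and not taken from any particular library.

```python
def increment(clock, node):
    """Return a copy of `clock` with `node`'s counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a, b):
    """True if clock `a` has seen every event that clock `b` has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a, b):
    """Classify `a` relative to `b`: equal, after, before, or concurrent."""
    a_sees_b = descends(a, b)
    b_sees_a = descends(b, a)
    if a_sees_b and b_sees_a:
        return "equal"
    if a_sees_b:
        return "after"
    if b_sees_a:
        return "before"
    return "concurrent"


def merge(a, b):
    """Pointwise maximum: the clock that descends from both inputs."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}


if __name__ == "__main__":
    base = increment({}, "n1")
    left = increment(base, "n1")   # update accepted by n1
    right = increment(base, "n2")  # update accepted by n2, unaware of left
    print(compare(left, right))
```

A "concurrent" result is the case where the application has to choose a resolution strategy: pick one by timestamp (lww), keep both (siblings), or merge deterministically (crdt).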

what about DISTRIBUTION BUFFERS
• large objects can cause head-of-line blocking
• move objects to tcp sockets
• increase distribution port buffer size
• beware unbounded queues
[diagram: nodes 1 and 2 with processes p1, p2, p3]
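
One way to act on "beware unbounded queues" is to cap the queue and tell the sender when it is full. The Python sketch below is a generic bounded mailbox with load shedding. It does not model the Erlang distribution buffer itself, and the class name and drop policy are assumptions.

```python
from collections import deque


class BoundedMailbox:
    """Fixed-capacity queue that rejects new messages when full."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.messages = deque()
        self.dropped = 0  # count of rejected messages, useful for monitoring

    def offer(self, message):
        """Enqueue `message`; return False if the mailbox is full."""
        if len(self.messages) >= self.capacity:
            self.dropped += 1
            return False
        self.messages.append(message)
        return True

    def take(self):
        """Dequeue the oldest message, or return None if empty."""
        return self.messages.popleft() if self.messages else None


if __name__ == "__main__":
    mailbox = BoundedMailbox(capacity=2)
    print([mailbox.offer(m) for m in ("a", "b", "c")], mailbox.dropped)
```

Returning False gives the sender a signal to back off or retry later, so memory use stays bounded even when the consumer falls behind.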

what about LEADER ELECTION
• gen_leader assumes fixed topology
• use paxos or raft
• gen_leader known to deadlock
[diagram: nodes 1, 2, 3, 4]

what about MESSAGE FORMATS
• mixed version clusters are a reality in large systems
• keep formats compatible
• upgrade to new format
[diagram: v1 and v2]
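
A mixed-version cluster can be handled by tagging every message with a format version, decoding all versions still in use, and emitting the new format only once every node understands it. The Python sketch below shows the idea. The message fields (`key`, `value`, `ttl`) and function names are hypothetical.

```python
def encode_v1(key, value):
    """Original format (hypothetical fields)."""
    return {"version": 1, "key": key, "value": value}


def encode_v2(key, value, ttl):
    """Newer format that adds a hypothetical `ttl` field."""
    return {"version": 2, "key": key, "value": value, "ttl": ttl}


def decode(message):
    """Accept every format still in use and normalise to one internal shape."""
    version = message.get("version", 1)
    if version == 1:
        return {"key": message["key"], "value": message["value"], "ttl": None}
    if version == 2:
        return {"key": message["key"], "value": message["value"], "ttl": message["ttl"]}
    raise ValueError(f"unsupported message version: {version}")


def encode_for(cluster_versions, key, value, ttl=None):
    """Emit the newest format that every node in the cluster can decode."""
    if min(cluster_versions) >= 2:
        return encode_v2(key, value, ttl)
    return encode_v1(key, value)


if __name__ == "__main__":
    print(encode_for([1, 2, 2], "k", "v", ttl=30))  # still mixed: v1 on the wire
    print(encode_for([2, 2, 2], "k", "v", ttl=30))  # fully upgraded: v2
```

Decoding stays backward compatible throughout the rollout, while encoding switches only after the last v1 node is gone.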

what about MESSAGE DELIVERY
• message delivery not guaranteed with failures
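
Because delivery is not guaranteed, an application that needs a message to arrive usually has to acknowledge and retry, and the receiver has to tolerate duplicates. The Python sketch below simulates that pattern with an explicit drop plan instead of a real network. It is not something the slides prescribe, and all names are hypothetical.

```python
class Receiver:
    """Applies each message id at most once but acknowledges every copy."""

    def __init__(self):
        self.seen = set()
        self.delivered = []

    def receive(self, msg_id, payload):
        if msg_id not in self.seen:
            self.seen.add(msg_id)
            self.delivered.append(payload)
        return msg_id  # acknowledgement sent back to the sender


def send_with_retry(receiver, msg_id, payload, drop_plan, max_attempts=5):
    """Resend until an acknowledgement arrives or attempts run out.

    `drop_plan` is a sequence of (drop_request, drop_ack) booleans, one pair
    per attempt, standing in for an unreliable network. Returns the number
    of attempts used, or None if no acknowledgement was ever received.
    """
    for attempt, (drop_request, drop_ack) in zip(range(1, max_attempts + 1), drop_plan):
        if drop_request:
            continue  # request lost before reaching the receiver
        ack = receiver.receive(msg_id, payload)
        if drop_ack:
            continue  # receiver processed it, but the ack was lost
        if ack == msg_id:
            return attempt
    return None


if __name__ == "__main__":
    receiver = Receiver()
    plan = [(True, False), (False, True), (False, False)]
    print(send_with_retry(receiver, 1, "hello", plan), receiver.delivered)
```

The second attempt shows why deduplication matters: the receiver applied the message but the sender could not know, so the third attempt is a duplicate that must be ignored.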

what can we take away from this DISCUSSION

what are the LESSONS
• distributed erlang gets you part of the way, but you still have to understand the problems and the tradeoffs
http://christophermeiklejohn.com/distributed/systems/2013/07/12/readings-in-distributed-systems.html

what are some useful ERLANG LIBRARIES
• riak_core: distributed systems toolkit https://github.com/basho/riak_core
• riak_ensemble: generic multi-paxos framework https://github.com/basho/riak_ensemble
• riak_dt: conflict-free replicated data types https://github.com/basho/riak_dt
• riak_test: riak_core testing framework https://github.com/basho/riak_test

what are some useful RESEARCH PROJECTS
• syncfree: large-scale computation on erlang https://github.com/syncfree
• release: large-scale erlang deployments https://github.com/release-project/
• paraphrase: parallel computation https://github.com/paraphrase

what are some PUBLICATIONS
• pid reuse
• unreliable failure detectors
• unreliable delivery of messages
• …and more!

do you have any questions? THANKS!
