BUILDING APPLICATIONS
with
DISTRIBUTED ERLANG
cmeik
the adventures of
hash(<<"author">>, <<"Christopher Meiklejohn">>)
Slide 5
Slide 5 text
who am i
CMEIK
distributed systems engineer
basho technologies
researcher with the
syncfree project
Slide 8
Slide 8 text
what is
THE AGENDA
(novice)
1. what is distributed erlang?
(erlanger)
2. where do i go from here?
Slide 11
Slide 11 text
what is
DISTRIBUTED ERLANG
EXTENSION TO HELP BUILD
DISTRIBUTED SYSTEMS
Slide 13
Slide 13 text
what are the goals of a
DISTRIBUTED SYSTEM
“A distributed system is a software system in
which components located on networked
computers communicate and coordinate their
actions by passing messages. The components
interact with each other in order to achieve a
common goal. Three significant characteristics of
distributed systems are: concurrency of
components, lack of a global clock, and
independent failure of components.”
Wikipedia, “Distributed Computing”
Slide 18
Slide 18 text
what are some examples of a
DISTRIBUTED SYSTEM
distributed databases, riak, cassandra, etc.
master/slave in sql, multi-partition txns
web services via rest, soap, etc.
mobile clients, internet of things
Slide 27
Slide 27 text
why are distributed systems
HARD
the network is reliable
latency is zero
bandwidth is infinite
the network is secure
topology doesn’t change
there is one administrator
transport cost is zero
the network is homogeneous
L. Peter Deutsch, “Fallacies of Distributed Computing”
Slide 33
Slide 33 text
what is
DISTRIBUTED ERLANG
[diagram: cluster of nodes 1 through 6]
transparent
• message passing
• links
• monitors
transitive connections (except hidden nodes)
access control via cookies
Slide 39
Slide 39 text
what is
DISTRIBUTED ERLANG
spawn functions on other nodes
[diagram: nodes 1 and 2 with processes p1 and p2]
monitor or link on other nodes
[diagram: nodes 1 and 2 with only process p1 remaining]
Slide 41
Slide 41 text
what distributed erlang gets
RIGHT
assumes unreliable asynchronous message passing
Slide 47
Slide 47 text
what libraries come with
DISTRIBUTED ERLANG
global - global name registration and locking
pg2 - process group registry
mnesia - distributed transactions
net_kernel - erlang distributed networking kernel
rpc - remote procedure call services
Slide 52
Slide 52 text
what is
GLOBAL
global name registration and locking service
shared state, replicated locally
races under network partitions
provides ad-hoc resolution hook
Slide 58
Slide 58 text
what is
PG2
distributed process group registry
usually used for work partitioning
races under network partitions
can lose values under network partitions
originally inspired by isis, descendant of pg
Slide 63
Slide 63 text
what is
MNESIA
transactional database in erlang
implemented using replicated ets tables
global transactions are *expensive*
network partitions can cause values to be lost
Slide 68
Slide 68 text
what is
NET_KERNEL
maintenance of network in distributed erlang
responsible for detecting failures
dropped tcp connections, network partitions
poor mechanism for cluster management
Slide 73
Slide 73 text
what is
RPC
remote procedure call services
serialized execution of requests
call to a single node, or multi call
synchronous programming pattern
Slide 79
Slide 79 text
what are the
ANTI-PATTERNS*
utilizing shared state
reliance on physical time
using predesignated masters
treating the network as synchronous
* also: distributed objects, guaranteed delivery mechanisms, distributed serializable transactions, etc.
Slide 80
Slide 80 text
unfortunately these mechanisms are
NAIVE
so, what can we do and what have we
LEARNED
Slide 86
Slide 86 text
what about
CLUSTER MEMBERSHIP
fixed membership, a priori
don’t tie to net_kernel, net_ticktime
store information locally, gossip
hyparview, plumtree, thicket
[diagram: cluster of nodes 1 through 4]
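The "store information locally, gossip" point can be illustrated with a minimal sketch. It is written in Python rather than Erlang, and everything in it (`MembershipView`, the per-node version counter, `gossip_round`) is a hypothetical name chosen for illustration; it is not the API of riak_core, HyParView, Plumtree, or Thicket. Each node keeps its own copy of the membership view and merges whatever a peer sends, keeping the entry with the higher version.

```python
from dataclasses import dataclass, field


@dataclass
class MembershipView:
    """One node's local copy of cluster membership (hypothetical structure)."""
    # node name -> (version, status); the higher version wins on merge
    entries: dict = field(default_factory=dict)

    def update(self, node, status):
        """Record a local change, bumping that node's version."""
        version, _ = self.entries.get(node, (0, None))
        self.entries[node] = (version + 1, status)

    def merge(self, other):
        """Merge a view received from a peer, keeping the newer entry per node."""
        for node, (version, status) in other.entries.items():
            local_version, _ = self.entries.get(node, (0, None))
            if version > local_version:
                self.entries[node] = (version, status)


def gossip_round(a, b):
    """Exchange views in both directions so both nodes end up with the same view."""
    a.merge(b)
    b.merge(a)


n1, n2 = MembershipView(), MembershipView()
n1.update("node1", "up")
n2.update("node2", "up")
n2.update("node2", "leaving")  # version 2 supersedes version 1
gossip_round(n1, n2)
print(n1.entries == n2.entries)  # True: both views converged
```

This sketch does not resolve ties between equal versions with different statuses; a real system would need a rule for that case.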
Slide 91
Slide 91 text
what about
FAILURE DETECTION
detection of delays vs. failed nodes
net_kernel, net_ticktime
the φ accrual failure detector
swim: membership and failure detector
[diagram: cluster of nodes 1 through 4]
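To make the difference between "delayed" and "failed" concrete, here is a rough sketch of the idea behind a φ accrual failure detector. It assumes heartbeat inter-arrival times are modelled as normally distributed, and all names (`PhiAccrualDetector`, `window`, `min_stddev`) are hypothetical. Instead of a fixed timeout, it returns a suspicion level that grows the longer a heartbeat is overdue.

```python
import math
from collections import deque
from statistics import mean, pstdev


class PhiAccrualDetector:
    """Suspicion level from heartbeat history (illustrative sketch only)."""

    def __init__(self, window=100, min_stddev=0.1):
        self.intervals = deque(maxlen=window)  # recent inter-arrival times
        self.last_heartbeat = None
        self.min_stddev = min_stddev           # floor to avoid dividing by zero

    def heartbeat(self, now):
        """Record the arrival of a heartbeat at time `now` (seconds)."""
        if self.last_heartbeat is not None:
            self.intervals.append(now - self.last_heartbeat)
        self.last_heartbeat = now

    def phi(self, now):
        """Return -log10 of the probability that a heartbeat is merely late."""
        if not self.intervals:
            return 0.0
        mu = mean(self.intervals)
        sigma = max(pstdev(self.intervals), self.min_stddev)
        elapsed = now - self.last_heartbeat
        # Probability that the next heartbeat arrives later than `elapsed`
        p_later = 0.5 * math.erfc((elapsed - mu) / (sigma * math.sqrt(2)))
        p_later = max(p_later, 1e-300)  # guard against log10(0)
        return -math.log10(p_later)


detector = PhiAccrualDetector()
for t in (0.0, 1.0, 2.0, 3.0, 4.0):
    detector.heartbeat(t)
print(detector.phi(4.5) < 1.0)  # shortly after a heartbeat: low suspicion
print(detector.phi(6.0) > 8.0)  # long silence: high suspicion
```

The caller picks the threshold on φ, which turns the trade-off between fast detection and false suspicion into an explicit setting.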
Slide 98
Slide 98 text
what about
VALUE DIVERGENCE
replicated data can diverge
identify concurrency: lamport clock, vector clocks, version vectors, wall clock
how to resolve? lww vs. siblings vs. crdt
[diagram: replicas holding divergent values]
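As an illustration of identifying concurrency, here is a small vector clock sketch in Python. The function names are hypothetical and do not come from riak_core or riak_dt. Two updates are concurrent when neither clock descends from the other; that is the case where a system has to pick a winner (lww), keep both values (siblings), or merge them (crdt).

```python
def increment(clock, node):
    """Return a copy of `clock` with `node`'s counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a, b):
    """True if clock `a` has seen every event that clock `b` has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a, b):
    """Classify the relationship between two vector clocks."""
    a_ge_b, b_ge_a = descends(a, b), descends(b, a)
    if a_ge_b and b_ge_a:
        return "equal"
    if a_ge_b:
        return "after"
    if b_ge_a:
        return "before"
    return "concurrent"


def merge(a, b):
    """Pointwise maximum: the clock that descends from both inputs."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


base = increment({}, "node1")      # {"node1": 1}
left = increment(base, "node1")    # update applied on node1
right = increment(base, "node2")   # update applied on node2, unaware of `left`

print(compare(left, base))   # after
print(compare(left, right))  # concurrent
print(merge(left, right) == {"node1": 2, "node2": 1})  # True
```

A wall clock cannot make the "concurrent" distinction: it always yields some order, which is why last-write-wins can silently discard one of the two updates.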
Slide 105
Slide 105 text
what about
DISTRIBUTION BUFFERS
[diagram: nodes 1 and 2 with processes p1, p2, and p3]
large objects can cause head-of-line blocking
move objects to tcp sockets
increase distribution port buffer size
beware unbounded queues
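One way to read "beware unbounded queues" is that a buffer should have an explicit capacity and an explicit policy for what happens when it is full. The sketch below, in Python with hypothetical names (`BoundedMailbox`, `offer`, `poll`), sheds load and reports the drop to the caller. It does not model the Erlang distribution port buffer itself.

```python
from collections import deque


class BoundedMailbox:
    """A queue that refuses new work when full instead of growing forever."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()
        self.dropped = 0  # count of rejected messages, useful for monitoring

    def offer(self, message):
        """Enqueue if there is room; otherwise reject and tell the caller."""
        if len(self.items) >= self.capacity:
            self.dropped += 1
            return False
        self.items.append(message)
        return True

    def poll(self):
        """Dequeue the oldest message, or return None if the queue is empty."""
        return self.items.popleft() if self.items else None


mailbox = BoundedMailbox(capacity=2)
results = [mailbox.offer(m) for m in ("a", "b", "c")]
print(results)          # [True, True, False]
print(mailbox.dropped)  # 1
```

Rejecting at the edge lets the sender slow down or retry, rather than hiding the overload in memory growth.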
Slide 110
Slide 110 text
what about
LEADER ELECTION
gen_leader assumes fixed topology
use paxos or raft
gen_leader known to deadlock
[diagram: nodes 1 through 4]
Slide 115
Slide 115 text
what about
MESSAGE FORMATS
mixed-version clusters are a reality in large systems
keep formats compatible
upgrade to new format
[diagram: nodes running versions v1 and v2]
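A possible shape for "keep formats compatible, upgrade to new format" is to tag every message with a version and have the decoder accept all versions still present in the cluster. The sketch below uses Python and JSON purely for illustration; the field names (`version`, `name`, `email`) are hypothetical.

```python
import json


def encode_v1(user):
    """Old format: name only."""
    return json.dumps({"version": 1, "name": user["name"]})


def encode_v2(user):
    """New format: adds an optional email field."""
    return json.dumps(
        {"version": 2, "name": user["name"], "email": user.get("email")}
    )


def decode(raw):
    """Accept every known version and normalise to the newest in-memory shape."""
    msg = json.loads(raw)
    version = msg.get("version", 1)  # treat untagged messages as v1
    if version == 1:
        return {"name": msg["name"], "email": None}
    if version == 2:
        return {"name": msg["name"], "email": msg.get("email")}
    raise ValueError(f"unsupported message version: {version}")


old = decode(encode_v1({"name": "ada"}))
new = decode(encode_v2({"name": "ada", "email": "ada@example.com"}))
print(old)  # {'name': 'ada', 'email': None}
print(new)  # {'name': 'ada', 'email': 'ada@example.com'}
```

In a rolling upgrade, nodes would typically keep sending the old format until every node can decode the new one, and only then switch encoders.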
Slide 120
Slide 120 text
what about
MESSAGE DELIVERY
message delivery not guaranteed with failures
[diagram: message exchange among nodes 1 through 4]
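Because delivery is not guaranteed under failures, a common response is to retry until acknowledged and make the receiver idempotent. The following is a sketch of that pattern in Python, with a scripted lossy link standing in for a real network; `Receiver`, `send_with_retry`, and the outcome labels are hypothetical names.

```python
class Receiver:
    """Applies each message at most once, even if it is delivered repeatedly."""

    def __init__(self):
        self.seen = set()   # message ids already applied
        self.applied = []   # payloads in the order they were applied

    def deliver(self, msg_id, payload):
        if msg_id not in self.seen:
            self.seen.add(msg_id)
            self.applied.append(payload)
        return msg_id  # acknowledgement


def send_with_retry(receiver, msg_id, payload, link_outcomes):
    """Resend until an ack comes back; return the number of attempts used.

    `link_outcomes` scripts the simulated network, one entry per attempt:
    "drop_request", "drop_ack", or "ok".
    """
    attempts = 0
    for outcome in link_outcomes:
        attempts += 1
        if outcome == "drop_request":
            continue  # receiver never saw the message
        receiver.deliver(msg_id, payload)
        if outcome == "drop_ack":
            continue  # receiver applied it, but the sender cannot know
        return attempts
    raise TimeoutError(f"no ack after {attempts} attempts")


receiver = Receiver()
attempts = send_with_retry(
    receiver, "m1", "hello", ["drop_request", "drop_ack", "ok"]
)
print(attempts)          # 3
print(receiver.applied)  # ['hello'], delivered twice but applied once
```

The lost ack is the interesting case: the sender must resend, so without deduplication on the receiving side the message would be applied twice.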
Slide 121
Slide 121 text
what can we take away from this
DISCUSSION
Slide 124
Slide 124 text
what are the
LESSONS
distributed erlang gets you part of the way
but, you still have to understand the problems
and the tradeoffs
http://christophermeiklejohn.com/distributed/systems/2013/07/12/readings-in-distributed-systems.html
Slide 129
Slide 129 text
what are some useful
ERLANG LIBRARIES
riak_core: distributed systems toolkit
https://github.com/basho/riak_core
riak_ensemble: generic multi-paxos framework
https://github.com/basho/riak_ensemble
riak_dt: conflict-free replicated data types
https://github.com/basho/riak_dt
riak_test: riak_core testing framework
https://github.com/basho/riak_test
Slide 133
Slide 133 text
what are some useful
RESEARCH PROJECTS
syncfree: large-scale computation on erlang
https://github.com/syncfree
release: large-scale erlang deployments
https://github.com/release-project/
paraphrase: parallel computation
https://github.com/paraphrase
Slide 140
Slide 140 text
what are some
PUBLICATIONS
pid reuse
unreliable failure detectors
unreliable delivery of messages
…and more!