Slide 1

Slide 1 text

Building Distributed System with Akka
Anil Wadghule
@anildigital

Slide 2

Slide 2 text

!✈#

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

Current state of the internet

Slide 7

Slide 7 text

Size of the internet

Slide 8

Slide 8 text

Market forces leading to change Concurrent connections •“Internet of Things”, mobile devices Big data •Size of data is overwhelming our ability to manage it Response times •Real-time results (e.g. analytics) with sub-second latencies

Slide 9

Slide 9 text

Physical factors leading to change •Expensive hardware → cheap hardware •A single machine → cluster of machines •A single core → multiple cores •Slow networks → fast networks •Small data snapshots → big data, streams of data

Slide 10

Slide 10 text

Modern Application Requirements •High Availability •Fault Tolerance •Scalability

Slide 11

Slide 11 text

Google going down? Acceptable?

Slide 12

Slide 12 text

Google going down? Acceptable? Highly Available

Slide 13

Slide 13 text

Millions of people visit Google

Slide 14

Slide 14 text

Millions of people visit Google: Highly Scalable

Slide 15

Slide 15 text

Facebook doesn’t crash in the afternoon

Slide 16

Slide 16 text

Fault Tolerant: Facebook doesn’t crash in the afternoon

Slide 17

Slide 17 text

Reactive Principles

Slide 18

Slide 18 text

What is a Distributed System?

Slide 19

Slide 19 text

Your computer?

Slide 20

Slide 20 text

Your computer?

Slide 21

Slide 21 text

Your computer? NO.

Slide 22

Slide 22 text

No global clock Distributed Systems

Slide 23

Slide 23 text

You have a distributed system when the crash of a computer you’ve never heard of stops you from getting any work done. Leslie Lamport: A Guide to Building Dependable Distributed Systems.

Slide 24

Slide 24 text

A collection of independent computers that appears to its users as one computer. Tanenbaum and van Steen: Distributed Systems: Principles and Paradigms

Slide 25

Slide 25 text

Collection of interconnected nodes

Slide 26

Slide 26 text

Distributed Architecture

Slide 27

Slide 27 text

8 Fallacies of Distributed Systems •The network is reliable. •Latency is zero. •Bandwidth is infinite. •The network is secure. •Topology doesn't change. •There is one administrator. •Transport cost is zero. •The network is homogeneous.

Slide 28

Slide 28 text

Distributed System Examples •amazon.com •Cassandra database (unlike a local SQLite db)

Slide 29

Slide 29 text

How do we scale?

Slide 30

Slide 30 text

Coffeeshop Example

Slide 31

Slide 31 text

Regular small app [Diagram: a customer, Akira, orders "One Americano Please!"]

Slide 32

Slide 32 text

Regular small app [Diagram: the order "Akira - Americano" is handled by the Head Barista and the Assistant Baristas]

Slide 33

Slide 33 text

Starbucks becomes popular •Starbucks needs to serve more customers now

Slide 34

Slide 34 text

Scaling Strategies •Read Replication •Sharding •Consistent Hashing

Slide 35

Slide 35 text

Read Replication

Slide 36

Slide 36 text

Read Replication [Diagram: the order "Akira - Americano" is taken by the Head Barista and copies of it appear at the Assistant Baristas]

Slide 37

Slide 37 text

Issues for Read Replication •Complexity •Consistency

Slide 38

Slide 38 text

Sharding

Slide 39

Slide 39 text

Sharding •Add more writers (i.e. Head Baristas) •Split orders by some key (e.g. the customer's name)
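
A minimal sketch of splitting orders by key, assuming each Head Barista owns an inclusive range of customer initials. The `Shard` type, the `shardFor` function and the letter ranges are hypothetical and chosen only for illustration.

```scala
object ShardingSketch {
  // Hypothetical shard table: each Head Barista owns an inclusive range of initials.
  final case class Shard(name: String, from: Char, to: Char)

  val shards: List[Shard] = List(
    Shard("Barista 1", 'A', 'C'),
    Shard("Barista 2", 'J', 'L')
  )

  // Route an order to the shard whose range contains the customer's initial.
  // Returns None when no shard covers the initial (or the name is empty).
  def shardFor(customer: String): Option[Shard] =
    customer.headOption.map(_.toUpper).flatMap { c =>
      shards.find(s => c >= s.from && c <= s.to)
    }
}
```

With this table, `ShardingSketch.shardFor("Akira")` resolves to Barista 1 and `ShardingSketch.shardFor("Koichi")` to Barista 2. A name outside every range gets no shard, which hints at why sharding limits the data access patterns.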

Slide 40

Slide 40 text

Sharding [Diagram: orders are split by customer name across two baristas. Barista 1 handles Akira to Chang (A-C) and receives the "Akira - Americano" orders. Barista 2 handles Jim to Lorenzo (J-L) and receives the "Koichi - Cappuccino" orders.]

Slide 41

Slide 41 text

Issues with Sharding •Limited data model •More complexity •Limited data access patterns •Only good for certain kinds of applications, e.g. SaaS apps

Slide 42

Slide 42 text

Consistent Hashing

Slide 43

Slide 43 text

Consistent Hashing and Random Trees Karger et al. at MIT, 1997

Slide 44

Slide 44 text

[Diagram: hash ring with positions 0000 to 14000 in steps of 2000. Incoming order: Akira-Americano]

Slide 45

Slide 45 text

[Diagram: hash ring with positions 0000 to 14000 in steps of 2000. The order now carries a hashed key: 9F72-Americano]

Slide 46

Slide 46 text

[Diagram: hash ring with positions 0000 to 14000 in steps of 2000. 9F72-Americano is placed on the ring using the comparison 9F72 > 8000]

Slide 47

Slide 47 text

What if node crashes?

Slide 48

Slide 48 text

[Diagram: hash ring with positions 0000 to 14000 in steps of 2000. A node crashes. Order: 9F72-Americano]

Slide 49

Slide 49 text

[Diagram: hash ring with positions 0000 to 14000 in steps of 2000. 9F72-Americano is stored on three nodes. N = 3 (Replication Factor)]
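
A rough sketch of the ring lookup shown on the last few slides, under several assumptions: the ring has eight nodes with hexadecimal tokens 0x0000 to 0xE000, a key belongs to the node with the greatest token not above its hash (matching "9F72 > 8000"), and the remaining replicas are the next nodes clockwise. The names `RingSketch` and `replicasFor` are hypothetical, and the hash is passed in directly rather than computed.

```scala
object RingSketch {
  // Assumed ring: eight node tokens, 0x0000, 0x2000, ..., 0xE000.
  val tokens: Vector[Int] = Vector.tabulate(8)(_ * 0x2000)

  // Returns the tokens of the n nodes that store a key with the given hash.
  // Nodes listed in `down` are skipped, as after a crash.
  def replicasFor(hash: Int, n: Int, down: Set[Int] = Set.empty): Vector[Int] = {
    val live = tokens.filterNot(down)
    if (live.isEmpty) Vector.empty
    else {
      // Owner: the live node with the greatest token <= hash (wrap to the last node).
      val i = live.lastIndexWhere(_ <= hash)
      val owner = if (i == -1) live.size - 1 else i
      // Replicas: the owner plus the following nodes clockwise.
      Vector.tabulate(math.min(n, live.size))(k => live((owner + k) % live.size))
    }
  }
}
```

With N = 3, `RingSketch.replicasFor(0x9F72, 3)` picks the nodes at 0x8000, 0xA000 and 0xC000. If the node at 0x8000 is down, the same call with `down = Set(0x8000)` picks 0x6000, 0xA000 and 0xC000, so two of the three replicas are unaffected by the crash.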

Slide 50

Slide 50 text

Consistency formula R + W > N • N = total number of replicas (e.g. 3) • W = number of replicas that acknowledge an update • R = number of replicas that agree on a read
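
One way to see why R + W > N matters: when it holds, every possible read set shares at least one replica with every possible write set, so a read always reaches a replica that saw the latest acknowledged update. The sketch below checks this by brute force; `QuorumSketch` and its function names are hypothetical.

```scala
object QuorumSketch {
  // The rule from the slide.
  def satisfiesFormula(n: Int, w: Int, r: Int): Boolean = r + w > n

  // Brute force: does every write set of size w intersect every read set of size r?
  def quorumsAlwaysOverlap(n: Int, w: Int, r: Int): Boolean = {
    val replicas = (0 until n).toSet
    replicas.subsets(w).forall { ws =>
      replicas.subsets(r).forall(rs => (ws & rs).nonEmpty)
    }
  }
}
```

For N = 3, the choice W = 2, R = 2 satisfies the formula and the quorums always overlap. With W = 1, R = 2 the formula fails, and a read of replicas {1, 2} can miss an update acknowledged only by replica 0.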

Slide 51

Slide 51 text

When to use consistent hashing? •Scale •Transactional data that changes a lot (business transactions, not ACID) •Always available

Slide 52

Slide 52 text

CAP Theorem

Slide 53

Slide 53 text

Consistency Availability Partition Tolerance

Slide 54

Slide 54 text

Consistency

Slide 55

Slide 55 text

Availability

Slide 56

Slide 56 text

Partition Tolerance

Slide 57

Slide 57 text

CAP Theorem •Partitions can’t be negotiated away. They are reality. •You have to compromise on either Availability or Consistency

Slide 58

Slide 58 text

Distributed Transactions

Slide 59

Slide 59 text

ACID? Distributed Transactions

Slide 60

Slide 60 text

Ordering Coffee •Receive Order •Process Payment •Enqueue order (mark the cup) •Make Coffee •Deliver Drink

Slide 61

Slide 61 text

Why split the work? •Parallelization •Uneven workloads

Slide 62

Slide 62 text

What could go wrong? •Payment failure •Insufficient resources •Equipment failure •Worker failure •Consumer failure

Slide 63

Slide 63 text

Response to failure? •Write-off (throw it out) •Retry (Typical) •Compensating action
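
A small sketch of two of these responses combined: retry a failing step a bounded number of times, then fall back to a compensating action. The names `attempt`, `Done` and `Compensated` are hypothetical, and what the compensating action does (for example, refunding the payment) is left to the caller.

```scala
import scala.util.Try

object FailureResponseSketch {
  sealed trait Outcome
  case object Done extends Outcome        // the step eventually succeeded
  case object Compensated extends Outcome // retries exhausted, compensation ran

  // Run `step` up to `maxAttempts` times, stopping at the first success.
  // If every attempt throws, run `compensate` once.
  def attempt(step: () => Unit, maxAttempts: Int, compensate: () => Unit): Outcome = {
    val succeeded = (1 to maxAttempts).exists(_ => Try(step()).isSuccess)
    if (succeeded) Done
    else {
      compensate()
      Compensated
    }
  }
}
```

Retrying is only safe when the step can be repeated without side effects piling up, which is one reason the compensating action is kept separate from the step itself.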

Slide 64

Slide 64 text

Distributed Transactions •How can we design a coffeeshop with atomic transactions?

Slide 65

Slide 65 text

Distributed Computation

Slide 66

Slide 66 text

Distributed Computation Strategies •Scatter-Gather •Map Reduce
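
A minimal scatter-gather sketch using plain Scala Futures on a single JVM, assuming a word-count task chosen only for illustration. The work is scattered as one Future per chunk and the partial results are gathered and summed. In a real distributed setting the chunks would go to different nodes; `countWords` is a hypothetical name.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ScatterGatherSketch {
  def countWords(chunks: Seq[String]): Int = {
    // Scatter: one independent Future per chunk.
    val partials: Seq[Future[Int]] =
      chunks.map(c => Future(c.split("\\s+").count(_.nonEmpty)))
    // Gather: wait for all partial counts and combine them.
    Await.result(Future.sequence(partials), 5.seconds).sum
  }
}
```

Map Reduce follows the same shape: the per-chunk function is the map step and the final sum is the reduce step.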

Slide 67

Slide 67 text

Fault Tolerance

Slide 68

Slide 68 text

Fault Tolerant System •Embraces the notion of failure

Slide 69

Slide 69 text

An ideal-world system has •Two paths •Components that can never fail •A recovery action that accounts for every possible fault

Slide 70

Slide 70 text

Most regular applications •Catch-all mechanism •Terminate as soon as uncaught failure arises

Slide 71

Slide 71 text

Java Concurrent Logs Processor [Diagram components: FileWatcher, LogFile, LogProcessor (several), Row, DbWriter, Database Connection] The database connection might break.

Slide 72

Slide 72 text

Java Concurrent Logs Processor [Diagram: on the FileWatcher thread, runnable.run() calls logProcessor.process(file), which calls dbWriter.write(row5), which calls con.write(row5) and throws DBBrokenConnectionException. dbWriter writes using the db connection.] The exception moves up the stack on the thread. We don’t have the connection details here to re-create dbWriter and retry. The exception can happen on different threads: many log processors are called to process files from several threads.

Slide 73

Slide 73 text

Difficult to recover •Exception moves up the stack •Processed lines and connection info would have to be made available •Breaks simple design •Violates best practices (Encapsulation, DI, SRP)

Slide 74

Slide 74 text

What we need is •A faulty component replaced in a thread-safe manner

Slide 75

Slide 75 text

Fault tolerant requirements •Fault Isolation •Structure •Redundancy •Replacement •Reboot •Suspend •Separation of concerns

Slide 76

Slide 76 text

Resilience

Slide 77

Slide 77 text

Capacity to recover from difficulties

Slide 78

Slide 78 text

Why care about resiliency? •Financial losses •Losing customers •Affecting customers

Slide 79

Slide 79 text

Actor Model

Slide 80

Slide 80 text

Actor Model Carl Hewitt, 1973

Slide 81

Slide 81 text

Actors •Higher-level abstraction for writing concurrent and distributed programs •Keeps state manageable under concurrency •Communication with an actor is by sending messages

Slide 82

Slide 82 text

Akka

Slide 83

Slide 83 text

Akka •Created July 2009 by Jonas Bonér •https://github.com/akka/akka •Written in Scala

Slide 84

Slide 84 text

Akka •Open-source toolkit •Simplifies concurrency and distribution

Slide 85

Slide 85 text

Akka •Actor Model on JVM •Emphasizes actor-based concurrency (inspiration drawn from Erlang)

Slide 86

Slide 86 text

Why Akka?

Slide 87

Slide 87 text

Let it crash • Akka provides two separate flows: • Normal flow • Fault recovery flow

Slide 88

Slide 88 text

Let it crash •The normal flow consists of actors that handle normal messages •The recovery flow consists of actors that monitor the actors in the normal flow

Slide 89

Slide 89 text

Akka Concurrent Logs Processor [Diagram: LogProcessingSupervisor above FileWatcher, LogProcessor and DbWriter. Faults and responses: DiskError: Stop. CorruptFileException: Resume. DbBrokenConnectionException: Restart. Otherwise: Escalate.] Actors in the log-processing application do not concern themselves with fault recovery. Supervisors can decide to escalate a problem to a higher level. The LogProcessingSupervisor creates all the actors at startup and supervises them all.

Slide 90

Slide 90 text

Akka Concurrent Logs Processor [Diagram: LogProcessingSupervisor, FileWatcher, LogProcessor, DbWriter. Faults and responses: DiskError: Stop. CorruptFileException: Resume. DbBrokenConnectionException: Restart. DbNodeDownException: Stop.] The LogProcessor also watches the dbWriter and replaces it once it is terminated due to a DbNodeDownException.
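
The fault-to-response mapping in these diagrams can be modelled in plain Scala. This is not Akka's supervision API: the `Directive` objects below are stand-ins that borrow the names Resume, Restart, Stop and Escalate, and the exception classes are assumed definitions based only on the names shown on the slides.

```scala
object SupervisionSketch {
  // Assumed definitions for the faults named on the slides.
  class DiskError(msg: String) extends Error(msg)
  class CorruptFileException(msg: String) extends Exception(msg)
  class DbBrokenConnectionException(msg: String) extends Exception(msg)
  class DbNodeDownException(msg: String) extends Exception(msg)

  // Plain-Scala stand-ins for the four supervisor responses.
  sealed trait Directive
  case object Resume extends Directive   // keep the component, skip the bad input
  case object Restart extends Directive  // replace the component with a fresh one
  case object Stop extends Directive     // terminate the component
  case object Escalate extends Directive // hand the problem to the parent

  // The recovery flow: one place that maps each fault to a response.
  def decide(fault: Throwable): Directive = fault match {
    case _: DiskError                   => Stop
    case _: CorruptFileException        => Resume
    case _: DbBrokenConnectionException => Restart
    case _: DbNodeDownException         => Stop
    case _                              => Escalate
  }
}
```

Keeping this mapping outside the worker code is the point of the two flows: FileWatcher, LogProcessor and DbWriter contain no recovery logic of their own.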

Slide 91

Slide 91 text

Benefits of Let it Crash •Fault isolation •Structure •Redundancy •Replacement •Reboot •Component lifecycle •Suspend •Separation of concerns

Slide 92

Slide 92 text

Akka - What’s in it?

Slide 93

Slide 93 text

Akka Layers •Akka Core •Akka IO •Akka Remote (implements distribution) •Akka Cluster (mobility) •Akka Cluster Extensions (patterns: Singleton, Sharding, PubSub)

Slide 94

Slide 94 text

Akka provides •Local actor •Remote actor •Scheduling •Routing •Cluster •Cluster Singleton •Sharding •Persistence •Distributed Data

Slide 95

Slide 95 text

Local Actor on JVM1 Local Actor on JVM

Slide 96

Slide 96 text

Local Actor on JVM Local Actor on JVM1 Coordinator Actor

Slide 97

Slide 97 text

Local Actor on JVM Coordinator Actor println(“Hello World”)

Slide 98

Slide 98 text

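import akka.actor.{Actor, ActorSystem, Props}

// An actor that prints every message it receives
class Worker extends Actor {
  def receive = {
    case x => println(x)
  }
}

val system = ActorSystem("ExampleActorSystem")
val workerActorRef = system.actorOf(Props[Worker])
// `!` ("tell") sends the message asynchronously
workerActorRef ! "Hello World"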

Slide 99

Slide 99 text

Akka Remoting

Slide 100

Slide 100 text

Actor on JVM 1 Actor on JVM 2 Akka Remoting

Slide 101

Slide 101 text

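// Snippet 1: local actor, created in this actor system
val workerActorRef = system.actorOf(Props[Worker])
workerActorRef ! "Hello Conference"

// Snippet 2: remote actor, looked up by its path on another JVM
val workerActorRef = context.actorSelection("akka.tcp://[email protected]:9005/user/WorkerActor")
workerActorRef ! "Hello World"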

Slide 102

Slide 102 text

akka {
  actor {
    provider = "akka.remote.RemoteActorRefProvider"
  }
  remote {
    enabled-transports = ["akka.remote.netty.tcp"]
    netty.tcp {
      hostname = "127.0.0.1"
      port = 2552
    }
  }
}

Slide 103

Slide 103 text

Akka Cluster

Slide 104

Slide 104 text

Akka Cluster •Fault-tolerant membership service for Akka nodes, built on top of Akka Remoting •No single point of failure or bottleneck •Supports load balancing and failover

Slide 105

Slide 105 text

Akka Cluster •Allows the number of nodes to grow or shrink dynamically •An actor can reside anywhere in the cluster, local or remote

Slide 106

Slide 106 text

Akka Cluster Node Types •Node •Cluster Leader •Seed Node

Slide 107

Slide 107 text

Akka Cluster [Diagram: Cluster (node 1, node 2, node 3, node 4). Each node holds an actor system with user actors a, b, c, d, e.] This cluster is a ring of nodes. Every node contains an actor system. The actor systems need to have the same name to be part of the same cluster. A list of member nodes is maintained in the current cluster state. The actor systems gossip to each other about this state.

Slide 108

Slide 108 text

Akka Cluster features •Cluster membership •Load balancing •Node partitioning •Partition Points •Gossip & Convergence •Failure Detection

Slide 109

Slide 109 text

Akka Gossip Protocol •Decentralised, probabilistic, viral communication protocol with convergence •Each node holds the state of the cluster and tells its neighbours about it

Slide 110

Slide 110 text

Gossip (Communication Between Nodes) [Diagram: nodes, Convergence]

Slide 111

Slide 111 text

Gossip (Communication Between Nodes) [Diagram: more nodes, Convergence]

Slide 112

Slide 112 text

How do we detect when a node fails?

Slide 113

Slide 113 text

Failure Detector

Slide 114

Slide 114 text

Failure Detector [Diagram: nodes, one marked "!", Failure Detection]

Slide 115

Slide 115 text

Failure Detector [Diagram: nodes]

Slide 116

Slide 116 text

Cluster: minimal setup with 3 seeds, 2 masters, 3 workers •Seed nodes (1, 2, 3): Node 1, Node 2, Node 3 •Master nodes (4, 5): Node 4, Node 5 •Worker nodes (6, 7, 8): Node 6, Node 7, Node 8

Slide 117

Slide 117 text

Cluster Seed Nodes: 1 Node 1: Seed Role

Slide 118

Slide 118 text

Cluster. Seed nodes: (1, 2). Joining: (3). [Diagram: Node 1 (seed role), Node 2 (seed role); Node 3 (seed role) joins.]

Slide 119

Slide 119 text

Cluster. Seed nodes: (1, 2, 3). Joining nodes: (4, 5). [Diagram: Node 4 (worker role) and Node 5 (master role) each join using the seed list (1, 2, 3).] Node 2 responds fastest and handles the join of node 4. Node 3 responds fastest and handles the join of node 5.

Slide 120

Slide 120 text

Cluster. Seed nodes: (2). Master nodes: (5). Worker nodes: (4). Joining nodes: (6, 7). [Diagram: Node 6 and Node 7 (worker role) join using the seed list (1, 2, 3); two Leave actions are shown. Other labels: Node 1, Node 2 and Node 3 (seed role), Node 4 (worker role), Node 5 (master role).] Node 2 responds fastest and handles the join of node 4.

Slide 121

Slide 121 text

Cluster: minimal setup with 3 seeds, 2 masters, 3 workers •Seed nodes (1, 2, 3): Node 1, Node 2, Node 3 •Master nodes (4, 5): Node 4, Node 5 •Worker nodes (6, 7, 8): Node 6, Node 7, Node 8

Slide 122

Slide 122 text

Cluster. Leader: Node 1. Node 1: Leaving. Node 2: Up. Node 3: Up. [Diagram: seed node 1 sends Leave; seed nodes 2 and 3 remain.]

Slide 123

Slide 123 text

Cluster. Leader: Node 1. Node 1: Exiting, unreachable. Node 2: Up. Node 3: Up. [Diagram: after Leave, the cluster node of seed node 1 is shut down; seed nodes 2 and 3 remain.]

Slide 124

Slide 124 text

Cluster Leader: Node 2 Node 1: Removed Node 2: Up Node 3: Up Seed node 2 Seed node 3

Slide 125

Slide 125 text

[Diagram: cluster member states. States: Joining (initial state), Up, Leaving, Exiting, Unreachable, Down, Removed (final state). Transition labels: Join, Down, Leader action. Key: initial state, final state, state in transition.]

Slide 126

Slide 126 text

Akka Cluster Patterns •Singleton •Sharding •Routing •Distributed Pub Sub

Slide 127

Slide 127 text

Cluster Singleton

Slide 128

Slide 128 text

Cluster Singleton [Diagram: nodes a, b, c, d, e and the singleton actor S]

Slide 129

Slide 129 text

Cluster Singleton [Diagram: nodes a, b, c, d, e and the singleton actor S]

Slide 130

Slide 130 text

Cluster Singleton [Diagram: nodes b, c, d, e and the singleton actor S; node a is no longer shown]

Slide 131

Slide 131 text

Akka Cluster Sharding

Slide 132

Slide 132 text

Cluster Sharding •Distributes actors across several nodes •Interact with them by logical identifier (without knowing their physical location)

Slide 133

Slide 133 text

Akka Cluster Sharding [Diagram labels: Node, Shard Region, Shard, entities (E), Shard Coordinator]

Slide 134

Slide 134 text

Akka Persistence

Slide 135

Slide 135 text

Akka Persistence •Lets stateful actors persist their internal state •State can be recovered when an actor is started or restarted

Slide 136

Slide 136 text

Akka Persistence •Implemented using Event Sourcing •Only changes to an actor’s internal state are stored •The current state is never stored directly
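
The event-sourcing idea can be sketched in plain Scala, independent of the Akka Persistence API: store only the events, and rebuild the current state by replaying them in order. The event types and the order-count state below are hypothetical.

```scala
object EventSourcingSketch {
  // Hypothetical events: the only things that would be written to the journal.
  sealed trait Event
  final case class OrderPlaced(drink: String) extends Event
  final case class OrderCancelled(drink: String) extends Event

  // Derived state: open orders per drink. It is never stored itself.
  type State = Map[String, Int]

  def applyEvent(state: State, event: Event): State = event match {
    case OrderPlaced(d)    => state.updated(d, state.getOrElse(d, 0) + 1)
    case OrderCancelled(d) => state.updated(d, math.max(0, state.getOrElse(d, 0) - 1))
  }

  // Recovery on start or restart: replay the journal from an empty state.
  def recover(journal: Seq[Event]): State =
    journal.foldLeft(Map.empty[String, Int])(applyEvent)
}
```

Replaying two placed Americanos, one placed Cappuccino and one cancelled Americano yields one open order of each drink, without that state ever having been saved.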

Slide 137

Slide 137 text

Akka Distributed Data

Slide 138

Slide 138 text

Distributed Data •A set of merge-friendly data types to maintain state across cluster

Slide 139

Slide 139 text

Distributed Data •Useful to share data between nodes in Akka Cluster •Based on Conflict Free Replicated Data Types (CRDTs).
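
As an illustration of a merge-friendly data type, here is a grow-only counter (G-Counter), one of the simplest CRDTs, written in plain Scala. It is a sketch of the idea and not the Akka Distributed Data API; the node names in the usage are made up.

```scala
object GCounterSketch {
  // Grow-only counter: one slot per node, and each node only increments its own slot.
  final case class GCounter(slots: Map[String, Long] = Map.empty) {
    def increment(node: String, by: Long = 1L): GCounter =
      copy(slots = slots.updated(node, slots.getOrElse(node, 0L) + by))

    // The counter's value is the sum over all nodes.
    def value: Long = slots.values.sum

    // Merge takes the per-node maximum, so replicas can merge in any order,
    // any number of times, and still converge on the same result.
    def merge(that: GCounter): GCounter =
      GCounter((slots.keySet ++ that.slots.keySet).map { n =>
        n -> math.max(slots.getOrElse(n, 0L), that.slots.getOrElse(n, 0L))
      }.toMap)
  }
}
```

If one replica counts 2 on "node1" and another counts 3 on "node2", merging them in either order gives a value of 5, and merging a replica with itself changes nothing.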

Slide 140

Slide 140 text

Akka makes distributed systems less difficult

Slide 141

Slide 141 text

References • Akka in Action - Raymond Roestenburg • https://speakerd.s3.amazonaws.com/presentations/9a6b0f62b9ee4dc8980e5ff590f1a6cf/sheridan_college.pdf • Distributed Systems in One Lesson - Tim Berglund, http://shop.oreilly.com/product/0636920039518.do • Learning Akka - Salma Khater • Hands on Introduction to Distributed Systems Concepts with Akka Clustering - David Russell • A tour of the (advanced) Akka features in 60 minutes - Johan Janssen • Go distributed (and scale out) with Actors and Akka Clustering

Slide 142

Slide 142 text

Thank you!

Slide 143

Slide 143 text

Questions? @anildigital