Building Distributed System with Akka

anildigital

April 17, 2018

  1. Building Distributed System with Akka
    Anil Wadghule

    @anildigital


  6. The current state of the internet

  7. Size of the internet

  8. Market forces leading to change
    Concurrent connections
    •“Internet of Things”, mobile devices
    Big data
    •Size of data is overwhelming our ability to manage it
    Response times
    •Real-time results (e.g. analytics) with sub-second
    latencies

  9. Physical factors leading to change
    •Expensive hardware → cheap hardware
    •A single machine → cluster of machines
    •A single core → multiple cores
    •Slow networks → fast networks
    •Small data snapshots → big data, streams of data

  10. Modern Application Requirements
    •High Availability
    •Fault Tolerance
    •Scalability

  11. Google going down? Accepted?

  12. Google going down? Accepted?
    Highly Available

  13. Millions of people visit Google

  14. Millions of people visit Google
    Highly Scalable

  15. Facebook doesn’t crash in the afternoon

  16. Facebook doesn’t crash in the afternoon
    Fault Tolerant

  17. Reactive Principles

  18. What is a Distributed System?

  19. Your computer?


  21. Your computer? NO.

  22. Distributed Systems
    •No global clock

  23. You have a distributed system,
    when the crash of a computer
    you’ve never heard of, stops you
    from getting any work done.
    Leslie Lamport

  24. A collection of independent
    computers that appears to its
    users as one computer.
    Tanenbaum and van Steen: Distributed Systems:
    Principles and Paradigms

  25. Collection of interconnected
    nodes

  26. Distributed Architecture

  27. 8 Fallacies of Distributed Computing
    •The network is reliable.
    •Latency is zero.
    •Bandwidth is infinite.
    •The network is secure.
    •Topology doesn't change.
    •There is one administrator.
    •Transport cost is zero.
    •The network is homogeneous.

  28. Distributed System Examples
    •amazon.com
    •Cassandra database
    (unlike a local SQLite database)

  29. How do we scale?

  30. Coffeeshop Example

  31. Regular small app
    Akira: “One Americano please!”

  32. Regular small app
    The Head Barista writes down the order
    (Akira - Americano); Assistant Baristas make the drinks

  33. Starbucks becomes popular
    •Starbucks needs to serve
    more customers now

  34. Scaling Strategies
    •Read Replication
    •Sharding
    •Consistent Hashing

  35. Read Replication

  36. Read Replication
    The Head Barista’s order list (Akira - Americano) is
    copied to several Assistant Baristas, so reads can be
    served from any replica

  37. Issues for Read Replication
    •Complexity
    •Consistency

  38. Sharding

  39. Sharding
    •Add more writers (i.e. Head
    Baristas)
    •Split orders by some key
    (e.g. the customer’s name)

  40. Sharding
    Orders are split by customer name:
    Barista 1 takes Akira to Chang (A-C), e.g.
    Akira - Americano
    Barista 2 takes Jim to Lorenzo (J-L), e.g.
    Koichi - Cappuccino

  41. Issues with Sharding
    •Limited data model
    •More complexity
    •Limited data access patterns
    •Only good for certain kinds of
    applications, e.g. SaaS apps

  42. Consistent Hashing

  43. Consistent Hashing and Random Trees
    Karger et al. at MIT, 1997

  44. [A hash ring with node positions 0000, 2000, 4000,
    6000, 8000, 10000, 12000, 14000]
    Akira-Americano

  45. [The key “Akira” is hashed, giving 9F72]
    9F72-Americano

  46. 9F72 > 8000, so the order is stored on the node
    responsible for the range starting at 8000
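As a sketch, a minimal consistent-hash ring in Scala might look like this (the node names, the hash function, and the ring size are illustrative assumptions, not from the talk):

    import scala.collection.immutable.SortedMap
    import scala.util.hashing.MurmurHash3

    // A tiny consistent-hash ring: each node owns the keys from
    // its position up to the next node clockwise.
    class HashRing(nodes: Seq[String]) {
      private val ring: SortedMap[Int, String] =
        SortedMap(nodes.map(n => MurmurHash3.stringHash(n) -> n): _*)

      // A key goes to the first node at or after its hash,
      // wrapping around to the lowest position on the ring.
      def nodeFor(key: String): String = {
        val h = MurmurHash3.stringHash(key)
        val it = ring.iteratorFrom(h)
        if (it.hasNext) it.next()._2 else ring.head._2
      }
    }

    val ring = new HashRing(Seq("node-0", "node-1", "node-2", "node-3"))
    println(ring.nodeFor("Akira"))  // the node that stores Akira's order

Because positions come from the hash, adding or removing a node moves only the keys adjacent to it on the ring, not the whole keyspace.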

  47. What if node crashes?

  48. [On the same hash ring, the node storing
    9F72-Americano crashes]

  49. [9F72-Americano is replicated to 3 nodes on the ring]
    N = 3 (Replication Factor)

  50. Consistency formula
    R + W > N
    • N = Total number of replicas (e.g. 3)
    • W = Number of replicas that acknowledge my update
    • R = Number of replicas that agree on a read
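For example, with N = 3, choosing W = 2 and R = 2 gives R + W = 4 > 3: every read quorum overlaps every write quorum in at least one replica, so a read always sees the latest acknowledged write. With W = 1 and R = 1, R + W = 2 ≤ 3, and a read may miss a recent write (eventual consistency).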

  51. When to use consistent hashing?
    •Scale
    •Transactional data (business
    transactions, not ACID): data
    that changes a lot
    •Need to be always available

  52. CAP Theorem

  53. Consistency
    Availability Partition Tolerance

  54. Consistency

  55. Availability

  56. Partition Tolerance

  57. CAP Theorem
    •Partitioning can’t be
    negotiated; it’s a reality
    •You have to compromise on
    Availability or Consistency

  58. Distributed Transactions

  59. ACID?
    Distributed Transactions

  60. Ordering Coffee
    •Receive Order
    •Process Payment
    •Enqueue order (mark the cup)
    •Make Coffee
    •Deliver Drink

  61. Why split the work?
    •Parallelization
    •Uneven workloads

  62. What could go wrong?
    •Payment failure
    •Insufficient resources
    •Equipment failure
    •Worker failure
    •Consumer failure

  63. Response to failure?
    •Write-off (throw it out)
    •Retry (Typical)
    •Compensating action

  64. Distributed Transactions
    •How can we design a coffeeshop
    with atomic transactions?

  65. Distributed Computation

  66. Distributed Computation Strategies
    •Scatter-Gather
    •Map Reduce

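To make the Map Reduce shape concrete, here is a single-machine word count in plain Scala collections (an illustration of the shape only, not a distributed implementation):

    // map: emit words; shuffle: group by key; reduce: count per key
    val lines = Seq("one americano", "one cappuccino", "one americano")
    val counts = lines
      .flatMap(_.split(" "))
      .groupBy(identity)
      .map { case (word, ws) => word -> ws.size }
    println(counts)  // Map(one -> 3, americano -> 2, cappuccino -> 1)

In a distributed setting the map and reduce steps run on different nodes, with the grouping step shuffling data between them.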

  67. Fault Tolerance

  68. Fault Tolerant System
    •Embraces the notion of failure

  69. An ideal-world system has
    •Two paths
    •Components that can never fail
    •Accounting for every possible
    fault by providing a recovery
    action

  70. Most regular applications
    •Have a catch-all mechanism
    •Terminate as soon as an
    uncaught failure arises

  71. Java Concurrent Logs Processor
    A FileWatcher reads a LogFile and hands rows to several
    LogProcessors, which write through a DbWriter to a
    Database Connection
    The database connection might break

  72. Java Concurrent Logs Processor
    Many log processors are called to process files from
    several threads; each dbWriter writes rows using a db
    Connection
    When con.write(row5) fails with a
    DBBrokenConnectionException, the exception moves up the
    stack on the thread: dbWriter → logProcessor.process(file)
    → the FileWatcher thread’s runnable.run()
    We don’t have the connection details up there to re-create
    the dbWriter and retry, and exceptions can happen from
    different threads

  73. Difficult to recover
    •Exception moves up the stack
    •Making processed lines and connection
    info available up the stack
    •Breaks simple design
    •Violates best practices (Encapsulation,
    DI, SRP)

  74. What we need is
    •The faulty component replaced
    in a thread-safe manner

  75. Fault tolerant requirements
    •Fault Isolation
    •Structure
    •Redundancy
    •Replacement
    •Reboot
    •Suspend
    •Separation of concerns

  76. Resilience

  77. Capacity to recover from
    difficulties

  78. Why care about resiliency?
    •Financial losses
    •Losing customers
    •Affecting customers

  79. Actor Model

  80. Actor Model
    Carl Hewitt, 1973

  81. Actors
    •Higher-level abstraction for writing
    concurrent and distributed programs
    •Encapsulate state and manage it
    safely under concurrency
    •Communication with an actor is by
    sending messages

  82. Akka

  83. Akka
    •Created July 2009 by Jonas
    Bonér
    •https://github.com/akka/akka
    •Written in Scala

  84. Akka
    •Open-source toolkit
    •Simplifies
    •Concurrency
    •Distribution

  85. Akka
    •Actor Model on JVM
    •Emphasizes actor-based
    concurrency (inspiration drawn from
    Erlang)

  86. Why Akka?

  87. Let it crash
    • Akka provides two separate
    flows:
    • Normal flow
    • Fault recovery flow

  88. Let it crash
    •The normal flow consists of
    actors that handle normal messages
    •The recovery flow consists of
    actors that monitor the actors in the
    normal flow

  89. Akka Concurrent Logs Processor
    The LogProcessingSupervisor creates all the actors
    (FileWatcher, LogProcessor, DbWriter) at startup and
    supervises them all
    Disk Error → Stop; CorruptFileException → Resume;
    DbBrokenConnectionException → Restart; supervisors can
    decide to escalate a problem to a higher level
    Actors in the log-processing application do not concern
    themselves with fault recovery

  90. Akka Concurrent Logs Processor
    Supervision chain: LogProcessingSupervisor → FileWatcher
    → LogProcessor → DbWriter
    DiskError → Stop; CorruptFileException → Resume;
    DbBrokenConnectionException → Restart;
    DbNodeDownException → Stop
    The LogProcessor also watches the dbWriter and replaces
    it once it is terminated due to a DbNodeDownException
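A minimal sketch of such a supervisor, using the fault-to-directive mapping from the diagrams; the exception classes and the child props are assumed here so the snippet is self-contained:

    import akka.actor._
    import akka.actor.SupervisorStrategy._

    // Assumed fault types, standing in for the real ones
    class DiskError extends Error("disk failure")
    class CorruptFileException extends Exception("corrupt file")
    class DbBrokenConnectionException extends Exception("db connection broken")

    class LogProcessingSupervisor(fileWatcherProps: Props) extends Actor {
      // Map each fault to a recovery directive, as in the diagram
      override def supervisorStrategy = OneForOneStrategy() {
        case _: DiskError                   => Stop    // unrecoverable
        case _: CorruptFileException        => Resume  // skip the bad file
        case _: DbBrokenConnectionException => Restart // recreate the writer
      }

      // Create the supervised actor at startup
      val fileWatcher = context.actorOf(fileWatcherProps, "file-watcher")

      def receive = {
        case msg => fileWatcher forward msg
      }
    }

The children never catch these exceptions themselves; the normal flow stays free of recovery code.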

  91. Benefits of Let it Crash
    •Fault isolation
    •Structure
    •Redundancy
    •Replacement
    •Reboot
    •Component lifecycle
    •Suspend
    •Separation of concerns

  92. Akka - What’s in it?

  93. Akka Layers
    Akka Core
    Akka IO
    Akka Remote (implements distribution)
    Akka Cluster (mobility)
    Akka Cluster Extensions (patterns:
    Singleton, Sharding, PubSub)

  94. Akka provides
    •Local actor
    •Remote actor
    •Scheduling
    •Routing
    •Cluster
    •Cluster Singleton
    •Sharding
    •Persistence
    •Distributed Data

  95. Local Actor on JVM
    [A local actor running on JVM 1]

  96. Local Actor on JVM
    [A Coordinator Actor alongside the local actor on JVM 1]

  97. Local Actor on JVM
    The Coordinator Actor runs println("Hello World")

  98. import akka.actor._

    class Worker extends Actor {
      def receive = {
        case x =>
          println(x)
      }
    }

    val system = ActorSystem("ExampleActorSystem")
    val workerActorRef = system.actorOf(Props[Worker])
    workerActorRef ! "Hello World"

  99. Akka Remoting

  100. Actor on JVM 1 Actor on JVM 2
    Akka Remoting

  101. // Sending to a local actor
    val workerActorRef = system.actorOf(Props[Worker])
    workerActorRef ! "Hello Conference"

    // Selecting a remote actor by its address (the system name
    // and host below are assumed; the original slide obscured them)
    val remoteWorkerRef = context.actorSelection(
      "akka.tcp://ExampleActorSystem@127.0.0.1:9005/user/WorkerActor")
    remoteWorkerRef ! "Hello World"

  102. akka {
      actor {
        provider = "akka.remote.RemoteActorRefProvider"
      }
      remote {
        enabled-transports = ["akka.remote.netty.tcp"]
        netty.tcp {
          hostname = "127.0.0.1"
          port = 2552
        }
      }
    }

  103. Akka Cluster

  104. Akka Cluster
    •Fault-tolerant membership service for
    Akka nodes, built on top of Akka Remoting
    •No single point of failure or bottleneck
    •Provides support for load balancing
    and failover

  105. Akka Cluster
    •Allows the number of nodes to grow
    or shrink dynamically
    •An actor can reside anywhere in
    the cluster, local or remote
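For illustration, the configuration to join such a cluster might look like this (the system name, host, and ports are assumptions):

    akka {
      actor.provider = "akka.cluster.ClusterActorRefProvider"
      remote.netty.tcp {
        hostname = "127.0.0.1"
        port = 2551
      }
      cluster {
        seed-nodes = [
          "akka.tcp://ExampleActorSystem@127.0.0.1:2551",
          "akka.tcp://ExampleActorSystem@127.0.0.1:2552"
        ]
      }
    }

A node started with this configuration contacts the seed nodes and joins the cluster automatically.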

  106. Akka Cluster Node Types
    [A cluster of nodes: one acts as Leader, one or more are
    Seed Nodes, and the rest are ordinary Nodes]

  107. Akka Cluster
    [Nodes 1-4 form a ring; each node runs an actor system
    with the same user actor hierarchy (a, b, c, d, e)]
    This cluster is a ring of nodes.
    Every node contains an actor system. The actor systems
    need to have the same name to be part of the same
    cluster.
    A list of member nodes is maintained in the current
    cluster state. The actor systems gossip to each other
    about this state.

  108. Akka Cluster features
    •Cluster membership
    •Load balancing
    •Node partitioning
    •Partition Points
    •Gossip & Convergence
    •Failure Detection

  109. Akka Gossip Protocol
    •Decentralised, probabilistic, viral
    communication protocol with
    convergence
    •Each node holds the state of the
    cluster and tells its neighbours about it

  110. Gossip (Communication Between Nodes)
    [Three nodes all at state version 7: convergence]

  111. Gossip (Communication Between Nodes)
    [The state version (7) spreads to the remaining nodes:
    convergence]

  112. How do we detect when a
    node fails?

  113. Failure Detector

  114. Failure Detector
    [One node stops responding while the others are at
    version 7: the failure is detected]

  115. Failure Detector
    [The remaining nodes gossip about the unreachable node
    and converge on the failure]

  116. Cluster
    Minimal setup cluster: 3 seeds, 2 masters, 3 workers
    Nodes 1-3: Seed Role; Nodes 4-5: Master Role;
    Nodes 6-8: Worker Role

  117. Cluster
    Seed Nodes: (1)
    Node 1: Seed Role

  118. Cluster
    Seed Nodes: (1, 2); Joining: (3)
    Node 3 (Seed Role) sends Join to seed nodes 1 and 2

  119. Cluster
    Seed Nodes: (1, 2, 3); Joining nodes: (4, 5)
    Node 4 (Worker Role, seed list (1, 2, 3)) sends Join;
    node 2 responds fastest and handles the join of node 4
    Node 5 (Master Role, seed list (1, 2, 3)) sends Join;
    node 3 responds fastest and handles the join of node 5

  120. Cluster
    Seed nodes: (2); Master nodes: (5); Worker nodes: (4);
    Joining nodes: (6, 7)
    Nodes 1 and 3 (Seed Role) leave the cluster
    Nodes 6 and 7 (Worker Role, seed list (1, 2, 3)) send
    Join; node 2 handles the joins

  121. Cluster
    [The full minimal setup cluster again: 3 seeds (nodes
    1-3), 2 masters (nodes 4-5), 3 workers (nodes 6-8)]

  122. Cluster
    Leader: Node 1
    Node 1: Leaving; Node 2: Up; Node 3: Up
    Seed node 1 announces Leave

  123. Cluster
    Leader: Node 1
    Node 1: Exiting, then unreachable; Node 2: Up; Node 3: Up
    The cluster node on seed node 1 is shut down

  124. Cluster
    Leader: Node 2
    Node 1: Removed; Node 2: Up; Node 3: Up

  125. Cluster member lifecycle
    Joining → (leader action) → Up
    Up → Leaving → (leader action) → Exiting →
    (leader action) → Removed
    Unreachable → Down → (leader action) → Removed
    Key: Joining is the initial state; Removed is the final
    state; the rest are states in transition

  126. Akka Cluster Patterns
    •Singleton
    •Sharding
    •Routing
    •Distributed Pub Sub

  127. Cluster Singleton

  128. Cluster Singleton
    [Five nodes a-e; the singleton actor S runs on exactly
    one node]

  129. Cluster Singleton
    [The node hosting S goes down]

  130. Cluster Singleton
    [S is started again on one of the remaining nodes]
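A sketch of wiring this up with Akka's cluster singleton support, reusing the earlier Worker actor (the actor names and the message are assumptions):

    import akka.actor._
    import akka.cluster.singleton._

    val system = ActorSystem("ExampleActorSystem")

    // Start one manager per node; together they keep exactly one
    // instance of the singleton actor alive in the cluster
    system.actorOf(
      ClusterSingletonManager.props(
        singletonProps = Props[Worker],
        terminationMessage = PoisonPill,
        settings = ClusterSingletonManagerSettings(system)),
      name = "singleton-manager")

    // A proxy routes messages to wherever the singleton currently runs
    val proxy = system.actorOf(
      ClusterSingletonProxy.props(
        singletonManagerPath = "/user/singleton-manager",
        settings = ClusterSingletonProxySettings(system)),
      name = "singleton-proxy")

    proxy ! "Hello Singleton"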

  131. Akka Cluster Sharding

  132. Cluster Sharding
    •Distributes actors across several
    nodes
    •Interact with them via a logical
    identifier (without knowing their
    physical location)

  133. Akka Cluster Sharding
    [Each node hosts a Shard Region; a Shard Region contains
    shards, and each shard holds entities (E); a Shard
    Coordinator decides which shards live where]
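A sketch of starting a shard region for order entities (the Order message, the shard count, and the entity actor are assumptions for illustration):

    import akka.actor._
    import akka.cluster.sharding._

    final case class Order(customerId: Long, drink: String)

    // Route a message to an entity id and a shard id
    val extractEntityId: ShardRegion.ExtractEntityId = {
      case msg @ Order(id, _) => (id.toString, msg)
    }
    val extractShardId: ShardRegion.ExtractShardId = {
      case Order(id, _) => (id % 10).toString  // 10 shards
    }

    val system = ActorSystem("ExampleActorSystem")
    val orderRegion: ActorRef = ClusterSharding(system).start(
      typeName = "order",
      entityProps = Props[Worker],  // the entity actor (assumed)
      settings = ClusterShardingSettings(system),
      extractEntityId = extractEntityId,
      extractShardId = extractShardId)

    // Address the entity by logical id; sharding locates
    // (or creates) it somewhere in the cluster
    orderRegion ! Order(42L, "Americano")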

  134. Akka Persistence

  135. Akka Persistence
    •Enables stateful actors to persist their
    internal state
    •State can be recovered when an actor
    is started or restarted

  136. Akka Persistence
    •Implemented using Event Sourcing
    •Only changes to an actor’s internal
    state are stored
    •The current state is never stored directly
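A sketch of an event-sourced actor with Akka Persistence (the event and actor names are assumptions):

    import akka.persistence._

    case class OrderPlaced(drink: String)

    class OrderBook extends PersistentActor {
      override def persistenceId = "order-book-1"

      var orders: List[String] = Nil

      // Commands: persist an event, then update in-memory state
      def receiveCommand = {
        case drink: String =>
          persist(OrderPlaced(drink)) { evt =>
            orders = evt.drink :: orders
          }
      }

      // Recovery: replay the stored events to rebuild the state
      def receiveRecover = {
        case OrderPlaced(drink) => orders = drink :: orders
      }
    }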

  137. Akka Distributed Data

  138. Distributed Data
    •A set of merge-friendly data types
    to maintain state across cluster

  139. Distributed Data
    •Useful to share data between nodes
    in an Akka Cluster
    •Based on Conflict-Free Replicated
    Data Types (CRDTs)
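A sketch of a replicated grow-only counter with Distributed Data (the key name is an assumption):

    import akka.actor._
    import akka.cluster.Cluster
    import akka.cluster.ddata._
    import akka.cluster.ddata.Replicator._

    val system = ActorSystem("ExampleActorSystem")
    implicit val node = Cluster(system)

    val replicator = DistributedData(system).replicator
    val CounterKey = GCounterKey("ordersServed")

    // Update locally; the CRDT merge function reconciles
    // concurrent updates from other nodes without conflicts
    replicator ! Update(CounterKey, GCounter.empty, WriteLocal)(_ + 1)

    // Read the (eventually consistent) value
    replicator ! Get(CounterKey, ReadLocal)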

  140. Akka makes distributed systems
    less difficult

  141. References
    • Akka in Action by Raymond Roestenburg
    • https://speakerd.s3.amazonaws.com/presentations/9a6b0f62b9ee4dc8980e5ff590f1a6cf/sheridan_college.pdf
    • Distributed Systems in One Lesson by Tim Berglund
    (http://shop.oreilly.com/product/0636920039518.do)
    • Learning Akka by Salma Khater
    • Hands-on Introduction to Distributed Systems Concepts with
    Akka Clustering by David Russell
    • A Tour of the (Advanced) Akka Features in 60 Minutes by
    Johan Janssen
    • Go Distributed (and Scale Out) with Actors and Akka
    Clustering

  142. Thank you!

  143. Questions?
    @anildigital
