Building Distributed System with Akka

anildigital

April 17, 2018

Transcript

  1. Building Distributed System with Akka Anil Wadghule
 @anildigital

  6. Current scene in internet

  7. Size of the internet

  8. Market forces leading to change
    Concurrent connections
    •“Internet of Things”, mobile devices
    Big data
    •Size of data is overwhelming our ability to manage it
    Response times
    •Real-time results (e.g. analytics) with sub-second latencies
  9. Physical factors leading to change
    •Expensive hardware → cheap hardware
    •A single machine → cluster of machines
    •A single core → multiple cores
    •Slow networks → fast networks
    •Small data snapshots → big data, streams of data
  10. Modern Application Requirements High Availability Fault Tolerance Scalability

  11. Google going down? Accepted?

  12. Google going down? Accepted? Highly Available

  13. Millions of people visit Google

  14. Millions of people visit Google Highly Scalable

  15. Facebook doesn’t crash in the afternoon

  16. Fault Tolerant Facebook doesn’t crash in the afternoon

  17. Reactive Principles

  18. What is a Distributed System?

  19. Your computer?

  20. Your computer?

  21. Your computer? NO.

  22. No global clock Distributed Systems

  23. You have a distributed system, when the crash of a computer you’ve never heard of, stops you from getting any work done.
    Leslie Lamport: A Guide to Building Dependable Distributed Systems.
  24. A collection of independent computers that appears to its users as one computer.
    Tanenbaum and van Steen: Distributed Systems, Principles and Paradigms
  25. Collection of interconnected nodes

  26. Distributed Architecture

  27. 8 Fallacies of Distributed Systems
    •The network is reliable.
    •Latency is zero.
    •Bandwidth is infinite.
    •The network is secure.
    •Topology doesn't change.
    •There is one administrator.
    •Transport cost is zero.
    •The network is homogeneous.
  28. Distributed System Examples •amazon.com •Cassandra database (unlike a local SQLite db)

  29. How do we scale?

  30. Coffeeshop Example

  31. Regular small app [Diagram: customer Akira says “One Americano Please!”]

  32. Regular small app [Diagram: order “Akira - Americano”, Head Barista, Assistant Baristas]
  33. Starbucks gets popular •Starbucks needs to serve more customers now

  34. Scaling Strategies •Read Replication •Sharding •Consistent Hashing

  35. Read Replication

  36. Read Replication [Diagram: the order “Akira - Americano” held by the Head Barista and repeated at each Assistant Barista]
  37. Issues for Read Replication •Complexity •Consistency

  38. Sharding

  39. Sharding
    •Add more writers (i.e. Head Baristas)
    •Split orders with some key (i.e. customer's name)
  40. Sharding [Diagram: orders are split by customer name. Barista 1 handles Akira to Chang (A-C), e.g. “Akira - Americano”; Barista 2 handles Jim to Lorenzo (J-L), e.g. “Koichi - Cappuccino”]
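The key-based split on the sharding slides can be sketched in a few lines of plain Scala. This is only an illustration: the object name `NameSharding` and the function `shardFor` are hypothetical, and the two ranges are the ones visible in the diagram.

```scala
// Sketch: route an order to a barista by the first letter of the customer's name.
object NameSharding {
  // (first letter, last letter, owner); only the two ranges shown on the slide.
  val shards: Seq[(Char, Char, String)] = Seq(
    ('A', 'C', "Barista 1"),
    ('J', 'L', "Barista 2")
  )

  // Returns None when no shard covers the name, which hints at the
  // "limited data access patterns" issue of sharding.
  def shardFor(customer: String): Option[String] =
    customer.headOption.map(_.toUpper).flatMap { first =>
      shards.collectFirst {
        case (lo, hi, barista) if first >= lo && first <= hi => barista
      }
    }
}
```

For example, `NameSharding.shardFor("Akira")` selects Barista 1 and `NameSharding.shardFor("Koichi")` selects Barista 2.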
  41. Issues with Sharding
    •Limited data model
    •More complexity
    •Limited data access patterns
    •Only good for certain kinds of applications, e.g. SaaS apps
  42. Consistent Hashing

  43. Consistent Hashing and Random Trees, Karger et al. at MIT, 1997
  44. [Hash ring diagram: positions 0000, 2000, 4000, 6000, 8000, 10000, 12000, 14000] Akira-Americano

  45. [Hash ring diagram: positions 0000 to 14000] 9F72-Americano

  46. [Hash ring diagram: positions 0000 to 14000] 9F72-Americano, 9F72 > 8000
  47. What if a node crashes?

  48. [Hash ring diagram: positions 0000 to 14000] Node crashes, 9F72-Americano

  49. [Hash ring diagram: positions 0000 to 14000] 9F72-Americano stored on three nodes, N = 3 (Replication Factor)
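A minimal sketch of the ring lookup and the replication factor from these slides, in plain Scala. It assumes eight hypothetical nodes at evenly spaced 16-bit hexadecimal tokens and the convention suggested by "9F72 > 8000": a key belongs to the node with the largest token not above the key's hash. The names `HashRing` and `replicas` are made up for the example.

```scala
import scala.collection.immutable.TreeMap

object HashRing {
  // Eight hypothetical nodes at tokens 0x0000, 0x2000, ..., 0xE000.
  val ring: TreeMap[Int, String] =
    TreeMap.from((0 until 8).map(i => (i * 0x2000, s"node-$i")))

  // Preference list: the owner (largest token <= position) followed by the
  // next nodes clockwise, skipping crashed nodes, until n replicas are found.
  def replicas(position: Int, n: Int, down: Set[String] = Set.empty): Vector[String] = {
    val tokens = ring.keys.toVector
    val start = math.max(tokens.lastIndexWhere(_ <= position), 0)
    Iterator
      .tabulate(tokens.size)(i => ring(tokens((start + i) % tokens.size)))
      .filterNot(down)
      .take(n)
      .toVector
  }
}
```

With N = 3, `HashRing.replicas(0x9F72, 3)` returns the node at token 0x8000 and its two clockwise successors. If that first node is in the `down` set, the list simply shifts one node further around the ring.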
  50. Consistency formula: R + W > N
    • N = Total number of replicas (e.g. 3)
    • W = Number of replicas that acknowledge my update
    • R = Number of replicas that agree on a read
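The formula can be checked with simple arithmetic. The sketch below is plain Scala with a hypothetical helper name; it only encodes the inequality from the slide.

```scala
object Quorum {
  // R + W > N means every read quorum overlaps every write quorum in at
  // least one replica, so a read always sees the latest acknowledged write.
  def readsSeeLatestWrite(n: Int, r: Int, w: Int): Boolean = r + w > n
}
```

With N = 3, choosing W = 2 and R = 2 satisfies the formula (2 + 2 > 3), while W = 1 and R = 1 does not (1 + 1 is not greater than 3), so a read could hit a replica that has not yet received the update.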
  51. When to use consistent hashing?
    •Scale
    •Transactional data (business transactions, not ACID)
    •Data which changes a lot
    •Always available
  52. CAP Theorem

  53. Consistency Availability Partition Tolerance

  54. Consistency

  55. Availability

  56. Partition Tolerance

  57. CAP Theorem
    •Partitioning can’t be negotiated. It’s reality.
    •You have to compromise Availability or Consistency
  58. Distributed Transactions

  59. ACID? Distributed Transactions

  60. Ordering Coffee
    •Receive Order
    •Process Payment
    •Enqueue order (mark the cup)
    •Make Coffee
    •Deliver Drink
  61. Why split the work? •Parallelization •Uneven workloads

  62. What could go wrong?
    •Payment failure
    •Insufficient resources
    •Equipment failure
    •Worker failure
    •Consumer failure
  63. Response to failure?
    •Write-off (throw it out)
    •Retry (Typical)
    •Compensating action
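The "retry" response can be sketched as a small helper in plain Scala. The name `Retry.retry` and the attempt limit are assumptions for illustration; when the attempts run out, the caller still has to choose between a write-off and a compensating action.

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

object Retry {
  // Runs op up to `attempts` times and returns the first success,
  // or the last failure once the attempts are used up.
  @tailrec
  def retry[A](attempts: Int)(op: () => A): Try[A] =
    Try(op()) match {
      case s @ Success(_)               => s
      case Failure(_) if attempts > 1   => retry(attempts - 1)(op)
      case f                            => f
    }
}
```

A coffee machine that fails twice and then works would succeed under `Retry.retry(3)(...)`, whereas a machine that is permanently broken yields a `Failure` that the barista has to handle some other way.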
  64. Distributed Transactions
    •How can we design a coffeeshop with atomic transactions?
  65. Distributed Computation

  66. Distributed Computation Strategies •Scatter-Gather •Map Reduce
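Scatter-gather can be sketched with standard-library `Future`s: the work is scattered as independent tasks and the partial results are gathered into one answer. The word-count task and the names `ScatterGather.countWords` are hypothetical, and the tasks here run on local threads rather than on remote nodes.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ScatterGather {
  def countWords(chunks: Seq[String]): Future[Int] = {
    // Scatter: one independent task per chunk.
    val partials: Seq[Future[Int]] =
      chunks.map(chunk => Future(chunk.split("\\s+").count(_.nonEmpty)))
    // Gather: wait for all partial counts and combine them.
    Future.sequence(partials).map(_.sum)
  }
}
```

A caller can block for the combined result with `Await.result(ScatterGather.countWords(chunks), 5.seconds)`; in a real system the gather step would also need a policy for slow or failed workers.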

  67. Fault Tolerance

  68. Fault Tolerant System •Embraces the notion of failure

  69. Ideal world system has
    •Two paths
    •Components that can never fail
    •Accounting for every possible fault by providing a recovery action
  70. Most regular applications
    •Catch-all mechanism
    •Terminate as soon as uncaught failure arises
  71. Java Concurrent Logs Processor [Diagram: FileWatcher, LogFile, LogProcessor (several), Row, DbWriter, Database Connection]
    •The database connection might break
  72. Java Concurrent Logs Processor [Diagram: call stack on the FileWatcher thread: runnable.run(), logProcessor.process(file), dbWriter.write(row5), con.write(row5), which throws DBBrokenConnectionException]
    •Exception moves up the stack on the thread
    •We don’t have the connection details here to re-create dbWriter and retry
    •dbWriter writes using the db connection
    •Exception can happen from different threads
    •Many log processors are called to process files from several threads
  73. Difficult to recover
    •Exception moves up in stack
    •Making processed lines and connection info available
    •Breaks simple design
    •Violates best practices (Encapsulation, DI, SRP)
  74. What we need is
    •Faulty component replaced in a thread-safe manner
  75. Fault tolerant requirements
    •Fault Isolation
    •Structure
    •Redundancy
    •Replacement
    •Reboot
    •Suspend
    •Separation of concerns
  76. Resilience

  77. Capacity to recover from difficulties

  78. Why care about resiliency? •Financial losses •Losing customers •Affecting customers

  79. Actor Model

  80. Actor Model Carl Hewitt, 1973

  81. Actors
    •Higher level abstraction to write concurrent and distributed programs
    •A way to manage state concurrently
    •Communication with an actor is by sending messages
  82. Akka

  83. Akka
    •Created July 2009 by Jonas Bonér
    •https://github.com/akka/akka
    •Written in Scala
  84. Akka •Open-source toolkit •Simplifies •Concurrency •Distribution

  85. Akka
    •Actor Model on JVM
    •Emphasizes actor-based concurrency (inspiration drawn from Erlang)
  86. Why Akka?

  87. Let it crash
    •Akka provides two separate flows:
    •Normal flow
    •Fault recovery flow
  88. Let it crash
    •The normal flow consists of actors that handle normal messages
    •The recovery flow consists of actors that monitor the actors in the normal flow
  89. Akka Concurrent Logs Processor [Diagram: LogProcessingSupervisor, FileWatcher, LogProcessor, DbWriter. Directives shown: Stop (Disk Error), Resume (CorruptFileException), Restart (DbBrokenConnectionException), Escalate]
    •Actors in the log-processing application do not concern themselves with fault recovery
    •Supervisors can decide to escalate a problem to a higher level
    •The LogProcessingSupervisor creates all the actors at startup and supervises them all
  90. Akka Concurrent Logs Processor [Diagram: LogProcessingSupervisor, FileWatcher, LogProcessor, DbWriter. Directives shown: Stop (DiskError), Resume (CorruptFileException), Restart (DbBrokenConnectionException), Stop (DbNodeDownException)]
    •The LogProcessor also watches the dbWriter and replaces it once it is terminated due to a DbNodeDownException
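The mapping from failures to recovery actions on these two slides can be modelled as one pure function. This is a plain-Scala sketch, not Akka code: the `Directive` objects only mirror the names Stop, Resume, Restart and Escalate from the diagrams, and the exception classes are hypothetical stand-ins for the ones named there.

```scala
object SupervisionSketch {
  sealed trait Directive
  case object Resume extends Directive
  case object Restart extends Directive
  case object Stop extends Directive
  case object Escalate extends Directive

  // Hypothetical failure types named after the labels in the diagrams.
  class DiskError(msg: String) extends Error(msg)
  class CorruptFileException(msg: String) extends Exception(msg)
  class DbBrokenConnectionException(msg: String) extends Exception(msg)
  class DbNodeDownException(msg: String) extends Exception(msg)

  // The supervisor's decision: which recovery action fits which failure.
  val decide: Throwable => Directive = {
    case _: DiskError                   => Stop     // unrecoverable
    case _: CorruptFileException        => Resume   // skip the bad file, keep state
    case _: DbBrokenConnectionException => Restart  // fresh writer, fresh connection
    case _: DbNodeDownException         => Stop     // the watcher replaces the writer
    case _                              => Escalate // unknown: let the parent decide
  }
}
```

The point of the design is that this decision lives in the supervisor, so the actors in the normal flow contain no recovery code at all.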
  91. Benefits of Let it Crash
    •Fault isolation
    •Structure
    •Redundancy
    •Replacement
    •Reboot
    •Component lifecycle
    •Suspend
    •Separation of concerns
  92. Akka - What’s in it?

  93. Akka Layers
    •Akka Core
    •Akka IO
    •Akka Remote (Implements Distribution)
    •Akka Cluster (Mobility)
    •Akka Cluster Extensions (Patterns: Singleton, Sharding, PubSub)
  94. Akka provides
    •Local actor
    •Remote actor
    •Scheduling
    •Routing
    •Cluster
    •Cluster Singleton
    •Sharding
    •Persistence
    •Distributed Data
  95. Local Actor on JVM1 Local Actor on JVM

  96. Local Actor on JVM Local Actor on JVM1 Coordinator Actor

  97. Local Actor on JVM Coordinator Actor println("Hello World")

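  98. import akka.actor.{Actor, ActorSystem, Props}
    // An actor that prints every message it receives
    class Worker extends Actor { def receive = { case x => println(x) } }
    val system = ActorSystem("ExampleActorSystem")
    val workerActorRef = system.actorOf(Props[Worker])
    // "!" (tell) sends the message asynchronously
    workerActorRef ! "Hello World"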
  99. Akka Remoting

  100. Actor on JVM 1 Actor on JVM 2 Akka Remoting

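  101. // Local actor: created in this actor system
    val workerActorRef = system.actorOf(Props[Worker])
    workerActorRef ! "Hello Conference"
    // Remote actor (alternative to the above): looked up by its path in another JVM
    val workerActorRef = context.actorSelection("akka.tcp://ExampleActorSystem@127.0.0.1:9005/user/WorkerActor")
    workerActorRef ! "Hello World"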
  102. akka {
         actor {
           provider = "akka.remote.RemoteActorRefProvider"
         }
         remote {
           enabled-transports = ["akka.remote.netty.tcp"]
           netty.tcp {
             hostname = "127.0.0.1"
             port = 2552
           }
         }
       }
  103. Akka Cluster

  104. Akka Cluster
    •Fault-tolerant membership service for Akka nodes, built on top of Akka Remoting
    •No single point of failure or bottleneck
    •Provides support for load balancing and failover
  105. Akka Cluster
    •Allows the number of nodes to grow or shrink dynamically
    •An actor can reside anywhere in the cluster, local or remote
  106. Akka Cluster Node Types [Diagram: Node, Cluster Leader, Seed Node]
  107. Akka Cluster [Diagram: Cluster (node 1, node 2, node 3, node 4); each node shows User with actors a, b, c, d, e]
    •This cluster is a ring of nodes
    •Every node contains an actor system. The actor systems need to have the same name to be part of the same cluster
    •A list of member nodes is maintained in a current cluster state
    •The actor systems gossip to each other about this state
  108. Akka Cluster features
    •Cluster membership
    •Load balancing
    •Node partitioning
    •Partition Points
    •Gossip & Convergence
    •Failure Detection
  109. Akka Gossip Protocol
    •Decentralised, probabilistic, viral communication protocol with convergence
    •Each node holds the state of the cluster and tells its neighbours about it
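A toy simulation of the idea, in plain Scala: every node keeps its own view of the cluster, repeatedly exchanges views with one random peer, and the views converge. This is a simplified sketch rather than Akka's actual protocol; the per-member version numbers, the push-pull exchange and all names are assumptions.

```scala
import scala.util.Random

object GossipSketch {
  // A node's view: the highest version it has seen for each member.
  type View = Map[String, Int]

  // Merging keeps the newest information from both sides.
  def merge(a: View, b: View): View =
    (a.keySet ++ b.keySet)
      .map(k => k -> math.max(a.getOrElse(k, 0), b.getOrElse(k, 0)))
      .toMap

  // One round: every node exchanges views with one random other node.
  def round(views: Map[String, View], rnd: Random): Map[String, View] =
    views.keys.toVector.sorted.foldLeft(views) { (vs, node) =>
      val others = vs.keys.toVector.sorted.filter(_ != node)
      val peer = others(rnd.nextInt(others.size))
      val merged = merge(vs(node), vs(peer))
      vs.updated(node, merged).updated(peer, merged)
    }

  // Convergence: all nodes hold the same view.
  def converged(views: Map[String, View]): Boolean =
    views.values.toSet.size == 1
}
```

Starting with each node knowing only itself, repeated calls to `round` spread every member's entry through the cluster without any central coordinator.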
  110. Gossip (Communication Between Nodes) Convergence

  111. Gossip (Communication Between Nodes) Convergence

  112. How do we detect when a node fails?

  113. Failure Detector

  114. Failure Detector [Diagram: Failure Detection]

  115. Failure Detector
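The simplest way to picture a failure detector is a heartbeat with a fixed timeout: a node that has not been heard from for too long is suspected. The plain-Scala sketch below uses that rule as an assumption for illustration; it is deliberately simpler than the detector Akka Cluster ships with, and the names are hypothetical.

```scala
object HeartbeatDetector {
  // Immutable detector state: when each node was last heard from.
  final case class Detector(timeoutMillis: Long, lastSeen: Map[String, Long] = Map.empty) {

    // Record a heartbeat received from `node` at time `nowMillis`.
    def heartbeat(node: String, nowMillis: Long): Detector =
      copy(lastSeen = lastSeen.updated(node, nowMillis))

    // Nodes whose last heartbeat is older than the timeout.
    def unreachable(nowMillis: Long): Set[String] =
      lastSeen.collect {
        case (node, seen) if nowMillis - seen > timeoutMillis => node
      }.toSet
  }
}
```

A fixed timeout forces a trade-off: too short and slow nodes are wrongly suspected, too long and real crashes are noticed late.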

  116. Cluster: Seed nodes (1, 2, 3), Master nodes (4, 5), Worker nodes (6, 7, 8)
    [Diagram: Nodes 1, 2, 3: Seed Role; Nodes 4, 5: Master Role; Nodes 6, 7, 8: Worker Role]
    Minimal setup cluster: 3 seeds, 2 masters, 3 workers
  117. Cluster Seed Nodes: 1 Node 1: Seed Role

  118. Cluster: Seed nodes (1, 2), Joining (3)
    [Diagram: Node 1: Seed Role; Node 2: Seed Role; Node 3: Seed Role; Join]
  119. Cluster: Seed nodes (1, 2, 3), Joining nodes (4, 5)
    [Diagram: Nodes 1, 2, 3: Seed Role; Node 4: Worker Role, seed list (1, 2, 3), Join; Node 5: Master Role, seed list (1, 2, 3)]
    •Node 2 responds fastest and handles join of node 4
    •Node 3 responds fastest and handles join of node 5
  120. Cluster: Seed nodes (2), Master nodes (5), Worker nodes (4), Joining nodes (6, 7)
    [Diagram: Nodes 1, 2, 3: Seed Role; Node 4: Worker Role; Node 5: Master Role, seed list (1, 2, 3); Node 6: Worker Role, seed list (1, 2, 3), Join; Node 7: Worker Role, seed list (1, 2, 3), Join; two Leave labels]
    •Node 2 responds fastest and handles join of node 4
  121. Cluster: Seed nodes (1, 2, 3), Master nodes (4, 5), Worker nodes (6, 7, 8)
    [Diagram: Nodes 1, 2, 3: Seed Role; Nodes 4, 5: Master Role; Nodes 6, 7, 8: Worker Role]
    Minimal setup cluster: 3 seeds, 2 masters, 3 workers
  122. Cluster Leader: Node 1. Node 1: Leaving, Node 2: Up, Node 3: Up [Diagram: Seed node 1, Seed node 2, Seed node 3; Leave]

  123. Cluster Leader: Node 1. Node 1: Exiting (unreachable), Node 2: Up, Node 3: Up [Diagram: Seed node 1 (cluster node is shut down), Seed node 2, Seed node 3; Leave]

  124. Cluster Leader: Node 2. Node 1: Removed, Node 2: Up, Node 3: Up [Diagram: Seed node 2, Seed node 3]
  125. [State diagram of the cluster member lifecycle. States: Joining, Up, Leaving, Exiting, Unreachable, Down, Removed. Transition labels: Join, Leader action, Down. Key: initial state, final state, state in transition]
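The lifecycle can be written down as a small transition function. The plain-Scala sketch below is one reading of the state diagram, with the transitions as assumptions: the leader moves a member from Joining to Up, from Leaving to Exiting and on to Removed, and a member marked Down is eventually removed. Unreachable is left out because it is treated here as a flag rather than a state. All names are hypothetical.

```scala
object MemberLifecycle {
  sealed trait State
  case object Joining extends State
  case object Up extends State
  case object Leaving extends State
  case object Exiting extends State
  case object Down extends State
  case object Removed extends State

  sealed trait Event
  case object LeaderAction extends Event // the leader moves the member forward
  case object Leave extends Event        // the member asks to leave gracefully
  case object MarkDown extends Event     // the member is declared down

  // Next state for an event, or None if the transition is not allowed.
  def next(state: State, event: Event): Option[State] = (state, event) match {
    case (Joining, LeaderAction) => Some(Up)
    case (Up, Leave)             => Some(Leaving)
    case (Leaving, LeaderAction) => Some(Exiting)
    case (Exiting, LeaderAction) => Some(Removed)
    case (Down, LeaderAction)    => Some(Removed)
    case (Removed, _)            => None
    case (_, MarkDown)           => Some(Down)
    case _                       => None
  }

  // Applies events in order; None as soon as one transition is invalid.
  def run(start: State, events: Seq[Event]): Option[State] =
    events.foldLeft(Option(start))((s, e) => s.flatMap(next(_, e)))
}
```

A graceful exit is then the path Joining, Up, Leaving, Exiting, Removed, which matches the three leader slides before the diagram.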
  126. Akka Cluster Patterns •Singleton •Sharding •Routing •Distributed Pub Sub

  127. Cluster Singleton

  128. Cluster Singleton b a e d c S

  129. Cluster Singleton b a e d c S

  130. Cluster Singleton b e d c S

  131. Akka Cluster Sharding

  132. Cluster Sharding
    •To distribute actors across several nodes
    •Interact with them using a logical identifier (without knowing their physical location)
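The indirection behind "logical identifier" can be sketched as two lookups in plain Scala: an entity id maps to a shard, and a shard maps to whichever node currently hosts it. The shard count, the hash rule and the round-robin placement are assumptions for the example, not Akka's allocation strategy.

```scala
object ShardingSketch {
  // Hypothetical fixed number of shards.
  val numberOfShards = 10

  // Entity id -> shard id: stable for a given id, independent of the nodes.
  def shardOf(entityId: String): Int =
    java.lang.Math.floorMod(entityId.hashCode, numberOfShards)

  // Shard id -> node: a stand-in for the table a coordinator would keep.
  def nodeOf(entityId: String, nodes: Vector[String]): String =
    nodes(shardOf(entityId) % nodes.size)
}
```

The sender only ever names the entity id; where the entity lives is resolved by the lookup, so shards can move between nodes without the sender changing.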
  133. Akka Cluster Sharding [Diagram: Node, Shard Region, Shard, entities (E), Shard Coordinator]
  134. Akka Persistence

  135. Akka Persistence
    •Lets stateful actors persist their internal state
    •State can be recovered when the actor is started or restarted
  136. Akka Persistence
    •Implemented using Event Sourcing
    •Only changes to an actor’s internal state are stored
    •Current state is never stored directly
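Event sourcing can be shown without any Akka API: store the events, and rebuild the state by replaying them. The plain-Scala sketch below reuses the coffeeshop; the event and state types are hypothetical.

```scala
object EventSourcingSketch {
  // Only these events are ever stored (the journal).
  sealed trait Event
  final case class OrderPlaced(drink: String) extends Event
  final case class OrderDelivered(drink: String) extends Event

  // The state is derived from events and never stored directly.
  final case class Shop(pending: Vector[String] = Vector.empty) {
    def applyEvent(event: Event): Shop = event match {
      case OrderPlaced(drink)    => copy(pending = pending :+ drink)
      case OrderDelivered(drink) => copy(pending = pending.diff(Vector(drink)))
    }
  }

  // Recovery after a start or restart: replay the journal from the beginning.
  def recover(journal: Seq[Event]): Shop =
    journal.foldLeft(Shop())(_ applyEvent _)
}
```

Replaying "Americano placed, Cappuccino placed, Americano delivered" leaves one pending Cappuccino, exactly the state the shop had before it went down.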
  137. Akka Distributed Data

  138. Distributed Data
    •A set of merge-friendly data types to maintain state across the cluster
  139. Distributed Data
    •Useful to share data between nodes in an Akka Cluster
    •Based on Conflict Free Replicated Data Types (CRDTs)
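A grow-only counter is one of the simplest "merge-friendly" data types and shows why CRDTs need no coordination. The sketch below is plain Scala written for illustration, not the Akka Distributed Data API; the names are hypothetical.

```scala
object GCounterSketch {
  // Each node increments only its own slot; the value is the sum of all slots.
  final case class GCounter(counts: Map[String, Long] = Map.empty) {

    def increment(node: String, by: Long = 1L): GCounter =
      copy(counts = counts.updated(node, counts.getOrElse(node, 0L) + by))

    def value: Long = counts.values.sum

    // Merge takes the per-node maximum, so it is commutative, associative
    // and idempotent: replicas can merge in any order, any number of times.
    def merge(that: GCounter): GCounter =
      GCounter(
        (counts.keySet ++ that.counts.keySet)
          .map(k => k -> math.max(counts.getOrElse(k, 0L), that.counts.getOrElse(k, 0L)))
          .toMap
      )
  }
}
```

Two replicas that counted independently, say two cups on one node and one cup on another, merge to a total of three regardless of which side merges first.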
  140. Akka makes distributed systems less difficult

  141. References
    • Akka in Action - Raymond Roestenburg
    • https://speakerd.s3.amazonaws.com/presentations/9a6b0f62b9ee4dc8980e5ff590f1a6cf/sheridan_college.pdf
    • Distributed Systems in One Lesson by Tim Berglund http://shop.oreilly.com/product/0636920039518.do
    • Learning Akka - Salma Khater
    • Hands on Introduction to Distributed Systems Concepts with Akka Clustering - by David Russell
    • A tour of the (advanced) Akka features in 60 minutes by Johan Janssen
    • Go distributed (and scale out) with Actors and Akka Clustering
  142. Thank you!

  143. Questions? @anildigital