Building Distributed System with Akka

anildigital

April 17, 2018

  1. Building Distributed System with Akka
    Anil Wadghule

    @anildigital


  6. The current state of the internet

  7. Size of the internet

  8. Market forces leading to change
    Concurrent connections
    •“Internet of Things”, mobile devices
    Big data
    •Size of data is overwhelming our ability to manage it
    Response times
    •Real-time results (e.g. analytics) with sub-second
    latencies

  9. Physical factors leading to change
    •Expensive hardware → cheap hardware
    •A single machine → cluster of machines
    •A single core → multiple cores
    •Slow networks → fast networks
    •Small data snapshots → big data, streams of data

  10. Modern Application Requirements
    •High Availability
    •Fault Tolerance
    •Scalability

  11. Google going down? Accepted?

  12. Google going down? Accepted?
    Highly Available

  13. Millions of people visit Google

  14. Millions of people visit Google
    Highly Scalable

  15. Facebook doesn’t crash in the afternoon

  16. Facebook doesn’t crash in the afternoon
    Fault Tolerant

  17. Reactive Principles

  18. What is a Distributed System?

  19. Your computer?


  21. Your computer? NO.

  22. Distributed Systems
    •No global clock

  23. You have a distributed system,
    when the crash of a computer
    you’ve never heard of, stops you
    from getting any work done.
    Leslie Lamport

  24. A collection of independent
    computers that appears to its
    users as one computer.
    Tanenbaum and van Steen: Distributed Systems:
    Principles and Paradigms

  25. Collection of interconnected
    nodes

  26. Distributed Architecture

  27. 8 Fallacies of Distributed Computing
    •The network is reliable.
    •Latency is zero.
    •Bandwidth is infinite.
    •The network is secure.
    •Topology doesn't change.
    •There is one administrator.
    •Transport cost is zero.
    •The network is homogeneous.

  28. Distributed System Examples
    •amazon.com
    •Cassandra database
    (unlike a local SQLite database)

  29. How do we scale?

  30. Coffeeshop Example

  31. Regular small app
    Akira: “One Americano please!”

  32. Regular small app
    The Head Barista writes down the order
    (Akira - Americano); Assistant Baristas make the drinks

  33. Starbucks becomes popular
    •Starbucks needs to serve
    more customers now

  34. Scaling Strategies
    •Read Replication
    •Sharding
    •Consistent Hashing

  35. Read Replication

  36. Read Replication
    The Head Barista’s order list (Akira - Americano) is
    copied to several Assistant Baristas, so reads can be
    served from any replica

  37. Issues for Read Replication
    •Complexity
    •Consistency

  38. Sharding

  39. Sharding
    •Add more writers (i.e. Head
    Baristas)
    •Split orders by some key
    (e.g. the customer’s name)

  40. Sharding
    Orders are split by customer name:
    Barista 1 takes Akira to Chang (A-C), e.g.
    Akira - Americano
    Barista 2 takes Jim to Lorenzo (J-L), e.g.
    Koichi - Cappuccino

  41. Issues with Sharding
    •Limited data model
    •More complexity
    •Limited data access patterns
    •Only good for certain kinds of
    applications, e.g. SaaS apps

  42. Consistent Hashing

  43. Consistent Hashing and Random Trees
    Karger et al. at MIT, 1997

  44. [A hash ring with node positions 0000, 2000, 4000,
    6000, 8000, 10000, 12000, 14000]
    Akira-Americano

  45. [The key “Akira” is hashed, giving 9F72]
    9F72-Americano

  46. 9F72 > 8000, so the order is stored on the node
    responsible for the range starting at 8000
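As a sketch, a minimal consistent-hash ring in Scala might look like this (the node names, the hash function, and the ring size are illustrative assumptions, not from the talk):

    import scala.collection.immutable.SortedMap
    import scala.util.hashing.MurmurHash3

    // A tiny consistent-hash ring: each node owns the keys from
    // its position up to the next node clockwise.
    class HashRing(nodes: Seq[String]) {
      private val ring: SortedMap[Int, String] =
        SortedMap(nodes.map(n => MurmurHash3.stringHash(n) -> n): _*)

      // A key goes to the first node at or after its hash,
      // wrapping around to the lowest position on the ring.
      def nodeFor(key: String): String = {
        val h = MurmurHash3.stringHash(key)
        val it = ring.iteratorFrom(h)
        if (it.hasNext) it.next()._2 else ring.head._2
      }
    }

    val ring = new HashRing(Seq("node-0", "node-1", "node-2", "node-3"))
    println(ring.nodeFor("Akira"))  // the node that stores Akira's order

Because positions come from the hash, adding or removing a node moves only the keys adjacent to it on the ring, not the whole keyspace.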

  47. What if node crashes?

  48. [On the same hash ring, the node storing
    9F72-Americano crashes]

  49. [9F72-Americano is replicated to 3 nodes on the ring]
    N = 3 (Replication Factor)

  50. Consistency formula
    R + W > N
    • N = Total number of replicas (e.g. 3)
    • W = Number of replicas that acknowledge my update
    • R = Number of replicas that agree on a read
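For example, with N = 3, choosing W = 2 and R = 2 gives R + W = 4 > 3: every read quorum overlaps every write quorum in at least one replica, so a read always sees the latest acknowledged write. With W = 1 and R = 1, R + W = 2 ≤ 3, and a read may miss a recent write (eventual consistency).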

  51. When to use consistent hashing?
    •Scale
    •Transactional data (business
    transactions, not ACID): data
    that changes a lot
    •Need to be always available

  52. CAP Theorem

  53. Consistency
    Availability Partition Tolerance

  54. Consistency

  55. Availability

  56. Partition Tolerance

  57. CAP Theorem
    •Partitioning can’t be
    negotiated; it’s a reality
    •You have to compromise on
    Availability or Consistency

  58. Distributed Transactions

  59. ACID?
    Distributed Transactions

  60. Ordering Coffee
    •Receive Order
    •Process Payment
    •Enqueue order (mark the cup)
    •Make Coffee
    •Deliver Drink

  61. Why split the work?
    •Parallelization
    •Uneven workloads

  62. What could go wrong?
    •Payment failure
    •Insufficient resources
    •Equipment failure
    •Worker failure
    •Consumer failure

  63. Response to failure?
    •Write-off (throw it out)
    •Retry (Typical)
    •Compensating action

  64. Distributed Transactions
    •How can we design a coffeeshop
    with atomic transactions?

  65. Distributed Computation

  66. Distributed Computation Strategies
    •Scatter-Gather
    •Map Reduce

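To make the Map Reduce shape concrete, here is a single-machine word count in plain Scala collections (an illustration of the shape only, not a distributed implementation):

    // map: emit words; shuffle: group by key; reduce: count per key
    val lines = Seq("one americano", "one cappuccino", "one americano")
    val counts = lines
      .flatMap(_.split(" "))
      .groupBy(identity)
      .map { case (word, ws) => word -> ws.size }
    println(counts)  // Map(one -> 3, americano -> 2, cappuccino -> 1)

In a distributed setting the map and reduce steps run on different nodes, with the grouping step shuffling data between them.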

  67. Fault Tolerance

  68. Fault Tolerant System
    •Embraces the notion of failure

  69. An ideal-world system has
    •Two paths
    •Components that can never fail
    •Accounting for every possible
    fault by providing a recovery
    action

  70. Most regular applications
    •Have a catch-all mechanism
    •Terminate as soon as an
    uncaught failure arises

  71. Java Concurrent Logs Processor
    A FileWatcher reads a LogFile and hands rows to several
    LogProcessors, which write through a DbWriter to a
    Database Connection
    The database connection might break

  72. Java Concurrent Logs Processor
    Many log processors are called to process files from
    several threads; each dbWriter writes rows using a db
    Connection
    When con.write(row5) fails with a
    DBBrokenConnectionException, the exception moves up the
    stack on the thread: dbWriter → logProcessor.process(file)
    → the FileWatcher thread’s runnable.run()
    We don’t have the connection details up there to re-create
    the dbWriter and retry, and exceptions can happen from
    different threads

  73. Difficult to recover
    •Exception moves up the stack
    •Making processed lines and connection
    info available up the stack
    •Breaks simple design
    •Violates best practices (Encapsulation,
    DI, SRP)

  74. What we need is
    •The faulty component replaced
    in a thread-safe manner

  75. Fault tolerant requirements
    •Fault Isolation
    •Structure
    •Redundancy
    •Replacement
    •Reboot
    •Suspend
    •Separation of concerns

  76. Resilience

  77. Capacity to recover from
    difficulties

  78. Why care about resiliency?
    •Financial losses
    •Losing customers
    •Affecting customers

  79. Actor Model

  80. Actor Model
    Carl Hewitt, 1973

  81. Actors
    •Higher-level abstraction for writing
    concurrent and distributed programs
    •Encapsulate state and manage it
    safely under concurrency
    •Communication with an actor is by
    sending messages

  82. Akka

  83. Akka
    •Created July 2009 by Jonas
    Bonér
    •https://github.com/akka/akka
    •Written in Scala

  84. Akka
    •Open-source toolkit
    •Simplifies
    •Concurrency
    •Distribution

  85. Akka
    •Actor Model on JVM
    •Emphasizes actor-based
    concurrency (inspiration drawn from
    Erlang)

  86. Why Akka?

  87. Let it crash
    • Akka provides two separate
    flows:
    • Normal flow
    • Fault recovery flow

  88. Let it crash
    •The normal flow consists of
    actors that handle normal messages
    •The recovery flow consists of
    actors that monitor the actors in the
    normal flow

  89. Akka Concurrent Logs Processor
    The LogProcessingSupervisor creates all the actors
    (FileWatcher, LogProcessor, DbWriter) at startup and
    supervises them all
    Disk Error → Stop; CorruptFileException → Resume;
    DbBrokenConnectionException → Restart; supervisors can
    decide to escalate a problem to a higher level
    Actors in the log-processing application do not concern
    themselves with fault recovery

  90. Akka Concurrent Logs Processor
    Supervision chain: LogProcessingSupervisor → FileWatcher
    → LogProcessor → DbWriter
    DiskError → Stop; CorruptFileException → Resume;
    DbBrokenConnectionException → Restart;
    DbNodeDownException → Stop
    The LogProcessor also watches the dbWriter and replaces
    it once it is terminated due to a DbNodeDownException
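A minimal sketch of such a supervisor, using the fault-to-directive mapping from the diagrams; the exception classes and the child props are assumed here so the snippet is self-contained:

    import akka.actor._
    import akka.actor.SupervisorStrategy._

    // Assumed fault types, standing in for the real ones
    class DiskError extends Error("disk failure")
    class CorruptFileException extends Exception("corrupt file")
    class DbBrokenConnectionException extends Exception("db connection broken")

    class LogProcessingSupervisor(fileWatcherProps: Props) extends Actor {
      // Map each fault to a recovery directive, as in the diagram
      override def supervisorStrategy = OneForOneStrategy() {
        case _: DiskError                   => Stop    // unrecoverable
        case _: CorruptFileException        => Resume  // skip the bad file
        case _: DbBrokenConnectionException => Restart // recreate the writer
      }

      // Create the supervised actor at startup
      val fileWatcher = context.actorOf(fileWatcherProps, "file-watcher")

      def receive = {
        case msg => fileWatcher forward msg
      }
    }

The children never catch these exceptions themselves; the normal flow stays free of recovery code.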

  91. Benefits of Let it Crash
    •Fault isolation
    •Structure
    •Redundancy
    •Replacement
    •Reboot
    •Component lifecycle
    •Suspend
    •Separation of concerns

  92. Akka - What’s in it?

  93. Akka Layers
    Akka Core
    Akka IO
    Akka Remote (implements distribution)
    Akka Cluster (mobility)
    Akka Cluster Extensions (patterns:
    Singleton, Sharding, PubSub)

  94. Akka provides
    •Local actor
    •Remote actor
    •Scheduling
    •Routing
    •Cluster
    •Cluster Singleton
    •Sharding
    •Persistence
    •Distributed Data

  95. Local Actor on JVM
    [A local actor running on JVM 1]

  96. Local Actor on JVM
    [A Coordinator Actor alongside the local actor on JVM 1]

  97. Local Actor on JVM
    The Coordinator Actor runs println("Hello World")

  98. import akka.actor._

    class Worker extends Actor {
      def receive = {
        case x =>
          println(x)
      }
    }

    val system = ActorSystem("ExampleActorSystem")
    val workerActorRef = system.actorOf(Props[Worker])
    workerActorRef ! "Hello World"

  99. Akka Remoting

  100. Actor on JVM 1 Actor on JVM 2
    Akka Remoting

  101. // Sending to a local actor
    val workerActorRef = system.actorOf(Props[Worker])
    workerActorRef ! "Hello Conference"

    // Selecting a remote actor by its address (the system name
    // and host below are assumed; the original slide obscured them)
    val remoteWorkerRef = context.actorSelection(
      "akka.tcp://ExampleActorSystem@127.0.0.1:9005/user/WorkerActor")
    remoteWorkerRef ! "Hello World"

  102. akka {
      actor {
        provider = "akka.remote.RemoteActorRefProvider"
      }
      remote {
        enabled-transports = ["akka.remote.netty.tcp"]
        netty.tcp {
          hostname = "127.0.0.1"
          port = 2552
        }
      }
    }

  103. Akka Cluster

  104. Akka Cluster
    •Fault-tolerant membership service for
    Akka nodes, built on top of Akka Remoting
    •No single point of failure or bottleneck
    •Provides support for load balancing
    and failover

  105. Akka Cluster
    •Allows the number of nodes to grow
    or shrink dynamically
    •An actor can reside anywhere in
    the cluster, local or remote
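For illustration, the configuration to join such a cluster might look like this (the system name, host, and ports are assumptions):

    akka {
      actor.provider = "akka.cluster.ClusterActorRefProvider"
      remote.netty.tcp {
        hostname = "127.0.0.1"
        port = 2551
      }
      cluster {
        seed-nodes = [
          "akka.tcp://ExampleActorSystem@127.0.0.1:2551",
          "akka.tcp://ExampleActorSystem@127.0.0.1:2552"
        ]
      }
    }

A node started with this configuration contacts the seed nodes and joins the cluster automatically.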

  106. Akka Cluster Node Types
    [A cluster of nodes: one acts as Leader, one or more are
    Seed Nodes, and the rest are ordinary Nodes]

  107. Akka Cluster
    [Nodes 1-4 form a ring; each node runs an actor system
    with the same user actor hierarchy (a, b, c, d, e)]
    This cluster is a ring of nodes.
    Every node contains an actor system. The actor systems
    need to have the same name to be part of the same
    cluster.
    A list of member nodes is maintained in the current
    cluster state. The actor systems gossip to each other
    about this state.

  108. Akka Cluster features
    •Cluster membership
    •Load balancing
    •Node partitioning
    •Partition Points
    •Gossip & Convergence
    •Failure Detection

  109. Akka Gossip Protocol
    •Decentralised, probabilistic, viral
    communication protocol with
    convergence
    •Each node holds the state of the
    cluster and tells its neighbours about it

  110. Gossip (Communication Between Nodes)
    [Three nodes all at state version 7: convergence]

  111. Gossip (Communication Between Nodes)
    [The state version (7) spreads to the remaining nodes:
    convergence]

  112. How do we detect when a
    node fails?

  113. Failure Detector

  114. Failure Detector
    [One node stops responding while the others are at
    version 7: the failure is detected]

  115. Failure Detector
    [The remaining nodes gossip about the unreachable node
    and converge on the failure]

  116. Cluster
    Minimal setup cluster: 3 seeds, 2 masters, 3 workers
    Nodes 1-3: Seed Role; Nodes 4-5: Master Role;
    Nodes 6-8: Worker Role

  117. Cluster
    Seed Nodes: (1)
    Node 1: Seed Role

  118. Cluster
    Seed Nodes: (1, 2); Joining: (3)
    Node 3 (Seed Role) sends Join to seed nodes 1 and 2

  119. Cluster
    Seed Nodes: (1, 2, 3); Joining nodes: (4, 5)
    Node 4 (Worker Role, seed list (1, 2, 3)) sends Join;
    node 2 responds fastest and handles the join of node 4
    Node 5 (Master Role, seed list (1, 2, 3)) sends Join;
    node 3 responds fastest and handles the join of node 5

  120. Cluster
    Seed nodes: (2); Master nodes: (5); Worker nodes: (4);
    Joining nodes: (6, 7)
    Nodes 1 and 3 (Seed Role) leave the cluster
    Nodes 6 and 7 (Worker Role, seed list (1, 2, 3)) send
    Join; node 2 handles the joins

  121. Cluster
    [The full minimal setup cluster again: 3 seeds (nodes
    1-3), 2 masters (nodes 4-5), 3 workers (nodes 6-8)]

  122. Cluster
    Leader: Node 1
    Node 1: Leaving; Node 2: Up; Node 3: Up
    Seed node 1 announces Leave

  123. Cluster
    Leader: Node 1
    Node 1: Exiting, then unreachable; Node 2: Up; Node 3: Up
    The cluster node on seed node 1 is shut down

  124. Cluster
    Leader: Node 2
    Node 1: Removed; Node 2: Up; Node 3: Up

  125. Cluster member lifecycle
    Joining → (leader action) → Up
    Up → Leaving → (leader action) → Exiting →
    (leader action) → Removed
    Unreachable → Down → (leader action) → Removed
    Key: Joining is the initial state; Removed is the final
    state; the rest are states in transition

  126. Akka Cluster Patterns
    •Singleton
    •Sharding
    •Routing
    •Distributed Pub Sub

  127. Cluster Singleton

  128. Cluster Singleton
    [Five nodes a-e; the singleton actor S runs on exactly
    one node]

  129. Cluster Singleton
    [The node hosting S goes down]

  130. Cluster Singleton
    [S is started again on one of the remaining nodes]
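A sketch of wiring this up with Akka's cluster singleton support, reusing the earlier Worker actor (the actor names and the message are assumptions):

    import akka.actor._
    import akka.cluster.singleton._

    val system = ActorSystem("ExampleActorSystem")

    // Start one manager per node; together they keep exactly one
    // instance of the singleton actor alive in the cluster
    system.actorOf(
      ClusterSingletonManager.props(
        singletonProps = Props[Worker],
        terminationMessage = PoisonPill,
        settings = ClusterSingletonManagerSettings(system)),
      name = "singleton-manager")

    // A proxy routes messages to wherever the singleton currently runs
    val proxy = system.actorOf(
      ClusterSingletonProxy.props(
        singletonManagerPath = "/user/singleton-manager",
        settings = ClusterSingletonProxySettings(system)),
      name = "singleton-proxy")

    proxy ! "Hello Singleton"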

  131. Akka Cluster Sharding

  132. Cluster Sharding
    •Distributes actors across several
    nodes
    •Interact with them via a logical
    identifier (without knowing their
    physical location)

  133. Akka Cluster Sharding
    [Each node hosts a Shard Region; a Shard Region contains
    shards, and each shard holds entities (E); a Shard
    Coordinator decides which shards live where]
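A sketch of starting a shard region for order entities (the Order message, the shard count, and the entity actor are assumptions for illustration):

    import akka.actor._
    import akka.cluster.sharding._

    final case class Order(customerId: Long, drink: String)

    // Route a message to an entity id and a shard id
    val extractEntityId: ShardRegion.ExtractEntityId = {
      case msg @ Order(id, _) => (id.toString, msg)
    }
    val extractShardId: ShardRegion.ExtractShardId = {
      case Order(id, _) => (id % 10).toString  // 10 shards
    }

    val system = ActorSystem("ExampleActorSystem")
    val orderRegion: ActorRef = ClusterSharding(system).start(
      typeName = "order",
      entityProps = Props[Worker],  // the entity actor (assumed)
      settings = ClusterShardingSettings(system),
      extractEntityId = extractEntityId,
      extractShardId = extractShardId)

    // Address the entity by logical id; sharding locates
    // (or creates) it somewhere in the cluster
    orderRegion ! Order(42L, "Americano")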

  134. Akka Persistence

  135. Akka Persistence
    •Enables stateful actors to persist their
    internal state
    •State can be recovered when an actor
    is started or restarted

  136. Akka Persistence
    •Implemented using Event Sourcing
    •Only changes to an actor’s internal
    state are stored
    •The current state is never stored directly
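A sketch of an event-sourced actor with Akka Persistence (the event and actor names are assumptions):

    import akka.persistence._

    case class OrderPlaced(drink: String)

    class OrderBook extends PersistentActor {
      override def persistenceId = "order-book-1"

      var orders: List[String] = Nil

      // Commands: persist an event, then update in-memory state
      def receiveCommand = {
        case drink: String =>
          persist(OrderPlaced(drink)) { evt =>
            orders = evt.drink :: orders
          }
      }

      // Recovery: replay the stored events to rebuild the state
      def receiveRecover = {
        case OrderPlaced(drink) => orders = drink :: orders
      }
    }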

  137. Akka Distributed Data

  138. Distributed Data
    •A set of merge-friendly data types
    to maintain state across cluster

  139. Distributed Data
    •Useful to share data between nodes
    in an Akka Cluster
    •Based on Conflict-Free Replicated
    Data Types (CRDTs)
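A sketch of a replicated grow-only counter with Distributed Data (the key name is an assumption):

    import akka.actor._
    import akka.cluster.Cluster
    import akka.cluster.ddata._
    import akka.cluster.ddata.Replicator._

    val system = ActorSystem("ExampleActorSystem")
    implicit val node = Cluster(system)

    val replicator = DistributedData(system).replicator
    val CounterKey = GCounterKey("ordersServed")

    // Update locally; the CRDT merge function reconciles
    // concurrent updates from other nodes without conflicts
    replicator ! Update(CounterKey, GCounter.empty, WriteLocal)(_ + 1)

    // Read the (eventually consistent) value
    replicator ! Get(CounterKey, ReadLocal)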

  140. Akka makes distributed systems
    less difficult

  141. References
    • Akka in Action by Raymond Roestenburg
    • https://speakerd.s3.amazonaws.com/presentations/9a6b0f62b9ee4dc8980e5ff590f1a6cf/sheridan_college.pdf
    • Distributed Systems in One Lesson by Tim Berglund
    (http://shop.oreilly.com/product/0636920039518.do)
    • Learning Akka by Salma Khater
    • Hands-on Introduction to Distributed Systems Concepts with
    Akka Clustering by David Russell
    • A Tour of the (Advanced) Akka Features in 60 Minutes by
    Johan Janssen
    • Go Distributed (and Scale Out) with Actors and Akka
    Clustering

  142. Thank you!

  143. Questions?
    @anildigital
