Distributed Computation: dealing with Time and Failure in the wild

Ryan Lemmer
October 10, 2014

FuConf, 2014, Bangalore

Transcript

  1. Distributed Computation: Time and Failure in the Wild. @ryanlemmer, Cape Town. FuConf Bangalore 2014. Friday 10 October 2014.
  2. This talk:
      * Distributed Programming with Storm + Akka
      * Distributed + Functional?
      * Focus on Realtime (not Batch)
  3. (no text on this slide)

  4. Use Case: Analytics. Process each journal for analytics, save to analytics DB. (Diagram: journals DB, search DB, analytics DB.)
  5. Sequential Execution (diagram: journals DB, search DB, analytics DB):
        for j in journals
          j1 = enrich(j)
          j2 = transform(j1)
          analytics-save(j2)
          search-index(j2)
  6. Parallel Execution (diagram: journals DB, search DB, analytics DB). The loop body appears four times on the slide, once per parallel worker:
        parallel-for j in journals
          j1 = enrich(j)
          j2 = transform(j1)
          analytics-save(j2)
          search-index(j2)
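
The step from the sequential loop to the parallel one can be sketched in Python. This is a minimal illustration, not the speaker's code: `enrich`, `transform`, the in-memory dictionaries standing in for the analytics and search databases, and the journal fields are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the slide's steps; real ones would call databases.
def enrich(journal):
    return {**journal, "enriched": True}

def transform(journal):
    return {**journal, "amt_cents": round(journal["amt"] * 100)}

def process(journal, analytics_db, search_db):
    """Loop body from the slides: enrich, transform, then write to both stores."""
    j1 = enrich(journal)
    j2 = transform(j1)
    analytics_db[j2["id"]] = j2  # analytics-save
    search_db[j2["id"]] = j2     # search-index
    return j2["id"]

def run_sequential(journals):
    analytics_db, search_db = {}, {}
    for j in journals:
        process(j, analytics_db, search_db)
    return analytics_db, search_db

def run_parallel(journals, workers=4):
    analytics_db, search_db = {}, {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every task and re-raises any worker exception.
        list(pool.map(lambda j: process(j, analytics_db, search_db), journals))
    return analytics_db, search_db

journals = [{"id": f"J{n}", "amt": n * 1.5} for n in range(10)]
```

Both functions produce the same stores here because each journal writes only under its own key; the order in which journals finish is not fixed in the parallel version.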
  7. Distributed Execution (diagram: journals DB, search DB, analytics DB). The same loop body, shown as three stacked groups on the slide:
        j1 = enrich(j)
        j2 = transform(j1)
        analytics-save(j2)
        search-index(j2)
  8. Apache Storm:
      * “REALTIME”: runs continuously
      * “FAULT TOLERANT”: has a plan for when things go wrong
      * “SCALABLE”: distributed
  9. Apache Storm: next-journal -> enrich -> transform -> analytics-save, search-index
        j1 = enrich(j)
        j2 = transform(j1)
        analytics-save(j2)
        search-index(j2)
  10. Spouts + Bolts: next-journal (SPOUT) -> enrich (BOLT) -> transform (BOLT) -> analytics-save (BOLT), search-index (BOLT)
  11. data model: tuples (next-journal -> enrich -> transform -> analytics-save, search-index)
        ["J323" {'amt': 107.43, ...}]
        ["J323" {'$amt': 15.70, ...}]
        ["J323" {'K-ratio': 42.11, ...}]
  12. clojure spout
        (defspout client-spout ["entity" "values"]
          [conf context collector]
          (let [next-client (next-legacy-client)
                tuple       ["client" next-client]]
            (spout
              (nextTuple []
                (Thread/sleep 100)
                (emit-spout! collector tuple))
              (ack [id]))))
  13. clojure bolt
        (defbolt transform-client-bolt ["client"]
          {:prepare true}
          [conf context collector]
          (bolt
            (execute [tuple]
              (let [h (.getValue tuple 1)]
                (emit-bolt! collector [(transform-tuple h)])
                (ack! collector tuple)))))
  14. storm topology: next-journal -> enrich -> transform -> analytics-save, search-index
  15. storm parallelism: the same topology, with parallelism hints on the five components: 'p': 1, 'p': 3, 'p': 3, 'p': 5, 'p': 5
  16. storm grouping: the same topology and parallelism hints ('p': 1, 3, 3, 5, 5), with three streams labelled 'shuffle'
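
A 'shuffle' grouping spreads tuples across the tasks of the next component without looking at their contents. The Python sketch below approximates that with a uniform random choice per tuple; it is an illustration only, and the function name and tuple values are hypothetical rather than Storm's API.

```python
import random
from collections import Counter

def shuffle_grouping(tuples, num_tasks, rng=None):
    """Pair each tuple with a randomly chosen task index in [0, num_tasks)."""
    rng = rng or random.Random()
    return [(rng.randrange(num_tasks), t) for t in tuples]

# 3000 hypothetical journal ids routed to a component with 'p': 3.
assignments = shuffle_grouping(
    [f"J{n}" for n in range(3000)], num_tasks=3, rng=random.Random(42)
)
load = Counter(task for task, _ in assignments)
```

With enough tuples the load per task comes out roughly even, which is the point of choosing this grouping when no task needs to see all tuples for a given key.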
  17. fault tolerance: the same topology, parallelism hints ('p': 1, 3, 3, 5, 5) and 'shuffle' groupings
  18. fault tolerance

  19. idempotence: next-journal -> enrich -> transform -> analytics-save, search-index. A step may run x2: side-effects!
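
Why a step that runs twice is a problem for side effects, and how idempotence avoids it, can be sketched in Python. The two sink classes and the delivery list are hypothetical; the amounts reuse values from the tuple slide.

```python
class CountingSink:
    """Not idempotent: every delivery adds to the total, so a replay double-counts."""
    def __init__(self):
        self.total = 0.0

    def save(self, journal_id, amt):
        self.total += amt


class IdempotentSink:
    """Idempotent: writes are keyed by journal id, so a replay overwrites itself."""
    def __init__(self):
        self.by_id = {}

    def save(self, journal_id, amt):
        self.by_id[journal_id] = amt

    @property
    def total(self):
        return sum(self.by_id.values())


# J323 is delivered twice, as it would be after a failure and replay.
deliveries = [("J323", 107.43), ("J324", 15.70), ("J323", 107.43)]

counting, idempotent = CountingSink(), IdempotentSink()
for journal_id, amt in deliveries:
    counting.save(journal_id, amt)
    idempotent.save(journal_id, amt)
```

After the loop the counting sink has counted J323 twice, while the idempotent sink holds one entry per journal id.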
  20. Storm Trident: transactional topologies (next-journal -> enrich -> transform -> analytics-save, search-index, every step x1)
      * run-once semantics
      * strong ordering on data processing
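
One way to get run-once behaviour out of replayed batches is to store the id of the last applied batch next to the state, which only works when batches are applied in a strict order. The Python sketch below illustrates that idea under those assumptions; the class and method names are hypothetical and this is not Trident's API.

```python
class TransactionalCounter:
    """Keeps the id of the last applied batch alongside the running value."""
    def __init__(self):
        self.value = 0
        self.last_txid = None

    def apply_batch(self, txid, amounts):
        """Apply a batch once. Assumes batches arrive in increasing txid order
        and that a replayed batch carries the same contents as the original."""
        if self.last_txid is not None and txid <= self.last_txid:
            return False  # already applied: this delivery is a replay
        self.value += sum(amounts)
        self.last_txid = txid
        return True
```

A usage example: applying batch 1, replaying batch 1, then applying batch 2 changes the value only twice.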
  21. stream computing: (queue), (queue)

  22. stream computing:
      * stream processing
      * realtime analytics
      * continuous computation
      * distributed RPC
      ...
  23. streaming soup: Apache Storm, Apache Samza, Spark Streaming, Nokia Dempsy, Esper, StreamBase, Akka Streams. Cambrian explosion!
  24. lambda architectures: new data -> Batch Processor, Realtime Processor -> merged view
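
The "merged view" can be sketched as a query-time combination of a batch result and a realtime result. In this Python illustration the two views are hypothetical per-journal counts: the batch view covers everything up to the last batch run, and the realtime view covers only what arrived since.

```python
def merged_view(batch_view, realtime_view):
    """Combine batch totals with the counts seen since the last batch run."""
    merged = dict(batch_view)
    for key, count in realtime_view.items():
        merged[key] = merged.get(key, 0) + count
    return merged

batch_view = {"J323": 10, "J324": 4}    # recomputed periodically over all data
realtime_view = {"J323": 2, "J999": 1}  # maintained incrementally on new data
view = merged_view(batch_view, realtime_view)
```

The sketch assumes the two views never cover the same event; keeping them disjoint is the part a real system has to work for.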
  25. AKKA:
      * “REALTIME”: runs continuously
      * “FAULT TOLERANT”: “let it crash”, Actor Model Fault Tolerance
      * “SCALABLE”: scale up (concurrency), scale out (distributed), elastic
  26. OO: Single threaded
        class Account {
          private var balance = 0
          // return the new balance so the declared Int result type holds
          def add(num: Int): Int = { balance += num; balance }
          def rem(num: Int): Int = { balance -= num; balance }
        }

        account.add(100)
        account.add(50)
        account.rem(40)
  27. OO: Multi-threaded. The same Account class, with account.add(100), account.add(50) and account.rem(40) now called from separate threads.
  28. What if? The same Account class and the same three calls.
  29. Actor Messages
        class Account extends Actor {
          var balance = 0
          def receive = {
            case Add(amt: Int) =>
              balance += amt
            case Rem(amt: Int) =>
              balance -= amt
          }
        }

        account ! Add(100)
        account ! Add(50)
        account ! Rem(40)
      The three messages queue up in the actor's MAILBOX.
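
The mailbox idea can be sketched without Akka: one thread owns the state and drains a queue, so concurrent senders never touch the balance directly. This Python version is an illustration only; `AccountActor`, `tell` and the message tuples are hypothetical names, not an actor library.

```python
import queue
import threading

class AccountActor:
    """One thread drains the mailbox, so balance is only touched by that thread."""
    _STOP = object()

    def __init__(self):
        self.balance = 0
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def tell(self, message):
        """Fire and forget: enqueue the message and return immediately."""
        self._mailbox.put(message)

    def stop(self):
        """Enqueue a stop marker and wait until earlier messages are processed."""
        self._mailbox.put(self._STOP)
        self._thread.join()

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message is self._STOP:
                return
            kind, amt = message
            if kind == "add":
                self.balance += amt
            elif kind == "rem":
                self.balance -= amt

def demo():
    account = AccountActor()
    # Three senders on separate threads, mirroring the multi-threaded slide.
    senders = [threading.Thread(target=account.tell, args=(m,))
               for m in [("add", 100), ("add", 50), ("rem", 40)]]
    for s in senders:
        s.start()
    for s in senders:
        s.join()
    account.stop()
    return account.balance
```

The order in which the three messages arrive is not fixed, but each one is applied in full before the next, so `demo()` ends with the same balance on every run.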
  30. Actor Streaming (naive): next-journal -> enrich -> transform -> analytics-save, search-index, with parallelism 'p': 1, 3, 3, 5, 5 and three 'shuffle' groupings
  31. Actor Streaming (naive): Journal Gen -> 'random' -> Enrich (x3)
        class JournalGen extends Actor {
          val router = Router(RandomRoutingLogic(),
                              Vector(enrich1, enrich2, enrich3)) // the three Enrich routees
          def receive = {
            case NextJournal(journalQ) =>
              val journal = journalQ.pop()
              router.route(Enrich(journal), sender())
          }
        }
  32. Actor Streaming (naive): Journal Gen -> Enrich (x3) -> 'random' -> Transform (x3)
        class Enrich extends Actor {
          def receive = {
            case Enrich(journal) =>
              val j = enrich(journal)
              transform ! j
          }
        }
  33. Actor Streaming (naive): next-journal -> enrich (x3) -> transform (x3) -> analytics-save (x6), search-index (x6), with routing labels 'random', 'round robin', 'round robin'
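
'Round robin' routing hands each message to the next routee in a fixed cycle, in contrast to 'random' routing, which picks a routee uniformly each time. A minimal Python sketch of the round-robin case follows; the class and the routee names are hypothetical and this is not Akka's router API.

```python
import itertools

class RoundRobinRouter:
    """Cycle through the routees in order, one message each."""
    def __init__(self, routees):
        self._cycle = itertools.cycle(list(routees))

    def route(self, message):
        """Return the (routee, message) pair the message would be sent to."""
        return next(self._cycle), message

router = RoundRobinRouter(["save-1", "save-2", "save-3"])
targets = [router.route(f"J{n}")[0] for n in range(5)]
```

Round robin gives an exactly even spread when all messages cost about the same; random routing needs no shared position counter, which matters more once routers themselves are replicated.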
  34. Fault tolerance: the same actor graph (next-journal -> enrich x3 -> transform x3 -> analytics-save x6, search-index x6), with ERROR! on one actor
  35. Fault tolerance: ERROR! below a supervisor (diagram: transform, analytics-save, analytics-save)
      * A Supervisor can: RESUME, RESTART, STOP, ESCALATE (FAIL)
      * 2 strategies: OneForOne or AllForOne
  36. Fault tolerance
        override val supervisorStrategy =
          OneForOneStrategy(maxNrOfRetries = 10,
                            withinTimeRange = 1 minute) {
            case _: ThisException    => Resume
            case _: ThatException    => Restart
            case _: AnotherException => Stop
            case _: Exception        => Escalate
          }
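
The decision logic in that strategy can be sketched in plain Python: map the exception type to a directive, and stop a child that has been restarted too often within the time window. This is an illustration of the idea, not Akka's implementation; the class, the string directives and the injected `clock` parameter are hypothetical.

```python
import time

RESUME, RESTART, STOP, ESCALATE = "resume", "restart", "stop", "escalate"

class ThisException(Exception): pass
class ThatException(Exception): pass
class AnotherException(Exception): pass

class OneForOneSupervisor:
    """Decides per failing child; sibling children are left alone."""
    def __init__(self, max_retries=10, within_seconds=60.0, clock=time.monotonic):
        self.max_retries = max_retries
        self.within_seconds = within_seconds
        self._clock = clock
        self._restarts = {}  # child name -> timestamps of recent restarts

    def decide(self, error):
        # Specific exception types are checked before the catch-all.
        if isinstance(error, ThisException):
            return RESUME
        if isinstance(error, ThatException):
            return RESTART
        if isinstance(error, AnotherException):
            return STOP
        return ESCALATE

    def handle_failure(self, child, error):
        directive = self.decide(error)
        if directive != RESTART:
            return directive
        now = self._clock()
        recent = [t for t in self._restarts.get(child, [])
                  if now - t < self.within_seconds]
        if len(recent) >= self.max_retries:
            return STOP  # restart budget for this window is used up
        recent.append(now)
        self._restarts[child] = recent
        return RESTART
```

Passing the clock in makes the retry window easy to exercise in a test without sleeping.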
  37. OO vs Actor Model:
      * Communicate via Methods vs Communicate via Messages
      * Synchronous vs “fire and forget”
      * Shared State + Behaviour vs Local State + Behaviour
      * Local vs location transparent
      * Actor messaging: ask, tell
  38. Designing with Actors:
      * single responsibility Actors
      * find the “right” granularity for
        - Messages
        - Actor Hierarchies
        - failure zones
  39. Actors: problem space:
      * Work Distribution (incl. Streaming)
      * Domain-driven actor apps
        - Actors => Entities
        - Actor Hierarchies => Aggregates
        - Actor Messages => Domain Events
  40. Storm vs Akka:
      * Stream computation vs Actor Concurrency
      * High level abstraction vs Low level, more powerful
      * Topology: static vs Dynamic topology
      * Directed graph vs 2-way
      * Heavy bolts, spouts vs Lightweight Actors
  41. Reactive Manifesto: time for a manifesto!
      * interactive
      * fault tolerant
      * scalable
  42. AKKA Streams. Reactive Streams: JVM Standard for async, distributed, stream processing
  43. Time, State, Failure
      * Time: It’s about the Order of events. Minimise enforced order!
  44. Time, State, Failure
      * Time: It’s about the Order of events. Minimise enforced order!
      * State: It’s Change of State that hurts most. Minimise Change! (immutability)
  45. Time, State, Failure
      * Time: It’s about the Order of events. Minimise enforced order!
      * State: It’s Change of State that hurts most. Minimise Change! (immutability)
      * Fault Tolerance: Embrace Failure, plan for it. Failure is a first class citizen.
  46. Distributed+functional: Concurrency Oriented Programming Languages
      * concurrent
      * fault tolerant
      * scalable
  47. Distributed, the future? CRDTs: “a data type whose operations commute when they are concurrent. Replicas eventually converge without any complex concurrency control” (“A comprehensive study of Convergent and Commutative Replicated Data Types”, Letia et al., 2009). “ACID 2.0”
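
Convergence without coordination can be seen in a grow-only counter, one of the simplest replicated data types of this family. It is chosen here purely as an illustration and is not named on the slide; the class and replica ids in this Python sketch are hypothetical.

```python
class GCounter:
    """Grow-only counter: each replica increments only its own slot."""
    def __init__(self, replica_id, counts=None):
        self.replica_id = replica_id
        self.counts = dict(counts or {})

    def increment(self, n=1):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        """Per-replica maximum: commutative, associative and idempotent."""
        keys = self.counts.keys() | other.counts.keys()
        merged = {k: max(self.counts.get(k, 0), other.counts.get(k, 0))
                  for k in keys}
        return GCounter(self.replica_id, merged)

a, b = GCounter("cape-town"), GCounter("bangalore")
a.increment(3)  # concurrent updates on two replicas
b.increment(2)
ab, ba = a.merge(b), b.merge(a)
```

Whichever replica merges first, both end up with the same counts, and merging the same state again changes nothing, so replays and reordering are harmless.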
  48. Thank YOU. @ryanlemmer, Cape Town. FuConf Bangalore 2014.