Distributed Graph Processing with Scala and Akka (SVSS '13)

In recent years, the boom of online social networks such as Facebook and Twitter has presented several interesting problems, especially with regard to their massive underlying graph structures. With such large and rich datasets, it is clearly beneficial to leverage these graphs to power features like friendship recommendation. However, dealing with so much data in a scalable manner is difficult, and considerable engineering and research effort has gone into solving this problem, resulting in systems such as Pregel and graph databases (e.g. Neo4j).

As a research assistant during the school year at UC Santa Barbara, I implemented a distributed graph processing system for use on the lab cluster. The system is designed specifically for trivially parallelizable graph algorithms, which describes most of the algorithms I have run in the lab. Lots of code will be shown, both from the system itself and from applications written with it. I will also talk briefly about what I have planned for future iterations of the system.

Adelbert Chang

August 03, 2013

Transcript

  1. Distributed Graph Processing
    with Scala and Akka
    Adelbert Chang
    Saturday, August 3, 2013

  6. About Me
    •4th year student @ UC Santa Barbara
    •BS/MS Computer Science
    •Research Assistant
    •Large-scale graph mining and modeling
    •Cluster Computing
    •Engineering Analytics Intern @ Box
    •Scala since January 2012

  12. Outline
    •Motivation
    •Context and Assumptions
    •User and System Requirements
    •Solution
    •Live Demo!

  15. Motivation
    •Many of our algorithms are embarrassingly parallel
    •Pregel model is good, but too heavy for us
    •Example: Shortest path
    •Split work on nodes
    •Run BFS, return a Map[Int, Int]
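
A minimal sketch of the shortest-path example, assuming an unweighted graph held in memory as an adjacency map. The `Graph` alias, the `ShortestPaths` object and the `split` helper are hypothetical names used for illustration, not code from the talk; only the idea of running BFS per node and returning a `Map[Int, Int]` comes from the slide.

```scala
import scala.collection.mutable

object ShortestPaths {
  // Assumed representation: node id -> ids of its neighbours.
  type Graph = Map[Int, Seq[Int]]

  /** Unweighted single-source shortest paths: node id -> hop count from `source`. */
  def bfs(graph: Graph, source: Int): Map[Int, Int] = {
    val dist  = mutable.Map(source -> 0)
    val queue = mutable.Queue(source)
    while (queue.nonEmpty) {
      val u = queue.dequeue()
      // Visit each neighbour the first time it is seen.
      for (v <- graph.getOrElse(u, Seq.empty) if !dist.contains(v)) {
        dist(v) = dist(u) + 1
        queue.enqueue(v)
      }
    }
    dist.toMap
  }

  /** Split the source nodes into at most `parts` chunks, one unit of work each. */
  def split(nodes: Seq[Int], parts: Int): Seq[Seq[Int]] = {
    require(parts > 0, "parts must be positive")
    val size = math.max(1, math.ceil(nodes.size.toDouble / parts).toInt)
    nodes.grouped(size).toList
  }
}
```

Each chunk produced by `split` could be handed to a different worker, which would call `bfs` once per source node in its chunk. No communication between workers is needed, which is what makes the problem embarrassingly parallel.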

  21. Context + Assumptions
    •Studying large-scale static graphs, typically those found in online social networks
    •Cluster of around 30 machines
    •Cluster shares a file system
    •Graphs are large, but can fit into machine memory
    •We want “raw” results dumped straight to disk

  26. User Requirements
    •Users should
    •Not have to interact with Akka
    •Only need to define the algorithm and the input
    •Be able to put an upper bound on number of threads per machine
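
One way to cap per-machine parallelism on the JVM is a fixed-size thread pool. The sketch below is an assumption about how such a bound could be enforced, not the system's actual mechanism; `BoundedRunner` and `run` are hypothetical names, and the user supplies only the inputs and the algorithm.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object BoundedRunner {
  /** Apply `algorithm` to every input using at most `maxThreads` threads. */
  def run[A, B](inputs: Seq[A], maxThreads: Int)(algorithm: A => B): Seq[B] = {
    require(maxThreads > 0, "maxThreads must be positive")
    // The fixed pool is the upper bound: no more than maxThreads tasks run at once.
    val pool = Executors.newFixedThreadPool(maxThreads)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      val futures = inputs.map(in => Future(algorithm(in)))
      // Results come back in input order, regardless of completion order.
      Await.result(Future.sequence(futures), Duration.Inf)
    } finally pool.shutdown()
  }
}
```

For example, `BoundedRunner.run(nodes, maxThreads = 4)(bfsFromNode)` would process all `nodes` while keeping at most four threads busy, where `nodes` and `bfsFromNode` stand in for the user's input and algorithm.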

  33. System Requirements
    •The system should
    •Be easy to deploy without any cluster setup
    •Be fault tolerant
    •Be elastic
    •Graph should be loaded locally
    •Clean up and shut itself down afterwards

  37. Inspiration
    •Scala + Akka to the rescue!

  42. Inspiration
    •We want a balancing dispatcher for remoting
    •Proxy mailbox is backed by a number of Actors
    •Messages are sent to a proxy mailbox
    •Messages distributed to idle Actors

  43. Balancing Dispatcher
    http://letitcrash.com/post/29044669086/balancing-workload-across-nodes-with-akka-2

  48. Solution
    •Design the system to act similarly to a balancing dispatcher
    •A single Actor (Master) represents the dispatcher
    •Each remote Actor (Worker) has its own mailbox
    •Workers report to the Master when idle

  54. Design Decision
    •Akka is capable of both remote lookup and remote deployment
    •Remote deployment
    •Master becomes connected to Worker automatically
    •Remote lookup
    •Workers can be added/killed at runtime

  55. High-Level Design
    http://letitcrash.com/post/29044669086/balancing-workload-across-nodes-with-akka-2

  66. Master

  80. Worker

  81. Sabre

  97. Application
    Diagram components: Application, Sabre, Master, ResultHandler, Workers
    Calls and messages, in the order the slides introduce them:
    •Sabre.execute()
    •system.actorOf (shown twice)
    •WorkerCreated
    •DoAlgorithm
    •WorkIsReady
    •WorkerRequestsWork
    •WorkToBeDone
    •HandleResult
    •WorkComplete
    •WorkIsDone
    •Additional Workers appear in the final frame
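
The message labels in the Application slides resemble the work-pulling pattern from the Let it Crash post the deck links to. The sketch below models only the Master's side of that exchange as a pure state machine, without Akka, so the flow can be followed step by step. Payloads, field names, the use of `String` worker ids and `Int` work items, and the direction of each message are assumptions; `HandleResult` and `WorkComplete` are left out because the transcript does not show who sends them.

```scala
import scala.collection.immutable.Queue

object WorkPulling {
  // Messages the Master receives. Names follow the slide labels; fields are assumed.
  sealed trait Msg
  final case class WorkerCreated(worker: String)      extends Msg
  final case class DoAlgorithm(work: Seq[Int])        extends Msg
  final case class WorkerRequestsWork(worker: String) extends Msg
  final case class WorkIsDone(worker: String)         extends Msg

  // Messages the Master sends to a worker.
  sealed trait Reply
  case object WorkIsReady                  extends Reply
  final case class WorkToBeDone(item: Int) extends Reply

  /** `workers` maps each known worker to the item it is processing, if any. */
  final case class Master(pending: Queue[Int] = Queue.empty,
                          workers: Map[String, Option[Int]] = Map.empty) {

    /** Handle one message; return the new state and the replies to send out. */
    def receive(msg: Msg): (Master, Seq[(String, Reply)]) = msg match {
      case WorkerCreated(w) =>
        // Register the worker; tell it about work if some is already queued.
        val next = copy(workers = workers + (w -> None))
        (next, if (pending.nonEmpty) Seq(w -> WorkIsReady) else Seq.empty)
      case DoAlgorithm(work) =>
        // Queue the work and notify every idle worker that work is available.
        val idle = workers.collect { case (w, None) => w -> WorkIsReady }.toSeq
        (copy(pending = pending ++ work), idle)
      case WorkerRequestsWork(w) if workers.get(w).contains(None) && pending.nonEmpty =>
        // Only a known, idle worker gets the next item.
        val (item, rest) = pending.dequeue
        (copy(pending = rest, workers = workers + (w -> Some(item))),
          Seq(w -> WorkToBeDone(item)))
      case WorkerRequestsWork(_) =>
        (this, Seq.empty) // unknown or busy worker, or nothing left to do
      case WorkIsDone(w) =>
        (copy(workers = workers + (w -> None)), Seq.empty)
    }
  }
}
```

Because work is handed out only when a worker asks for it, faster workers naturally take more items. Tracking the in-flight item per worker is also what would let a Master re-queue work if a worker disappeared, although that step is not modeled here.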

  101. Future Work
    •Typed channels
    •Akka Clustering
    •Typesafe Developer Console

  102. Live Demo!

  104. EOF
    @adelbertchang
    [email protected]
    Questions?