Distributed Graph Processing
with Scala and Akka
Adelbert Chang
Saturday, August 3, 13
About Me
•4th year student @ UC Santa Barbara
•BS/MS Computer Science
•Research Assistant
•Large-scale graph mining and modeling
•Cluster Computing
•Engineering Analytics Intern @ Box
•Scala since January 2012
Outline
•Motivation
•Context and Assumptions
•User and System Requirements
•Solution
•Live Demo!
Motivation
•Many of our algorithms are embarrassingly
parallel
•Pregel model is good, but too heavy for us
•Example: Shortest path
•Split the work across source nodes
•Run BFS from each source, return a Map[Int, Int]
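The shortest-path example can be sketched in plain Scala (2.13). It assumes the graph is an in-memory adjacency map (node id to neighbour ids). `ShortestPaths`, `bfs`, `partition` and `runChunk` are hypothetical names, not the talk's code. Each BFS depends only on its source node, so the sources can be chunked and handed to different workers with no coordination. That independence is what makes the problem embarrassingly parallel.

```scala
import scala.annotation.tailrec
import scala.collection.immutable.Queue

// Hypothetical sketch: unweighted shortest paths via BFS, split by source node.
object ShortestPaths {
  type Graph = Map[Int, Vector[Int]] // assumed in-memory adjacency map

  // Hop distance from `source` to every reachable node.
  def bfs(graph: Graph, source: Int): Map[Int, Int] = {
    @tailrec
    def loop(frontier: Queue[Int], dist: Map[Int, Int]): Map[Int, Int] =
      frontier.dequeueOption match {
        case None => dist
        case Some((node, rest)) =>
          val next = dist(node) + 1
          // Only nodes not yet reached get a distance and join the frontier.
          val unseen = graph.getOrElse(node, Vector.empty).distinct.filterNot(dist.contains)
          loop(rest ++ unseen, dist ++ unseen.map(_ -> next))
      }
    loop(Queue(source), Map(source -> 0))
  }

  // Splits the source nodes into at most `workers` roughly equal chunks.
  def partition(sources: Seq[Int], workers: Int): Seq[Seq[Int]] =
    sources.grouped(math.max(1, math.ceil(sources.size.toDouble / workers).toInt)).toSeq

  // What a single worker would do with its chunk: one independent BFS per source.
  def runChunk(graph: Graph, chunk: Seq[Int]): Map[Int, Map[Int, Int]] =
    chunk.map(s => s -> bfs(graph, s)).toMap
}
```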
Context + Assumptions
•Studying large-scale static graphs, typically
those found in online social networks
•Cluster of around 30 machines
•Cluster shares a file system
•Graphs are large, but can fit into a single
machine's memory
•We want “raw” results dumped straight to disk
User Requirements
•Users should
•Not have to interact with Akka
•Only need to define the algorithm and the
input
•Be able to put an upper bound on the number
of threads per machine
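The user-side contract can be sketched as a plain Scala trait that contains no Akka types. `GraphAlgorithm`, `OutDegree` and `LocalRunner` are hypothetical names. The sketch only runs on one machine and shows how a per-machine thread bound maps onto a fixed-size thread pool. Distributing the work across machines is the system's job, not the user's.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical user-facing contract: the input nodes plus a per-node algorithm.
trait GraphAlgorithm[R] {
  def input: Seq[Int]
  def compute(graph: Map[Int, Vector[Int]], node: Int): R
}

// Example user algorithm (hypothetical): out-degree of each input node.
final case class OutDegree(input: Seq[Int]) extends GraphAlgorithm[Int] {
  def compute(graph: Map[Int, Vector[Int]], node: Int): Int =
    graph.getOrElse(node, Vector.empty).size
}

object LocalRunner {
  // Runs `algo` on this machine using at most `maxThreads` threads.
  def run[R](graph: Map[Int, Vector[Int]], algo: GraphAlgorithm[R], maxThreads: Int): Map[Int, R] = {
    val pool = Executors.newFixedThreadPool(maxThreads) // the upper bound on threads
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      val jobs = algo.input.map(node => Future(node -> algo.compute(graph, node)))
      Await.result(Future.sequence(jobs), 10.minutes).toMap
    } finally pool.shutdown() // release the threads once the run is over
  }
}
```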
System Requirements
•The system should
•Be easy to deploy without any cluster setup
•Be fault tolerant
•Be elastic (Workers can be added or removed at runtime)
•Load the graph locally on each machine
•Clean up and shut itself down afterwards
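Loading the graph locally might look like the sketch below: each machine reads the same file from the shared file system into its own memory. The whitespace-separated `src dst` edge-list format with `#` comments is an assumption, as is the name `EdgeListLoader`. `Using.resource` closes the file handle even if parsing fails, which is one small piece of cleaning up afterwards.

```scala
import scala.io.Source
import scala.util.Using

// Hypothetical loader; assumes one "src dst" edge per line, '#' lines are comments.
object EdgeListLoader {
  type Graph = Map[Int, Vector[Int]]

  // Builds a directed adjacency map; columns after the second are ignored.
  def parse(lines: Iterator[String]): Graph =
    lines
      .map(_.trim)
      .filter(line => line.nonEmpty && !line.startsWith("#"))
      .foldLeft(Map.empty[Int, Vector[Int]]) { (graph, line) =>
        val cols = line.split("\\s+")
        val (src, dst) = (cols(0).toInt, cols(1).toInt)
        graph.updated(src, graph.getOrElse(src, Vector.empty) :+ dst)
      }

  // Reads the file from the shared file system into this machine's memory.
  def load(path: String): Graph =
    Using.resource(Source.fromFile(path))(src => parse(src.getLines()))
}
```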
Inspiration
•Scala + Akka to the rescue!
Inspiration
•We want a balancing dispatcher that works with remoting
•A proxy mailbox is backed by a number of Actors
•Messages are sent to the proxy mailbox
•Messages are distributed to idle Actors
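The shared-mailbox idea can be illustrated on a single JVM with plain threads. This is an analogy, not how Akka's dispatcher is implemented, and `SharedMailboxDemo` is a hypothetical name. All messages go into one queue, and a consumer takes the next message only once it is idle. A slow consumer therefore ends up with fewer messages instead of a growing backlog.

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import scala.jdk.CollectionConverters._

// Hypothetical plain-JVM analogy of a shared ("proxy") mailbox.
object SharedMailboxDemo {
  // Consumers pull from one shared queue whenever they are idle.
  // `handle` must be thread-safe. Returns how many messages each consumer took.
  def run[A](messages: Seq[A], consumers: Int)(handle: A => Unit): Vector[Int] = {
    val mailbox = new ConcurrentLinkedQueue[A](messages.asJava)
    val counts = Array.fill(consumers)(0) // each thread writes only its own slot
    val threads = (0 until consumers).map { id =>
      new Thread(() => {
        var next = Option(mailbox.poll()) // None once the mailbox is empty
        while (next.isDefined) {
          handle(next.get)
          counts(id) += 1
          next = Option(mailbox.poll())
        }
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join()) // join makes every slot of counts visible here
    counts.toVector
  }
}
```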
Balancing Dispatcher
http://letitcrash.com/post/29044669086/balancing-workload-across-nodes-with-akka-2
Solution
•Design the system to act similarly to a balancing
dispatcher
•A single Actor (Master) represents the
dispatcher
•Each remote Actor (Worker) has its own
mailbox
•Workers report to the Master when idle
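The Master's pull-based bookkeeping can be sketched as a pure state machine, which keeps the idea testable without Akka. The message names (`Submit`, `WorkerIdle`, `WorkerLeft`, `Assign`) and the `Master` case class are hypothetical. In the real system these would be messages exchanged between Akka actors. In this sketch a new Worker announces itself simply by reporting idle. Work held by a Worker that leaves is re-queued, which is one possible way to meet the fault-tolerance and elasticity requirements.

```scala
// Hypothetical protocol: messages the Master receives, and assignments it sends out.
sealed trait ToMaster
final case class Submit(job: Int) extends ToMaster
final case class WorkerIdle(worker: String) extends ToMaster // also used to join
final case class WorkerLeft(worker: String) extends ToMaster
final case class Assign(worker: String, job: Int)

final case class Master(
    pending: Vector[Int] = Vector.empty,  // jobs not yet handed out
    idle: Vector[String] = Vector.empty,  // workers waiting for a job (FIFO)
    busy: Map[String, Int] = Map.empty    // worker -> job it is running
) {
  // Returns the next state and the assignments to send.
  def receive(msg: ToMaster): (Master, Vector[Assign]) = msg match {
    case Submit(job) => copy(pending = pending :+ job).dispatch
    case WorkerIdle(w) =>
      val freed = copy(busy = busy - w) // an idle worker has finished its job
      (if (freed.idle.contains(w)) freed else freed.copy(idle = freed.idle :+ w)).dispatch
    case WorkerLeft(w) =>
      // Put the departed worker's in-flight job back at the front of the queue.
      Master(busy.get(w).toVector ++ pending, idle.filterNot(_ == w), busy - w).dispatch
  }

  // Pairs pending jobs with idle workers in FIFO order.
  private def dispatch: (Master, Vector[Assign]) = {
    val n = math.min(pending.size, idle.size)
    val assigns = idle.take(n).zip(pending.take(n)).map { case (w, job) => Assign(w, job) }
    (Master(pending.drop(n), idle.drop(n), busy ++ assigns.map(a => a.worker -> a.job)), assigns)
  }
}
```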
Design Decision
•Akka is capable of both remote lookup and
remote deployment
•Remote deployment
•Master becomes connected to Worker
automatically
•Remote lookup
•Workers can be added/killed at runtime
High-Level Design
http://letitcrash.com/post/29044669086/balancing-workload-across-nodes-with-akka-2
Master
Worker
Sabre
Application
[Diagram labels: Application, Sabre, Master, ResultHandler; call: Sabre.execute()]
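The requirement to dump raw results straight to disk suggests a `ResultHandler` that writes each result as it arrives rather than aggregating anything in memory. The sketch below assumes a class shape, a tab-separated `source target distance` line format and a single append-only output file. None of these is Sabre's actual design.

```scala
import java.io.BufferedWriter
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, StandardOpenOption}

// Hypothetical result handler: appends raw per-node results to one file.
final class ResultHandler(out: Path) extends AutoCloseable {
  private val writer: BufferedWriter = Files.newBufferedWriter(
    out, StandardCharsets.UTF_8, StandardOpenOption.CREATE, StandardOpenOption.APPEND)

  // One tab-separated line per (source, target, distance); synchronized so
  // results arriving from several threads do not interleave mid-line.
  def handle(source: Int, distances: Map[Int, Int]): Unit = synchronized {
    distances.toSeq.sorted.foreach { case (target, d) =>
      writer.write(s"$source\t$target\t$d")
      writer.newLine()
    }
  }

  override def close(): Unit = writer.close() // flushes buffered lines to disk
}
```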