Jennifer Reif
March 14, 2019

# Graph Algorithms: Predict Real-World Behavior

Learn how graph algorithms can help you predict real-world behavior and why an approach based on averages fails. Find out which algorithms to apply to various types of data analysis. From this session, you will gain the knowledge to recognize whether you have a graph analytics problem and how to get started.

## Transcript

1. ### Graph Algorithms: Predict Real-World Behavior

Groups and Routes and Flows - Oh My!

Jennifer Reif, jennifer.reif@neo4j.com, @jmhreif
2. ### Jennifer Reif

- Software Engineer, Neo4j
- Developer
- Blogger
- Conference speaker

Love cats, coffee, and traveling :)

jennifer.reif@neo4j.com, @jmhreif

### Graph Algorithms: The Right Tool for Real-World Networks

5. ### Do You Have a Graph Analytics Problem?

Requires understanding relationships and structures:

- Flow & Dynamics
- Interactions & Resiliency
- Propagation Pathways
- Forecast Behavior & Prescribe Action

6. ### Averages Aren’t Reality

[Figure: nodes vs. relationships, average distribution (random network)]

“There is no network in nature that we know of that would be described by the random network model.” – Albert-László Barabási

7. ### Averages Aren’t Reality

[Figure: nodes vs. relationships, average distribution (random network)]

[Figure: nodes vs. relationships, power law distribution (scale-free network)]

8. ### And You’ll Never See the Structures

[Figure: nodes vs. relationships, average distribution (random network): most nodes have the same number of links, and there are no highly connected nodes]

[Figure panels: scale-free network; small-world network]

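
A quick calculation makes the point concrete. The minimal Python sketch below computes both the average degree and the full degree distribution for a hypothetical hub-and-spoke graph (a toy stand-in, not a real scale-free network; `degree_stats` is an illustrative name). The average of 1.8 relationships per node matches no actual node: one node has nine relationships and the other nine have one each.

```python
from collections import Counter


def degree_stats(edges):
    """Return (average degree, degree histogram) for an undirected edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    average = sum(degree.values()) / len(degree)
    histogram = Counter(degree.values())  # degree -> number of nodes with that degree
    return average, histogram


# Hypothetical toy graph: one hub linked to nine leaves.
star = [("hub", f"leaf{i}") for i in range(9)]
average, histogram = degree_stats(star)
print(average)          # 1.8
print(dict(histogram))  # {9: 1, 1: 9}
```
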
9. ### Graph Algorithms Extract Structure and Infer Behavior

Sources: “Communities, modules and large-scale structure in networks” – Mark Newman; “Hierarchical structure and the prediction of missing links in networks” and “Structure and inference in annotated networks” – A. Clauset, C. Moore, and M.E.J. Newman.

11. ### Graph Algorithms

- Pathfinding & Search: finds the optimal path or evaluates route availability and quality
- Centrality: determines the importance of distinct nodes in the network
- Community Detection: evaluates how a group is clustered or partitioned

12. ### Algorithms - Pathfinding & Search

- Single-Source Shortest Path
  - Calculates the “shortest” path between a node and all other nodes
- All-Pairs Shortest Path
  - Finds all shortest paths between all pairs of nodes

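
Single-source shortest path is commonly computed with Dijkstra's algorithm when relationship weights are non-negative. The sketch below assumes a plain Python dict of weighted adjacency lists (hypothetical data, not a Neo4j API) and derives all-pairs shortest paths by running it from every node, which is simple but not the only approach (Floyd-Warshall is a common alternative for dense graphs).

```python
import heapq


def dijkstra(graph, source):
    """Shortest distances from source to every reachable node.
    graph: {node: [(neighbor, weight), ...]} with non-negative weights."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry: a shorter path was already found
        for neighbor, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


def all_pairs(graph):
    """All-pairs shortest distances by running Dijkstra from every node."""
    nodes = set(graph) | {n for edges in graph.values() for n, _ in edges}
    return {s: dijkstra(graph, s) for s in nodes}


# Hypothetical directed road network with travel costs.
roads = {
    "A": [("B", 4), ("C", 1)],
    "C": [("B", 2), ("D", 7)],
    "B": [("D", 1)],
}
print(dijkstra(roads, "A"))  # {'A': 0, 'B': 3, 'C': 1, 'D': 4}
```
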
13. ### Algorithms - Pathfinding & Search

- Single-Source Shortest Path
  - Calculates the shortest path from a node to all other nodes
- All-Pairs Shortest Path
  - Calculates the shortest paths between every pair of nodes
- Minimum Weight Spanning Tree
  - Finds the set of relationships with the smallest total weight that connects all nodes (a tree rather than a single path)

[Figure: Least Cost Routing]

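
A minimum weight spanning tree connects every node using the relationships with the lowest total weight. One standard way to build it is Kruskal's algorithm, sketched below with a small union-find; the function name and the toy weighted graph are assumptions for illustration, and this is not a description of how Neo4j implements the procedure.

```python
def minimum_spanning_tree(nodes, edges):
    """Kruskal's algorithm. edges: list of (weight, u, v). Returns the tree's edges."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving keeps lookups fast
            n = parent[n]
        return n

    tree = []
    for weight, u, v in sorted(edges):  # cheapest relationships first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # skip relationships that would close a cycle
            parent[root_u] = root_v
            tree.append((weight, u, v))
    return tree


edges = [(4, "A", "B"), (1, "A", "C"), (2, "B", "C"), (1, "B", "D"), (7, "C", "D")]
mst = minimum_spanning_tree(["A", "B", "C", "D"], edges)
print(sum(w for w, _, _ in mst))  # 4
```
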
15. ### Algorithms - Pathfinding & Search

- Parallel Breadth-First Search & Depth-First Search
  - Traverse the graph by exploring nearest neighbors first (BFS) or by following each branch as far as possible before backtracking (DFS)
- Single-Source Shortest Path
  - Calculates the shortest path from a node to all other nodes
- All-Pairs Shortest Path
  - Calculates the shortest paths between every pair of nodes
- Minimum Weight Spanning Tree
  - Finds the set of relationships with the smallest total weight that connects all nodes (a tree rather than a single path)

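
The difference between the two traversal orders shows up clearly on a small directed graph. This sequential sketch (the slide's parallel variants are not shown) uses a `seen` set, so it also works on graphs with cycles, not only trees; the adjacency-dict format and function names are hypothetical.

```python
from collections import deque


def bfs(graph, start):
    """Visit nodes level by level: all neighbors before their neighbors."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return order


def dfs(graph, start):
    """Follow each branch as deep as possible before backtracking (iterative)."""
    order, seen, stack = [], set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # push in reverse so neighbors are explored in their listed order
        stack.extend(reversed(graph.get(node, [])))
    return order


g = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": [], "E": []}
print(bfs(g, "A"))  # ['A', 'B', 'C', 'D', 'E']
print(dfs(g, "A"))  # ['A', 'B', 'D', 'C', 'E']
```
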
16. ### Algorithms - Centralities

- PageRank
  - Which nodes have the most overall influence
- Betweenness
  - Which nodes are the bridges between different clusters (they lie on the most shortest paths)

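
PageRank can be computed by power iteration: each node repeatedly shares its current score among its outgoing relationships, with a damping factor modeling random jumps. In the minimal sketch below, the 0.85 damping factor and 20 iterations are assumed common defaults, dangling nodes (no outgoing links) spread their score evenly, and the toy link graph is hypothetical.

```python
def pagerank(graph, damping=0.85, iterations=20):
    """Power-iteration PageRank. graph: {node: [out-neighbors]}.
    Scores always sum to 1; dangling nodes share their score with every node."""
    nodes = set(graph) | {m for targets in graph.values() for m in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        # random-jump share plus redistributed dangling score
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v, targets in graph.items():
            if targets:
                share = damping * rank[v] / len(targets)
                for m in targets:
                    new[m] += share
        rank = new
    return rank


# Hypothetical pages: A, B, and D all link to C; C links back to A.
links = {"A": ["C"], "B": ["C"], "D": ["C"], "C": ["A"]}
scores = pagerank(links)
print(max(scores, key=scores.get))  # C
```
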
18. ### Algorithms - Centralities

- PageRank
  - Which nodes have the most overall influence
- Closeness
  - Which nodes can reach the entire group the fastest

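
Closeness centrality can be sketched as the number of reachable nodes divided by the sum of shortest-path distances to them, which for a connected graph equals (n - 1) divided by that sum. The example assumes an unweighted, undirected graph stored as a dict and uses breadth-first search for hop distances; the path graph is made-up data.

```python
from collections import deque


def closeness(graph):
    """Closeness centrality for an unweighted graph {node: [neighbors]}:
    reachable nodes / sum of hop distances to them."""
    scores = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # BFS yields shortest hop distances
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores


# A path A - B - C - D - E: the middle node reaches everyone fastest.
path = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
scores = closeness(path)
print(max(scores, key=scores.get))  # C
```
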
19. ### Algorithms - Centralities

- PageRank
  - Which nodes have the most overall influence
- Closeness
  - Which nodes can reach the entire group the fastest
- Betweenness
  - Which nodes are the bridges between different clusters (they lie on the most shortest paths)

Source: Maven 7

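
Betweenness counts how many shortest paths pass through each node. A common exact method for unweighted graphs is Brandes' algorithm, sketched below under the assumption of an undirected adjacency dict (each relationship listed in both directions, so totals are halved). On two hypothetical triangles joined through one node, that bridge node lies on all four shortest paths between the groups.

```python
from collections import deque


def betweenness(graph):
    """Brandes' algorithm for an unweighted, undirected graph {node: [neighbors]}.
    Totals are halved because each pair of endpoints is counted from both sides."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        # BFS from s, counting shortest paths (sigma) and recording predecessors
        stack, preds = [], {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # walk back from the farthest nodes, accumulating dependencies
        delta = dict.fromkeys(graph, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return {v: c / 2 for v, c in score.items()}


# Two hypothetical triangles joined through a single bridge node "X".
g = {
    "A": ["B", "X"], "B": ["A", "X"],
    "X": ["A", "B", "C", "D"],
    "C": ["D", "X"], "D": ["C", "X"],
}
bc = betweenness(g)
print(bc["X"], bc["A"])  # 4.0 0.0
```
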
20. ### Algorithms - Centralities

- PageRank
  - Which nodes have the most overall influence
- Closeness
  - Which nodes can reach the entire group the fastest
- Betweenness
  - Which nodes are the bridges between different clusters (they lie on the most shortest paths)
- Degree
  - The number of connections into and out of a node

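
Degree is the simplest centrality: count relationships. This sketch assumes directed `(source, target)` pairs and reports in-degree, out-degree, and their total separately; the function name and follower data are invented for illustration.

```python
from collections import Counter


def degree_centrality(relationships):
    """Count outgoing and incoming relationships per node from (source, target) pairs."""
    out_deg, in_deg = Counter(), Counter()
    for source, target in relationships:
        out_deg[source] += 1
        in_deg[target] += 1
    nodes = set(out_deg) | set(in_deg)
    # Counter returns 0 for nodes that never appear on one side
    return {
        n: {"out": out_deg[n], "in": in_deg[n], "total": out_deg[n] + in_deg[n]}
        for n in nodes
    }


follows = [("ann", "bob"), ("cat", "bob"), ("bob", "ann"), ("dan", "bob")]
deg = degree_centrality(follows)
print(deg["bob"])  # {'out': 1, 'in': 3, 'total': 4}
```
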

22. ### Understanding Influence

[Figure: Preventing Cascading Failures with 4 Nodes Removed]

Sources: “Robustness of the European power grids under intentional attack” – R.V. Sole, M. Rosas-Casals, B. Corominas-Murtra, and S. Valverde; “Network Science” – Barabási.

23. ### Algorithms – Community Detection

- Label Propagation
  - Spreads labels based on neighbors to infer clusters

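
Label propagation starts with every node in its own community and lets each node adopt the most common label among its neighbors until nothing changes. The sketch below is one simple asynchronous variant (random visiting order, random tie-breaking with a fixed seed, keeping the current label when it is among the most common); these choices and the names are assumptions, not the library's exact rules.

```python
import random
from collections import Counter


def label_propagation(graph, max_iterations=10, seed=42):
    """graph: {node: [neighbors]} (undirected). Returns {node: community label}."""
    rng = random.Random(seed)
    labels = {v: v for v in graph}  # every node starts in its own community
    nodes = list(graph)
    for _ in range(max_iterations):
        rng.shuffle(nodes)  # asynchronous updates in random order
        changed = False
        for v in nodes:
            if not graph[v]:
                continue
            counts = Counter(labels[nb] for nb in graph[v])
            best = max(counts.values())
            candidates = sorted(label for label, c in counts.items() if c == best)
            # keep the current label if it is already among the most common
            new = labels[v] if labels[v] in candidates else rng.choice(candidates)
            if new != labels[v]:
                labels[v] = new
                changed = True
        if not changed:
            break
    return labels


# Two hypothetical, separate friend groups.
groups = {
    "A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"],
    "X": ["Y", "Z"], "Y": ["X", "Z"], "Z": ["X", "Y"],
}
labels = label_propagation(groups)
print(len(set(labels.values())))  # 2
```
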
24. ### Algorithms – Community Detection

- Label Propagation
  - Spreads labels based on neighbors to infer clusters
- Union Find / Weakly Connected Components
  - Finds groups of nodes in which every node can reach every other node when relationship direction is ignored
- Strongly Connected Components
  - Finds groups of nodes in which every node can reach every other node following the direction of relationships

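
Weakly connected components can be found with union-find by merging the two endpoints of every relationship, while strongly connected components must respect direction; Kosaraju's two-pass depth-first search is one standard method. Both functions below are minimal sketches over hypothetical toy data, returning components sorted by their smallest node.

```python
def weakly_connected_components(relationships):
    """Union-find over (source, target) pairs; relationship direction is ignored."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b in relationships:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # union: merge the two groups
    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return sorted(groups.values(), key=min)


def strongly_connected_components(graph):
    """Kosaraju: record DFS finish order, then search the reversed graph."""
    nodes = set(graph) | {m for targets in graph.values() for m in targets}
    order, seen = [], set()
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(graph.get(start, [])))]
        while stack:  # iterative DFS avoids Python's recursion limit
            node, neighbors = stack[-1]
            nxt = next(neighbors, None)
            if nxt is None:
                stack.pop()
                order.append(node)  # finished once its unvisited descendants are done
            elif nxt not in seen:
                seen.add(nxt)
                stack.append((nxt, iter(graph.get(nxt, []))))
    reverse = {n: [] for n in nodes}
    for n, targets in graph.items():
        for m in targets:
            reverse[m].append(n)
    components, assigned = [], set()
    for start in reversed(order):  # latest finisher first
        if start in assigned:
            continue
        component, stack = set(), [start]
        assigned.add(start)
        while stack:
            v = stack.pop()
            component.add(v)
            for u in reverse[v]:
                if u not in assigned:
                    assigned.add(u)
                    stack.append(u)
        components.append(component)
    return sorted(components, key=min)


rels = [("A", "B"), ("B", "A"), ("B", "C"), ("D", "E")]
graph = {}
for a, b in rels:
    graph.setdefault(a, []).append(b)
print([sorted(c) for c in weakly_connected_components(rels)])     # [['A', 'B', 'C'], ['D', 'E']]
print([sorted(c) for c in strongly_connected_components(graph)])  # [['A', 'B'], ['C'], ['D'], ['E']]
```
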
25. ### Algorithms – Community Detection

- Label Propagation
  - Spreads labels based on neighbors to infer clusters
- Union Find / Weakly Connected Components
  - Finds groups of nodes in which every node can reach every other node when relationship direction is ignored
- Strongly Connected Components
  - Finds groups of nodes in which every node can reach every other node following the direction of relationships
- Louvain Modularity
  - Groups nodes into communities by optimizing modularity, a measure of the quality of a community grouping

Source: “Fast unfolding of communities in large networks” – Blondel, Guillaume, Lambiotte, Lefebvre

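
Louvain repeatedly moves nodes between communities to increase modularity, then collapses each community into a single node and repeats. The full method is too long for a short example, so the sketch below only computes the modularity score it optimizes, using the standard form Q = sum over communities c of (L_c / m - (d_c / 2m)^2), where m is the number of relationships, L_c the relationships inside c, and d_c the total degree of c's nodes. It assumes an undirected, unweighted graph; the function name and two-triangle graph are hypothetical.

```python
def modularity(edges, community):
    """Modularity of a partition. edges: [(u, v)]; community: {node: community id}."""
    m = len(edges)
    internal, degree_sum = {}, {}
    for u, v in edges:
        for node in (u, v):  # each relationship adds 1 to both endpoints' degree
            c = community[node]
            degree_sum[c] = degree_sum.get(c, 0) + 1
        if community[u] == community[v]:
            c = community[u]
            internal[c] = internal.get(c, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree_sum.items())


# Two triangles joined by the single relationship C - D.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("D", "E"), ("E", "F"), ("D", "F"), ("C", "D")]
split = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}
together = {n: 0 for n in "ABCDEF"}
print(round(modularity(edges, split), 3))  # 0.357
print(modularity(edges, together))         # 0.0
```

Splitting along the bridge scores higher than lumping everything together, which is the kind of improvement Louvain searches for.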
26. ### Algorithms – Community Detection

- Label Propagation
  - Spreads labels based on neighbors to infer clusters
- Union Find / Weakly Connected Components
  - Finds groups of nodes in which every node can reach every other node when relationship direction is ignored
- Strongly Connected Components
  - Finds groups of nodes in which every node can reach every other node following the direction of relationships
- Louvain Modularity
  - Groups nodes into communities by optimizing modularity, a measure of the quality of a community grouping
- Triangle-Count & Clustering Coefficient
  - Measure the degree to which nodes tend to cluster together

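
For each node, the triangle count is the number of pairs of its neighbors that are themselves connected, and the local clustering coefficient divides that by the number of possible neighbor pairs, k(k - 1)/2 for degree k. The minimal sketch below assumes an undirected graph stored as a dict of neighbor sets; the function name and data are made up.

```python
from itertools import combinations


def triangles_and_clustering(graph):
    """graph: {node: set(neighbors)} (undirected).
    Returns {node: (triangles through node, local clustering coefficient)}."""
    result = {}
    for v, neighbors in graph.items():
        # a triangle through v is a pair of v's neighbors that are also linked
        triangles = sum(1 for a, b in combinations(neighbors, 2) if b in graph[a])
        k = len(neighbors)
        possible = k * (k - 1) / 2
        result[v] = (triangles, triangles / possible if possible else 0.0)
    return result


g = {
    "A": {"B", "C"}, "B": {"A", "C"},
    "C": {"A", "B", "D"}, "D": {"C"},
}
result = triangles_and_clustering(g)
print(result["A"])  # (1, 1.0)
```
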
27. ### Graph Algorithms Apply to All Real-World Networks Where You Need to Predict Complex Interactions

- Anti Money Laundering
- Recommendations
- Terrorist Networks
- Credit-Checks
- Fraud Prevention
- Cybersecurity
- Network Design
- PoS Profitability
- Alternate Routing
- Urban Resource Placement
- Theory Generation
- ML Feature Extraction
- Disease Spread
- Rippling Travel / Logistic Delays
- Drug Gene-Targeting

28. ### Steven S. Skiena

“Graphs are one of the unifying themes of computer science . . . That so many different structures can be modeled using a single formalism is a source of great power to the educated programmer.” – Steven S. Skiena

30. ### Many Moving Parts!

Example workflow pipeline based on John Swain’s Twitter analysis:

[Diagram components: Twitter Streaming API; Python tweet collection (includes user data); RabbitMQ; MongoDB; Neo4j; R scripts (graph stats, community detection); MySQL; graph .graphml; Tableau; graph visualization]

- Moved from the Twitter Search API to the Streaming API
- Replaced Python Twitter libraries (Tweepy) with raw API calls
- Streamed tweets into a message queue
- Stored full tweets and user data in MongoDB
- Built the graph for analysis in Neo4j from tweets persisted in MongoDB
- Analysis in R, with igraph libraries for algorithms
- Some text analysis, e.g. LDA topics
- Results published in MySQL for Tableau
- GraphML for import to Gephi, with stats precalculated

31. ### Our Goal Is to Simplify

[Diagram: example workflow pipeline based on John Swain’s Twitter analysis: Twitter Streaming API; Python tweet collection (includes user data); RabbitMQ; MongoDB; Neo4j; R scripts (graph stats, community detection); MySQL; graph .graphml; Tableau; graph visualization]

33. ### Neo4j

- Native Graph Database
- Analytics Integrations
- Cypher Query Language
- Wide Range of APOC Procedures
- Optimized Graph Algorithms

34. ### How To…

1) Call as a Cypher procedure
2) Pass in the specification (label, property, query) and configuration
3) Execute and return results

A. The `.stream` variant returns results (potentially a lot of them): `CALL algo.<name>.stream('Label','TYPE',{conf}) YIELD nodeId, score`

B. The non-stream variant writes results to the graph and returns statistics: `CALL algo.<name>('Label','TYPE',{conf})`

35. ### Cypher Projection

Pass in Cypher statements for the node and relationship lists:

`CALL algo.<name>('MATCH ... RETURN id(n) AS id', 'MATCH (n)-->(m) RETURN id(n) AS source, id(m) AS target', {graph:'cypher'})`


40. ### Datasets

- Game of Thrones
  - 800 nodes
  - 400 relationships
  - Sandbox
- Yelp Business Graph
  - 5m nodes
  - 17m relationships
  - GitHub: https://github.com/neo4j-contrib/neo4j-data-science-yelp/blob/master/notebooks/neo4j_yelp_00_data_load.ipynb
- Neo4j Community Graph
  - 280k nodes
  - 1.4m relationships
  - GitHub: https://github.com/community-graph/documentation
  - Browser: http://138.197.15.1:7474 (username: "all", password: "readonly")
- DBPedia
  - 11m nodes
  - 116m relationships
  - https://github.com/jexp/graphipedia

`:play data_science`

42. ### Resources

- Community Site: community.neo4j.com
- YouTube: youtube.com/neo4j
- GitHub: github.com/neo4j-contrib/neo4j-graph-algorithms
- Whitepaper: neo4j.com/graph-analytics

jennifer.reif@neo4j.com, @jmhreif