Graph Algorithms: Predict Real-World Behavior

Learn why an averages-based approach fails to describe real-world data and how graph algorithms can help you predict real-world behavior, or potential trends in the popular series Game of Thrones! We will start with an overview of which algorithms in Neo4j to apply for finding optimal paths, measuring influence in a network, and detecting communities. We will also discuss real-life use cases that span industries, including recommendations, resiliency planning, fraud prevention, and traffic engineering/routing (such as IP and call routing). These topics will come alive through a live demo using Game of Thrones data, where we look at what kinds of information you can retrieve and what decisions you can make based on results from different algorithms. From this session, you will gain the knowledge to recognize whether you have a graph analytics problem and how to get started!

Jennifer Reif

July 18, 2019

Transcript

  1. Graph Algorithms: Predict Real-World Behavior …or Game of Thrones Spoilers!

    Jennifer Reif, Neo4j
    jennifer.reif@neo4j.com @jmhreif
  2. A bit about me….

    Jennifer Reif, Software Engineer, Neo4j
    - Developer
    - Blogger
    - Conference speaker
    jennifer.reif@neo4j.com @jmhreif
    The Right Tool for Real-World Networks
  3. A bit about you…

    • Who knows what a graph database is?
    • Who knows what Neo4j is?
    • Who has used Neo4j before?
    • Who is running Neo4j in production?
  4. Graph Chart

  5. The world is a graph – everything is connected

    • people, places, events
    • companies, markets
    • countries, history, politics
    • sciences, art, teaching
    • technology, networks, machines, applications, users
    • software, code, dependencies, architecture, deployments
    • criminals, fraudsters, and their behavior
  6. What is it used to accomplish?

    Internal Applications
    • Master Data Management
    • Network and IT Operations
    • Fraud Detection
    Customer-Facing Applications
    • Real-Time Recommendations
    • Graph-Based Search
    • Identity and Access Management
  7. Whiteboard Friendliness

    Easy to design and model; direct representation of the model
  8. Whiteboard Friendliness

  9. Whiteboard Friendliness

  10. Graph Data Model

  11. Property Graph Data Model

    • 2 Main Components:
      • Nodes
      • Relationships
    • Additional Components:
      • Labels
      • Properties
  12. Property Graph Data Model

    • Nodes:
      • Represent the objects in the graph
      • Can be categorized using Labels
    Diagram labels: Car, Person, Person
  13. Property Graph Data Model

    • Nodes:
      • Represent the objects in the graph
      • Can be categorized using Labels
    • Relationships:
      • Relate nodes by type and direction
    Diagram labels: Car, Person, Person; DRIVES, LOVES, LOVES, LIVES WITH, OWNS
  14. Property Graph Data Model

    • Nodes:
      • Represent the objects in the graph
      • Can be categorized using Labels
    • Relationships:
      • Relate nodes by type and direction
    • Properties:
      • Name-value pairs that can be applied to nodes or relationships
    Diagram labels: Car, Person, Person; DRIVES, LOVES, LOVES, LIVES WITH, OWNS
    Diagram properties: name: “Dan”, born: May 29, 1970, twitter: “@dan”; name: “Ann”, born: Dec 5, 1975; since: Jan 10, 2011; brand: “Volvo”, model: “V70”
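
The four building blocks above (nodes, relationships, labels, properties) can be mirrored in a few lines of plain Python. This is only an illustrative sketch, not how Neo4j stores data. The `Node` and `Relationship` classes and the `neighbors` helper are hypothetical names, and the `OWNS` relationship from Ann to the car is an assumption, since the transcript does not say which person each car relationship belongs to.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                                   # e.g. {"Person"}
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node
    type: str                                     # e.g. "LOVES"
    end: Node
    properties: dict = field(default_factory=dict)


dan = Node({"Person"}, {"name": "Dan", "twitter": "@dan"})
ann = Node({"Person"}, {"name": "Ann"})
car = Node({"Car"}, {"brand": "Volvo", "model": "V70"})

relationships = [
    Relationship(dan, "LOVES", ann),
    # Assumption: the transcript does not say who owns the car.
    Relationship(ann, "OWNS", car),
]


def neighbors(node, rel_type):
    """Follow outgoing relationships of one type from a node."""
    return [r.end for r in relationships if r.start is node and r.type == rel_type]


print([n.properties["name"] for n in neighbors(dan, "LOVES")])
```

The `neighbors` call plays the role of a simple pattern match: start at one node and follow one relationship type in its stored direction.
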
  15. Apply to Game of Thrones Data

  16. Nodes

  17. Labels

  18. Properties

  19. Relationships

  20. Cypher Query Language…. SQL for graphs

  21. Cypher: Powerful and Expressive

    CREATE (:Person { name:"Dan"}) -[:LOVES]-> (:Person { name:"Ann"})
    Diagram: Dan LOVES Ann, annotated with NODE, LABEL, and PROPERTY
  22. Cypher: Powerful and Expressive

    MATCH (:Person { name:"Dan"} ) -[:LOVES]-> ( whom )
    RETURN whom
    Diagram: Dan LOVES Ann
  23. Graph Algorithms

  24. Photo: US National Archives

  25. Averages Will Fail You

  26. Do You Have a Graph Analytics Problem?

    Requires Understanding Relationships and Structures
    Flow & Dynamics
    Interactions & Resiliency
    Propagation Pathways
    Forecast Behavior & Prescribe Action
  27. Averages Aren’t Reality

    Chart: Average Distribution (Random); axes: Nodes, Relationships
    “There is No Network in Nature that we know of that would be described by the Random network model.” –Albert-László Barabási
  28. Averages Aren’t Reality

    Chart: Average Distribution (Random); axes: Nodes, Relationships
    Chart: Power Law Distribution (Scale-Free); axes: Nodes, Relationships
  29. And You’ll Never See the Structures

    Chart: Average Distribution (Random); axes: Nodes, Relationships
    Most nodes have the same number of links
    No highly connected nodes
    Other structures shown: Scale-Free, Small World
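
A small sketch of why the average hides structure: the two toy graphs below have the same number of nodes and relationships, and so the same average degree, yet only one of them has a highly connected node. Both graphs and the `degrees` helper are made up for illustration and do not come from the talk's data.

```python
from collections import Counter


def degrees(edges):
    """Count how many relationships touch each node (direction ignored)."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


n = 10
# Ring: every node links to the next one, so all degrees are equal.
ring = [(i, (i + 1) % n) for i in range(n)]
# Hub and spokes, plus one extra link so both graphs have 10 relationships.
hub = [(0, i) for i in range(1, n)] + [(1, 2)]

for name, edges in (("ring", ring), ("hub", hub)):
    deg = degrees(edges)
    average = sum(deg.values()) / n
    print(name, "average degree:", average, "max degree:", max(deg.values()))
```

Both graphs report an average degree of 2.0, but the maximum degree is 2 in the ring and 9 in the hub graph, which is the kind of difference the average alone cannot show.
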
  30. Graph Algorithms Extract Structure and Infer Behavior

    Source: “Communities, modules and large-scale structure in networks” - Mark Newman
    Source: “Hierarchical structure and the prediction of missing links in networks”; “Structure and inference in annotated networks” - A. Clauset, C. Moore, and M.E.J. Newman.
  31. Understand and Predict

  32. Algorithms - Pathfinding & Search

    • Single-Source Shortest Path
      ◦ Calculates “shortest” path between a node and all other nodes
    • All-Pairs Shortest Path
      ◦ Finds all shortest paths between all nodes
  33. Algorithms - Pathfinding & Search Least Cost Routing

  34. Algorithms - Pathfinding & Search

    • Minimum Weight Spanning Tree
      ◦ Calculates the tree of relationships with the smallest total weight that connects all nodes
    • Single-Source Shortest Path
      ◦ Calculates “shortest” path between a node and all other nodes
    • All-Pairs Shortest Path
      ◦ Finds all shortest paths between all nodes
  35. Algorithms - Pathfinding & Search

    • Minimum Weight Spanning Tree
      ◦ Calculates the tree of relationships with the smallest total weight that connects all nodes
    • Single-Source Shortest Path
      ◦ Calculates “shortest” path between a node and all other nodes
    • All-Pairs Shortest Path
      ◦ Finds all shortest paths between all nodes
    • Parallel Breadth-First Search & Depth-First Search
      ◦ Traverses a tree by exploring nearest neighbors first (BFS) or by following each branch to its end (DFS)
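
To make Single-Source Shortest Path concrete, here is a minimal sketch of Dijkstra's algorithm in plain Python. It is not the Neo4j library's implementation. The function name, the adjacency-list layout, and the sample graph are assumptions made for illustration, and the approach assumes non-negative weights.

```python
import heapq


def single_source_shortest_path(graph, source):
    """Dijkstra's algorithm: lowest total weight from source to each reachable node."""
    dist = {source: 0}
    queue = [(0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry; a cheaper route was already found
        for neighbor, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return dist


# Hypothetical weighted, directed graph: node -> [(neighbor, cost), ...]
graph = {
    "A": [("B", 4), ("C", 1)],
    "B": [("D", 3)],
    "C": [("B", 2), ("D", 7)],
}
print(single_source_shortest_path(graph, "A"))
```

In this sample, the cheapest route from A to B goes through C (cost 3) rather than taking the direct relationship (cost 4), which is why "shortest" means lowest total weight rather than fewest hops.
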
  36. Algorithms - Centralities

    • PageRank
      ◦ Which nodes have the most overall influence
  37. Algorithms - Centralities

    • PageRank
      ◦ Which nodes have the most overall influence
    • Betweenness
      ◦ Which nodes are the bridges between different clusters (most shortest paths)
  38. Algorithms - Centralities

  39. Algorithms - Centralities

    • PageRank
      ◦ Which nodes have the most overall influence
    • Betweenness
      ◦ Which nodes are the bridges between different clusters (most shortest paths)
  40. Algorithms - Centralities

    • Closeness
      ◦ Which nodes are able to reach entire group the fastest
    • PageRank
      ◦ Which nodes have the most overall influence
    • Betweenness
      ◦ Which nodes are the bridges between different clusters (most shortest paths)
  41. Algorithms - Centralities

    • Closeness
      ◦ Which nodes are able to reach entire group the fastest
    • Degree
      ◦ The number of connections in/out of a node
    • PageRank
      ◦ Which nodes have the most overall influence
    • Betweenness
      ◦ Which nodes are the bridges between different clusters (most shortest paths)
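
Two of the centralities above are simple enough to sketch directly. The code below counts in/out degree and runs a basic PageRank power iteration. It illustrates the idea only and is not the Neo4j procedure. The function names and the sample relationships are hypothetical, and the damping factor of 0.85 and 20 iterations are common textbook choices assumed here.

```python
def degree(edges):
    """Count incoming and outgoing relationships per node."""
    counts = {}
    for src, dst in edges:
        counts.setdefault(src, {"in": 0, "out": 0})["out"] += 1
        counts.setdefault(dst, {"in": 0, "out": 0})["in"] += 1
    return counts


def pagerank(edges, damping=0.85, iterations=20):
    """Power iteration: each node repeatedly shares its rank with the nodes it points to."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            # A node with no outgoing links spreads its rank over all nodes.
            targets = out_links[n] or nodes
            share = damping * rank[n] / len(targets)
            for t in targets:
                new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical directed relationships (source, target).
edges = [("A", "C"), ("B", "C"), ("C", "D"), ("D", "C")]
print(degree(edges)["C"])
ranks = pagerank(edges)
print(max(ranks, key=ranks.get))
```

Degree only looks at a node's immediate connections, while PageRank also weighs how influential the nodes pointing at it are, so the two measures can rank nodes differently on larger graphs.
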
  42. Source: Network Science - Barabasi

  43. Algorithms - Community Detection

    • Label Propagation
      ◦ Spreads labels based on neighbors to infer clusters
  44. Algorithms - Community Detection

    • Label Propagation
      ◦ Spreads labels based on neighbors to infer clusters
    • Union Find / Weakly Connected Components
      ◦ Finds groups of nodes that all have a path to each other
    • Strongly Connected Components
      ◦ Finds groups of nodes that are all connected to each other following the direction of relationships
  45. Algorithms - Community Detection

    • Label Propagation
      ◦ Spreads labels based on neighbors to infer clusters
    • Union Find / Weakly Connected Components
      ◦ Finds groups of nodes that all have a path to each other
    • Strongly Connected Components
      ◦ Finds groups of nodes that are all connected to each other following the direction of relationships
    • Louvain Modularity
      ◦ Measures the presumed accuracy of community grouping
  46. Algorithms - Community Detection

    • Label Propagation
      ◦ Spreads labels based on neighbors to infer clusters
    • Union Find / Weakly Connected Components
      ◦ Finds groups of nodes that all have a path to each other
    • Strongly Connected Components
      ◦ Finds groups of nodes that are all connected to each other following the direction of relationships
    • Louvain Modularity
      ◦ Measures the presumed accuracy of community grouping
    • Triangle-Count & Clustering Coefficient
      ◦ Measures the degree that nodes tend to cluster together
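
As one concrete case from the list above, Union Find / Weakly Connected Components can be sketched with a classic disjoint-set structure. This is an illustration of the technique, not the Neo4j procedure. The function name and the sample nodes and relationships are hypothetical, and relationship direction is deliberately ignored.

```python
def weakly_connected_components(nodes, edges):
    """Group nodes that are linked by any path, ignoring relationship direction."""
    parent = {n: n for n in nodes}

    def find(n):
        # Walk up to the representative, shortening the path as we go.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # union the two groups

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical nodes and relationships.
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("c", "b"), ("d", "e")]
print(weakly_connected_components(nodes, edges))
```

Strongly Connected Components would additionally require a path in both directions between every pair in a group, so this sketch does not cover that case.
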
  47. Graph Algorithms with Neo4j: Accessing & Executing

  48. How To…

    1. Call as Cypher procedure
    2. Pass in specification (Label, Prop, Query) and configuration
    3. Execute and return results
    A. ~.stream variant returns (a lot of) results
       CALL algo.<name>.stream('Label','TYPE',{conf})
       YIELD nodeId, score
    B. non-stream variant writes results to the graph and returns statistics
       CALL algo.<name>('Label','TYPE',{conf})
  49. Cypher Projection

    Pass in Cypher statement for node- and relationship-lists.
    CALL algo.<name>(
      'MATCH ... RETURN id(n)',
      'MATCH (n)-->(m)
       RETURN id(n) as source, id(m) as target',
      {graph:'cypher'})
  50. None
  51. It’s a Graph!

  52. None
  53. None
  54. The data is already a graph…. Why not store it in a graph db?!
  55. None
  56. None
  57. What kinds of data do we have?

    • Houses
    • Relations (families, marriages/betrothals, hookups, etc)
    • Allegiances
    • Locations / Regions
    • Actors / Characters
    • Episodes / Books
    • Deaths
    • Battles
  58. Interactions

    Link 2 characters each time their names (or nicknames) appear within 15 words of one another
    Interaction could be direct or indirect
    https://networkofthrones.wordpress.com/from-book-to-network/
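
The 15-word rule above can be sketched as a co-occurrence count. This is a simplified illustration and not the Network of Thrones authors' actual pipeline: it assumes single-word names, does no nickname resolution, and uses a made-up sentence. The `interactions` function and its `window` parameter are hypothetical names.

```python
import re
from collections import Counter
from itertools import combinations


def interactions(text, names, window=15):
    """Count how often two different names appear within `window` words of each other."""
    words = re.findall(r"[A-Za-z']+", text)
    positions = [(i, w) for i, w in enumerate(words) if w in names]
    weights = Counter()
    for (i, a), (j, b) in combinations(positions, 2):
        if a != b and j - i <= window:
            # Sort the pair so (Jon, Sam) and (Sam, Jon) count as one relationship.
            weights[tuple(sorted((a, b)))] += 1
    return weights


# Made-up sentence, for illustration only.
text = "Jon spoke with Sam before Arya arrived"
names = {"Jon", "Sam", "Arya"}
print(interactions(text, names))
```

Each count corresponds to the interaction weight that the import step stores on the relationship between two characters.
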
  59. Interactions - import

    • LOAD CSV
    • Load Characters
    • Create relationships with # of interactions (weight)
    • Relationship by each book
    https://networkofthrones.wordpress.com/data/
  60. Interactions - data model

  61. Game of Thrones Demo Time!

  62. Neo4j Sandbox

  63. Neo4j Desktop

  64. Datasets

    Game of Thrones
    • 800 nodes
    • 400 relationships
    • Sandbox:
    Yelp Business Graph
    • 5m nodes
    • 17m relationships
    • GitHub: https://github.com/neo4j-contrib/neo4j-data-science-yelp/blob/master/notebooks/neo4j_yelp_00_data_load.ipynb
    Neo4j Community Graph
    • 280k nodes
    • 1.4m relationships
    • GitHub: https://github.com/community-graph/documentation
    Browser: http://138.197.15.1:7474 username: "all" pwd: "readonly"
    DBPedia
    • 11m nodes
    • 116m relationships
    • Coming soon….
    :play data_science
  65. Resources

    • Community Site: community.neo4j.com
    • GitHub: github.com/neo4j-contrib/neo4j-graph-algorithms
    • Neo4j Desktop: https://www.neo4j.com/download
    • Neo4j Sandbox: https://www.neo4j.com/sandbox-v2/
    • Developer Guides: https://www.neo4j.com/developer/get-started
    • Neo4j developer blog: https://medium.com/neo4j
    • Interactions data: https://github.com/mathbeveridge/asoiaf
    jennifer.reif@neo4j.com @jmhreif