
Graph Algorithms: Predict Real-World Behavior

Learn why an averages-based approach fails to describe real-world data and how graph algorithms can help you predict real-world behavior, or potential trends in the popular series Game of Thrones! We will start with an overview of which algorithms in Neo4j to apply for various types of optimal paths, influence in a network, and community detection. We will also discuss real-life use cases that span industries, including recommendations, resiliency planning, fraud prevention, and traffic engineering/routing (such as IP and call routing). These topics will come alive through a live demo using data from Game of Thrones, where we look at what kinds of information you can retrieve and what decisions you can make based on results from different algorithms. From this session, you will gain the knowledge to recognize whether you have a graph analytics problem and how you can get started!

Jennifer Reif

July 18, 2019

Transcript

  1. Jennifer Reif
    Software Engineer, Neo4j
    - Developer
    - Blogger
    - Conference speaker
    @jmhreif
    A bit about me…
    The Right Tool for Real-World Networks
  2. A bit about you…
    • Who knows what a graph database is?
    • Who knows what Neo4j is?
    • Who has used Neo4j before?
    • Who is running Neo4j in production?
  3. The world is a graph – everything is connected
    • people, places, events
    • companies, markets
    • countries, history, politics
    • sciences, art, teaching
    • technology, networks, machines, applications, users
    • software, code, dependencies, architecture, deployments
    • criminals, fraudsters, and their behavior
  4. What is it used to accomplish?
    Internal Applications
      • Master Data Management
      • Network and IT Operations
      • Fraud Detection
    Customer-Facing Applications
      • Real-Time Recommendations
      • Graph-Based Search
      • Identity and Access Management
  5. Property Graph Data Model
    • 2 Main Components:
      • Nodes
      • Relationships
    • Additional Components:
      • Labels
      • Properties
  6. Property Graph Data Model
    • Nodes:
      • Represent the objects in the graph
      • Can be categorized using Labels
    [Diagram: nodes labeled Car, Person, Person]
  7. Property Graph Data Model
    • Nodes:
      • Represent the objects in the graph
      • Can be categorized using Labels
    • Relationships:
      • Relate nodes by type and direction
    [Diagram: Car, Person, and Person nodes connected by DRIVES, LOVES, LOVES, LIVES WITH, and OWNS relationships]
  8. Property Graph Data Model
    • Nodes:
      • Represent the objects in the graph
      • Can be categorized using Labels
    • Relationships:
      • Relate nodes by type and direction
    • Properties:
      • Name-value pairs that can be applied to nodes or relationships
    [Diagram: Car, Person, and Person nodes connected by DRIVES, LOVES, LOVES, LIVES WITH, and OWNS relationships, with properties name: "Dan", born: May 29, 1970, twitter: "@dan"; name: "Ann", born: Dec 5, 1975; since: Jan 10, 2011; brand: "Volvo", model: "V70"]
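To make the model above concrete, here is a minimal sketch of nodes, labels, relationships, and properties as plain Python data structures. It is not how Neo4j stores data. The class and function names are hypothetical, and the choice of who DRIVES the car and which relationship carries `since` is an assumption, since the transcript does not say.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                                      # e.g. {"Person"}
    properties: dict = field(default_factory=dict)   # name-value pairs


@dataclass
class Relationship:
    start: Node                                      # direction runs start -> end
    rel_type: str                                    # e.g. "LOVES"
    end: Node
    properties: dict = field(default_factory=dict)


dan = Node({"Person"}, {"name": "Dan", "born": "May 29, 1970", "twitter": "@dan"})
ann = Node({"Person"}, {"name": "Ann", "born": "Dec 5, 1975"})
car = Node({"Car"}, {"brand": "Volvo", "model": "V70"})

relationships = [
    Relationship(dan, "LOVES", ann),
    Relationship(ann, "LOVES", dan),
    # Assumption: the slide does not state the endpoints of DRIVES
    # or which relationship holds the `since` property.
    Relationship(dan, "DRIVES", car, {"since": "Jan 10, 2011"}),
]


def neighbors(node, rel_type):
    """Follow outgoing relationships of one type from a node."""
    return [r.end for r in relationships
            if r.start is node and r.rel_type == rel_type]


loved_by_dan = [n.properties["name"] for n in neighbors(dan, "LOVES")]
print(loved_by_dan)
```

The `neighbors` helper mirrors what a pattern such as `(:Person)-[:LOVES]->(whom)` expresses: start at a node, follow one relationship type in its direction, and return whatever is at the other end.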
  9. Cypher: Powerful and Expressive
    CREATE (:Person { name:"Dan"}) -[:LOVES]-> (:Person { name:"Ann"})
    [Diagram: Dan -LOVES-> Ann, with each node annotated NODE, LABEL, and PROPERTY]
  10. Cypher: Powerful and Expressive
    MATCH (:Person { name:"Dan"} ) -[:LOVES]-> ( whom )
    RETURN whom
    [Diagram: Dan -LOVES-> Ann]
  11. Do You Have a Graph Analytics Problem?
    Requires Understanding Relationships and Structures
    • Flow & Dynamics
    • Interactions & Resiliency
    • Propagation Pathways
    • Forecast Behavior & Prescribe Action
  12. Averages Aren’t Reality
    [Chart: axes Nodes and Relationships; Average Distribution - Random -]
    “There is No Network in Nature that we know of that would be described by the Random network model.” –Albert-László Barabási
  13. Averages Aren’t Reality
    [Chart: axes Nodes and Relationships; Average Distribution - Random -]
    [Chart: axes Nodes and Relationships; Power Law Distribution - Scale-Free -]
  14. And You’ll Never See the Structures
    [Chart: axes Nodes and Relationships; Average Distribution - Random -]
    • Most nodes have the same number of links
    • No highly connected nodes
    [Diagrams: - Scale-Free - and - Small World -]
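A small, deterministic sketch of why an average can hide structure: the two toy graphs below are hypothetical (not from the talk) and have nearly the same mean number of links per node, yet one has no highly connected node and the other is dominated by a single hub.

```python
from collections import Counter


def degrees(edges):
    """Count how many links touch each node in an undirected edge list."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


n = 10
# Ring: every node has exactly 2 links, so no node stands out.
ring = [(i, (i + 1) % n) for i in range(n)]
# Hub: node 0 is linked to every other node; the rest have 1 link each.
hub = [(0, i) for i in range(1, n)]

summary = {}
for name, edges in (("ring", ring), ("hub", hub)):
    deg = degrees(edges)
    mean = sum(deg.values()) / len(deg)
    summary[name] = (mean, max(deg.values()))
    print(f"{name}: mean links = {mean}, max links = {max(deg.values())}")
```

The means come out at 2.0 and 1.8, but the maximum is 2 in the ring and 9 in the hub graph. Reporting only the mean would make the two look alike and hide the hub entirely.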
  15. Graph Algorithms
    Extract Structure and Infer Behavior
    Source: “Communities, modules and large-scale structure in networks” - Mark Newman
    Source: “Hierarchical structure and the prediction of missing links in networks”; “Structure and inference in annotated networks” - A. Clauset, C. Moore, and M.E.J. Newman.
  16. Algorithms - Pathfinding & Search
    • Single-Source Shortest Path
      ◦ Calculates “shortest” path between a node and all other nodes
    • All-Pairs Shortest Path
      ◦ Finds the shortest paths between all pairs of nodes
  17. Algorithms - Pathfinding & Search
    • Minimum Weight Spanning Tree
      ◦ Calculates the tree of relationships with the smallest total weight that connects all nodes
    • Single-Source Shortest Path
      ◦ Calculates “shortest” path between a node and all other nodes
    • All-Pairs Shortest Path
      ◦ Finds the shortest paths between all pairs of nodes
  18. Algorithms - Pathfinding & Search
    • Minimum Weight Spanning Tree
      ◦ Calculates the tree of relationships with the smallest total weight that connects all nodes
    • Single-Source Shortest Path
      ◦ Calculates “shortest” path between a node and all other nodes
    • All-Pairs Shortest Path
      ◦ Finds the shortest paths between all pairs of nodes
    • Parallel Breadth-First Search & Depth-First Search
      ◦ Traverses tree by exploring nearest neighbors (BFS) or down each branch (DFS)
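As an illustration of single-source shortest path, here is a minimal sketch using Dijkstra's algorithm in plain Python. It is not the Neo4j implementation. The graph, node names, and weights are made up for the example, and weights are assumed to be non-negative.

```python
import heapq


def single_source_shortest_paths(graph, source):
    """Return the lowest total weight from `source` to every reachable node.

    `graph` maps each node to a dict of {neighbor: weight}.
    """
    dist = {source: 0}
    heap = [(0, source)]  # (distance so far, node)
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale entry; a shorter route was already found
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


# Hypothetical weighted, undirected graph.
graph = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"B": 5, "C": 1},
}

distances = single_source_shortest_paths(graph, "A")
print(distances)
```

Running the same function once per node gives a simple (if not the most efficient) all-pairs result.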
  19. Algorithms - Centralities
    • PageRank
      ◦ Which nodes have the most overall influence
    • Betweenness
      ◦ Which nodes are the bridges between different clusters (most shortest paths)
  20. Algorithms - Centralities
    • PageRank
      ◦ Which nodes have the most overall influence
    • Betweenness
      ◦ Which nodes are the bridges between different clusters (most shortest paths)
  21. Algorithms - Centralities
    • Closeness
      ◦ Which nodes are able to reach the entire group the fastest
    • PageRank
      ◦ Which nodes have the most overall influence
    • Betweenness
      ◦ Which nodes are the bridges between different clusters (most shortest paths)
  22. Algorithms - Centralities
    • Closeness
      ◦ Which nodes are able to reach the entire group the fastest
    • Degree
      ◦ The number of connections in/out of a node
    • PageRank
      ◦ Which nodes have the most overall influence
    • Betweenness
      ◦ Which nodes are the bridges between different clusters (most shortest paths)
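Two of the centralities above are easy to sketch in plain Python: degree, and a basic power-iteration PageRank. This is an illustrative sketch, not the Neo4j procedure. The damping factor of 0.85, the fixed iteration count, the handling of nodes without outgoing links, and the toy graph are all assumptions made for the example.

```python
from collections import Counter


def degree_centrality(out_links):
    """Count connections in and out of each node."""
    degree = Counter()
    for source, targets in out_links.items():
        for target in targets:
            degree[source] += 1
            degree[target] += 1
    return degree


def pagerank(out_links, damping=0.85, iterations=50):
    """Basic power-iteration PageRank over a dict of {node: [targets]}."""
    nodes = set(out_links) | {t for ts in out_links.values() for t in ts}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = out_links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node with no outgoing links spreads its rank evenly.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank


# Hypothetical directed graph: A, B, and D all point at C; C points at A.
links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}

degree = degree_centrality(links)
ranks = pagerank(links)
print(degree.most_common(1), max(ranks, key=ranks.get))
```

In this toy graph, C has both the highest degree and the highest PageRank. A is the more interesting case: it has only two connections, yet it ranks second because its single incoming link comes from the most influential node.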
  23. Algorithms - Community Detection
    • Label Propagation
      ◦ Spreads labels based on neighbors to infer clusters
    • Union Find / Weakly Connected Components
      ◦ Finds groups of nodes that all have a path to each other
    • Strongly Connected Components
      ◦ Finds groups of nodes that are all connected to each other following the direction of relationships
  24. Algorithms - Community Detection
    • Label Propagation
      ◦ Spreads labels based on neighbors to infer clusters
    • Union Find / Weakly Connected Components
      ◦ Finds groups of nodes that all have a path to each other
    • Strongly Connected Components
      ◦ Finds groups of nodes that are all connected to each other following the direction of relationships
    • Louvain Modularity
      ◦ Finds communities by maximizing modularity, a measure of the presumed accuracy of community grouping
  25. Algorithms - Community Detection
    • Label Propagation
      ◦ Spreads labels based on neighbors to infer clusters
    • Union Find / Weakly Connected Components
      ◦ Finds groups of nodes that all have a path to each other
    • Strongly Connected Components
      ◦ Finds groups of nodes that are all connected to each other following the direction of relationships
    • Louvain Modularity
      ◦ Finds communities by maximizing modularity, a measure of the presumed accuracy of community grouping
    • Triangle-Count & Clustering Coefficient
      ◦ Measures the degree to which nodes tend to cluster together
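The Union Find approach to weakly connected components can be sketched in a few lines of Python. This is a simplified illustration that ignores relationship direction and uses made-up node names; it is not the library's implementation.

```python
def connected_components(nodes, edges):
    """Group nodes that have a path to each other, ignoring direction."""
    parent = {node: node for node in nodes}

    def find(x):
        # Walk up to the root, shortening the chain along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # union: merge the two groups

    groups = {}
    for node in nodes:
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


# Hypothetical graph with two separate groups and one isolated node.
nodes = ["A", "B", "C", "D", "E", "F"]
edges = [("A", "B"), ("B", "C"), ("D", "E")]

components = connected_components(nodes, edges)
print(sorted(sorted(group) for group in components))
```

Each edge merges the groups of its two endpoints, so after one pass over the edges every node's root identifies its component. Here that yields {A, B, C}, {D, E}, and {F}.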
  26. How To…
    1. Call as Cypher procedure
    2. Pass in specification (Label, Prop, Query) and configuration
    3. Execute and return results
    A. ~.stream variant returns (a lot of) results
      CALL algo.<name>.stream('Label','TYPE',{conf})
      YIELD nodeId, score
    B. non-stream variant writes results to the graph and returns statistics
      CALL algo.<name>('Label','TYPE',{conf})
  27. Cypher Projection
    Pass in Cypher statement for node- and relationship-lists.
    CALL algo.<name>(
      'MATCH ... RETURN id(n)',
      'MATCH (n)-->(m)
       RETURN id(n) as source,
              id(m) as target',
      {graph:'cypher'})
  28. What kinds of data do we have?
    • Houses
    • Relations (families, marriages/betrothals, hookups, etc)
    • Allegiances
    • Locations / Regions
    • Actors / Characters
    • Episodes / Books
    • Deaths
    • Battles
  29. Interactions
    Link 2 characters each time their names (or nicknames) appear within 15 words of one another
    Interaction could be direct or indirect
    https://networkofthrones.wordpress.com/from-book-to-network/
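The linking rule above can be sketched roughly as follows. This is a simplified guess at the idea, not the Network of Thrones authors' actual pipeline: it matches single-word names only, ignores nicknames, and treats "within 15 words" as a position difference of at most 15. The sample text is invented for the example.

```python
import re
from collections import Counter


def interaction_weights(text, names, window=15):
    """Count how often two different names occur within `window` words."""
    words = re.findall(r"[A-Za-z']+", text)
    # Positions of every name occurrence, in reading order.
    positions = [(i, w) for i, w in enumerate(words) if w in names]
    weights = Counter()
    for idx, (i, first) in enumerate(positions):
        for j, second in positions[idx + 1:]:
            if j - i > window:
                break  # later occurrences are even farther away
            if first != second:
                weights[tuple(sorted((first, second)))] += 1
    return weights


# Invented sample text: Jon and Sam are close together, Arya is far away.
filler = " ".join(["word"] * 20)
text = "Jon greeted Sam. " + filler + " Arya arrived."

weights = interaction_weights(text, {"Jon", "Sam", "Arya"})
print(dict(weights))
```

Each count becomes the weight of the relationship between two characters. Because the rule looks only at proximity in the text, it picks up indirect interactions (two characters mentioned together) as well as direct ones.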
  30. Interactions - import
    • LOAD CSV
    • Load Characters
    • Create relationships with # of interactions (weight)
    • Relationship by each book
    https://networkofthrones.wordpress.com/data/
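To show the shape of this import outside of Neo4j, here is a minimal Python sketch that reads interaction rows and keeps one weight per character pair per book. The column names (`Source`, `Target`, `weight`, `book`) and the sample rows are assumptions for illustration; check the actual files for their real headers.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; the real data files may use different columns.
sample_csv = """Source,Target,weight,book
A,B,3,1
A,C,1,1
A,B,2,2
"""


def load_interactions(csv_file):
    """Return {(source, target): {book: weight}} from interaction rows."""
    interactions = defaultdict(dict)
    for row in csv.DictReader(csv_file):
        pair = (row["Source"], row["Target"])
        interactions[pair][int(row["book"])] = int(row["weight"])
    return dict(interactions)


interactions = load_interactions(io.StringIO(sample_csv))
total_a_b = sum(interactions[("A", "B")].values())
print(interactions, total_a_b)
```

Keeping the book number alongside each weight corresponds to the "relationship by each book" step: the same pair of characters can have a separate weighted relationship for every book.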
  31. Datasets
    Game of Thrones
      • 800 nodes
      • 400 relationships
      • Sandbox:
    Yelp Business Graph
      • 5m nodes
      • 17m relationships
      • GitHub: https://github.com/neo4j-contrib/neo4j-data-science-yelp/blob/master/notebooks/neo4j_yelp_00_data_load.ipynb
    Neo4j Community Graph
      • 280k nodes
      • 1.4m relationships
      • GitHub: https://github.com/community-graph/documentation
      • Browser: http://138.197.15.1:7474 username: "all" pwd: "readonly"
    DBPedia
      • 11m nodes
      • 116m relationships
      • Coming soon….
    :play data_science
  32. Resources
    • Community Site: community.neo4j.com
    • GitHub: github.com/neo4j-contrib/neo4j-graph-algorithms
    • Neo4j Desktop: https://www.neo4j.com/download
    • Neo4j Sandbox: https://www.neo4j.com/sandbox-v2/
    • Developer Guides: https://www.neo4j.com/developer/get-started
    • Neo4j developer blog: https://medium.com/neo4j
    • Interactions data: https://github.com/mathbeveridge/asoiaf
    @jmhreif