Leveraging Graphs for Better AI

Jennifer Reif
September 21, 2019

Today's tech news is littered with buzzwords and acronyms: ML, AI, big data, smart devices, even graph. What does 'graph' actually mean, and what does it provide for technology and business in the artificial intelligence and machine learning space?

In this session, we will look at the advantages graphs bring to existing AI technology and why they can solve more complex problems with less developer pain and expertise. We will explore some of the tools for graph AI solutions to show how to get started or make better use of the tools at hand. We will also look at where these combined technologies are headed and what is being developed to tackle new problems.

Transcript

  1. Leveraging Graphs for Better AI

    Jennifer Reif, Developer Relations Engineer, Neo4j
    @JMHReif, jennifer.reif@neo4j.com
  2. About Me

    Software Engineer, Neo4j
    Developer, blogger, conference speaker
    Love cats, coffee, and traveling :)
    jennifer.reif@neo4j.com, @JMHReif
  3. None
  4. What are graphs?

  5. None
  6. What are graphs?

  7. What are graphs?

  8. What are graphs?

  9. AWS Global Infrastructure Graph

  10. Bank Fraud Graph

  11. Travel Graph

  12. Data analytics

  13. Structure is Hard to Unfold

  14. Ways to Tackle Data Analytics

    Traditional databases: store and retrieve data; real-time storage & retrieval
    Big data technology: aggregate and filter data; long-running queries, aggregation & filtering
    Max # of hops: ~3
  15. Averages Aren’t Reality

    Axes: nodes, relationships
    Average distribution (random) compared with power law distribution (scale-free)
    “There is No Network in Nature that we know of that would be described by the Random network model.” –Albert-László Barabási
  16. Averages Aren’t Reality

    Axes: nodes, relationships
    Average distribution (random), scale-free, small world
  17. Many approaches erroneously focus on the average population, where few entities actually exist

    Axes: number of nodes, number of relationships per node
    Graphs help us:
    • Invest in populous areas
    • Find strategic entities
    • Uncover structural information
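
A minimal sketch of the "averages aren't reality" point, using a hypothetical hub-and-spoke graph that is not from the talk. It compares the average number of relationships per node with the median and with the actual spread of degrees; the names `degree_counts` and `edges` are illustrative assumptions.

```python
from collections import Counter
from statistics import mean, median


def degree_counts(edges):
    """Return {node: degree} for an undirected edge list."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return dict(degree)


# Hypothetical toy graph: one hub linked to nine other nodes, plus one extra edge.
edges = [("hub", f"n{i}") for i in range(9)] + [("n0", "n1")]

degrees = degree_counts(edges)
average_degree = mean(list(degrees.values()))
median_degree = median(list(degrees.values()))
# degree -> number of nodes that have that degree
histogram = Counter(degrees.values())

print("average:", average_degree, "median:", median_degree)
print("nodes per degree:", sorted(histogram.items()))
```

In this toy graph most nodes have a single relationship and one node has nine, so the average describes only a small minority of the nodes.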
  18. Ways to Tackle Data Analytics

    Traditional databases: store and retrieve data; real-time storage & retrieval
    Big data technology: aggregate and filter data; long-running queries, aggregation & filtering
    Max # of hops: ~3
    Connections in data: real-time connected insights; max # of hops: millions
  19. IT’S NOT WHAT YOU KNOW… IT’S HOW YOU ARE CONNECTED.

  20. Graph analytics

  21. What is graph analytics?

    Graph analytics is the use of any graph-based approach to analyze connected data
  22. What Do These Graphs Have in Common?

  23. Simple isomorphisms can be visualized

    Anything more complicated requires graph algorithms to find
  24. How do we run Graph Analytics?

    • Query (e.g. Cypher): real-time, local decisioning and pattern matching. Local patterns: you know what you’re looking for and are making a decision
    • Graph algorithms libraries: global analysis and iterations. Global computation: you’re learning the overall structure of a network, updating data, and predicting
  25. Game of Thrones graph

    Courtesy of Will Lyon (analysis & blog post)
  26. What are Graph Algorithms?

    • Subset of network science algorithms
    • Enable reasoning about structure
    • Find bottlenecks, influence, paths, clustering, potential connections, and more
    • Generally unsupervised, and split into categories: Pathfinding and Search, Centrality, Community Detection, Link Prediction, Similarity
  27. Others in this space Gelly

  28. How they work

    • Determine what you need to know about your data
    • Looking for paths, influence, groupings, etc.?
    • Domain, dense/sparse, large/small?
    • Tune config and parameters for the specific algorithm
    • Execute over the entire graph or a defined subset of the graph
    • Return results either at runtime or written to the graph
  29. Graphs help extract structure and infer behavior

    Source: “Communities, modules and large-scale structure in networks”, Mark Newman
    Source: “Hierarchical structure and the prediction of missing links in networks”; “Structure and inference in annotated networks”, A. Clauset, C. Moore, and M.E.J. Newman
  30. Real-World Examples Source: Maven 7

  31. Real-World Examples

  32. Real-World Examples

    Source: “Fast unfolding of communities in large networks”, Blondel, Guillaume, Lambiotte, Lefebvre
  33. Graph Data Science Gives Us

    • Better Decisions: Knowledge Graphs
    • Higher Accuracy: Connected Feature Engineering
    • More Trust and Applicability: Graph Native Learning
  34. Using Graph Algorithms

    Explore, Plan, Measure: find significant patterns and plan for optimal structures. Score outcomes and set a threshold value for a prediction.
    Machine Learning: use the measures as features to train an ML model.

    1st Node | 2nd Node | Common Neighbors | Preferential Attachment | label
    1 | 2 | 4 | 15 | 1
    3 | 4 | 7 | 12 | 1
    5 | 6 | 1 | 1 | 0
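
The two pairwise measures named on this slide can be sketched in a few lines. This is an illustrative example on a hypothetical edge list, not the data or code behind the slide: common neighbors counts the shared neighbors of two nodes, and preferential attachment multiplies their degrees. The helper names are assumptions.

```python
def build_adjacency(edges):
    """Return {node: set of neighbors} for an undirected edge list."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def common_neighbors(adjacency, a, b):
    """Number of nodes adjacent to both a and b."""
    return len(adjacency[a] & adjacency[b])


def preferential_attachment(adjacency, a, b):
    """Product of the two nodes' degrees."""
    return len(adjacency[a]) * len(adjacency[b])


# Hypothetical toy graph and candidate node pairs.
edges = [(1, 3), (1, 4), (2, 3), (2, 4), (2, 5), (4, 5)]
adjacency = build_adjacency(edges)
candidate_pairs = [(1, 2), (1, 5), (3, 5)]

# One row per pair: (1st node, 2nd node, common neighbors, preferential attachment)
feature_rows = [
    (a, b, common_neighbors(adjacency, a, b), preferential_attachment(adjacency, a, b))
    for a, b in candidate_pairs
]
for row in feature_rows:
    print(row)
```

Rows like these, joined with a known label per pair, are the kind of input an ML model could be trained on.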
  35. Graphs with ML

  36. “The idea is that graph networks are bigger than any one machine-learning approach. Graphs bring an ability to generalize about structure that the individual neural nets don't have.”

    “Where do the graphs come from that graph networks operate over?”
  37. Machine Learning Eats A Lot of Data

    Algorithms train software:
    • with specific examples and progressive improvements
    • iterate, adjusting to get closer to an objective goal
    This learning requires feeding a lot of data to a model and enabling it to learn how to process and incorporate that information
  38. Novel & More Accurate Predictions with the Data You Already Have

    • Current data science models ignore network structure
    • Graphs add highly predictive features to existing ML models
    • Otherwise unattainable predictions based on relationships
    Machine Learning Pipeline
  39. What are connected features?

    Connection-related metrics about our graph…
    • # of relationships going into or out of nodes
    • count of potential triangles
    • neighbors in common
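
As a rough illustration of per-node connected features, the sketch below computes degree, the number of potential triangles (pairs of neighbors), and the number of those pairs that are actually connected. It assumes an undirected toy graph, whereas the slide distinguishes incoming and outgoing relationships; all names are hypothetical.

```python
from itertools import combinations


def connected_features(edges):
    """Return {node: {"degree", "potential_triangles", "triangles"}}."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    features = {}
    for node, neighbors in adjacency.items():
        degree = len(neighbors)
        # Every pair of neighbors could close a triangle through this node.
        potential = degree * (degree - 1) // 2
        # Count the neighbor pairs that are themselves connected.
        closed = sum(
            1 for u, v in combinations(sorted(neighbors), 2) if v in adjacency[u]
        )
        features[node] = {
            "degree": degree,
            "potential_triangles": potential,
            "triangles": closed,
        }
    return features


# Hypothetical toy graph: a triangle a-b-c with a pendant node d attached to c.
features = connected_features([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")])
for node in sorted(features):
    print(node, features[node])
```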
  40. What is feature extraction?

    Raw variables reduced to more manageable groups (features) that still accurately describe the original data
    [Figure: 1 1 0 1 0]
  41. What is connected feature extraction?

  42. Feature Engineering + Feature Extraction

    Add more descriptive features:
    • Influence
    • Relationships
    • Communities
  43. Building a Graph ML Model

    Data Sources, Native Graph Platform, Machine Learning
    Aggregate Disparate Data and Cleanse; Unify Graphs and Engineer Features; Build Predictive Models
    Parquet, JSON and more…; MLlib and more…
  44. Graph Machine Learning Workflow

    • Extract Data & Store as Graph: data aggregation; create and store graphs
    • Explore, Clean, Modify: identify uninteresting features; cleanse (outliers+); feature engineering/extraction
    • Prepare for Machine Learning: train/test split; resample for meaningful representation (proportional, etc.)
    • Train Models: cross-validation; model & variable selection; hyperparameter tuning; ensemble methods
    • Evaluate Results: precision, accuracy, recall (ROC curve & AUC); SME review
    • Productionize
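
For the evaluation step, here is a small sketch of how precision, recall, and accuracy follow from a set of binary predictions. The labels and predictions are made up for illustration, and `classification_metrics` is a hypothetical helper rather than anything shown in the talk.

```python
def classification_metrics(y_true, y_pred):
    """Return precision, recall, and accuracy for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        # Of the predicted links, how many were real?
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Of the real links, how many were found?
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of all predictions that were correct.
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
    }


# Hypothetical test-set labels and model predictions.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```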
  45. Co-authorship example

  46. 4 Layered Models Trained

    Multiple graph features used to train the models:
    • Common Authors Model
    • “Graphy” Model adds: Pref. Attachment, Total Neighbors
    • Triangles Model adds: Min & Max Triangles, Min & Max Clustering Coefficient
    • Community Model adds: Label Propagation, Louvain Modularity
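
To make the community features more concrete, the following is a simplified label propagation sketch on a hypothetical graph of two triangles joined by one edge. It makes assumptions for reproducibility that production implementations usually do not: nodes are visited in sorted order and ties are broken by taking the largest label, where ties are typically broken at random.

```python
from collections import Counter


def label_propagation(adjacency, max_iterations=10):
    """Assign each node the most common label among its neighbors until stable."""
    labels = {node: node for node in adjacency}  # every node starts in its own community
    for _ in range(max_iterations):
        changed = False
        for node in sorted(adjacency):
            counts = Counter(labels[n] for n in adjacency[node])
            if not counts:
                continue  # isolated node keeps its label
            top = max(counts.values())
            # Assumption: deterministic tie-break on the largest label.
            new_label = max(label for label, c in counts.items() if c == top)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    return labels


# Hypothetical toy graph: triangles a-b-c and d-e-f joined by the edge c-d.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f")]
adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)

labels = label_propagation(adjacency)
# A pairwise "same community" feature for a candidate link.
same_community = int(labels["a"] == labels["b"])
print(labels, same_community)
```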
  47. Result: All Models

    Common Authors Model (1), Community Model (4)

  48. Graph Data Science Gives Us

    • Better Decisions: Knowledge Graphs
    • Higher Accuracy: Connected Feature Engineering
    • More Trust and Applicability: Graph Native Learning
  49. To Infinity and Beyond…

  50. Graph Data Science Applications

    Financial Crimes, Drug Discovery, Recommendations, Cybersecurity, Predictive Maintenance, Customer Segmentation, Churn Prediction, Search/MDM
  51. Steps Forward in Graph Data Science

    Query Based Knowledge Graph, Query Based Feature Engineering, Graph Algorithm Feature Engineering, Graph Embeddings, Graph Neural Networks
    Axes: Enterprise Delivery, Data Science Complexity
  52. Steps Forward in Graph Data Science

    Query Based Knowledge Graph, Query Based Feature Engineering, Graph Algorithm Feature Engineering, Graph Embeddings, Graph Neural Networks
    Axes: Enterprise Maturity, Data Science Complexity
  53. Query-Based Knowledge Graphs: Connecting the Dots

    “Using Neo4j someone from our Orion project found information from the Apollo project that prevented an issue, saving well over two years of work and one million dollars of taxpayer funds.”
  54. Steps Forward in Graph Data Science

    Query Based Knowledge Graph, Graph Algorithm Feature Engineering, Graph Embeddings, Graph Neural Networks, Query Based Feature Engineering
    Axes: Enterprise Maturity, Data Science Complexity
  55. Query-Based Feature Engineering: Mining Data for Drug Discovery

    Hetionet is a knowledge graph integrating over 50 years of biomedical data. It is leveraged to predict new uses for drugs by using the graph topology to create features that predict new links.
    het.io
  56. Query-Based Feature Engineering: Mining Data for Drug Discovery

    Hetionet is a knowledge graph integrating over 50 years of biomedical data. It is leveraged to predict new uses for drugs by using the graph topology to create features that predict new links.
    het.io
  57. Query-Based Feature Engineering: Mining Data for Drug Discovery

  58. Steps Forward in Graph Data Science

    Query Based Feature Engineering, Graph Embeddings, Graph Neural Networks, Query Based Knowledge Graph, Graph Algorithm Feature Engineering
    Axes: Enterprise Maturity, Data Science Complexity
  59. Graph Feature Categories & Algorithms

    • Pathfinding & Search: finds the optimal paths or evaluates route availability and quality
    • Centrality / Importance: determines the importance of distinct nodes in the network
    • Community Detection: detects group clustering or partition options
    • Heuristic Link Prediction: estimates the likelihood of nodes forming a relationship
    • Similarity: evaluates how alike nodes are
    • Embeddings: learned representations of connectivity or topology
  60. Graph Connected Feature Engineering. Financial Crime: Detecting Fraud

    Large financial institutions already have existing pipelines to identify fraud via heuristics and models. Graph-based features improve accuracy:
    • Connected components to identify disjointed graphs sharing identifiers
    • PageRank to measure influence and transaction volumes
    • Louvain to identify communities that frequently interact
    • Jaccard to measure account similarity
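
Two of the measures listed on this slide are small enough to sketch directly. The example below is an assumption-laden illustration on made-up accounts and identifiers: a union-find pass groups accounts that share any identifier (connected components), and Jaccard similarity scores how much two accounts' identifier sets overlap.

```python
def jaccard(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def connected_components(edges):
    """Group nodes into components using union-find with path compression."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # shorten the path as we walk up
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


# Hypothetical accounts and the identifiers they were registered with.
identifiers = {
    "acct1": {"phone:555", "email:x"},
    "acct2": {"phone:555", "email:y"},
    "acct3": {"email:z"},
}
edges = [(acct, ident) for acct, idents in identifiers.items() for ident in sorted(idents)]

components = connected_components(edges)
# Keep only the account nodes of each component.
account_groups = sorted(sorted(n for n in comp if n in identifiers) for comp in components)
similarity = jaccard(identifiers["acct1"], identifiers["acct2"])
print(account_groups, round(similarity, 3))
```

Here `acct1` and `acct2` land in the same component because they share a phone number, which is the kind of signal that could be fed into an existing fraud model as an extra feature.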
  61. +48,000 U.S. Patents for Graph Fraud / Anomaly Detection in the last 10 years
  62. Graph Algorithms in Neo4j

    • Pathfinding & Search: Parallel Breadth First Search, Parallel Depth First Search, Shortest Path, Single-Source Shortest Path, All Pairs Shortest Path, Minimum Spanning Tree, A* Shortest Path, Yen’s K Shortest Path, K-Spanning Tree (MST), Random Walk
    • Centrality / Importance: Degree Centrality, Closeness Centrality, CC Variations (Harmonic, Dangalchev, Wasserman & Faust), Betweenness Centrality, Approximate Betweenness Centrality, PageRank, Personalized PageRank, ArticleRank, Eigenvector Centrality
    • Community Detection: Triangle Count, Clustering Coefficients, Connected Components (Union Find), Strongly Connected Components, Label Propagation, Louvain Modularity (1 Step & Multi-Step), Balanced Triad (identification)
    • Similarity: Euclidean Distance, Cosine Similarity, Jaccard Similarity, Overlap Similarity, Pearson Similarity
    • Link Prediction: Adamic Adar, Common Neighbors, Preferential Attachment, Resource Allocations, Same Community, Total Neighbors
    neo4j.com/docs/graph-algorithms/current/
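
As one worked example from the centrality group, here is a plain power-iteration PageRank. It is a textbook-style sketch in standalone Python on a hypothetical directed graph, not the Neo4j library's implementation or API; the damping factor of 0.85 and the fixed iteration count are conventional assumptions.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Approximate PageRank for a directed edge list by power iteration."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is shared by everyone.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        new_rank = {
            n: (1.0 - damping) / len(nodes) + damping * dangling / len(nodes)
            for n in nodes
        }
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Hypothetical toy graph: a, b, and d all point to c; c points back to a.
ranks = pagerank([("a", "c"), ("b", "c"), ("d", "c"), ("c", "a")])
most_influential = max(ranks, key=ranks.get)
print(most_influential, {n: round(r, 3) for n, r in ranks.items()})
```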
  63. Steps Forward in Graph Data Science

    Query Based Knowledge Graph, Graph Algorithm Feature Engineering, Graph Neural Networks, Query Based Feature Engineering, Graph Embeddings
    Axes: Enterprise Maturity, Data Science Complexity
  64. Graph Embeddings

    Embedding transforms graphs into a feature vector, or set of vectors, describing topology, connectivity, or attributes of nodes and edges in the graph
    • Vertex/Node embeddings: describe connectivity of each node
    • Path embeddings: traversals across the graph
    • Graph embeddings: encode an entire graph into a single vector
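
The slide does not name a specific embedding method. As one possible illustration, the sketch below shows only the random-walk sampling stage that DeepWalk-style node embedding approaches start from; training vectors from the walks is not shown. The graph, the parameters, and the function name are all assumptions.

```python
import random


def random_walks(adjacency, walk_length=5, walks_per_node=2, seed=42):
    """Sample fixed-length random walks starting from every node."""
    rng = random.Random(seed)  # seeded so repeated runs give the same walks
    walks = []
    for start in sorted(adjacency):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbors = sorted(adjacency[walk[-1]])
                if not neighbors:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Hypothetical toy graph: a triangle a-b-c with a pendant node d attached to c.
adjacency = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
walks = random_walks(adjacency)
for walk in walks:
    print(" -> ".join(walk))
```

Each walk is a sequence of node names, so a sequence model could treat walks like sentences and learn a vector per node from them.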
  65. Graph Embeddings - Recommendations

    Explainable Reasoning over Knowledge Graphs for Recommendation
  66. Graph Embeddings - Recommendations

    Explainable Reasoning over Knowledge Graphs for Recommendation
  67. Steps Forward in Graph Data Science

    Query Based Knowledge Graph, Graph Algorithm Feature Engineering, Query Based Feature Engineering, Graph Neural Networks, Graph Embeddings
    Axes: Enterprise Maturity, Data Science Complexity
  68. Graph Native Learning

    Deep Learning refers to training multi-layer neural networks using gradient descent
  69. Graph Native Learning

    Graph Native Learning refers to deep learning models that take a graph as an input, perform computations, and return a graph (Battaglia et al., 2018)
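
The "graph in, graph out" idea can be sketched with a single message-passing step. This simplified example has no learned weights, whereas graph networks of the kind described by Battaglia et al. use learned update functions. Each node's feature vector is replaced by the mean of its own and its neighbors' vectors, and the graph structure is returned unchanged. All names and data are hypothetical.

```python
def message_passing_step(adjacency, features):
    """Return new node features: the mean of each node's and its neighbors' vectors."""
    updated = {}
    for node, neighbors in adjacency.items():
        group = [features[node]] + [features[n] for n in neighbors]
        # Average each feature dimension across the node and its neighbors.
        updated[node] = [sum(values) / len(group) for values in zip(*group)]
    return updated


# Hypothetical toy graph: a path a - b - c with 2-dimensional node features.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
features = {"a": [1.0, 0.0], "b": [0.0, 0.0], "c": [0.0, 1.0]}

# Input: (adjacency, features). Output: same structure with updated features.
updated = message_passing_step(adjacency, features)
output_graph = (adjacency, updated)
print(updated)
```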
  70. Graph Native Learning

    Example: electron path prediction (Bradshaw et al., 2019)
    • Given reactants and reagents, what will the products be?
    • Given reactants and reagents, how do they form it?
  71. Graph Native Learning

    Example: electron path prediction

  72. Graph Data Science Gives Us

    • Better Decisions: Knowledge Graphs
    • Higher Accuracy: Connected Feature Engineering
    • More Trust and Applicability: Graph Native Learning
  73. Steps Forward in Graph Data Science

    Query Based Knowledge Graph, Query Based Feature Engineering, Graph Algorithm Feature Engineering, Graph Embeddings, Graph Neural Networks
    Axes: Enterprise Delivery, Data Science Complexity
  74. Data Scientists/Developers

    • neo4j.com/sandbox
    • neo4j.com/graphacademy/online-training/
    • community.neo4j.com
    • neo4j.com/graph-algorithms-book/

    Use Cases & Information
    • neo4j.com/use-cases/artificial-intelligence-analytics/
    • neo4j.com/use-cases/knowlede-graphs/
    • neo4j.com/graph-machine-learning-algorithms/

    @JMHReif, jennifer.reif@neo4j.com