Leveraging Graphs for Better AI

Jennifer Reif
September 21, 2019

Today's tech news is littered with buzzwords and acronyms like ML, AI, big data, smart devices, and even graph. What does 'graph' actually mean, and what does it provide for technology and business in the artificial intelligence and machine learning space?

In this session, we will look at what advantages graph brings to existing tech for AI and why graphs can solve more complex problems with less developer pain and expertise. Exploring some of the tools for graph AI solutions will demonstrate how to get started or better use the tools at hand. We will also get a look at where these combined technologies are headed and what's being developed to combat new problems.

Transcript

  1. About Me

    Software Engineer, Neo4j
    - Developer
    - Blogger
    - Conference speaker
    Love cats, coffee, and traveling :)
    [email protected] @JMHReif
  2. Ways to Tackle Data Analytics

    TRADITIONAL DATABASES: Store and retrieve data; real-time storage & retrieval; max # of hops ~3
    BIG DATA TECHNOLOGY: Aggregate and filter data; long-running queries, aggregation & filtering
  3. Averages Aren’t Reality

    [Figure: nodes vs. relationships, Average Distribution (Random)]
    [Figure: nodes vs. relationships, Power Law Distribution (Scale-Free)]

    “There is No Network in Nature that we know of that would be described by the Random network model.” –Albert-László Barabási
  4. [Figure: number of nodes vs. number of relationships per node]

    Many approaches erroneously focus on the average population, where few entities actually exist.
    Graphs help us:
    • Invest in populous areas
    • Find strategic entities
    • Uncover structural information
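The gap between the average and the actual distribution can be illustrated with a small simulation. This is only a sketch under stated assumptions and does not come from the slides: it grows a network with a simple preferential-attachment rule (new nodes prefer to link to already well-connected nodes), and the function name, parameters, and network size are all hypothetical. It then compares the mean number of relationships per node with the median and the maximum.

```python
import random
from collections import Counter
from statistics import mean, median


def preferential_attachment_degrees(n_nodes, links_per_node=2, seed=42):
    """Grow an undirected network and return each node's relationship count."""
    rng = random.Random(seed)
    degree = Counter()
    # 'endpoints' lists a node once per relationship it has, so a uniform
    # pick from it favors nodes that are already well connected.
    endpoints = []

    # Start from a small, fully connected core.
    core = links_per_node + 1
    for a in range(core):
        for b in range(a + 1, core):
            degree[a] += 1
            degree[b] += 1
            endpoints += [a, b]

    # Each new node attaches to 'links_per_node' distinct existing nodes.
    for new in range(core, n_nodes):
        targets = set()
        while len(targets) < links_per_node:
            targets.add(rng.choice(endpoints))
        for target in targets:
            degree[new] += 1
            degree[target] += 1
            endpoints += [new, target]

    return [degree[i] for i in range(n_nodes)]


degrees = preferential_attachment_degrees(2000)
print("mean relationships per node:  ", mean(degrees))
print("median relationships per node:", median(degrees))
print("max relationships on one node:", max(degrees))
```

In a network grown this way, most nodes sit below the mean while a handful of hubs sit far above it, which is why the average describes very few actual entities.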
  5. Ways to Tackle Data Analytics

    TRADITIONAL DATABASES: Store and retrieve data; real-time storage & retrieval; max # of hops ~3
    BIG DATA TECHNOLOGY: Aggregate and filter data; long-running queries, aggregation & filtering
    GRAPH: Connections in data; real-time connected insights; max # of hops: millions
  6. What is graph analytics?

    Graph analytics is the use of any graph-based approach to analyze connected data
  7. How do we run Graph Analytics?

    Query (e.g. Cypher): real-time, local decisioning and pattern matching
    • Local Patterns
    • You know what you’re looking for and are making a decision

    Graph Algorithms Libraries: global analysis and iterations
    • Global Computation
    • You’re learning the overall structure of a network, updating data, and predicting
  8. What are Graph Algorithms?

    • Subset of network science algorithms
    • Enable reasoning about structure
    • Find bottlenecks, influence, paths, clustering, potential connections, and more
    • Generally unsupervised, and split into categories: Pathfinding and Search, Centrality, Community Detection, Link Prediction, Similarity
  9. How they work

    • Determine what you need to know of your data
    • Looking for paths, influence, groupings, etc.?
    • Domain, dense/sparse, large/small?
    • Tune config and parameters for the specific algorithm
    • Execute over the entire graph or a defined subset of the graph
    • Return results either at runtime or written to the graph
  10. Graphs help extract structure and infer behavior

    Source: “Communities, modules and large-scale structure in networks” - Mark Newman
    Source: “Hierarchical structure and the prediction of missing links in networks”; “Structure and inference in annotated networks” - A. Clauset, C. Moore, and M.E.J. Newman.
  11. Graph Data Science Gives Us

    • Better Decisions: Knowledge Graphs
    • Higher Accuracy: Connected Feature Engineering
    • More Trust and Applicability: Graph Native Learning
  12. Using Graph Algorithms

    Explore, Plan, Measure: Find significant patterns and plan for optimal structures. Score outcomes and set a threshold value for a prediction.
    Machine Learning: Use the measures as features to train an ML model.

    1st Node | 2nd Node | Common Neighbors | Preferential Attachment | label
    1        | 2        | 4                | 15                      | 1
    3        | 4        | 7                | 12                      | 1
    5        | 6        | 1                | 1                       | 0
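The two measures in the table can be computed directly from neighbor sets. The following is a minimal sketch, assuming an undirected graph stored as a dictionary of neighbor sets; the toy graph, the candidate pairs, and the threshold value are hypothetical and are not the data behind the slide. Common neighbors counts the nodes two nodes share, and preferential attachment multiplies their relationship counts.

```python
def common_neighbors(graph, a, b):
    """Number of neighbors that nodes a and b have in common."""
    return len(graph[a] & graph[b])


def preferential_attachment(graph, a, b):
    """Product of the two nodes' relationship counts (degrees)."""
    return len(graph[a]) * len(graph[b])


def feature_row(graph, a, b):
    """One row of link prediction features for a candidate pair."""
    return {
        "pair": (a, b),
        "common_neighbors": common_neighbors(graph, a, b),
        "preferential_attachment": preferential_attachment(graph, a, b),
    }


# Hypothetical toy graph: node -> set of neighbors (undirected).
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"A", "C", "E"},
    "E": {"D"},
}

# Candidate pairs that are not yet connected.
candidates = [("B", "D"), ("A", "E"), ("B", "E")]
rows = [feature_row(graph, a, b) for a, b in candidates]

# Hypothetical threshold: predict a link when at least 2 neighbors are shared.
THRESHOLD = 2
predicted = [r["pair"] for r in rows if r["common_neighbors"] >= THRESHOLD]

for row in rows:
    print(row)
print("predicted links:", predicted)
```

The same rows could instead be handed to an ML model as features, with the label column supplying the training target.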
  13. “The idea is that graph networks are bigger than any one machine-learning approach. Graphs bring an ability to generalize about structure that the individual neural nets don't have.”

    “Where do the graphs come from that graph networks operate over?”
  14. Machine Learning Eats A Lot of Data

    Algorithms train software
    • with specific examples and progressive improvements
    • iterate, adjusting to get closer to an objective goal

    This learning requires feeding a lot of data to a model and enabling it to learn how to process and incorporate that information.
  15. Novel & More Accurate Predictions with the Data You Already Have

    • Current data science models ignore network structure
    • Graphs add highly predictive features to existing ML models
    • Otherwise unattainable predictions based on relationships

    [Figure: Machine Learning Pipeline]
  16. What are connected features?

    Connection-related metrics about our graph…
    • # of relationships going into or out of nodes
    • count of potential triangles
    • neighbors in common
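As a rough illustration of the first two metrics, the sketch below counts incoming and outgoing relationships per node and, ignoring direction, the triangles around a node. It rests on assumptions: the edge list is hypothetical, and "potential triangles" is interpreted here as the number of neighbor pairs that could close into a triangle, which the slide does not define.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical directed relationships (source, target).
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "a"), ("e", "a")]

in_degree = defaultdict(int)
out_degree = defaultdict(int)
neighbors = defaultdict(set)  # undirected view, used for triangles

for src, dst in edges:
    out_degree[src] += 1
    in_degree[dst] += 1
    neighbors[src].add(dst)
    neighbors[dst].add(src)


def triangle_counts(node):
    """Return (closed, potential) triangles around a node, ignoring direction.

    'potential' is the number of neighbor pairs; 'closed' is how many of
    those pairs are themselves connected.
    """
    pairs = list(combinations(sorted(neighbors[node]), 2))
    closed = sum(1 for u, v in pairs if v in neighbors[u])
    return closed, len(pairs)


features = {
    node: {
        "in": in_degree[node],
        "out": out_degree[node],
        "triangles": triangle_counts(node)[0],
        "potential_triangles": triangle_counts(node)[1],
    }
    for node in sorted(neighbors)
}

for node, row in features.items():
    print(node, row)
```

Each row is a set of connected features for one node that could be appended to the columns an existing model already uses.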
  17. What is feature extraction?

    Raw variables are reduced to more manageable groups (features) that still accurately describe the original data.

    [Figure: 1 1 0 1 0]
  18. Building a Graph ML Model

    • Data Sources (Parquet, JSON and more…): Aggregate Disparate Data and Cleanse
    • Native Graph Platform: Unify Graphs and Engineer Features
    • Machine Learning (MLlib and more…): Build Predictive Models
  19. Graph Machine Learning Workflow

    Stages: Data aggregation; Create and store graphs (Extract Data & Store as Graph); Explore, Clean, Modify; Prepare for Machine Learning; Train Models; Evaluate Results; Productionize

    Tasks along the way:
    • Identify uninteresting features
    • Cleanse (outliers+)
    • Feature engineering/extraction
    • Train / Test split
    • Resample for meaningful representation (proportional, etc.)
    • Precision, accuracy, recall (ROC curve & AUC)
    • SME Review
    • Cross-validation
    • Model & variable selection
    • Hyperparameter tuning
    • Ensemble methods
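Two of the workflow tasks, the train/test split and the precision, accuracy, and recall measures, are small enough to sketch by hand. This is an illustrative sketch only: the helper names, the split fraction, and the label lists are hypothetical, and a real pipeline would more likely use an ML library for both steps.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=7):
    """Shuffle a copy of the rows and cut it into train and test parts."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def evaluate(actual, predicted):
    """Precision, recall, and accuracy for binary labels (1 = link, 0 = no link)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical example rows and labels.
rows = list(range(8))
train, test = train_test_split(rows)

actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 1, 0, 1]
scores = evaluate(actual, predicted)
print(len(train), "training rows,", len(test), "test rows")
print(scores)
```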
  20. 4 Layered Models Trained

    Multiple graph features used to train the models:
    • Common Authors Model
    • “Graphy” Model adds: Pref. Attachment, Total Neighbors
    • Triangles Model adds: Min & Max Triangles, Min & Max Clustering Coefficient
    • Community Model adds: Label Propagation, Louvain Modularity
  21. Graph Data Science Gives Us

    • Better Decisions: Knowledge Graphs
    • Higher Accuracy: Connected Feature Engineering
    • More Trust and Applicability: Graph Native Learning
  22. Steps Forward in Graph Data Science

    Query Based Knowledge Graph; Query Based Feature Engineering; Graph Algorithm Feature Engineering; Graph Embeddings; Graph Neural Networks
    [Figure labels: Enterprise Delivery, Data Science Complexity]
  23. Steps Forward in Graph Data Science

    Query Based Knowledge Graph; Query Based Feature Engineering; Graph Algorithm Feature Engineering; Graph Embeddings; Graph Neural Networks
    [Figure labels: Enterprise Maturity, Data Science Complexity]
  24. Query-Based Knowledge Graphs: Connecting the Dots

    “Using Neo4j someone from our Orion project found information from the Apollo project that prevented an issue, saving well over two years of work and one million dollars of taxpayer funds.”
  25. Steps Forward in Graph Data Science

    Query Based Knowledge Graph; Graph Algorithm Feature Engineering; Graph Embeddings; Graph Neural Networks; Query Based Feature Engineering
    [Figure labels: Enterprise Maturity, Data Science Complexity]
  26. Query-Based Feature Engineering: Mining Data for Drug Discovery

    HetioNet is a knowledge graph integrating over 50 years of biomedical data.
    Leveraged to predict new uses for drugs by using the graph topology to create features to predict new links.
    het.io
  28. Steps Forward in Graph Data Science

    Query Based Feature Engineering; Graph Embeddings; Graph Neural Networks; Query Based Knowledge Graph; Graph Algorithm Feature Engineering
    [Figure labels: Enterprise Maturity, Data Science Complexity]
  29. Graph Feature Categories & Algorithms

    • Pathfinding & Search: Finds the optimal paths or evaluates route availability and quality
    • Centrality / Importance: Determines the importance of distinct nodes in the network
    • Community Detection: Detects group clustering or partition options
    • Heuristic Link Prediction: Estimates the likelihood of nodes forming a relationship
    • Similarity: Evaluates how alike nodes are
    • Embeddings: Learned representations of connectivity or topology
  30. Graph Connected Feature Engineering: Financial Crime: Detecting Fraud

    Large financial institutions already have existing pipelines to identify fraud via heuristics and models.
    Graph based features improve accuracy:
    • Connected components to identify disjointed graphs sharing identifiers
    • PageRank to measure influence and transaction volumes
    • Louvain to identify communities that frequently interact
    • Jaccard to measure account similarity
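Two of these features can be sketched in a few lines. The example below is a simplified illustration under stated assumptions, not the pipeline of any financial institution or the Neo4j library's API: the account data is invented, connected components are found with a plain union-find over account-to-identifier links, and Jaccard similarity is computed over each account's identifier set.

```python
def connected_components(pairs):
    """Group nodes into connected components with a simple union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


def jaccard(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical accounts and the identifiers they were registered with.
accounts = {
    "acct1": {"phone:555-0100", "addr:12 Elm St"},
    "acct2": {"phone:555-0100", "addr:9 Oak Ave"},
    "acct3": {"addr:9 Oak Ave", "phone:555-0199"},
    "acct4": {"phone:555-0142", "addr:3 Pine Rd"},
}

# Link every account to each of its identifiers, then find the components.
links = [(acct, ident) for acct, idents in accounts.items() for ident in idents]
components = connected_components(links)

# Keep only the account names in each component, largest group first.
rings = sorted((c & set(accounts) for c in components), key=len, reverse=True)

print("groups of accounts sharing identifiers:", rings)
print("similarity acct1 vs acct2:", jaccard(accounts["acct1"], accounts["acct2"]))
```

Here acct1 and acct3 never share an identifier directly, yet they land in the same group through acct2, which is the kind of structure a row-by-row heuristic tends to miss.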
  31. Graph Algorithms in Neo4j

    Pathfinding & Search
    • Parallel Breadth First Search
    • Parallel Depth First Search
    • Shortest Path
    • Single-Source Shortest Path
    • All Pairs Shortest Path
    • Minimum Spanning Tree
    • A* Shortest Path
    • Yen’s K Shortest Path
    • K-Spanning Tree (MST)
    • Random Walk

    Centrality / Importance
    • Degree Centrality
    • Closeness Centrality
    • CC Variations: Harmonic, Dangalchev, Wasserman & Faust
    • Betweenness Centrality
    • Approximate Betweenness Centrality
    • PageRank
    • Personalized PageRank
    • ArticleRank
    • Eigenvector Centrality

    Community Detection
    • Triangle Count
    • Clustering Coefficients
    • Connected Components (Union Find)
    • Strongly Connected Components
    • Label Propagation
    • Louvain Modularity – 1 Step & Multi-Step
    • Balanced Triad (identification)

    Similarity
    • Euclidean Distance
    • Cosine Similarity
    • Jaccard Similarity
    • Overlap Similarity
    • Pearson Similarity

    Link Prediction
    • Adamic Adar
    • Common Neighbors
    • Preferential Attachment
    • Resource Allocations
    • Same Community
    • Total Neighbors

    neo4j.com/docs/graph-algorithms/current/
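To give a feel for what one of the centrality algorithms computes, here is a minimal sketch of PageRank by repeated iteration. It is the textbook formulation written in plain Python, not the Neo4j library's implementation or API; the function name, the damping factor of 0.85, the iteration count, and the toy link data are assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively estimate PageRank for a dict of node -> list of targets."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every node starts each round with the "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this node's rank evenly across its outgoing links.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node with no outgoing links spreads its rank evenly.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical directed graph: three nodes point at "c", and "c" points at "a".
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)

for node in sorted(ranks, key=ranks.get, reverse=True):
    print(node, round(ranks[node], 4))
```

Node "c" ends up with the highest score because it receives links from every other node, and "a" follows because it is the only node "c" links to.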
  32. Steps Forward in Graph Data Science

    Query Based Knowledge Graph; Graph Algorithm Feature Engineering; Graph Neural Networks; Query Based Feature Engineering; Graph Embeddings
    [Figure labels: Enterprise Maturity, Data Science Complexity]
  33. Graph Embeddings

    Embedding transforms graphs into a feature vector, or set of vectors, describing topology, connectivity, or attributes of nodes and edges in the graph
    • Vertex/Node embeddings: describe connectivity of each node
    • Path embeddings: traversals across the graph
    • Graph embeddings: encode an entire graph into a single vector
  34. Steps Forward in Graph Data Science

    Query Based Knowledge Graph; Graph Algorithm Feature Engineering; Query Based Feature Engineering; Graph Neural Networks; Graph Embeddings
    [Figure labels: Enterprise Maturity, Data Science Complexity]
  35. Graph Native Learning

    Graph native learning refers to deep learning models that take a graph as an input, perform computations, and return a graph.
    Battaglia et al, 2018
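The "graph in, computation, graph out" shape can be sketched with a single message-passing step. This is a deliberately simplified illustration and not the model from Battaglia et al.: graph networks there use learned update functions over nodes, edges, and global attributes, whereas this sketch uses a fixed averaging rule with no learning. The function name and the example features are hypothetical.

```python
def message_passing_step(node_features, edges):
    """One unlearned update: each node averages its own value with its neighbors'.

    Takes a graph (node features plus edges) and returns a graph with the
    same edges and updated node features.
    """
    neighbors = {node: [] for node in node_features}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)

    updated = {}
    for node, value in node_features.items():
        incoming = [node_features[n] for n in neighbors[node]]
        updated[node] = (value + sum(incoming)) / (1 + len(incoming))
    return updated, edges


# Hypothetical path graph x - y - z with a signal on "x" only.
features = {"x": 1.0, "y": 0.0, "z": 0.0}
edges = [("x", "y"), ("y", "z")]

features_1, edges_1 = message_passing_step(features, edges)
features_2, _ = message_passing_step(features_1, edges_1)
print("after one step: ", features_1)
print("after two steps:", features_2)
```

After one step the signal has reached "y"; only after a second step does it reach "z", which shows how repeated steps let information travel along relationships.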
  36. Graph Native Learning

    Example: electron path prediction (Bradshaw et al, 2019)
    • Given reactants and reagents, what will the products be?
    • Given reactants and reagents, how do they form it?
  37. Graph Data Science Gives Us

    • Better Decisions: Knowledge Graphs
    • Higher Accuracy: Connected Feature Engineering
    • More Trust and Applicability: Graph Native Learning
  38. Steps Forward in Graph Data Science

    Query Based Knowledge Graph; Query Based Feature Engineering; Graph Algorithm Feature Engineering; Graph Embeddings; Graph Neural Networks
    [Figure labels: Enterprise Delivery, Data Science Complexity]
  39. Data Scientists/Developers

    • neo4j.com/sandbox
    • neo4j.com/graphacademy/online-training/
    • community.neo4j.com
    • neo4j.com/graph-algorithms-book/

    Use Cases & Information
    • neo4j.com/use-cases/artificial-intelligence-analytics/
    • neo4j.com/use-cases/knowlede-graphs/
    • neo4j.com/graph-machine-learning-algorithms/

    @JMHReif [email protected]