
Introduction to Graph Neural Networks

March 07, 2024


More Decks by joisino



  1. 3 KYOTO UNIVERSITY Graph represents relations ◼ Graph is a

    data structure representing relations. ◼ Node is an atomic element. Edge connects two nodes. Node Edge
  2. 4 KYOTO UNIVERSITY Social network represents friend relations ◼ Graph

    can represent many relations ◼ Social Network Node: person, Edge: friend relationship
  3. 6 KYOTO UNIVERSITY Graphs can also represent compounds ◼ Compound

    graph Node: atom, Edge: chemical bond [Figure: molecular graph whose nodes are H, C, O, and N atoms.]
  4. 7 KYOTO UNIVERSITY Graphs can represent facts flexibly ◼ Knowledge

    graph Node: entity, Edge: relations and facts [Figure: knowledge graph with entities such as Leonardo da Vinci, Andrea del Verrocchio, Florence, Basilica di Santa Maria del Fiore, and "artist", connected by edges labeled "is in", "born in", "drew", "is a", and "apprentice".]
  5. 8 KYOTO UNIVERSITY Image is also a graph ◼ Image

    Node: pixel, Edge: spatial adjacency
  6. 9 KYOTO UNIVERSITY Text is also a graph ◼ Text

    Node: word occurrence, Edge: adjacency The quick brown fox jumps over the lazy dog
  7. 10 KYOTO UNIVERSITY Combined graphs are also a graph ◼

    Combination of text + knowledge graph [Figure: the knowledge graph about Leonardo da Vinci, Andrea del Verrocchio, Florence, and Basilica di Santa Maria del Fiore, combined with the text "Bob visited Florence and enjoyed works by Leonardo da Vinci".]
  8. 11 KYOTO UNIVERSITY Graph is an all-rounder ◼ Graph is

    general and flexible. It can represent anything from social relations to chemical compounds, images, and texts. ◼ Once you learn how to handle graphs, you can process many forms of data, including complex ones. Graph is an all-rounder that can handle many forms of data.
  9. 13 KYOTO UNIVERSITY There are node-, edge-, and graph- level

    tasks ◼ There are three types of graph tasks ◼ Node-level tasks: Node classification, regression, clustering, … ◼ Graph-level tasks: Graph classification, regression, generation, … ◼ Edge-level tasks: Link prediction, link classification, …
  10. 14 KYOTO UNIVERSITY Estimate the label of each person in

    a social net ◼ Node Classification: Estimate whether each person likes soccer or not ? ? ?
  11. 15 KYOTO UNIVERSITY Typical methods handle vectors independently ◼ Typical

    (non-graph) classification methods handle each individual. (20s, male, Osaka) -> Positive (30s, female, Tokyo) -> Negative (20s, male, Okinawa) -> Positive (40s, female, Tokyo) -> ? (Negative?) Vector data
  12. 16 KYOTO UNIVERSITY Graph methods use both graph and vector

    data ◼ Graph-based methods use both information from the graph and feature vectors → more accurate. Vector + graph data: (40s, female, Tokyo) → Positive? because she has many friends who like soccer. [Figure: her friends in the social graph, (30s, female, Okinawa), (20s, male, Okinawa), and (20s, male, Osaka).]
  13. 17 KYOTO UNIVERSITY Node-level problems are dense prediction ◼ In

    image terms, node-level problems correspond to dense prediction, e.g., segmentation.
  14. 18 KYOTO UNIVERSITY Estimate the label of the entire graph

    ◼ Graph Classification [Figure: molecular graphs labeled "drug efficacy" or "no drug efficacy", plus an unlabeled molecule (?) whose label is to be estimated.]
  15. 19 KYOTO UNIVERSITY Image classification is a graph-level problem ◼

    In image terms, graph-level problems correspond to image-level problems, e.g., classifying each image as cat or dog.
  16. 20 KYOTO UNIVERSITY User recommendation is a link prediction problem

    ◼ Link prediction predicts the existence of edges. Ex) User recommendation on Facebook ? ?
  17. 21 KYOTO UNIVERSITY Item recommendation is also a link prediction

    problem ◼ Recommender systems can be formulated as a link prediction problem. Node: Users + Movies Edge: Consumption ? ?
  18. 22 KYOTO UNIVERSITY Many problems are graph problems ◼ Many

    problems, including user classification, segmentation, image classification, and recommender systems, can be formulated as graph problems. ◼ Graph neural networks can solve all of these problems in a unified way. Graph neural networks are all you need.
  19. 24 KYOTO UNIVERSITY GNNs handle graph data ◼ GNNs are

    neural networks for graph data. ◼ Do not confuse them with graph-shaped NNs. An MLP looks like a graph but is not a GNN. We can say an MLP is a vector neural network because it is an NN for vector data. [Figure: a GNN next to an MLP.]
  20. 25 KYOTO UNIVERSITY Given a graph, we compute node embeddings

    ◼ Input: Graph with node features $G = (V, E, X)$, where $V$ and $E$ are the sets of nodes and edges and $x_v \in \mathbb{R}^{d}$ is the node feature vector of node $v \in V$. Output: Graph-aware node embeddings $z_v \in \mathbb{R}^{d_{out}}$ for $v \in V$. Once we obtain $z_v$, we can classify nodes by applying an MLP to each $z_v$ independently.
  21. 26 KYOTO UNIVERSITY Message passing is the basis of GNNs

    ◼ GNNs initialize the node states with the node features, $h_v^{(0)} \leftarrow x_v \in \mathbb{R}^{d} \ \ \forall v \in V$, and then update the states by aggregating information from the neighboring nodes: $h_v^{(l+1)} \leftarrow f_\theta^{agg,(l)}\big(h_v^{(l)}, \{h_u^{(l)} \mid u \in \mathcal{N}(v)\}\big)$, where $\mathcal{N}(v)$ is the set of neighboring nodes of $v$ and $f_\theta^{agg,(l)}$ is the aggregation function, typically modeled by neural networks. This mechanism is called message passing.
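
The update rule above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the toy path graph, the feature values, and the helper names `message_passing_step` and `sum_aggregate` are hypothetical, and the aggregation is a parameter-free sum standing in for a learned aggregation function.

```python
from typing import Callable, Dict, List

Vector = List[float]


def message_passing_step(
    h: Dict[int, Vector],
    neighbors: Dict[int, List[int]],
    aggregate: Callable[[Vector, List[Vector]], Vector],
) -> Dict[int, Vector]:
    """One round: each node updates its state from its own state and its neighbors' states."""
    return {v: aggregate(h[v], [h[u] for u in neighbors[v]]) for v in h}


def sum_aggregate(h_v: Vector, h_neighbors: List[Vector]) -> Vector:
    """Parameter-free stand-in for the aggregation function: elementwise sum."""
    out = list(h_v)
    for h_u in h_neighbors:
        out = [a + b for a, b in zip(out, h_u)]
    return out


# Hypothetical toy graph: a path 0 - 1 - 2 with 2-dimensional node features.
neighbors = {0: [1], 1: [0, 2], 2: [1]}
x = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}

h0 = x  # h_v^(0) <- x_v
h1 = message_passing_step(h0, neighbors, sum_aggregate)  # one-hop information
h2 = message_passing_step(h1, neighbors, sum_aggregate)  # two-hop information
```

After two rounds, the state of node 0 also depends on node 2, which is two hops away. Stacking layers widens the neighborhood each node can see.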
  22. 27 KYOTO UNIVERSITY Example of GNNs: Initialize node embeddings Node

    features, e.g., each user's profile vector. [Figure: example graph with seven nodes carrying features $x_1, \dots, x_7$.]
  23. 28 KYOTO UNIVERSITY Example of GNNs: Aggregate features ◼ Aggregate

    features and transform them. The aggregation function is shared for all nodes. Ex) sum aggregation: $h_1 = \sigma(W(x_1 + x_3))$, $h_2 = \sigma(W(x_2 + x_3))$, $h_3 = \sigma(W(x_1 + x_2 + x_3 + x_4 + x_6))$, $h_4 = \sigma(W(x_3 + x_4 + x_5))$, $h_5 = \sigma(W(x_4 + x_5 + x_6 + x_7))$, $h_6 = \sigma(W(x_3 + x_5 + x_6 + x_7))$, $h_7 = \sigma(W(x_5 + x_6 + x_7))$
  24. 29 KYOTO UNIVERSITY Example of GNNs: Repeat this ◼ This

    process is stacked just like multi-layered neural networks: $z_1 = \sigma(V(h_1 + h_3))$, $z_2 = \sigma(V(h_2 + h_3))$, $z_3 = \sigma(V(h_1 + h_2 + h_3 + h_4 + h_6))$, $z_4 = \sigma(V(h_3 + h_4 + h_5))$, $z_5 = \sigma(V(h_4 + h_5 + h_6 + h_7))$, $z_6 = \sigma(V(h_3 + h_5 + h_6 + h_7))$, $z_7 = \sigma(V(h_5 + h_6 + h_7))$
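
The two stacked sum-aggregation layers of this example can be sketched as follows. The edge list is inferred from which terms appear in the sums on the slides, with each node summing over itself and its neighbors. The feature values, the 2x2 matrices `W` and `V`, and the choice of ReLU for σ are hypothetical stand-ins, not values from the deck.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]

# Edges inferred from the sums in the example (undirected).
EDGES = [(1, 3), (2, 3), (3, 4), (3, 6), (4, 5), (5, 6), (5, 7), (6, 7)]
NODES = range(1, 8)


def closed_neighborhood(v: int) -> List[int]:
    """Node v together with its neighbors."""
    nodes = {v}
    for a, b in EDGES:
        if a == v:
            nodes.add(b)
        if b == v:
            nodes.add(a)
    return sorted(nodes)


def matvec(m: Matrix, vec: Vector) -> Vector:
    return [sum(w * t for w, t in zip(row, vec)) for row in m]


def relu(vec: Vector) -> Vector:
    return [max(0.0, t) for t in vec]


def layer(weight: Matrix, states: Dict[int, Vector]) -> Dict[int, Vector]:
    """Sum the states over each closed neighborhood, then apply the shared weight and ReLU."""
    out = {}
    for v in NODES:
        total = [0.0] * len(states[v])
        for u in closed_neighborhood(v):
            total = [a + b for a, b in zip(total, states[u])]
        out[v] = relu(matvec(weight, total))
    return out


# Hypothetical features and weights, chosen only to make the arithmetic easy to follow.
x = {v: [float(v), 1.0] for v in NODES}
W = [[0.5, 0.0], [0.0, 1.0]]
V = [[1.0, -1.0], [0.0, 1.0]]

h = layer(W, x)  # first layer:  h_v
z = layer(V, h)  # second layer: z_v
```

The same `layer` function is applied to every node with the same weights. Only the set of neighbors differs from node to node.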
  25. 30 KYOTO UNIVERSITY Example of GNNs: Apply prediction head. ◼

    Finally, the prediction head is applied to each node embedding independently, e.g., $y_4$, $y_5$, and $y_7$ are computed from $z_4$, $z_5$, and $z_7$ by the same prediction head. In training, all parameters are trained by backpropagation.
  26. 31 KYOTO UNIVERSITY Message passing can be seen as convolution

    ◼ GNNs are sometimes called graph convolution. This can be understood by an analogy to CNNs. For clearer correspondence, see graph signal processing, e.g., Shuman+ https://arxiv.org/abs/1211.0053.
  27. 32 KYOTO UNIVERSITY GNNs are more general than standard NNs.

    ◼ The aggregation function can be any function. ◼ GNNs are more general than standard NNs. ◼ Let $g = g_L \circ g_{L-1} \circ \cdots \circ g_1$ be any neural network, where $g_l$ may be self-attention or MLP or conv. ◼ The GNN defined by $f_\theta^{agg,(l)}\big(h_v^{(l)}, \{h_u^{(l)} \mid u \in \mathcal{N}(v)\}\big) = g_l\big(h_v^{(l)}\big)$ is the same as $g$. This is a special case of GNN that ignores all the neighboring nodes.
  28. 33 KYOTO UNIVERSITY Implication of the fact that GNNs are

    general NNs ◼ If you already have a good neural network, you can design a GNN based on it by adding message passing to it. ◼ You can debug GNNs using this fact. When you are in trouble with GNNs, remove all the edges and fall back to standard NNs. ◼ Start from no edges (standard NNs) and try adding some edges. This ensures GNNs are not worse than standard NNs.
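
The debugging trick can be checked on a toy case. The sketch below assumes a sum-aggregation layer followed by ReLU. The weights, features, and the helper names `dense` and `gnn_layer` are hypothetical. With an empty edge set, the GNN layer returns exactly what the standard per-node layer returns.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def dense(weight: Matrix, vec: Vector) -> Vector:
    """Standard NN layer applied to one vector: ReLU(W x)."""
    return [max(0.0, sum(w * t for w, t in zip(row, vec))) for row in weight]


def gnn_layer(
    weight: Matrix, states: Dict[int, Vector], neighbors: Dict[int, List[int]]
) -> Dict[int, Vector]:
    """Sum-aggregation GNN layer: ReLU(W (h_v + sum of neighbor states))."""
    out = {}
    for v, h_v in states.items():
        total = list(h_v)
        for u in neighbors[v]:
            total = [a + b for a, b in zip(total, states[u])]
        out[v] = dense(weight, total)
    return out


# Hypothetical weights and features.
W = [[1.0, -1.0], [0.5, 0.5]]
x = {0: [1.0, 2.0], 1: [3.0, 1.0], 2: [0.0, 4.0]}

no_edges = {v: [] for v in x}            # all edges removed
with_edges = {0: [1], 1: [0, 2], 2: [1]}  # path graph 0 - 1 - 2

baseline = {v: dense(W, x_v) for v, x_v in x.items()}  # standard NN
gnn_without_edges = gnn_layer(W, x, no_edges)
gnn_with_edges = gnn_layer(W, x, with_edges)

# With no edges the GNN is exactly the standard NN; edges change the output.
assert gnn_without_edges == baseline
assert gnn_with_edges != baseline
```

If a GNN underperforms its edge-free version, the problem is more likely in the graph or the aggregation than in the rest of the pipeline.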
  29. 34 KYOTO UNIVERSITY Example with a citation dataset ◼ Example:

    Cora Dataset ◼ Node: Paper (text) ◼ Node feature: BoW vector of the abstract ◼ Edge: (u, v) indicates u cites v ◼ Node label: category of the paper ◼ This is a text classification task (with citation information)
  30. 35 KYOTO UNIVERSITY Comparison of standard NN and GNN ◼

    We consider 1-layer NN and GNN. ◼ $x_v \in \mathbb{R}^{d}$ is the raw node feature (BoW) ◼ Let $h_v = W x_v$ (standard linear layer) ◼ Let $z_v = f_\theta^{agg}\big(h_v, \{h_u \mid u \in \mathcal{N}(v)\}\big) = \sum_{u \in \mathcal{N}(v) \cup \{v\}} \frac{1}{\sqrt{(|\mathcal{N}(v)|+1)(|\mathcal{N}(u)|+1)}} h_u$, i.e., averaging neighboring h's with special normalization. This form is called a GCN layer. ◼ We predict the node label by $\mathrm{Softmax}(z_v)$
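
A minimal sketch of this 1-layer comparison follows. It assumes the standard GCN symmetric normalization $1/\sqrt{(|\mathcal{N}(v)|+1)(|\mathcal{N}(u)|+1)}$. The toy graph, features, and weight matrix are hypothetical and far smaller than Cora.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def linear(weight: Matrix, vec: Vector) -> Vector:
    """h_v = W x_v (standard linear layer)."""
    return [sum(w * t for w, t in zip(row, vec)) for row in weight]


def gcn_aggregate(
    h: Dict[int, Vector], neighbors: Dict[int, List[int]]
) -> Dict[int, Vector]:
    """z_v = sum over u in N(v) + {v} of h_u / sqrt((|N(v)|+1)(|N(u)|+1))."""
    z = {}
    for v in h:
        deg_v = len(neighbors[v]) + 1
        total = [0.0] * len(h[v])
        for u in [v] + neighbors[v]:
            deg_u = len(neighbors[u]) + 1
            coeff = 1.0 / math.sqrt(deg_v * deg_u)
            total = [a + coeff * b for a, b in zip(total, h[u])]
        z[v] = total
    return z


def softmax(vec: Vector) -> Vector:
    m = max(vec)  # subtract the max for numerical stability
    exps = [math.exp(t - m) for t in vec]
    s = sum(exps)
    return [e / s for e in exps]


# Hypothetical toy data: path graph 0 - 1 - 2, 2-dimensional features, 2 classes.
neighbors = {0: [1], 1: [0, 2], 2: [1]}
x = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
W = [[1.0, 0.0], [0.0, 2.0]]

h = {v: linear(W, x_v) for v, x_v in x.items()}     # standard NN representation
z = gcn_aggregate(h, neighbors)                     # GNN representation
probs = {v: softmax(z_v) for v, z_v in z.items()}   # predicted label distribution
```

The standard NN would predict from `h` alone, while the GCN layer mixes each `h_v` with its neighbors' before the softmax.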
  31. 36 KYOTO UNIVERSITY GNN representation is better than standard NN

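    [Figure: t-SNE plots of three representations. Raw feature: $\mathrm{TSNE}(x_v)$. Standard NN: $\mathrm{TSNE}(h_v)$, which does not take the graph into account. GNN counterpart: $\mathrm{TSNE}(z_v)$, where the representations are improved by aggregation.] ◼ Node color indicates the node label.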
  32. 37 KYOTO UNIVERSITY Homophily helps good representation of GNNs ◼

    Why does aggregation improve the embeddings? ◼ Nodes with the same label tend to be connected. Theory papers tend to cite theory papers. DeepNets papers tend to cite DeepNets papers. ◼ This tendency is called homophily. homo-: same, identical -phily: liking for, tendency towards ◼ The signal of the label is reinforced by aggregation thanks to homophily.
  33. 38 KYOTO UNIVERSITY Designing homophily is a key to success

    ◼ GNNs benefit greatly from homophilous graphs. ◼ Many GNNs for heterophilous (the opposite of homophilous) graphs, e.g., H2GCN and FAGCN, have been proposed recently. ◼ For the moment, focus on designing homophilous graphs. It is the most promising approach. ◼ If you obtain a homophilous graph, GNNs will surely help you boost the performance of the model.
  34. 39 KYOTO UNIVERSITY Attention is a popular choice of aggregation

    ◼ There are many variants of aggregation. ◼ Among them, attention aggregation and graph attention networks (GATs) [Veličković+ ICLR 2018] are popular in practice. ◼ It aggregates neighboring embeddings with a weighted sum whose weights are computed by attention: $h_v^{(l+1)} \leftarrow \sum_{u \in \mathcal{N}(v) \cup \{v\}} \alpha_{vu} W h_u^{(l)}$, where $\alpha_{vu}$ is the attention weight.
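
Attention aggregation can be sketched as below. The scoring function follows the single-head form from the GAT paper, a softmax over LeakyReLU($a^\top [W h_v \,\|\, W h_u]$). The toy graph, the matrix `W`, the vector `a`, and the helper names are hypothetical. Real GAT layers also use multiple heads and a nonlinearity on the output, which are omitted here.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def linear(weight: Matrix, vec: Vector) -> Vector:
    return [sum(w * t for w, t in zip(row, vec)) for row in weight]


def leaky_relu(t: float, slope: float = 0.2) -> float:
    return t if t > 0 else slope * t


def attention_weights(
    v: int, wh: Dict[int, Vector], neighbors: Dict[int, List[int]], a: Vector
) -> Dict[int, float]:
    """alpha_{vu} for u in N(v) + {v}: softmax of LeakyReLU(a^T [Wh_v || Wh_u])."""
    candidates = [v] + neighbors[v]
    scores = [
        leaky_relu(sum(p * q for p, q in zip(a, wh[v] + wh[u])))  # list + list = concat
        for u in candidates
    ]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return {u: e / total for u, e in zip(candidates, exps)}


def attention_layer(
    h: Dict[int, Vector], neighbors: Dict[int, List[int]], weight: Matrix, a: Vector
) -> Dict[int, Vector]:
    """h_v <- sum over u in N(v) + {v} of alpha_{vu} W h_u."""
    wh = {v: linear(weight, h_v) for v, h_v in h.items()}
    out = {}
    for v in h:
        alpha = attention_weights(v, wh, neighbors, a)
        out[v] = [sum(alpha[u] * wh[u][i] for u in alpha) for i in range(len(wh[v]))]
    return out


# Hypothetical toy data: path graph 0 - 1 - 2.
neighbors = {0: [1], 1: [0, 2], 2: [1]}
h = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
W = [[1.0, 0.0], [0.0, 1.0]]
a = [1.0, 0.0, 0.0, 1.0]  # length 2 * d_out

wh = {v: linear(W, h_v) for v, h_v in h.items()}
alpha_1 = attention_weights(1, wh, neighbors, a)
uniform_1 = attention_weights(1, wh, neighbors, [0.0] * 4)
h_next = attention_layer(h, neighbors, W, a)
```

When `a` is all zeros every score is equal, so the weights become uniform and the layer reduces to plain mean aggregation over the closed neighborhood.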
  35. 40 KYOTO UNIVERSITY Transformer is a special case of GNNs

    ◼ Transformer is a special case of GNNs with attention aggregation. ◼ Attention layer in Transformer aggregates values from all items. This corresponds to attention aggregation GNN with edges between all pairs of nodes. ◼ Edges give hints where we should put attention especially when homophily exists. We can also use both transformer-like attention heads (with less inductive biases) & GNN-like attention heads (with inductive biases).
  36. 41 KYOTO UNIVERSITY Solving graph and edge level tasks with

    GNNs ◼ GNNs provide node embeddings. How can we solve graph and edge level tasks? z 2 z 1 z 3 z 4 z 5 z 6 z 7
  37. 42 KYOTO UNIVERSITY Readout functions make graph embeddings ◼ For

    graph-level tasks, use a readout function to generate graph-level embeddings. E.g., mean pooling: $z_G = \frac{1}{|V|} \sum_{v \in V} z_v \in \mathbb{R}^{d}$; max pooling (elementwise max): $z_G = \max\{z_v \mid v \in V\} \in \mathbb{R}^{d}$; attention pooling: $z_G = \mathrm{Transformer}(\{z_v \mid v \in V\}, \mathrm{CLS})[\mathrm{CLS}] \in \mathbb{R}^{d}$. $z_G$ can be fed into downstream graph predictors.
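
Mean and max pooling are simple enough to sketch directly. The embeddings below are hypothetical, and attention pooling is left out because it needs a full Transformer.

```python
from typing import List

Vector = List[float]


def mean_pool(embeddings: List[Vector]) -> Vector:
    """z_G = (1 / |V|) * sum of z_v, computed per dimension."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]


def max_pool(embeddings: List[Vector]) -> Vector:
    """z_G = elementwise max over all z_v."""
    return [max(column) for column in zip(*embeddings)]


# Hypothetical node embeddings z_v of a graph with three nodes.
z = [[1.0, 4.0], [3.0, 2.0], [2.0, 0.0]]

z_G_mean = mean_pool(z)
z_G_max = max_pool(z)
```

Both readouts are invariant to the order of the nodes, so graphs that differ only in node numbering get the same embedding.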
  38. 43 KYOTO UNIVERSITY Pair of node embeddings for edge-level tasks

    ◼ For edge-level tasks, a pair of node embeddings can be used, e.g., $y_{24} = \sigma(z_2^\top z_4)$, trained with the log loss.
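
The edge score and its log loss can be sketched as follows. The embedding values and the labels (edge 2-4 present, edge 2-7 absent) are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def link_probability(z_u: Vector, z_v: Vector) -> float:
    """y_uv = sigmoid(z_u^T z_v)."""
    return sigmoid(sum(a * b for a, b in zip(z_u, z_v)))


def log_loss(p: float, label: int, eps: float = 1e-12) -> float:
    """Binary cross-entropy for one candidate edge (label 1 = edge exists)."""
    p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


# Hypothetical node embeddings produced by a GNN.
z = {2: [1.0, 2.0], 4: [0.5, 1.5], 7: [-1.0, -0.5]}

y_24 = link_probability(z[2], z[4])  # similar embeddings -> high score
y_27 = link_probability(z[2], z[7])  # dissimilar embeddings -> low score

# Assumed labels for illustration: edge (2, 4) exists, edge (2, 7) does not.
loss = log_loss(y_24, 1) + log_loss(y_27, 0)
```

In training, the loss is summed over observed edges and sampled non-edges and backpropagated through the GNN that produced the embeddings.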
  39. 44 KYOTO UNIVERSITY PyG and DGL are popular frameworks for

    GNNs ◼ Many GNN frameworks are available. ◼ You can find graph datasets in these frameworks. PyTorch Geometric: GitHub 19k+ stars; backend: PyTorch. DGL: GitHub 12k+ stars; backends: PyTorch, MXNet, TensorFlow.
  40. 45 KYOTO UNIVERSITY Frameworks are intuitive and easy to use

    ◼ These frameworks seamlessly integrate GNNs into DNN frameworks. Small example: https://colab.research.google.com/drive/1M8Xss9mcQsdZA5p94jpfZQPIph24F4v9?usp=sharing
  41. 47 KYOTO UNIVERSITY Social recommendation uses friend relations ◼ Social

    recommendation: ◼ In: users’ buying history + friendship relations ◼ Out: recommending items to users Fan et al. Graph Neural Networks for Social Recommendation. WWW 2019.
  42. 48 KYOTO UNIVERSITY We use a graph with watch and

    friend edges ? ? ◼ Node: Users + Items Edge: Watch + Friend Link prediction
  43. 49 KYOTO UNIVERSITY Challenge: User who has never seen a

    movie ? ◼ Suppose the user has never seen a movie → cold start problem
  44. 50 KYOTO UNIVERSITY Friend relations help alleviate cold start !

    ◼ We can recommend items based on friendship. Her friends like this movie
  45. 51 KYOTO UNIVERSITY GNNs can utilize friendship effectively ◼ GNNs

    can do this kind of inference x1 x2 x3 x4 x5 x6 x7
  46. 52 KYOTO UNIVERSITY GNNs gather information by message passing ◼

    GNNs can do this kind of inference h4 = x4 + x1 + x2 h6 = x6 + x1 + x2 Sum aggregation
  47. 53 KYOTO UNIVERSITY GNNs find users and items with common

    friends ◼ GNNs can do this kind of inference: $h_4 = x_4 + x_1 + x_2$ and $h_6 = x_6 + x_1 + x_2$, so $h_6^\top h_4$ is high due to the shared components $x_1 + x_2$ → recommended
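
The effect of common neighbors on the score can be checked with a small calculation. One-hot node features are an assumption made here for illustration, since the deck does not specify the features. With them, the raw features of nodes 4 and 6 are orthogonal, and the score after one round of sum aggregation counts their common neighbors.

```python
from typing import List

Vector = List[float]


def one_hot(i: int, n: int = 7) -> Vector:
    """Hypothetical node feature: indicator vector of node i (1-indexed)."""
    return [1.0 if j == i - 1 else 0.0 for j in range(n)]


def add(*vectors: Vector) -> Vector:
    return [sum(column) for column in zip(*vectors)]


def dot(a: Vector, b: Vector) -> float:
    return sum(p * q for p, q in zip(a, b))


x = {v: one_hot(v) for v in range(1, 8)}

# Sum aggregation as on the slide: nodes 4 and 6 both neighbor nodes 1 and 2.
h4 = add(x[4], x[1], x[2])
h6 = add(x[6], x[1], x[2])

raw_score = dot(x[4], x[6])  # similarity before message passing
gnn_score = dot(h4, h6)      # similarity after message passing
```

Here `gnn_score` equals the number of common neighbors, which is why the pair scores highly even when node 4 has no interaction history of its own.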
  48. 54 KYOTO UNIVERSITY Knowledge graph enriches the graph ◼ We

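    can enrich the graph by knowledge graphs [Figure: the user-movie graph extended with knowledge-graph entities Damien Sayre Chazelle, Ryan Thomas Gosling, Stephen Edwin King, and Ray Bradbury, connected by edges labeled "director", "starring", "original", and "influence".]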
  49. 55 KYOTO UNIVERSITY Implication of the fact that GNNs are

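    general NNs ◼ The director is the same as that of the user's previous movie → recommended! [Figure: the user-movie graph extended with knowledge-graph entities Damien Sayre Chazelle, Ryan Thomas Gosling, Stephen Edwin King, and Ray Bradbury, connected by edges labeled "director", "starring", "original", and "influence".]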
  50. 56 KYOTO UNIVERSITY GNNs are flexible enough to utilize messy

    data ◼ Graph is general and flexible. ◼ We can incorporate all the information we have (including messy information) into the graph and let GNNs learn patterns from it. ◼ Graphs and GNNs are suitable for incorporating complex data thanks to their flexibility. ◼ Of course, it is better to use only useful relations.
  51. 57 KYOTO UNIVERSITY GNNs are helpful in both sparse and

    rich regimes ◼ When data (e.g., purchase log) are sparse → This is a challenging problem but can be alleviated by auxiliary (e.g., friendship) information. ◼ When data are rich → incorporate all information you have. GNNs will learn patterns from nuanced signals and boost the performance. ◼ GNNs are helpful in both regimes
  52. 58 KYOTO UNIVERSITY GNNs and KG for factual QA with

    LLMs ◼ Graph Neural Prompting with Large Language Models (AAAI 2024) ◼ In: Question, choices, knowledge graph Out: Answer
  53. 59 KYOTO UNIVERSITY Use KG embeddings as a prompt for

    LLM ◼ Method: Retrieve a relevant part of the knowledge graph, compute its embeddings with GNNs, and prepend them as a prompt for the LLM.
  54. 60 KYOTO UNIVERSITY GNNs and KG boost the performance ◼

    Results: Graph prompting provides a huge benefit.
  55. 61 KYOTO UNIVERSITY GNNs for physics simulations ◼ Learning Mesh-Based

    Simulation with Graph Networks (ICLR 2021) ◼ In: The state of a material (e.g., cloth) ◼ Out: How the material changes
  56. 62 KYOTO UNIVERSITY Represent the material as a mesh graph

    ◼ Idea: Each point interacts with its neighbors. Near points will move in similar directions. ◼ Method: Construct a mesh-like graph on the substance and use a GNN to make predictions.
  57. 64 KYOTO UNIVERSITY GNNs for image classification ◼ Vision GNN:

    An Image is Worth Graph of Nodes (NeurIPS 2022) ◼ Task: Image Classification (ImageNet) ◼ The proposed method is like vision transformer, but guides attention with graphs. Node: Patch of the image Edge: kNN w.r.t. patch embeddings
  58. 66 KYOTO UNIVERSITY Generative models for molecule graphs Tags-to-image models

    are popular these days. Image tags: ((best quality)), ((masterpiece)), ((ultra-detailed)), (illustration), (detailed light), (an extremely delicate and beautiful), ..., stars in the eyes, messy floating hair, ... It would be useful if we could do this with molecules. Molecule tags: Organic, water soluble, lightweight, inexpensive, non-toxic, medicinal, ...
  59. 67 KYOTO UNIVERSITY We use GNNs to build graph generative

    models ◼ Many generative models can be used for graph data by replacing the components with GNNs. ◼ E.g., Replacing U-Net for diffusion with GNNs ◼ VAE and GANs can also be used with GNNs. E.g., GraphVAE [Simonovsky+ 2019]
  60. 68 KYOTO UNIVERSITY Diffusion model removes noise from data ◼

    Diffusion models are popular these days. ◼ They take noisy data and estimate the noise. ◼ From complete noise, they iteratively refine the data by estimating and subtracting noise. https://drive.google.com/file/d/18zIMEAZzLWjyh8FhVX3cPSlJst1JNiQK/view?usp=sharing
  61. 69 KYOTO UNIVERSITY We can build diffusion models for graphs

    ◼ The same idea can be applied to graphs. ◼ A GNN takes a noisy graph and estimates which edges and features are noisy. https://drive.google.com/file/d/1NyT3FAGMq2LpqbgRoKfPd9sUVmktC0-5/view?usp=sharing
  62. 70 KYOTO UNIVERSITY Diffusion models for graphs ◼ Score-based Generative

    Modeling of Graphs via the System of Stochastic Differential Equations (ICML 2022) ◼ It models denoising of the graph structure A and the node features X.
  63. 71 KYOTO UNIVERSITY Graph models suffer from lack of data

    ◼ Graph generative models have not been as successful as image and text generative models. ◼ This is partly because graph data are scarce. People do not post molecules on the Internet.
  64. 72 KYOTO UNIVERSITY LLMs may be beneficial to provide commonsense

    ◼ Direction 1: Use LLMs and papers and textbooks on molecules and drugs. ◼ We can pretrain models on plenty of text and have the model acquire commonsense knowledge of chemistry and pharmacy. ◼ The paper "What can Large Language Models do in chemistry?" [Guo+ NeurIPS Datasets and Benchmarks 2023] investigated the power of GPT-4 in molecule design and related tasks.
  65. 73 KYOTO UNIVERSITY Feedback from simulators is also beneficial ◼

    Direction 2: Use feedback from simulators. ◼ We can simulate the effect of molecules. It is difficult to judge the quality of text without humans. ◼ It’s like reinforcement learning from human feedback but from simulator feedback, which is cheaper. ◼ Many RL-based graph generative models have been proposed, e.g., GCPN [You+ NeurIPS 2018].
  66. 75 KYOTO UNIVERSITY GNN is an all-rounder ◼ GNNs are

    general and flexible. They can solve many kinds of tasks: recommender systems, image classification, physics simulation, drug discovery, … ◼ We can build graphs with various information, which helps us boost the performance of models. This idea can be used for any task, thanks to the flexibility of graphs and GNNs. Graph is an all-rounder that can handle many forms of data and boosts your model.