Slide 1

Slide 1 text

1 KYOTO UNIVERSITY Introduction to Graph Neural Networks Ryoma Sato

Slide 2

Slide 2 text

2 KYOTO UNIVERSITY Graphs are Everywhere

Slide 3

Slide 3 text

3 KYOTO UNIVERSITY Graph represents relations ◼ Graph is a data structure representing relations. ◼ Node is an atomic element. Edge connects two nodes.

Slide 4

Slide 4 text

4 KYOTO UNIVERSITY Social network represents friend relations ◼ Graph can represent many relations ◼ Social Network Node: person, Edge: friend relationship

Slide 5

Slide 5 text

5 KYOTO UNIVERSITY Transportation network represents roads ◼ Transportation Network Node: city, Edge: road

Slide 6

Slide 6 text

6 KYOTO UNIVERSITY Graphs can also represent compounds ◼ Compound graph Node: atom, Edge: chemical bond (Figure: a molecule drawn as a graph of C, N, O, and H atoms.)

Slide 7

Slide 7 text

7 KYOTO UNIVERSITY Graphs can represent facts flexibly ◼ Knowledge graph Node: entity, Edge: relations and facts (Figure: a knowledge graph relating Leonardo da Vinci, Florence, Basilica di Santa Maria del Fiore, and Andrea del Verrocchio via edges such as "is in", "born in", "drew", "is a", and "apprentice".)

Slide 8

Slide 8 text

8 KYOTO UNIVERSITY Image is also a graph ◼ Image Node: pixel, Edge: spatial adjacency

Slide 9

Slide 9 text

9 KYOTO UNIVERSITY Text is also a graph ◼ Text Node: word occurrence, Edge: adjacency The quick brown fox jumps over the lazy dog

Slide 10

Slide 10 text

10 KYOTO UNIVERSITY Combined graphs are also a graph ◼ Combination of text + knowledge graph (Figure: the sentence "Bob visited Florence and enjoyed works by Leonardo da Vinci" linked to the knowledge graph of the previous slide.)

Slide 11

Slide 11 text

11 KYOTO UNIVERSITY Graph is an all-rounder ◼ Graph is general and flexible. It can represent everything from social relations to chemical compounds to images to texts. ◼ Once you learn how to handle graphs, you can process many forms of data, including complex ones. Graph is an all-rounder that can handle many forms of data.

Slide 12

Slide 12 text

12 KYOTO UNIVERSITY Graph Tasks

Slide 13

Slide 13 text

13 KYOTO UNIVERSITY There are node-, edge-, and graph- level tasks ◼ There are three types of graph tasks ◼ Node-level tasks: Node classification, regression, clustering, … ◼ Graph-level tasks: Graph classification, regression, generation, … ◼ Edge-level tasks: Link prediction, link classification, …

Slide 14

Slide 14 text

14 KYOTO UNIVERSITY Estimate the label of each person in a social net ◼ Node Classification: Estimate whether each person likes soccer or not ? ? ?

Slide 15

Slide 15 text

15 KYOTO UNIVERSITY Typical methods handle vectors independently ◼ Typical (non-graph) classification methods handle each individual. (20s, male, Osaka) -> Positive (30s, female, Tokyo) -> Negative (20s, male, Okinawa) -> Positive (40s, female, Tokyo) -> ? (Negative?) Vector data

Slide 16

Slide 16 text

16 KYOTO UNIVERSITY Graph methods use both graph and vector data ◼ Graph-based methods use both the information from the graph and the feature vectors → more accurate. (Figure: the person with the unknown label (40s, female, Tokyo) is predicted Positive because she has many friends who like soccer.) Vector + graph data

Slide 17

Slide 17 text

17 KYOTO UNIVERSITY Node-level problems are dense prediction ◼ In image terms, node-level problems correspond to dense prediction, e.g., segmentation.

Slide 18

Slide 18 text

18 KYOTO UNIVERSITY Estimate the label of the entire graph ◼ Graph Classification (Figure: several compounds labeled "drug efficacy" or "no drug efficacy"; the task is to predict the label of a new compound.)

Slide 19

Slide 19 text

19 KYOTO UNIVERSITY Image classification is a graph-level problem ◼ In image terms, graph-level problems correspond to image-level problems, e.g., classifying an image as cat or dog.

Slide 20

Slide 20 text

20 KYOTO UNIVERSITY User recommendation is a link prediction problem ◼ Link prediction predicts the existence of edges. Ex) User recommendation in Facebook

Slide 21

Slide 21 text

21 KYOTO UNIVERSITY Item recommendation is also a link prediction problem ◼ Recommender systems can be formulated as a link prediction problem. Node: Users + Movies, Edge: Consumption

Slide 22

Slide 22 text

22 KYOTO UNIVERSITY Many problems are graph problems ◼ Many problems, including user classification, segmentation, image classification, and recommender systems, can be formulated as graph problems. ◼ Graph neural networks can solve all of these problems in a unified way. Graph neural networks are all you need.

Slide 23

Slide 23 text

23 KYOTO UNIVERSITY Graph Neural Networks (GNNs)

Slide 24

Slide 24 text

24 KYOTO UNIVERSITY GNNs handle graph data ◼ GNNs are neural networks for graph data. ◼ Do not confuse them with graph-shaped NNs. An MLP looks like a graph but is not a GNN. We can say an MLP is a vector neural network because it is an NN for vector data.

Slide 25

Slide 25 text

25 KYOTO UNIVERSITY Given a graph, we compute node embeddings ◼ Input: Graph with node features $G = (V, E, X)$, where $V$ and $E$ are the sets of nodes and edges, and $x_v \in \mathbb{R}^d$ is the node feature of node $v \in V$. ◼ Output: Graph-aware node embeddings $z_v \in \mathbb{R}^{d_{out}}$. Once we obtain $z_v$, we can classify nodes by applying an MLP to each $z_v$ independently.

Slide 26

Slide 26 text

26 KYOTO UNIVERSITY Message passing is the basis of GNNs ◼ GNNs initialize the node state by the node features, $h_v^{(0)} \leftarrow x_v \in \mathbb{R}^d$ for all $v \in V$, then update the state by aggregating information from the neighboring nodes: $h_v^{(l+1)} \leftarrow f_{\theta}^{agg,(l)}\big(h_v^{(l)}, \{ h_u^{(l)} \mid u \in \mathcal{N}(v) \}\big)$, where $\mathcal{N}(v)$ is the set of nodes neighboring $v$ and $f_{\theta}^{agg,(l)}$ is the aggregation function, typically modeled by a neural network. This mechanism is called message passing. A minimal code sketch follows.
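A minimal PyTorch sketch of one message-passing step with sum aggregation (one concrete choice of the aggregation function above); the graph representation and the names message_passing and neighbors are illustrative, not from the slides.

import torch

def message_passing(h, neighbors, W):
    # One step: h_v <- ReLU(W (h_v + sum_{u in N(v)} h_u)), a simple instance of f_agg
    new_h = []
    for v, nbrs in enumerate(neighbors):
        agg = h[v] + h[list(nbrs)].sum(dim=0)   # include the node itself
        new_h.append(torch.relu(W @ agg))
    return torch.stack(new_h)

# toy graph: 3 nodes, edges 0-1 and 1-2
neighbors = [[1], [0, 2], [1]]
h0 = torch.randn(3, 4)                          # initial states h^(0) = node features x_v
W = torch.randn(4, 4)
h1 = message_passing(h0, neighbors, W)          # h^(1)

Stacking further calls of this function (with different weight matrices) gives the multi-layer GNN shown on the next slides.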

Slide 27

Slide 27 text

27 KYOTO UNIVERSITY Example of GNNs: Initialize node embeddings ◼ Initialize each node embedding with its node features, e.g., the user's profile vector. (Figure: a 7-node graph with node features x1, ..., x7.)

Slide 28

Slide 28 text

28 KYOTO UNIVERSITY Example of GNNs: Aggregate features ◼ Aggregate features and transform them. The aggregation function is shared across all nodes. Ex) sum aggregation: h1 = σ(W(x1 + x3)), h2 = σ(W(x2 + x3)), h3 = σ(W(x1 + x2 + x3 + x4 + x6)), h4 = σ(W(x3 + x4 + x5)), h5 = σ(W(x4 + x5 + x6 + x7)), h6 = σ(W(x3 + x5 + x7)), h7 = σ(W(x5 + x6 + x7))

Slide 29

Slide 29 text

29 KYOTO UNIVERSITY Example of GNNs: Repeat this ◼ This process is stacked just like multi-layered neural networks: z1 = σ(V(h1 + h3)), z2 = σ(V(h2 + h3)), z3 = σ(V(h1 + h2 + h3 + h4 + h6)), z4 = σ(V(h3 + h4 + h5)), z5 = σ(V(h4 + h5 + h6 + h7)), z6 = σ(V(h3 + h5 + h7)), z7 = σ(V(h5 + h6 + h7))

Slide 30

Slide 30 text

30 KYOTO UNIVERSITY Example of GNNs: Apply prediction head ◼ Finally, the prediction head is applied to each node embedding z_v independently. In training, all parameters are trained by backpropagation. A minimal sketch follows.
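A minimal sketch of a per-node prediction head trained with cross-entropy; the embeddings z here are random stand-ins for the GNN output, and names like head and labeled are illustrative.

import torch
import torch.nn as nn

num_nodes, d, num_classes = 7, 4, 2
z = torch.randn(num_nodes, d, requires_grad=True)   # stand-in for the GNN outputs z_v
y = torch.randint(0, num_classes, (num_nodes,))     # node labels
labeled = torch.tensor([0, 2, 5])                   # indices of labeled nodes

head = nn.Linear(d, num_classes)                    # prediction head, shared across nodes
logits = head(z)                                    # applied to each z_v independently
loss = nn.functional.cross_entropy(logits[labeled], y[labeled])
loss.backward()                                     # in a full GNN, gradients also reach W and V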

Slide 31

Slide 31 text

31 KYOTO UNIVERSITY Message passing can be seen as convolution ◼ GNNs are sometimes called graph convolution. This can be understood by an analogy to CNNs. For clearer correspondence, see graph signal processing, e.g., Shuman+ https://arxiv.org/abs/1211.0053.

Slide 32

Slide 32 text

32 KYOTO UNIVERSITY GNNs are more general than standard NNs ◼ The aggregation function can be any function. ◼ GNNs are more general than standard NNs. ◼ Let $g = g_L \circ g_{L-1} \circ \cdots \circ g_1$ be any neural network, where $g_l$ may be self-attention, an MLP, or a convolution. ◼ The GNN defined by $f_{\theta}^{agg,(l)}\big(h_v^{(l)}, \{ h_u^{(l)} \mid u \in \mathcal{N}(v) \}\big) = g_l\big(h_v^{(l)}\big)$ is the same as $g$. This is a special case of GNN that ignores all the neighboring nodes.

Slide 33

Slide 33 text

33 KYOTO UNIVERSITY Implication of the fact that GNNs are general NNs ◼ If you already have a good neural network, you can design a GNN based on it by adding message passing to it. ◼ You can debug GNNs using this fact. When you are in trouble with GNNs, remove all the edges and fall back to standard NNs. ◼ Start from no edges (standard NNs) and try adding some edges, as in the sketch below. This ensures GNNs are not worse than standard NNs.
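A minimal sketch of this debugging trick, reusing the message_passing function from the earlier sketch: with empty neighbor sets the layer reduces to a per-node linear map (a standard NN), and edges can then be added back gradually. The setup is illustrative.

import torch

def message_passing(h, neighbors, W):
    new_h = []
    for v, nbrs in enumerate(neighbors):
        agg = h[v] + h[list(nbrs)].sum(dim=0)
        new_h.append(torch.relu(W @ agg))
    return torch.stack(new_h)

x = torch.randn(3, 4)
W = torch.randn(4, 4)

no_edges = [[], [], []]                 # standard NN: each node only sees itself
with_edges = [[1], [0, 2], [1]]         # GNN: add edges back once the baseline works

h_baseline = message_passing(x, no_edges, W)    # equals ReLU(W x_v) for each node
h_gnn = message_passing(x, with_edges, W)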

Slide 34

Slide 34 text

34 KYOTO UNIVERSITY Example with a citation dataset ◼ Example: Cora Dataset ◼ Node: Paper (text) ◼ Node feature: BoW vector of the abstract ◼ Edge: (u, v) indicates u cites v ◼ Node label: category of the paper ◼ This is a text classification task (with citation information)

Slide 35

Slide 35 text

35 KYOTO UNIVERSITY Comparison of standard NN and GNN ◼ We consider a 1-layer NN and GNN. ◼ $x_v \in \mathbb{R}^d$ is the raw node feature (BoW). ◼ Let $h_v = W x_v$ (standard linear layer). ◼ Let $z_v = f_{\theta}^{agg}\big(h_v, \{ h_u \mid u \in \mathcal{N}(v) \}\big) = \sum_{u \in \mathcal{N}(v) \cup \{v\}} \frac{1}{\sqrt{(|\mathcal{N}(v)|+1)(|\mathcal{N}(u)|+1)}} h_u$, i.e., averaging the neighboring h's with a special normalization. This form is called a GCN layer. ◼ We predict the node label by Softmax(z_v).
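A minimal dense-matrix sketch of this GCN layer, assuming the graph is given as an adjacency matrix; the function and variable names are illustrative.

import torch

def gcn_layer(H, A):
    # z = D^{-1/2} (A + I) D^{-1/2} H, with D the degree matrix of A + I
    A_hat = A + torch.eye(A.size(0))          # add self-loops: N(v) ∪ {v}
    deg = A_hat.sum(dim=1)                    # |N(v)| + 1
    D_inv_sqrt = torch.diag(deg.pow(-0.5))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt @ H

# toy graph: 3 papers, citations 0-1 and 1-2
A = torch.tensor([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
H = torch.randn(3, 4)                         # h_v = W x_v from the linear layer
Z = gcn_layer(H, A)                           # z_v, fed into Softmax for classification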

Slide 36

Slide 36 text

36 KYOTO UNIVERSITY GNN representation is better than standard NN ◼ (Figure: t-SNE of the raw features $x_v$, of the standard NN representations $h_v$, and of the GNN representations $z_v$; node color indicates the node label.) ◼ $h_v$ does not take the graph into account, while the representations $z_v$ are improved by aggregation.

Slide 37

Slide 37 text

37 KYOTO UNIVERSITY Homophily helps GNNs learn good representations ◼ Why does aggregation improve the embeddings? ◼ Nodes with the same label tend to be connected: theory papers tend to cite theory papers, and DeepNets papers tend to cite DeepNets papers. ◼ This tendency is called homophily (homo-: same, identical; -phily: liking for, tendency towards). ◼ The signal of the label is reinforced by aggregation thanks to homophily.

Slide 38

Slide 38 text

38 KYOTO UNIVERSITY Designing homophily is a key to success ◼ GNNs benefit greatly from homophilous graphs. ◼ Many GNNs for heterophilous (i.e., non-homophilous) graphs, e.g., H2GCN and FAGCN, have been proposed recently. ◼ For the moment, focus on designing homophilous graphs. It is the most promising approach. ◼ If you obtain a homophilous graph, GNNs will surely help you boost the performance of the model.

Slide 39

Slide 39 text

39 KYOTO UNIVERSITY Attention is a popular choice of aggregation ◼ There are many variants of aggregation. ◼ Among them, attention aggregation and graph attention networks (GATs) [Veličković+ ICLR 2018] are popular in practice. ◼ It aggregates the neighboring embeddings with a weighted sum computed by attention: $h_v^{(l+1)} \leftarrow \sum_{u \in \mathcal{N}(v) \cup \{v\}} \alpha_{vu} W h_u^{(l)}$, where $\alpha_{vu}$ is the attention weight.
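A simplified single-head sketch of GAT-style attention aggregation, where the score for each neighbor is LeakyReLU(a_src · Wh_v + a_dst · Wh_u) and the weights alpha_vu come from a softmax over the neighborhood; the function and variable names are illustrative.

import torch
import torch.nn.functional as F

def attention_aggregate(h, neighbors, W, a_src, a_dst):
    # h_v <- sum_{u in N(v) ∪ {v}} alpha_vu W h_u
    Wh = h @ W.T                                    # transform all node states
    out = []
    for v, nbrs in enumerate(neighbors):
        idx = torch.tensor([v] + list(nbrs))
        scores = F.leaky_relu(Wh[v] @ a_src + Wh[idx] @ a_dst)
        alpha = torch.softmax(scores, dim=0)        # attention weights alpha_vu
        out.append(alpha @ Wh[idx])                 # weighted sum of messages
    return torch.stack(out)

neighbors = [[1], [0, 2], [1]]
h = torch.randn(3, 4)
W = torch.randn(4, 4)
a_src, a_dst = torch.randn(4), torch.randn(4)
h_next = attention_aggregate(h, neighbors, W, a_src, a_dst)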

Slide 40

Slide 40 text

40 KYOTO UNIVERSITY Transformer is a special case of GNNs ◼ The Transformer is a special case of GNNs with attention aggregation. ◼ The attention layer in the Transformer aggregates values from all items. This corresponds to an attention-aggregation GNN with edges between all pairs of nodes. ◼ Edges give hints about where we should put attention, especially when homophily exists. We can also use both Transformer-like attention heads (with fewer inductive biases) and GNN-like attention heads (with inductive biases).

Slide 41

Slide 41 text

41 KYOTO UNIVERSITY Solving graph- and edge-level tasks with GNNs ◼ GNNs provide node embeddings. How can we solve graph- and edge-level tasks? (Figure: a graph with node embeddings z1, ..., z7.)

Slide 42

Slide 42 text

42 KYOTO UNIVERSITY Readout functions make graph embeddings ◼ For graph-level tasks, use a readout function to generate a graph-level embedding. E.g., mean pooling: $z_G = \frac{1}{|V|} \sum_{v \in V} z_v \in \mathbb{R}^d$; max pooling (elementwise max): $z_G = \max\{ z_v \mid v \in V \} \in \mathbb{R}^d$; attention pooling: $z_G = \mathrm{Transformer}(\{ z_v \mid v \in V \}, \mathrm{CLS})[\mathrm{CLS}] \in \mathbb{R}^d$. ◼ $z_G$ can be fed into downstream graph predictors.
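Minimal sketches of the mean and max readouts (the attention readout would additionally require a Transformer encoder with a CLS token, omitted here); names are illustrative.

import torch

def mean_pool(Z):
    # z_G = (1/|V|) sum_v z_v
    return Z.mean(dim=0)

def max_pool(Z):
    # elementwise max over the nodes
    return Z.max(dim=0).values

Z = torch.randn(7, 16)        # node embeddings z_1, ..., z_7
z_G = mean_pool(Z)            # graph embedding, fed into a downstream graph predictor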

Slide 43

Slide 43 text

43 KYOTO UNIVERSITY Pair of node embeddings for edge-level tasks ◼ For edge-level tasks, a pair of node embeddings can be used, e.g., $\hat{y}_{24} = \sigma(z_2^{\top} z_4)$, trained with the log loss.
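A minimal sketch of this link prediction score and its log-loss training signal; the embeddings and candidate edges are random stand-ins for illustration.

import torch
import torch.nn.functional as F

Z = torch.randn(7, 16, requires_grad=True)      # node embeddings z_v from the GNN
src = torch.tensor([1, 0, 4])                   # candidate edges (u, v), 0-indexed
dst = torch.tensor([3, 2, 6])
labels = torch.tensor([1., 0., 1.])             # 1 = edge exists, 0 = no edge

scores = (Z[src] * Z[dst]).sum(dim=1)           # z_u^T z_v for each candidate pair
loss = F.binary_cross_entropy_with_logits(scores, labels)   # log loss on sigma(z_u^T z_v)
loss.backward()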

Slide 44

Slide 44 text

44 KYOTO UNIVERSITY PyG and DGL are popular frameworks for GNNs ◼ Many GNN frameworks are available: PyTorch Geometric (19k+ GitHub stars, PyTorch) and DGL (12k+ GitHub stars; PyTorch, MXNet, TensorFlow). ◼ You can also find graph datasets in these frameworks.

Slide 45

Slide 45 text

45 KYOTO UNIVERSITY Frameworks are intuitive and easy to use ◼ These frameworks seamlessly integrate GNNs into DNN frameworks. Small example: https://colab.research.google.com/drive/1M8Xss9mcQsdZA5p94jpfZQPIph24F4v9?usp=sharing
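A minimal node classification sketch in PyTorch Geometric, along the lines of its standard Cora tutorial; the architecture and hyperparameters here are illustrative and not necessarily those of the linked notebook.

import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
from torch_geometric.nn import GCNConv

dataset = Planetoid(root='data/Cora', name='Cora')   # citation graph from Slide 34
data = dataset[0]

class GCN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = GCNConv(dataset.num_node_features, 16)
        self.conv2 = GCNConv(16, dataset.num_classes)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        return self.conv2(x, edge_index)

model = GCN()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
model.train()
for epoch in range(200):
    optimizer.zero_grad()
    out = model(data.x, data.edge_index)
    loss = F.cross_entropy(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()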

Slide 46

Slide 46 text

46 KYOTO UNIVERSITY Applications

Slide 47

Slide 47 text

47 KYOTO UNIVERSITY Social recommendation uses friend relations ◼ Social recommendation: ◼ In: users' buying history + friendship relations ◼ Out: recommending items to users Fan et al. Graph Neural Networks for Social Recommendation. WWW 2019.

Slide 48

Slide 48 text

48 KYOTO UNIVERSITY We use a graph with watch and friend edges ◼ Node: Users + Items, Edge: Watch + Friend ◼ Formulated as link prediction

Slide 49

Slide 49 text

49 KYOTO UNIVERSITY Challenge: User who has never seen a movie ? ◼ Suppose the user has never seen a movie → cold start problem

Slide 50

Slide 50 text

50 KYOTO UNIVERSITY Friend relations help alleviate cold start ! ◼ We can recommend items based on friendship. Her friends like this movie

Slide 51

Slide 51 text

51 KYOTO UNIVERSITY GNNs can utilize friendship effectively ◼ GNNs can do this kind of inference. (Figure: the user-item graph with node features x1, ..., x7.)

Slide 52

Slide 52 text

52 KYOTO UNIVERSITY GNNs gather information by message passing ◼ GNNs gather this information by message passing, e.g., sum aggregation: h4 = x4 + x1 + x2, h6 = x6 + x1 + x2

Slide 53

Slide 53 text

53 KYOTO UNIVERSITY GNNs find users and items with common friends ◼ With h4 = x4 + x1 + x2 and h6 = x6 + x1 + x2, the score h6ᵀh4 is high due to the shared x1 + x2 components → recommended. A tiny numeric sketch follows.
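A tiny numeric sketch of this reasoning; the feature vectors are made up for illustration.

import torch

x1 = torch.tensor([1., 0., 0.])    # shared friend 1
x2 = torch.tensor([0., 1., 0.])    # shared friend 2
x4 = torch.tensor([0., 0., 1.])    # the user
x6 = torch.tensor([0., 0., 0.])    # the item, with no direct feature overlap with the user

h4 = x4 + x1 + x2                  # sum aggregation over the user's neighborhood
h6 = x6 + x1 + x2                  # sum aggregation over the item's neighborhood
score = torch.dot(h6, h4)          # = 2.0, high because of the shared x1 + x2 components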

Slide 54

Slide 54 text

54 KYOTO UNIVERSITY Knowledge graph enriches the graph ◼ We can enrich the graph with knowledge graphs. (Figure: the user-item graph extended with knowledge-graph edges such as director Damien Sayre Chazelle, starring Ryan Thomas Gosling, original by Stephen Edwin King, and influenced by Ray Bradbury.)

Slide 55

Slide 55 text

55 KYOTO UNIVERSITY Knowledge graph enables this kind of inference ◼ The director is the same as that of her previously watched movie → recommended! (Figure: the same enriched graph as the previous slide.)

Slide 56

Slide 56 text

56 KYOTO UNIVERSITY GNNs are flexible enough to utilize messy data ◼ Graph is general and flexible. ◼ We can incorporate all the information (including messy information) we have into the graph and let GNNs learn patterns from it. ◼ Graphs and GNNs are suitable for incorporating complex data thanks to their flexibility. ◼ Of course, it is better to use only useful relations.

Slide 57

Slide 57 text

57 KYOTO UNIVERSITY GNNs are helpful in both sparse and rich regimes ◼ When data (e.g., purchase log) are sparse → This is a challenging problem but can be alleviated by auxiliary (e.g., friendship) information. ◼ When data are rich → incorporate all information you have. GNNs will learn patterns from nuanced signals and boost the performance. ◼ GNNs are helpful in both regimes

Slide 58

Slide 58 text

58 KYOTO UNIVERSITY GNNs and KG for factual QA with LLMs ◼ Graph Neural Prompting with Large Language Models (AAAI 2024) ◼ In: Question, choices, knowledge graph Out: Answer

Slide 59

Slide 59 text

59 KYOTO UNIVERSITY Use KG embeddings as a prompt for LLM ◼ Method: Retrieve a relevant part from the knowledge graph, compute the embeddings by GNNs, and prepend it as the prompt for LLM.

Slide 60

Slide 60 text

60 KYOTO UNIVERSITY GNNs and KG boost the performance ◼ Results: Graph prompting provides a huge benefit.

Slide 61

Slide 61 text

61 KYOTO UNIVERSITY GNNs for physics simulations ◼ Learning Mesh-Based Simulation with Graph Networks (ICLR 2021) ◼ In: The state of a material (e.g., cloth) ◼ Out: How the material changes

Slide 62

Slide 62 text

62 KYOTO UNIVERSITY Represent the material as a mesh graph ◼ Idea: Each point interacts with its neighbors. Near points will move in similar directions. ◼ Method: Construct a mesh-like graph on the substance and use a GNN to make predictions.

Slide 63

Slide 63 text

63 KYOTO UNIVERSITY Results: Natural and high quality simulations https://sites.google.com/view/meshgraphnets

Slide 64

Slide 64 text

64 KYOTO UNIVERSITY GNNs for image classification ◼ Vision GNN: An Image is Worth Graph of Nodes (NeurIPS 2022) ◼ Task: Image Classification (ImageNet) ◼ The proposed method is like vision transformer, but guides attention with graphs. Node: Patch of the image Edge: kNN w.r.t. patch embeddings

Slide 65

Slide 65 text

65 KYOTO UNIVERSITY GNNs outperform ViTs ◼ Results: Better than ViTs

Slide 66

Slide 66 text

66 KYOTO UNIVERSITY Generative models for molecule graphs ◼ Tags-to-image models are popular these days, e.g., prompts like "((best quality)), ((masterpiece)), ((ultra-detailed)), (illustration), (detailed light), (an extremely delicate and beautiful), ..., stars in the eyes, messy floating hair, ...". ◼ It would be useful if we could do this with molecules, e.g., "Organic, water soluble, lightweight, inexpensive, non-toxic, medicinal, ...".

Slide 67

Slide 67 text

67 KYOTO UNIVERSITY We use GNNs to build graph generative models ◼ Many generative models can be used for graph data by replacing their components with GNNs. ◼ E.g., replacing the U-Net of a diffusion model with a GNN ◼ VAEs and GANs can also be used with GNNs, e.g., GraphVAE [Simonovsky+ 2018]

Slide 68

Slide 68 text

68 KYOTO UNIVERSITY Diffusion model removes noise from data ◼ Diffusion models are popular these days. ◼ They take noisy data and estimate the noise. ◼ From complete noise, they iteratively refine the data by estimating and subtracting noise. https://drive.google.com/file/d/18zIMEAZzLWjyh8FhVX3cPSlJst1JNiQK/view?usp=sharing

Slide 69

Slide 69 text

69 KYOTO UNIVERSITY We can build diffusion models for graphs ◼ The same idea can be applied to graphs. ◼ A GNN takes a noisy graph and estimates which edges and features are noisy. https://drive.google.com/file/d/1NyT3FAGMq2LpqbgRoKfPd9sUVmktC0-5/view?usp=sharing

Slide 70

Slide 70 text

70 KYOTO UNIVERSITY Diffusion models for graphs ◼ Score-based Generative Modeling of Graphs via the System of Stochastic Differential Equations (ICML 2022) ◼ It models denoising of the graph structure A and the node features X.

Slide 71

Slide 71 text

71 KYOTO UNIVERSITY Graph models suffer from lack of data ◼ Graph generative models have not been as successful as image and text generative models. ◼ This is partly because graph data are scarce. People do not post molecules on the Internet.

Slide 72

Slide 72 text

72 KYOTO UNIVERSITY LLMs may be beneficial to provide commonsense ◼ Direction 1: Use LLMs and papers and textbooks on molecules and drugs. ◼ We can pretrain models using plenty of text and have the model acquire commonsense on chemistry and pharmacy. ◼ The paper "What can Large Language Models do in chemistry?" [Guo+ NeurIPS Datasets and Benchmarks 2023] investigated the power of GPT-4 in molecule design and related tasks.

Slide 73

Slide 73 text

73 KYOTO UNIVERSITY Feedback from simulators is also beneficial ◼ Direction 2: Use feedback from simulators. ◼ We can simulate the effect of molecules. It is difficult to judge the quality of text without humans. ◼ It’s like reinforcement learning from human feedback but from simulator feedback, which is cheaper. ◼ Many RL-based graph generative models have been proposed, e.g., GCPN [You+ NeurIPS 2018].

Slide 74

Slide 74 text

74 KYOTO UNIVERSITY Conclusion

Slide 75

Slide 75 text

75 KYOTO UNIVERSITY GNN is an all-rounder ◼ GNNs are general and flexible. They can solve many kinds of tasks: recommender systems, image classification, physics simulation, drug discovery, ... ◼ We can build graphs with various information, which helps us boost the performance of models. This idea can be used for any task, thanks to the flexibility of graphs and GNNs. Graph is an all-rounder that can handle many forms of data and boosts your model.