joisino
March 07, 2024

# Introduction to Graph Neural Networks

Lecture on Graph Neural Networks at Machine Learning Summer Seminar (MLSS2024@Okinawa) https://groups.oist.jp/mlss


## Transcript

Ryoma Sato

### 3 KYOTO UNIVERSITY: Graph represents relations

- A graph is a data structure representing relations.
- A node is an atomic element. An edge connects two nodes.
### 4 Social network represents friend relations

- A graph can represent many kinds of relations.
- Social network. Node: person; Edge: friend relationship.

### 6 Graphs can also represent compounds

- Compound graph. Node: atom; Edge: chemical bond.

*(Figure: a molecule graph with H, C, N, and O atoms.)*
### 7 Graphs can represent facts flexibly

- Knowledge graph. Node: entity; Edge: relations and facts.

*(Figure: a knowledge graph with entities such as Leonardo da Vinci, Florence, Basilica di Santa Maria del Fiore, Andrea del Verrocchio, and "artist", connected by relations such as "is a", "is in", "born in", "drew", and "apprentice".)*

### 9 Text is also a graph

- Text. Node: word occurrence; Edge: adjacency.
- Example: "The quick brown fox jumps over the lazy dog" forms a chain of adjacent words.
### 10 Combined graphs are also a graph

- Combination of text + knowledge graph.

*(Figure: the Leonardo da Vinci knowledge graph combined with the sentence "Bob visited Florence and enjoyed works by Leonardo da Vinci".)*
### 11 Graph is an all-rounder

- Graphs are general and flexible: they can represent everything from social relations to chemical compounds to images to texts.
- Once you learn how to handle graphs, you can process many forms of data, including complex ones.
- Graph is an all-rounder that can handle many forms of data.

### 14 Estimate the label of each person in a social net

- Node classification: estimate whether each person likes soccer or not.
### 15 Typical methods handle vectors independently

- Typical (non-graph) classification methods handle each individual independently, using only vector data:
  - (20s, male, Osaka) -> Positive
  - (30s, female, Tokyo) -> Negative
  - (20s, male, Okinawa) -> Positive
  - (40s, female, Tokyo) -> ? (Negative?)
### 16 Graph methods use both graph and vector data

- Graph-based methods use both the graph structure and the feature vectors, and are therefore more accurate.
- Example: with vector + graph data, (40s, female, Tokyo) -> Positive?, because she has many friends who like soccer.
### 17 Node-level problems are dense prediction

- In image terms, node-level problems correspond to dense prediction, e.g., segmentation.
### 18 Estimate the label of the entire graph

- Graph classification: e.g., predict whether a compound has drug efficacy.

*(Figure: four molecule graphs labeled "drug efficacy", "drug efficacy", "no drug efficacy", and "?".)*
### 19 Image classification is a graph-level problem

- In image terms, graph-level problems correspond to image-level problems, such as classifying an image as "cat" or "dog".
### 20 User recommendation is a link prediction problem

- Link prediction predicts the existence of edges.
- Example: user recommendation on Facebook.
### 21 Item recommendation is also a link prediction problem

- Recommender systems can be formulated as a link prediction problem. Node: users + movies; Edge: consumption.
### 22 Many problems are graph problems

- Many problems, including user classification, segmentation, image classification, and recommender systems, can be formulated as graph problems.
- Graph neural networks can solve all of these problems in a unified way. Graph neural networks are all you need.

### 24 GNNs handle graph data

- GNNs are neural networks for graph data.
- Do not confuse them with graph-shaped NNs. An MLP looks like a graph but is not a GNN; we can say an MLP is a "vector neural network" because it is a NN for vector data.
### 25 Given a graph, we compute node embeddings

- Input: a graph with node features $G = (V, E, X)$, where $V$ and $E$ are the sets of nodes and edges, and $x_v \in \mathbb{R}^d$ is the feature of node $v \in V$.
- Output: graph-aware node embeddings $z_v \in \mathbb{R}^{d_{\mathrm{out}}}$.
- Once we obtain $z_v$, we can classify nodes by applying an MLP to each $z_v$ independently.
### 26 Message passing is the basis of GNNs

- GNNs initialize the node state with the node features, then update the state by aggregating information from the neighboring nodes:

$$h_v^{(0)} \leftarrow x_v \in \mathbb{R}^d \quad \forall v \in V$$

$$h_v^{(l+1)} \leftarrow f_\theta^{\mathrm{agg},(l)}\left(h_v^{(l)},\ \{ h_u^{(l)} \mid u \in \mathcal{N}(v) \}\right)$$

- Here $\mathcal{N}(v)$ is the set of nodes neighboring $v$, and $f_\theta^{\mathrm{agg},(l)}$ is the aggregation function, typically modeled by a neural network. This mechanism is called message passing.
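The update rule above can be sketched in a few lines of plain Python. This is an illustrative toy, not the lecture's code: the graph, states, and the sum aggregation are all assumptions for the example.

```python
def message_passing(h, neighbors, agg):
    """One round of message passing: h[v] <- agg(h[v], [h[u] for u in N(v)])."""
    return {v: agg(h[v], [h[u] for u in neighbors[v]]) for v in h}

# Toy path graph 1 - 2 - 3 with scalar node states.
neighbors = {1: [2], 2: [1, 3], 3: [2]}
h0 = {1: 1.0, 2: 0.0, 3: 0.0}

# Sum aggregation that also keeps the node's own state.
agg = lambda hv, hus: hv + sum(hus)

h1 = message_passing(h0, neighbors, agg)  # node 2 receives node 1's signal
h2 = message_passing(h1, neighbors, agg)  # after two rounds, node 3 does too
```

Each round lets information travel one hop, which is why stacking layers (next slides) widens a node's receptive field.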
### 27 Example of GNNs: Initialize node embeddings

- Initialize each node with its feature vector $x_1, \ldots, x_7$, e.g., the user's profile vector.
### 28 Example of GNNs: Aggregate features

- Aggregate the features and transform them. The aggregation function is shared across all nodes. Example with sum aggregation:
  - $h_1 = \sigma(W(x_1 + x_3))$
  - $h_2 = \sigma(W(x_2 + x_3))$
  - $h_3 = \sigma(W(x_1 + x_2 + x_3 + x_4 + x_6))$
  - $h_4 = \sigma(W(x_3 + x_4 + x_5))$
  - $h_5 = \sigma(W(x_4 + x_5 + x_6 + x_7))$
  - $h_6 = \sigma(W(x_3 + x_5 + x_7))$
  - $h_7 = \sigma(W(x_5 + x_6 + x_7))$
### 29 Example of GNNs: Repeat this

- This process is stacked just like multi-layered neural networks:
  - $z_1 = \sigma(V(h_1 + h_3))$
  - $z_2 = \sigma(V(h_2 + h_3))$
  - $z_3 = \sigma(V(h_1 + h_2 + h_3 + h_4 + h_6))$
  - $z_4 = \sigma(V(h_3 + h_4 + h_5))$
  - $z_5 = \sigma(V(h_4 + h_5 + h_6 + h_7))$
  - $z_6 = \sigma(V(h_3 + h_5 + h_7))$
  - $z_7 = \sigma(V(h_5 + h_6 + h_7))$
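A two-layer sum-aggregation GNN of this shape can be sketched as follows. The graph, the scalar weights `w`, and the sigmoid nonlinearity are assumed for the example (the lecture leaves $\sigma$, $W$, and $V$ abstract and uses a different graph):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(h, neighbors, w):
    """h_v <- sigma(w * sum of h_u over N(v) and v itself); w is shared by all nodes."""
    return {v: sigmoid(w * (h[v] + sum(h[u] for u in neighbors[v]))) for v in h}

# Toy star graph: node 3 is connected to nodes 1, 2, and 4 (assumed example).
neighbors = {1: [3], 2: [3], 3: [1, 2, 4], 4: [3]}
x = {1: 1.0, 2: 2.0, 3: 0.5, 4: 1.5}

h = layer(x, neighbors, w=0.1)  # first layer, playing the role of W
z = layer(h, neighbors, w=0.2)  # second layer, playing the role of V
```

Note that the same `w` is applied at every node within a layer, mirroring the weight sharing on the slide.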
### 30 Example of GNNs: Apply prediction head

- Finally, the prediction head is applied to each node embedding $z_v$ independently to produce the prediction $y_v$. In training, all parameters are trained by backpropagation.
### 31 Message passing can be seen as convolution

- GNNs are sometimes called graph convolutions. This can be understood by analogy to CNNs. For a clearer correspondence, see graph signal processing, e.g., Shuman+ https://arxiv.org/abs/1211.0053.
### 32 GNNs are more general than standard NNs

- The aggregation function can be any function, so GNNs are more general than standard NNs.
- Let $g = g_L \circ g_{L-1} \circ \cdots \circ g_1$ be any neural network, where each $g_l$ may be self-attention, an MLP, or a convolution.
- The GNN defined by

$$f_\theta^{\mathrm{agg},(l)}\left(h_v^{(l)},\ \{ h_u^{(l)} \mid u \in \mathcal{N}(v) \}\right) = g_l\left(h_v^{(l)}\right)$$

is the same as $g$. This is the special case of a GNN that ignores all the neighboring nodes.
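The reduction can be checked concretely. In this sketch (illustrative; `g` is an arbitrary stand-in for a per-node layer), an aggregation that ignores its neighbor argument produces the same output whether or not the graph has edges:

```python
def gnn_layer(h, neighbors, agg):
    """Generic message-passing layer: h[v] <- agg(h[v], neighbor states)."""
    return {v: agg(h[v], [h[u] for u in neighbors[v]]) for v in h}

# Any per-node layer g_l (a stand-in for an MLP or conv layer).
g = lambda x: 2.0 * x + 1.0

# An aggregation that ignores the neighbor states entirely.
agg_ignore = lambda hv, hus: g(hv)

h = {1: 1.0, 2: 3.0}
with_edges = gnn_layer(h, {1: [2], 2: [1]}, agg_ignore)  # graph with edges
no_edges = gnn_layer(h, {1: [], 2: []}, agg_ignore)      # graph with no edges
# Both equal {v: g(h[v])}: the GNN collapses to the plain per-node network.
```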
### 33 Implication of the fact that GNNs are general NNs

- If you already have a good neural network, you can design a GNN based on it by adding message passing.
- You can also debug GNNs using this fact: when you are in trouble with a GNN, remove all the edges to fall back to a standard NN.
- Start from no edges (a standard NN) and try adding some edges. This ensures the GNN is not worse than the standard NN.
### 34 Example with a citation dataset

- Example: Cora dataset.
- Node: paper (text). Node feature: bag-of-words vector of the abstract. Edge: (u, v) indicates that u cites v. Node label: category of the paper.
- This is a text classification task (with citation information).
### 35 Comparison of standard NN and GNN

- We compare a 1-layer NN and a 1-layer GNN.
- $x_v \in \mathbb{R}^d$ is the raw node feature (bag of words).
- Standard linear layer: $h_v = W x_v$.
- GNN layer:

$$z_v = f_\theta^{\mathrm{agg}}\left(h_v,\ \{ h_u \mid u \in \mathcal{N}(v) \}\right) = \sum_{u \in \mathcal{N}(v) \cup \{v\}} \frac{1}{\sqrt{(|\mathcal{N}(v)|+1)(|\mathcal{N}(u)|+1)}}\, h_u$$

- We predict the node label by $\mathrm{Softmax}(z_v)$.
- This averages the neighboring $h$'s with a special normalization; this form is called a GCN layer.
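The normalized aggregation can be sketched directly from the formula. This is an illustrative scalar version (the toy graph and values are assumed), with self-loops and the symmetric degree normalization of a GCN layer:

```python
import math

def gcn_layer(h, neighbors):
    """GCN-style aggregation with self-loops and symmetric degree normalization."""
    z = {}
    for v in h:
        z[v] = sum(
            h[u] / math.sqrt((len(neighbors[v]) + 1) * (len(neighbors[u]) + 1))
            for u in list(neighbors[v]) + [v]
        )
    return z

# Toy graph 1 - 2: each node has degree 1, so each term is divided by sqrt(2 * 2) = 2.
neighbors = {1: [2], 2: [1]}
h = {1: 2.0, 2: 4.0}
z = gcn_layer(h, neighbors)  # z[1] = (4.0 + 2.0) / 2 = 3.0
```

The normalization keeps high-degree nodes from dominating: each contribution is scaled down by the degrees of both endpoints.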
### 36 GNN representation is better than standard NN

- t-SNE visualizations of the raw features $x_v$, the standard NN representations $h_v$, and the GNN representations $z_v$ (node colors indicate node labels): the first two do not reflect the graph, while the GNN representations are improved by aggregation.
### 37 Homophily helps good representation of GNNs

- Why does aggregation improve the embeddings? Nodes with the same label tend to be connected: theory papers tend to cite theory papers, and DeepNets papers tend to cite DeepNets papers.
- This tendency is called homophily (homo-: same, identical; -phily: liking for, tendency towards).
- The signal of the label is reinforced by aggregation thanks to homophily.
### 38 Designing homophily is a key to success

- GNNs benefit greatly from homophilous graphs.
- Many GNNs for heterophilous (non-homophilous) graphs have recently been proposed, e.g., H2GCN and FAGCN.
- For the moment, focus on designing homophilous graphs; it is the most promising approach.
- If you obtain a homophilous graph, GNNs will surely help you boost the performance of the model.
### 39 Attention is a popular choice of aggregation

- There are many variants of aggregation. Among them, attention aggregation and graph attention networks (GATs) [Veličković+ ICLR 2018] are popular in practice.
- It aggregates neighboring embeddings with a weighted sum whose attention weights $\alpha_{vu}$ are computed by attention:

$$h_v^{(l+1)} \leftarrow \sum_{u \in \mathcal{N}(v) \cup \{v\}} \alpha_{vu} W h_u^{(l)}$$
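A simplified attention aggregation in this spirit can be sketched as follows. The pairwise score (a product of scalar states) is an assumption for the example; a real GAT scores pairs with a learned vector and LeakyReLU, and uses vector states:

```python
import math

def attention_layer(h, neighbors, w=1.0):
    """Weighted-sum aggregation with softmax attention over N(v) and v itself."""
    out = {}
    for v in h:
        nv = list(neighbors[v]) + [v]
        scores = [h[v] * h[u] for u in nv]         # assumed pairwise score
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]   # numerically stable softmax
        total = sum(exps)
        alphas = [e / total for e in exps]         # attention weights, sum to 1
        out[v] = sum(a * w * h[u] for a, u in zip(alphas, nv))
    return out

neighbors = {1: [2], 2: [1]}
h = {1: 1.0, 2: 1.0}
out = attention_layer(h, neighbors)
# All states are equal, so every attention weight is 0.5 and out[v] == 1.0.
```

Unlike the fixed GCN normalization, the weights here depend on the node states, so the model can learn which neighbors to listen to.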
### 40 Transformer is a special case of GNNs

- A Transformer is a special case of a GNN with attention aggregation: the attention layer in a Transformer aggregates values from all items, which corresponds to an attention-aggregation GNN on a graph with edges between all pairs of nodes.
- Edges give hints about where to put attention, especially when homophily exists. We can also mix Transformer-like attention heads (fewer inductive biases) and GNN-like attention heads (more inductive biases).
### 41 Solving graph- and edge-level tasks with GNNs

- GNNs provide node embeddings. How can we solve graph- and edge-level tasks?
### 42 Readout functions make graph embeddings

- For graph-level tasks, use a readout function to generate a graph-level embedding $z_G$, which can then be fed into downstream graph predictors. Examples:

$$z_G = \frac{1}{|V|} \sum_{v \in V} z_v \in \mathbb{R}^d \quad \text{(mean pooling)}$$

$$z_G = \max \{ z_v \mid v \in V \} \in \mathbb{R}^d \quad \text{(max pooling, elementwise max)}$$

$$z_G = \mathrm{Transformer}(\{ z_v \mid v \in V \} \cup \{\mathrm{CLS}\})[\mathrm{CLS}] \in \mathbb{R}^d \quad \text{(attention pooling)}$$
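The mean and max readouts can be sketched for scalar node embeddings (elementwise over dimensions in the general vector case); the example values are assumed:

```python
def mean_readout(z):
    """z_G = average of the node embeddings."""
    return sum(z.values()) / len(z)

def max_readout(z):
    """z_G = maximum over the node embeddings."""
    return max(z.values())

z = {1: 1.0, 2: 3.0, 3: 2.0}
zg_mean = mean_readout(z)  # 2.0
zg_max = max_readout(z)    # 3.0
```

Both readouts are permutation-invariant, which is essential: the graph embedding must not depend on the order in which nodes are listed.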
### 43 Pair of node embeddings for edge-level tasks

- For edge-level tasks, a pair of node embeddings can be used, e.g., $y_{24} = \sigma(z_2^\top z_4)$, trained with the log loss.
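This edge scorer is a one-liner; the example vectors below are assumptions for illustration:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def edge_score(zu, zv):
    """y_uv = sigmoid(z_u^T z_v): the predicted probability that edge (u, v) exists."""
    return sigmoid(sum(a * b for a, b in zip(zu, zv)))

aligned = edge_score([1.0, 0.0], [1.0, 0.0])     # dot product 1.0 -> likely edge
orthogonal = edge_score([1.0, 0.0], [0.0, 1.0])  # dot product 0.0 -> sigmoid(0) = 0.5
```

Nodes whose embeddings point in similar directions get high scores, which is exactly what homophily-driven message passing encourages.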
### 44 PyG and DGL are popular frameworks for GNNs

- Many GNN frameworks are available, and you can find graph datasets in them.
- PyTorch Geometric: 19k+ GitHub stars; backend: PyTorch.
- DGL: 12k+ GitHub stars; backends: PyTorch, MXNet, TensorFlow.
### 45 Frameworks are intuitive and easy to use

- These frameworks seamlessly integrate GNNs into DNN frameworks.
- Small example: https://colab.research.google.com/drive/1M8Xss9mcQsdZA5p94jpfZQPIph24F4v9?usp=sharing

### 47 Social recommendation uses friend relations

- Social recommendation. In: users' buying history + friendship relations. Out: item recommendations for users.
- Fan et al. Graph Neural Networks for Social Recommendation. WWW 2019.
### 48 We use a graph with watch and friend edges

- Node: users + items. Edge: watch + friend. The task is link prediction.
### 49 Challenge: a user who has never seen a movie

- Suppose a user has never seen any movie. This is the cold-start problem.
### 50 Friend relations help alleviate cold start

- We can recommend items based on friendship: her friends like this movie.
### 51 GNNs can utilize friendship effectively

- GNNs can do this kind of inference.

*(Figure: a user-item graph with node features $x_1, \ldots, x_7$.)*
### 52 GNNs gather information by message passing

- With sum aggregation, the user and the item gather their neighbors' features: $h_4 = x_4 + x_1 + x_2$ and $h_6 = x_6 + x_1 + x_2$.
### 53 GNNs find users and items with common friends

- $h_6^\top h_4$ is high because of the shared components $x_1$ and $x_2$, so the item is recommended.
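The inference on these slides can be verified numerically. The one-hot features below are an assumption chosen so that the shared components are visible in the dot product:

```python
def sum_embed(x, neighbors, v):
    """Sum aggregation: embedding of v = x_v plus the features of its neighbors."""
    e = list(x[v])
    for u in neighbors[v]:
        e = [ei + xi for ei, xi in zip(e, x[u])]
    return e

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

# One-hot features (assumed): nodes 1 and 2 are common neighbors of user 4
# and item 6.
x = {1: [1, 0, 0, 0], 2: [0, 1, 0, 0], 4: [0, 0, 1, 0], 6: [0, 0, 0, 1]}
neighbors = {4: [1, 2], 6: [1, 2]}

h4 = sum_embed(x, neighbors, 4)  # [1, 1, 1, 0]
h6 = sum_embed(x, neighbors, 6)  # [1, 1, 0, 1]
score = dot(h4, h6)              # 2: high thanks to the shared neighbors 1 and 2
```

Without the common neighbors, the raw features $x_4$ and $x_6$ are orthogonal and the score would be 0; message passing is what creates the overlap.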
### 54 Knowledge graph enriches the graph

- We can enrich the graph with knowledge graphs.

*(Figure: a movie knowledge graph with edges such as "director: Damien Sayre Chazelle", "starring: Ryan Thomas Gosling", "original: Stephen Edwin King", and "influence: Ray Bradbury".)*
### 55 Knowledge graphs enable richer inference

- The director is the same as in a movie the user liked before, so the new movie is recommended.

*(Figure: the movie knowledge graph with director, starring, original, and influence edges.)*
### 56 GNNs are flexible enough to utilize messy data

- Graphs are general and flexible. We can incorporate all the information we have, including messy information, into the graph and let GNNs learn patterns from it.
- Graphs and GNNs are suitable for incorporating complex data thanks to their flexibility. Of course, it is better to use only useful relations.
### 57 GNNs are helpful in both sparse and rich regimes

- When data (e.g., purchase logs) are sparse: this is a challenging problem, but it can be alleviated by auxiliary information (e.g., friendships).
- When data are rich: incorporate all the information you have. GNNs will learn patterns from nuanced signals and boost the performance.
- GNNs are helpful in both regimes.
### 58 GNNs and KG for factual QA with LLMs

- Graph Neural Prompting with Large Language Models (AAAI 2024). In: question, choices, knowledge graph. Out: answer.
### 59 Use KG embeddings as a prompt for the LLM

- Method: retrieve the relevant part of the knowledge graph, compute its embedding with a GNN, and prepend it to the LLM's prompt.
### 60 GNNs and KG boost the performance

- Results: graph prompting provides a huge benefit.
### 61 GNNs for physics simulations

- Learning Mesh-Based Simulation with Graph Networks (ICLR 2021). In: the state of a material (e.g., cloth). Out: how the material changes.
### 62 Represent the material as a mesh graph

- Idea: each point interacts with its neighbors, and nearby points move in similar directions.
- Method: construct a mesh-like graph on the substance and use a GNN to make predictions.

### 64 GNNs for image classification

- Vision GNN: An Image is Worth Graph of Nodes (NeurIPS 2022). Task: image classification (ImageNet).
- The proposed method is like a vision transformer (ViT), but it guides attention with graphs. Node: patch of the image; Edge: kNN with respect to patch embeddings.
### 66 Generative models for molecule graphs

- Tags-to-image models are popular these days, driven by prompts like "((best quality)), ((masterpiece)), ((ultra-detailed)), (illustration), (detailed light), ..., stars in the eyes, messy floating hair, ...".
- It would be useful if we could do the same with molecules, e.g., "organic, water soluble, lightweight, inexpensive, non-toxic, medicinal, ...".
### 67 We use GNNs to build graph generative models

- Many generative models can be adapted to graph data by replacing their components with GNNs, e.g., replacing the U-Net of a diffusion model with a GNN.
- VAEs and GANs can also be used with GNNs, e.g., GraphVAE [Simonovsky+ 2019].
### 68 Diffusion model removes noise from data

- Diffusion models are popular these days. They take noisy data and estimate the noise; starting from pure noise, they iteratively refine the data by estimating and subtracting noise.
- https://drive.google.com/file/d/18zIMEAZzLWjyh8FhVX3cPSlJst1JNiQK/view?usp=sharing
### 69 We can build diffusion models for graphs

- The same idea can be applied to graphs: a GNN takes a noisy graph and estimates which edges and features are noise.
- https://drive.google.com/file/d/1NyT3FAGMq2LpqbgRoKfPd9sUVmktC0-5/view?usp=sharing
### 70 Diffusion models for graphs

- Score-based Generative Modeling of Graphs via the System of Stochastic Differential Equations (ICML 2022). It models denoising of both the graph structure A and the node features X.
### 71 Graph models suffer from lack of data

- Graph generative models have not been as successful as image and text generative models, partly because graph data are scarce: people do not post molecules on the Internet.
### 72 LLMs may be beneficial to provide common sense

- Direction 1: use LLMs together with papers and textbooks on molecules and drugs. We can pretrain models on plenty of text so that they acquire common-sense knowledge of chemistry and pharmacy.
- The paper "What can Large Language Models do in chemistry?" [Guo+ NeurIPS Datasets and Benchmarks 2023] investigated the power of GPT-4 in molecule design and related tasks.
### 73 Feedback from simulators is also beneficial

- Direction 2: use feedback from simulators. We can simulate the effect of molecules, whereas judging the quality of text without humans is difficult.
- It is like reinforcement learning from human feedback, but with simulator feedback, which is cheaper. Many RL-based graph generative models have been proposed, e.g., GCPN [You+ NeurIPS 2018].

### 75 GNN is an all-rounder

- GNNs are general and flexible: they can solve many kinds of tasks, including recommender systems, image classification, physics simulation, and drug discovery.
- We can build graphs with various kinds of information, which helps us boost the performance of models. This idea can be used for any task, thanks to the flexibility of graphs and GNNs.
- Graph is an all-rounder that can handle many forms of data and boosts your model.