Slide 1

Slide 1 text

A gentle introduction to graph neural networks Alex Ganose Department of Chemistry Imperial College London [email protected] website: virtualatoms.org

Slide 2

Slide 2 text

What is a graph? Node / vertex Edge Graphs encode relations between entities

Slide 3

Slide 3 text

What is a graph? Node / vertex Edge Edges can be directed

Slide 4

Slide 4 text

What is a graph? Node embedding Edge embedding Information is stored in each piece

Slide 5

Slide 5 text

Where do we find graphs? Social networks: >1B nodes, >10B edges. Biological systems: Chlamydomonas reinhardtii

Slide 6

Slide 6 text

Where do we find graphs? Eurovision, economics

Slide 7

Slide 7 text

An image is a graph with regular structure
[Figure: image pixels, the corresponding adjacency matrix, and the equivalent graph, with each pixel (row-column) a node connected to its neighbouring pixels]

Slide 8

Slide 8 text

A sentence can be viewed as a directed graph, e.g. "Graphs are all around us", with each word a node linked to the next word by a directed edge

Slide 9

Slide 9 text

Graphs are a natural representation in chemistry: molecules and crystals

Slide 10

Slide 10 text

Not all graphs are alike
The size and connectivity of graphs can vary enormously, from fully connected to sparse:

Dataset          Graphs   Nodes   Edges
Fully connected  1        5       20
Sparse           2        <4      <3
Wikipedia        1        12M     378M
QM9              134k     <9      <26
Cora             1        23k     91k

Slide 11

Slide 11 text

The types of problems tackled with graphs
• Graph level, e.g. the total energy of a molecule
• Node level, e.g. the oxidation state of an atom
• Edge level, e.g. the strength of a bond

Slide 12

Slide 12 text

Graph networks enabled AlphaFold (node level) The protein is represented as a graph with amino acids (nodes) linked by edges Used to calculate interactions between parts of the protein

Slide 13

Slide 13 text

Deep learning with graphs
One option is to include the adjacency matrix as features in a standard neural network
Issues: the input has a fixed size and is sensitive to the order of the nodes
[Figure: a five-node graph and its 5x5 adjacency matrix]
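A minimal NumPy sketch of the ordering problem, using the five-node adjacency matrix from the figure: relabelling the nodes leaves the graph unchanged but gives a standard neural network a different input vector. The permutation chosen here is arbitrary.

```python
import numpy as np

# Adjacency matrix of the five-node graph from the slide.
A = np.array([
    [0, 1, 1, 1, 0],
    [1, 0, 1, 1, 0],
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
])

# Relabel the nodes with a permutation: the graph itself is unchanged,
# but the flattened feature vector a standard neural network would see differs.
perm = np.array([2, 0, 4, 1, 3])
A_perm = A[perm][:, perm]

print(np.array_equal(A.flatten(), A_perm.flatten()))  # False
```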

Slide 14

Slide 14 text

Deep learning with graphs
A convolutional neural network (CNN) filter transforms and combines information from neighbouring pixels in an image
Example 3x3 convolution filter:
 0 -1  0
-1  4 -1
 0 -1  0
The filter is learned during training to extract higher-level features, e.g. edges

Slide 15

Slide 15 text

Convolutions on graphs Images can be seen as regular graphs; can we extend the concept of convolutions? Convolution passes information from the neighbours to the central node

Slide 16

Slide 16 text

Convolutions on graphs By iterating over the entire graph, each node receives information from its neighbours

Slide 17

Slide 17 text

Where do neural networks come in? Neural networks are used to decide:
• Message: what gets passed from one node to another
• Pooling / aggregation: how the messages from all neighbours are combined
• Update: how the node is updated given the pooled message

Slide 18

Slide 18 text

Components of a convolutional graph network
Message and pooling functions: $\mathbf{m}_i = \sum_{j \in \mathcal{N}(i)} M_\theta(\mathbf{v}_i, \mathbf{v}_j)$
Update function: $\mathbf{v}_i' = U_\theta(\mathbf{v}_i, \mathbf{m}_i)$
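A sketch in NumPy of one generic message-passing step following the equations above. The toy graph, embedding size, and the linear stand-ins for the learned functions $M_\theta$ and $U_\theta$ are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy graph: node embeddings v_i and a neighbour list (hypothetical example).
num_nodes, dim = 4, 8
v = rng.normal(size=(num_nodes, dim))
neighbours = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

# Stand-ins for the learned message and update functions M_theta and U_theta;
# here simple linear maps rather than full MLPs.
W_m = rng.normal(size=(2 * dim, dim))   # message: acts on [v_i, v_j]
W_u = rng.normal(size=(2 * dim, dim))   # update: acts on [v_i, m_i]

def message(v_i, v_j):
    return np.concatenate([v_i, v_j]) @ W_m           # M_theta(v_i, v_j)

def update(v_i, m_i):
    return np.tanh(np.concatenate([v_i, m_i]) @ W_u)  # U_theta(v_i, m_i)

# One round of message passing: pool (sum) the messages, then update each node.
v_new = np.stack([
    update(v[i], sum(message(v[i], v[j]) for j in neighbours[i]))
    for i in range(num_nodes)
])
print(v_new.shape)  # (4, 8)
```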

Slide 19

Slide 19 text

Convolutional graph networks were introduced in 2017

Slide 20

Slide 20 text

Implementation of neural network functions
Message function (no processing) and pooling function (normalised sum): $\mathbf{m}_i = \frac{1}{|\mathcal{N}(i)|} \sum_{j \in \mathcal{N}(i)} \mathbf{v}_j$
Update function (MLP): $\mathbf{v}_i' = \sigma(\mathbf{W}\mathbf{m}_i + \mathbf{B}\mathbf{v}_i)$, where $\sigma$ is a non-linearity, $\mathbf{W}$ and $\mathbf{B}$ are weight matrices, and $|\mathcal{N}(i)|$ is the number of neighbours
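A minimal NumPy sketch of this layer. The adjacency matrix, embedding size, and the choice of tanh as the non-linearity are illustrative assumptions.

```python
import numpy as np

def gcn_layer(v, adjacency, W, B):
    """One graph convolution: v_i' = sigma(W m_i + B v_i), with m_i the
    normalised sum (mean) of the neighbouring node embeddings."""
    deg = adjacency.sum(axis=1, keepdims=True)   # |N(i)| for each node
    m = (adjacency @ v) / np.maximum(deg, 1)     # normalised sum of neighbours
    return np.tanh(m @ W + v @ B)                # sigma = tanh here

# Hypothetical example: 4 nodes with 8-dimensional embeddings.
rng = np.random.default_rng(0)
v = rng.normal(size=(4, 8))
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
W, B = rng.normal(size=(8, 8)), rng.normal(size=(8, 8))
print(gcn_layer(v, A, W, B).shape)  # (4, 8)
```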

Slide 21

Slide 21 text

Visual depiction of a graph convolution: 1. Prepare messages (the neighbour embeddings $\mathbf{v}_j$)

Slide 22

Slide 22 text

Visual depiction of a graph convolution: 1. Prepare messages 2. Pool the messages into $\mathbf{m}_i$

Slide 23

Slide 23 text

Visual depiction of a graph convolution: 1. Prepare messages 2. Pool messages 3. Update the embedding to $\mathbf{v}_i'$

Slide 24

Slide 24 text

Requirements of the pooling function
The pooling function must be invariant to node ordering and the number of nodes
All candidates take a variable number of inputs and return the same output regardless of ordering, e.g. for node values 4 and 2:

Function   Result
Max        4
Mean       3
Sum        6
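A short check of this invariance in NumPy, using the two node values from the slide:

```python
import numpy as np

# The two neighbour values from the slide (4 and 2), in two different orderings.
msgs_a = np.array([4.0, 2.0])
msgs_b = np.array([2.0, 4.0])

for name, pool in [("max", np.max), ("mean", np.mean), ("sum", np.sum)]:
    assert pool(msgs_a) == pool(msgs_b)   # invariant to ordering
    print(name, pool(msgs_a))             # max 4.0, mean 3.0, sum 6.0
```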

Slide 25

Slide 25 text

Training convolutional graph neural networks
$\mathbf{v}_i' = \sigma\left(\mathbf{W} \sum_{j \in \mathcal{N}(i)} \frac{\mathbf{v}_j}{|\mathcal{N}(i)|} + \mathbf{B}\mathbf{v}_i\right)$
Feed the final node embeddings to a loss function
Run an optimiser to train the weight parameters
$\mathbf{W}$ and $\mathbf{B}$ are shared across all nodes
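A minimal training-loop sketch, assuming PyTorch. The toy graph, per-node regression targets, learning rate, and the final linear readout are all hypothetical; the point is only that the shared $\mathbf{W}$ and $\mathbf{B}$ are optimised against a loss on the node embeddings.

```python
import torch

# Hypothetical toy data: one graph with node features, adjacency, and
# per-node regression targets (e.g. oxidation states).
torch.manual_seed(0)
x = torch.randn(6, 16)                                  # node embeddings
A = (torch.rand(6, 6) < 0.4).float()
A = ((A + A.T) > 0).float().fill_diagonal_(0)           # symmetric, no self-loops
y = torch.randn(6, 1)                                   # targets for a node-level task

# W and B are shared across all nodes, as on the slide.
W = torch.nn.Parameter(torch.randn(16, 16) * 0.1)
B = torch.nn.Parameter(torch.randn(16, 16) * 0.1)
readout = torch.nn.Linear(16, 1)

optimiser = torch.optim.Adam([W, B, *readout.parameters()], lr=1e-2)

for step in range(100):
    deg = A.sum(dim=1, keepdim=True).clamp(min=1)
    h = torch.sigmoid(((A @ x) / deg) @ W + x @ B)      # one graph convolution
    loss = torch.nn.functional.mse_loss(readout(h), y)  # loss on node embeddings
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
```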

Slide 26

Slide 26 text

Inductive capabilities and efficiency
• Each node effectively has its own network, defined by its connectivity
• The message, pooling, and update functions are shared across all nodes
• The number of nodes can increase without increasing the number of parameters
• New, unseen node structures can be introduced and the same matrices simply plugged in

Slide 27

Slide 27 text

Stacking multiple convolutional layers
So far we have only looked at a single convolution – can we stack multiple layers?
$\mathbf{v}_i^{(t+1)} = \sigma\left(\mathbf{W}^{(t)} \sum_{j \in \mathcal{N}(i)} \frac{\mathbf{v}_j^{(t)}}{|\mathcal{N}(i)|} + \mathbf{B}^{(t)}\mathbf{v}_i^{(t)}\right)$
[Figure: a chain of convolutions taking $\mathbf{v}_i^{(0)}$ through $\mathbf{v}_i^{(3)}$]
The weights are unique for each layer
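A NumPy sketch of stacking, reusing the single-layer form from above; the graph, sizes, and number of layers are illustrative. The key point is that each layer $t$ has its own $\mathbf{W}^{(t)}$ and $\mathbf{B}^{(t)}$.

```python
import numpy as np

rng = np.random.default_rng(0)

def gcn_layer(v, A, W, B):
    deg = np.maximum(A.sum(axis=1, keepdims=True), 1)
    return np.tanh(((A @ v) / deg) @ W + v @ B)

# Five-node chain graph with 8-dimensional embeddings (hypothetical example).
v = rng.normal(size=(5, 8))
A = np.array([[0, 1, 0, 0, 0],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)

# Three stacked convolutions, each with its own weights W^(t), B^(t).
layers = [(rng.normal(size=(8, 8)), rng.normal(size=(8, 8))) for _ in range(3)]
for W, B in layers:
    v = gcn_layer(v, A, W, B)   # v^(t+1) from v^(t)
print(v.shape)  # (5, 8)
```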

Slide 28

Slide 28 text

Why multiple convolutions?
Graphs are inherently local: after $t$ convolutions a node can only see other nodes up to $t$ edges away
Multiple convolutions increase the "receptive field" of the nodes
[Figure: a five-node chain showing which nodes are seen by node 0 at $t$ = 1, 2, 3]

Slide 29

Slide 29 text

The over-smoothing problem
However, too many convolutions cause over-smoothing: all node embeddings converge to the same value
[Figure: node embeddings at $t$ = 0, 1, 2, 3 becoming increasingly similar]

Slide 30

Slide 30 text

What about edge embeddings?
So far we have only considered node updates, but graphs have edges too. Can we learn something about edges from nodes?
Message function with the edge embedding $\mathbf{e}_{ij}$: $\mathbf{m}_i = \sum_{j \in \mathcal{N}(i)} M_\theta(\mathbf{v}_i, \mathbf{v}_j, \mathbf{e}_{ij})$
Update function stays the same: $\mathbf{v}_i' = U_\theta(\mathbf{v}_i, \mathbf{m}_i)$
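A small NumPy sketch of the edge-aware message function; the graph, embedding sizes, and the linear stand-in for $M_\theta$ are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy graph with node embeddings v and edge embeddings e_ij (hypothetical sizes).
num_nodes, node_dim, edge_dim = 4, 8, 4
v = rng.normal(size=(num_nodes, node_dim))
edges = {(0, 1), (1, 0), (0, 2), (2, 0), (2, 3), (3, 2)}
e = {edge: rng.normal(size=edge_dim) for edge in edges}

# The message now depends on the edge embedding as well: M_theta(v_i, v_j, e_ij).
W_m = rng.normal(size=(2 * node_dim + edge_dim, node_dim))

def message(i, j):
    return np.concatenate([v[i], v[j], e[(i, j)]]) @ W_m

# Pool the messages from each node's neighbours; the update function is unchanged.
m = {i: sum(message(i, j) for (a, j) in edges if a == i) for i in range(num_nodes)}
print(m[0].shape)  # (8,)
```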

Slide 31

Slide 31 text

Message passing networks – significant flexibility
There are many options for how to treat edges in the pooling function
Edge embeddings may have a different dimensionality from node embeddings
One option is to pool all edges and concatenate them at the end

Slide 32

Slide 32 text

Message passing networks – significant flexibility Can update nodes before edges or vice versa Or have a weave design to pass messages back and forth All flexible design choices in message passing networks

Slide 33

Slide 33 text

Convolutional graph networks for crystals Graphs are a natural representation for crystals but we have extra design constraints Networks should be permutation and translation invariant Properties depend on atom types and coordinates, not just connectivity

Slide 34

Slide 34 text

Constructing the graph from a crystal structure
Must consider periodic boundaries
Include all atoms within a certain cut-off radius $r_\mathrm{cut}$ as neighbours
Perform the procedure for each atom in the unit cell
Nodes can share multiple edges to the same neighbour due to the periodic boundary conditions (PBC)
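A brute-force NumPy sketch of this construction, searching the surrounding shell of periodic images (valid when $r_\mathrm{cut}$ is smaller than the cell dimensions). The rock-salt-like two-atom cell and cut-off are hypothetical; real codes typically use a dedicated neighbour-list routine.

```python
import itertools
import numpy as np

def crystal_graph(frac_coords, lattice, r_cut):
    """Build edge lists for a periodic structure: every neighbour within r_cut,
    including neighbours in the surrounding periodic images of the unit cell."""
    cart = frac_coords @ lattice
    senders, receivers, distances = [], [], []
    for image in itertools.product((-1, 0, 1), repeat=3):   # neighbouring images
        shift = np.array(image) @ lattice
        for i, j in itertools.product(range(len(cart)), repeat=2):
            d = np.linalg.norm(cart[j] + shift - cart[i])
            if 0 < d <= r_cut:                               # exclude self at zero distance
                senders.append(i)
                receivers.append(j)
                distances.append(d)
    return np.array(senders), np.array(receivers), np.array(distances)

# Hypothetical example: a 2-atom cell in a 4 A cubic lattice.
lattice = 4.0 * np.eye(3)
frac = np.array([[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]])
i, j, d = crystal_graph(frac, lattice, r_cut=3.5)
print(len(d))   # an atom can share several edges with the same neighbour (PBC)
```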

Slide 35

Slide 35 text

Crystal graph convolutional neural networks (CGCNN)
CGCNN was the first time graph convolutions were applied to crystals
[Architecture: graph construction, convolution layers, hidden layer L1, pooling, hidden layer L2, output]
Xie and Grossman, Phys. Rev. Lett. 120, 145301 (2018)

Slide 36

Slide 36 text

Implementation of CGCNN
Message function (concatenation): $\mathbf{m}_{ij}^{(t)} = \mathbf{v}_i^{(t)} \oplus \mathbf{v}_j^{(t)} \oplus \mathbf{e}_{ij}$
Update function: $\mathbf{v}_i^{(t+1)} = \mathbf{v}_i^{(t)} + \sum_{j \in \mathcal{N}(i)} \sigma\left(\mathbf{W}_f^{(t)}\mathbf{m}_{ij}^{(t)} + \mathbf{b}_f^{(t)}\right) \odot g\left(\mathbf{W}_s^{(t)}\mathbf{m}_{ij}^{(t)} + \mathbf{b}_s^{(t)}\right)$
where $\sigma$ is a sigmoid "gate" and $g$ is a softplus
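A NumPy sketch of one such gated update, following the equations above; the toy graph, embedding sizes, and weight initialisation are assumptions rather than the published implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softplus(x):
    return np.log1p(np.exp(x))

def cgcnn_update(v, e, edges, W_f, b_f, W_s, b_s):
    """One CGCNN-style convolution: for each edge build m_ij = v_i (+) v_j (+) e_ij,
    then add the gated term sigmoid(W_f m + b_f) * softplus(W_s m + b_s) to v_i."""
    v_new = v.copy()
    for i, j in edges:
        m = np.concatenate([v[i], v[j], e[(i, j)]])
        v_new[i] += sigmoid(W_f @ m + b_f) * softplus(W_s @ m + b_s)
    return v_new

# Hypothetical toy sizes: 3 atoms, 8-dim node and 4-dim edge embeddings.
rng = np.random.default_rng(0)
v = rng.normal(size=(3, 8))
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
e = {edge: rng.normal(size=4) for edge in edges}
m_dim = 2 * 8 + 4
W_f, W_s = rng.normal(size=(8, m_dim)), rng.normal(size=(8, m_dim))
b_f, b_s = np.zeros(8), np.zeros(8)
print(cgcnn_update(v, e, edges, W_f, b_f, W_s, b_s).shape)  # (3, 8)
```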

Slide 37

Slide 37 text

Initialisation — node and edge embeddings What to do for the initial node and edge embeddings? Nodes The element type is one-hot encoded (dimension of 119) and passed through an MLP Edges The bond distance is projected onto a Gaussian basis (40 basis functions)
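A short sketch of the two initialisations described above; the distance range and Gaussian width are assumed values, not those of the original code.

```python
import numpy as np

def one_hot_element(atomic_number, num_elements=119):
    """Initial node feature: one-hot encoding of the element type."""
    x = np.zeros(num_elements)
    x[atomic_number - 1] = 1.0
    return x

def gaussian_basis(distance, r_min=0.0, r_max=8.0, num_basis=40):
    """Initial edge feature: bond distance expanded on a Gaussian basis."""
    centres = np.linspace(r_min, r_max, num_basis)
    width = centres[1] - centres[0]
    return np.exp(-((distance - centres) ** 2) / width**2)

print(one_hot_element(8).shape)    # (119,), e.g. oxygen
print(gaussian_basis(1.8).shape)   # (40,), a smooth fingerprint of the bond length
```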

Slide 38

Slide 38 text

Readout — calculating the final prediction
CGCNN generates graph-level predictions. How are these generated from the final node embeddings?
Final pooling of all nodes (mean over the $|\mathcal{G}|$ atoms): $\mathbf{u}_g = \frac{1}{|\mathcal{G}|}\sum_{i \in \mathcal{G}} \mathbf{v}_i^{(T)}$
Single-layer readout: $E = \sigma\left(\mathbf{W}_r\mathbf{u}_g + \mathbf{b}_r\right)$
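A minimal readout sketch in NumPy. The embedding sizes are hypothetical, and for simplicity the non-linearity $\sigma$ of the single-layer readout is omitted here.

```python
import numpy as np

def readout(node_embeddings, W_r, b_r):
    """Graph-level prediction: average the final node embeddings over the
    whole graph, then apply a single linear readout layer."""
    u = node_embeddings.mean(axis=0)   # (1/|G|) * sum_i v_i^(T)
    return W_r @ u + b_r               # e.g. a predicted total energy

# Hypothetical example: 5 atoms with 8-dimensional final embeddings.
rng = np.random.default_rng(0)
v_final = rng.normal(size=(5, 8))
W_r, b_r = rng.normal(size=(1, 8)), np.zeros(1)
print(readout(v_final, W_r, b_r))  # a single scalar prediction
```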

Slide 39

Slide 39 text

CGCNN performance CGCNN shows good accuracy for such a simple model but errors are still too large for reliable science

Slide 40

Slide 40 text

Advanced message passing networks
CGCNN only uses bond lengths as features. More advanced networks show improved performance:
• MEGNet: crystal features and set2set pooling
• M3GNet: bond angles and dihedrals

Slide 41

Slide 41 text

Vector and tensor properties — equivariance Higher-dimensional properties (vectors, tensors) such as force and stress require equivariant models Forces should transform commensurately with the structure, e.g. rotate when the structure is rotated

Slide 42

Slide 42 text

Equivariant features This requires features that transform predictably under rotations Credit: Tess Smidt, e3nn.org/e3nn-tutorial-mrs-fall-2021

Slide 43

Slide 43 text

Equivariant graph models
Higher-dimensional properties (vectors, tensors) such as forces and stresses require equivariant models
• e3nn: high-order spherical harmonic basis
• NequIP: MLIP with tensorial features

Slide 44

Slide 44 text

A large number of graph networks exist

Slide 45

Slide 45 text

Graph networks and the MatBench dataset npj Comput. Mater. 6, 138 (2020) Graph neural networks are widely used for property predictions in chemistry but excel on larger datasets

Slide 46

Slide 46 text

Uses of graph networks https://matbench.materialsproject.org GNNs take up most of the top spots on the current leaderboard Many high-performance MLIPs use graphs (MACE, NequIP, Allegro)

Slide 47

Slide 47 text

Summary
• Many datasets can be represented as graphs
• GNNs work by i) building a graph and ii) propagating information between neighbours using neural networks
• GNNs are scalable and can generalise well
• There are many possibilities for designing GNNs