Slide 1

Survey on Invariant and Equivariant Graph Neural Networks
Sugiyama-Sato-Honda Lab
Kenshin Abe
2020/01/20

Slide 2

TL;DR
✴ What inductive bias do GNNs need to have?
 ‣ Permutation invariance / equivariance of nodes
✴ Rapid progress during the last year (2019)
 ‣ Three papers from Maron+ are great
✴ This talk from an ICML 2019 workshop is a good tutorial
 ‣ http://irregulardeep.org/Graph-invariant-networks-ICML-talk/

Slide 3

Notations
✴ V: a set of nodes
✴ E: a set of edges (directed / undirected, weighted / unweighted)
✴ n = |V|: number of nodes
✴ N(v): neighborhood of v (the set of nodes adjacent to v)
✴ f: network
✴ d: (fixed) output feature dimension
✴ [n] = {1, 2, ..., n}

Slide 4

Problem Setting
✴ We want to learn from graphs
 ‣ Graph regression / classification: output in ℝ^d
 ‣ Node regression / classification: output in ℝ^{n×d}
[Figure: a social graph (https://en.wikipedia.org/wiki/Social_graph) mapped to graph-level and node-level outputs]

Slide 5

Problem Setting
✴ We want to learn from graphs
 ‣ Graph regression / classification: output in ℝ^d (mainly focus on this)
 ‣ Node regression / classification: output in ℝ^{n×d}
[Figure: a social graph (https://en.wikipedia.org/wiki/Social_graph) mapped to graph-level and node-level outputs]

Slide 6

Message Passing Graph Neural Networks

Slide 7

Message Passing Neural Networks (MPNNs) [Gilmer+ ICML 2017]
Many proposed models can be formulated in the following way.
✴ Message passing phase
 ‣ m_v^{t+1} = Σ_{w ∈ N(v)} M_t(h_v^t, h_w^t, e_vw)
 ‣ h_v^{t+1} = U_t(h_v^t, m_v^{t+1})
✴ Readout phase
 ‣ y = R({h_v^T | v ∈ V})
✴ Performed SOTA on molecular property prediction tasks.
(h_v^t: hidden state of v in the t-th layer, e_vw: edge feature, M_t, U_t, R: learned functions)
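As a rough illustration of one message passing step and a sum readout, here is a minimal sketch. It is a toy version, not the model from the paper: fixed random matrices stand in for the learned functions M_t, U_t, R, and edge features e_vw are omitted.

```python
import numpy as np

def mpnn_step(h, edges, W_msg, W_upd):
    """One message passing step: m_v = sum_w M(h_v, h_w); h_v' = U(h_v, m_v)."""
    m = np.zeros_like(h)
    for v, w in edges:                                    # w is a neighbor of v
        m[v] += np.tanh(np.concatenate([h[v], h[w]]) @ W_msg)
    return np.tanh(np.concatenate([h, m], axis=1) @ W_upd)

def readout(h):
    """Permutation-invariant readout R: sum the final node states."""
    return h.sum(axis=0)

rng = np.random.default_rng(0)
d = 4
h = rng.normal(size=(3, d))                               # 3 nodes, hidden dimension 4
edges = [(0, 1), (1, 2), (2, 0), (1, 0), (2, 1), (0, 2)]  # undirected 3-cycle
W_msg, W_upd = rng.normal(size=(2 * d, d)), rng.normal(size=(2 * d, d))
y = readout(mpnn_step(h, edges, W_msg, W_upd))
print(y.shape)  # (4,)
```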

Slide 8

Expressive Power of MPNNs
✴ [Xu+ ICLR 2019] and [Morris+ AAAI 2019] analyzed the power of MPNNs in terms of graph isomorphism
 ‣ MPNNs are at most as strong as the Weisfeiler-Lehman graph isomorphism test (WL-test)
  • A strong heuristic to check graph isomorphism
 ‣ Graph Isomorphism Network (GIN)
  • As strong as the WL-test
  • Simple and runs in O(|E|)
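For reference, the (1-dimensional) WL-test is just iterated color refinement. A minimal sketch, assuming unlabeled, undirected graphs given as adjacency lists, together with the classic failure case:

```python
from collections import Counter

def wl_colors(adj, rounds=3):
    """1-WL color refinement; adj maps each node to its list of neighbors."""
    colors = {v: 0 for v in adj}                          # uniform initial color
    for _ in range(rounds):
        colors = {
            # New color = (own color, sorted multiset of neighbor colors),
            # compressed with hash() — fine for a sketch.
            v: hash((colors[v], tuple(sorted(colors[w] for w in adj[v]))))
            for v in adj
        }
    return Counter(colors.values())                       # color histogram of the graph

# Two non-isomorphic 6-node graphs with identical WL color histograms:
hexagon       = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
print(wl_colors(hexagon) == wl_colors(two_triangles))     # True: 1-WL cannot tell them apart
```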

Slide 9

Limitation of MPNNs
✴ The WL-test is strong, but still…
✴ Cannot distinguish a very simple counterexample
[Figure: a pair of non-isomorphic graphs that the WL-test cannot distinguish, from https://arxiv.org/pdf/1905.11136.pdf]

Slide 10

Invariance / Equivariance

Slide 11

Graph as Tensors
✴ A hypergraph (= each edge includes a set of nodes) can be described as a tensor A ∈ ℝ^{n^k} (k = max_{e∈E} |e|)
 ‣ Information on k-tuples of nodes
✴ ex. k = 2 (a "normal" graph)
 ‣ Adjacency matrix
   ( 0 1 0
     0 0 1
     1 0 0 )
[Figure: the corresponding directed graph on nodes 0, 1, 2]
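As a small sketch of this representation (the edge orientation of the 3-node example is assumed for illustration):

```python
import numpy as np

n = 3
# k = 2: an ordinary directed graph is an order-2 tensor, i.e. the adjacency matrix.
edges = [(0, 1), (1, 2), (2, 0)]             # assumed orientation of the 3-cycle above
A = np.zeros((n, n))
for v, w in edges:
    A[v, w] = 1.0                            # reproduces the matrix on the slide

# k = 3: a toy hypergraph with a single hyperedge containing three nodes.
hyperedges = [(0, 1, 2)]
T = np.zeros((n, n, n))
for e in hyperedges:
    T[e] = 1.0
print(A)
```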

Slide 12

Demand for Tensor Input
✴ We want f(A) to give the same output for isomorphic graphs
 ‣ A_1 = ( 0 1 0
           0 0 1
           1 0 0 ),  A_2 = ( 0 0 1
                             1 0 0
                             0 1 0 )
 ‣ f(A_1) = f(A_2)
✴ What condition exactly should f have?

Slide 13

Invariance and Equivariance
✴ Let P be a permutation matrix and ⋆ be the reordering operator
 ‣ P ⋆ A is a permutation of A in each dimension
✴ Invariance of f : ℝ^{n^k} → ℝ
 ‣ f(P ⋆ A) = f(A)
✴ Equivariance of f : ℝ^{n^k} → ℝ^{n^l}
 ‣ f(P ⋆ A) = P ⋆ f(A)
(Figure from http://irregulardeep.org/An-introduction-to-Invariant-Graph-Networks-(1-2)/)
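A quick numeric illustration of the two conditions, using two toy functions chosen for the sketch: the sum of all entries as an invariant map and the row sums (out-degrees) as an equivariant one.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.integers(0, 2, size=(n, n)).astype(float)   # a random directed graph

perm = rng.permutation(n)
P = np.eye(n)[perm]                                  # permutation matrix
A_perm = P @ A @ P.T                                 # P * A for the k = 2 case: reorder rows and columns

f_inv = lambda A: A.sum()                            # invariant  f : R^{n^2} -> R
f_equi = lambda A: A.sum(axis=1)                     # equivariant f : R^{n^2} -> R^n

print(np.isclose(f_inv(A_perm), f_inv(A)))           # f(P*A) == f(A)
print(np.allclose(f_equi(A_perm), P @ f_equi(A)))    # f(P*A) == P*f(A)
```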

Slide 14

Invariant Graph Networks [Maron+ ICLR 2019]
✴ Imitating other neural network models, it is natural to construct the architecture below
 ‣ L_i: Equivariant linear layer + nonlinear activation function
 ‣ H: Invariant linear layer
 ‣ M: Multilayer perceptron
[Architecture: ℝ^{n^{k_0}} → ℝ^{n^{k_1}} → ℝ^{n^{k_2}} → … → ℝ^{n^{k_L}} → ℝ → ℝ]
(Figure from http://irregulardeep.org/An-introduction-to-Invariant-Graph-Networks-(1-2)/)

Slide 15

Invariant Graph Networks [Maron+ ICLR 2019]
✴ Imitating other neural network models, it is natural to construct the architecture below
 ‣ L_i: Equivariant linear layer + nonlinear activation function
 ‣ H: Invariant linear layer
 ‣ M: Multilayer perceptron
✴ Can we enumerate all equivariant linear layers?
[Architecture: ℝ^{n^{k_0}} → ℝ^{n^{k_1}} → ℝ^{n^{k_2}} → … → ℝ^{n^{k_L}} → ℝ → ℝ]
(Figure from http://irregulardeep.org/An-introduction-to-Invariant-Graph-Networks-(1-2)/)
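For the smallest case f : ℝ^n → ℝ^n the answer is a 2-dimensional space (b(1 + 1) = 2, see the next slide): every such layer combines the identity pattern and the "sum everything" pattern. A minimal sketch checking equivariance:

```python
import numpy as np

def equivariant_layer(x, w1, w2):
    """Equivariant linear maps R^n -> R^n have the form w1 * x + w2 * sum(x) * 1."""
    return w1 * x + w2 * x.sum() * np.ones_like(x)

rng = np.random.default_rng(0)
n, w1, w2 = 6, 0.7, -0.3
x = rng.normal(size=n)
P = np.eye(n)[rng.permutation(n)]                     # a random permutation matrix

# f(P x) == P f(x) for any permutation matrix P.
print(np.allclose(equivariant_layer(P @ x, w1, w2), P @ equivariant_layer(x, w1, w2)))
```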

Slide 16

Dimension of Invariant / Equivariant Linear Layer
✴ Let f : ℝ^{n^k} → ℝ be an invariant linear layer
 ‣ The dimension (of the space of such layers) is b(k)
✴ Let f : ℝ^{n^k} → ℝ^{n^l} be an equivariant linear layer
 ‣ The dimension is b(k + l)
✴ where b(k) is the k-th Bell number
 ‣ The number of ways to partition k distinguished elements

k    | 1 | 2 | 3 | 4  | 5  | 6   | 7   | 8    | 9
b(k) | 1 | 2 | 5 | 15 | 52 | 203 | 877 | 4140 | 21147
(https://en.wikipedia.org/wiki/Bell_number)
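The table can be reproduced with the Bell triangle recurrence; a small sketch:

```python
def bell_numbers(k_max):
    """Bell triangle: each row starts with the last entry of the previous row;
    every other entry is its left neighbor plus the entry above it."""
    row = [1]
    bells = [row[-1]]                 # b(1) = 1
    while len(bells) < k_max:
        nxt = [row[-1]]
        for v in row:
            nxt.append(nxt[-1] + v)
        row = nxt
        bells.append(row[-1])
    return bells

print(bell_numbers(9))  # [1, 2, 5, 15, 52, 203, 877, 4140, 21147]
```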

Slide 17

Proof Idea
Prove "the dimension of the space of equivariant layers f : ℝ^{n^k} → ℝ^{n^l} is b(k + l)".
1. Consider the coefficient matrix X ∈ ℝ^{n^k × n^l} and solve the fixed-point equations
 ‣ Q ⋆ X = X (for all permutation matrices Q)
2. Let ∼ be the equivalence relation over [n]^{k+l} such that
 ‣ a ∼ b :⇔ (a_i = a_j ⇔ b_i = b_j, for all i, j ∈ [k + l]), and consider the equivalence classes [n]^{k+l}/∼
3. From each γ ∈ [n]^{k+l}/∼ we can construct an orthogonal basis element
 ‣ B^γ_a = 1 if a ∈ γ, and 0 otherwise
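Steps 2 and 3 can be sanity-checked by brute force: the number of equality patterns of tuples in [n]^{k+l} equals b(k + l) whenever n ≥ k + l, independently of n. A small sketch:

```python
from itertools import product

def num_equality_patterns(n, m):
    """Count equivalence classes of [n]^m under a ~ b iff (a_i == a_j <=> b_i == b_j)."""
    patterns = set()
    for a in product(range(n), repeat=m):
        # Canonical form of the equality pattern: relabel values by first occurrence.
        first_seen = {}
        canon = tuple(first_seen.setdefault(v, len(first_seen)) for v in a)
        patterns.add(canon)
    return len(patterns)

# k = 2, l = 2  =>  b(4) = 15 basis tensors B^gamma, for any n >= 4.
print(num_equality_patterns(n=4, m=4), num_equality_patterns(n=5, m=4))  # 15 15
```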

Slide 18

Dimension of invariant / equivariant linear layer
✴ Let f : ℝ^{n^k} → ℝ be an invariant linear layer
 ‣ The dimension is b(k)
✴ Let f : ℝ^{n^k} → ℝ^{n^l} be an equivariant linear layer
 ‣ The dimension is b(k + l)
✴ The dimension does not depend on n
 ‣ IGNs can be applied to graphs of different sizes
✴ We call an IGN with maximum tensor order k a k-IGN

Slide 19

Representation power

Slide 20

Universality
Invariant Graph Networks can approximate any invariant / equivariant function, using high-order tensors.
✴ [Maron+ ICML 2019]
 ‣ Shows the invariant case using [Yarotsky+ 2018]'s polynomials
✴ [Keriven+ NeurIPS 2019]
 ‣ Shows the equivariant case (output tensor order 1) via an extended Stone-Weierstrass theorem
✴ [Maehara+ 2019]
 ‣ Shows the equivariant case (for high output tensor orders) via homomorphism numbers

Slide 21

Universality
Invariant Graph Networks can approximate any invariant / equivariant function, using high-order tensors.
✴ [Maron+ ICML 2019]
 ‣ Shows the invariant case using [Yarotsky+ 2018]'s polynomials
✴ [Keriven+ NeurIPS 2019]
 ‣ Shows the equivariant case (output tensor order 1) via an extended Stone-Weierstrass theorem
✴ [Maehara+ 2019]
 ‣ Shows the equivariant case (for high output tensor orders) via homomorphism numbers
✴ Architectures with high-order tensors are not practical.

Slide 22

Provably Powerful Graph Networks [Maron+ NeurIPS 2019]
✴ Proved the correspondence between k-IGN and k-WL
✴ Proposed a strong and scalable model, 2-IGN+
(Figure from http://irregulardeep.org/How-expressive-are-Invariant-Graph-Networks-(2-2)/)

Slide 23

WL-hierarchy
✴ The WL-test can be generalized to a k-dimensional version
✴ There exists a known hierarchy of k-WL tests
 ‣ (k + 1)-WL is strictly stronger than k-WL
✴ k-IGN is at least as strong as k-WL (their contribution)
(Figure from http://irregulardeep.org/How-expressive-are-Invariant-Graph-Networks-(2-2)/)

Slide 24

2-IGN+
✴ Scalable and powerful model
 ‣ Uses only order-2 tensors (adjacency matrices)
 ‣ At least as powerful as 3-WL
✴ Intuition: adjacency matrix multiplication counts the number of paths (cycles)
(Figure from http://irregulardeep.org/How-expressive-are-Invariant-Graph-Networks-(2-2)/)
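This intuition is easy to verify: (A^k)_ij counts walks of length k, so trace(A^3)/6 gives the number of triangles in a simple undirected graph, which already separates a pair of graphs that the 1-WL test cannot. A toy sketch (not the 2-IGN+ architecture itself):

```python
import numpy as np

def triangles(A):
    """Number of triangles in a simple undirected graph: trace(A^3) / 6."""
    return int(np.trace(np.linalg.matrix_power(A, 3)) // 6)

# Hexagon (6-cycle).
hexagon = np.zeros((6, 6))
for i in range(6):
    hexagon[i, (i + 1) % 6] = hexagon[(i + 1) % 6, i] = 1

# Two disjoint triangles on the same 6 nodes.
two_triangles = np.zeros((6, 6))
for block in ([0, 1, 2], [3, 4, 5]):
    for i in block:
        for j in block:
            if i != j:
                two_triangles[i, j] = 1

# 1-WL (and hence MPNNs) cannot separate these graphs, but matrix products can:
print(triangles(hexagon), triangles(two_triangles))  # 0 2
```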

Slide 25

Conclusion

Slide 26

Summary
✴ Many variants of message passing GNNs have been successful.
✴ Due to the theoretical limitations of message passing GNNs' representation power, Invariant Graph Networks were introduced.
✴ Invariant Graph Networks can approximate any invariant function, but require high-order tensors.
✴ Scalable variants of Invariant Graph Networks are being studied for practical use.

Slide 27

Future Direction (My Thoughts)
✴ Generalization / Optimization
 ‣ Normalization techniques do not affect representation power, but do they affect either of these?
✴ Beyond invariance (equivariance)
 ‣ [Sato+ NeurIPS 2019] connected the theory of GNNs with distributed local algorithms
 ‣ Do we sometimes need non-invariant (non-equivariant) functions?
✴ Scalable models of IGN
 ‣ 2-IGN+ requires O(n^3) while MPNNs run in O(|E|)
 ‣ Polynomial invariant / equivariant layers

Slide 28

References 1/2
✴ http://irregulardeep.org/An-introduction-to-Invariant-Graph-Networks-(1-2)/
✴ Gilmer, Justin & Schoenholz, Samuel & Riley, Patrick & Vinyals, Oriol & Dahl, George. (2017). Neural Message Passing for Quantum Chemistry.
✴ Morris, Christopher & Ritzert, Martin & Fey, Matthias & Hamilton, William & Lenssen, Jan & Rattan, Gaurav & Grohe, Martin. (2018). Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks.
✴ Xu, Keyulu & Hu, Weihua & Leskovec, Jure & Jegelka, Stefanie. (2018). How Powerful are Graph Neural Networks?
✴ Maron, Haggai & Ben-Hamu, Heli & Shamir, Nadav & Lipman, Yaron. (2018). Invariant and Equivariant Graph Networks.

Slide 29

References 2/2
✴ Maron, Haggai & Fetaya, Ethan & Segol, Nimrod & Lipman, Yaron. (2019). On the Universality of Invariant Networks.
✴ Keriven, Nicolas & Peyré, Gabriel. (2019). Universal Invariant and Equivariant Graph Neural Networks.
✴ Maehara, Takanori & NT, Hoang. (2019). A Simple Proof of the Universality of Invariant/Equivariant Graph Neural Networks.
✴ Maron, Haggai & Ben-Hamu, Heli & Serviansky, Hadar & Lipman, Yaron. (2019). Provably Powerful Graph Networks.
✴ Sato, Ryoma & Yamada, Makoto & Kashima, Hisashi. (2019). Approximation Ratios of Graph Neural Networks for Combinatorial Problems. In NeurIPS 2019.