Journal Club GraphSAGE

Inductive Representation Learning on Large Graphs
William L. Hamilton+
Tomoya Matsumoto, Feb. 6, 2019
BioInformation Engineering Lab., M1

Overview

Overview
• GraphSAGE extends GCNs to the task of inductive unsupervised learning.
• The network generalizes the GCN approach to use trainable aggregation functions (beyond simple convolutions).
• It can classify unseen nodes in evolving information graphs and generalize to completely unseen graphs.

Introduction

Tasks with GNN
• Node Classification
• Graph Classification
• Link Prediction
• Edge Classification

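Node Classification
• Transductive (semi-supervised): predict labels for unlabeled nodes. Training and test nodes belong to the same graph, which is fully visible during training.
• Inductive: predict labels for unseen nodes. Test nodes (or entire test graphs) are not available during training.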

Proposed Method -GraphSAGE-

Overview
Figure 1: Visual illustration of the GraphSAGE sample and aggregate approach

Algorithm
Input: graph $G(V, E)$
Input: input features $\{x_v, \forall v \in V\}$
Input: depth $K$
Input: weight matrices $W^k, \forall k \in \{1, \dots, K\}$
Input: neighborhood function $N : v \to 2^V$
Output: vector representations $z_v$ for all $v \in V$
$h_v^0 \leftarrow x_v, \forall v \in V$
for $k = 1 \dots K$ do
    for $v \in V$ do
        $h_{N(v)}^k \leftarrow \mathrm{AGGREGATE}_k(\{h_u^{k-1}, \forall u \in N(v)\})$
        $h_v^k \leftarrow \sigma(W^k \cdot \mathrm{CONCAT}(h_v^{k-1}, h_{N(v)}^k))$
    end
    $h_v^k \leftarrow h_v^k / \|h_v^k\|_2, \forall v \in V$
end
$z_v \leftarrow h_v^K, \forall v \in V$
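The loop structure of the algorithm can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the helper names (`matvec`, `mean_aggregate`, `l2_normalize`, `graphsage_forward`) are hypothetical, ReLU is assumed for σ, the mean is assumed as the aggregator, the full neighbor list stands in for N(v), and every node is assumed to have at least one neighbor.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def mean_aggregate(vectors):
    """Elementwise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def l2_normalize(x, eps=1e-12):
    """Divide x by its L2 norm (eps guards against an all-zero vector)."""
    norm = math.sqrt(sum(xi * xi for xi in x))
    return [xi / (norm + eps) for xi in x]

def graphsage_forward(features, neighbors, weights, aggregate=mean_aggregate):
    """features: {node: vector}; neighbors: {node: list of nodes}, playing N(v);
    weights: [W^1, ..., W^K], where W^k has 2 * d_{k-1} columns."""
    h = dict(features)                                   # h^0_v <- x_v
    for W in weights:                                    # k = 1 ... K
        h_next = {}
        for v in h:
            h_nbr = aggregate([h[u] for u in neighbors[v]])   # h^k_{N(v)}
            pre = matvec(W, h[v] + h_nbr)                # W^k . CONCAT (list concatenation)
            activated = [max(0.0, p) for p in pre]       # sigma = ReLU (assumed)
            h_next[v] = l2_normalize(activated)          # h^k_v / ||h^k_v||_2
        h = h_next
    return h                                             # z_v <- h^K_v

# Toy graph with 3 nodes, 2 input features, K = 2, output dimension 3.
random.seed(0)
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
neighbors = {0: [1, 2], 1: [0], 2: [0]}
weights = [
    [[random.uniform(-1, 1) for _ in range(4)] for _ in range(3)],  # W^1: 3 x (2*2)
    [[random.uniform(-1, 1) for _ in range(6)] for _ in range(3)],  # W^2: 3 x (2*3)
]
z = graphsage_forward(features, neighbors, weights)
```

Normalizing each node inside the inner loop gives the same result as normalizing after it, because the aggregation at depth k only reads the depth k−1 vectors.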

Neighborhood Function
$N(v)$ is defined as a fixed-size, uniform draw from the set $\{u \in V : (u, v) \in E\}$, and a different uniform sample is drawn at each iteration $k$. With this sampling, the per-batch space and time complexity is fixed at $O(\prod_{i=1}^{K} S_i)$, where $S_i$ is the sample size at depth $i$ (in this paper, $K = 2$ and $S_1 \cdot S_2 \le 500$). Without sampling, the per-batch cost is unpredictable and $O(|V|)$ in the worst case.
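A fixed-size uniform draw and the resulting bound can be sketched as follows. The function names are hypothetical, and sampling with replacement when a node has fewer neighbors than the sample size is an assumption of this sketch, not something the slide specifies. The sizes 25 and 10 are illustrative values that satisfy S1 · S2 ≤ 500.

```python
import random

def sample_neighbors(adj, v, size, rng=random):
    """Fixed-size uniform draw from the neighbors of v.
    Assumption: sample with replacement when v has fewer than `size` neighbors."""
    nbrs = adj[v]
    if len(nbrs) >= size:
        return rng.sample(nbrs, size)                 # without replacement
    return [rng.choice(nbrs) for _ in range(size)]    # with replacement

def receptive_field_bound(sample_sizes):
    """Product of the per-depth sample sizes S_1 * ... * S_K."""
    bound = 1
    for s in sample_sizes:
        bound *= s
    return bound

adj = {0: [1, 2, 3, 4, 5], 1: [0]}
rng = random.Random(0)
s0 = sample_neighbors(adj, 0, 3, rng)   # three distinct neighbors of node 0
s1 = sample_neighbors(adj, 1, 3, rng)   # node 1 has one neighbor, so it repeats
bound = receptive_field_bound([25, 10]) # 25 * 10 = 250, within the 500 limit
```

Because the sample size is fixed, the work per target node no longer depends on its actual degree, which is what keeps the per-batch cost predictable.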

Aggregator
To train and apply the model to arbitrarily ordered sets of neighbor feature vectors, an aggregator function should ideally be symmetric (invariant to permutations of its inputs), trainable, and of high representational capacity. The paper proposes three aggregators:
• Mean aggregator
• LSTM aggregator
• Pooling aggregator

Mean Aggregator
The mean aggregator simply takes the elementwise mean of the neighbor vectors. It is nearly equivalent to the convolutional propagation rule used in the transductive GCN (Kipf+). An inductive variant of the GCN can be derived by replacing the aggregator and the concat operation with the following:
$h_v^k \leftarrow \sigma(W \cdot \mathrm{MEAN}(\{h_v^{k-1}\} \cup \{h_u^{k-1}, \forall u \in N(v)\}))$
This variant does not perform the concatenation operation, which can be viewed as a skip connection.
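The difference between the GCN-style rule and the concatenation-based update can be seen in a small sketch. The function names are hypothetical and ReLU is assumed for σ; the weight matrices are hand-picked so the arithmetic is easy to follow.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def mean(vectors):
    """Elementwise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def relu(x):
    return [max(0.0, xi) for xi in x]

def gcn_style_update(h_v, h_neighbors, W):
    """sigma(W . MEAN({h_v} U {h_u})): the node's own vector is averaged in."""
    return relu(matvec(W, mean([h_v] + h_neighbors)))

def concat_update(h_v, h_neighbors, W):
    """sigma(W . CONCAT(h_v, MEAN({h_u}))): the node's own vector gets its own weights."""
    return relu(matvec(W, h_v + mean(h_neighbors)))

h_v = [1.0, 0.0]
h_neighbors = [[0.0, 1.0], [0.0, 3.0]]

W_gcn = [[1.0, 0.0], [0.0, 1.0]]                        # 2 x 2: input is the pooled mean
W_cat = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]    # 2 x 4: input is [h_v, mean]

gcn_out = gcn_style_update(h_v, h_neighbors, W_gcn)     # mean over 3 vectors
cat_out = concat_update(h_v, h_neighbors, W_cat)        # h_v kept apart from the mean
```

In the concatenation form, the first half of each weight row acts only on the node's previous representation, which is why it behaves like a skip connection between layers.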

LSTM Aggregator
The LSTM aggregator has the advantage of larger expressive capability. However, it is NOT permutation invariant, because an LSTM is not inherently symmetric: it is designed for SEQUENTIAL data, NOT for unordered sets.
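The order sensitivity can be demonstrated with a toy example. Note the assumption: a simple elementwise tanh recurrence stands in for the LSTM here (it is not an LSTM, only another sequential model), and the weights `w_in` and `w_rec` are arbitrary illustrative values.

```python
import math

def recurrent_aggregate(vectors, w_in=0.5, w_rec=0.9):
    """Toy tanh recurrence standing in for an LSTM: the final state depends on input order."""
    state = [0.0] * len(vectors[0])
    for x in vectors:
        state = [math.tanh(w_in * xi + w_rec * si) for xi, si in zip(x, state)]
    return state

def mean_aggregate(vectors):
    """Elementwise mean: a symmetric aggregator for comparison."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def close(a, b, tol=1e-9):
    return all(abs(ai - bi) <= tol for ai, bi in zip(a, b))

neighbors = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
reordered = [neighbors[2], neighbors[0], neighbors[1]]   # same set, different order

recurrent_changes = not close(recurrent_aggregate(neighbors),
                              recurrent_aggregate(reordered))
mean_unchanged = close(mean_aggregate(neighbors), mean_aggregate(reordered))
```

The original paper handles this by feeding the LSTM a random permutation of each node's neighbors, so the model cannot rely on any particular ordering.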

Pooling aggregator
$\mathrm{AGGREGATE}_k^{\mathrm{pool}} = \max(\{\sigma(W_{\mathrm{pool}} h_{u_i}^k + b), \forall u_i \in N(v)\})$
The pooling aggregator is both symmetric and trainable. The MLP can be thought of as a set of functions that compute features for each of the node representations in the neighbor set. By applying the elementwise max-pooling operator to these features, the model effectively captures different aspects of the neighborhood set.
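A minimal sketch of the pooling aggregator follows, assuming ReLU for σ and a single fully connected layer as the MLP. The function names and the hand-picked `W_pool` and `b` are hypothetical, chosen so the result can be checked by hand.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def max_pool_aggregate(h_neighbors, W_pool, b):
    """Elementwise max over neighbors of relu(W_pool . h_u + b)."""
    transformed = [
        [max(0.0, z + bi) for z, bi in zip(matvec(W_pool, h_u), b)]
        for h_u in h_neighbors
    ]
    return [max(col) for col in zip(*transformed)]

W_pool = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]   # 3 learned "feature detectors" (illustrative)
b = [0.0, 0.0, 0.0]
h_neighbors = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]

pooled = max_pool_aggregate(h_neighbors, W_pool, b)
pooled_reordered = max_pool_aggregate(list(reversed(h_neighbors)), W_pool, b)
# Per-neighbor features: [1, 0, 1], [0, 2, 0], [3, 1, 2]  ->  elementwise max [3, 2, 2]
```

Each output coordinate reports the strongest response of one feature across the neighborhood, and the max makes the result independent of neighbor order.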

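Loss Function - Unsupervised
Graph-based loss function:
$J_G(z_u) = -\log(\sigma(z_u^\top z_v)) - Q \cdot \mathbb{E}_{v_n \sim P_n(v)} \log(\sigma(-z_u^\top z_{v_n}))$
Here $v$ is a node that co-occurs near $u$ on a fixed-length random walk, $P_n$ is a negative sampling distribution, and $Q$ is the number of negative samples. This loss encourages nearby nodes to have similar representations and enforces that the representations of disparate nodes are highly distinct.
Reference: Distributed Representations of Words and Phrases and their Compositionality, Tomas Mikolov+, 2013. https://arxiv.org/abs/1310.4546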
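The loss can be evaluated on toy embeddings as follows. This is a sketch under stated assumptions: the expectation is estimated by the mean over a given list of negative samples, and the function names are hypothetical.

```python
import math

def log_sigmoid(x):
    """Numerically stable log(sigma(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def unsupervised_loss(z_u, z_v, negatives, Q):
    """J_G(z_u); the expectation over P_n is estimated by the mean over `negatives`."""
    positive_term = -log_sigmoid(dot(z_u, z_v))
    negative_term = -Q * sum(log_sigmoid(-dot(z_u, z_n)) for z_n in negatives) / len(negatives)
    return positive_term + negative_term

z_u = [1.0, 0.0]

# Nearby node similar to u, negative sample dissimilar: low loss.
loss_good = unsupervised_loss(z_u, z_v=[1.0, 0.0], negatives=[[-1.0, 0.0]], Q=1)

# Nearby node dissimilar to u, negative sample similar: high loss.
loss_bad = unsupervised_loss(z_u, z_v=[-1.0, 0.0], negatives=[[1.0, 0.0]], Q=1)
```

The first term rewards a large inner product with the co-occurring node, and the second term penalizes large inner products with the negative samples.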

Experiments

Datasets - Tasks
• Web of Science (WoS) citation dataset: predict paper subject categories.
• Reddit dataset: predict which community different Reddit posts belong to.
Both tasks classify nodes in evolving information graphs. This setting is especially relevant to high-throughput production systems, which constantly encounter unseen data.

Datasets - Tasks
• PPI dataset: classify protein cellular functions from gene ontology in various protein-protein interaction (PPI) graphs.
This task requires generalizing across graphs, which means learning about node roles rather than community structure.

Result
GraphSAGE outperforms all the baselines by a significant margin, and the trainable, neural network aggregators provide significant gains compared to the GCN.

Result
For GraphSAGE, K = 2 provided a consistent boost in accuracy of around 10-15% on average compared to K = 1. However, K > 2 gave only marginal returns in performance (0-5%) while increasing the runtime, by an amount that depends on the neighborhood sample size.

Conclusion
GraphSAGE allows embeddings to be efficiently generated for unseen nodes. It effectively trades off performance and runtime by sampling node neighborhoods.
