
# Journal Club GraphSAGE

Summary slides for a lab journal club.

## Tomoya Matsumoto

February 06, 2019

## Transcript

1. Inductive Representation Learning on Large Graphs
William L. Hamilton+
Tomoya Matsumoto
Feb. 6, 2019
BioInformation Engineering Lab. M1

2. Overview

3. Overview
• GraphSAGE extends GCNs to the task of inductive
unsupervised learning.
• This network generalizes the GCN approach to use trainable
aggregation functions (beyond simple convolutions).
• It can classify the category of unseen nodes in evolving
information graphs and generalize to completely unseen
graphs.

4. Introduction

• Node Classification
• Graph Classification
• Edge Classification

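6. Node Classification
• Transductive (semi-supervised)
Predict labels for unlabeled nodes that are already present in the graph during training.
Training data and test data come from the same graph.
• Inductive
Predict labels for unseen nodes.
Test nodes (or entire test graphs) are not available during training.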

7. Proposed Method -GraphSAGE-

8. Overview
Figure 1: Visual illustration of the GraphSAGE sample and aggregate
approach

9. Algorithm
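Input: graph $G(V, E)$
Input: input features $\{x_v, \forall v \in V\}$
Input: depth $K$
Input: weight matrices $W^k, \forall k \in \{1, \dots, K\}$
Input: neighborhood function $N : v \to 2^V$
Output: vector representations $z_v$ for all $v \in V$
$h_v^0 \leftarrow x_v, \forall v \in V$
for $k = 1 \dots K$ do
  for $v \in V$ do
    $h_{N(v)}^k \leftarrow \mathrm{AGGREGATE}_k(\{h_u^{k-1}, \forall u \in N(v)\})$
    $h_v^k \leftarrow \sigma(W^k \cdot \mathrm{CONCAT}(h_v^{k-1}, h_{N(v)}^k))$
  end
  $h_v^k \leftarrow h_v^k / \|h_v^k\|_2, \forall v \in V$
end
$z_v \leftarrow h_v^K, \forall v \in V$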
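
The loop above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: it assumes the mean aggregator, ReLU as the non-linearity σ, full (unsampled) neighborhoods, and randomly initialized weights. The helper names (`graphsage_forward`, `mean_aggregate`, and so on) are hypothetical.

```python
import math
import random

def mean_aggregate(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(x):
    # Assumed choice for the non-linearity sigma.
    return [max(0.0, xi) for xi in x]

def l2_normalize(x, eps=1e-12):
    norm = math.sqrt(sum(xi * xi for xi in x))
    return [xi / (norm + eps) for xi in x]

def graphsage_forward(features, neighbors, weights):
    """features: {v: x_v}, neighbors: {v: list N(v)}, weights: [W^1, ..., W^K]."""
    h = {v: list(x) for v, x in features.items()}                # h^0_v <- x_v
    for W in weights:                                             # k = 1 .. K
        h_next = {}
        for v in h:
            h_nv = mean_aggregate([h[u] for u in neighbors[v]])   # h^k_{N(v)}
            h_next[v] = relu(matvec(W, h[v] + h_nv))              # list "+" is CONCAT
        h = {v: l2_normalize(x) for v, x in h_next.items()}       # unit L2 norm
    return h                                                      # z_v = h^K_v

# Toy graph with 3 nodes, 2-dimensional input features, and K = 2.
random.seed(0)
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
neighbors = {0: [1, 2], 1: [0], 2: [0, 1]}
dims = [2, 4, 3]  # input dim, hidden dim, output dim (illustrative values)
weights = [
    [[random.uniform(-1, 1) for _ in range(2 * d_in)] for _ in range(d_out)]
    for d_in, d_out in zip(dims, dims[1:])
]
z = graphsage_forward(features, neighbors, weights)
```

Each weight matrix has `2 * d_in` columns because the node's own representation is concatenated with the aggregated neighborhood vector before the linear map.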

10. Neighborhood Function
$N(v)$ is defined as a fixed-size, uniform draw from the set
$\{u \in V : (u, v) \in E\}$, and a different uniform sample is drawn at each
iteration $k$.
With this sampling, the per-batch space and time complexity is
fixed at $O(\prod_{i=1}^{K} S_i)$, where $S_i$ is the sample size at depth $i$
(in this case, $K = 2$ and $S_1 \cdot S_2 \le 500$);
without sampling, it is unpredictable and $O(|V|)$ in the worst case.
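
As a rough sketch of the fixed-size uniform draw, the snippet below samples neighbors from a toy adjacency list and computes the bound ∏ S_i. The function names are hypothetical, the sample sizes are illustrative values, and sampling with replacement when a node has fewer neighbors than the sample size is an assumption of this sketch, not something stated on the slide.

```python
import random

def sample_neighbors(adj, v, size, rng):
    """Draw a fixed-size uniform sample from the neighbors of v.

    Assumption: if v has fewer than `size` neighbors, sample with
    replacement so the result always has exactly `size` entries.
    """
    nbrs = adj[v]
    if len(nbrs) >= size:
        return rng.sample(nbrs, size)          # without replacement
    return [rng.choice(nbrs) for _ in range(size)]

def receptive_field_bound(sample_sizes):
    """Product of the per-depth sample sizes S_1 * ... * S_K."""
    total = 1
    for s in sample_sizes:
        total *= s
    return total

adj = {0: [1, 2, 3, 4], 1: [0], 2: [0, 3], 3: [0, 2], 4: [0]}
rng = random.Random(0)
s1 = sample_neighbors(adj, 0, 2, rng)   # two distinct neighbors of node 0
s2 = sample_neighbors(adj, 1, 3, rng)   # node 1 has one neighbor, so it repeats
bound = receptive_field_bound([25, 10]) # illustrative S_1 = 25, S_2 = 10
```

Because every node contributes exactly `size` neighbors, the work per target node is bounded by the product of the sample sizes regardless of the true node degrees.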

11. Aggregator
In order to train and apply the model to arbitrarily ordered node
neighborhood feature sets, an aggregator function should ideally be
symmetric (permutation invariant) while still being trainable and
having high representational capacity.
In this paper, three aggregators are proposed.
• Mean aggregator
• LSTM aggregator
• Pooling aggregator

12. Mean Aggregator
The mean aggregator simply takes the element-wise mean of the vectors in
$\{h_u^{k-1}, \forall u \in N(v)\}$.
It is nearly equivalent to the convolutional propagation rule used in
the transductive GCN (Kipf+).
An inductive variant of the GCN can be derived by replacing the
aggregator and concat operations with the following:
$$h_v^k \leftarrow \sigma(W \cdot \mathrm{MEAN}(\{h_v^{k-1}\} \cup \{h_u^{k-1}, \forall u \in N(v)\}))$$
This variant does not perform the concatenation operation, which can be
viewed as a skip connection.
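
The GCN-style update above can be illustrated with a small worked example. This is only a sketch: ReLU is assumed for σ, and the function name and the numbers are made up for illustration.

```python
def gcn_mean_update(h_v, h_neighbors, W):
    """h_v^k = sigma(W . MEAN({h_v^{k-1}} U {h_u^{k-1}})), with sigma = ReLU (assumed)."""
    vectors = [h_v] + h_neighbors                        # the node itself joins the set
    mean = [sum(col) / len(vectors) for col in zip(*vectors)]
    return [max(0.0, sum(w * m for w, m in zip(row, mean))) for row in W]

h_v = [1.0, 0.0]
h_neighbors = [[0.0, 1.0], [2.0, 2.0]]
W = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]  # 3 x 2: no concatenation, so input dim stays 2
out = gcn_mean_update(h_v, h_neighbors, W)
# mean = [1.0, 1.0], so out = [1.0, 1.0, 0.0]
```

Note that `W` has as many columns as the input dimension, whereas the concatenating update in the main algorithm needs twice as many.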

13. LSTM Aggregator
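The LSTM aggregator has the advantage of larger expressive capability.
However, this aggregator is NOT permutation invariant because
an LSTM is not inherently symmetric.
It is designed for SEQUENTIAL data and NOT for unordered sets.
The paper adapts it to unordered sets by applying the LSTM to a random
permutation of the node's neighbors.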
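
The difference between a symmetric and a sequential aggregator can be checked directly. In the sketch below, `toy_recurrent_agg` is a hypothetical stand-in for a sequence model (it is NOT an LSTM, just a simple tanh recurrence), which is enough to show order sensitivity.

```python
import math

def mean_agg(vectors):
    """Symmetric aggregator: element-wise mean."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def toy_recurrent_agg(vectors, decay=0.5):
    """Toy sequential aggregator (not an LSTM): state <- tanh(decay * state + x)."""
    state = [0.0] * len(vectors[0])
    for x in vectors:
        state = [math.tanh(decay * s + xi) for s, xi in zip(state, x)]
    return state

neighbors = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
reordered = [neighbors[2], neighbors[0], neighbors[1]]

mean_same = mean_agg(neighbors) == mean_agg(reordered)
recurrent_same = toy_recurrent_agg(neighbors) == toy_recurrent_agg(reordered)
```

Here `mean_same` is true while `recurrent_same` is false: the mean ignores neighbor order, and the recurrent state depends on it.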

14. Pooling aggregator
$$\mathrm{AGGREGATE}_k = \max(\{\sigma(W_{\mathrm{pool}} h_{u_i}^k + b), \forall u_i \in N(v)\})$$
The pooling aggregator is both symmetric and trainable.
The MLP can be thought of as a set of functions that compute features
for each of the node representations in the neighbor set.
By applying the element-wise max-pooling operator to these features, the
model effectively captures different aspects of the neighborhood set.
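
A minimal sketch of the pooling aggregator follows, assuming a single-layer transform with ReLU as σ. The function name and the toy weights are illustrative only.

```python
def pool_aggregate(h_neighbors, W_pool, b):
    """Element-wise max over sigma(W_pool h_u + b) for each neighbor u (sigma = ReLU, assumed)."""
    transformed = [
        [max(0.0, sum(w * x for w, x in zip(row, h_u)) + b_j)
         for row, b_j in zip(W_pool, b)]
        for h_u in h_neighbors
    ]
    # Max over neighbors, taken independently in each feature dimension.
    return [max(col) for col in zip(*transformed)]

h_neighbors = [[1.0, 2.0], [3.0, -1.0]]
W_pool = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [0.0, 0.5, -1.0]
pooled = pool_aggregate(h_neighbors, W_pool, b)
pooled_reversed = pool_aggregate(h_neighbors[::-1], W_pool, b)
# pooled = [3.0, 2.5, 2.0], and reversing the neighbor order gives the same result
```

Each output dimension can be "won" by a different neighbor, which is one way to read the claim that max-pooling captures different aspects of the neighborhood.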

15. Loss Function - Unsupervised
Graph-based loss function
$$J_G(z_u) = -\log(\sigma(z_u^\top z_v)) - Q \cdot \mathbb{E}_{v_n \sim P_n(v)} \log(\sigma(-z_u^\top z_{v_n}))$$
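Here $v$ is a node that co-occurs near $u$ on a fixed-length random walk,
$P_n$ is a negative sampling distribution, and $Q$ is the number of
negative samples.
This encourages nearby nodes to have similar representations and
enforces that the representations of disparate nodes are highly
distinct.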
Reference
Distributed Representations of Words and Phrases and their
Compositionality, Tomas Mikolov+, 2013
https://arxiv.org/abs/1310.4546
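
The graph-based loss can be sketched numerically as below. This is an illustration under stated assumptions: the expectation is estimated by averaging over Q sampled negatives (so the second term becomes a sum over the negatives), and the function names and toy embeddings are hypothetical.

```python
import math

def log_sigmoid(x):
    """Numerically stable log(sigma(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def unsupervised_loss(z_u, z_v, z_negatives):
    """J_G(z_u), with the expectation estimated as a mean over the sampled negatives."""
    Q = len(z_negatives)
    positive = -log_sigmoid(dot(z_u, z_v))
    expectation = sum(log_sigmoid(-dot(z_u, z_n)) for z_n in z_negatives) / Q
    return positive - Q * expectation

z_u = [1.0, 0.0]
# Positive pair aligned with z_u, negatives pointing elsewhere: low loss.
loss_good = unsupervised_loss(z_u, [1.0, 0.0], [[-1.0, 0.0], [0.0, -1.0]])
# Positive pair opposed to z_u, negatives aligned with it: high loss.
loss_bad = unsupervised_loss(z_u, [-1.0, 0.0], [[1.0, 0.0], [1.0, 0.0]])
```

The loss is lower when the positive pair has a large inner product and the negatives have small or negative ones, which matches the stated goal of the objective.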

16. Experiments

• WoS Citation dataset
Predict paper subject categories.
• Reddit dataset
Predict which community different Reddit posts belong to.
These tasks involve classifying nodes in evolving information graphs.
This setting is especially relevant to high-throughput production systems,
which constantly encounter unseen data.

• PPI dataset
Classify protein cellular functions from gene ontology in
various PPI graphs.
This task requires generalizing across graphs, which means learning
about node roles rather than community structure.

19. Result
GraphSAGE outperforms all the baselines by a significant margin,
and the trainable, neural network aggregators provide significant
gains compared to the GCN.

20. Result
For GraphSAGE, K = 2 provided a consistent boost in accuracy
of around 10-15%, on average, compared to K = 1.
However, K > 2 gave only marginal returns in performance (0-5%) while
increasing the runtime by an amount that depends on the neighborhood
sample size.

21. Conclusion
GraphSAGE allows embeddings to be efficiently generated for
unseen nodes.
It effectively trades off performance and runtime by sampling node
neighborhoods.