
# Journal Club GraphSAGE

Summary slides for our lab's journal club.

## Tomoya Matsumoto

February 06, 2019

## Transcript

1. ### Inductive Representation Learning on Large Graphs

William L. Hamilton+

Tomoya Matsumoto, Feb. 6, 2019, BioInformation Engineering Lab., M1

3. ### Overview

- GraphSAGE extends GCNs to the task of inductive unsupervised learning.
- The framework generalizes the GCN approach to use trainable aggregation functions (beyond simple convolutions).
- It can classify unseen nodes in evolving information graphs and generalize to completely unseen graphs.

5. ### Tasks with GNNs

- Node classification
- Graph classification
- Link prediction
- Edge classification
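6. ### Node Classification

- Transductive (semi-supervised): predict the labels of unlabeled nodes. All nodes, including the test nodes, belong to the graph seen during training; only the test labels are hidden.
- Inductive: predict the labels of unseen nodes. The test nodes (or entire test graphs) are not available during training.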
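To make the distinction concrete, here is a minimal Python sketch on a hypothetical five-node graph (the node ids, labels, and split are made up for illustration, and `induced_subgraph` is a helper introduced here). In the transductive setting the model trains on the full graph with test labels withheld; in the inductive setting the test nodes are removed entirely, so training only sees the subgraph induced by the training nodes.

```python
def induced_subgraph(adj, keep):
    """Adjacency list restricted to `keep`; edges to dropped nodes are removed."""
    keep = set(keep)
    return {v: [u for u in nbrs if u in keep] for v, nbrs in adj.items() if v in keep}


# Hypothetical toy graph (undirected, stored as an adjacency list) and labels.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}
labels = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B"}
test_nodes = {3, 4}
train_nodes = set(adj) - test_nodes
train_labels = {v: labels[v] for v in train_nodes}  # labels usable in both settings

# Transductive: the whole graph (test nodes, their features and edges included)
# is available during training; only the labels of the test nodes are withheld.
transductive_train_graph = adj

# Inductive: the test nodes do not exist at training time, so the model trains on
# the subgraph induced by the training nodes and must embed nodes 3 and 4 later.
inductive_train_graph = induced_subgraph(adj, train_nodes)

print(sorted(transductive_train_graph))  # [0, 1, 2, 3, 4]
print(inductive_train_graph)             # {0: [1, 2], 1: [0, 2], 2: [0, 1]}
```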

8. ### Overview

Figure 1: Visual illustration of the GraphSAGE sample and aggregate approach.
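9. ### Algorithm

Input: graph $\mathcal{G}(\mathcal{V}, \mathcal{E})$; input features $\{\mathbf{x}_v, \forall v \in \mathcal{V}\}$; depth $K$; weight matrices $\mathbf{W}^k, \forall k \in \{1, \dots, K\}$; neighborhood function $\mathcal{N} : v \to 2^{\mathcal{V}}$

Output: vector representations $\mathbf{z}_v$ for all $v \in \mathcal{V}$

$$
\begin{aligned}
&\mathbf{h}^0_v \leftarrow \mathbf{x}_v, \ \forall v \in \mathcal{V} \\
&\textbf{for } k = 1, \dots, K \textbf{ do} \\
&\quad \textbf{for } v \in \mathcal{V} \textbf{ do} \\
&\qquad \mathbf{h}^k_{\mathcal{N}(v)} \leftarrow \mathrm{AGGREGATE}_k\left(\{\mathbf{h}^{k-1}_u, \forall u \in \mathcal{N}(v)\}\right) \\
&\qquad \mathbf{h}^k_v \leftarrow \sigma\left(\mathbf{W}^k \cdot \mathrm{CONCAT}\left(\mathbf{h}^{k-1}_v, \mathbf{h}^k_{\mathcal{N}(v)}\right)\right) \\
&\quad \textbf{end} \\
&\quad \mathbf{h}^k_v \leftarrow \mathbf{h}^k_v / \lVert \mathbf{h}^k_v \rVert_2, \ \forall v \in \mathcal{V} \\
&\textbf{end} \\
&\mathbf{z}_v \leftarrow \mathbf{h}^K_v, \ \forall v \in \mathcal{V}
\end{aligned}
$$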
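The loop above can be sketched in plain Python as follows. This is a minimal, non-authoritative sketch under several assumptions: ReLU stands in for σ, the mean aggregator stands in for AGGREGATE, the weights are random, and the neighborhood function simply returns the full adjacency list (no sampling). All helper names are hypothetical.

```python
import math
import random


def matvec(W, x):
    """Dense matrix-vector product, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(x):
    return [max(0.0, xi) for xi in x]


def l2_normalize(x):
    norm = math.sqrt(sum(xi * xi for xi in x))
    return [xi / norm for xi in x] if norm > 0 else list(x)


def mean_aggregate(vectors):
    """Elementwise mean; assumes a non-empty neighbor list."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def graphsage_forward(features, neighbors, weights, aggregate=mean_aggregate, sigma=relu):
    """Full-graph embedding generation following the steps of the algorithm.

    features:  dict node -> list[float], the inputs x_v
    neighbors: callable (v, k) -> non-empty list of neighbor ids, i.e. N(v) at depth k
    weights:   list of K matrices; W^k has 2 * d_{k-1} columns because of CONCAT
    """
    h = {v: list(x) for v, x in features.items()}              # h^0_v <- x_v
    for k, W in enumerate(weights, start=1):                    # for k = 1..K
        h_new = {}
        for v in h:                                             # for v in V
            h_nv = aggregate([h[u] for u in neighbors(v, k)])   # h^k_N(v)
            h_new[v] = sigma(matvec(W, h[v] + h_nv))            # list "+" acts as CONCAT
        h = {v: l2_normalize(hv) for v, hv in h_new.items()}    # L2-normalize every h^k_v
    return h                                                    # z_v <- h^K_v


# Usage on a hypothetical 3-node graph with made-up sizes d_0 = 2, d_1 = d_2 = 3.
random.seed(0)
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.5, 0.5]}
adj = {0: [1, 2], 1: [0], 2: [0, 1]}
dims = [2, 3, 3]
weights = [
    [[random.uniform(0.0, 1.0) for _ in range(2 * dims[k - 1])] for _ in range(dims[k])]
    for k in range(1, len(dims))
]
z = graphsage_forward(features, lambda v, k: adj[v], weights)
```

Passing `neighbors` as a callable of `(v, k)` leaves room for a sampler that draws a different neighborhood at each depth.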
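10. ### Neighborhood Function

$\mathcal{N}(v)$ is defined as a fixed-size, uniform draw from the set $\{u \in \mathcal{V} : (u, v) \in \mathcal{E}\}$, and a different uniform sample is drawn at each iteration $k$. With this sampling, the per-batch space and time complexity is fixed at $O\left(\prod_{i=1}^{K} S_i\right)$, where $S_i$ is the user-specified sample size at depth $i$. Without sampling, the memory and runtime of a single batch are unpredictable and $O(|\mathcal{V}|)$ in the worst case. In practice, the authors found that $K = 2$ with $S_1 \cdot S_2 \le 500$ achieves high performance.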
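A minimal sketch of fixed-size neighbor sampling, assuming sampling with replacement (so every draw has exactly $S_i$ entries even for low-degree nodes) and a hypothetical 100-node ring graph. The sample sizes 25 and 10 are example values satisfying $S_1 \cdot S_2 \le 500$; the helper names are hypothetical. The point is that the number of sampled nodes per target depends only on the $S_i$, not on $|\mathcal{V}|$.

```python
import math
import random


def sample_neighbors(adj, v, size, rng):
    """Fixed-size uniform draw from {u : (u, v) in E}.

    Sampling WITH replacement is an assumption of this sketch; it guarantees
    exactly `size` samples even when deg(v) < size.
    """
    return rng.choices(adj[v], k=size) if adj[v] else []


def sample_layers(adj, target, sample_sizes, rng):
    """Expand a target node hop by hop, drawing a fresh sample for every hop.

    Layer 0 is [target]; layer i holds sample_sizes[i-1] draws per node of
    layer i-1 (this hop-to-S_i ordering is a choice made for the sketch).
    """
    layers = [[target]]
    for size in sample_sizes:
        layers.append([u for w in layers[-1] for u in sample_neighbors(adj, w, size, rng)])
    return layers


# Hypothetical ring graph with 100 nodes (every node has exactly 2 neighbors).
n = 100
adj = {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}
sizes = [25, 10]  # example values with S_1 * S_2 <= 500
layers = sample_layers(adj, 0, sizes, random.Random(42))
print([len(layer) for layer in layers])  # [1, 25, 250]
print(math.prod(sizes))                  # 250, independent of |V|
```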
11. ### Aggregator

Because a node's neighbors have no natural ordering, the aggregator must operate on arbitrarily ordered neighborhood feature sets: it should be symmetric (invariant to permutations of its inputs) while remaining trainable and having high representational capacity. The paper proposes three aggregators:

- Mean aggregator
- LSTM aggregator
- Pooling aggregator
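The symmetry requirement can be checked directly by feeding every ordering of a neighbor set to an aggregator. This sketch, with hypothetical feature vectors, shows that elementwise mean and max are permutation invariant, while a toy order-dependent recurrence is not. The recurrence is a hypothetical stand-in for a sequential model, not an LSTM.

```python
import itertools
import math


def mean_agg(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def max_agg(vectors):
    return [max(col) for col in zip(*vectors)]


def sequential_agg(vectors, decay=0.5):
    """Toy order-dependent reducer h_t = decay * h_{t-1} + x_t.

    A hypothetical stand-in for a recurrent model such as an LSTM (it is NOT
    an LSTM): later inputs get larger weights, so the result depends on order.
    """
    h = [0.0] * len(vectors[0])
    for x in vectors:
        h = [decay * hi + xi for hi, xi in zip(h, x)]
    return h


def is_symmetric(agg, vectors):
    """True if agg returns the same vector for every ordering of `vectors`."""
    ref = agg(vectors)
    return all(
        all(math.isclose(a, b) for a, b in zip(ref, agg(list(perm))))
        for perm in itertools.permutations(vectors)
    )


# Hypothetical neighbor feature vectors.
neighbors = [[1.0, 2.0], [3.0, 0.0], [0.0, 5.0]]
for name, agg in [("mean", mean_agg), ("max", max_agg), ("sequential", sequential_agg)]:
    print(name, is_symmetric(agg, neighbors))
# mean True
# max True
# sequential False
```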
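12. ### Mean Aggregator

The mean aggregator simply takes the elementwise mean of the neighbor vectors $\{\mathbf{h}^{k-1}_u, \forall u \in \mathcal{N}(v)\}$. It is nearly equivalent to the convolutional propagation rule used in the transductive GCN (Kipf+). An inductive variant of the GCN can be derived by replacing the aggregation and concatenation steps of the algorithm with

$$
\mathbf{h}^k_v \leftarrow \sigma\left(\mathbf{W} \cdot \mathrm{MEAN}\left(\{\mathbf{h}^{k-1}_v\} \cup \{\mathbf{h}^{k-1}_u, \forall u \in \mathcal{N}(v)\}\right)\right)
$$

Unlike the other aggregators, this convolutional variant does not concatenate $\mathbf{h}^{k-1}_v$ with the aggregated neighborhood vector; that concatenation can be viewed as a simple form of skip connection.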
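The difference between the two update rules can be sketched as follows, assuming ReLU for σ and hand-picked, hypothetical weights and 2-d vectors. Swapping which vector plays the role of "self" changes the GraphSAGE-mean output, because $\mathbf{h}^{k-1}_v$ has its own path through CONCAT, but leaves the GCN variant unchanged, because the node is averaged in as just another set member.

```python
import math


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(x):
    return [max(0.0, xi) for xi in x]


def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def sage_mean_update(W, h_self, h_neighbors):
    """GraphSAGE-mean: sigma(W . CONCAT(h_v, MEAN(h_u))); W has 2*d columns."""
    return relu(matvec(W, h_self + mean(h_neighbors)))


def gcn_variant_update(W, h_self, h_neighbors):
    """Inductive GCN variant: sigma(W . MEAN({h_v} U {h_u})); W has d columns.

    The node's own vector is averaged in as just another set member, so there is
    no separate (skip-connection) path for h_v.
    """
    return relu(matvec(W, mean([h_self] + h_neighbors)))


# Hypothetical 2-d vectors and hand-picked weights (sigma = ReLU is an assumption).
a, b, c = [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]
W_sage = [[1.0, 0.0, 0.0, 0.0]]  # reads only the first coordinate of h_self
W_gcn = [[1.0, 2.0]]

# Swap the roles of "self" (a vs. b) while keeping the same multiset of vectors.
print(sage_mean_update(W_sage, a, [b, c]), sage_mean_update(W_sage, b, [a, c]))  # [1.0] [0.0]
print(gcn_variant_update(W_gcn, a, [b, c]), gcn_variant_update(W_gcn, b, [a, c]))
```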
13. ### LSTM Aggregator

The LSTM aggregator has the advantage of greater expressive capability. However, it is NOT permutation invariant, because an LSTM is not inherently symmetric: it is designed for SEQUENTIAL data, NOT for unordered sets. The paper adapts it to unordered sets by applying the LSTM to a random permutation of the node's neighbors.
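14. ### Pooling Aggregator

$$
\mathrm{AGGREGATE}^{\mathrm{pool}}_k = \max\left(\{\sigma(\mathbf{W}_{\mathrm{pool}} \mathbf{h}^{k-1}_{u_i} + \mathbf{b}), \forall u_i \in \mathcal{N}(v)\}\right)
$$

Here $\max$ denotes the elementwise max operator. The pooling aggregator is both symmetric and trainable. Each neighbor's vector is fed independently through a fully connected layer; this MLP can be thought of as a set of functions that compute features for each of the node representations in the neighbor set. By applying max-pooling to each computed feature, the model effectively captures different aspects of the neighborhood set.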
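A minimal sketch of the pooling aggregator, assuming a single fully connected layer with ReLU as σ for the per-neighbor transformation. The identity $\mathbf{W}_{\mathrm{pool}}$, zero bias, and neighbor vectors are hypothetical values chosen so the result is easy to verify by hand; reversing the neighbor order gives the same output, as expected from a symmetric aggregator.

```python
def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(x):
    return [max(0.0, xi) for xi in x]


def pooling_aggregate(neighbor_vectors, W_pool, b, sigma=relu):
    """max({sigma(W_pool h_u + b) : u in N(v)}), with max taken elementwise.

    A single fully connected layer with ReLU is assumed for the per-neighbor MLP.
    """
    transformed = [
        sigma([wx + bi for wx, bi in zip(matvec(W_pool, h), b)]) for h in neighbor_vectors
    ]
    return [max(col) for col in zip(*transformed)]


# Hypothetical neighbor vectors and an identity W_pool / zero bias for readability.
neighbors = [[1.0, -2.0], [-1.0, 3.0], [0.5, 0.5]]
W_pool = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
print(pooling_aggregate(neighbors, W_pool, b))        # [1.0, 3.0]
print(pooling_aggregate(neighbors[::-1], W_pool, b))  # [1.0, 3.0] (order does not matter)
```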
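15. ### Loss Function - Unsupervised

Graph-based loss function:

$$
J_{\mathcal{G}}(\mathbf{z}_u) = -\log\left(\sigma(\mathbf{z}_u^\top \mathbf{z}_v)\right) - Q \cdot \mathbb{E}_{v_n \sim P_n(v)} \log\left(\sigma(-\mathbf{z}_u^\top \mathbf{z}_{v_n})\right)
$$

where $v$ is a node that co-occurs near $u$ on a fixed-length random walk, $\sigma$ is the sigmoid function, $P_n$ is a negative sampling distribution, and $Q$ is the number of negative samples. This loss encourages nearby nodes to have similar representations while enforcing that the representations of disparate nodes are highly distinct.

Reference: Tomas Mikolov+, Distributed Representations of Words and Phrases and their Compositionality, 2013. https://arxiv.org/abs/1310.4546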
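A minimal sketch of the loss for a single positive pair, assuming the expectation is estimated from $Q$ sampled negatives (so $Q \cdot \mathbb{E}[\cdot]$ becomes a sum over the samples) and using a numerically stable log-sigmoid. The embeddings below are hypothetical. A closer positive pair gives a lower loss, and all-zero embeddings give $(1 + Q)\log 2$ since every $\log \sigma(0) = -\log 2$.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigma(x)) = -log(1 + exp(-x))."""
    return -(max(-x, 0.0) + math.log1p(math.exp(-abs(x))))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def unsupervised_loss(z_u, z_v, z_negatives):
    """J_G(z_u) with the expectation replaced by a sample of Q negatives.

    Q * E[log sigma(-z_u . z_vn)] is estimated by summing over the Q sampled
    negatives (Q times their mean), as in word2vec-style negative sampling.
    """
    positive = -log_sigmoid(dot(z_u, z_v))
    negative = -sum(log_sigmoid(-dot(z_u, z_n)) for z_n in z_negatives)
    return positive + negative


z_u = [0.6, 0.8]
z_near = [0.6, 0.8]    # hypothetical node co-occurring with u on a random walk
z_far = [-0.6, -0.8]   # hypothetical distant node used as a (bad) positive
negatives = [[0.8, -0.6], [-0.6, -0.8]]  # Q = 2 hypothetical negative samples
print(unsupervised_loss(z_u, z_near, negatives) < unsupervised_loss(z_u, z_far, negatives))  # True
print(unsupervised_loss([0.0, 0.0], z_near, negatives))  # 3 * log(2), about 2.079
```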

17. ### Datasets - Tasks

- Web of Science (WoS) citation dataset: predict paper subject categories.
- Reddit dataset: predict which community different Reddit posts belong to.

Both tasks classify nodes in evolving information graphs, a setting that is especially relevant to high-throughput production systems, which constantly encounter unseen data.
18. ### Datasets - Tasks

- PPI dataset: classify protein roles, in terms of their cellular functions from gene ontology, across various protein-protein interaction (PPI) graphs.

This task requires generalizing across graphs, which in turn requires learning about node roles rather than community structure.
19. ### Result

GraphSAGE outperforms all the baselines by a significant margin, and the trainable neural network aggregators provide significant gains compared to the GCN approach.
20. ### Result

For GraphSAGE, $K = 2$ provided a consistent accuracy boost of around 10-15%, on average, compared to $K = 1$. However, increasing $K$ beyond 2 gave only marginal returns in performance (0-5%) while increasing the runtime, by an amount that depends on the neighborhood sample size.
21. ### Conclusion

GraphSAGE allows embeddings to be efficiently generated for unseen nodes. By sampling node neighborhoods, it effectively trades off performance and runtime.