Journal Club GraphSAGE

Summary slides for the lab journal club.

Tomoya Matsumoto

February 06, 2019

Transcript

  1. 1.

    Inductive Representation Learning on Large Graphs (William L. Hamilton+)

    Tomoya Matsumoto, M1, BioInformation Engineering Lab. Feb. 6, 2019
  2. 3.

    Overview

    • GraphSAGE extends GCNs to the task of inductive unsupervised learning.
    • The network generalizes the GCN approach to use trainable aggregation functions (beyond simple convolutions).
    • It can classify unseen nodes in evolving information graphs and generalize to completely unseen graphs.
  3. 5.
  4. 6.

    Node Classification

    • Transductive (semi-supervised): predict labels for unlabeled nodes. Training and test data come from the same graph, so the test nodes are visible during training.
    • Inductive: predict labels for unseen nodes. The test nodes (or graphs) are not available during training.
  5. 9.

    Algorithm

    Input: graph $G(V, E)$
    Input: input features $\{x_v, \forall v \in V\}$
    Input: depth $K$
    Input: weight matrices $W^k, \forall k \in \{1, \dots, K\}$
    Input: neighborhood function $N : v \to 2^V$
    Output: vector representations $z_v$ for all $v \in V$

    $h_v^0 \leftarrow x_v, \forall v \in V$
    for $k = 1 \dots K$ do
        for $v \in V$ do
            $h_{N(v)}^k \leftarrow \mathrm{AGGREGATE}_k(\{h_u^{k-1}, \forall u \in N(v)\})$
            $h_v^k \leftarrow \sigma(W^k \cdot \mathrm{CONCAT}(h_v^{k-1}, h_{N(v)}^k))$
        end
        $h_v^k \leftarrow h_v^k / \|h_v^k\|_2, \forall v \in V$
    end
    $z_v \leftarrow h_v^K, \forall v \in V$
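The loop structure of the algorithm can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' implementation: it assumes a mean aggregator for AGGREGATE, ReLU for σ, full (unsampled) neighborhoods, and a toy path graph; the names `graphsage_forward`, `toy_neighbors`, and `toy_weights` are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def graphsage_forward(features, neighbors, weights):
    """Forward pass with a mean AGGREGATE and ReLU sigma (both assumed here).

    features:  (|V|, d) array, row v is x_v
    neighbors: list of lists, neighbors[v] is N(v), assumed non-empty
    weights:   list of K arrays, W^k with shape (d_out, 2 * d_in)
    """
    h = features.astype(float)                 # h^0_v = x_v
    for W in weights:                          # k = 1 ... K
        h_next = np.empty((h.shape[0], W.shape[0]))
        for v, nbrs in enumerate(neighbors):
            h_nbr = h[nbrs].mean(axis=0)       # h^k_{N(v)}
            h_next[v] = relu(W @ np.concatenate([h[v], h_nbr]))
        # Row-wise L2 normalization; the floor guards against all-zero rows.
        norms = np.linalg.norm(h_next, axis=1, keepdims=True)
        h = h_next / np.maximum(norms, 1e-12)
    return h                                   # z_v = h^K_v

# Toy graph (assumption): a path 0 - 1 - 2 - 3 with 3-dimensional features.
rng = np.random.default_rng(0)
toy_neighbors = [[1], [0, 2], [1, 3], [2]]
toy_features = rng.normal(size=(4, 3))
toy_weights = [rng.normal(size=(5, 6)), rng.normal(size=(2, 10))]  # K = 2
z = graphsage_forward(toy_features, toy_neighbors, toy_weights)
```

Each weight matrix takes an input of width `2 * d_in` because of the CONCAT step, which is why the second layer here is `(2, 10)` after a 5-dimensional first layer.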
  6. 10.

    Neighborhood Function

    $N(v)$ is defined as a fixed-size, uniform draw from the set $\{u \in V : (u, v) \in E\}$, with a different uniform sample drawn at each iteration $k$. With this sampling, the per-batch space and time complexity is fixed at $O(\prod_{i=1}^{K} S_i)$, where $S_i$ is the sample size at depth $i$ (in this case, $K = 2$ and $S_1 \cdot S_2 \le 500$). Without sampling, it is $O(|V|)$ in the worst case.
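A small sketch of fixed-size neighbor sampling shows where the product of the sample sizes comes from. The helper names `sample_neighbors` and `sample_support`, the toy adjacency dict, and the choice to sample with replacement when a node has fewer neighbors than the sample size are all assumptions made for illustration.

```python
import numpy as np

def sample_neighbors(adj, v, size, rng):
    """Fixed-size uniform draw from the neighbors of v.

    Sampling with replacement when deg(v) < size is an assumption made here
    so that the result always has exactly `size` entries.
    """
    nbrs = adj[int(v)]
    return rng.choice(nbrs, size=size, replace=len(nbrs) < size)

def sample_support(adj, batch, sizes, rng):
    """Nodes needed to embed `batch`, expanding one hop per entry of `sizes`."""
    layers = [np.asarray(batch)]
    for s in sizes:
        frontier = [sample_neighbors(adj, v, s, rng) for v in layers[-1]]
        layers.append(np.concatenate(frontier))
    return layers

rng = np.random.default_rng(0)
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}  # toy graph
layers = sample_support(adj, batch=[0], sizes=[3, 2], rng=rng)
print([len(layer) for layer in layers])  # [1, 3, 6]
```

For one target node the outermost layer holds S1 · S2 = 3 · 2 = 6 sampled nodes regardless of the true degrees, which is what keeps the per-batch cost fixed.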
  7. 11.

    Aggregator

    To train and apply the model on arbitrarily ordered sets of neighbor feature vectors, an aggregator function should be symmetric (invariant to the order of its inputs) while still being trainable and having high representational capacity. The paper proposes three aggregators:

    • Mean aggregator
    • LSTM aggregator
    • Pooling aggregator
  8. 12.

    Mean Aggregator

    The mean aggregator simply takes the elementwise mean of the vectors in $\{h_u^{k-1}, \forall u \in N(v)\}$. It is nearly equivalent to the convolutional propagation rule used in the transductive GCN (Kipf+). An inductive variant of the GCN can be derived by replacing the aggregator and concat operations with the following:

    $h_v^k \leftarrow \sigma(W \cdot \mathrm{MEAN}(\{h_v^{k-1}\} \cup \{h_u^{k-1}, \forall u \in N(v)\}))$

    This variant does not perform the concatenation operation, which can be viewed as a skip connection.
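The GCN-style update above can be checked by hand on a tiny example. The sketch below assumes ReLU for σ and uses an identity weight matrix so the result is just the mean; `gcn_mean_update` is a hypothetical helper name.

```python
import numpy as np

def gcn_mean_update(h, v, nbrs, W):
    """h^k_v = sigma(W . MEAN({h_v} U {h_u, u in N(v)})), sigma = ReLU (assumed)."""
    stacked = np.vstack([h[v][None, :], h[nbrs]])  # the node itself plus its neighbors
    return np.maximum(W @ stacked.mean(axis=0), 0.0)

h_prev = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
W_identity = np.eye(2)
h_new = gcn_mean_update(h_prev, v=0, nbrs=[1, 2], W=W_identity)
print(h_new)  # [1. 1.]
```

Note that the node's own vector enters the mean directly instead of being concatenated, so `W` has input width `d` rather than `2 * d`.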
  9. 13.

    LSTM Aggregator

    The LSTM aggregator has the advantage of larger expressive capability. However, it is NOT permutation invariant, because an LSTM is not inherently symmetric: it is designed for SEQUENTIAL data, NOT for unordered sets.
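The order sensitivity can be illustrated with a much simpler recurrent cell. The sketch below is not an LSTM: it substitutes a plain tanh recurrence (`rnn_aggregate`, a hypothetical name) with random weights, which is enough to show that a sequential aggregator generally gives different outputs for different neighbor orderings, while a mean does not.

```python
import numpy as np

def rnn_aggregate(neighbor_feats, W_h, W_x):
    """Plain tanh recurrent cell, a simplified stand-in for an LSTM."""
    state = np.zeros(W_h.shape[0])
    for x in neighbor_feats:                   # consumes neighbors in order
        state = np.tanh(W_h @ state + W_x @ x)
    return state

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 3))                # 4 neighbors, 3 features each
W_h, W_x = rng.normal(size=(5, 5)), rng.normal(size=(5, 3))
reversed_feats = feats[::-1]                   # same set, different order

mean_same = np.allclose(feats.mean(axis=0), reversed_feats.mean(axis=0))
rnn_same = np.allclose(rnn_aggregate(feats, W_h, W_x),
                       rnn_aggregate(reversed_feats, W_h, W_x))
```

Here `mean_same` is true by construction, while `rnn_same` is expected to be false for generic weights. The paper handles this by feeding the LSTM a random permutation of the neighbors.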
  10. 14.

    Pooling Aggregator

    $\mathrm{AGGREGATE}_k = \max(\{\sigma(W_{pool} h_{u_i}^k + b), \forall u_i \in N(v)\})$

    The pooling aggregator is both symmetric and trainable. The MLP can be thought of as a set of functions that compute features for each of the node representations in the neighbor set. By applying the elementwise max-pooling operator to these features, the model effectively captures different aspects of the neighborhood set.
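A minimal sketch of the pooling aggregator, assuming ReLU for σ and a single fully connected layer; `max_pool_aggregate` and the random toy inputs are hypothetical. Shuffling the neighbors leaves the result unchanged because the elementwise max ignores order.

```python
import numpy as np

def max_pool_aggregate(neighbor_feats, W_pool, b):
    """Elementwise max over sigma(W_pool h_u + b), with sigma = ReLU (assumed)."""
    transformed = np.maximum(neighbor_feats @ W_pool.T + b, 0.0)
    return transformed.max(axis=0)

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 3))                # 4 neighbors, 3 features each
W_pool, b = rng.normal(size=(6, 3)), rng.normal(size=6)
pooled = max_pool_aggregate(feats, W_pool, b)
pooled_shuffled = max_pool_aggregate(feats[rng.permutation(4)], W_pool, b)
```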
  11. 15.

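    Loss Function - Unsupervised

    Graph-based loss function:

    $J_G(z_u) = -\log(\sigma(z_u^\top z_v)) - Q \cdot \mathbb{E}_{v_n \sim P_n(v)} \log(\sigma(-z_u^\top z_{v_n}))$

    Here $v$ is a node that co-occurs near $u$ on a fixed-length random walk, $P_n$ is a negative sampling distribution, and $Q$ is the number of negative samples. The loss encourages nearby nodes to have similar representations and enforces that the representations of disparate nodes are highly distinct.

    Reference: Tomas Mikolov+, "Distributed Representations of Words and Phrases and their Compositionality", 2013. https://arxiv.org/abs/1310.4546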
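The graph-based loss can be evaluated numerically for a single pair of nodes. In this sketch the expectation over the negative distribution is approximated by a mean over a fixed array of sampled negatives, and the embeddings are hand-picked toy vectors; `unsupervised_loss` and `log_sigmoid` are hypothetical helper names.

```python
import numpy as np

def log_sigmoid(x):
    # log(sigma(x)) = -log(1 + exp(-x)), computed stably with logaddexp.
    return -np.logaddexp(0.0, -x)

def unsupervised_loss(z_u, z_v, z_neg, Q):
    """J_G(z_u), with the expectation replaced by a mean over sampled negatives."""
    positive = -log_sigmoid(z_u @ z_v)
    negative = -Q * np.mean(log_sigmoid(-(z_neg @ z_u)))
    return positive + negative

z_u = np.array([1.0, 0.0])
z_near = np.array([1.0, 0.0])                  # co-occurring node, aligned with z_u
z_far = np.array([-1.0, 0.0])                  # co-occurring node, opposite to z_u
negatives = np.array([[-1.0, 0.0], [0.0, 1.0]])
loss_near = unsupervised_loss(z_u, z_near, negatives, Q=2)
loss_far = unsupervised_loss(z_u, z_far, negatives, Q=2)
```

With the same negatives in both calls, `loss_near` is smaller than `loss_far`, matching the statement that the loss rewards similar representations for nearby nodes.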
  12. 17.

    Datasets - Tasks

    • WoS Citation dataset: predict paper subject categories.
    • Reddit dataset: predict which community different Reddit posts belong to.

    Both tasks classify nodes in evolving information graphs. This setting is especially relevant to high-throughput production systems, which constantly encounter unseen data.
  13. 18.

    Datasets - Tasks

    • PPI dataset: classify protein cellular functions from gene ontology in various PPI graphs.

    This task requires generalizing across graphs, which means learning about node roles rather than community structure.
  14. 19.

    Result

    GraphSAGE outperforms all the baselines by a significant margin, and the trainable, neural network aggregators provide significant gains compared to the GCN.
  15. 20.

    Result

    For GraphSAGE, K = 2 provided a consistent boost in accuracy of around 10-15% on average compared to K = 1. However, K > 2 gave only marginal returns in performance (0-5%) while increasing the runtime, by an amount that depends on the neighborhood sample size.
  16. 21.

    Conclusion

    GraphSAGE allows embeddings to be efficiently generated for unseen nodes. It effectively trades off performance and runtime by sampling node neighborhoods.