
# Training and leveraging graph embeddings at scale

Presented at F8 meetup, Pune

May 18, 2019

## Transcript

1. Training and leveraging graph embeddings at scale
Jalem Raj Rohit

Data Science at GEP

2. Embeddings: The general idea

• Consider the following sentences, each encoded as a list of integer word indices (see the sketch below):

“The development was done in Javascript” → [0, 1, 2, 3, 4, 5]
“The development was done in Python” → [0, 1, 2, 3, 4, 6]
“The development of the Python population in the Amazon rainforest” → [0, 1, 7, 0, 6, 8, 4, 0, 9, 10]

• Limitations:
• Not scalable

• Cannot model context
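
A minimal sketch of this integer encoding, assuming each new word simply gets the next free vocabulary index (the slides do not spell out the exact scheme):

```python
# Minimal sketch: encode each sentence as a list of vocabulary indices.
# Assumption: indices are assigned in order of first occurrence.
sentences = [
    "The development was done in Javascript",
    "The development was done in Python",
    "The development of the Python population in the Amazon rainforest",
]

vocab = {}
encoded = []
for sentence in sentences:
    ids = []
    for word in sentence.lower().split():
        if word not in vocab:
            vocab[word] = len(vocab)  # next free index for an unseen word
        ids.append(vocab[word])
    encoded.append(ids)

print(encoded)
# [[0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 6], [0, 1, 7, 0, 6, 8, 4, 0, 9, 10]]
```

The index vector grows with the vocabulary, and equal indices say nothing about meaning — which is exactly the “not scalable / cannot model context” limitation above.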

3. Word2Vec

• Word2Vec is a deep learning model that learns better, dense word representations
• The lazy fox jumped over the moon

• Focus word: fox

• Context words: lazy, jumped
• The representations of the words are simple dense vectors
Example: apple = [1.286, -3.467, 0.1375 …. 1.352]
• These vectors also capture linear relationships between words:
Example: King - Man + Woman = Queen
• Helps in various NLP tasks and operations like (see the sketch below):

• classification

• clustering
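
A minimal sketch of training such word vectors, assuming gensim's Word2Vec (the library choice is an assumption; the slides describe the model, not a specific tool):

```python
# Minimal sketch: train skip-gram word vectors with gensim (assumed library).
from gensim.models import Word2Vec

corpus = [
    ["the", "lazy", "fox", "jumped", "over", "the", "moon"],
    ["the", "development", "was", "done", "in", "python"],
]

# sg=1 selects the skip-gram model; window=2 means two context words on each side
model = Word2Vec(corpus, vector_size=100, window=2, min_count=1, sg=1)

vec = model.wv["fox"]  # a dense vector of 100 floats

# Analogy-style queries (they need a large training corpus to work well):
# model.wv.most_similar(positive=["king", "woman"], negative=["man"])
```

The resulting vectors can be fed directly into downstream classification or clustering pipelines.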

4. Node2Vec

• Word2Vec, but for graphs
• Each node is represented as a dense vector

• Training process (see the sketch below):
• Positive examples: edges present in the graph, e.g. A -> B, A -> C

• Negative examples: node pairs without an edge, e.g. B -> C
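
A minimal sketch of this idea, assuming unbiased random walks fed to gensim's Word2Vec (real Node2Vec biases the walks with return and in-out parameters):

```python
# Minimal sketch: node embeddings from random walks, Word2Vec-style.
import random
import networkx as nx
from gensim.models import Word2Vec

G = nx.karate_club_graph()  # small toy graph

def random_walk(graph, start, length=10):
    walk = [start]
    for _ in range(length - 1):
        neighbors = list(graph.neighbors(walk[-1]))
        if not neighbors:
            break
        walk.append(random.choice(neighbors))
    return [str(n) for n in walk]  # Word2Vec expects string tokens

# Walks play the role of sentences: nodes co-occurring on a walk are positives,
# while randomly sampled nodes act as negatives (negative sampling).
walks = [random_walk(G, node) for node in G.nodes() for _ in range(20)]
model = Word2Vec(walks, vector_size=64, window=5, min_count=1, sg=1, negative=5)

node_vec = model.wv["0"]  # dense embedding for node 0
```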

5. Node2Vec

• Objective: maximize the separation between the scores of the positive and negative examples (for example with a margin-based ranking loss; see the sketch below)
• A large enough graph’s embeddings look similar to this:
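
A minimal sketch of such an objective in PyTorch, assuming a dot-product score and a margin-based ranking loss (the slides do not name the exact loss):

```python
# Minimal sketch: push scores of positive edges above scores of negative edges.
import torch
import torch.nn.functional as F

emb = torch.nn.Embedding(num_embeddings=1000, embedding_dim=64)

def score(src, dst):
    # dot-product score between the two node embeddings (one common choice)
    return (emb(src) * emb(dst)).sum(dim=-1)

pos_src, pos_dst = torch.tensor([0, 0]), torch.tensor([1, 2])  # A -> B, A -> C
neg_src, neg_dst = torch.tensor([1]), torch.tensor([2])        # B -> C

# Margin ranking loss: zero once positives score at least 1.0 above negatives
loss = F.relu(1.0 - score(pos_src, pos_dst).mean() + score(neg_src, neg_dst).mean())
loss.backward()  # gradients raise positive scores and lower negative scores
```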

6. Practical Applications
• Recommender Systems
• Modelling and understanding social graphs

7. PyTorch-BigGraph
• A distributed system for learning large graph embeddings

• Challenges in training embeddings for large graphs:
• Memory and computational constraints

• Embedding a 2-billion-node graph with 128 float parameters per node would require about 1 TB of parameters (2 × 10⁹ nodes × 128 floats × 4 bytes ≈ 1 TB)

• Impossible to hold this in the memory of a single machine, let alone compute the embeddings there; PyTorch-BigGraph addresses this with partitioned, distributed training (see the config sketch below)
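
A rough sketch of how a training run is described to PyTorch-BigGraph through a config file; the parameter names follow the general shape of a PBG config, but the values and paths below are illustrative assumptions and should be checked against the PBG documentation:

```python
# Illustrative PyTorch-BigGraph config sketch (values and paths are hypothetical).
def get_torchbiggraph_config():
    return dict(
        # I/O paths
        entity_path="data/my_graph",
        edge_paths=["data/my_graph/edges_partitioned"],
        checkpoint_path="model/my_graph",
        # Graph structure: one node type, split into partitions
        entities={"all": {"num_partitions": 16}},
        relations=[
            {"name": "follows", "lhs": "all", "rhs": "all", "operator": "none"},
        ],
        # Embedding model
        dimension=128,
        comparator="dot",
        # Training
        num_epochs=10,
        lr=0.1,
    )

# Training is then launched with PBG's CLI, e.g.:
#   torchbiggraph_train my_config.py
```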

8. Training
• Nodes and edges are partitioned so that they fit into the memory of the servers in a distributed system
• Nodes are divided into P partitions (chosen so that 2 partitions fit into memory at once)

• Edges are divided into P^2 buckets: bucket (i, j) holds the edges whose source is in partition i and destination in partition j

• Training bucket (i, j) requires only the embeddings of partitions i and j to be in memory (see the sketch below)
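
A minimal plain-Python sketch of the partitioning scheme described above (PyTorch-BigGraph's actual bucketing and scheduling are more involved):

```python
# Minimal sketch: P node partitions and P^2 edge buckets.
P = 4  # number of node partitions

def node_partition(node_id: int) -> int:
    return node_id % P  # hash each node into one of P partitions

def edge_bucket(src: int, dst: int) -> tuple:
    # bucket (i, j): source node in partition i, destination node in partition j
    return (node_partition(src), node_partition(dst))

edges = [(0, 5), (1, 2), (7, 3), (4, 8)]
buckets = {}
for src, dst in edges:
    buckets.setdefault(edge_bucket(src, dst), []).append((src, dst))

# Training bucket (i, j) only needs the embeddings of partitions i and j in memory.
for (i, j), bucket_edges in sorted(buckets.items()):
    print(f"bucket ({i}, {j}): {bucket_edges}")
```

Because any single bucket only needs two partitions in memory, the full set of parameters (the 1 TB from the earlier slide) never has to be resident at once.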

9. Training