Training and leveraging graph embeddings at scale

Jalem Raj Rohit
Data Science at GEP

Embeddings: The general idea
• Consider the following sentences:
  • “The development was done in JavaScript”
  • “The development was done in Python”
  • “The development of the Python population in the Amazon rainforest”
• Task: Vector Representation (each word is replaced by its vocabulary index)
  • [0, 1, 2, 3, 4, 5]
  • [0, 1, 2, 3, 4, 6]
  • [0, 1, 7, 0, 6, 8, 4, 0, 9, 10]
• Limitations
  • Not scalable
  • Cannot model context: “Python” gets index 6 whether it means the language or the snake
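The index vectors above can be reproduced with a few lines of Python. This is a minimal sketch, assuming lowercase whitespace tokenisation and first-seen index assignment; the function name `build_index_encoder` is made up for illustration.

```python
def build_index_encoder(sentences):
    """Map each word to the index at which it was first seen."""
    vocab = {}
    encoded = []
    for sentence in sentences:
        ids = []
        for word in sentence.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)  # next free index
            ids.append(vocab[word])
        encoded.append(ids)
    return vocab, encoded


sentences = [
    "The development was done in Javascript",
    "The development was done in Python",
    "The development of the Python population in the Amazon rainforest",
]
vocab, encoded = build_index_encoder(sentences)
for ids in encoded:
    print(ids)
```

The vocabulary grows by one entry for every new word, and the two senses of "Python" share one index, which is exactly the pair of limitations listed on the slide.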

Word2Vec
• Word2Vec is a shallow neural network model that learns dense word representations
• Example sentence: “The lazy fox jumped over the moon”
  • Focus word: fox
  • Context words: lazy, jumped
• The representations of the words are simple dense vectors
  • Example: apple = [1.286, -3.467, 0.1375, …, 1.352]
• These vectors also capture linear relationships between words
  • Example: King - Man + Woman ≈ Queen
• Helps in various NLP tasks and operations like:
  • classification
  • clustering
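The focus/context idea can be made concrete by listing the (focus, context) pairs a skip-gram style model would train on. This is a rough sketch, assuming a context window of one word on each side (which matches the fox example); `skipgram_pairs` is an illustrative helper, not part of any Word2Vec library.

```python
def skipgram_pairs(tokens, window=1):
    """Return (focus, context) pairs for every position in the sentence."""
    pairs = []
    for i, focus in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((focus, tokens[j]))
    return pairs


tokens = "the lazy fox jumped over the moon".split()
pairs = skipgram_pairs(tokens, window=1)
fox_context = [context for focus, context in pairs if focus == "fox"]
print(fox_context)
```

A larger `window` yields more context words per focus word, at the cost of more training pairs.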

Node2Vec
• Word2Vec, but for graphs
• Each node is represented as a dense vector
• Training Process:
  • Positive examples (edges present in the graph): A -> B, A -> C
  • Negative examples (edges absent from the graph): B -> C
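For the three-node example, the positive and negative examples can be derived from the edge list. The sketch below assumes the graph is undirected and small enough to enumerate every missing edge; real systems usually sample negatives at random instead of listing them all.

```python
from itertools import combinations

# Observed edges of the toy graph: these are the positive examples.
positive_edges = [("A", "B"), ("A", "C")]

# Collect every node that appears in an edge.
nodes = sorted({node for edge in positive_edges for node in edge})

# Treat edges as unordered pairs so that (B, A) counts as (A, B).
observed = {frozenset(edge) for edge in positive_edges}

# Every node pair without an observed edge is a negative example.
negative_edges = [
    pair for pair in combinations(nodes, 2) if frozenset(pair) not in observed
]
print(negative_edges)
```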

Node2Vec
• Objective: Minimize a loss that scores positive examples higher than negative examples, i.e. maximize the margin between them
• A large enough graph’s embeddings look similar to this: [figure]
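One common way to express this objective is a margin ranking loss over edge scores. The sketch below is an illustration under stated assumptions, not necessarily the loss used in the talk: it scores an edge by the dot product of its two node vectors, uses a margin of 1.0, and the two embedding tables are made-up toy values.

```python
def dot(u, v):
    """Score an edge as the dot product of its two node embeddings."""
    return sum(a * b for a, b in zip(u, v))


def margin_ranking_loss(pos_score, neg_score, margin=1.0):
    """Zero once the positive score beats the negative score by `margin`."""
    return max(0.0, margin - pos_score + neg_score)


def total_loss(emb, positives, negatives, margin=1.0):
    """Sum the ranking loss over every (positive, negative) edge pairing."""
    loss = 0.0
    for p_src, p_dst in positives:
        for n_src, n_dst in negatives:
            pos = dot(emb[p_src], emb[p_dst])
            neg = dot(emb[n_src], emb[n_dst])
            loss += margin_ranking_loss(pos, neg, margin)
    return loss


positives = [("A", "B"), ("A", "C")]
negatives = [("B", "C")]

# Toy embeddings: A is close to B and C, while B and C are far apart.
good = {"A": [1.0, 1.0], "B": [2.0, 0.0], "C": [0.0, 2.0]}
# Toy embeddings: B and C are close to each other but not to A.
bad = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [0.0, 1.0]}

good_loss = total_loss(good, positives, negatives)
bad_loss = total_loss(bad, positives, negatives)
print(good_loss, bad_loss)
```

Training lowers this loss by gradient descent on the embedding values, which pulls connected nodes together and pushes unconnected nodes apart.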

Practical Applications
• Recommender Systems
• Modelling and understanding social graphs

PyTorch-BigGraph
• A distributed system for learning large graph embeddings
• Challenges with training large graphs for embeddings:
  • Memory and computational constraints
  • Embedding a 2-billion node graph with 128 float parameters per node would require 1 TB of parameters
  • A table of this size does not fit in the memory of a single machine, which rules out training it in one piece
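The 1 TB figure follows from simple arithmetic. The check below assumes 32-bit floats (4 bytes per parameter) and decimal terabytes (10^12 bytes); neither assumption is stated on the slide.

```python
def embedding_table_bytes(num_nodes, dim, bytes_per_param=4):
    """Size of a dense embedding table: one `dim`-sized vector per node."""
    return num_nodes * dim * bytes_per_param


total_bytes = embedding_table_bytes(num_nodes=2_000_000_000, dim=128)
total_tb = total_bytes / 1e12  # decimal terabytes
print(f"{total_bytes:,} bytes = {total_tb:.3f} TB")
```

This counts only the embedding parameters. Optimizer state and the edge list itself would add to the total.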

Training
• Nodes and edges are partitioned to fit into the memory of the servers in a distributed system
• Nodes are divided into P partitions (sized so that 2 partitions can fit into memory at once)
• Edges are divided into P^2 buckets, based on the partitions of their source and destination nodes
• Training bucket (i, j) requires the embeddings of partitions i and j to be in memory
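The bucketing scheme can be sketched in plain Python. This is not PyTorch-BigGraph's own code: it assumes integer node ids and a hypothetical `node_id % P` rule for assigning nodes to partitions, chosen only to keep the example short.

```python
from collections import defaultdict


def partition_of(node_id, num_partitions):
    """Hypothetical partition rule, used here only for illustration."""
    return node_id % num_partitions


def bucket_edges(edges, num_partitions):
    """Group edges by (source partition, destination partition)."""
    buckets = defaultdict(list)
    for src, dst in edges:
        key = (partition_of(src, num_partitions), partition_of(dst, num_partitions))
        buckets[key].append((src, dst))
    return buckets


edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (1, 3)]
P = 2
buckets = bucket_edges(edges, P)

# Training visits one bucket at a time; only two partitions are needed at once.
for (i, j), bucket in sorted(buckets.items()):
    print(f"bucket ({i}, {j}): load partitions {i} and {j}, train on {bucket}")
```

With P = 2 there are at most P^2 = 4 buckets, and each bucket touches at most two node partitions, which is what keeps the working set within memory.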

Training
• No access to a distributed system cluster? No problem
• BigGraph leverages PyTorch to distribute training across the cores of your local machine
• This uses multiprocessing to run the training workload in parallel

Embedding visualisation [figure]

Sources
• Code: https://github.com/facebookresearch/PyTorch-BigGraph
• Paper: https://research.fb.com/publications/pytorch-biggraph-a-large-scale-graph-embedding-system/
• Documentation: https://torchbiggraph.readthedocs.io/en/latest/
