Training and leveraging graph embeddings at scale

Presented at F8 meetup, Pune

18 May, 2019

Jalem Raj Rohit


  1. Embeddings: The general idea
 • Consider the following sentences:
    “The development was done in Javascript”
    “The development was done in Python”
    “The development of the Python population in the Amazon rainforest”
 • Task: Vector representation (each distinct word is mapped to an integer ID)
    [0, 1, 2, 3, 4, 5]
    [0, 1, 2, 3, 4, 6]
    [0, 1, 7, 0, 6, 8, 4, 0, 9, 10]
 • Limitations
    • Not scalable
    • Cannot model context: “Python” gets the same ID (6) whether it means the language or the snake
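The integer vectors on this slide can be reproduced with a few lines of Python. This is a minimal sketch, assuming words are lowercased, split on whitespace, and numbered in order of first appearance; the function name `encode` is made up for illustration.

```python
def encode(sentences):
    """Map each distinct word to an integer ID in order of first appearance."""
    vocab = {}
    encoded = []
    for sentence in sentences:
        ids = []
        for word in sentence.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)  # next unused ID
            ids.append(vocab[word])
        encoded.append(ids)
    return vocab, encoded


sentences = [
    "The development was done in Javascript",
    "The development was done in Python",
    "The development of the Python population in the Amazon rainforest",
]
vocab, encoded = encode(sentences)
for ids in encoded:
    print(ids)
```

Both uses of "Python" map to the same ID, which is the context limitation the slide points out.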
  2. Word2Vec
 • Word2Vec is a shallow neural network model that learns dense word representations
 • Example sentence: “The lazy fox jumped over the moon”
    • Focus word: fox
    • Context words: lazy, jumped
 • The representations of the words are simple dense vectors
    Example: apple = [1.286, -3.467, 0.1375, …, 1.352]
 • These vectors also capture linear relationships between words:
    Example: King - Man + Woman ≈ Queen
 • Helps in various NLP tasks and operations such as:
    • classification
    • clustering
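The focus word and context words from this slide can be extracted with a sliding window. The sketch below assumes a window of one word on each side, which matches the slide's example; `context_pairs` is a hypothetical helper, not part of any Word2Vec library.

```python
def context_pairs(tokens, window=1):
    """Return (focus, context) pairs for every token within `window` positions."""
    pairs = []
    for i, focus in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((focus, tokens[j]))
    return pairs


tokens = "the lazy fox jumped over the moon".split()
pairs = context_pairs(tokens, window=1)
fox_context = [context for focus, context in pairs if focus == "fox"]
print(fox_context)
```

Pairs like these are the training examples from which a skip-gram style model learns its dense vectors.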
  3. Node2Vec
 • Word2Vec, but for graphs
 • Each node is represented as a dense vector
 • Training process:
    • Positive examples (edges present in the graph): A -> B, A -> C
    • Negative examples (edges absent from the graph): B -> C
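The positive and negative examples on this slide can be derived from an edge list. This is a simplified sketch that treats the graph as undirected and enumerates every unconnected pair; real systems usually sample negatives at random rather than enumerating them.

```python
from itertools import combinations

# Positive examples: the edges that exist in the graph.
edges = [("A", "B"), ("A", "C")]

nodes = sorted({node for edge in edges for node in edge})
connected = {frozenset(edge) for edge in edges}

# Negative examples: node pairs with no edge between them.
negatives = [
    (u, v)
    for u, v in combinations(nodes, 2)
    if frozenset((u, v)) not in connected
]
print(negatives)
```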
  4. Node2Vec
 • Objective: Maximize the margin between the scores of the positive and negative examples
 • A large enough graph’s embeddings look similar to this: [figure on the original slide]
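The training objective on this slide can be illustrated with a margin ranking loss, one common way to push positive scores above negative ones. The slide does not name a specific loss, so the loss function, the dot-product score and the toy embeddings below are all assumptions made for illustration.

```python
def dot(u, v):
    """Dot-product score between two embedding vectors."""
    return sum(a * b for a, b in zip(u, v))


def margin_ranking_loss(pos_score, neg_score, margin=1.0):
    """Zero once the positive score beats the negative score by `margin`."""
    return max(0.0, margin - pos_score + neg_score)


# Toy 2-dimensional embeddings (made-up values).
emb = {"A": [1.0, 0.0], "B": [0.9, 0.1], "C": [0.8, -0.1]}

pos_score = dot(emb["A"], emb["B"])  # positive example A -> B
neg_score = dot(emb["B"], emb["C"])  # negative example B -> C
loss = margin_ranking_loss(pos_score, neg_score)
print(loss)
```

Minimizing this loss over many sampled pairs widens the gap between the scores of real edges and those of absent ones.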
  5. PyTorch BigGraph
 • A distributed system for learning large graph embeddings
 • Challenges with training large graphs for embeddings:
    • Memory and computational constraints
    • Embedding a 2-billion-node graph with 128 float parameters per node would require about 1 TB of parameters
    • This is too large to hold in the memory of a typical single machine, let alone to compute the embeddings there
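The 1 TB figure follows from simple arithmetic, assuming 32-bit (4-byte) floats, which the slide does not state explicitly:

```python
num_nodes = 2_000_000_000  # 2 billion nodes
dimension = 128            # float parameters per node
bytes_per_float = 4        # assumption: 32-bit floats

total_bytes = num_nodes * dimension * bytes_per_float
total_tb = total_bytes / 1e12  # decimal terabytes
print(f"{total_tb:.3f} TB")
```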
  6. Training
 • Nodes and edges are partitioned to fit into the memory of the servers in a distributed system
 • Nodes are divided into P partitions (assuming 2 partitions can fit into memory at once)
 • Edges are divided into P^2 buckets: an edge falls into bucket (i, j) when its source node is in partition i and its destination node is in partition j
 • Training bucket (i, j) requires the embeddings of partitions i and j to be in memory
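The partitioning scheme above can be sketched in a few lines. This is an illustration only: nodes are assigned to partitions by `node_id % P`, a stand-in assumption for whatever assignment the real system uses, and the helper names are hypothetical.

```python
def partition_of(node_id, num_partitions):
    """Assign a node to a partition (modulo is an illustrative choice)."""
    return node_id % num_partitions


def bucket_edges(edges, num_partitions):
    """Group edges into P^2 buckets keyed by (source partition, destination partition)."""
    buckets = {
        (i, j): []
        for i in range(num_partitions)
        for j in range(num_partitions)
    }
    for src, dst in edges:
        key = (partition_of(src, num_partitions), partition_of(dst, num_partitions))
        buckets[key].append((src, dst))
    return buckets


edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
buckets = bucket_edges(edges, num_partitions=2)
for key in sorted(buckets):
    print(key, buckets[key])
```

Training one bucket at a time means only the two partitions named by its key have to be loaded, which is why two partitions fitting in memory is enough.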
  7. Training
 • No access to a distributed cluster? No problem
 • BigGraph uses PyTorch to distribute training across the cores of your local machine
 • This uses multiprocessing to run the training workload in parallel
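A single-machine run is typically described by a small Python config file. The sketch below is an assumption-heavy illustration: the key names are recalled from the project's published example configs and should be checked against the installed version, and every path, entity name and relation name is a hypothetical placeholder.

```python
import os


def get_torchbiggraph_config():
    # All paths and names below are hypothetical placeholders.
    return dict(
        # Where the input data lives and where checkpoints are written
        entity_path="data/example",
        edge_paths=["data/example/edges"],
        checkpoint_path="model/example",
        # One entity type in a single partition, since everything fits on one machine
        entities={"node": {"num_partitions": 1}},
        relations=[
            {"name": "link", "lhs": "node", "rhs": "node", "operator": "none"}
        ],
        # Embedding size and training length
        dimension=128,
        num_epochs=10,
        # Assumed setting: one worker process per available CPU core
        workers=os.cpu_count() or 1,
    )
```

Raising `num_partitions` above 1 is what would bring the bucket-by-bucket scheme from the previous slide into play.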