
Training and leveraging graph embeddings at scale


Presented at F8 meetup, Pune

18 May, 2019

Jalem Raj Rohit




  1. Training and leveraging
    graph embeddings at scale
    Jalem Raj Rohit

    Data Science at GEP


  2. Embeddings: The general idea

    • Consider the following sentences:

    “The development was done in Javascript”

    “The development was done in Python”

    “The development of the Python population in the Amazon rainforest”

    • Task: Vector Representation

    [0, 1, 2, 3, 4, 5]

    [0, 1, 2, 3, 4, 6]

    [0, 1, 7, 0, 6, 8, 4, 0, 9, 10]

    • Limitations
    • Not scalable

    • Cannot model context
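
    The integer-index representation above can be reproduced in a few lines of Python (a toy sketch for illustration; real pipelines use proper tokenizers):

    ```python
    # Assign each distinct word an integer index in order of first appearance.
    def encode(sentences):
        vocab = {}
        encoded = []
        for sentence in sentences:
            ids = []
            for word in sentence.lower().split():
                if word not in vocab:
                    vocab[word] = len(vocab)
                ids.append(vocab[word])
            encoded.append(ids)
        return vocab, encoded

    vocab, encoded = encode([
        "The development was done in Javascript",
        "The development was done in Python",
        "The development of the Python population in the Amazon rainforest",
    ])
    # encoded[0] → [0, 1, 2, 3, 4, 5]
    # encoded[2] → [0, 1, 7, 0, 6, 8, 4, 0, 9, 10]
    ```

    The limitations are visible right away: indices grow with the vocabulary, and the same index for "Python" appears in both the programming and the rainforest sentence, so context is lost.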


  3. Word2Vec

    • Word2Vec is a neural network model for learning better, dense word representations

    • The lazy fox jumped over the moon

    • Focus word: fox

    • Context words: lazy, jumped

    • The representations of the words are simple dense vectors

    Example: apple = [1.286, -3.467, 0.1375 …. 1.352]

    • These vectors also enable linear relationship between words:

    Example: King - Man + Woman = Queen

    • Helps in various NLP tasks and operations like:

    • classification

    • clustering
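
    The focus/context idea above can be sketched as a sliding-window pair extractor (a toy sketch; the window size of 1 is an assumption chosen to match the slide's example):

    ```python
    # Extract Word2Vec-style (focus, context) training pairs with a sliding window.
    def skipgram_pairs(tokens, window=1):
        pairs = []
        for i, focus in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    pairs.append((focus, tokens[j]))
        return pairs

    tokens = "The lazy fox jumped over the moon".lower().split()
    pairs = skipgram_pairs(tokens)
    # The focus word "fox" pairs with its context words "lazy" and "jumped".
    ```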


  4. Node2Vec

    • Word2Vec, but for graphs

    • Each node is represented as a dense vector

    • Training Process:
    • Positive examples: A -> B, A -> C

    • Negative Examples: B -> C
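
    The positive/negative split above can be sketched for a tiny graph (a simplified sketch: real node2vec samples positives from biased random walks, and the graph here is an illustrative assumption):

    ```python
    import random

    # Edges observed in the graph are positive pairs; random non-edges
    # serve as negative pairs.
    graph = {"A": ["B", "C"], "B": [], "C": []}

    def positive_pairs(graph):
        return [(u, v) for u, nbrs in graph.items() for v in nbrs]

    def negative_pair(graph, rng):
        nodes = list(graph)
        while True:
            u, v = rng.choice(nodes), rng.choice(nodes)
            if u != v and v not in graph[u]:
                return (u, v)

    rng = random.Random(0)
    pos = positive_pairs(graph)      # [("A", "B"), ("A", "C")]
    neg = negative_pair(graph, rng)  # e.g. ("B", "C")
    ```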


  5. Node2Vec

    • Objective: maximize the separation between the scores of positive and negative examples

    • A large enough graph’s embeddings look similar to this:
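
    The objective above can be sketched as a margin-based ranking loss that pushes positive edge scores above negative ones (one common choice among several; the dot-product scoring function and the toy embeddings are assumptions):

    ```python
    # Score an edge by the dot product of its endpoint embeddings.
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    # Margin ranking loss: zero once the positive score beats the
    # negative score by at least the margin.
    def margin_loss(pos_score, neg_score, margin=1.0):
        return max(0.0, margin - pos_score + neg_score)

    emb = {"A": [1.0, 0.0], "B": [0.9, 0.1], "C": [-0.8, 0.2]}
    loss = margin_loss(dot(emb["A"], emb["B"]), dot(emb["B"], emb["C"]))
    ```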


  6. Practical Applications
    • Recommender Systems
    • Modelling and understanding social graphs


  7. PyTorch-BigGraph
    • A distributed system for learning large graph embeddings

    • Challenges with training large graphs for embeddings:

    • Memory and computational constraints

    • Embedding a 2-billion-node graph with 128 float parameters per
    node would require 1 TB of parameters

    • It is impossible to hold this in the memory of a single machine, let
    alone compute the embeddings of such a graph there
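
    The 1 TB figure checks out as back-of-the-envelope arithmetic, assuming 4-byte float32 parameters:

    ```python
    # 2 billion nodes × 128 parameters per node × 4 bytes per float32.
    nodes = 2_000_000_000
    dim = 128
    bytes_per_float = 4

    total_bytes = nodes * dim * bytes_per_float
    total_tb = total_bytes / 1e12  # ≈ 1.024 TB
    ```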


  8. Training
    • Nodes and edges are partitioned to fit into the memory of the servers in a distributed cluster

    • Nodes are divided into P partitions (assuming 2 partitions can fit into memory at a time)

    • Edges are divided into P^2 buckets

    • Training bucket (i, j) requires the embeddings of partitions i and j to be in memory
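
    The partitioning scheme above can be sketched as follows (the modulo assignment and P = 4 are illustrative assumptions; PBG's actual partitioner differs):

    ```python
    P = 4  # number of node partitions (assumption for illustration)

    # Assign each node (here an integer id) to one of P partitions.
    def partition(node_id):
        return node_id % P

    # An edge falls into the bucket keyed by its endpoints' partitions,
    # giving P^2 buckets in total.
    def bucket(edge):
        src, dst = edge
        return (partition(src), partition(dst))

    edges = [(0, 5), (1, 6), (2, 7)]
    buckets = {}
    for e in edges:
        buckets.setdefault(bucket(e), []).append(e)
    # Bucket (i, j) only needs partitions i and j in memory, so buckets
    # with disjoint partition pairs can be trained concurrently.
    ```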


  9. Training
    • No access to a distributed system cluster? No problem

    • BigGraph leverages PyTorch to distribute training across cores inside
    your local machine

    • This leverages the power of multiprocessing to run the training workload in parallel
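
    Dispatching independent edge buckets to parallel workers on one machine can be sketched like this (a simplified stand-in: PBG itself uses torch.multiprocessing, and the thread-backed Pool below is chosen only so the sketch runs anywhere with the same API as multiprocessing.Pool):

    ```python
    from multiprocessing.dummy import Pool  # thread-backed, same API as multiprocessing.Pool

    # Each worker "trains" one edge bucket; buckets touching disjoint
    # partition pairs are independent, so they can run in parallel.
    def train_bucket(job):
        i, j, edges = job
        # ... load partitions i and j, run the embedding updates ...
        return (i, j, len(edges))

    jobs = [(0, 1, [("a", "b")]), (2, 3, [("c", "d"), ("c", "e")])]

    with Pool(processes=2) as pool:
        results = pool.map(train_bucket, jobs)
    ```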


  10. Embedding visualisation


  11. Sources
    • Code: https://github.com/facebookresearch/PyTorch-BigGraph

    • Paper: https://research.fb.com/publications/pytorch-biggraph-a-large-

    • Documentation: https://torchbiggraph.readthedocs.io/en/latest/
