Task: Vector Representation
• "The development was done in Javascript" → [0, 1, 2, 3, 4, 5]
• "The development was done in Python" → [0, 1, 2, 3, 4, 6]
• "The development of the Python population in the Amazon rainforest" → [0, 1, 7, 0, 6, 8, 4, 0, 9, 10]
• Limitations
• Not scalable
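A minimal sketch of this index-based encoding; the vocabulary-building and encoding functions below are my own illustration of the idea, not code from the slides:

```python
def build_vocab(sentences):
    """Assign an integer id to each distinct (lowercased) token, in order of first appearance."""
    vocab = {}
    for sentence in sentences:
        for token in sentence.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab

def encode(sentence, vocab):
    """Replace each token with its vocabulary index."""
    return [vocab[token] for token in sentence.lower().split()]

sentences = [
    "The development was done in Javascript",
    "The development was done in Python",
    "The development of the Python population in the Amazon rainforest",
]
vocab = build_vocab(sentences)
for s in sentences:
    print(encode(s, vocab))
# [0, 1, 2, 3, 4, 5]
# [0, 1, 2, 3, 4, 6]
# [0, 1, 7, 0, 6, 8, 4, 0, 9, 10]
```

Every new word grows the vocabulary, and the integer ids carry no notion of similarity between words, which is why this representation does not scale.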
• Word2Vec is a deep learning model that learns dense word representations
• Example sentence: The lazy fox jumped over the moon
• Focus word: fox
• Context words: lazy, jumped
• The word representations are simple dense vectors
Example: apple = [1.286, -3.467, 0.1375, …, 1.352]
• These vectors also capture linear relationships between words
Example: King - Man + Woman ≈ Queen
• Helps in various NLP tasks and operations (a sketch of these ideas follows below)
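A small sketch of the focus-word/context-window view and, assuming the gensim library is available, of dense vectors and the King - Man + Woman analogy. The toy corpus, window size, and parameter values are illustrative assumptions, not from the slides:

```python
from gensim.models import Word2Vec

# Each focus word is paired with the words inside a fixed-size context window around it.
def context_pairs(tokens, window=1):
    for i, focus in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        yield focus, context

sentence = "the lazy fox jumped over the moon".split()
for focus, context in context_pairs(sentence, window=1):
    print(focus, context)   # e.g. fox ['lazy', 'jumped']

# Training Word2Vec (skip-gram) on a tiny illustrative corpus with gensim.
corpus = [sentence, "the quick brown fox jumped over the fence".split()]
model = Word2Vec(sentences=corpus, vector_size=50, window=2, min_count=1, sg=1)

print(model.wv["fox"])      # a dense 50-dimensional vector

# On a model trained on a large real corpus, vector arithmetic such as
# King - Man + Woman tends to land near Queen:
# model.wv.most_similar(positive=["king", "woman"], negative=["man"])
```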
Training
• Nodes and edges are partitioned so that they fit into the memory of the servers in a distributed system
• Nodes are divided into P partitions (assuming 2 partitions can fit into memory at once)
• Edges are divided into P^2 buckets
• Training bucket (i, j) requires only the embeddings of partitions i and j to be in memory (sketched below)
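A minimal sketch of this partitioning scheme, assuming a simple hash-based assignment of nodes to partitions; the function names and the per-bucket loop are my own illustration, not PyTorch-BigGraph's actual implementation:

```python
from collections import defaultdict

P = 4  # number of node partitions

def partition_of(node, num_partitions=P):
    """Assign a node to one of P partitions (illustrative hash-based scheme)."""
    return hash(node) % num_partitions

def bucketize(edges, num_partitions=P):
    """Group edges into P^2 buckets keyed by (source partition, destination partition)."""
    buckets = defaultdict(list)
    for src, dst in edges:
        buckets[(partition_of(src), partition_of(dst))].append((src, dst))
    return buckets

edges = [("alice", "bob"), ("bob", "carol"), ("carol", "dave"), ("dave", "alice")]
buckets = bucketize(edges)

# For bucket (i, j), only the embeddings of partitions i and j
# need to be resident in memory at the same time.
for (i, j), bucket_edges in buckets.items():
    # load_embeddings(i); load_embeddings(j)   # hypothetical helpers
    print(f"training bucket ({i}, {j}) with {len(bucket_edges)} edges")
```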
Training
• No access to a distributed cluster? No problem
• BigGraph uses PyTorch to distribute training across the cores of your local machine
• It relies on multiprocessing to run the training workload in parallel (a sketch follows below)
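A minimal sketch of multi-core training with torch.multiprocessing. This illustrates the multiprocessing idea only, not PyTorch-BigGraph's actual training loop: the worker function, placeholder loss, and core count are assumptions:

```python
import torch
import torch.multiprocessing as mp

def train_worker(rank, num_workers, num_nodes, dim):
    """Hypothetical worker: each process trains embeddings on its own shard of edges."""
    torch.manual_seed(rank)
    embeddings = torch.randn(num_nodes, dim, requires_grad=True)
    optimizer = torch.optim.SGD([embeddings], lr=0.01)
    for step in range(100):
        # Placeholder loss; a real worker would score its shard of edge buckets here.
        src = torch.randint(0, num_nodes, (32,))
        dst = torch.randint(0, num_nodes, (32,))
        loss = ((embeddings[src] - embeddings[dst]) ** 2).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f"worker {rank}/{num_workers} finished, final loss {loss.item():.3f}")

if __name__ == "__main__":
    # Spawn one training process per core (4 cores assumed here).
    mp.spawn(train_worker, args=(4, 1000, 64), nprocs=4, join=True)
```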