
Next gen of word embeddings Rio 30 mins

Lev Konstantinovskiy

April 14, 2017

Transcript

  1. Next generation of word embeddings. Lev Konstantinovskiy, Community Manager at

    Gensim. @teagermylk http://rare-technologies.com/
  2. Streaming Word2vec and Topic Modelling in Python

  3. Gensim Open Source Package

    • Numerous industry adopters
    • 170 code contributors, 4000 GitHub stars
    • 200 messages per month on the mailing list
    • 150 people chatting on Gitter
    • 500 academic citations
  4. Credits: Parul Sethi, undergraduate student, University of Delhi, India.

    RaReTech Incubator program; added WordRank to Gensim. http://rare-technologies.com/incubator/
  5. None
  6. Part 1. Different word embeddings Part 2. Theory of word2vec

  7. Business Problems

  8. Business Problems “What is Dona Flor like?” “List all female

    characters in ‘Dona Flor e seus dois maridos’?”
  9. Two Different Business Problems 1) What words are in the

    topic of “Dona Flor”? 2) What are the Named Entities in the text?
  10. DFDM (“Dona Flor e seus dois maridos”) is only 170k words, so results are so-so

  11. Teodoro

  12. Pride & Prejudice It is a case universally acknowledged, that

    a single woman in defiance of a good sense, must be in use of a son.
  13. Pride & Prejudice By Lynn Cherny http://www.ghostweather.com/files/word2vecpride/ It is a

    case universally acknowledged, that a single woman in defiance of a good sense, must be in use of a son. It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife.
  14. Closest word to “king”? Trained on Wikipedia, 17m words.

    [Table of nearest neighbours to “king”, grouped by similarity type: Attribute, Interchangeable, Both]
  15. How to get the similarity you need

    If my similar words must be Associated, I want to describe the word’s Topic and to know what a doc is about. Then I should run WordRank (works even on a small corpus, 1m words) or word2vec skip-gram with a big window (needs a large corpus, >5m words).

    If my similar words must be Interchangeable, I want to describe the word’s Function and to recognize names. Then I should run word2vec skip-gram with a small window, or FastText, or VarEmbed.
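
    A minimal gensim sketch of the window-size choice from the table above. The toy corpus and all parameter values are illustrative assumptions (vector_size was called size before gensim 4.0):

        from gensim.models import Word2Vec

        # Toy corpus; in practice you would stream millions of sentences.
        sentences = [
            ["dona", "flor", "married", "teodoro"],
            ["teodoro", "is", "a", "character", "in", "the", "novel"],
        ]

        # Small window: neighbours tend to be interchangeable (same function).
        small = Word2Vec(sentences, sg=1, window=2, vector_size=50, min_count=1)

        # Big window: neighbours tend to be associated (same topic);
        # this setting needs a large corpus to work well.
        big = Word2Vec(sentences, sg=1, window=10, vector_size=50, min_count=1)

        print(small.wv.most_similar("teodoro", topn=3))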
  16. Part 2. Theory of Word2vec

  17. Word2vec is a big victory of unsupervised learning in industry.

    [GANs will get there in 3 years too :)] Google ran word2vec on 100 billion unlabelled words, then shared their trained model. Thanks to Google for cutting our training time to zero! :)
  18. The famous Google News model. Google ran word2vec on

    100 billion unlabelled words, then shared their trained model. Thanks to Google for cutting our training time to zero! :)
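
    A hedged sketch of loading that shared model with gensim; the file name is the one Google distributed, but the local path is an assumption, and the file is multiple gigabytes:

        from gensim.models import KeyedVectors

        # 3 million words and phrases, 300 dimensions.
        model = KeyedVectors.load_word2vec_format(
            "GoogleNews-vectors-negative300.bin", binary=True)

        print(model.most_similar("king", topn=5))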
  19. Word embeddings can be used for:

    - automated text tagging
    - recommendation engines
    - synonyms and search query expansion
    - machine translation
    - plain feature engineering
  20. What is a word embedding? ‘Word embedding’ = ‘word vectors’

    = ‘distributed representations’. It is a dense representation of words in a low-dimensional vector space.
    One-hot representation:
      king  = [1 0 0 0 .. 0 0 0 0 0]
      queen = [0 1 0 0 0 0 0 0 0]
      book  = [0 0 1 0 0 0 0 0 0]
    Distributed representation:
      king = [0.9457, 0.5774, 0.2224]
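
    The same contrast in a few lines of numpy; the dense values are the toy numbers from the slide, not a real trained embedding:

        import numpy as np

        # One-hot: one dimension per vocabulary word; vectors of different
        # words are always orthogonal, so "king" and "queen" look unrelated.
        vocab = ["king", "queen", "book"]
        one_hot = {w: np.eye(len(vocab))[i] for i, w in enumerate(vocab)}
        print(one_hot["king"] @ one_hot["queen"])   # 0.0

        # Distributed: dense and low-dimensional; similarity is a dot product.
        king = np.array([0.9457, 0.5774, 0.2224])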
  21. None
  22. Disclaimer: word2vec is not the only word embedding in

    the world. Many other ways to get a vector for a word:
    - Factorise the co-occurrence matrix (SVD/LSA)
    - GloVe
    - EigenWords
    - WordRank
    - VarEmbed
    - FastText
  23. How to come up with an embedding? Use the

    “Distributional hypothesis”: “You shall know a word by the company it keeps” - J. R. Firth, 1957. Richard Socher’s NLP course http://cs224d.stanford.edu/lectures/CS224d-Lecture2.pdf
  24. Usual procedure:

    1. Initialise random vectors.
    2. Pick an objective function.
    3. Do gradient descent.
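
    The three steps in a bare-bones numpy sketch; the objective here is a stand-in dot-product score for one observed (word, context) pair, not the real word2vec loss:

        import numpy as np

        rng = np.random.default_rng(0)

        # 1. Initialise random vectors for a tiny vocabulary.
        vec = {w: rng.normal(size=5) for w in ["over", "fox"]}

        # 2. Objective: the score of an observed pair, here vec[w] . vec[c].
        # 3. Gradient descent (ascent, for a score) on that objective.
        lr = 0.1
        for _ in range(10):
            grad_over, grad_fox = vec["fox"].copy(), vec["over"].copy()
            vec["over"] += lr * grad_over   # d(over . fox) / d over = fox
            vec["fox"] += lr * grad_fox     # d(over . fox) / d fox = over

        print(vec["over"] @ vec["fox"])     # score grows; real word2vec adds a
                                            # normalisation term to keep it sane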
  25. For the theory, take Richard Socher’s CS224D free online

    class: Richard Socher’s NLP course http://cs224d.stanford.edu/lectures/CS224d-Lecture2.pdf
  26. “The fox jumped over the lazy dog” Maximize the likelihood

    of seeing the context words given the word over. P(the|over) P(fox|over) P(jumped|over) P(the|over) P(lazy|over) P(dog|over) word2vec algorithm Used with permission from @chrisemoody http://www.slideshare.net/ChristopherMoody3/word2vec-lda-and-introducing-a-new-hybrid-algorithm-lda2vec
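
    A sketch of how those (context | center) pairs fall out of a sliding window; window size 3 matches the six probabilities on the slide:

        sentence = "the fox jumped over the lazy dog".split()
        center = "over"
        window = 3

        i = sentence.index(center)
        context = sentence[max(0, i - window):i] + sentence[i + 1:i + 1 + window]
        pairs = [(c, center) for c in context]
        print(pairs)
        # [('the', 'over'), ('fox', 'over'), ('jumped', 'over'),
        #  ('the', 'over'), ('lazy', 'over'), ('dog', 'over')]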
  27. Probability should depend on the word vectors. Used with permission

    from @chrisemoody http://www.slideshare.net/ChristopherMoody3/word2vec-lda-and-introducing-a-new-hybrid-algorithm-lda2vec P(fox|over) becomes P(v_fox | v_over)
  28. A twist: two vectors for every word. Used with permission

    from @chrisemoody http://www.slideshare.net/ChristopherMoody3/word2vec-lda-and-introducing-a-new-hybrid-algorithm-lda2vec The probability should depend on whether it’s the input or the output: P(v_OUT | v_IN). “The fox jumped over the lazy dog” v_IN
  29. Twist: two vectors for every word. Used with permission from

    @chrisemoody http://www.slideshare.net/ChristopherMoody3/word2vec-lda-and-introducing-a-new-hybrid-algorithm-lda2vec Should depend on whether it’s the input or the output: P(v_OUT | v_IN). “The fox jumped over the lazy dog” v_IN v_OUT, e.g. P(v_THE | v_OVER)
  30. Twist: two vectors for every word. Used with permission from

    @chrisemoody http://www.slideshare.net/ChristopherMoody3/word2vec-lda-and-introducing-a-new-hybrid-algorithm-lda2vec Should depend on whether it’s the input or the output: P(v_OUT | v_IN). “The fox jumped over the lazy dog” v_IN v_OUT (repeated on slides 31–36, stepping v_OUT through each context word)
  37. How to define P(v_OUT | v_IN)? First, define

    similarity. How similar are two vectors? Just the dot product, for unit-length vectors: v_OUT · v_IN. Used with permission from @chrisemoody http://www.slideshare.net/ChristopherMoody3/word2vec-lda-and-introducing-a-new-hybrid-algorithm-lda2vec
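
    In numpy, with two hypothetical unit vectors:

        import numpy as np

        def unit(v):
            return v / np.linalg.norm(v)

        v_in = unit(np.array([0.9, 0.5, 0.2]))    # hypothetical v_IN ("over")
        v_out = unit(np.array([0.8, 0.6, 0.1]))   # hypothetical v_OUT ("fox")

        # For unit-length vectors the dot product is the cosine similarity,
        # a number in [-1, 1].
        print(v_out @ v_in)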
  38. Get a probability in [0,1] out of a similarity in [-1,

    1] with a normalization term over all out words: P(v_OUT | v_IN) = exp(v_OUT · v_IN) / Σ_k exp(v_k · v_IN). Used with permission from @chrisemoody http://www.slideshare.net/ChristopherMoody3/word2vec-lda-and-introducing-a-new-hybrid-algorithm-lda2vec
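
    That normalization is a softmax; a small numpy sketch with a hypothetical four-word output vocabulary:

        import numpy as np

        def p_out_given_in(v_in, out_vectors, out_index):
            # exp(v_out . v_in) / sum over all output words k of exp(v_k . v_in)
            scores = out_vectors @ v_in
            e = np.exp(scores - scores.max())   # subtract max for stability
            return e[out_index] / e.sum()

        out_vectors = np.array([[0.8, 0.6, 0.1],
                                [0.1, 0.9, 0.4],
                                [0.5, 0.5, 0.5],
                                [0.9, 0.1, 0.2]])
        v_in = np.array([0.9, 0.5, 0.2])
        print(p_out_given_in(v_in, out_vectors, 0))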
  39. Word2vec is great! Vector arithmetic: king - man +

    woman ≈ queen. Slide from @chrisemoody http://www.slideshare.net/ChristopherMoody3/word2vec-lda-and-introducing-a-new-hybrid-algorithm-lda2vec
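
    The analogy in gensim, assuming the pre-trained Google News vectors from slide 18 are on disk:

        from gensim.models import KeyedVectors

        model = KeyedVectors.load_word2vec_format(
            "GoogleNews-vectors-negative300.bin", binary=True)

        # king - man + woman = ?
        print(model.most_similar(positive=["king", "woman"],
                                 negative=["man"], topn=1))
        # expected top result: 'queen'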
  40. More directions. @datamusing Sudeep Das http://www.slideshare.net/SparkSummit/using-data-science-to-transform-opentable-into-delgado-das

  41. Consistent directions. Mikolov et al., “Distributed Representations of Words and

    Phrases and their Compositionality”, 2013
  42. Explore word2vec yourself http://rare-technologies.com/word2vec-tutorial/#app
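
    A few things to try, here fetching the Google News vectors through gensim's downloader (a large download; the dataset name is from the gensim-data repository):

        import gensim.downloader as api

        model = api.load("word2vec-google-news-300")

        print(model.similarity("woman", "man"))
        print(model.doesnt_match("breakfast cereal dinner lunch".split()))
        print(model.most_similar("Brazil", topn=3))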

  43. Facebook’s FastText: a word is the sum of its parts.

    Credit: Takahiro Kubo http://qiita.com/icoxfog417/items/42a95b279c0b7ad26589 Better than word2vec, but slower… Download and play with the Portuguese model.
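
    A hedged FastText sketch in gensim; the toy Portuguese corpus and all parameters are illustrative, and the point is the out-of-vocabulary lookup (vector_size was called size before gensim 4.0):

        from gensim.models import FastText

        sentences = [["dona", "flor", "dois", "maridos"],
                     ["um", "marido", "morreu"]]

        model = FastText(sentences, vector_size=20, window=3, min_count=1)

        # A word is built from its character n-grams, so even a word never
        # seen in training ("maridinho") still gets a vector.
        print(model.wv["maridinho"][:5])
        print(model.wv.most_similar("marido", topn=2))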
  44. None
  45. Thanks for listening! Lev Konstantinovskiy github.com/tmylk @teagermylk Gensim T-shirt question:

    Please answer by raising your hand. How many words are in “Dona Flor e seus dois maridos”?