Next gen of word embeddings Rio 30 mins

Lev Konstantinovskiy

April 14, 2017

Transcript

  1. Next generation of word embeddings
    Lev Konstantinovskiy
    Community Manager at Gensim
    @teagermylk
    http://rare-technologies.com/

  2. Streaming
    Word2vec and Topic Modelling in Python

  3. Gensim Open Source Package
    ● Numerous Industry Adopters
● 170 Code contributors, 4000 GitHub stars
    ● 200 Messages per month on the mailing list
    ● 150 People chatting on Gitter
    ● 500 Academic citations

  4. Credits
    Parul Sethi
    Undergraduate student
    University of Delhi, India
    RaReTech Incubator program
    Added WordRank to Gensim
    http://rare-technologies.com/incubator/

  5. [image-only slide]

  6. Part 1. Different word embeddings
    Part 2. Theory of word2vec

  7. Business Problems

  8. Business Problems
    “What is Dona Flor like?”
    “List all female characters in ‘Dona Flor e seus dois maridos’?”

  9. Two Different Business Problems
    1) What words are in the topic of “Dona Flor”?
    2) What are the Named Entities in the text?

  10. “Dona Flor e seus dois maridos” (DFDM) is only 170k words,
    so results are so-so

  11. Teodoro

  12. Pride & Prejudice
    It is a case universally acknowledged, that
    a single woman in defiance of a good
    sense, must be in use of a son.

  13. Pride & Prejudice
    By Lynn Cherny
    http://www.ghostweather.com/files/word2vecpride/
    Word2vec remix: “It is a case universally acknowledged, that a single
    woman in defiance of a good sense, must be in use of a son.”
    Original: “It is a truth universally acknowledged, that a single man in
    possession of a good fortune, must be in want of a wife.”

  14. Closest word to “king”?
    Trained on Wikipedia, 17m words.
    [Table: closest words to “king”, one column per similarity type: Attribute, Interchangeable, Both]

  15. How to get the similarity you need
    If my similar words must be Associated (describing the word’s Topic,
    to know what a doc is about): run WordRank (works even on a small
    corpus, 1m words) or word2vec skipgram with a big window (needs a
    large corpus, >5m words).
    If my similar words must be Interchangeable (describing the word’s
    Function, to recognize names): run word2vec skipgram with a small
    window, or FastText, or VarEmbed.
    (Both word2vec variants are sketched below.)
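    A minimal Gensim sketch of those two word2vec variants (not from the
    deck; parameter names assume Gensim 4.x, and `corpus` stands in for
    your own tokenized sentences):

        from gensim.models import Word2Vec

        # Stand-in corpus: a list of tokenized sentences.
        corpus = [["dona", "flor", "e", "seus", "dois", "maridos"]]

        # Big window: neighbours tend to share a topic ("associated" words).
        topic_model = Word2Vec(corpus, sg=1, window=15, min_count=1)

        # Small window: neighbours tend to share a function ("interchangeable" words).
        function_model = Word2Vec(corpus, sg=1, window=2, min_count=1)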

  16. Part 2. Theory of Word2vec

  17. Word2vec is a big victory of unsupervised learning in industry.
    [GANs will get there in 3 years too :)]

  18. The famous Google News model
    Google ran word2vec on 100 billion unlabelled words,
    then shared their trained model.
    Thanks to Google for cutting our training time to zero! :)
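    Loading that model in Gensim is a single call (a sketch, assuming the
    published GoogleNews file is already downloaded locally):

        from gensim.models import KeyedVectors

        # Pretrained Google News vectors: 3M words, 300 dimensions.
        vectors = KeyedVectors.load_word2vec_format(
            "GoogleNews-vectors-negative300.bin", binary=True)

        print(vectors.most_similar("king", topn=3))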

  19. Word embeddings can be used for:
    - automated text tagging
    - recommendation engines
- synonyms and search query expansion (see the sketch below)
    - machine translation
    - plain feature engineering
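    For instance, query expansion can append each term’s nearest
    neighbours (a sketch: `expand_query` is a hypothetical helper and
    `vectors` is the pretrained model loaded in the earlier sketch):

        def expand_query(vectors, query, topn=2):
            # Append the nearest neighbours of every in-vocabulary term.
            terms = query.split()
            for term in query.split():
                if term in vectors:
                    terms += [w for w, _ in vectors.most_similar(term, topn=topn)]
            return terms

        print(expand_query(vectors, "king fortune"))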

  20. What is a word embedding?
    ‘Word embedding’ = ‘word vectors’ = ‘distributed representations’
    It is a dense representation of words in a low-dimensional vector space.
    One-hot representation:
    king = [1 0 0 0.. 0 0 0 0 0]
    queen = [0 1 0 0 0 0 0 0 0]
    book = [0 0 1 0 0 0 0 0 0]
    Distributed representation:
    king = [0.9457, 0.5774, 0.2224]
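    The difference is easy to check numerically (a toy sketch; the queen
    vector is made up for illustration):

        import numpy as np

        # One-hot: any two distinct words have dot product 0,
        # so there is no notion of relatedness.
        king_1hot, queen_1hot = np.array([1, 0, 0]), np.array([0, 1, 0])
        print(king_1hot @ queen_1hot)  # 0

        # Dense: similarity is graded, related words score near 1.
        king = np.array([0.9457, 0.5774, 0.2224])
        queen = np.array([0.9100, 0.6149, 0.1776])
        print(king @ queen / (np.linalg.norm(king) * np.linalg.norm(queen)))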

  21. [image-only slide]

  22. Disclaimer
    Word2vec is not the only word embedding in the world.
    Many other ways to get a vector for a word:
    - Factorise the co-occurrence matrix (SVD/LSA)
    - GloVe
    - EigenWords
    - WordRank
    - VarEmbed
    - FastText

  23. How to come up with an embedding?
    Use the “Distributional hypothesis”:
    “You shall know a word by the company it keeps”
    - J. R. Firth, 1957
    Richard Socher’s NLP course http://cs224d.stanford.edu/lectures/CS224d-Lecture2.pdf

  24. Usual procedure
    1. Initialise random vectors.
    2. Pick an objective function.
    3. Do gradient descent (sketched below).
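    A toy numpy sketch of that procedure for skip-gram with a full
    softmax (all names and sizes are illustrative, not Gensim internals):

        import numpy as np

        rng = np.random.default_rng(0)
        vocab = ["the", "fox", "jumped", "over", "lazy", "dog"]
        V, D = len(vocab), 8

        # 1. Initialise random vectors (separate IN and OUT matrices).
        W_in = rng.normal(scale=0.1, size=(V, D))
        W_out = rng.normal(scale=0.1, size=(V, D))

        def sgd_step(center, context, lr=0.05):
            # 2. Objective: softmax probability of context given center.
            scores = W_out @ W_in[center]
            p = np.exp(scores - scores.max())
            p /= p.sum()
            # 3. Gradient descent on -log p[context].
            grad = p.copy()
            grad[context] -= 1.0
            grad_in = W_out.T @ grad
            W_out[:] -= lr * np.outer(grad, W_in[center])
            W_in[center] -= lr * grad_in

        # One (center, context) pair from the running example sentence.
        sgd_step(vocab.index("over"), vocab.index("fox"))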

  25. For the theory, take Richard Socher’s CS224D free online class
    Richard Socher’s NLP course http://cs224d.stanford.edu/lectures/CS224d-Lecture2.pdf

  26. word2vec algorithm
    “The fox jumped over the lazy dog”
    Maximize the likelihood of seeing the context words given the word “over”:
    P(the|over)
    P(fox|over)
    P(jumped|over)
    P(the|over)
    P(lazy|over)
    P(dog|over)
    Used with permission from @chrisemoody
    http://www.slideshare.net/ChristopherMoody3/word2vec-lda-and-introducing-a-new-hybrid-algorithm-lda2vec
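    Enumerating those (center, context) pairs takes a few lines of Python
    (a sketch; `skipgram_pairs` is a hypothetical helper):

        def skipgram_pairs(tokens, window=3):
            # Yield (center, context) for every word within the window.
            for i, center in enumerate(tokens):
                lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
                for j in range(lo, hi):
                    if j != i:
                        yield center, tokens[j]

        sentence = "the fox jumped over the lazy dog".split()
        # For the center "over" this prints exactly the six pairs above.
        print([pair for pair in skipgram_pairs(sentence) if pair[0] == "over"])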

  27. Probability should depend on the word vectors:
    P(fox|over) becomes P(v_fox | v_over)
    Used with permission from @chrisemoody
    http://www.slideshare.net/ChristopherMoody3/word2vec-lda-and-introducing-a-new-hybrid-algorithm-lda2vec

  28. A twist: two vectors for every word
    Should depend on whether it’s the input or the output: P(v_OUT | v_IN).
    “The fox jumped over the lazy dog” (here “over” is v_IN)
    Used with permission from @chrisemoody
    http://www.slideshare.net/ChristopherMoody3/word2vec-lda-and-introducing-a-new-hybrid-algorithm-lda2vec

  29. Twist: two vectors for every word
    Should depend on whether it’s the input or the output: P(v_OUT | v_IN).
    “The fox jumped over the lazy dog”
    With “over” as v_IN, each surrounding word takes its turn as v_OUT,
    e.g. P(v_THE | v_OVER).
    (Slides 29-36 step this animation through the rest of the sentence.)
    Used with permission from @chrisemoody
    http://www.slideshare.net/ChristopherMoody3/word2vec-lda-and-introducing-a-new-hybrid-algorithm-lda2vec

  37. How to define P(v_OUT | v_IN)? First, define similarity.
    How similar are two vectors?
    Just the dot product, for unit-length vectors: v_OUT · v_IN
    Used with permission from @chrisemoody
    http://www.slideshare.net/ChristopherMoody3/word2vec-lda-and-introducing-a-new-hybrid-algorithm-lda2vec
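    Numerically (made-up unit vectors, purely for illustration):

        import numpy as np

        v_in = np.array([0.6, 0.8, 0.0])   # unit length
        v_out = np.array([0.8, 0.6, 0.0])  # unit length

        # For unit vectors the dot product is the cosine similarity in [-1, 1].
        print(v_out @ v_in)  # 0.96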

  38. Get a probability in [0, 1] out of a similarity in [-1, 1]
    Softmax: exponentiate, then divide by a normalization term over all OUT words.
    Used with permission from @chrisemoody
    http://www.slideshare.net/ChristopherMoody3/word2vec-lda-and-introducing-a-new-hybrid-algorithm-lda2vec
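    Written out, this is the word2vec softmax (standard form; the slide
    itself only shows it as an image):

        P(v_{OUT} \mid v_{IN}) = \frac{\exp(v_{OUT} \cdot v_{IN})}{\sum_{w \in V} \exp(v_w \cdot v_{IN})}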

  39. Word2vec is great!
    Vector arithmetic, e.g. v_king - v_man + v_woman ≈ v_queen
    Slide from @chrisemoody
    http://www.slideshare.net/ChristopherMoody3/word2vec-lda-and-introducing-a-new-hybrid-algorithm-lda2vec
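    In Gensim this arithmetic is a single call (reusing the pretrained
    `vectors` model from the earlier sketch; `most_similar` is the real
    Gensim method):

        # king - man + woman ≈ queen
        print(vectors.most_similar(positive=["king", "woman"],
                                   negative=["man"], topn=1))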

  40. More directions
    @datamusing Sudeep Das
    http://www.slideshare.net/SparkSummit/using-data-science-to-transform-opentable-into-delgado-das

  41. Consistent directions
    Mikolov et al., “Distributed Representations of Words and Phrases and their Compositionality”, 2013

  42. Explore word2vec yourself
    http://rare-technologies.com/word2vec-tutorial/#app

  43. Facebook’s FastText:
    a word is a sum of its parts
    Better than word2vec, but slower…
    Download and play with the Portuguese model.
    Credit: Takahiro Kubo http://qiita.com/icoxfog417/items/42a95b279c0b7ad26589
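    A minimal Gensim sketch of the subword idea (Gensim 4.x API; the toy
    corpus and the unseen word are made up):

        from gensim.models import FastText

        corpus = [["dona", "flor", "e", "seus", "dois", "maridos"],
                  ["teodoro", "e", "vadinho"]]
        model = FastText(corpus, vector_size=32, window=3, min_count=1,
                         min_n=3, max_n=5)

        # A word is the sum of its character n-grams, so even the unseen
        # "maridinho" gets a vector from n-grams shared with "maridos".
        print(model.wv["maridinho"][:5])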

  44. [image-only slide]

  45. Thanks for listening!
    Lev Konstantinovskiy
    github.com/tmylk
    @teagermylk
    Gensim T-shirt question:
    Please answer by raising your hand.
    How many words are in “Dona Flor e seus dois maridos”?
