
Next gen of word embeddings London 45 mins

How to get interchangeable or related words
Theory behind word2vec, FastText and WordRank

Lev Konstantinovskiy

May 07, 2017

Transcript

  1. Next generation of word embeddings
    Lev Konstantinovskiy
    Community Manager at Gensim
    @teagermylk
    http://rare-technologies.com/

  2. Streaming Word2vec and Topic Modelling in Python

  3. Gensim Open Source Package
    ● Numerous Industry Adopters
    ● 170 Code contributors, 4000 GitHub stars
    ● 200 Messages per month on the mailing list
    ● 150 People chatting on Gitter
    ● 500 Academic citations

  4. Credits
    Parul Sethi
    Undergraduate student
    University of Delhi, India
    RaReTech Incubator program
    Added WordRank to Gensim
    http://rare-technologies.com/incubator/

  5. Which words are similar?

  6. “Coast” vs “Shore”
    ????

  7. “Coast” vs “Shore”
    similar

  8. “Clothes” vs “Closet”
    ????

  9. “Clothes” vs “Closet”
    similar
    in the sense “related”

  10. “Clothes” vs “Closet”
    different!
    Why?

  11. “Clothes” vs “Closet”
    different!
    not “interchangeable”

  12. Business Problems

  13. Business Problems
    “What does Elizabeth think about Mr Darcy?”
    “Male characters in Pride and Prejudice?”

  14. Two Different Business Problems
    1) What words are in the topic of “Darcy”?
    2) What are the Named Entities in the text?

  15. Pride and Prejudice models

  16. Closest word to “king”?
    Trained on Wikipedia 17m words
    [Table: nearest neighbours of “king”, grouped into Attribute / Interchangeable / Both]

  17. Theory of word embeddings

  18. What is a word embedding?
    ‘Word embedding’ = ‘word vectors’ = ‘distributed representations’
    It is a dense representation of words in a low-dimensional vector space.
    One-hot representation:
    king  = [1 0 0 0 .. 0 0 0 0 0]
    queen = [0 1 0 0 0 0 0 0 0]
    book  = [0 0 1 0 0 0 0 0 0]
    Distributed representation:
    king = [0.9457, 0.5774, 0.2224]
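
A minimal sketch of the contrast (the dense vector for “queen” below is a toy value, not from a trained model): one-hot vectors of distinct words are always orthogonal, so their cosine similarity is 0, while dense embeddings can place related words close together.

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# One-hot: every pair of distinct words is orthogonal.
king_1hot, queen_1hot = np.eye(9)[0], np.eye(9)[1]
print(cosine(king_1hot, queen_1hot))  # 0.0 -- no notion of similarity

# Distributed: similarity is graded (queen's values are made up here).
king = np.array([0.9457, 0.5774, 0.2224])
queen = np.array([0.9100, 0.6000, 0.2500])
print(cosine(king, queen))  # close to 1.0
```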

  19. Awesome 3D viz in TensorBoard

  20. How to come up with the vectors?

  21. Co-occurrence matrix
    ... and the cute kitten purred and then ...
    ... the cute furry cat purred and miaowed ...
    ... that the cute kitten miaowed and she ...
    ... the loud furry dog ran and bit ...
    kitten context words: [cute, purred, miaowed].
    cat context words: [cute, furry, miaowed].
    dog context words: [loud, furry, ran, bit].
    Credit: Ed Grefenstette https://github.com/oxford-cs-deepnlp-2017/

  22. Co-occurrence matrix
    kitten context words: [cute, purred, miaowed].
    cat context words: [cute, furry, miaowed].
    dog context words: [loud, furry, ran, bit].
    Credit: Ed Grefenstette https://github.com/oxford-cs-deepnlp-2017/

    X =
            cute  furry  bit
    kitten     2      0    0
    cat        1      1    0
    dog        0      1    1
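
A minimal sketch of how such counts are collected from the toy sentences, assuming a symmetric context window of 2 (the slides do not state the window size, and they list only the content words):

```python
from collections import Counter, defaultdict

sentences = [
    "and the cute kitten purred and then".split(),
    "the cute furry cat purred and miaowed".split(),
    "that the cute kitten miaowed and she".split(),
    "the loud furry dog ran and bit".split(),
]

window = 2  # symmetric context window; an assumption, not from the slides
cooc = defaultdict(Counter)
for sent in sentences:
    for i, word in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if j != i:
                cooc[word][sent[j]] += 1

print(cooc["kitten"]["cute"])  # 2, matching the matrix above
```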

  23. Dimensionality reduction
    More precisely, in word2vec U*V approximates PMI(X) - log n; in GloVe, log(X).

            cute  furry  bit  ...
    kitten     2      0    0  ...
    cat        1      1    0  ...
    dog        0      1    1  ...
    ...      ...    ...  ...  ...

    X = U * V
    Dims: vocab x vocab = (vocab x small) * (small x vocab)
    The first row of U is the word embedding of “kitten”.
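
To make the shapes concrete, here is one way to factorise the toy X with truncated SVD (this is the SVD/LSI route from the “many ways” list later in the talk; word2vec reaches a similar factorisation by training rather than by SVD):

```python
import numpy as np

# Toy co-occurrence matrix: rows kitten/cat/dog, columns cute/furry/bit.
X = np.array([[2, 0, 0],
              [1, 1, 0],
              [0, 1, 1]], dtype=float)

u, s, vt = np.linalg.svd(X)
k = 2                      # the "small" inner dimension
U = u[:, :k] * s[:k]       # vocab x small: one row per word
V = vt[:k, :]              # small x vocab: one column per context
print(U[0])                # 2-d embedding of "kitten" (first row of U)
print(np.round(U @ V, 2))  # rank-2 approximation of X
```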

  24. Dimensionality reduction
    More precisely, U_word * V_context approximates PMI(X) - log n, where n is the negative-sampling parameter.

    Co-occurrence score in word2vec = U_word * V_context
    Dims: count = (small vector) * (small vector)
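
A worked toy example of the quantity being approximated (numbers are illustrative, not from the talk): Levy and Goldberg (2014) showed that skip-gram with negative sampling implicitly factorises the PMI matrix shifted by log n.

```python
import numpy as np

X = np.array([[2, 0, 0],
              [1, 1, 0],
              [0, 1, 1]], dtype=float)  # toy counts from slide 22
n_neg = 5  # negative-sampling parameter n; 5 is a common word2vec default

total = X.sum()
p_w = X.sum(axis=1, keepdims=True) / total  # P(word)
p_c = X.sum(axis=0, keepdims=True) / total  # P(context)
with np.errstate(divide="ignore"):
    pmi = np.log((X / total) / (p_w * p_c))  # -inf where count is 0
print(np.round(pmi - np.log(n_neg), 2))  # target for U_word * V_context
```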

  25. FastText: a word is the sum of its parts
    Co-occurrence score in FastText = sum over all subwords of w of (U_subword * V_context)
    going = go + oi + in + ng + goi + oin + ing
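
A minimal sketch of the subword decomposition shown above. Note that real FastText wraps the word in boundary markers (“<going>”) and defaults to n-grams of length 3 to 6; bare 2- and 3-grams are used here to match the slide.

```python
def char_ngrams(word, n_min=2, n_max=3):
    """All character n-grams of `word`, shortest first."""
    return [word[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(word) - n + 1)]

print(char_ngrams("going"))
# ['go', 'oi', 'in', 'ng', 'goi', 'oin', 'ing']
```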

  26. FastText is better than word2vec because of morphology
    Credit: Takahiro Kubo http://qiita.com/icoxfog417/items/42a95b279c0b7ad26589
    Slower, because there are many more vectors to consider!

  27. FastText is better at interchangeable words
    [Charts: related-words accuracy, interchangeable-words accuracy, and training time]

  28. FastText Gensim Wrapper
    Same API as word2vec.
    Out-of-vocabulary words can also be used, provided they have
    at least one character n-gram present in the training data.
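
The talk used Gensim's wrapper around Facebook's fastText binary; current Gensim ships a native implementation with the same query interface. A minimal sketch with Gensim 4.x parameter names and a toy corpus:

```python
from gensim.models import FastText

sentences = [
    ["the", "cute", "kitten", "purred"],
    ["the", "cute", "furry", "cat", "miaowed"],
    ["the", "loud", "furry", "dog", "ran"],
]

model = FastText(sentences=sentences, vector_size=32, window=3,
                 min_count=1, epochs=20)

print(model.wv.most_similar("cat", topn=3))

# Out-of-vocabulary query: "kittens" never occurs in the corpus, but it
# shares character n-grams with "kitten", so it still gets a vector.
print(model.wv["kittens"][:5])
```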

  29. FastText is slower than Python word2vec,
    even without n-grams...
    But Python is slower than C, right? :)

  30. Many ways to get a vector for a word
    - Word2vec
    - FastText
    - WordRank
    - Factorise the co-occurrence matrix: SVD/LSI
    - GloVe
    - EigenWords
    - VarEmbed

  31. WordRank is a Ranking Algorithm
    Word2vec
    Input: context “cute”
    Output: word “kitten”
    A classification problem.
    WordRank
    Input: context “cute”
    Output: ranking
    1. kitten
    2. cat
    3. dog
    Robust: a mistake at the top of the ranking costs more than one at the bottom.

  32. The Gensim WordRank API is the same as for word2vec and FastText
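
What “same API” means in practice: once a model has written its vectors in word2vec text format, the same Gensim calls work regardless of which algorithm produced them. The file name below is hypothetical:

```python
from gensim.models import KeyedVectors

# Vectors exported by WordRank (or word2vec, or FastText) in
# word2vec text format; "wordrank_vectors.txt" is a placeholder path.
wv = KeyedVectors.load_word2vec_format("wordrank_vectors.txt")

print(wv.most_similar("king", topn=5))
print(wv.similarity("coast", "shore"))
```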

  33. Algorithm   Train time (sec)   Passes through corpus   Related accuracy   Related (WS-353)   Interchangeable accuracy   Interchangeable (SimLex-999)
    Word2Vec      18                 6                       4.69               0.37               2.77                       0.17
    FastText      50                 6                       6.57               0.36               36.95                      0.13
    WordRank      4 hrs              91                      15.26              0.39               4.23                       0.09

    Evaluation on a 1 mln word corpus

  34. Algorithm   Train time (sec)   Passes through corpus   Related accuracy   Related (WS-353)   Interchangeable accuracy   Interchangeable (SimLex-999)
    Word2Vec      402                6                       40.34              41.48              0.69                       41.1
    FastText      942                6                       25.75              57.33              0.67                       45.2
    WordRank      ~42 hours          91                      54.52              39.83              0.71                       44.7

    Evaluation on a 17 mln word corpus

  35. Algorithm             Related accuracy   Related (WS-353)   Interchangeable accuracy   Interchangeable (SimLex-999)
    Word2Vec, window = 2    21                 0.63               36                         0.33
    Word2Vec, window = 15   40                 0.69               40                         0.31
    Word2Vec, window = 50   42                 0.68               34                         0.26

    Big window means less “interchangeable”
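
A sketch of the experiment behind this table, assuming a pre-tokenised corpus file (the path is hypothetical) and otherwise-identical skip-gram settings, varying only `window`:

```python
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

corpus = LineSentence("corpus.txt")  # one tokenised sentence per line

for window in (2, 15, 50):
    model = Word2Vec(corpus, vector_size=100, window=window,
                     sg=1, min_count=5, epochs=6)
    print(window, model.wv.most_similar("king", topn=3))
```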

  36. How to get the similarity you need

    My similar words must be:       Related                             Interchangeable
    I want to describe the word’s:  Topic                               Function
    I want to:                      Know what a doc is about            Recognize names
    Then I should run:              WordRank (even on a small corpus,   Word2vec skip-gram with a small
                                    1m words), or Word2vec skip-gram    window, or FastText, or VarEmbed
                                    with a big window (needs a large
                                    corpus, >5m words)

  37. How to get the similarity you need
    “Similar words” are:
    - Interchangeable → FastText, Word2vec with a small context window, or VarEmbed
    - Related → got only about a million words?
      - Yes → WordRank
      - No, much more → Word2vec with a large context window

  38. Thanks for listening!
    Lev Konstantinovskiy
    github.com/tmylk
    @teagermylk
    Coding sprint today 13:30 - 15:30
    “Learn NLP by running Gensim tutorials”
    “Turn your FullFact project into a Gensim
    tutorial”
    Fix an easy bug on GitHub.
