Get the text similarity you need with word embeddings

A 5-minute talk at PyData London on 7 Feb.

Lev Konstantinovskiy

February 08, 2017

Transcript

  1. Get the word similarity you need
    Lev Konstantinovskiy
    Community Manager at Gensim
    @teagermylk
    http://rare-technologies.com/

  2. Streaming
    We turn NLP papers into industrial Python code.

  3. Credits
    Parul Sethi
    Undergraduate student
    University of Delhi, India
    RaReTech Incubator program
    Added WordRank to Gensim
    http://rare-technologies.com/incubator/

  4. Business Problems

  5. Business Problems
    “What does Elizabeth think about Mr Darcy?”
    “Male characters in Pride and Prejudice?”

  6. Two Different
    Business Problems
    1) What words are in the topic of “Darcy”?
    2) What are the Named Entities in the text?

  7. P&P is only 120k words

  8. Closest word to “king”?
    Trained on Wikipedia 17m words
    [Table of closest words, with columns: Attribute, Interchangeable, Both]
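
"Closest word" in an embedding model usually means the word whose vector has the highest cosine similarity to the query vector. A minimal sketch of that lookup follows. The function names and the 3-dimensional toy vectors are made up for illustration and do not come from the talk or from any trained model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def closest_words(query, vectors, topn=3):
    """Rank every other word by cosine similarity to `query`."""
    q = vectors[query]
    scored = [(w, cosine(q, v)) for w, v in vectors.items() if w != query]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]


# Hypothetical toy vectors, invented for this example only.
toy_vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.75, 0.2],
    "throne": [0.7, 0.2, 0.6],
    "apple": [0.1, 0.1, 0.9],
}

for word, score in closest_words("king", toy_vectors, topn=2):
    print(word, round(score, 3))
```

Which neighbours come out on top with real vectors depends on how the embedding was trained, which is the point the later slides make.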

  9. Tensorflow has awesome viz!

  10. How to get the similarity you need
    If my similar words must be Associated:
      I want to describe the word's Topic
      I want to know what a doc is about
      Then I should run WordRank (even on a small corpus, 1m words)
      or Word2vec skipgram with a big window (needs a large corpus, >5m words)
    If my similar words must be Interchangeable:
      I want to describe the word's Function
      I want to recognize names
      Then I should run Word2vec skipgram with a small window
      or FastText
      or VarEmbed
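
The window size matters because it decides which neighbours a skip-gram model sees for each word. The sketch below only generates the (center, context) training pairs for a given window. It is not a Word2vec implementation, and it omits details that real implementations add, such as subsampling and randomly shrunk windows. The function names and the example sentence are assumptions made for illustration.

```python
def skipgram_pairs(tokens, window):
    """Yield (center, context) pairs for a symmetric context window."""
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, tokens[j]


# Illustrative sentence, not taken from the talk.
sentence = "mr darcy soon drew the attention of the room".split()


def contexts_of(word, window):
    """Distinct context words seen for `word` at this window size."""
    pairs = skipgram_pairs(sentence, window)
    return sorted({ctx for center, ctx in pairs if center == word})


small = contexts_of("darcy", window=1)  # immediate neighbours only
big = contexts_of("darcy", window=5)    # much of the sentence

print("window=1:", small)
print("window=5:", big)
```

With a small window the model sees only the immediate neighbours, which tends to group words that can replace one another. A big window pulls in more of the surrounding sentence, which tends to group words on the same topic. In Gensim's `Word2Vec`, the `sg=1` and `window` parameters select skip-gram and the window size.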

  11. Rare and Frequent words are
    incomprehensible

  12. Thanks!
    Lev Konstantinovskiy
    github.com/tmylk
    @teagermylk
    Gensim T-shirt question:
    How many words are in
    Pride and Prejudice?
