Get the text similarity you need with word embeddings


A 5-minute talk at PyData London on 7 Feb.


Lev Konstantinovskiy

February 08, 2017

Transcript

  1. Get the word similarity you need. Lev Konstantinovskiy, Community Manager at Gensim. @teagermylk http://rare-technologies.com/
  2. Streaming We turn NLP papers into industrial Python code.

  3. Credits: Parul Sethi, undergraduate student, University of Delhi, India. RaReTech Incubator program. Added WordRank to Gensim. http://rare-technologies.com/incubator/
  4. Business Problems

  5. Business Problems: “What does Elizabeth think about Mr Darcy?” “Male characters in Pride and Prejudice?”
  6. Two Different Business Problems: 1) What words are in the topic of “Darcy”? 2) What are the Named Entities in the text?
  7. P&P is only 120k words

  8. Closest word to “king”? Trained on Wikipedia, 17m words. Attribute | Interchangeable | Both
  9. Tensorflow has awesome viz!

  10. How to get the similarity you need

    My similar words must be: Associated | Interchangeable
    I want to describe the word’s: Topic | Function
    I want to: Know what doc is about | Recognize names
    Then I should run: WordRank (even on a small corpus, 1m words) or Word2vec skipgram with a big window (needs a large corpus, >5m words) | Word2vec skipgram with a small window, or FastText, or VarEmbed
  11. Rare and Frequent words are incomprehensible

  12. Thanks! Lev Konstantinovskiy, github.com/tmylk, @teagermylk. Gensim T-shirt question: How many words are in Pride and Prejudice?
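
Slide 10 contrasts a big skipgram window (associated, topical words) with a small one (interchangeable words). The sketch below is not from the talk and is not Gensim code. It is a plain-Python illustration, assuming a simplified skipgram that pairs each word with every neighbour inside a fixed window. The function name `skipgram_pairs` and the example sentence are made up for illustration, and real implementations may additionally shrink the window at random or subsample frequent words.

```python
def skipgram_pairs(tokens, window):
    """Return (target, context) pairs for all words within `window` positions."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs


# Hypothetical toy sentence, lowercased and split on whitespace.
sentence = "mr darcy danced only once with mrs hurst".split()

small = skipgram_pairs(sentence, window=1)  # immediate neighbours only
big = skipgram_pairs(sentence, window=5)    # most of the sentence

# Context words the model would see for the target "darcy".
darcy_small = sorted({c for t, c in small if t == "darcy"})
darcy_big = sorted({c for t, c in big if t == "darcy"})

print(darcy_small)  # ['danced', 'mr']
print(darcy_big)    # ['danced', 'mr', 'mrs', 'once', 'only', 'with']
```

With the small window, "darcy" is described only by the words right next to it, which tends to group words that play the same role in a sentence. With the big window, it is described by most of the sentence, which tends to group words that share a topic. That is one way to read the two columns on slide 10.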