Get the text similarity you need with word embeddings

A 5-minute talk at PyData London on 7 Feb.

Lev Konstantinovskiy

February 08, 2017

  1. Get the word similarity you need. Lev Konstantinovskiy, Community Manager at Gensim, @teagermylk, http://rare-technologies.com/
  2. Credits: Parul Sethi, undergraduate student, University of Delhi, India. RaReTech Incubator program. Added WordRank to Gensim. http://rare-technologies.com/incubator/
  3. Two Different Business Problems: 1) What words are in the topic of “Darcy”? 2) What are the Named Entities in the text?
  4. How to get the similarity you need

    My similar words must be: Associated | Interchangeable
    I want to describe the word’s: Topic | Function
    I want to: Know what the doc is about | Recognize names
    Then I should run: WordRank (even on a small corpus, 1m words) or Word2vec skip-gram with a big window (needs a large corpus, >5m words) | Word2vec skip-gram with a small window, or FastText, or VarEmbed
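
The big-window versus small-window choice on the last slide can be made concrete with a small sketch. The code below is not from the talk and does not use Gensim; it is a simplified, pure-Python illustration of which (center, context) pairs a skip-gram model would see for a fixed symmetric window. The function name `skipgram_pairs` and the toy sentence are hypothetical, and real Word2vec implementations add details (such as randomly shrinking the window) that are left out here. In Gensim's `Word2Vec`, the corresponding settings are the `window` parameter and `sg=1` for skip-gram.

```python
def skipgram_pairs(tokens, window):
    """Return (center, context) pairs for a fixed symmetric window.

    Simplified sketch: every token within `window` positions of the
    center word, on either side, counts as a context word.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical toy sentence, chosen only for illustration.
sentence = "mr darcy danced with elizabeth at the ball".split()

small = skipgram_pairs(sentence, window=1)
big = skipgram_pairs(sentence, window=5)

# Context words seen for "darcy" under each window size.
darcy_small = sorted({ctx for word, ctx in small if word == "darcy"})
darcy_big = sorted({ctx for word, ctx in big if word == "darcy"})

print("window=1:", darcy_small)
print("window=5:", darcy_big)
```

With a window of 1, "darcy" is paired only with its immediate neighbours, which mostly reflect how the word functions in a sentence. With a window of 5, it is also paired with more distant words such as "elizabeth", which reflect what the text is about. This matches the slide's split between interchangeable (function) and associated (topic) similarity.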