Lev Konstantinovskiy

February 08, 2017

Get the text similarity you need with word embeddings

A 5 minute talk at PyData London on 7 Feb.

Transcript

Get the word similarity you need. Lev Konstantinovskiy, Community Manager at Gensim. @teagermylk, http://rare-technologies.com/
Streaming We turn NLP papers into industrial Python code.
Credits: Parul Sethi, undergraduate student, University of Delhi, India. RaReTech Incubator program. Added WordRank to Gensim. http://rare-technologies.com/incubator/
Business Problems
Business Problems: “What does Elizabeth think about Mr Darcy?” “Male characters in Pride and Prejudice?”
Two Different Business Problems: 1) What words are in the topic of “Darcy”? 2) What are the Named Entities in the text?
P&P is only 120k words
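Corpus size drives the choice of algorithm later in the talk, so it helps to measure it. A minimal sketch in Python, assuming a simple regex tokenizer; the `count_words` helper and the sample sentence are illustrative and not from the talk:

```python
import re


def count_words(text):
    """Count word tokens: runs of letters, optionally with one internal apostrophe."""
    return len(re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text))


# Opening sentence of the novel, used here only as sample input.
sample = (
    "It is a truth universally acknowledged, that a single man in "
    "possession of a good fortune, must be in want of a wife."
)
print(count_words(sample))
```

Running the same function over the full text of the novel gives the corpus size to compare against the thresholds on the later slide; the exact total depends on the tokenizer.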
Closest word to “king”? Trained on Wikipedia, 17m words. Attribute, Interchangeable, Both.
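"Closest word" here usually means the word whose vector has the highest cosine similarity to the query vector. A minimal sketch of that ranking, assuming made-up 3-dimensional vectors (the `toy_vectors` values and the helper names are hypothetical; real embeddings would come from a trained model, where Gensim's `most_similar` performs a comparable cosine ranking):

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def closest_word(word, vectors):
    """Return the other word whose vector is most similar to `word`'s vector."""
    target = vectors[word]
    candidates = (w for w in vectors if w != word)
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


# Hypothetical vectors, invented for illustration only.
toy_vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.8, 0.9, 0.1],
    "throne": [0.7, 0.2, 0.6],
    "banana": [0.0, 0.1, 0.9],
}
print(closest_word("king", toy_vectors))
```

Whether the nearest neighbour turns out to be an attribute of the word or an interchangeable word depends on how the vectors were trained, which is the point of the slide.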
TensorFlow has awesome viz!
How to get the similarity you need
My similar words must be: Associated | Interchangeable
I want to describe the word’s: Topic | Function
I want to: Know what doc is about | Recognize names
Then I should run: WordRank (even on small corpus, 1m words) or Word2vec skipgram big window (needs large corpus, >5m words) | Word2vec skipgram small window or FastText or VarEmbed
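The decision on the slide above can be sketched as a small lookup. This is an illustration only: `recommend_embedding` and its argument names are hypothetical, and the only threshold encoded is the slide's ">5m words" for big-window skipgram. In Gensim's `Word2Vec`, skipgram is selected with `sg=1` and the context size with the `window` parameter; the talk does not give concrete window values, so none are assumed here.

```python
def recommend_embedding(similarity, corpus_words):
    """Suggest algorithms following the slide's table.

    similarity: "associated" (topic) or "interchangeable" (function).
    corpus_words: approximate number of words in the training corpus.
    """
    if similarity == "interchangeable":
        return ["Word2vec skipgram (small window)", "FastText", "VarEmbed"]
    if similarity == "associated":
        # The slide lists WordRank as usable even on a small corpus.
        options = ["WordRank"]
        # The slide says big-window skipgram needs a large corpus (>5m words).
        if corpus_words > 5_000_000:
            options.append("Word2vec skipgram (big window)")
        return options
    raise ValueError("similarity must be 'associated' or 'interchangeable'")


# Pride and Prejudice is about 120k words according to the earlier slide.
print(recommend_embedding("associated", 120_000))
print(recommend_embedding("interchangeable", 120_000))
```

Returning a list rather than a single name keeps the "or" from the slide: the choice among the remaining options is left to the reader.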
Rare and Frequent words are incomprehensible
Thanks! Lev Konstantinovskiy, github.com/tmylk, @teagermylk. Gensim T-shirt question: How many words are in Pride and Prejudice?