Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Filtering n-grams using Machine Learning
Search
vorushin
April 06, 2012
Programming
590
2
Share
Filtering n-grams using Machine Learning
My lightning talk from first Kiev AI/NLP group meeting.
vorushin
April 06, 2012
Other Decks in Programming
See All in Programming
Smarter Angular mit Transformers.js & Prompt API
christianliebel
PRO
1
100
Redox OS でのネームスペース管理と chroot の実現
isanethen
0
480
CS教育のDX AIによる育成の効率化
niftycorp
PRO
0
170
仕様漏れ実装漏れをなくすトレーサビリティAI基盤のご紹介
orgachem
PRO
8
3.6k
Geminiをパートナーに神社DXシステムを個人開発した話(いなめぐDX 開発振り返り)
fujiba
0
120
AI 開発合宿を通して得た学び
niftycorp
PRO
0
180
脱 雰囲気実装!AgentCoreを良い感じにWEBアプリケーションに組み込むために
takuyay0ne
3
420
AIと共にエンジニアとPMの “二刀流”を実現する
naruogram
0
110
「接続」—パフォーマンスチューニングの最後の一手 〜点と点を結ぶ、その一瞬のために〜
kentaroutakeda
4
2.2k
ポーリング処理廃止によるイベント駆動アーキテクチャへの移行
seitarof
3
1.3k
Goの型安全性で実現する複数プロダクトの権限管理
ishikawa_pro
2
1.4k
我々はなぜ「層」を分けるのか〜「関心の分離」と「抽象化」で手に入れる変更に強いシンプルな設計〜 #phperkaigi / PHPerKaigi 2026
shogogg
2
720
Featured
See All Featured
The Director’s Chair: Orchestrating AI for Truly Effective Learning
tmiket
1
140
Practical Tips for Bootstrapping Information Extraction Pipelines
honnibal
25
1.8k
Build your cross-platform service in a week with App Engine
jlugia
234
18k
The #1 spot is gone: here's how to win anyway
tamaranovitovic
2
1k
The Invisible Side of Design
smashingmag
302
51k
Collaborative Software Design: How to facilitate domain modelling decisions
baasie
0
180
My Coaching Mixtape
mlcsv
0
90
Building Better People: How to give real-time feedback that sticks.
wjessup
370
20k
Put a Button on it: Removing Barriers to Going Fast.
kastner
60
4.2k
16th Malabo Montpellier Forum Presentation
akademiya2063
PRO
0
87
Music & Morning Musume
bryan
47
7.1k
Designing for Timeless Needs
cassininazir
0
180
Transcript
Filtering n-‐grams using Machine Learning
Unsorted unigrams. 13M closetohome CMX309FLC AZ3
Lehanga indexterm.endofrang NIC3 N1NB Mirabadi phantomd ANOTHER.EXAMPLE awful63 Zabolotsky Dispencer cremonesi kind.The ECOOP'97 4.499E OrbitzSaver jellying ENr313 paulxcs Campaoré überschreibt PüZmann nomalized Profesje Blogzerk imnot getPluginPreferencesF lag backgroundCorrect DEDeutschland at'ai
Filtered with regexps, 10M closetohome lehanga Mirabadi
phantomd Zabolotsky Dispencer cremonesi 0 jellying paulxcs Campaoré überschreibt PüZmann nomalized Profesje Blogzerk imnot DEDeutschland at'ai
Filtered with SVM, 2.5M closetohome lehanga Mirabadi
phantomd Zabolotsky Dispencer cremonesi 0 jellying paulxcs nomalized Profesje Blogzerk imnot
Data • Good data: wikaonary words •
Bad data: words filtered out by regexps • Features – length of word – count of uppercase chars (excluding first one) – count of non-‐alpha chars – probability of word given 2-‐char n-‐grams – unigram frequency
Details • scikit-‐learn – python library for machine
learning • SVM with Gaussian kernel • O(# of features * N2) – O(# of features * N3) • 100k items in training data => 5 min on 2 Ghz • F1 = 0.98
Thank you! Roman Vorushin, Grammarly Inc.
hZp://vorushin.ru hZp://twiZer.com/vorushin