Missspell Detection
bk
February 10, 2020
Science
Transcript
Detecting Misspelled Strings with Edit Distance: Levenshtein Distance and Jaro-Winkler Distance

Contents
1. The problem
2. What I built
3. Edit distance
4. Results
5. Summary
6. References
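
1. The problem
• We hold an inventory of brand-name goods and list them on a flea-market site (flea) by hand.
• We want to reduce the effort of listing titles such as "GUCCI Tote Bag Black Leather".
• A title can go out with the brand misspelled, e.g. "GUCCHI Tote Bag Black Leather".
• Penalties for a misspelled brand name: the listing is cancelled, the seller rating drops, the account is suspended.
• The ask: "solve it somehow with AI", using Python and natural language processing.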

2. What I built
• Input: a list of listing titles, e.g. "GUCCHI Tote Bag Black Leather", ...
• Split each title into words to get a list of listing words: GUCCHI, Tote, Bag, Black, Leather.
• Prepare a list of correct brand names: GUCCI, VUITTON, ...
• Compare the two lists brute-force (every listing word against every brand name) and output the words that look similar.
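
The brute-force comparison described above could be sketched roughly as follows. This is only an illustration under assumptions that are not in the slides: titles are split on whitespace, `find_suspects` and `threshold=0.8` are hypothetical, and the standard library's `difflib.SequenceMatcher` ratio stands in for the edit distances so the sketch runs without extra packages.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; NOT one of the deck's edit distances."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()


def find_suspects(titles, brands, threshold=0.8):
    """Compare every title word with every correct brand name.

    Returns (title, word, brand, score) for words that resemble a brand
    name without being identical to one. The threshold is an assumption.
    """
    brand_set = {b.upper() for b in brands}
    suspects = []
    for title in titles:
        for word in title.split():
            if word.upper() in brand_set:
                continue  # spelled correctly, nothing to report
            for brand in brands:
                score = similarity(word, brand)
                if score >= threshold:
                    suspects.append((title, word, brand, round(score, 2)))
    return suspects


titles = ["GUCCHI Tote Bag Black Leather", "GUCCI Tote Bag Black Leather"]
brands = ["GUCCI", "VUITTON"]
suspects = find_suspects(titles, brands)
for title, word, brand, score in suspects:
    print(f"{word!r} in {title!r} looks like {brand!r} (score {score})")
```

Swapping `similarity` for a Levenshtein or Jaro-Winkler function leaves the loop unchanged; only the direction of the threshold test differs, because a smaller Levenshtein value means closer.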

3. Edit distance
Two measures are used to compare a word such as GUCCHI with a brand name such as GUCCI:
1. Levenshtein distance
2. Jaro-Winkler distance

Levenshtein distance
Take one string and the string to compare it with, and edit characters until the two match. The operations are substitution, deletion, and insertion; the number of operations is the distance.
Example: turning GUTTI into GUCCI takes two substitutions (T→C twice), so the distance is 2.

Original string   Compared string   Edit operation   Edits (distance)
GUTTI             GUCCI             substitution     2
GUCCHI            GUCCI             deletion         1
GUCI              GUCCI             insertion        1
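
To make the operation counting concrete, here is a minimal sketch of the textbook dynamic-programming computation of the Levenshtein distance. The function name `levenshtein` is chosen for illustration. This is not the python-Levenshtein package used for the deck's results.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, deletions and insertions turning a into b."""
    # prev[j] is the distance between the previous prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute, or keep if equal
        prev = curr
    return prev[-1]


for original in ("GUTTI", "GUCCHI", "GUCI"):
    print(original, "GUCCI", levenshtein(original, "GUCCI"))
```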

Jaro-Winkler distance
The Jaro-Winkler distance builds on the Jaro distance, which measures how well two strings partially match. A larger value means the strings are closer.

$$D_j = \frac{1}{3}\left(\frac{m}{|s_1|} + \frac{m}{|s_2|} + \frac{m - t/2}{m}\right)$$

• |s_1|, |s_2|: lengths of the two strings
• m: number of matching characters found within the search window
• t: number of matched characters that are out of order (transposed)

m: matching characters within the window
The window width is

$$\frac{\max(|s_1|, |s_2|)}{2} - 1$$

For the original string GCCUHI (length 6) and the compared string GUCCI (length 5) this gives max(6, 5)/2 − 1 = 2. Each character of GCCUHI is searched for in GUCCI at most 2 positions away and counted when a match is found. G, C, C, U and I match and H does not, so m = 5.

t: transposed matching characters
Extract only the matched characters: GCCUI from the original string and GUCCI from the compared string. t is the number of characters that would have to be replaced to make the two identical. The second and fourth characters differ (C vs U, U vs C), so t = 2.

Substituting into the formula:

$$D_j = \frac{1}{3}\left(\frac{5}{6} + \frac{5}{5} + \frac{5 - 2/2}{5}\right) = \frac{79}{90} = 0.8777\ldots$$
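
The calculation above can be checked with a short sketch. It assumes the common greedy matching convention (each character takes the first unused match inside the window) and uses `fractions.Fraction` so the result can be compared exactly with 79/90. The function name `jaro` is illustrative and is not the API of the jaro-winkler-distance package.

```python
from fractions import Fraction


def jaro(s1: str, s2: str) -> Fraction:
    """Jaro score D_j as an exact fraction (larger means closer)."""
    if not s1 or not s2:
        return Fraction(0)
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used = [False] * len(s2)  # characters of s2 that are already matched
    matched1 = []             # matched characters of s1, in s1 order
    for i, ch in enumerate(s1):
        lo, hi = max(0, i - window), min(len(s2), i + window + 1)
        for j in range(lo, hi):
            if not used[j] and s2[j] == ch:
                used[j] = True
                matched1.append(ch)
                break
    m = len(matched1)
    if m == 0:
        return Fraction(0)
    # Matched characters of s2, in s2 order.
    matched2 = [c for c, u in zip(s2, used) if u]
    # t: positions where the two matched sequences disagree.
    t = sum(a != b for a, b in zip(matched1, matched2))
    # (m - t/2) / m is written as (2m - t) / (2m) to stay in integers.
    return (Fraction(m, len(s1)) + Fraction(m, len(s2))
            + Fraction(2 * m - t, 2 * m)) / 3


print(jaro("GCCUHI", "GUCCI"))  # 79/90
```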

The Jaro-Winkler distance then adds weight for matching leading characters:

$$D_{jw} = D_j + l \cdot \frac{1}{10} \cdot (1 - D_j)$$

• D_j: Jaro distance
• l: number of matching characters counted from the start of the strings (l ≤ 4)

For GCCUHI and GUCCI only the leading G matches, so l = 1:

$$D_{jw} = \frac{79}{90} + 1 \cdot \frac{1}{10} \cdot \left(1 - \frac{79}{90}\right) = \frac{801}{900} = 0.89$$
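
As a small check of the prefix weighting, the sketch below applies the formula to a Jaro score that has already been computed. The helper names are hypothetical, and the Jaro value 79/90 is taken from the worked example rather than recomputed here.

```python
from fractions import Fraction


def common_prefix_length(s1: str, s2: str, limit: int = 4) -> int:
    """Number of matching leading characters l, capped at limit."""
    l = 0
    for a, b in zip(s1, s2):
        if a != b or l == limit:
            break
        l += 1
    return l


def jaro_winkler(s1: str, s2: str, dj: Fraction) -> Fraction:
    """Apply the prefix bonus to an already computed Jaro score dj."""
    l = common_prefix_length(s1, s2)
    return dj + l * Fraction(1, 10) * (1 - dj)


djw = jaro_winkler("GCCUHI", "GUCCI", Fraction(79, 90))
print(djw, float(djw))  # 89/100 0.89
```

Because the bonus scales with (1 − D_j), a shared prefix lifts a low score more than a high one, and the result never exceeds 1.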

Comparison of the two measures

Original string   Compared string   Levenshtein*   Jaro-Winkler**
GUCCHI            GUCCI             1              0.97
GUTTI             GUCCI             2              0.79
GCCUHI            GUCCI             3              0.89
グッチ裕三         GUCCI             5              0.00

Levenshtein: the smaller the value, the closer the strings. Jaro-Winkler: the larger the value, the closer the strings.
Jaro-Winkler rewards the presence of matching characters, so the two measures rank closeness differently: GCCUHI is farther from GUCCI than GUTTI by Levenshtein, but closer by Jaro-Winkler.

* https://github.com/ztane/python-Levenshtein/
** https://github.com/nap/jaro-winkler-distance

4. Results
The script (.py) seems to be running, so that will do.
* https://github.com/bk-18/Misspelled-Brand-Name-Detector
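
5. Summary
• The problem: brand names misspelled at listing time
• Misspelling detection by brute-force comparison of the lists
• Levenshtein distance
• Jaro-Winkler distance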

6. References
• "Quantifying the similarity of two strings: an explanation of Levenshtein distance and Jaro-Winkler distance", 人工知能であそぶ, http://nkdkccmbr.hateblo.jp/entry/2016/08/18/102727
• "Edit distance (Levenshtein Distance)", naoyaのはてなダイアリー, https://naoya-2.hatenadiary.org/entry/20090329/1238307757
• "Evaluating string similarity: Levenshtein distance / Jaro-Winkler distance", 人工知能してみる, http://grahamian.hatenablog.com/entry/word_similarity
• Yaoshu Wang, Jianbin Qin, and Wei Wang: Efficient Approximate Entity Matching Using Jaro-Winkler Distance, University of New South Wales, http://qinjianbin.com/files/wise2017-wang.pdf

ENJOY! ENJAY! EMJOY! ENJOI! ENZYOI!