Slide 1

Slide 1 text

Detecting Misspelled Strings with Edit Distance: Levenshtein Distance and Jaro-Winkler Distance

Slide 2

Slide 2 text

Contents
1. The problem (slides 3-10)
2. What I built (slides 11-16)
3. Edit distance (slides 17-39)
4. Results (slides 40-41)
5. Summary (slide 42)
6. References (slide 43)

Slide 3

Slide 3 text

The problem

Slide 4

Slide 4 text

The problem: an inventory of brand-name goods

Slide 5

Slide 5 text

The problem: items are listed for sale on flea by hand

Slide 6

Slide 6 text

The problem: flea

Slide 7

Slide 7 text

The problem: GUCCI Tote Bag Black Leather (flea)
Goal: reduce the manual work of listing

Slide 8

Slide 8 text

The problem: GUCCHI Tote Bag Black Leather (flea)

Slide 9

Slide 9 text

The problem: GUCCHI Tote Bag Black Leather (flea)
Penalties for misspelled brand names:
• Listing cancelled
• Seller rating lowered
• Account suspended

Slide 10

Slide 10 text

The problem: "Fix it somehow with AI"
Python, natural language processing

Slide 11

Slide 11 text

What I built

Slide 12

Slide 12 text

What I built
Listing title list: GUCCHI Tote Bag Black Leather, ...

Slide 13

Slide 13 text

What I built
Listing title list: GUCCHI Tote Bag Black Leather, ...
Split into words
Listing word list: GUCCHI, Tote, Bag, Black, Leather

Slide 14

Slide 14 text

What I built
Listing title list: GUCCHI Tote Bag Black Leather, ...
Split into words
Listing word list: GUCCHI, Tote, Bag, Black, Leather
Correct brand name list: GUCCI, VUITTON, ...

Slide 15

Slide 15 text

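What I built
Listing title list: GUCCHI Tote Bag Black Leather, ...
Split into words
Listing word list: GUCCHI, Tote, Bag, Black, Leather
Correct brand name list: GUCCI, VUITTON, ...
Compare every listing word against every brand name (brute force) and output the words that look similar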

Slide 16

Slide 16 text

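What I built
Listing title list: GUCCHI Tote Bag Black Leather, ...
Split into words
Listing word list: GUCCHI, Tote, Bag, Black, Leather
Correct brand name list: GUCCI, VUITTON, ...
Compare every listing word against every brand name (brute force) and output the words that look similar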
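
The pipeline on these slides (split titles into words, compare every word with every brand name, report near misses) can be sketched as follows. This is only an illustration, not the author's code: the function name `find_suspect_words` and the 0.8 threshold are hypothetical, and the standard library's `difflib.SequenceMatcher` ratio stands in for the edit distances that the deck introduces later.

```python
from difflib import SequenceMatcher


def find_suspect_words(titles, brands, threshold=0.8):
    """Return (word, brand) pairs where a word resembles a brand but is not identical."""
    suspects = []
    for title in titles:
        for word in title.split():            # split the title into words
            for brand in brands:              # brute force: every word vs. every brand
                w, b = word.upper(), brand.upper()
                if w == b:
                    continue                  # correctly spelled, nothing to report
                # Stand-in similarity score (assumption); any edit distance could be used.
                score = SequenceMatcher(None, w, b).ratio()
                if score >= threshold:
                    suspects.append((word, brand))
    return suspects


titles = ["GUCCHI Tote Bag Black Leather"]
brands = ["GUCCI", "VUITTON"]
print(find_suspect_words(titles, brands))
```

The brute-force loop costs (number of words) × (number of brands) comparisons, which is acceptable as long as the brand list stays small.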

Slide 17

Slide 17 text

Edit distance

Slide 18

Slide 18 text

Edit distance
1. Levenshtein distance
2. Jaro-Winkler distance
Example pair: GUCCHI and GUCCI

Slide 19

Slide 19 text

Edit distance
1. Levenshtein distance (covered first)
2. Jaro-Winkler distance
Example pair: GUCCHI and GUCCI

Slide 20

Slide 20 text

Edit distance (Levenshtein distance)
Take a source string and a string to compare it with, then edit characters until the two match.

Slide 21

Slide 21 text

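Edit distance (Levenshtein distance)
Take a source string and a string to compare it with, then edit characters until the two match.
Operations: substitution, deletion, insertion
Number of operations = distance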

Slide 22

Slide 22 text

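Edit distance (Levenshtein distance): substitution
Source string: G U T T I
Compared string: G U C C I
Two substitutions (T to C, twice), so number of operations = distance = 2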

Slide 23

Slide 23 text

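Edit distance (Levenshtein distance)
Edit method | Source string | Compared string | Number of edits (distance)
Substitution | GUTTI | GUCCI | 2
Deletion | GUCCHI | GUCCI | 1
Insertion | GUCI | GUCCI | 1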
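
The three operations above can be counted with the usual dynamic-programming recurrence. The sketch below is a plain-Python illustration, not the python-Levenshtein library cited later in the deck; the function name `levenshtein` is chosen here for the example.

```python
def levenshtein(source: str, target: str) -> int:
    """Minimum number of substitutions, deletions and insertions turning source into target."""
    # previous[j] holds the distance between the processed prefix of source and target[:j].
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]  # turning source[:i] into the empty string takes i deletions
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,         # delete s_char
                current[j - 1] + 1,      # insert t_char
                previous[j - 1] + cost,  # substitute, or keep a matching character
            ))
        previous = current
    return previous[-1]


for source in ["GUTTI", "GUCCHI", "GUCI"]:
    print(source, "->", "GUCCI", levenshtein(source, "GUCCI"))
```

Keeping only two rows of the table is enough because each cell depends only on the current and the previous row.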

Slide 24

Slide 24 text

Edit distance
1. Levenshtein distance
2. Jaro-Winkler distance
Example pair: GUCCHI and GUCCI

Slide 25

Slide 25 text

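Edit distance (Jaro-Winkler distance)
Jaro distance: measures how well two strings partially match. A larger value means the strings are closer.
$D_j = \frac{1}{3}\left(\frac{m}{|s_1|} + \frac{m}{|s_2|} + \frac{m - \frac{t}{2}}{m}\right)$
$|s_1|, |s_2|$: lengths of the two strings
$m$: number of matching characters within the search window
$t$: number of substitutions needed among the matched characters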

Slide 26

Slide 26 text

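Edit distance (Jaro-Winkler distance)
Jaro distance: measures how well two strings partially match. A larger value means the strings are closer.
$D_j = \frac{1}{3}\left(\frac{m}{|s_1|} + \frac{m}{|s_2|} + \frac{m - \frac{t}{2}}{m}\right)$ (the term $m$ is highlighted)
$|s_1|, |s_2|$: lengths of the two strings
$m$: number of matching characters within the search window
$t$: number of substitutions needed among the matched characters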

Slide 27

Slide 27 text

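Edit distance (Jaro-Winkler distance)
$m$: number of matching characters within the search window
Window size: $\frac{\max(|s_1|, |s_2|)}{2} - 1$
Source string: GCCUHI, length 6
Compared string: GUCCI, length 5
$\frac{\max(6, 5)}{2} - 1 = 2$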

Slide 28

Slide 28 text

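Edit distance (Jaro-Winkler distance)
$m$: number of matching characters within the search window
Source string: G C C U H I
Compared string: G U C C I
For each character, search the window for a matching character; count every match found.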

Slide 29

Slide 29 text

Edit distance (Jaro-Winkler distance)
$m$: number of matching characters within the search window
Source string: G C C U H I
Compared string: G U C C I
$m = 5$

Slide 30

Slide 30 text

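Edit distance (Jaro-Winkler distance)
Measures how well two strings partially match. A larger value means the strings are closer.
$D_j = \frac{1}{3}\left(\frac{m}{|s_1|} + \frac{m}{|s_2|} + \frac{m - \frac{t}{2}}{m}\right)$ (the term $t$ is highlighted)
$|s_1|, |s_2|$: lengths of the two strings
$m$: number of matching characters within the search window
$t$: number of substitutions needed among the matched characters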

Slide 31

Slide 31 text

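Edit distance (Jaro-Winkler distance)
$t$: number of substitutions needed among the matched characters
Source string: G C C U H I
Compared string: G U C C I
Extract the matched characters:
Source string: G C C U I
Compared string: G U C C I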

Slide 32

Slide 32 text

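Edit distance (Jaro-Winkler distance)
$t$: number of substitutions needed among the matched characters, that is, how many characters must be replaced to make the two matched sequences identical
Source string: G C C U I
Compared string: G U C C I
$t = 2$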

Slide 33

Slide 33 text

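Edit distance (Jaro-Winkler distance)
$D_j = \frac{1}{3}\left(\frac{m}{|s_1|} + \frac{m}{|s_2|} + \frac{m - \frac{t}{2}}{m}\right) = \frac{1}{3}\left(\frac{5}{6} + \frac{5}{5} + \frac{5 - \frac{2}{2}}{5}\right) = \frac{79}{90} = 0.8777\ldots$
$|s_1|, |s_2|$: lengths of the two strings
$m$: number of matching characters within the search window
$t$: number of substitutions needed among the matched characters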
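
The window, $m$ and $t$ steps from the preceding slides can be put together in a short sketch. It follows the slides' definitions (window of max length / 2 - 1, $t$ counted as matched characters that differ position by position), uses integer division for the window as an assumption, and returns an exact `Fraction` so the result can be compared with 79/90. The function name `jaro` is illustrative and is not taken from the library cited in the deck.

```python
from fractions import Fraction


def jaro(s1: str, s2: str) -> Fraction:
    """Jaro similarity as defined on the slides; larger means closer."""
    if not s1 or not s2:
        return Fraction(0)
    # Search window: max(|s1|, |s2|) / 2 - 1 (integer division is an assumption).
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used = [False] * len(s2)   # characters of s2 that are already matched
    matched1 = []              # matched characters, in s1 order
    for i, ch in enumerate(s1):
        lo, hi = max(0, i - window), min(len(s2), i + window + 1)
        for j in range(lo, hi):
            if not used[j] and s2[j] == ch:
                used[j] = True
                matched1.append(ch)
                break
    matched2 = [s2[j] for j in range(len(s2)) if used[j]]  # same characters, in s2 order
    m = len(matched1)
    if m == 0:
        return Fraction(0)
    # t: positions where the two matched sequences disagree.
    t = sum(a != b for a, b in zip(matched1, matched2))
    # (m - t/2) / m is written as (2m - t) / (2m) to stay in exact arithmetic.
    return (Fraction(m, len(s1)) + Fraction(m, len(s2)) + Fraction(2 * m - t, 2 * m)) / 3


print(jaro("GCCUHI", "GUCCI"), float(jaro("GCCUHI", "GUCCI")))
```

Exact fractions are used only to make the comparison with the slide's hand calculation easy; a float version would work the same way.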

Slide 34

Slide 34 text

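Edit distance (Jaro-Winkler distance)
Jaro-Winkler distance: a match in the first few characters is given extra weight.
$D_{jw} = D_j + l \cdot \frac{1}{10} \cdot (1 - D_j)$
$D_j$: Jaro distance
$l$: number of matching characters counted from the start of the strings ($l \le 4$)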

Slide 35

Slide 35 text

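Edit distance (Jaro-Winkler distance)
Jaro-Winkler distance: a match in the first few characters is given extra weight.
$D_{jw} = D_j + l \cdot \frac{1}{10} \cdot (1 - D_j)$ (the term $l$ is highlighted)
$D_j$: Jaro distance
$l$: number of matching characters counted from the start of the strings ($l \le 4$)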

Slide 36

Slide 36 text

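Edit distance (Jaro-Winkler distance)
$l$: number of matching characters counted from the start of the strings ($l \le 4$)
Source string: G C C U H I
Compared string: G U C C I
Only the leading G matches, so $l = 1$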

Slide 37

Slide 37 text

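Edit distance (Jaro-Winkler distance)
$D_{jw} = D_j + l \cdot \frac{1}{10} \cdot (1 - D_j) = \frac{79}{90} + 1 \cdot \frac{1}{10} \cdot \left(1 - \frac{79}{90}\right) = \frac{801}{900} = 0.89$
$D_j$: Jaro distance
$l$: number of matching characters counted from the start of the strings ($l \le 4$)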
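
The prefix adjustment can be sketched separately from the Jaro computation. The helper names `common_prefix_length` and `winkler_adjust` are hypothetical; the Jaro value $D_j$ is passed in as an argument (here the slide's 79/90), and the 1/10 scaling factor and the cap of 4 come from the formula above.

```python
from fractions import Fraction


def common_prefix_length(s1: str, s2: str, limit: int = 4) -> int:
    """l: number of leading characters the two strings share, capped at `limit`."""
    length = 0
    for a, b in zip(s1, s2):
        if a != b or length == limit:
            break
        length += 1
    return length


def winkler_adjust(dj: Fraction, s1: str, s2: str, p: Fraction = Fraction(1, 10)) -> Fraction:
    """D_jw = D_j + l * p * (1 - D_j), with D_j supplied by the caller."""
    l = common_prefix_length(s1, s2)
    return dj + l * p * (1 - dj)


dj = Fraction(79, 90)  # Jaro value for GCCUHI vs. GUCCI, taken from the slides
print(float(winkler_adjust(dj, "GCCUHI", "GUCCI")))
```

Because the boost is proportional to $1 - D_j$, the adjusted value can never exceed 1.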

Slide 38

Slide 38 text

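Edit distance
Levenshtein: the smaller the value, the closer the strings
Jaro-Winkler: the larger the value, the closer the strings
Compared string: GUCCI
Source string | Levenshtein* | Jaro-Winkler**
GUCCHI | 1 | 0.97
GUTTI | 2 | 0.79
GCCUHI | 3 | 0.89
グッチ裕三 | 5 | 0.00
* https://github.com/ztane/python-Levenshtein/
** https://github.com/nap/jaro-winkler-distance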

Slide 39

Slide 39 text

Edit distance
Compared string: GUCCI
Source string | Levenshtein* | Jaro-Winkler**
GUCCHI | 1 | 0.97
GUTTI | 2 | 0.79
GCCUHI | 3 | 0.89
グッチ裕三 | 5 | 0.00
Jaro-Winkler rewards the mere presence of matching characters.
As a result, Levenshtein and Jaro-Winkler rank the candidates in a different order of closeness.
* https://github.com/ztane/python-Levenshtein/
** https://github.com/nap/jaro-winkler-distance
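
As a cross-check on the table, the GUCCHI and GUTTI rows can be worked by hand with the same formulas used for GCCUHI. This assumes the cited library applies the same window and prefix rules as the earlier slides; the rounded results agree with the table.

```latex
\begin{align*}
\text{GUCCHI vs. GUCCI:}\quad & m = 5,\ t = 0,\ l = 4 \\
D_j &= \tfrac{1}{3}\left(\tfrac{5}{6} + \tfrac{5}{5} + \tfrac{5 - 0}{5}\right) = \tfrac{17}{18} \approx 0.944 \\
D_{jw} &= \tfrac{17}{18} + 4 \cdot \tfrac{1}{10} \cdot \left(1 - \tfrac{17}{18}\right) = \tfrac{29}{30} \approx 0.97 \\[6pt]
\text{GUTTI vs. GUCCI:}\quad & m = 3,\ t = 0,\ l = 2 \\
D_j &= \tfrac{1}{3}\left(\tfrac{3}{5} + \tfrac{3}{5} + \tfrac{3 - 0}{3}\right) = \tfrac{11}{15} \approx 0.733 \\
D_{jw} &= \tfrac{11}{15} + 2 \cdot \tfrac{1}{10} \cdot \left(1 - \tfrac{11}{15}\right) = \tfrac{59}{75} \approx 0.79
\end{align*}
```

GUTTI needs fewer edits than GCCUHI (2 versus 3) but shares only three characters with GUCCI, while GCCUHI shares five. That is why Jaro-Winkler places GCCUHI (0.89) ahead of GUTTI (0.79) and Levenshtein orders them the other way round.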

Slide 40

Slide 40 text

Results

Slide 41

Slide 41 text

Results
.py: it seems to run, so that is good enough.
* https://github.com/bk-18/Misspelled-Brand-Name-Detector

Slide 42

Slide 42 text

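Summary
• The problem: brand names get misspelled when items are listed
• Misspellings are detected by brute-force comparison of two lists
• Levenshtein distance
• Jaro-Winkler distance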

Slide 43

Slide 43 text

References
• "2つの文字列の類似度を数値化 レーベンシュタイン距離とジャロ・ウィンクラー距離の解説" (Quantifying the similarity of two strings: Levenshtein distance and Jaro-Winkler distance explained), 人工知能であそぶ, http://nkdkccmbr.hateblo.jp/entry/2016/08/18/102727
• "編集距離 (Levenshtein Distance)" (Edit distance), naoyaのはてなダイアリー, https://naoya-2.hatenadiary.org/entry/20090329/1238307757
• "文字列類似度評価 レーベンシュタイン距離 / ジャロ・ウィンクラー距離" (Evaluating string similarity: Levenshtein distance / Jaro-Winkler distance), 人工知能してみる, http://grahamian.hatenablog.com/entry/word_similarity
• Yaoshu Wang, Jianbin Qin, and Wei Wang: Efficient Approximate Entity Matching Using Jaro-Winkler Distance, University of New South Wales, http://qinjianbin.com/files/wise2017-wang.pdf

Slide 44

Slide 44 text

ENJOY! ENJAY! EMJOY! ENJOI! ENZYOI!