Slide 9
An intuitive approach: the Jaccard index between BoWs
Sentence representation: BoW (bag of words)
• s₁ = {‘he’, ‘has’, ‘a’, ‘cat’}
• s₂ = {‘he’, ‘has’, ‘a’, ‘dog’}
Sentence similarity: similarity measures between sets
• Jaccard(s₁, s₂) = |s₁ ∩ s₂| / |s₁ ∪ s₂|, Otsuka(s₁, s₂) = |s₁ ∩ s₂| / √(|s₁| × |s₂|), …
∘ In short: #{shared elements} / #{total elements}
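The two measures above can be sketched in a few lines of Python. This is a minimal illustration, not the slide author's code: the helper names `bow`, `jaccard`, and `otsuka` are hypothetical, and the lowercase whitespace tokenizer is an assumption made only to reproduce the example sets.

```python
import math


def bow(sentence: str) -> set:
    """Toy BoW: lowercase the sentence and split on whitespace (an assumption)."""
    return set(sentence.lower().split())


def jaccard(a: set, b: set) -> float:
    """|a ∩ b| / |a ∪ b|; defined here as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def otsuka(a: set, b: set) -> float:
    """Otsuka-Ochiai coefficient: |a ∩ b| / sqrt(|a| * |b|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


s1 = bow("He has a cat")
s2 = bow("He has a dog")

print(jaccard(s1, s2))  # 3 shared / 5 total = 0.6
print(otsuka(s1, s2))   # 3 / sqrt(4 * 4) = 0.75
```

Both scores depend only on which tokens coincide exactly; nothing in either function looks at how similar two different tokens are.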
Problem: the similarity between symbols themselves is ignored (the usual issue)
Example: s₁ = {‘he’, ‘has’, ‘a’, ‘cat’} and s₂ = {‘she’, ‘had’, ‘one’, ‘dog’}.
Here s₁ ∩ s₂ = ∅, so their similarity under any overlap-based set similarity
measure is 0, even though the words are pairwise related (‘he’/‘she’, ‘has’/‘had’, ‘a’/‘one’).
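A quick check of this failure case, written as a self-contained sketch (the variable names are illustrative, and the two sets are typed in directly rather than tokenized):

```python
import math

s1 = {"he", "has", "a", "cat"}
s2 = {"she", "had", "one", "dog"}

shared = s1 & s2  # empty: no token appears in both sets

# Both measures have |s1 ∩ s2| in the numerator, so both collapse to zero.
jaccard = len(shared) / len(s1 | s2)
otsuka = len(shared) / math.sqrt(len(s1) * len(s2))

print(shared, jaccard, otsuka)  # set() 0.0 0.0
```

Because the numerator is the size of the exact-match intersection, swapping ‘cat’ for a synonym or for an unrelated word changes the score in exactly the same way.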