Slide 1

Relations such as Hypernymy: Identifying and Exploiting Hearst Patterns in Distributional Vectors for Lexical Entailment
[email protected]
Paper reading group @ Kasuga area <2016/09/26>
* Figures are quoted from the paper

Slide 2

Overview
• Task: predicting lexical entailment between words
• A supervised method based on distributional representations
• Achieves state-of-the-art results on several datasets
• The learned model behaves as if it had implicitly learned Hearst patterns [1] during training

Slide 3

Lexical entailment
• Given two statements T (text) and H (hypothesis), if T is true we can infer that H is also true
• Example: "A cat is an animal." / "An Iriomote cat is an animal."

Slide 4

Distributional word representations
• Map each word into a count-based distributional vector space
• The corpus is built by combining Gigaword, Wikipedia, BNC, and ukWaC (the 250k most frequent words are targeted)
• Apply PPMI weighting, reduce to 300 dimensions with SVD, and normalize
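The pipeline on this slide (co-occurrence counts → PPMI → SVD → normalization) can be sketched as below. The tiny count matrix and the 2-dimensional target are illustrative stand-ins for the real corpora and the paper's 300 dimensions.

```python
# Sketch of the representation pipeline: PPMI weighting, truncated SVD,
# then L2 normalization. The co-occurrence matrix is toy data.
import numpy as np

def ppmi(counts):
    """Positive pointwise mutual information of a word-context count matrix."""
    total = counts.sum()
    pw = counts.sum(axis=1, keepdims=True) / total   # P(word)
    pc = counts.sum(axis=0, keepdims=True) / total   # P(context)
    pwc = counts / total                             # P(word, context)
    with np.errstate(divide="ignore"):
        pmi = np.log(pwc / (pw * pc))
    return np.maximum(pmi, 0.0)                      # clip negatives -> PPMI

def embed(counts, dim):
    """Reduce the PPMI matrix with SVD and L2-normalize the rows."""
    u, s, _ = np.linalg.svd(ppmi(counts), full_matrices=False)
    vecs = u[:, :dim] * s[:dim]
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

counts = np.array([[8., 2., 0.], [6., 3., 1.], [0., 1., 9.]])
vecs = embed(counts, dim=2)   # 2 dims for the toy example (300 in the paper)
```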

Slide 5

Distributional Inclusion Hypothesis
• The contexts of a hyponym form a subset of the contexts of its hypernym
• e.g., the contexts of "Iriomote cat" vs. the contexts of "cat"
• Note that this hypothesis is mainly used in unsupervised methods
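A toy illustration of the hypothesis. The context sets are invented for illustration, and the inclusion score below is just one simple way such containment is quantified in unsupervised work.

```python
# Toy sketch of the Distributional Inclusion Hypothesis: the observed
# contexts of a hyponym should be (largely) contained in its hypernym's.
cat = {"purr", "fur", "pet", "hunt", "tail", "feed"}   # hypernym contexts (invented)
iriomote_cat = {"fur", "hunt", "tail"}                 # narrower hyponym contexts

# Fraction of the hyponym's contexts that also occur with the hypernym.
inclusion = len(iriomote_cat & cat) / len(iriomote_cat)
```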

Slide 6

Formulation
• Solve lexical entailment as a supervised classification problem
• Given H and w as input, decide whether w entails H
• H: consequent, w: antecedent (both are distributional vectors)

Slide 7

Two representations, four models
• How should the two vectors be combined?
• Concat and Diff capture "prototypicality" rather than the lexical relation (they do not model the relation between the words)
• Ksim tries to capture the lexical relation by also using cosine similarity
• So which model is best in the end? (the answer varies by task)

Concat: concatenate the two vectors
Diff: take the vector difference
Asym: take the difference and the squared difference
Ksim: also take the cosine similarity
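The four constructions can be sketched roughly as below; treat these as illustrative, since the paper's exact definitions may differ in detail (e.g. which vector is subtracted from which).

```python
# Rough sketches of the four pair-feature constructions, for vectors
# H (consequent) and w (antecedent). A linear classifier is trained on
# whichever feature vector is chosen.
import numpy as np

def concat(H, w):             # Concat: joined vectors
    return np.concatenate([H, w])

def diff(H, w):               # Diff: vector difference
    return w - H

def asym(H, w):               # Asym: difference and squared difference
    d = w - H
    return np.concatenate([d, d ** 2])

def ksim(H, w):               # Ksim: also include cosine similarity
    cos = H @ w / (np.linalg.norm(H) * np.linalg.norm(w))
    return np.concatenate([H, w, [cos]])

H = np.array([1.0, 0.0])
w = np.array([0.6, 0.8])
feats = ksim(H, w)            # e.g. features for one (H, w) pair
```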

Slide 8

Various datasets
• LEDS (balanced): pairs built from WordNet; positives are hypernym word pairs, negatives are randomly sampled non-hypernym pairs
• BLESS (unbalanced): nouns selected from a number of categories; each noun is paired with hyponyms, hypernyms, meronyms, and random words; hypernym pairs are positives, all other relations negatives
• Medical (unbalanced): word-verb-object triples, of which only a portion concern entailment; covers many different lexical relations and is difficult
• TM (balanced): lexical-relation data (only a subset are entailment pairs)

Slide 9

Experimental results
• Predict the appropriate lexical relation for each dataset
• Prediction with a logistic regression model (evaluation metric: F-score)
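A minimal sketch of this evaluation setup. The data here is synthetic and the logistic regression is hand-rolled so the example stays self-contained; the paper of course trains on the distributional pair features from the previous slides.

```python
# Logistic regression (plain gradient descent) plus an F1 score,
# mimicking the train-and-evaluate loop described on the slide.
import numpy as np

def train_logreg(X, y, lr=0.1, epochs=500):
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))         # predicted probabilities
        w -= lr * X.T @ (p - y) / len(y)         # gradient step
    return w

def f1(y_true, y_pred):
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X[:, 0] > 0).astype(float)                  # toy labels
w = train_logreg(X[:150], y[:150])               # train split
pred = (X[150:] @ w > 0).astype(float)           # test split
score = f1(y[150:], pred)
```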

Slide 10

Discussion
• The Ksim model is strongest overall (supporting prior work)
• Yet, somehow, the Concat model wins by a large margin on BLESS (!?)
• BLESS negatives are created by shuffling the positive pairs
• So the same words appear in both positive and negative examples
• Therefore simply memorizing individual words should fail here (Concat's win cannot be mere lexical memorization)

Slide 11

The Concat model in detail
Linear(⟨H, w⟩) = p̂ᵀ⟨H, w⟩ = ⟨Ĥ, ŵ⟩ᵀ⟨H, w⟩ = ĤᵀH + ŵᵀw
(⟨H, w⟩: the concatenation of the two vectors; p̂: the learned weight vector, split into Ĥ and ŵ)

Slide 12

The Concat model in detail
Linear(⟨H, w⟩) = p̂ᵀ⟨H, w⟩ = ⟨Ĥ, ŵ⟩ᵀ⟨H, w⟩ = ĤᵀH + ŵᵀw

Slide 13

The Concat model in detail
Linear(⟨H, w⟩) = p̂ᵀ⟨H, w⟩ = ⟨Ĥ, ŵ⟩ᵀ⟨H, w⟩ = ĤᵀH + ŵᵀw
＞ The relation between the two words is completely ignored! ＜
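The identity on the slide can be checked numerically: the score of a linear classifier over the concatenation [H; w] with weights [Ĥ; ŵ] splits into two independent per-word dot products, so H and w are never compared to each other.

```python
# Numerical check of the decomposition p^T <H, w> = H_hat^T H + w_hat^T w.
import numpy as np

rng = np.random.default_rng(1)
H, w = rng.normal(size=3), rng.normal(size=3)           # word vectors
H_hat, w_hat = rng.normal(size=3), rng.normal(size=3)   # weight halves

p = np.concatenate([H_hat, w_hat])   # classifier weight vector
x = np.concatenate([H, w])           # concatenated input pair

lhs = p @ x                          # classifier score on the pair
rhs = H_hat @ H + w_hat @ w          # two independent per-word scores
```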

Slide 14

In other words
• On certain tasks the Concat model predicts the relation between word pairs with high accuracy, even though it never considers the relation between the words at all
• => So what is it actually learning? (= what is the separating hyperplane?)
• The separating hyperplane was learning something like Hearst patterns*!

* A set of fixed lexical patterns for extracting hypernym pairs [1] (A such as B, A including B, etc.)

Slide 15

Exploiting this property: the proposed model
• Define a PCA-like iterative algorithm: compute a separating hyperplane (a Hearst-pattern detector), project onto that hyperplane, and repeat
• Build a model that incorporates the hyperplanes produced along the way as features

F_i(⟨H_i, w_i⟩, p̂_i) = ⟨H_iᵀw_i, H_iᵀp̂_i, w_iᵀp̂_i, (H_iᵀp̂_i)(w_iᵀp̂_i)⟩
1: word similarity / 2, 3: Hearst-pattern detector / 4: inclusion
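Assuming p̂_i is the normal of one learned hyperplane, the four features from the slide can be computed as below; the example vectors are invented for illustration.

```python
# The four-feature vector F_i for one pair (H, w) and one hyperplane
# normal p_hat (a Hearst-pattern detector).
import numpy as np

def features(H, w, p_hat):
    sim = H @ w                 # 1: word similarity
    h_det = H @ p_hat           # 2: detector response for H
    w_det = w @ p_hat           # 3: detector response for w
    incl = h_det * w_det        # 4: inclusion (interaction of 2 and 3)
    return np.array([sim, h_det, w_det, incl])

H = np.array([1.0, 0.0])
w = np.array([0.5, 0.5])
p_hat = np.array([0.0, 1.0])
f = features(H, w, p_hat)
```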

Slide 16

The separating hyperplanes

Slide 17

Results
• The proposed method is state-of-the-art on BLESS and Medical
• Removing the Hearst-pattern detector features lowers the scores
• So they really do seem to work as features!

Slide 18

Summary
• A study on the task of predicting lexical relations
• Analyzes several major existing models and datasets
• Focuses in particular on the Concat model
• The Concat model was effectively learning to detect Hearst patterns
• Proposes a model whose features combine this behavior, word similarity, and the Inclusion Hypothesis
• Achieves state-of-the-art results on multiple tasks

Slide 19

References
1. Hearst, Marti A. "Automatic acquisition of hyponyms from large text corpora." Proceedings of the 14th Conference on Computational Linguistics, Volume 2. Association for Computational Linguistics, 1992.
2. Levy, Omer, et al. "Do supervised distributional methods really learn lexical inference relations?" Proceedings of NAACL 2015, Denver, CO.