
# Relations such as Hypernymy: Identifying and Exploiting Hearst Patterns in Distributional Vectors for Lexical Entailment

## himkt

September 26, 2016

## Transcript

1. ### Relations such as Hypernymy: Identifying and Exploiting Hearst Patterns in

Distributional Vectors for Lexical Entailment [email protected] reading group @ Kasuga area <2016/09/26> * Figures are quoted from the paper
2. ### Overview • A task of estimating lexical entailment relations • The method is based on distributional representations • Achieves state-of-the-art on several datasets • The trained model behaved as if it were learning Hearst Patterns [1] along the way during training
3. ### Lexical entailment • Given two sentences T (text) and H (hypothesis), if T

is true -> we can infer that H is also true, e.g. "A cat is an animal" / "An Iriomote wildcat is an animal"
4. ### Word distributional representations • A mapping into a count-based distributional vector space • The corpus is built by combining Gigaword, Wikipedia, BNC, and ukWaC (the 250k most frequent words are targeted)

• The vectors are reduced to 300 dimensions with PPMI and SVD, then normalized
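
The construction on this slide (co-occurrence counts → PPMI → truncated SVD to 300 dimensions → length normalization) can be sketched in a few lines of NumPy; the function name and the dense-matrix setup here are illustrative, not the paper's code.

```python
import numpy as np

def ppmi_svd_embeddings(counts, dim=300):
    """Co-occurrence counts -> PPMI -> truncated SVD -> unit-length rows."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)      # word marginals
    col = counts.sum(axis=0, keepdims=True)      # context marginals
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts * total / (row * col))
    # Positive PMI: clip negatives; zero cells give -inf, which becomes 0.
    ppmi = np.where(np.isfinite(pmi), np.maximum(pmi, 0.0), 0.0)

    # Keep the top `dim` singular directions (300 in the paper).
    u, s, _ = np.linalg.svd(ppmi, full_matrices=False)
    vecs = u[:, :dim] * s[:dim]

    # Normalize each word vector to unit length.
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    return vecs / np.where(norms == 0.0, 1.0, norms)
```

For a real 250k-word vocabulary one would use a sparse matrix and a truncated/randomized SVD; this dense version only illustrates the pipeline.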
5. ### Distributional Inclusion Hypothesis • The contexts of a hyponym form a subset of the contexts of its hypernym (the contexts of "Iriomote wildcat" sit inside the contexts of "cat") • Note that this

hypothesis is mainly used in unsupervised methods
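
As a toy illustration of the hypothesis (my example, not the paper's measure), inclusion can be read as "what fraction of the narrower word's contexts also occur with the broader word":

```python
def inclusion_score(narrow_contexts, broad_contexts):
    """Fraction of the narrower term's contexts also observed with the
    broader term; 1.0 means the contexts are fully included."""
    narrow = set(narrow_contexts)
    if not narrow:
        return 0.0
    return len(narrow & set(broad_contexts)) / len(narrow)
```

Under the hypothesis, the score for ("Iriomote wildcat", "cat") should be close to 1, while the reverse direction should not be.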

7. ### Two distributional representations, four models • How should the two vectors be combined? • Concat and Diff capture "typicality" rather than the lexical relation (they do not capture the relation between the words) • Ksim tries to capture the lexical relation by also considering cosine similarity • So in the end, which model is best? (the results vary by task)

Concat: concatenate the vectors / Diff: take the vector difference / Asym: take the difference and the squared difference / Ksim: also take the cosine similarity
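
The four schemes in the table can be written as feature maps over a candidate pair of vectors (H for the hypernym side, w for the word). This is a sketch from the slide's one-line descriptions, so in particular the exact form of Ksim may differ from the paper's.

```python
import numpy as np

def pair_features(h, w, model="concat"):
    """Feature vector for a candidate (hypernym, word) pair under the
    four schemes named on the slide (sketch, not the paper's code)."""
    h, w = np.asarray(h, dtype=float), np.asarray(w, dtype=float)
    if model == "concat":   # concatenate the two vectors
        return np.concatenate([h, w])
    if model == "diff":     # element-wise difference
        return h - w
    if model == "asym":     # difference and squared difference
        return np.concatenate([h - w, (h - w) ** 2])
    if model == "ksim":     # also include the cosine similarity
        cos = h @ w / (np.linalg.norm(h) * np.linalg.norm(w))
        return np.concatenate([h, w, [cos]])
    raise ValueError(model)
```

A linear classifier is then trained on top of whichever feature vector is chosen.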
8. ### Various datasets • LEDS (balanced data): word pairs built from WordNet; positives are hypernym pairs, negatives are random-sampled non-hypernym pairs • BLESS

(unbalanced data): nouns selected from several categories, each paired with hyponyms, hypernyms, meronyms, and unrelated words; hypernym pairs are treated as positives, all other relations as negatives • Medical (unbalanced data): (word, verb, object) triples covering a wide variety of lexical relations, which makes it difficult • TM (balanced data): lexical-relation data

10. ### Discussion • On the whole, the Ksim model is strong (supporting prior work) • Yet somehow the Concat model wins by a wide margin on BLESS (!?) • The negatives in BLESS are created by shuffling the positive pairs • The same words appear in both positives and negatives •

Because of this, looking at the relation between the two words leads to failure
11. ### The Concat model in detail Linear(⟨H, w⟩) = p̂ᵀ ⟨H, w⟩ = ⟨Ĥ, ŵ⟩ᵀ ⟨H, w⟩ = Ĥᵀ H + ŵᵀ w
13. ### The Concat model in detail Linear(⟨H, w⟩) = p̂ᵀ ⟨H, w⟩ = ⟨Ĥ, ŵ⟩ᵀ ⟨H, w⟩ = Ĥᵀ H + ŵᵀ w → the relation between the two words is completely ignored!
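
The identity on the slide is easy to verify numerically: split the classifier weight vector p̂ into the half Ĥ that multiplies H and the half ŵ that multiplies w, and the score decomposes into two independent per-word terms. The names below are just for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
H = rng.normal(size=d)            # candidate hypernym vector
w = rng.normal(size=d)            # word vector
p_hat = rng.normal(size=2 * d)    # weights of a linear classifier on <H, w>
H_hat, w_hat = p_hat[:d], p_hat[d:]

score = p_hat @ np.concatenate([H, w])   # Linear(<H, w>)
parts = H_hat @ H + w_hat @ w            # H_hat^T H + w_hat^T w
assert np.isclose(score, parts)          # no term couples H with w
```

Since no term couples H with w, the classifier can only memorize per-word properties, never the relation between the pair.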
14. ### In short • On certain tasks, the Concat model estimates the relation between word pairs with high accuracy even though it does not consider the relation between the two words at all • => So what is it actually doing? (= what is the separating hyperplane?) • The separating hyperplane was learning

something like Hearst Patterns! * Hearst Patterns: a set of lexico-syntactic templates for extracting hypernym pairs [1] (A such as B, A including B, etc.)
15. ### Exploiting this property: the proposed model • Define a PCA-like algorithm: compute a separating hyperplane (a Hearst Pattern detector) and project onto it, repeating the process • Build a model that incorporates the separating hyperplanes produced along the way as features

Fᵢ(⟨Hᵢ, wᵢ⟩, p̂ᵢ) = ⟨Hᵢᵀ wᵢ, Hᵢᵀ p̂ᵢ, wᵢᵀ p̂ᵢ, (Hᵢᵀ p̂ᵢ)(wᵢᵀ p̂ᵢ)⟩ 1: word similarity 2, 3: Hearst Pattern detector 4: inclusion
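
The slide's feature map for iteration i translates directly into code (the function name is illustrative): feature 1 is the word similarity, features 2 and 3 are the responses of the Hearst-pattern-detector hyperplane p̂ᵢ for each word, and feature 4 is their product, the inclusion term.

```python
import numpy as np

def step_features(H, w, p_hat):
    """The four features <H^T w, H^T p, w^T p, (H^T p)(w^T p)> for one
    iteration's hyperplane p_hat, as listed on the slide."""
    H, w, p_hat = (np.asarray(v, dtype=float) for v in (H, w, p_hat))
    hp, wp = H @ p_hat, w @ p_hat    # detector responses of the two words
    return np.array([H @ w, hp, wp, hp * wp])
```

The features from all iterations are then stacked into the final representation of the pair.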

18. ### Summary • A study on the task of estimating lexical relations • Analyzes several major existing models and datasets • Focuses in particular on the Concat model • The Concat

model was learning to detect Hearst Patterns • Proposed a model whose features combine this behavior, word-pair similarity, and the Inclusion Hypothesis • Achieved state-of-the-art on multiple tasks
19. ### References 1. Hearst, Marti A. "Automatic acquisition of hyponyms from

large text corpora." Proceedings of the 14th Conference on Computational Linguistics, Volume 2. Association for Computational Linguistics, 1992. 2. Levy, Omer, et al. "Do supervised distributional methods really learn lexical inference relations?" Proceedings of NAACL-HLT, Denver, CO, 2015.