Slide 1

— The 10th Cutting-edge NLP Study Group (最先端NLP勉強会) —
A La Carte Embedding: Cheap but Effective Induction of Semantic Feature Vectors
Presenter: Hitoshi Manabe
Paper: Mikhail Khodak, Nikunj Saunshi, Yingyu Liang, Tengyu Ma, Brandon Stewart, and Sanjeev Arora. In ACL 2018.
※ Slide URL: https://speakerdeck.com/manaysh/alacarte-snlp2018-1

Slide 2

Overview
• Goal: embed all kinds of textual features
  • rare/unknown words, n-grams, synsets, and so on
• Proposal: build distributed representations of features by exploiting their contexts
  • Training the model itself basically requires only pre-trained word embeddings and a large corpus
  • Given the contexts of the feature to be embedded, its representation is obtained on the fly
• Results: few-shot learning of word embeddings, application to document classification, etc.
  • Even a handful of example contexts yields a reasonably good representation
  • Effectiveness also confirmed on other downstream tasks

Slide 3

Problem setting
Required in advance:
• A large text corpus $C_V$ with vocabulary $V$
  • viewed as the set of occurrence contexts $C_w$ of each target word $w$
• Word embeddings $v_w$ pre-trained on that corpus
Then:
• Given some target feature $f$ (a new word, a synset, ...) and its occurrence contexts $C_f$, obtain the feature's distributed representation $v_f$ on the fly (e.g. $v_{\mathrm{BarackObama}}$ from its contexts $C_f$)

Slide 4

Problem setting
Required in advance:
• A large text corpus $C_V$ with vocabulary $V$
  • viewed as the set of occurrence contexts $C_w$ of each target word $w$
• Word embeddings $v_w$ pre-trained on that corpus
Then:
• Given some target feature $f$ (a new word, a synset, ...) and its occurrence contexts $C_f$, obtain the feature's distributed representation $v_f$ on the fly
  • e.g. from contexts such as "beef up the army", "we must beef up our organization", …, obtain $v_{(\mathrm{beef},\mathrm{up})}$

Slide 5

Prior work: Additive [Lazaridou+ 2017]
• Represent a feature by the average of the word embeddings appearing in its contexts (a small sketch follows):
$$v_f^{\mathrm{additive}} = \frac{1}{|C_f|} \sum_{c \in C_f} \frac{1}{|c|} \sum_{w \in c} v_w$$
• Is a plain sum really good enough?
  • Word embeddings share a common direction component, so summing many vectors buries the useful information under that common component
  • Stop words and other common words become noise
• Proposed fixes: centering + principal component removal [Mu+ 2018], or weighting by inverse occurrence frequency [Arora+ 2017]
  However, these require a fair amount of tuning (…?)
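A minimal sketch of the Additive baseline above, assuming `embeddings` is a dict from word to NumPy vector and `contexts` is a list of token lists (illustrative names, not the authors' API):

```python
import numpy as np

def additive_embedding(contexts, embeddings, dim=300):
    """v_f^additive = (1/|C_f|) * sum_{c in C_f} (1/|c|) * sum_{w in c} v_w."""
    total = np.zeros(dim)
    for c in contexts:
        vecs = [embeddings[w] for w in c if w in embeddings]
        if vecs:
            total += np.mean(vecs, axis=0)   # (1/|c|) * sum of context-word vectors
    return total / max(len(contexts), 1)     # average over contexts
```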

Slide 6

Proposed method: a la carte
• Learn a linear transform so that the original word embeddings are approximated by Additive + a linear map:
$$v_w \approx A\, v_w^{\mathrm{additive}} = A \left( \frac{1}{|C_w|} \sum_{c \in C_w} \sum_{w' \in c} v_{w'} \right)$$
  (where $A$ is the learned transformation matrix)
• The induced $A$ has the effect of shrinking generic direction components (e.g. the directions of common words) [Arora+ 2018]
• The optimization has a closed-form solution, so no extra tuning is needed
  (though tricks such as weighting by corpus frequency are used)

Slide 7

Proposed method
• Algorithm (step 1): represent each word by the sum of the embeddings of its context words in the source corpus (see the sketch below)
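A rough sketch of this step, assuming `sentences` is an iterable of token lists and `embeddings` maps words to NumPy vectors (illustrative names; the window size and averaging over contexts are assumptions):

```python
import numpy as np
from collections import defaultdict

def context_vectors(sentences, embeddings, dim=300, window=10):
    """Compute u_w: the average, over w's occurrences, of summed context-word vectors."""
    sums = defaultdict(lambda: np.zeros(dim))
    counts = defaultdict(int)
    for sent in sentences:
        for i, w in enumerate(sent):
            ctx = sent[max(0, i - window):i] + sent[i + 1:i + 1 + window]
            vecs = [embeddings[c] for c in ctx if c in embeddings]
            if vecs:
                sums[w] += np.sum(vecs, axis=0)
                counts[w] += 1
    return {w: sums[w] / counts[w] for w in sums}
```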

Slide 8

Proposed method
• Algorithm (step 2): learn the linear transform so that the representation obtained in the previous step becomes close to the original embedding (in practice weighted by occurrence frequency; a sketch follows)
$$A = \operatorname*{argmin}_A \sum_{w \in V} \alpha(c_w)\, \lVert v_w - A u_w \rVert_2^2$$
  (where $u_w$ is the context vector from the previous step and $c_w$ the corpus frequency of $w$)
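A hedged sketch of solving this weighted least-squares problem in closed form; `V_mat` and `U_mat` stack the vectors $v_w$ and $u_w$ row-wise and `alpha` holds the per-word weights (all illustrative inputs):

```python
import numpy as np

def fit_transform_matrix(V_mat, U_mat, alpha):
    """Solve A = argmin_A sum_w alpha_w * ||v_w - A u_w||^2 in closed form."""
    W = np.asarray(alpha)[:, None]
    lhs = (W * U_mat).T @ U_mat          # sum_w alpha_w u_w u_w^T
    rhs = (W * U_mat).T @ V_mat          # sum_w alpha_w u_w v_w^T
    # lhs @ A^T = rhs; least-squares solve for numerical stability
    A_T, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)
    return A_T.T                         # (dim, dim) transform
```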

Slide 9

Proposed method
• Algorithm (step 3): later on, supply the occurrence contexts of a feature and obtain its representation (sketched below)
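A small sketch of inducing an embedding on the fly for a new feature $f$ (an unseen word, n-gram, synset, ...), reusing the assumed `embeddings` dict and the learned matrix `A` from the previous sketches:

```python
import numpy as np

def induce_embedding(feature_contexts, embeddings, A, dim=300):
    """v_f = A * (average over contexts of the summed context-word vectors)."""
    u_f = np.zeros(dim)
    for c in feature_contexts:
        vecs = [embeddings[w] for w in c if w in embeddings]
        if vecs:
            u_f += np.sum(vecs, axis=0)
    u_f /= max(len(feature_contexts), 1)
    return A @ u_f

# e.g. an embedding for the bigram "beef up" from two contexts:
# v_beef_up = induce_embedding([["beef", "up", "the", "army"],
#                               ["we", "must", "beef", "up", "our", "organization"]],
#                              embeddings, A)
```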

Slide 10

Experiments
• Few-shot learning of word embeddings
  • Word similarity task
• Embedding other features (synsets, n-grams) and applying them to downstream tasks
  • Word sense disambiguation using synset embeddings
  • Document classification using n-gram embeddings

Slide 11

Experimental setup: word similarity
• Word similarity task (CRW dataset)
  • Compute the similarity of word pairs (one side of each pair is a rare word)
• word2vec and the linear transform parameters are trained on the Westbury Wikipedia Corpus (WWC); sentences containing the rare words to be evaluated are removed beforehand
• For each rare word, a few hundred contexts are then supplied to build its representation
※ Here "rare word" means a word occurring 512 to 10,000 times in the WWC

Slide 12

Experimental results: word similarity
• Correlation with human judgments vs. number of contexts
※ Training a standard word2vec with all occurrence contexts of the rare words included in the corpus gives a correlation of 0.45
• all-but-the-top: centering + removal of top principal components
• SIF weighted: averaging with frequency-dependent weights

Slide 13

Experimental setup: word sense disambiguation
• Word sense disambiguation task (SemCor dataset)
  • Input: target word and its context; output: the target word's sense
• Word embeddings are GloVe trained on Wikipedia; the linear transform parameters are trained on the Brown Corpus
• Each synset's representation is obtained by applying the linear transform to the summed embeddings of all words in the synset's definitions (glosses) plus the sentences in which the synset occurs in the corpus
• For each target word in a test sentence, a representation is induced from its surrounding words, and the closest sense is predicted by similarity to the synset representations (a sketch follows)
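A brief sketch of this similarity-based prediction, assuming `synset_vectors` maps candidate sense ids to induced vectors and reusing the illustrative `embeddings` and `A` from the earlier sketches:

```python
import numpy as np

def predict_sense(context_tokens, candidate_senses, synset_vectors, embeddings, A):
    """Pick the candidate sense whose synset vector is closest to the induced context vector."""
    vecs = [embeddings[w] for w in context_tokens if w in embeddings]
    v = A @ np.mean(vecs, axis=0)          # vector induced from the surrounding words
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return max(candidate_senses, key=lambda s: cos(v, synset_vectors[s]))
```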

Slide 14

Experimental results: word sense disambiguation
• Word sense disambiguation task (SemCor dataset)
  • glosses: only the sense definitions are given as contexts
  • MFS: for the given word, simply output the most frequent sense in the corpus (simple, yet strong)
  • [Raganato+ 2017]: an RNN-based method
• A simple similarity-based method, yet better than the baselines

Slide 15

Experimental setup: document classification
• Document classification task
  • Embed n-grams
• GloVe word embeddings are trained on the Amazon Product Corpus, then n-gram representations are built on the same corpus
• The window size for building the context representations is 10
• Document representations are obtained from the n-gram representations with the formula below, then predictions are made with logistic regression (a sketch follows)
$$v_D^{T} = \left[\; \sum_{t=1}^{L} v_{w_t}^{T} \;\; \cdots \;\; \frac{1}{n} \sum_{t=1}^{L-n+1} v_{(w_t,\ldots,w_{t+n-1})}^{T} \;\right]$$
※ Large n means fewer contexts and somewhat shaky quality, hence the division by n
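A rough sketch of this document representation: for each n-gram order k, sum the k-gram vectors over the document and concatenate the blocks. The slide only shows the first and last terms, so scaling each order-k block by 1/k is an assumption here; `ngram_vectors[k]` (a dict from k-gram tuples to vectors) is an illustrative name:

```python
import numpy as np

def document_vector(tokens, ngram_vectors, dim=300, n=3):
    """Concatenate per-order sums of n-gram embeddings, scaling order k by 1/k (assumption)."""
    blocks = []
    for k in range(1, n + 1):
        block = np.zeros(dim)
        for t in range(len(tokens) - k + 1):
            gram = tuple(tokens[t:t + k])
            if gram in ngram_vectors.get(k, {}):
                block += ngram_vectors[k][gram]
        blocks.append(block / k)
    return np.concatenate(blocks)        # shape (n * dim,), fed to logistic regression
```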

Slide 16

Experimental results: document classification
• Document classification task
• Results comparable to deep-learning models
• Polarity classification (SST, IMDB) does quite well (= perhaps local word order is enough?)

Slide 17

Summary
• A method that builds new distributed representations on the fly for all kinds of textual features (unknown words, etc.) from word embeddings and occurrence contexts
• A feature is represented by the sum of the word embeddings in its contexts + a linear transform
• Embedding based on occurrence contexts
  • Relatively computation-friendly for a context-based approach
  • Can coexist with subword-based methods
• Authors' implementation: https://github.com/NLPrinceton/ALaCarte

Slide 18

References
• Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, and Andrej Risteski. 2018. Linear algebraic structure of word senses, with applications to polysemy. TACL.
• Sanjeev Arora, Yingyu Liang, and Tengyu Ma. 2017. A simple but tough-to-beat baseline for sentence embeddings. In Proc. ICLR.
• Angeliki Lazaridou, Marco Marelli, and Marco Baroni. 2017. Multimodal word meaning induction from minimal exposure to natural text. Cognitive Science.
• Jiaqi Mu and Pramod Viswanath. 2018. All-but-the-top: Simple and effective post-processing for word representations. In Proc. ICLR.
• Alessandro Raganato, Claudio Delli Bovi, and Roberto Navigli. 2017. Neural sequence learning models for word sense disambiguation. In Proc. EMNLP.

Slide 19

Appendix

Slide 20

On the space of word embeddings [Mu+ 2018]
• The centroid of the embeddings lies fairly far from the origin
• The variance explained by successive principal components decays exponentially
  = the embeddings occupy a lower-dimensional subspace
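A brief sketch of the post-processing these observations motivate ("all-but-the-top" [Mu+ 2018], cited as a fix on the Additive slide): subtract the mean, then remove the top principal components. `X` stacks the word vectors row-wise; the number of removed components `d` is an assumed hyperparameter, not taken from the slides:

```python
import numpy as np

def all_but_the_top(X, d=3):
    """Center the embeddings, then project out their top-d principal directions."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)   # principal directions
    top = Vt[:d]                                         # (d, dim)
    return Xc - Xc @ top.T @ top
```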

Slide 21

Experimental setup
• Qualitative evaluation of n-gram embeddings
• With a plain sum such as additive, the useful information gets buried under stop-word information