
alacarte-snlp2018

Hitoshi Manabe
August 04, 2018


Transcript

  1. A La Carte Embedding: Cheap but Effective Induction of Semantic Feature Vectors (10th State-of-the-Art NLP Study Group, SNLP)
    • Presenter: Hitoshi Manabe
    • Paper: Mikhail Khodak, Nikunj Saunshi, Yingyu Liang, Tengyu Ma, Brandon Stewart, and Sanjeev Arora. In ACL 2018.
    ※ Slide URL: https://speakerdeck.com/manaysh/alacarte-snlp2018-1
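  2. Overview
    • Goal: embed a wide variety of text features
      • low-frequency and unknown words, n-grams, synsets, and so on
    • Proposal: build distributed representations (embeddings) of features by exploiting their contexts
      • training the model itself basically needs nothing more than pretrained word embeddings and a large corpus
      • given the contexts of the feature to embed, its embedding is obtained on the fly
    • Results: few-shot learning of word embeddings, and applications such as document classification
      • supplying the contexts of just a handful of examples already yields reasonable representations
      • effectiveness is also confirmed on other downstream tasks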
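  3. Problem setting
    • Prerequisites
      • a large text corpus $C_V$ with vocabulary $V$, represented as the set of contexts $C_w$ in which each target word $w$ occurs
      • pretrained word embeddings $v_w$ (trained on that corpus)
    • Given some target feature $f$ (a new word, a synset, etc.) and its contexts $C_f$, obtain the feature's embedding $v_f$ on the fly
    • Example: the contexts $C_f$ of the feature "Barack Obama" are mapped to $v_{\text{BarackObama}}$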
  4. Problem setting (continued)
    • Same prerequisites and goal as slide 3; the target feature can also be an n-gram
    • Example: $f$ = (beef, up), with contexts $C_f$ such as "… beef up the army …" and "we must beef up our organization …", mapped to $v_{(\text{beef},\text{up})}$
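The slides do not spell out how a context is cut out of the corpus. As a minimal sketch, assuming a context is a symmetric window of tokens around each occurrence of the feature, the context set $C_f$ for a single word or an n-gram such as (beef, up) could be collected as follows. The function name `extract_contexts` and the default `window=5` are illustrative choices, not taken from the paper.

```python
from typing import List, Sequence, Tuple


def extract_contexts(
    sentences: Sequence[Sequence[str]],
    feature: Tuple[str, ...],
    window: int = 5,
) -> List[List[str]]:
    """Collect one context per occurrence of `feature` (a word or an n-gram).

    Assumption: a context is the `window` tokens to the left plus the
    `window` tokens to the right of the occurrence; the feature's own
    tokens are excluded.
    """
    n = len(feature)
    contexts: List[List[str]] = []
    for sent in sentences:
        tokens = list(sent)
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == feature:
                left = tokens[max(0, i - window):i]
                right = tokens[i + n:i + n + window]
                contexts.append(left + right)
    return contexts


corpus = [
    ["beef", "up", "the", "army"],
    ["we", "must", "beef", "up", "our", "organization"],
]
print(extract_contexts(corpus, ("beef", "up"), window=2))
# [['the', 'army'], ['we', 'must', 'our', 'organization']]
```

Representing every feature as a tuple of tokens lets the same routine serve unknown words (`("word",)`) and n-grams.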
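  5. Prior work
    • Additive [Lazaridou+. 2017]
      • represents a feature by the average of the word embeddings contained in its contexts:
      $v_f^{\text{additive}} = \frac{1}{|C_f|} \sum_{c \in C_f} \frac{1}{|c|} \sum_{w \in c} v_w$
    • Is a plain sum good enough?
      • word embeddings share common directional components, so adding many of them buries the useful information under the shared components
      • stop words and common words act as noise
    • Remedies: centering + removing principal components [Mu+. 2018], or weighting by the inverse of occurrence frequency [Arora+. 2017]
      • however, these need careful tuning (…?)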
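A minimal NumPy sketch of the Additive baseline formula above. The names `additive_embedding` and `vectors` (a word-to-pretrained-vector dict) are hypothetical, and skipping words that have no pretrained vector (and contexts with none at all) is an assumption the slide does not address.

```python
from typing import Dict, List

import numpy as np


def additive_embedding(
    contexts: List[List[str]], vectors: Dict[str, np.ndarray]
) -> np.ndarray:
    """Additive baseline: average over contexts of the mean word vector per context.

    Follows v_f = 1/|C_f| * sum_{c in C_f} 1/|c| * sum_{w in c} v_w, except that
    words without a pretrained vector are skipped (assumption), so |c| counts
    only known words and contexts with no known word are dropped.
    """
    context_means = []
    for c in contexts:
        known = [vectors[w] for w in c if w in vectors]
        if known:
            context_means.append(np.mean(known, axis=0))
    if not context_means:
        raise ValueError("no context contains a word with a pretrained vector")
    return np.mean(context_means, axis=0)


vecs = {
    "the": np.array([1.0, 0.0]),
    "army": np.array([0.0, 1.0]),
    "troops": np.array([1.0, 1.0]),
}
print(additive_embedding([["the", "army"], ["troops", "oov"]], vecs))
# [0.75 0.75]
```

Every context vector gets the same weight regardless of which words it contains, which is exactly why frequent words such as "the" leak into the result, as the slide points out.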
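  6. Proposed method: a la carte
    • Learn a mapping so that each original word embedding is approximated by Additive + a linear transformation:
      $v_w \approx A v_w^{\text{additive}} = A \left( \frac{1}{|C_w|} \sum_{c \in C_w} \sum_{w' \in c} v_{w'} \right)$
      (where $A$ is the learned mapping parameter)
    • The derived $A$ shrinks generic directional components (e.g. the direction of common words) [Arora+. 2018]
    • Since it can be optimized in closed form, no extra tuning is needed
      • that said, some tricks are used, such as weighting by frequency in the corpus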
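The "closed form" follows from setting the gradient of the frequency-weighted least-squares objective to zero. A short derivation, writing $u_w$ for $v_w^{\text{additive}}$ and $\alpha(c_w) \ge 0$ for the per-word frequency weight, and assuming $\sum_{w} \alpha(c_w)\, u_w u_w^\top$ is invertible:

```latex
\begin{aligned}
\mathcal{L}(A) &= \sum_{w \in V} \alpha(c_w)\, \lVert v_w - A u_w \rVert_2^2 \\
\nabla_A \mathcal{L} &= -2 \sum_{w \in V} \alpha(c_w)\, (v_w - A u_w)\, u_w^\top = 0 \\
\Longrightarrow \quad A \sum_{w \in V} \alpha(c_w)\, u_w u_w^\top &= \sum_{w \in V} \alpha(c_w)\, v_w u_w^\top \\
\Longrightarrow \quad A^\ast &= \Big( \sum_{w \in V} \alpha(c_w)\, v_w u_w^\top \Big) \Big( \sum_{w \in V} \alpha(c_w)\, u_w u_w^\top \Big)^{-1}
\end{aligned}
```

This is ordinary weighted linear regression from context vectors to word vectors, so no learning rate or iteration count needs tuning.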
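  7. Proposed method
    • Algorithm (step 1): represent each word by the sum of the embeddings of its context words in the source corpus (this gives $u_w$)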
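A sketch of this step, assuming contexts are symmetric token windows and that $u_w$ averages the per-occurrence context sums over the $c_w$ occurrences of $w$ (matching the $1/|C_w|$ factor in the slide 6 formula). `context_vectors` is a hypothetical helper, not the authors' code.

```python
from typing import Dict, List, Tuple

import numpy as np


def context_vectors(
    sentences: List[List[str]],
    vectors: Dict[str, np.ndarray],
    window: int = 5,
) -> Tuple[Dict[str, np.ndarray], Dict[str, int]]:
    """For every word w in the corpus, compute u_w: the sum of the pretrained
    vectors in the window around each occurrence of w, averaged over the
    occurrences. Also return the occurrence counts c_w."""
    dim = len(next(iter(vectors.values())))
    totals: Dict[str, np.ndarray] = {}
    counts: Dict[str, int] = {}
    for sent in sentences:
        for i, w in enumerate(sent):
            context = sent[max(0, i - window):i] + sent[i + 1:i + 1 + window]
            s = np.zeros(dim)
            for w2 in context:
                if w2 in vectors:  # skip words without a pretrained vector
                    s += vectors[w2]
            totals[w] = totals.get(w, np.zeros(dim)) + s
            counts[w] = counts.get(w, 0) + 1
    u = {w: totals[w] / counts[w] for w in totals}
    return u, counts
```

The counts are returned as well because the next step weights each word by a function of $c_w$.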

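  8. Proposed method
    • Algorithm (step 2): learn the linear-transformation parameters so that $A u_w$, built from the representation obtained in the previous step, is close to the original embedding $v_w$ (in practice, weighted by occurrence frequency; $c_w$ is the occurrence count of $w$):
      $A = \arg\min_{A} \sum_{w \in V} \alpha(c_w) \, \lVert v_w - A u_w \rVert_2^2$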
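Because the objective is a weighted linear least-squares problem, $A$ can be fitted directly with `numpy.linalg.lstsq` after scaling each row by $\sqrt{\alpha(c_w)}$. A sketch: `learn_transform` is a hypothetical name, and the weighting $\alpha(c_w) = \log(1 + c_w)$ used in the example is only an illustrative monotone choice, not necessarily the paper's.

```python
import numpy as np


def learn_transform(U: np.ndarray, V: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Weighted least squares: argmin_A sum_w weights[w] * ||V[w] - A @ U[w]||^2.

    U: (n_words, d) context vectors u_w; V: (n_words, d) pretrained vectors v_w;
    weights: (n_words,) nonnegative per-word weights alpha(c_w).
    Returns A with shape (d, d).
    """
    sw = np.sqrt(weights)[:, None]
    # Row-wise, v_w ~ A u_w reads V ~ U @ A.T, so solve for X = A.T and transpose.
    X, *_ = np.linalg.lstsq(sw * U, sw * V, rcond=None)
    return X.T


# Toy check: generate V from a known A and recover it.
rng = np.random.default_rng(0)
U_demo = rng.normal(size=(200, 5))
A_true = rng.normal(size=(5, 5))
V_demo = U_demo @ A_true.T
counts_demo = rng.integers(1, 1000, size=200)
A_hat = learn_transform(U_demo, V_demo, np.log1p(counts_demo))  # hypothetical alpha
```

Using `lstsq` rather than explicitly inverting $\sum \alpha\, u u^\top$ keeps the fit stable when that matrix is poorly conditioned.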
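  9. Proposed method
    • Algorithm (step 3): afterwards, supply the contexts $C_f$ of a feature and obtain its representation, i.e. $v_f = A u_f$, where $u_f$ is built from $C_f$ the same way $u_w$ is built from $C_w$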
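Once $A$ is learned, inducing an embedding for a new feature is a single matrix-vector product. A sketch with a hypothetical `induce_embedding` helper; skipping context words without a pretrained vector is an assumption.

```python
from typing import Dict, List

import numpy as np


def induce_embedding(
    contexts: List[List[str]], vectors: Dict[str, np.ndarray], A: np.ndarray
) -> np.ndarray:
    """a la carte induction: v_f = A u_f, where u_f averages, over the contexts
    of f, the sum of the pretrained vectors in each context."""
    if not contexts:
        raise ValueError("at least one context is required")
    u = np.zeros(A.shape[1])
    for c in contexts:
        for w in c:
            if w in vectors:  # words without a pretrained vector are skipped
                u += vectors[w]
    u /= len(contexts)
    return A @ u
```

No retraining is involved, which is what makes on-the-fly induction cheap: new contexts only require re-summing vectors.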

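  10. Experiments
    • Few-shot learning of word embeddings
      • word similarity task
    • Embedding other features (synsets, n-grams) and applying them to downstream tasks
      • word sense disambiguation using synset embeddings
      • document classification using n-gram embeddings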
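  11. Experimental setup: word similarity
    • Word similarity task (CRW dataset)
      • compute the similarity of word pairs (one word of each pair is a rare word)
    • word2vec and the linear-transformation parameters are trained on the Westbury Wikipedia Corpus (WWC); sentences containing the rare words under evaluation are removed beforehand
    • For each rare word, an embedding is built by supplying on the order of a few hundred contexts afterwards
    ※ "Rare words" are words that occur 512 to 10,000 times in the WWC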
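A sketch of the corpus preparation described above, assuming the 512 to 10,000 range is inclusive and that held-out sentences are where the rare words' contexts are later drawn from. `rare_word_split` and its parameters are hypothetical names.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def rare_word_split(
    sentences: List[List[str]],
    eval_words: Iterable[str],
    low: int = 512,
    high: int = 10_000,
) -> Tuple[Set[str], List[List[str]], List[List[str]]]:
    """Select the evaluation words whose corpus count lies in [low, high] and
    split the corpus into training sentences (containing none of them) and
    held-out sentences (the source of their contexts)."""
    counts = Counter(w for s in sentences for w in s)
    rare = {w for w in eval_words if low <= counts[w] <= high}
    train = [s for s in sentences if rare.isdisjoint(s)]
    held_out = [s for s in sentences if not rare.isdisjoint(s)]
    return rare, train, held_out
```

Removing the held-out sentences before training word2vec and $A$ guarantees the rare words are truly unseen, so the similarity scores measure induction from the supplied contexts alone.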
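  12. Results: word similarity
    • Correlation with human judgments vs. number of contexts
    ※ Training ordinary word2vec with all contexts of the rare words included in the corpus gives a correlation of 0.45
    • all-but-the-top: centering + removal of principal components
    • SIF weighted: average weighted according to occurrence frequency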
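For reference, the SIF weighting of [Arora+. 2017] gives each word the weight $a / (a + p(w))$, where $p(w)$ is its unigram probability and $a$ a small constant. The sketch below shows only that weighted average (the full SIF method also removes the first principal component across sentences); `sif_average`, the default `a=1e-3`, and treating unseen words as $p(w) = 0$ are illustrative assumptions.

```python
from typing import Dict, List

import numpy as np


def sif_average(
    tokens: List[str],
    vectors: Dict[str, np.ndarray],
    word_prob: Dict[str, float],
    a: float = 1e-3,
) -> np.ndarray:
    """Frequency-weighted average: (1/|s|) * sum_w a / (a + p(w)) * v_w.

    Words missing from `word_prob` get p(w) = 0 (assumption), i.e. weight 1.
    """
    known = [w for w in tokens if w in vectors]
    if not known:
        raise ValueError("no token has a pretrained vector")
    total = sum(a / (a + word_prob.get(w, 0.0)) * vectors[w] for w in known)
    return total / len(known)
```

Frequent words have large $p(w)$ and hence small weights, which is how this baseline damps the common-word directions.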
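  13. Experimental setup: word sense disambiguation
    • Word sense disambiguation task (SemCor dataset)
      • input: a target word and its context; output: the sense of the target word
    • Word embeddings are GloVe trained on Wikipedia; the linear-transformation parameters are learned on the Brown Corpus
    • Each synset embedding is obtained by summing the embeddings of all words in the synset's definition (gloss) and in the corpus sentences where the synset occurs, then applying the linear transformation
    • For the target word in a test sentence, an embedding is built from the surrounding words, and the sense whose synset embedding is most similar is predicted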
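The prediction rule above is a nearest-neighbour lookup. A sketch, assuming the similarity is cosine similarity (the slide only says "similarity") and that `context_vec` has already been induced from the surrounding words; `predict_sense` and the sense labels in the test are hypothetical.

```python
from typing import Dict

import numpy as np


def predict_sense(context_vec: np.ndarray, sense_vecs: Dict[str, np.ndarray]) -> str:
    """Return the sense label whose embedding has the highest cosine
    similarity to the embedding induced from the test context."""

    def cosine(x: np.ndarray, y: np.ndarray) -> float:
        return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

    return max(sense_vecs, key=lambda s: cosine(context_vec, sense_vecs[s]))
```

No classifier is trained here; all supervision enters through the glosses and sense-annotated sentences used to build the synset embeddings.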
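  14. Results: word sense disambiguation
    • Word sense disambiguation task (SemCor dataset)
    • glosses: only the sense definitions are supplied
    • MFS: simply outputs, for the given word, the sense that is most frequent in the corpus (simple, but strong)
    • [Raganato+. 2017]: an RNN-based method
    • The similarity-based approach is simple, yet it beats the baselines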
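  15. Experimental setup: document classification
    • Document classification task
      • embed n-grams
    • Word embeddings: GloVe trained on the Amazon Product Corpus; n-gram embeddings are then built on the same corpus
    • Window size used when building the context representations: 10
    • The document embedding is obtained from the n-gram embeddings with the formula below ($L$: number of tokens in document $D$), and prediction uses logistic regression:
      $v_D^{\top} = \left[ \sum_{t=1}^{L} v_{w_t}^{\top}, \; \dots, \; \frac{1}{n} \sum_{t=1}^{L-n+1} v_{(w_t, \dots, w_{t+n-1})}^{\top} \right]$
    ※ For large n there are few contexts and the embeddings are of questionable quality, so the n-gram term is divided by n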
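A sketch of the document-embedding formula above: for each $n$ from 1 to a chosen maximum, sum the embeddings of all $n$-grams in the document, divide by $n$, and concatenate. `document_embedding`, the dict keyed by token tuples, and skipping $n$-grams that have no embedding are assumptions for illustration.

```python
from typing import Dict, List, Tuple

import numpy as np


def document_embedding(
    tokens: List[str],
    ngram_vecs: Dict[Tuple[str, ...], np.ndarray],
    max_n: int,
    dim: int,
) -> np.ndarray:
    """Concatenate, for n = 1..max_n, the sum of the embeddings of all n-grams
    (w_t, ..., w_{t+n-1}) in the document divided by n. n-grams without an
    embedding are skipped (assumption)."""
    parts = []
    for n in range(1, max_n + 1):
        total = np.zeros(dim)
        for t in range(len(tokens) - n + 1):
            gram = tuple(tokens[t:t + n])
            if gram in ngram_vecs:
                total += ngram_vecs[gram]
        parts.append(total / n)
    return np.concatenate(parts)
```

The output has length `max_n * dim` and can be fed to any logistic-regression implementation, for example scikit-learn's `LogisticRegression`.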
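  16. Results: document classification
    • Document classification task
    • Results comparable to deep-learning-based methods
    • Sentiment polarity classification (SST, IMDB) works quite well (perhaps local word order is enough?)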
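  17. Summary
    • A method that builds new embeddings on the fly for various text features, such as unknown words, from word embeddings and occurrence contexts
      • a feature is represented by the sum of the word embeddings in its contexts + a linear transformation
    • Embedding based on occurrence contexts
    • Relatively light on compute for a context-based method
    • Coexistence with subword-based methods
    • Authors' implementation: https://github.com/NLPrinceton/ALaCarte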
  18. References
    • Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, and Andrej Risteski. 2018. Linear algebraic structure of word senses, with applications to polysemy. TACL.
    • Sanjeev Arora, Yingyu Liang, and Tengyu Ma. 2017. A simple but tough-to-beat baseline for sentence embeddings. In Proc. ICLR.
    • Angeliki Lazaridou, Marco Marelli, and Marco Baroni. 2017. Multimodal word meaning induction from minimal exposure to natural text. Cognitive Science.
    • Jiaqi Mu and Pramod Viswanath. 2018. All-but-the-top: Simple and effective post-processing for word representations. In Proc. ICLR.
    • Alessandro Raganato, Claudio Delli Bovi, and Roberto Navigli. 2017. Neural sequence learning models for word sense disambiguation. In Proc. EMNLP.
  19. Appendix

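  20. On the space of word embeddings [Mu+. 2018]
    • The centroid of the embeddings lies quite far from the origin
    • The variance decays exponentially along successive principal components
      = the embeddings lie in a lower-dimensional subspace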
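These observations motivate the "centering + removal of principal components" post-processing (all-but-the-top) cited on slides 5 and 12. A minimal sketch using an SVD of the centered embedding matrix; `all_but_the_top` is a hypothetical name, and the number of removed directions `n_remove` is left as a free parameter because the slides do not state the value used.

```python
import numpy as np


def all_but_the_top(X: np.ndarray, n_remove: int) -> np.ndarray:
    """Center the embeddings (rows of X) and project out the top `n_remove`
    principal directions of the centered data."""
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    top = vt[:n_remove]
    return centered - centered @ top.T @ top


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 5)) + 3.0  # embeddings with a mean far from the origin
X_post = all_but_the_top(X_demo, 1)
```

Centering removes the shared offset (first bullet), and projecting out the dominant directions removes the few high-variance components that all words share (second bullet).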

  21. Experimental setup
    • Qualitative evaluation of n-gram embeddings
    • With simple sums such as Additive, the information is buried under stop-word information