
ACL2018 Reading Group - Adversarial Contrastive Estimation

y_yammt
July 08, 2018

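  1. Overview
    • An improvement to Noise Contrastive Estimation (NCE), which is used to estimate the parameters of models such as word embeddings.
    • Brings the mechanism of Generative Adversarial Networks (GAN) into negative sampling.
    • Experiments confirm that it converges faster than NCE.
    • It also improves several metrics on downstream tasks.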
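  2. What was the skip-gram model again? (2/2)
    • Given a target word $w_t$, the model tries to predict a context word $w_c$ around it:
      $p_{U,V}(w_c \mid w_t) = \dfrac{\exp(u(w_t)^\top v(w_c))}{\sum_{w_c' \in A'} \exp(u(w_t)^\top v(w_c'))}$
    • $u(\cdot), v(\cdot) \in \mathbb{R}^d$: $w_t$ is mapped to $u(w_t)$ through $U \in \mathbb{R}^{A \times d}$, and $w_c$ is mapped to $v(w_c)$ through $V \in \mathbb{R}^{A' \times d}$.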
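A minimal sketch of the softmax above, assuming embeddings are stored as plain Python lists and words are referred to by integer ids (both are illustration choices, not from the slides). Note that the denominator needs one score per context word, which is where the $O(A')$ cost on the next slide comes from.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors (plain lists of floats)."""
    return sum(x * y for x, y in zip(a, b))


def skipgram_prob(U, V, t, c):
    """p_{U,V}(w_c | w_t): softmax of u(w_t)^T v(w_c') over all A' context words.

    U holds the target vectors u(.) (A rows), V the context vectors v(.) (A' rows);
    t and c are integer word ids (a hypothetical indexing scheme for this sketch).
    """
    u = U[t]
    scores = [dot(u, v) for v in V]  # one score per context word: O(A') work
    m = max(scores)  # subtract the max before exp() for numerical stability
    denom = sum(math.exp(s - m) for s in scores)
    return math.exp(scores[c] - m) / denom
```

Summing `skipgram_prob(U, V, t, c)` over every `c` gives 1, as a softmax should.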
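  3. Objective function of the skip-gram model
    • Take the negative log-likelihood of the pair $(w_t, w_c)$ observed at some position in the text:
      $l = -\log p_{U,V}(w_c \mid w_t) = -\log \dfrac{\exp(u(w_t)^\top v(w_c))}{\sum_{w_c' \in A'} \exp(u(w_t)^\top v(w_c'))}$
      → optimize $u, v$ so that $l$ is minimized.
    • The parameters can be estimated by computing the partial derivatives $\partial l / \partial u(w_t)$ and $\partial l / \partial v(w_c)$, but…
      $\dfrac{\partial l}{\partial u(w_t)} = -v(w_c) + \mathbb{E}_{p(w_c' \mid w_t)}\left[ v(w_c') \right]$
    • The cost of computing this gradient is $O(A')$, proportional to the size of the context vocabulary → it can become expensive.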
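A sketch of the gradient $\partial l / \partial u(w_t)$, under the same list-based, integer-id assumptions as a plain-Python illustration. The expectation term is a probability-weighted sum over all context vectors, so the loop over `V` is the $O(A')$ cost the slide warns about.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def skipgram_loss(U, V, t, c):
    """l = -log p_{U,V}(w_c | w_t), using a max-shifted log-sum-exp."""
    scores = [dot(U[t], v) for v in V]
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[c]


def grad_u(U, V, t, c):
    """dl/du(w_t) = -v(w_c) + E_{p(w_c'|w_t)}[v(w_c')].

    The expectation touches every one of the A' context vectors: O(A') per step.
    """
    scores = [dot(U[t], v) for v in V]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    probs = [e / z for e in exps]  # p(w_c' | w_t) for every w_c' in A'
    d = len(U[t])
    expected_v = [sum(p * v[i] for p, v in zip(probs, V)) for i in range(d)]
    return [expected_v[i] - V[c][i] for i in range(d)]
```

A central finite difference on `skipgram_loss` is a cheap way to check the analytic gradient.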
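  4. Tricks to reduce the computation
    • Noise Contrastive Estimation (NCE), among others.
    • Here I introduce the simplified version of Noise Contrastive Estimation that appears in Mikolov's paper (Negative Sampling).
    • For the paper presented here, whether the simplified version is used or not is inconsequential.
    • NCE and NS are also explained in detail in the book 深層学習による自然言語処理 (Deep Learning for Natural Language Processing).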
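  5. Simplified Noise Contrastive Estimation
    • Train the model to discriminate the context word $w_c$ of one training example from $k$ noise context words $S' = \{\bar{w}_{c1}, \dots, \bar{w}_{ck}\}$, a set of $k$ context words drawn at random (though not necessarily from a uniform distribution).
    • The objective changes from
      $l = -\log \exp(u(w_t)^\top v(w_c)) + \log \sum_{w_c' \in A'} \exp(u(w_t)^\top v(w_c'))$
      to
      $l_{NS} = -\log \sigma(u(w_t)^\top v(w_c)) - \sum_{w_c' \in S'} \log\left(1 - \sigma(u(w_t)^\top v(w_c'))\right)$
      where $\sigma$ is the logistic sigmoid; the first term is the probability that the positive example occurs, and the second is the probability that each negative (noise) example does not occur.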
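A minimal sketch of $l_{NS}$ and of drawing $S'$. The noise distribution here is the unigram count raised to the 0.75 power, which is the choice used in word2vec; the slides only say the distribution need not be uniform, so treat that choice (and the helper names) as assumptions.

```python
import math
import random


def log_sigmoid(s):
    """log(sigma(s)), computed without overflow for large |s|."""
    if s >= 0:
        return -math.log1p(math.exp(-s))
    return s - math.log1p(math.exp(s))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def ns_loss(u_t, v_c, v_negs):
    """l_NS = -log sigma(u^T v_c) - sum_{w_c' in S'} log(1 - sigma(u^T v_c')).

    Uses the identity 1 - sigma(s) = sigma(-s).
    """
    loss = -log_sigmoid(dot(u_t, v_c))
    for v_n in v_negs:
        loss -= log_sigmoid(-dot(u_t, v_n))
    return loss


def unigram_noise(counts, power=0.75):
    """Noise distribution proportional to count**power (0.75 as in word2vec)."""
    weights = [c ** power for c in counts]
    z = sum(weights)
    return [w / z for w in weights]


def sample_negatives(noise_probs, k, rng: random.Random):
    """Draw the k word ids of S' from p_nce, with replacement."""
    return rng.choices(range(len(noise_probs)), weights=noise_probs, k=k)
```

Each step now costs $O(k)$ score computations instead of $O(A')$.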
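  6. Revisiting simplified NCE
    • $l_{NS} = -\log \sigma(u(w_t)^\top v(w_c)) - \sum_{w_c' \in S'} \log\left(1 - \sigma(u(w_t)^\top v(w_c'))\right)$
      (first term: probability that the positive example occurs; second term: probability that the negatives (noise) do not occur; $S'$: the set of $k$ randomly drawn context words)
    • The negatives are generated without looking at the target word $w_t$, so they may turn out to be easy to tell apart from the positive example.
    → We want to generate harder negatives → bring in the mechanism of Generative Adversarial Networks.
    [Figure: $w_t$, $w_c$ and the negatives $\bar{w}_{c1}, \bar{w}_{c2}$ in the embedding space, annotated "concentrate more"]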
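One way to see why easy negatives waste samples (an illustration of my own, not from the slides): the derivative of one negative's term $-\log(1 - \sigma(s))$ with respect to its score $s = u(w_t)^\top v(w_c')$ is $\sigma(s)$, so a negative the model already scores very low contributes almost no gradient.

```python
import math


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def neg_term(s):
    """Contribution -log(1 - sigma(s)) of one negative with score s = u(w_t)^T v(w_c')."""
    return -math.log(1.0 - sigmoid(s))


def neg_term_grad(s):
    """d/ds [-log(1 - sigma(s))] = sigma(s)."""
    return sigmoid(s)


# An "easy" negative (score -6) versus a "hard" one (score 2).
easy, hard = neg_term_grad(-6.0), neg_term_grad(2.0)
```

Because $\partial(-\log(1-\sigma(s)))/\partial u(w_t) = \sigma(s)\, v(w_c')$, the factor $\sigma(s)$ scales how far each negative moves $u(w_t)$.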
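  7. Rewriting NCE a little more generally
    • Let $\omega$ be the parameters to optimize ($\leftarrow U, V$),
    • $x$ the given input ($\leftarrow w_t$),
    • $y^+$ the observed outcome, i.e. the positive example ($\leftarrow w_c$), and
    • $y^-$ a noise outcome, i.e. the negative example ($\leftarrow w_c'$).
    • The loss function to minimize is then
      $L(\omega; x) = \mathbb{E}_{p(y^+ \mid x)\, p_{nce}(y^-)}\left[ l_\omega(x, y^+, y^-) \right]$
      where the negatives are generated independently of $x$ (slide annotation: "likelihood function").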
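The expectation in $L(\omega; x)$ can be estimated by sampling. This is a sketch only: the samplers and the per-example loss are hypothetical callables the caller supplies, and the point to notice is that the noise sampler never receives `x`.

```python
import random


def nce_loss_estimate(x, sample_pos, sample_noise, loss_fn, n_samples, rng: random.Random):
    """Monte Carlo estimate of L(omega; x) = E_{p(y+|x) p_nce(y-)}[l_omega(x, y+, y-)].

    sample_pos(x, rng) draws y+ ~ p(y+ | x), sample_noise(rng) draws y- ~ p_nce(y-),
    and loss_fn(x, y_pos, y_neg) plays the role of l_omega. All three are
    hypothetical callables; omega lives inside loss_fn.
    """
    total = 0.0
    for _ in range(n_samples):
        y_pos = sample_pos(x, rng)
        y_neg = sample_noise(rng)  # p_nce takes no x: negatives ignore the input
        total += loss_fn(x, y_pos, y_neg)
    return total / n_samples
```

Plugging in `ns_loss`-style scoring for `loss_fn` and a unigram sampler for `sample_noise` recovers the skip-gram case.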
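  8. Adversarial Contrastive Estimation
    • Loss function of the proposed method:
      $L(\omega, \theta; x) = \lambda\, \mathbb{E}_{p(y^+ \mid x)\, p_{nce}(y^-)}\left[ l_\omega(x, y^+, y^-) \right] + (1 - \lambda)\, \mathbb{E}_{p(y^+ \mid x)\, g_\theta(y^- \mid x)}\left[ l_\omega(x, y^+, y^-) \right]$
      where the second term generates negatives conditioned on the input $x$.
    • Optimization (GAN-style minimax game):
      $\min_\omega \max_\theta \mathbb{E}_{p^+(x)}\left[ L(\omega, \theta; x) \right]$
      Generator ($\max_\theta$): "Let's produce hard negatives."
      Discriminator ($\min_\omega$): "Let's tell positives and negatives apart properly."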
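A sketch of how one negative could be drawn for the two terms of $L(\omega, \theta; x)$: with probability $\lambda$ from $p_{nce}$, otherwise from a generator $g_\theta(y^- \mid x)$. The generator here is a hypothetical bilinear softmax (`gen_in`, `gen_out` are illustration names, not the paper's architecture). How $\theta$ is updated through the discrete sample is not covered on these slides; a score-function (REINFORCE-style) estimator is one common option.

```python
import math
import random


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def generator_probs(gen_in, gen_out, x):
    """Hypothetical generator g_theta(y- | x): softmax over every candidate y- of
    gen_in[x]^T gen_out[y-]. A full-vocabulary softmax like this is O(A')."""
    scores = [sum(a * b for a, b in zip(gen_in[x], v)) for v in gen_out]
    return softmax(scores)


def sample_ace_negative(x, lam, noise_probs, gen_in, gen_out, rng: random.Random):
    """Draw one y- from the mixture lam * p_nce(y-) + (1 - lam) * g_theta(y- | x).

    Returns (y_neg, source) so the caller knows which term of L the sample feeds.
    """
    ids = range(len(noise_probs))
    if rng.random() < lam:
        return rng.choices(ids, weights=noise_probs)[0], "nce"  # ignores x
    return rng.choices(ids, weights=generator_probs(gen_in, gen_out, x))[0], "gen"
```

The discriminator would minimize $l_\omega$ on both kinds of negatives, while the generator is trained to make its samples score highly, i.e. to be hard.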
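  9. Overview of the experimental tasks
    1. Word embeddings
      • For word pairs, evaluate by computing the rank correlation between the similarity assigned by humans and the similarity given by the word embeddings.
      • The results are shown on the following slides.
    2. Hypernym prediction
      • Given a word pair (word1, word2), predict whether "word1 is a word2" holds.
      • e.g. (New York, city) → True
    3. Knowledge graph embedding
      • Learn from relational data (entity1, relation, entity2) and predict the missing links (a.k.a. link prediction).
      • http://letra418.hatenablog.com/entry/2017/07/24/223257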
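A sketch of the word-similarity evaluation in task 1. The slides say only "rank correlation"; Spearman's $\rho$ over cosine similarities is assumed here as the usual choice, and `emb`/`pairs` are hypothetical inputs (a word-to-vector dict and human-scored pairs).

```python
import math


def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    return num / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def evaluate_similarity(emb, pairs):
    """Spearman's rho between human scores and embedding cosine similarities.

    pairs: list of (word1, word2, human_score); emb: dict word -> vector.
    """
    human = [s for _, _, s in pairs]
    model = [cosine(emb[a], emb[b]) for a, b, _ in pairs]
    return pearson(ranks(human), ranks(model))
```

Only the ordering of pairs matters, so any monotone rescaling of either score leaves the result unchanged.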
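  10. On the limitations of ACE
    • The generator is computationally expensive.
      • Generating negatives involves a softmax, so a computation similar to the formula before the NCE approximation creeps back in.
      • Still, learning word embeddings is a precomputation for downstream tasks, so perhaps spending more time on it is justified? (It would help to be able to claim faster convergence than MLE or better metrics on downstream tasks.)
    • It is unclear how many of the properties satisfied by NCE still hold.
      • Under certain conditions, NCE behaves similarly to MLE. https://qiita.com/Quasi-quant2010/items/a15b0d1b6428dc49c6c2
      • Because ACE adds the GAN mechanism, it is unclear whether this still holds.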
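  11. Summary
    • An improvement to supervised learning methods that learn by contrasting observed samples with fake samples.
    • Adversarial Contrastive Estimation (ACE)
      • Uses a generator network in a GAN-like setup that can propose hard negatives to the discriminative model.
      • Entropy regularization of the generator and proper handling of false negatives turned out to be important for successful training.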
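  12. Thoughts
    • On the word-embedding task, the learned vectors look reasonable as a similarity measure → could this be used for product recommendation?
      → In fact, similar work appeared at RecSys 2018 (at almost the same time):
        Adversarial Training of Word2Vec for Basket Completion
        https://arxiv.org/abs/1805.08720
    • Much about the implementation is unclear; I would like the code to be released.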
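  13. Objective function of the skip-gram model (1/2)
    • Take the negative log-likelihood over the word pairs that can occur in the text.
    • Formulations for a single word pair are fairly common, but to match the paper's notation we consider all pairs:
      $L = -\sum_{w_t \in A} \sum_{w_c \in A'} \mathrm{freq}(w_t, w_c) \log p_{U,V}(w_c \mid w_t)$ → optimize $U, V$ so that $L$ is minimized.
    • Generalizing, minimize
      $L = -\sum_{w_t \in A} \sum_{w_c \in A'} p(w_t, w_c) \log p_{U,V}(w_c \mid w_t) = -\sum_{w_t \in A} p(w_t) \sum_{w_c \in A'} p(w_c \mid w_t) \log p_{U,V}(w_c \mid w_t)$
    • If we set $p(w_t, w_c) \propto \mathrm{freq}(w_t, w_c)$, the two are equivalent as minimization problems.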
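A one-line check of the equivalence claim, writing $N$ for the total pair count (a symbol introduced here, not on the slides): the frequency-weighted loss is the probability-weighted loss scaled by the constant $N > 0$, which does not depend on $U, V$, so both have the same minimizers.

```latex
N = \sum_{w_t \in A} \sum_{w_c \in A'} \mathrm{freq}(w_t, w_c), \qquad
p(w_t, w_c) = \frac{\mathrm{freq}(w_t, w_c)}{N}
\;\Longrightarrow\;
-\sum_{w_t \in A} \sum_{w_c \in A'} \mathrm{freq}(w_t, w_c) \log p_{U,V}(w_c \mid w_t)
  = N \cdot \Bigl( -\sum_{w_t \in A} \sum_{w_c \in A'} p(w_t, w_c) \log p_{U,V}(w_c \mid w_t) \Bigr)
```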
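  14. Objective function of the skip-gram model (2/2)
      $L = -\sum_{w_t \in A} p(w_t) \sum_{w_c \in A'} p(w_c \mid w_t) \log p_{U,V}(w_c \mid w_t) = -\sum_{w_t \in A} p(w_t) \sum_{w_c \in A'} p(w_c \mid w_t) \left[ \log \exp(u(w_t)^\top v(w_c)) - \log \sum_{w_c' \in A'} \exp(u(w_t)^\top v(w_c')) \right]$
    • The normalizing sum costs $O(A')$, so a large vocabulary makes the computation slow.
    • Tricks to reduce the computation: Noise Contrastive Estimation, Negative Sampling, etc.
    [Figure: $w_c$ is mapped to $v(w_c)$ through $V \in \mathbb{R}^{A' \times d}$]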
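  15. Simplified Noise Contrastive Estimation (1/2)
    • Here I introduce the simplified version of Noise Contrastive Estimation that appears in Mikolov's paper (Negative Sampling).
    • For the paper presented here, whether the simplified version is used or not is inconsequential.
    • NCE and NS are also explained in detail in the book 深層学習による自然言語処理 (Deep Learning for Natural Language Processing).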
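  16. Simplified Noise Contrastive Estimation (2/2)
    • Train the model to discriminate the context word $w_c$ of one training example from $k$ noise context words $S' = \{\bar{w}_{c1}, \dots, \bar{w}_{ck}\}$.
    • The objective changes from
      $L = -\sum_{w_t \in A} p(w_t) \sum_{w_c \in A'} p(w_c \mid w_t) \left[ \log \exp(u(w_t)^\top v(w_c)) - \log \sum_{w_c' \in A'} \exp(u(w_t)^\top v(w_c')) \right]$
      to
      $L_{NS} = -\sum_{w_t \in A} p(w_t) \sum_{w_c \in A'} p(w_c \mid w_t) \left[ \log \sigma(u(w_t)^\top v(w_c)) + \sum_{w_c' \in S'} \log\left(1 - \sigma(u(w_t)^\top v(w_c'))\right) \right]$
      (first term: probability that the positive example occurs; second term: probability that the negatives (noise) do not occur).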
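  17. Relating the general form of NCE to skip-gram
    • The skip-gram formulation shown earlier is also a special case of the general form:
      $\mathbb{E}_{p^+(x)}\left[ \mathbb{E}_{p(y^+ \mid x)\, p_{nce}(y^-)}\left[ l_\omega(x, y^+, y^-) \right] \right] \;\longrightarrow\; \mathbb{E}_{p(w_t)}\left[ \mathbb{E}_{p(w_c \mid w_t)\, p_{nce}(w_c')}\left[ l_{U,V}(w_t, w_c, w_c') \right] \right]$
      $l_{U,V}(w_t, w_c, w_c') = -\log \sigma(u(w_t)^\top v(w_c)) - k \log\left(1 - \sigma(u(w_t)^\top v(w_c'))\right)$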
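A short derivation of where the factor $k$ comes from, assuming each of the $k$ negatives in $S'$ is drawn from $p_{nce}$ (the slides do not spell this out). By linearity of expectation, the sum over $S'$ in $L_{NS}$ has the same expectation as the single $k \log(1 - \sigma(\cdot))$ term in $l_{U,V}$:

```latex
\mathbb{E}_{\bar{w}_{c1}, \dots, \bar{w}_{ck} \sim p_{nce}}\Bigl[ \sum_{j=1}^{k} \log\bigl(1 - \sigma(u(w_t)^\top v(\bar{w}_{cj}))\bigr) \Bigr]
  = \sum_{j=1}^{k} \mathbb{E}_{w_c' \sim p_{nce}}\bigl[ \log\bigl(1 - \sigma(u(w_t)^\top v(w_c'))\bigr) \bigr]
  = \mathbb{E}_{w_c' \sim p_{nce}}\bigl[ k \log\bigl(1 - \sigma(u(w_t)^\top v(w_c'))\bigr) \bigr]
```

So sampling $S'$ gives an unbiased Monte Carlo estimate of the negative part of $l_{U,V}$.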