ACL 2018 Reading Group - Adversarial Contrastive Estimation
y_yammt
July 08, 2018
Transcript
2018/07/08 Yuji Yamamoto (@y_yammt), ACL 2018 Reading Group (@LINE Corp)
Paper introduced in this talk
• https://arxiv.org/abs/1805.03642
• Authors contributed equally.
• Apparently the result of an internship at Borealis AI (I'm envious).
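Overview
• An improvement on Noise Contrastive Estimation (NCE), which is used for parameter estimation in models such as word embeddings.
• Brings the Generative Adversarial Network (GAN) framework into negative sampling.
• Experiments confirm faster convergence than NCE.
• Several metrics on downstream tasks are also confirmed to improve.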
Outline of the talk
1. Introduction: the Skip-gram model and Noise Contrastive Estimation
2. Proposed method: Adversarial Contrastive Estimation
3. Experiments
4. Summary
The Skip-gram Model and Noise Contrastive Estimation
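What was the Skip-gram model again? (1/2)
• One way of mapping words to vectors (word embedding).
• It produces vectors that predict the surrounding words well from the word in focus.
Words are mapped to vectors, and the target word $w_t$ is used to guess the surrounding word $w_c$:
$$p_{U,V}(w_c \mid w_t) = \frac{\exp\left(u(w_t)^\top v(w_c)\right)}{\sum_{w_{c'} \in A'} \exp\left(u(w_t)^\top v(w_{c'})\right)}$$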
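What was the Skip-gram model again? (2/2)
$$p_{U,V}(w_c \mid w_t) = \frac{\exp\left(u(w_t)^\top v(w_c)\right)}{\sum_{w_{c'} \in A'} \exp\left(u(w_t)^\top v(w_{c'})\right)}$$
• $u(\cdot), v(\cdot) \in \mathbb{R}^d$
• $U \in \mathbb{R}^{|A| \times d}$ maps a target word $w_t$ to $u(w_t)$.
• $V \in \mathbb{R}^{|A'| \times d}$ maps a context word $w_c$ to $v(w_c)$.
Here $A$ is the target vocabulary and $A'$ is the context vocabulary.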
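The softmax above can be sketched in a few lines of plain Python. This is only an illustration under assumed toy sizes and random vectors; the names `skipgram_probs`, `U`, and `V` are hypothetical and not taken from the slides or the paper.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def skipgram_probs(u_t, V):
    """Return p(w_c | w_t) for every context word, given u(w_t) and the rows of V."""
    scores = [dot(u_t, v_c) for v_c in V]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)  # normaliser: one term per word in the context vocabulary A'
    return [e / z for e in exps]


random.seed(0)
d, n_target, n_context = 4, 5, 6  # toy sizes (assumed)
U = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_target)]
V = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_context)]

probs = skipgram_probs(U[2], V)  # distribution over context words for target word 2
```

The normaliser `z` touches every row of `V`, which is the cost the following slides try to avoid.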
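Objective function of the Skip-gram model
• Take the negative log-likelihood of $w_t$ and $w_c$ at a given position in the text:
$$l = -\log p_{U,V}(w_c \mid w_t) = -\log \frac{\exp\left(u(w_t)^\top v(w_c)\right)}{\sum_{w_{c'} \in A'} \exp\left(u(w_t)^\top v(w_{c'})\right)}$$
→ Optimize $u, v$ so that this is minimized.
• The parameters can be estimated once the partial derivatives $\frac{\partial l}{\partial u(w_t)}$ and $\frac{\partial l}{\partial v(w_c)}$ are available, but...
$$\frac{\partial l}{\partial u(w_t)} = -v(w_c) + \mathbb{E}_{p_{U,V}(w_{c'} \mid w_t)}\left[v(w_{c'})\right]$$
• The expectation costs $O(|A'|)$: the computation needed for the gradient is proportional to the size of the context vocabulary (the number of rows of $V \in \mathbb{R}^{|A'| \times d}$).
→ This can become a heavy computation.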
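As a sanity check on the gradient formula above, the following sketch compares the analytic expression, minus v(w_c) plus the expected context vector under the model, with a central finite difference. The toy vectors and all function names are assumptions made for illustration.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax_probs(u_t, V):
    scores = [dot(u_t, v) for v in V]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def loss(u_t, V, c):
    """l = -log p(w_c | w_t), where c is the index of the observed context word."""
    return -math.log(softmax_probs(u_t, V)[c])


def grad_u(u_t, V, c):
    """Analytic gradient: -v(w_c) + E_{p(w_c' | w_t)}[v(w_c')]; loops over all of A'."""
    p = softmax_probs(u_t, V)
    expected_v = [sum(p[j] * V[j][i] for j in range(len(V))) for i in range(len(u_t))]
    return [expected_v[i] - V[c][i] for i in range(len(u_t))]


def numeric_grad_u(u_t, V, c, eps=1e-6):
    """Central finite-difference approximation of the same gradient."""
    g = []
    for i in range(len(u_t)):
        plus = list(u_t)
        minus = list(u_t)
        plus[i] += eps
        minus[i] -= eps
        g.append((loss(plus, V, c) - loss(minus, V, c)) / (2 * eps))
    return g


random.seed(1)
d, n_context = 3, 7  # toy sizes (assumed)
u_t = [random.gauss(0, 1) for _ in range(d)]
V = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_context)]

analytic = grad_u(u_t, V, c=2)
numeric = numeric_grad_u(u_t, V, c=2)
max_gap = max(abs(a - b) for a, b in zip(analytic, numeric))
```

`grad_u` has to visit every row of `V` to form the expectation, which is where the O(|A'|) cost comes from.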
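Reducing the computational cost
• Noise Contrastive Estimation (NCE) and similar methods.
• This talk introduces the simplified Noise Contrastive Estimation (Negative Sampling) that appears in Mikolov's paper.
• For the paper introduced here, it does not matter (it is inconsequential) whether the simplified version or the full version is used.
• A detailed explanation of NCE and NS can be found in the book 「深層学習による自然言語処理」 (Deep Learning for Natural Language Processing).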
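Simplified Noise Contrastive Estimation
• Train the model to discriminate the context word ($w_c$) of one training example from $k$ noise context words $S'$:
$$S' = \{\bar{w}_{c_1}, \cdots, \bar{w}_{c_k}\}$$
$S'$ is a set of $k$ randomly drawn context words (not necessarily drawn from a uniform distribution).
• The objective function is changed from
$$l = -\log \exp\left(u(w_t)^\top v(w_c)\right) + \log \sum_{w_{c'} \in A'} \exp\left(u(w_t)^\top v(w_{c'})\right)$$
to
$$l_{NS} = -\log \sigma\left(u(w_t)^\top v(w_c)\right) - \sum_{w_{c'} \in S'} \log\left(1 - \sigma\left(u(w_t)^\top v(w_{c'})\right)\right)$$
where $\sigma$ is the sigmoid function. The first term uses the probability that the positive example occurs; the second uses the probability that each negative (noise) example does not occur.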
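A minimal sketch of the negative-sampling loss above, assuming a toy vocabulary and a hypothetical, non-uniform noise distribution (`noise_weights`). None of these names or values come from the slides.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def log_sigmoid(x):
    """log sigma(x), written with log1p for better numerical behaviour."""
    return -math.log1p(math.exp(-x))


def ns_loss(u_t, v_c, noise_vectors):
    """l_NS = -log sigma(u.v_c) - sum over noise words of log(1 - sigma(u.v_c'))."""
    positive = -log_sigmoid(dot(u_t, v_c))
    # 1 - sigma(s) equals sigma(-s), so the negative term is -log sigma(-s)
    negative = -sum(log_sigmoid(-dot(u_t, v_n)) for v_n in noise_vectors)
    return positive + negative


random.seed(2)
d, n_context, k = 4, 50, 5  # toy sizes (assumed)
u_t = [random.gauss(0, 1) for _ in range(d)]
V = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_context)]

# Hypothetical noise distribution; it does not look at the target word w_t.
noise_weights = [1.0 / (i + 1) for i in range(n_context)]
noise_ids = random.choices(range(n_context), weights=noise_weights, k=k)

value = ns_loss(u_t, V[0], [V[i] for i in noise_ids])
```

Only k + 1 inner products are computed per training example, independent of the size of the context vocabulary.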
Adversarial Contrastive Estimation (ACE)
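Revisiting simplified NCE
$$l_{NS} = -\log \sigma\left(u(w_t)^\top v(w_c)\right) - \sum_{w_{c'} \in S'} \log\left(1 - \sigma\left(u(w_t)^\top v(w_{c'})\right)\right)$$
• The first term uses the probability that the positive example occurs; the second uses the probability that the negative (noise) examples do not occur. $S'$ is the set of $k$ randomly drawn context words.
• Because the negative examples are created without looking at the target word ($w_t$), they may turn out to be easy to tell apart from the positive example.
→ We want to be able to generate harder negative examples, so that sampling concentrates more on noise words $\bar{w}_{c_1}, \bar{w}_{c_2}$ that lie close to the positive example.
→ Bring in the Generative Adversarial Networks framework.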
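Rewriting NCE a little more generally
• Let $\omega$ be the parameters to optimize (← $U, V$),
• $x$ the given target (← $w_t$),
• $y^+$ the outcome that was observed (positive example) (← $w_c$),
• $y^-$ the outcome that acts as noise (negative example) (← $w_{c'}$).
The loss function is then
$$L(\omega; x) = \mathbb{E}_{p(y^+ \mid x)\, p_{nce}(y^-)}\; l_\omega(x, y^+, y^-)$$
which is to be minimized. The negative example is generated independently of $x$.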
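Adversarial Contrastive Estimation
• Loss function of the proposed method:
$$L(\omega, \theta; x) = \lambda\, \mathbb{E}_{p(y^+ \mid x)\, p_{nce}(y^-)}\; l_\omega(x, y^+, y^-) + (1 - \lambda)\, \mathbb{E}_{p(y^+ \mid x)\, g_\theta(y^- \mid x)}\; l_\omega(x, y^+, y^-)$$
In the second term the negative example is generated from the target $x$.
• Optimization (GAN-style minimax game):
$$\min_\omega \max_\theta\; \mathbb{E}_{p^+(x)}\; L(\omega, \theta; x)$$
The maximization over $\theta$ is the Generator ("let's produce hard negative examples"); the minimization over $\omega$ is the Discriminator ("let's tell positive and negative examples apart properly").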
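To make the mixture in the ACE loss concrete, here is a sketch that evaluates the loss exactly for a single (x, y+) pair on a toy vocabulary. It is not the authors' implementation: the generator is replaced by a hypothetical stand-in (a softmax over the discriminator's own scores), the noise distribution is assumed uniform, and the generator's parameter update is left out entirely.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def log_sigmoid(x):
    return -math.log1p(math.exp(-x))


def pair_loss(u_x, v_pos, v_neg):
    """l_omega(x, y+, y-) = -log sigma(u.v+) - log(1 - sigma(u.v-))."""
    return -log_sigmoid(dot(u_x, v_pos)) - log_sigmoid(-dot(u_x, v_neg))


def ace_loss(u_x, v_pos, negatives, p_nce, g_theta, lam):
    """Exact expectation over y- under lam * p_nce(y-) + (1 - lam) * g_theta(y- | x)."""
    total = 0.0
    for v_neg, p_n, p_g in zip(negatives, p_nce, g_theta):
        mix = lam * p_n + (1.0 - lam) * p_g
        total += mix * pair_loss(u_x, v_pos, v_neg)
    return total


# Toy embeddings (assumed values).
u_x = [1.0, 0.5]
v_pos = [1.0, 1.0]
negatives = [[-1.0, 0.0], [0.2, -0.3], [2.0, 1.0], [0.5, 0.5]]

# Fixed noise distribution: uniform, independent of x (assumption).
p_nce = [1.0 / len(negatives)] * len(negatives)

# Hypothetical generator: puts more mass on negatives that score high for this x.
scores = [dot(u_x, v) for v in negatives]
z = sum(math.exp(s) for s in scores)
g_theta = [math.exp(s) / z for s in scores]

nce_only = ace_loss(u_x, v_pos, negatives, p_nce, g_theta, lam=1.0)
adv_only = ace_loss(u_x, v_pos, negatives, p_nce, g_theta, lam=0.0)
mixed = ace_loss(u_x, v_pos, negatives, p_nce, g_theta, lam=0.5)
```

With this stand-in generator, `adv_only` is at least as large as `nce_only`, because the generator favours exactly the negatives the discriminator finds hardest. That is the direction the maximization over θ pushes in.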
Finer details of ACE
• Entropy regularization for the Generator.
• Exception handling for the case where a false negative (an actual positive example) is drawn as noise.
• And so on.
Experiments
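Overview of the experimental tasks
1. Word embeddings
• For word pairs, evaluate by computing the rank correlation between human-assigned similarity ratings and the similarity given by the word embeddings.
• Results are shown on the following slides.
2. Hypernym prediction
• Given a word pair (word1, word2), predict whether "word1 is a word2" holds.
• e.g. (New York, city) → True
3. Knowledge graph embeddings
• Learn from relational data (entity1, relation, entity2) and predict the missing links (a.k.a. link prediction).
• http://letra418.hatenablog.com/entry/2017/07/24/223257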
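Word embedding results (Spearman score)
• Trained with a single pass over the English Wikipedia.
• For word pairs, evaluated by computing the rank correlation between human-assigned similarity ratings and the similarity given by the word embeddings.
• ADV: negative examples are produced by the Generator only (λ=0). ACE: Generator and NS.
• What does "iteration" mean here? (What is being solved in each iteration?)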
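The Spearman score used for this evaluation can be sketched as follows: rank the human ratings and the embedding similarities separately, then take the Pearson correlation of the ranks. The embeddings, word pairs, and ratings below are made-up toy values; using cosine similarity as the embedding similarity is also an assumption.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    num = sum(x * y for x, y in zip(a, b))
    return num / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def average_ranks(values):
    """1-based ranks in ascending order; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical embeddings and human similarity ratings.
emb = {
    "cat": [1.0, 0.1], "dog": [0.9, 0.2], "car": [0.1, 1.0],
    "truck": [0.3, 0.8], "banana": [-0.5, 0.3],
}
pairs = [("cat", "dog", 9.0), ("car", "truck", 8.5), ("cat", "car", 2.0), ("dog", "banana", 1.0)]

human = [score for _, _, score in pairs]
model = [cosine(emb[a], emb[b]) for a, b, _ in pairs]
rho = spearman(human, model)
```

In this toy data the model orders the four pairs exactly as the human ratings do, so `rho` is at its maximum; real evaluation sets give values well below that.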
Word embedding results (Nearest neighbors)
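Limitations of ACE
• The Generator is computationally heavy.
  • Producing negative examples involves a softmax, so a computation similar to the expression before the NCE approximation comes back in.
  • Learning word embeddings is a precomputation for downstream tasks, so isn't it justified even if it takes time? (It would be good to be able to say that convergence is faster than MLE, or that downstream task metrics improved.)
• It is unclear to what extent the properties satisfied by NCE still hold.
  • Under certain conditions, NCE behaves similarly to MLE. https://qiita.com/Quasi-quant2010/items/a15b0d1b6428dc49c6c2
  • It is unclear whether this can still be said for ACE once the GAN framework has been brought in.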
Summary
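Summary
• An improvement to the kind of supervised learning that learns by contrasting observed samples with fake samples.
• Adversarial Contrastive Estimation (ACE)
  • Uses a generator network, in a GAN-like setting, that can propose hard negative examples to the discriminative model.
  • Entropy regularization for the Generator and appropriate handling of false negatives turned out to be important for learning well.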
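Impressions
• On the word embedding task, the vectors obtained look reasonable as a similarity measure.
→ Could this also be used for recommendation?
→ In fact, similar work appeared at RecSys 2018 (at almost the same time): Adversarial Training of Word2Vec for Basket Completion, https://arxiv.org/abs/1805.08720
• Many points about the implementation are unclear. I would like the implementation to be released.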
Supplementary Slides
Relating the Skip-gram Model to the Paper's Notation
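Objective function of the Skip-gram model (1/2)
• Take the negative log-likelihood over the word pairs that can occur in the text.
• A formulation for a single word pair is fairly common, but to match the paper's notation we consider all pairs here.
$$-\sum_{w_t \in A} \sum_{w_c \in A'} \mathrm{freq}(w_t, w_c) \log p_{U,V}(w_c \mid w_t)$$
→ Optimize $U, V$ so that this is minimized.
• Generalizing:
$$L = -\sum_{w_t \in A} \sum_{w_c \in A'} p(w_t, w_c) \log p_{U,V}(w_c \mid w_t) = -\sum_{w_t \in A} p(w_t) \sum_{w_c \in A'} p(w_c \mid w_t) \log p_{U,V}(w_c \mid w_t)$$
→ Minimize.
If we set $p(w_t, w_c) \propto \mathrm{freq}(w_t, w_c)$, the two are equivalent as minimization problems.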
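Objective function of the Skip-gram model (2/2)
$$L = -\sum_{w_t \in A} p(w_t) \sum_{w_c \in A'} p(w_c \mid w_t) \log p_{U,V}(w_c \mid w_t)$$
$$= -\sum_{w_t \in A} p(w_t) \sum_{w_c \in A'} p(w_c \mid w_t) \left[ \log \exp\left(u(w_t)^\top v(w_c)\right) - \log \sum_{w_{c'} \in A'} \exp\left(u(w_t)^\top v(w_{c'})\right) \right]$$
• The second term in the brackets costs $O(|A'|)$ (with $V \in \mathbb{R}^{|A'| \times d}$): when the vocabulary is large, the computation takes time.
→ Reduce the computational cost: Noise Contrastive Estimation, Negative Sampling, and so on.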
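Simplified Noise Contrastive Estimation (1/2)
• This talk introduces the simplified Noise Contrastive Estimation (Negative Sampling) that appears in Mikolov's paper.
• For the paper introduced here, it does not matter (it is inconsequential) whether the simplified version or the full version is used.
• A detailed explanation of NCE and NS can be found in the book 「深層学習による自然言語処理」 (Deep Learning for Natural Language Processing).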
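Simplified Noise Contrastive Estimation (2/2)
• Train the model to discriminate the context word ($w_c$) of one training example from $k$ noise context words $S' = \{\bar{w}_{c_1}, \cdots, \bar{w}_{c_k}\}$.
• The objective function is changed from
$$L = -\sum_{w_t \in A} p(w_t) \sum_{w_c \in A'} p(w_c \mid w_t) \left[ \log \exp\left(u(w_t)^\top v(w_c)\right) - \log \sum_{w_{c'} \in A'} \exp\left(u(w_t)^\top v(w_{c'})\right) \right]$$
to
$$L_{NS} = -\sum_{w_t \in A} p(w_t) \sum_{w_c \in A'} p(w_c \mid w_t) \left[ \log \sigma\left(u(w_t)^\top v(w_c)\right) + \sum_{w_{c'} \in S'} \log\left(1 - \sigma\left(u(w_t)^\top v(w_{c'})\right)\right) \right]$$
where $\sigma$ is the sigmoid function. The first term in the brackets uses the probability that the positive example occurs; the second uses the probability that the negative (noise) examples do not occur.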
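Relating the general form of NCE to Skip-gram
• The Skip-gram formulation shown earlier is a special case of the general form
$$\mathbb{E}_{p^+(x)}\left[ \mathbb{E}_{p(y^+ \mid x)\, p_{nce}(y^-)}\; l_\omega(x, y^+, y^-) \right]$$
namely
$$\mathbb{E}_{p(w_t)}\left[ \mathbb{E}_{p(w_c \mid w_t)\, p_{nce}(w_{c'})}\; l_{U,V}(w_t, w_c, w_{c'}) \right]$$
with
$$l_{U,V}(w_t, w_c, w_{c'}) = -\log \sigma\left(u(w_t)^\top v(w_c)\right) - k \log\left(1 - \sigma\left(u(w_t)^\top v(w_{c'})\right)\right)$$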