ACL2018読み会 - Adversarial Contrastive Estimation
y_yammt
July 08, 2018
Transcript
2018/07/08 Yuji Yamamoto (@y_yammt) ACL2018 reading group (@LINE Corp)
Paper introduced today
• https://arxiv.org/abs/1805.03642
• Authors contributed equally.
• Apparently the result of an internship at Borealis AI (enviable).
Overview
• An improvement to Noise Contrastive Estimation (NCE), which is used for parameter estimation in word embeddings and similar models.
• Incorporates the Generative Adversarial Network (GAN) framework into negative-example sampling.
• Experiments confirm that it converges faster than NCE.
• Multiple metrics on downstream tasks are also confirmed to improve.
Outline
1. Introduction: the skip-gram model and Noise Contrastive Estimation
2. Proposed method: Adversarial Contrastive Estimation
3. Experiments
4. Summary
The Skip-gram Model and Noise Contrastive Estimation
What was the skip-gram model again? (1/2)
• One of the methods for mapping words to vectors (word embeddings).
• Given a target word, it produces vectors that predict the surrounding words well.

Words are mapped to vectors.

p_{U,V}(wc|wt) = exp(u(wt)⊤v(wc)) / ∑_{wc′∈A′} exp(u(wt)⊤v(wc′))

Given the target word wt, predict the context word wc.
What was the skip-gram model again? (2/2)

p_{U,V}(wc|wt) = exp(u(wt)⊤v(wc)) / ∑_{wc′∈A′} exp(u(wt)⊤v(wc′))

u(⋅), v(⋅) ∈ ℝ^d: u(wt) is the row of U ∈ ℝ^{A×d} for the target word wt, and v(wc) is the row of V ∈ ℝ^{A′×d} for the context word wc.
Objective function of the skip-gram model
• For a pair wt, wc that co-occurs somewhere in the text, take the negative log-probability and optimize u, v to minimize it:

l = −log p_{U,V}(wc|wt) = −log [ exp(u(wt)⊤v(wc)) / ∑_{wc′∈A′} exp(u(wt)⊤v(wc′)) ]

• Computing the partial derivatives ∂l/∂u(wt) and ∂l/∂v(wc) would allow parameter estimation, but…

∂l/∂u(wt) = −v(wc) + 𝔼_{p(wc′|wt)}[v(wc′)]

• The cost of computing this gradient is O(A′), proportional to the size of the context vocabulary → the computation can become heavy.
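As a concrete illustration of why the full softmax is expensive, here is a minimal NumPy sketch of the per-pair loss above (all names and sizes here are made up for the example); note that the normalizer alone touches every word in the context vocabulary:

```python
import numpy as np

rng = np.random.default_rng(0)
d, vocab = 8, 1000                           # embedding dim, context vocab size A'
U = rng.normal(scale=0.1, size=(vocab, d))   # target-word vectors u(w)
V = rng.normal(scale=0.1, size=(vocab, d))   # context-word vectors v(w)

def full_softmax_loss(wt, wc):
    """l = -log p(wc | wt) with the full softmax: O(vocab) work per pair."""
    scores = V @ U[wt]                       # u(wt)^T v(w) for EVERY word w
    log_z = np.log(np.sum(np.exp(scores)))   # normalizer over the whole vocab
    return log_z - scores[wc]

loss = full_softmax_loss(wt=3, wc=7)
```

One such O(A′) pass is needed for every training pair, which is what NCE and negative sampling are designed to avoid.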
Reducing the computation
• Noise Contrastive Estimation (NCE) and similar techniques.
• Here I introduce the simplified version of Noise Contrastive Estimation (Negative Sampling) that appears in Mikolov's papers.
• For the paper introduced today, it is inconsequential whether we use the simplified version or not.
• NCE and NS are explained in detail in the (Japanese) book 『深層学習による自然言語処理』 (Deep Learning for Natural Language Processing).
Simplified Noise Contrastive Estimation
• Learn to discriminate the context word (wc) of a training example from k noise context words S′ = { w̄c1, …, w̄ck }.

The objective changes from

l = −log exp(u(wt)⊤v(wc)) + log ∑_{wc′∈A′} exp(u(wt)⊤v(wc′))

to

lNS = −log σ(u(wt)⊤v(wc)) − ∑_{wc′∈S′} log(1 − σ(u(wt)⊤v(wc′)))

where the first term is the probability that the positive example occurs and the second is the probability that each negative (noise) example does not occur. S′ is a set of k context words drawn at random (though not necessarily uniformly).
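A minimal NumPy sketch of the negative-sampling loss above. Names and sizes are made up, and noise is drawn uniformly here for brevity (word2vec in practice draws it from a smoothed unigram distribution):

```python
import numpy as np

rng = np.random.default_rng(0)
d, vocab, k = 8, 1000, 5                     # embedding dim, vocab size, #negatives
U = rng.normal(scale=0.1, size=(vocab, d))   # target-word vectors u(w)
V = rng.normal(scale=0.1, size=(vocab, d))   # context-word vectors v(w)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ns_loss(wt, wc, negatives):
    """lNS: score the positive pair against k noise pairs -- O(k), not O(vocab)."""
    pos = sigmoid(U[wt] @ V[wc])             # prob. the positive pair is real
    neg = sigmoid(U[wt] @ V[negatives].T)    # prob. each noise pair is real
    return -np.log(pos) - np.sum(np.log(1.0 - neg))

negs = rng.integers(0, vocab, size=k)        # noise drawn without looking at wt
loss = ns_loss(wt=3, wc=7, negatives=negs)
```

The key point is that `negs` is sampled independently of `wt`, which is exactly the weakness ACE addresses.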
Adversarial Contrastive Estimation (ACE)
A second look at simplified NCE

lNS = −log σ(u(wt)⊤v(wc)) − ∑_{wc′∈S′} log(1 − σ(u(wt)⊤v(wc′)))

S′ is a set of k context words drawn at random.
• Because the negatives are drawn without looking at the target word (wt), they may end up being trivially distinguishable from the positive example.
→ We want to generate harder negatives.
→ Bring in the Generative Adversarial Networks framework.
Rewriting NCE a bit more generally
• Let ω be the parameters to optimize (← U, V),
• x the target (← wt),
• y+ the observed outcome, i.e. the positive example (← wc),
• y− the noise outcome, i.e. the negative example (← wc′).

The loss function is then

L(ω; x) = 𝔼_{p(y+|x) pnce(y−)} [ lω(x, y+, y−) ]  ← minimize

Note that the negatives are generated independently of x.
Adversarial Contrastive Estimation
• Loss function of the proposed method:

L(ω, θ; x) = λ 𝔼_{p(y+|x) pnce(y−)} [ lω(x, y+, y−) ] + (1 − λ) 𝔼_{p(y+|x) gθ(y−|x)} [ lω(x, y+, y−) ]

where gθ(y−|x) generates negatives conditioned on the target.
• Optimization (GAN-style minimax game):

min_ω max_θ 𝔼_{p+(x)} L(ω, θ; x)

The generator (θ) tries to produce hard negatives; the discriminator (ω) tries to tell positives and negatives apart.
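A rough single-sample sketch of the mixed ACE loss, under simplifying assumptions made for illustration (uniform pnce, a context-free generator gθ, and one Monte Carlo sample per expectation; the paper's generator conditions on x and is trained adversarially on θ):

```python
import numpy as np

rng = np.random.default_rng(0)
d, vocab, lam = 8, 1000, 0.5                 # embedding dim, vocab size, lambda
U = rng.normal(scale=0.1, size=(vocab, d))   # discriminator params omega: U, V
V = rng.normal(scale=0.1, size=(vocab, d))
theta = np.zeros(vocab)                      # generator logits (context-free toy)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pair_loss(wt, wc, neg):
    """l_omega for one (positive, negative) pair, negative-sampling style."""
    return (-np.log(sigmoid(U[wt] @ V[wc]))
            - np.log(1.0 - sigmoid(U[wt] @ V[neg])))

def ace_loss(wt, wc):
    """lambda * E_pnce[l] + (1 - lambda) * E_gtheta[l], one sample each."""
    nce_neg = rng.integers(0, vocab)                  # fixed-noise negative
    probs = np.exp(theta) / np.sum(np.exp(theta))     # generator softmax: O(vocab)!
    gen_neg = rng.choice(vocab, p=probs)              # adversarial negative
    return lam * pair_loss(wt, wc, nce_neg) + (1 - lam) * pair_loss(wt, wc, gen_neg)

loss = ace_loss(wt=3, wc=7)
```

The softmax over `theta` is exactly the heavy computation discussed later under the limitations of ACE.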
Finer points of ACE
• Entropy regularization for the generator.
• Special handling for when a false negative (an actual positive) is drawn as noise.
• And so on.
Experiments
Overview of the experimental tasks
1. Word embeddings
• For word pairs, compute the rank correlation between human-annotated similarity and the similarity given by the embeddings.
• Results are shown on the following pages.
2. Hypernym prediction
• Given a word pair (word1, word2), predict whether "word1 is a word2".
• e.g. (New York, city) → True
3. Knowledge-graph embedding
• Learn from relational data (entity1, relation, entity2) and predict missing links (a.k.a. link prediction).
• http://letra418.hatenablog.com/entry/2017/07/24/223257
Word-embedding results (Spearman score)
• Trained on a single pass over English Wikipedia.
• For word pairs, the rank correlation between human-annotated similarity and embedding similarity is used for evaluation.
• ADV: negatives come only from the generator (λ = 0). ACE: generator plus NS.
• What does "iteration" mean here? (What is being solved at each iteration?)
Word-embedding results (nearest neighbors)
Limitations of ACE
• The generator's computation is heavy.
• Producing negatives involves a softmax, so a computation similar to the pre-approximation expression that NCE was meant to avoid sneaks back in.
• Since training word embeddings is a precomputation for downstream tasks, perhaps the extra time is justified? (It would help if one could show faster convergence than MLE, or better metrics on downstream tasks.)
• It is unclear how much of the theory that holds for NCE carries over.
• NCE behaves similarly to MLE under certain conditions: https://qiita.com/Quasi-quant2010/items/a15b0d1b6428dc49c6c2
• With the GAN machinery added in ACE, it is unclear whether this still holds.
Summary
Summary
• An improvement to supervised learning that trains by contrasting observed samples against fake samples.
• Adversarial Contrastive Estimation (ACE)
• Uses a generator network, in a GAN-like setting, that can propose hard negatives to the discriminative model.
• Entropy regularization of the generator and proper handling of false negatives turn out to be important for training well.
Impressions
• The word-embedding task yields vectors whose similarities look plausible → usable for recommendation? → In fact, similar work appeared at RecSys 2018 around the same time:
Adversarial Training of Word2Vec for Basket Completion
https://arxiv.org/abs/1805.08720
• Many details of the implementation method are unclear. I wish the implementation were public.
Supplementary slides
Relating the skip-gram model to the paper's notation
Objective function of the skip-gram model (1/2)
• Take the negative log-probability over all word pairs occurring in the text.
• The single-word-pair formulation is the one most often seen, but to match the paper's notation we consider all pairs:

L = − ∑_{wt∈A} ∑_{wc∈A′} p(wt, wc) log p_{U,V}(wc|wt)
  = − ∑_{wt∈A} p(wt) ∑_{wc∈A′} p(wc|wt) log p_{U,V}(wc|wt)
  ∝ − ∑_{wt∈A} ∑_{wc∈A′} freq(wt, wc) log p_{U,V}(wc|wt)

→ optimize U, V so that L is minimized.
• If we set p(wt, wc) ∝ freq(wt, wc), the two formulations are equivalent in the sense of minimization.
Objective function of the skip-gram model (2/2)

L = − ∑_{wt∈A} p(wt) ∑_{wc∈A′} p(wc|wt) log p_{U,V}(wc|wt)
  = − ∑_{wt∈A} p(wt) ∑_{wc∈A′} p(wc|wt) [ log exp(u(wt)⊤v(wc)) − log ∑_{wc′∈A′} exp(u(wt)⊤v(wc′)) ]

• The normalizer costs O(A′): with a large vocabulary the computation takes time.
• Reducing the computation: Noise Contrastive Estimation, Negative Sampling, etc.
Simplified Noise Contrastive Estimation (1/2)
• I introduce the simplified version of Noise Contrastive Estimation (Negative Sampling) that appears in Mikolov's papers.
• For the paper introduced today, it is inconsequential whether we use the simplified version or not.
• NCE and NS are explained in detail in the (Japanese) book 『深層学習による自然言語処理』.
Simplified Noise Contrastive Estimation (2/2)
• Learn to discriminate the context word (wc) of a training example from k noise context words S′ = { w̄c1, …, w̄ck }.

The objective changes from

L = − ∑_{wt∈A} p(wt) ∑_{wc∈A′} p(wc|wt) [ log exp(u(wt)⊤v(wc)) − log ∑_{wc′∈A′} exp(u(wt)⊤v(wc′)) ]

to

LNS = − ∑_{wt∈A} p(wt) ∑_{wc∈A′} p(wc|wt) [ log σ(u(wt)⊤v(wc)) + ∑_{wc′∈S′} log(1 − σ(u(wt)⊤v(wc′))) ]

where the first term inside the brackets is the probability that the positive example occurs and the second is the probability that each negative (noise) example does not occur.
Relating the general form of NCE to skip-gram
• The skip-gram formulation shown earlier is a special case of the general form:

𝔼_{p+(x)} [ 𝔼_{p(y+|x) pnce(y−)} lω(x, y+, y−) ]
→ 𝔼_{p(wt)} [ 𝔼_{p(wc|wt) pnce(wc′)} l_{U,V}(wt, wc, wc′) ]

l_{U,V}(wt, wc, wc′) = −log σ(u(wt)⊤v(wc)) − k log(1 − σ(u(wt)⊤v(wc′)))