ACL 2018 Reading Group - Adversarial Contrastive Estimation
y_yammt
July 08, 2018
Transcript
2018/07/08 Yuji Yamamoto (@y_yammt), ACL 2018 Reading Group (@LINE Corp)

The paper presented today
• https://arxiv.org/abs/1805.03642
• Authors contributed equally.
• Apparently the result of an internship at Borealis AI (I'm envious).
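
Overview
• An improvement to Noise Contrastive Estimation (NCE), which is used for parameter estimation in models such as word embeddings.
• It brings the Generative Adversarial Network (GAN) framework into negative sampling.
• Experiments confirm that it converges faster than NCE.
• Improvements on multiple metrics in downstream tasks are also confirmed.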

Outline of the talk
1. Introduction: the Skip-gram model and Noise Contrastive Estimation
2. Proposed method: Adversarial Contrastive Estimation
3. Experiments
4. Summary

The Skip-gram model and Noise Contrastive Estimation
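
What was the Skip-gram model again? (1/2)
• One of the methods for mapping words to vectors (word embeddings).
• It produces vectors that predict the surrounding words well, given the word of interest.
[Figure: "Words are mapped to vectors"; target word $w_t$ and context word $w_c$]
$$p_{U,V}(w_c \mid w_t) = \frac{\exp\left(u(w_t)^\top v(w_c)\right)}{\sum_{w_{c'} \in A'} \exp\left(u(w_t)^\top v(w_{c'})\right)}$$
(The model tries to guess the surrounding word.)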
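
What was the Skip-gram model again? (2/2)
[Figure: the target word $w_t$ is mapped to $u(w_t)$ and the context word $w_c$ to $v(w_c)$]
$$p_{U,V}(w_c \mid w_t) = \frac{\exp\left(u(w_t)^\top v(w_c)\right)}{\sum_{w_{c'} \in A'} \exp\left(u(w_t)^\top v(w_{c'})\right)}$$
(The model tries to guess the surrounding word.)
• $u(\cdot), v(\cdot) \in \mathbb{R}^d$
• $U \in \mathbb{R}^{|A| \times d}$: the row for $w_t$ is $u(w_t)$
• $V \in \mathbb{R}^{|A'| \times d}$: the row for $w_c$ is $v(w_c)$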
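
To make the softmax above concrete, here is a minimal sketch in plain Python. The function name `skipgram_prob` and the toy vectors are illustrative assumptions, not taken from the slides or the paper.

```python
import math

def skipgram_prob(u_t, V):
    """Return p(w_c | w_t) for every context word, as a list.

    u_t: target embedding u(w_t), a list of d floats.
    V:   context embeddings, one list of d floats per word in A'.
    """
    scores = [sum(a * b for a, b in zip(u_t, v_c)) for v_c in V]
    m = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)                        # normaliser: one term per context word
    return [e / z for e in exps]

# Toy example with made-up numbers: d = 2, |A'| = 3.
u_t = [1.0, 0.0]
V = [[2.0, 0.0], [0.0, 1.0], [-1.0, 0.5]]
probs = skipgram_prob(u_t, V)
```

The normaliser `z` touches every row of `V`, which is where the cost proportional to the context vocabulary size comes from.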
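
Objective function of the Skip-gram model
• Take the negative log-likelihood of $w_t$ and $w_c$ occurring at some position in the text:
$$l = -\log p_{U,V}(w_c \mid w_t) = -\log \frac{\exp\left(u(w_t)^\top v(w_c)\right)}{\sum_{w_{c'} \in A'} \exp\left(u(w_t)^\top v(w_{c'})\right)}$$
→ Optimize $u, v$ so that $l$ is minimized.
• The parameters can be estimated once we have the partial derivatives $\partial l / \partial u(w_t)$ and $\partial l / \partial v(w_c)$, but...
$$\frac{\partial l}{\partial u(w_t)} = -v(w_c) + \mathbb{E}_{p_{U,V}(w_{c'} \mid w_t)}\left[v(w_{c'})\right]$$
• The expectation costs $O(|A'|)$, because it runs over every row of $V \in \mathbb{R}^{|A'| \times d}$: the cost of computing the gradient is proportional to the size of the context vocabulary, so it can become a heavy computation.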
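
As a sanity check on the gradient formula above, the following sketch compares the analytic expression (minus the positive context vector, plus the softmax-weighted average of all context vectors) with a central finite difference. All names and numbers are illustrative assumptions.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def loss(u_t, V, c):
    """l = -log p(w_c | w_t), with c the index of the observed context word."""
    return -math.log(softmax([dot(u_t, v) for v in V])[c])

def grad_u(u_t, V, c):
    """Analytic gradient: -v(w_c) + E_{p(w_c' | w_t)}[v(w_c')]."""
    p = softmax([dot(u_t, v) for v in V])
    d = len(u_t)
    expected_v = [sum(p[j] * V[j][i] for j in range(len(V))) for i in range(d)]
    return [expected_v[i] - V[c][i] for i in range(d)]

def numeric_grad_u(u_t, V, c, eps=1e-6):
    """Central finite difference, one coordinate at a time."""
    g = []
    for i in range(len(u_t)):
        up = list(u_t); up[i] += eps
        dn = list(u_t); dn[i] -= eps
        g.append((loss(up, V, c) - loss(dn, V, c)) / (2 * eps))
    return g

# Made-up toy values: d = 2, three context words, observed context index 0.
u_t = [0.3, -0.2]
V = [[1.0, 0.5], [-0.4, 0.8], [0.2, -1.0]]
analytic = grad_u(u_t, V, 0)
numeric = numeric_grad_u(u_t, V, 0)
```

The loop inside `grad_u` visits every context word, which is the per-update cost that NCE and negative sampling are meant to avoid.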
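
Reducing the computational cost
• Use Noise Contrastive Estimation (NCE) or a similar method.
• Here I introduce the simplified version of NCE (Negative Sampling, NS) that appears in Mikolov's paper.
• For the paper presented today, it is inconsequential whether the simplified or the full version is used.
• The book "Deep Learning for Natural Language Processing" (深層学習による自然言語処理) has a detailed explanation of NCE and NS.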
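
Simplified Noise Contrastive Estimation
• Train the model to discriminate the context word of one training example ($w_c$) from $k$ noise context words $S'$.
• $S' = \{\bar{w}_{c_1}, \cdots, \bar{w}_{c_k}\}$: a set of $k$ randomly drawn context words (not necessarily drawn from a uniform distribution).
• The original objective
$$l = -\log \exp\left(u(w_t)^\top v(w_c)\right) + \log \sum_{w_{c'} \in A'} \exp\left(u(w_t)^\top v(w_{c'})\right)$$
is replaced by
$$l_{NS} = -\log \sigma\left(u(w_t)^\top v(w_c)\right) - \sum_{w_{c'} \in S'} \log\left(1 - \sigma\left(u(w_t)^\top v(w_{c'})\right)\right)$$
• First term: the probability that the positive example occurs. Second term: the probability that the negative (noise) examples do not occur.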
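
The negative-sampling objective above can be sketched in a few lines of plain Python. The function names and the toy vectors are assumptions for illustration; the "easy" and "hard" negatives are made up to show how the loss reacts to negatives that score close to the positive.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def ns_loss(u_t, v_pos, v_negs):
    """l_NS = -log sigma(u.v_pos) - sum over negatives of log(1 - sigma(u.v_neg))."""
    pos_term = -math.log(sigmoid(dot(u_t, v_pos)))
    neg_term = -sum(math.log(1.0 - sigmoid(dot(u_t, v))) for v in v_negs)
    return pos_term + neg_term

# Made-up toy values, k = 2 negatives in each case.
u_t = [1.0, 0.0]
v_pos = [2.0, 0.0]
easy_negs = [[-2.0, 0.0], [-3.0, 1.0]]   # score far below the positive
hard_negs = [[1.5, 0.0], [1.8, 0.2]]     # score close to the positive
loss_easy = ns_loss(u_t, v_pos, easy_negs)
loss_hard = ns_loss(u_t, v_pos, hard_negs)
```

Each evaluation costs $O(k)$ dot products instead of one per context word. Easy negatives contribute little to the loss, and so little gradient signal.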
Adversarial Contrastive Estimation (ACE)
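
Revisiting simplified NCE
$$l_{NS} = -\log \sigma\left(u(w_t)^\top v(w_c)\right) - \sum_{w_{c'} \in S'} \log\left(1 - \sigma\left(u(w_t)^\top v(w_{c'})\right)\right)$$
• First term: the probability that the positive example occurs. Second term: the probability that the negative (noise) examples do not occur. $S'$ is the set of $k$ randomly drawn context words.
• The negatives are produced without looking at the target word ($w_t$), so they may be easy to tell apart from the positive.
→ We want to be able to generate harder negatives.
→ Bring in the framework of Generative Adversarial Networks.
[Figure: $w_t$ mapped to $w_c$; sampling should "concentrate more" on negatives $\bar{w}_{c_1}, \bar{w}_{c_2}$ near the positive]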
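
Rewriting NCE in a slightly more general form
• Let $\omega$ be the parameters to optimize (← $U, V$).
• Given a target $x$ (← $w_t$), let
  • $y^+$ be the observed outcome, the positive example (← $w_c$), and
  • $y^-$ be the noise outcome, the negative example (← $w_{c'}$).
• The loss is then
$$L(\omega; x) = \mathbb{E}_{p(y^+ \mid x)\, p_{nce}(y^-)}\; l_\omega(x, y^+, y^-) \quad \leftarrow \text{minimize}$$
• $l_\omega$ is the loss function; the negative $y^-$ is generated independently of $x$.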
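
Adversarial Contrastive Estimation
• Loss function of the proposed method:
$$L(\omega, \theta; x) = \lambda\, \mathbb{E}_{p(y^+ \mid x)\, p_{nce}(y^-)}\; l_\omega(x, y^+, y^-) + (1 - \lambda)\, \mathbb{E}_{p(y^+ \mid x)\, g_\theta(y^- \mid x)}\; l_\omega(x, y^+, y^-)$$
  In the second term the negative is generated conditioned on the target $x$.
• Optimization (GAN-style minimax game):
$$\min_\omega \max_\theta\; \mathbb{E}_{p^+(x)}\, L(\omega, \theta; x)$$
  • Generator ($\max_\theta$): "I'll produce hard negatives."
  • Discriminator ($\min_\omega$): "I'll properly tell positives and negatives apart."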
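
The mixture loss above can be illustrated on a tiny vocabulary where the expectation over negatives is computed exactly instead of by sampling. This is only a sketch under stated assumptions: the generator is taken to be a softmax over hypothetical logits, the per-pair loss is the single-negative NS form, and the false-negative case ($y^- = y^+$) is deliberately left unhandled. None of the names or numbers come from the paper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def pair_loss(s_pos, s_neg):
    """l_omega(x, y+, y-) with a single negative, NS style."""
    return -math.log(sigmoid(s_pos)) - math.log(1.0 - sigmoid(s_neg))

def ace_loss(disc_scores, pos, p_nce, gen_logits, lam):
    """Exact L(omega, theta; x) for one (x, y+) on a tiny vocabulary.

    disc_scores[y]: discriminator score for (x, y)        (assumed given)
    p_nce[y]:       fixed noise distribution, independent of x
    gen_logits[y]:  hypothetical generator logits defining g_theta(y | x)
    lam:            mixing weight lambda in [0, 1]
    """
    g = softmax(gen_logits)
    ys = range(len(disc_scores))
    nce_term = sum(p_nce[y] * pair_loss(disc_scores[pos], disc_scores[y]) for y in ys)
    adv_term = sum(g[y] * pair_loss(disc_scores[pos], disc_scores[y]) for y in ys)
    return lam * nce_term + (1.0 - lam) * adv_term

# Made-up toy values: 4 candidate outcomes, the positive is index 0.
disc_scores = [2.0, 1.5, -1.0, -2.0]
p_nce = [0.25, 0.25, 0.25, 0.25]
uniform_gen = [0.0, 0.0, 0.0, 0.0]   # generator that ignores x
hard_gen = [0.0, 3.0, 0.0, 0.0]      # generator concentrating on the hard negative
loss_uniform = ace_loss(disc_scores, 0, p_nce, uniform_gen, lam=0.5)
loss_hard = ace_loss(disc_scores, 0, p_nce, hard_gen, lam=0.5)
```

A generator that puts its mass on the negative scoring closest to the positive raises the loss, which is the direction $\max_\theta$ pushes; the discriminator then has to lower it by separating those hard negatives.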

Finer points of ACE
• Entropy regularization for the generator.
• Special handling for the case where a false negative (actually a positive example) is drawn as noise.
• And so on.

Experiments
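
Overview of the experimental tasks
1. Word embeddings
  • For word pairs, evaluate the rank correlation between human-assigned similarity ratings and the similarity given by the word embeddings.
  • Results are shown on the following slides.
2. Hypernym prediction
  • Given a word pair (word1, word2), predict whether "word1 is a word2" holds.
  • e.g. (New York, city) → True
3. Knowledge graph embeddings
  • Learn from relational data (entity1, relation, entity2) and predict missing links (a.k.a. link prediction).
  • http://letra418.hatenablog.com/entry/2017/07/24/223257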
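
Word embedding results (Spearman score)
• Trained with a single pass over the English Wikipedia.
• For word pairs, evaluate the rank correlation between human-assigned similarity ratings and the similarity given by the word embeddings.
• ADV: negatives are generated by the generator only ($\lambda = 0$). ACE: generator plus NS.
• Open question: what counts as an "iteration" here? (What is processed in each iteration?)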
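
The rank-correlation evaluation described above can be sketched as follows: rank both score lists (ties share an average rank) and take the Pearson correlation of the ranks. The human ratings and model similarities below are made-up numbers for illustration, not results from the paper.

```python
import math

def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def spearman(a, b):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(a), ranks(b))

# Made-up scores for four word pairs.
human = [9.0, 7.5, 3.0, 1.0]    # human similarity ratings
model = [0.8, 0.9, 0.2, 0.1]    # e.g. cosine similarity of the two embeddings
rho = spearman(human, model)
```

In practice a library routine such as `scipy.stats.spearmanr` would normally be used; the hand-written version is here only to keep the sketch dependency-free.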

Word embedding results (nearest neighbors)
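
Limitations of ACE
• The generator is computationally heavy.
  • Producing negatives involves a softmax, so a computation similar to the expression before the NCE approximation comes back in.
  • Then again, training word embeddings is a precomputation for downstream tasks, so isn't the extra time justified? (It would help if one could say that convergence is faster than MLE, or that downstream metrics improve.)
• It is unclear how many of the properties that NCE satisfies still hold.
  • Under certain conditions NCE behaves similarly to MLE. https://qiita.com/Quasi-quant2010/items/a15b0d1b6428dc49c6c2
  • With the GAN framework added in ACE, it is unclear whether this can still be claimed.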
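
Summary
• An improvement to supervised learning that learns by contrasting observed samples with fake samples.
• Adversarial Contrastive Estimation (ACE)
  • Uses a generator network, in a GAN-like setting, that can propose hard negatives to the discriminative model.
  • Entropy regularization of the generator and proper handling of false negatives turned out to be important for training to work well.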
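
Impressions
• On the word embedding task, the vectors obtained look reasonable as a similarity measure.
  → Could this be used for recommendation?
  → In fact, similar work appeared at RecSys 2018 (at almost the same time): Adversarial Training of Word2Vec for Basket Completion, https://arxiv.org/abs/1805.08720
• Many points about the implementation are unclear. I would like the implementation to be released.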

Supplementary slides

Relating the Skip-gram model to the paper's notation
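
Objective function of the Skip-gram model (1/2)
• Take the negative log-likelihood over the word pairs that can occur in the text.
• A formulation for a single word pair is fairly common, but to match the paper's notation we consider all pairs:
$$-\sum_{w_t \in A} \sum_{w_c \in A'} \mathrm{freq}(w_t, w_c) \log p_{U,V}(w_c \mid w_t)$$
→ Optimize $U, V$ so that this is minimized.
• Generalizing:
$$L = -\sum_{w_t \in A} \sum_{w_c \in A'} p(w_t, w_c) \log p_{U,V}(w_c \mid w_t) = -\sum_{w_t \in A} p(w_t) \sum_{w_c \in A'} p(w_c \mid w_t) \log p_{U,V}(w_c \mid w_t)$$
→ Minimize.
• If we set $p(w_t, w_c) \propto \mathrm{freq}(w_t, w_c)$, the two are equivalent as minimization problems.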
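
The equivalence claimed above can be spelled out in one line. The symbol $N$ for the total pair count is introduced here for illustration and does not appear on the slides.

```latex
% Assumption: N = \sum_{w_t \in A} \sum_{w_c \in A'} \mathrm{freq}(w_t, w_c) > 0
% and p(w_t, w_c) = \mathrm{freq}(w_t, w_c) / N.
\begin{aligned}
-\sum_{w_t \in A} \sum_{w_c \in A'} \mathrm{freq}(w_t, w_c) \log p_{U,V}(w_c \mid w_t)
  &= -N \sum_{w_t \in A} \sum_{w_c \in A'} p(w_t, w_c) \log p_{U,V}(w_c \mid w_t) \\
  &= N \cdot L .
\end{aligned}
```

Since $N$ is a positive constant that does not depend on $U, V$, both objectives have the same minimizers.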
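
Objective function of the Skip-gram model (2/2)
$$L = -\sum_{w_t \in A} p(w_t) \sum_{w_c \in A'} p(w_c \mid w_t) \log p_{U,V}(w_c \mid w_t)$$
$$= -\sum_{w_t \in A} p(w_t) \sum_{w_c \in A'} p(w_c \mid w_t) \left[ \log \exp\left(u(w_t)^\top v(w_c)\right) - \log \sum_{w_{c'} \in A'} \exp\left(u(w_t)^\top v(w_{c'})\right) \right]$$
• The inner sum over $A'$ costs $O(|A'|)$, because it touches every row of $V \in \mathbb{R}^{|A'| \times d}$: with a large vocabulary the computation takes a long time.
• To reduce the computational cost: Noise Contrastive Estimation, Negative Sampling, and so on.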
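
Simplified Noise Contrastive Estimation (1/2)
• Here I introduce the simplified version of NCE (Negative Sampling, NS) that appears in Mikolov's paper.
• For the paper presented today, it is inconsequential whether the simplified or the full version is used.
• The book "Deep Learning for Natural Language Processing" (深層学習による自然言語処理) has a detailed explanation of NCE and NS.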
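
Simplified Noise Contrastive Estimation (2/2)
• Train the model to discriminate the context word of one training example ($w_c$) from $k$ noise context words $S' = \{\bar{w}_{c_1}, \cdots, \bar{w}_{c_k}\}$.
• The original objective
$$L = -\sum_{w_t \in A} p(w_t) \sum_{w_c \in A'} p(w_c \mid w_t) \left[ \log \exp\left(u(w_t)^\top v(w_c)\right) - \log \sum_{w_{c'} \in A'} \exp\left(u(w_t)^\top v(w_{c'})\right) \right]$$
is replaced by
$$L_{NS} = -\sum_{w_t \in A} p(w_t) \sum_{w_c \in A'} p(w_c \mid w_t) \left[ \log \sigma\left(u(w_t)^\top v(w_c)\right) + \sum_{w_{c'} \in S'} \log\left(1 - \sigma\left(u(w_t)^\top v(w_{c'})\right)\right) \right]$$
• First term in the brackets: the probability that the positive example occurs. Second term: the probability that the negative (noise) examples do not occur.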
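
Relating the general form of NCE to Skip-gram
• General form:
$$\mathbb{E}_{p^+(x)} \left[ \mathbb{E}_{p(y^+ \mid x)\, p_{nce}(y^-)}\; l_\omega(x, y^+, y^-) \right]$$
• The Skip-gram formulation shown earlier is a special case of this general form:
$$\mathbb{E}_{p(w_t)} \left[ \mathbb{E}_{p(w_c \mid w_t)\, p_{nce}(w_{c'})}\; l_{U,V}(w_t, w_c, w_{c'}) \right]$$
$$l_{U,V}(w_t, w_c, w_{c'}) = -\log \sigma\left(u(w_t)^\top v(w_c)\right) - k \log\left(1 - \sigma\left(u(w_t)^\top v(w_{c'})\right)\right)$$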
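
The factor $k$ in $l_{U,V}$ can be motivated as follows. This is a sketch that assumes the $k$ noise words in $S'$ are drawn independently from $p_{nce}$; the slides do not state this explicitly.

```latex
% Assumption: \bar{w}_{c_1}, \dots, \bar{w}_{c_k} are i.i.d. draws from p_{nce}.
\begin{aligned}
\mathbb{E}_{S'}\!\left[ \sum_{i=1}^{k} \log\!\left(1 - \sigma\!\left(u(w_t)^\top v(\bar{w}_{c_i})\right)\right) \right]
  &= \sum_{i=1}^{k} \mathbb{E}_{p_{nce}(\bar{w}_{c_i})}\!\left[ \log\!\left(1 - \sigma\!\left(u(w_t)^\top v(\bar{w}_{c_i})\right)\right) \right] \\
  &= k\, \mathbb{E}_{p_{nce}(w_{c'})}\!\left[ \log\!\left(1 - \sigma\!\left(u(w_t)^\top v(w_{c'})\right)\right) \right].
\end{aligned}
```

Under that assumption, the expected negative-sampling loss over $S'$ equals the expectation of $l_{U,V}(w_t, w_c, w_{c'})$ over a single noise word $w_{c'} \sim p_{nce}$, which matches the special case written above.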