ACL2018 Reading Group - Adversarial Contrastive Estimation
y_yammt
July 08, 2018
Transcript
2018/07/08 Yuji Yamamoto (@y_yammt), ACL2018 Reading Group (@LINE Corp)
Paper introduced today
• https://arxiv.org/abs/1805.03642
• Authors contributed equally.
• Apparently the outcome of an internship at Borealis AI (enviable).
Overview
• An improvement of Noise Contrastive Estimation (NCE), which is used for parameter estimation of word embeddings and similar models.
• Incorporates the mechanism of Generative Adversarial Networks (GAN) into negative-example sampling.
• Experiments confirm that it converges faster than NCE.
• Multiple metrics on downstream tasks are also confirmed to improve.
Outline of the talk
1. Introduction: the Skip-gram model and Noise Contrastive Estimation
2. Proposed method: Adversarial Contrastive Estimation
3. Experiments
4. Summary
The Skip-gram Model and Noise Contrastive Estimation
What was the Skip-gram model again? (1/2)
• One method for associating words with vectors (a word embedding).
• Produces vectors such that the surrounding words can be predicted well from the target word.
• Words are mapped to vectors: a context word $w_c$ is predicted from the target word $w_t$ via
$$p_{U,V}(w_c \mid w_t) = \frac{\exp\bigl(u(w_t)^\top v(w_c)\bigr)}{\sum_{w_{c'} \in A'} \exp\bigl(u(w_t)^\top v(w_{c'})\bigr)}$$
What was the Skip-gram model again? (2/2)
$$p_{U,V}(w_c \mid w_t) = \frac{\exp\bigl(u(w_t)^\top v(w_c)\bigr)}{\sum_{w_{c'} \in A'} \exp\bigl(u(w_t)^\top v(w_{c'})\bigr)}$$
Here $u(\cdot), v(\cdot) \in \mathbb{R}^d$: the target word $w_t$ is looked up in $U \in \mathbb{R}^{A \times d}$ to give $u(w_t)$, and the context word $w_c$ in $V \in \mathbb{R}^{A' \times d}$ to give $v(w_c)$.
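As a concrete illustration (not from the slides), here is a minimal NumPy sketch of this conditional probability; the vocabulary sizes, embedding dimension, and random embeddings below are toy placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
A, A_prime, d = 1000, 1000, 50      # target/context vocabulary sizes and embedding dim (toy values)
U = rng.normal(size=(A, d))         # u(w_t) is the row U[w_t]
V = rng.normal(size=(A_prime, d))   # v(w_c) is the row V[w_c]

def p_context_given_target(w_t: int, w_c: int) -> float:
    """p_{U,V}(w_c | w_t): softmax of u(w_t)^T v(w_c') over every context word w_c' in A'."""
    scores = V @ U[w_t]             # one score per context word
    scores -= scores.max()          # shift for numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return float(probs[w_c])

print(p_context_given_target(w_t=3, w_c=7))
```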
The objective of the Skip-gram model
• Take the negative log-likelihood of a pair $w_t$, $w_c$ observed at some position in the text:
$$\ell = -\log p_{U,V}(w_c \mid w_t) = -\log \frac{\exp\bigl(u(w_t)^\top v(w_c)\bigr)}{\sum_{w_{c'} \in A'} \exp\bigl(u(w_t)^\top v(w_{c'})\bigr)}$$
→ optimize $u, v$ so that this is minimized.
• Computing the partial derivatives $\partial \ell / \partial u(w_t)$ and $\partial \ell / \partial v(w_c)$ would let us estimate the parameters, but
$$\frac{\partial \ell}{\partial u(w_t)} = -v(w_c) + \mathbb{E}_{p(w_{c'} \mid w_t)}\bigl[v(w_{c'})\bigr]$$
costs $O(A')$: the work needed to compute the gradient is proportional to the size of the context vocabulary, so it can become a heavy computation (see the sketch below).
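To make the $O(A')$ cost concrete, a minimal sketch of this gradient under the same toy setup as above (placeholder embeddings, not the authors' code).

```python
import numpy as np

rng = np.random.default_rng(0)
A, A_prime, d = 1000, 1000, 50
U = rng.normal(size=(A, d))         # u(w_t) = U[w_t]
V = rng.normal(size=(A_prime, d))   # v(w_c) = V[w_c]

def grad_u(w_t: int, w_c: int) -> np.ndarray:
    """dl/du(w_t) = -v(w_c) + E_{p(w_c'|w_t)}[v(w_c')]; the expectation touches all A' context words."""
    scores = V @ U[w_t]
    scores -= scores.max()
    probs = np.exp(scores) / np.exp(scores).sum()   # p(w_c' | w_t) for every w_c'
    return -V[w_c] + probs @ V                      # O(A' * d) work for a single training pair
```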
Reducing the computational cost
• Methods such as Noise Contrastive Estimation (NCE).
• Here I introduce the simplified version of Noise Contrastive Estimation that appears in Mikolov's paper (Negative Sampling).
• For the paper introduced today, whether we use the simplified version or not is inconsequential.
• NCE and NS are explained in detail in the book 「深層学習による自然言語処理」.
Simplified Noise Contrastive Estimation
• Train the model to discriminate between the context word $w_c$ of a single training example and $k$ noise context words $S' = \{\bar{w}_{c_1}, \dots, \bar{w}_{c_k}\}$ (a set of $k$ context words drawn at random, though not necessarily uniformly).
• The objective changes from
$$\ell = -\log \exp\bigl(u(w_t)^\top v(w_c)\bigr) + \log \sum_{w_{c'} \in A'} \exp\bigl(u(w_t)^\top v(w_{c'})\bigr)$$
to
$$\ell_{NS} = -\log \sigma\bigl(u(w_t)^\top v(w_c)\bigr) - \sum_{w_{c'} \in S'} \log\bigl(1 - \sigma\bigl(u(w_t)^\top v(w_{c'})\bigr)\bigr)$$
where the first term is (the log of) the probability that the positive example occurs and the second the probability that the negative examples (noise) do not occur. A minimal sketch follows below.
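A minimal sketch of this negative-sampling loss for a single (w_t, w_c) pair, again with toy embeddings; the noise distribution passed in is an assumption (word2vec typically uses a smoothed unigram distribution).

```python
import numpy as np

rng = np.random.default_rng(0)
A, A_prime, d, k = 1000, 1000, 50, 5
U = rng.normal(size=(A, d))
V = rng.normal(size=(A_prime, d))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_ns(w_t: int, w_c: int, noise_probs: np.ndarray) -> float:
    """l_NS = -log sigma(u^T v_pos) - sum over S' of log(1 - sigma(u^T v_neg))."""
    negatives = rng.choice(A_prime, size=k, p=noise_probs)   # S': k noise context words
    pos = np.log(sigmoid(U[w_t] @ V[w_c]))
    neg = np.log(1.0 - sigmoid(V[negatives] @ U[w_t])).sum()
    return float(-pos - neg)

uniform = np.full(A_prime, 1.0 / A_prime)   # placeholder noise distribution
print(loss_ns(w_t=3, w_c=7, noise_probs=uniform))
```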
Adversarial Contrastive Estimation (ACE)
Revisiting simplified NCE
$$\ell_{NS} = -\log \sigma\bigl(u(w_t)^\top v(w_c)\bigr) - \sum_{w_{c'} \in S'} \log\bigl(1 - \sigma\bigl(u(w_t)^\top v(w_{c'})\bigr)\bigr)$$
(probability that the positive example occurs, and probability that the negatives (noise), a set of $k$ randomly drawn context words, do not occur)
• Because the negatives are drawn without looking at the target word $w_t$, they can end up being easy to tell apart from the positive example.
→ We would like to generate harder negative examples, concentrating more on candidates $\bar{w}_{c_1}, \bar{w}_{c_2}, \dots$ that are close to the positive.
→ Bring in the mechanism of Generative Adversarial Networks.
Rewriting NCE a little more generally
• Let $\omega$ denote the parameters to be optimized (← $U, V$).
• Given a target $x$ (← $w_t$), let
  • $y^+$ be the observed outcome (positive example) (← $w_c$),
  • $y^-$ be the noise outcome (negative example) (← $w_{c'}$).
The loss function to minimize is then
$$L(\omega; x) = \mathbb{E}_{p(y^+ \mid x)\, p_{nce}(y^-)}\bigl[\, l_\omega(x, y^+, y^-) \,\bigr]$$
where the negative example is generated with no reference to $x$.
Adversarial Contrastive Estimation
• Loss function of the proposed method:
$$L(\omega, \theta; x) = \lambda\, \mathbb{E}_{p(y^+ \mid x)\, p_{nce}(y^-)}\bigl[\, l_\omega(x, y^+, y^-) \,\bigr] + (1 - \lambda)\, \mathbb{E}_{p(y^+ \mid x)\, g_\theta(y^- \mid x)}\bigl[\, l_\omega(x, y^+, y^-) \,\bigr]$$
The second term generates negative examples conditioned on the target.
• Optimization (GAN-style minimax game):
$$\min_\omega \max_\theta\; \mathbb{E}_{p^+(x)}\bigl[\, L(\omega, \theta; x) \,\bigr]$$
The generator ($\theta$) tries to produce hard negative examples; the discriminator ($\omega$, the embedding model) tries to tell positive and negative examples apart correctly. A rough sketch follows below.
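A rough single-example sketch of how I read this mixed loss (my reading, not the authors' released code): the generator here is just a table of logits over context words conditioned on the target, and all sizes and names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
A, A_prime, d, lam = 1000, 1000, 50, 0.5
U = rng.normal(size=(A, d))                  # discriminator parameters omega = (U, V)
V = rng.normal(size=(A_prime, d))
G = rng.normal(size=(A, A_prime)) * 0.01     # toy generator parameters theta: logits of g_theta(y- | x)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def l_omega(w_t: int, w_c: int, w_neg: int) -> float:
    """Per-example contrastive loss l_omega(x, y+, y-): positive pair vs. one negative."""
    pos = np.log(sigmoid(U[w_t] @ V[w_c]))
    neg = np.log(1.0 - sigmoid(U[w_t] @ V[w_neg]))
    return float(-pos - neg)

def ace_loss(w_t: int, w_c: int, nce_noise: np.ndarray) -> float:
    """Single-sample estimate of L(omega, theta; x) = lambda*E_nce[l] + (1-lambda)*E_g[l]."""
    y_nce = rng.choice(A_prime, p=nce_noise)            # negative drawn independently of w_t
    g = np.exp(G[w_t] - G[w_t].max())
    g /= g.sum()                                        # g_theta(. | w_t): conditioned on the target
    y_adv = rng.choice(A_prime, p=g)                    # "hard" negative proposed by the generator
    return lam * l_omega(w_t, w_c, y_nce) + (1 - lam) * l_omega(w_t, w_c, y_adv)

uniform = np.full(A_prime, 1.0 / A_prime)
print(ace_loss(w_t=3, w_c=7, nce_noise=uniform))
# omega would be updated to minimize this loss; theta would be updated to maximize it
# (e.g. via REINFORCE, since the generator's samples are discrete): the minimax game above.
```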
Finer points of ACE
• Entropy regularization on the generator.
• Exception handling for when a false negative (actually a positive example) is drawn as noise.
• and so on. A hedged sketch of the first two points follows below.
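A hedged sketch of how I understand these two pieces: an entropy bonus on the generator distribution (to keep it from collapsing onto a few negatives) and masking out sampled "negatives" that are actually observed positives. The exact forms and names are my assumptions, not taken from the paper's code.

```python
import numpy as np

def entropy_bonus(g: np.ndarray) -> float:
    """Shannon entropy of the generator distribution g_theta(. | x); a weighted version of
    this would be added to the generator's objective as a regularizer."""
    g = np.clip(g, 1e-12, 1.0)
    return float(-(g * np.log(g)).sum())

def negative_term_without_false_negatives(scores_neg: np.ndarray,
                                          sampled: np.ndarray,
                                          positives: set) -> float:
    """Accumulate -log(1 - sigma(score)) only over sampled negatives that are NOT
    known positive context words for this target (false negatives are skipped)."""
    keep = np.array([w not in positives for w in sampled], dtype=bool)
    if not keep.any():
        return 0.0
    sig = 1.0 / (1.0 + np.exp(-scores_neg[keep]))
    return float(-np.log(1.0 - sig).sum())
```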
Experiments
Overview of the experimental tasks
1. Word embeddings
  • For word pairs, evaluate the rank correlation between similarity as judged by humans and similarity according to the word embeddings.
  • Results are shown on the following pages.
2. Hypernym prediction
  • Given a word pair (word1, word2), predict whether "word1 is a word2" holds.
  • e.g. (New York, city) → True
3. Knowledge-graph embedding
  • Learn from relational triples (entity1, relation, entity2) and predict missing links (a.k.a. link prediction).
  • http://letra418.hatenablog.com/entry/2017/07/24/223257
Experimental results for word embeddings (Spearman score)
• Embeddings trained with a single pass over the English-language Wikipedia.
• For word pairs, the evaluation is the rank correlation between similarity as judged by humans and similarity according to the word embeddings (a tiny example of computing such a score follows below).
• ADV: negatives come from the generator only (λ = 0). ACE: generator plus NS.
• What exactly is an iteration here? (What is being solved at each iteration?)
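For reference, a tiny example (with made-up numbers) of computing such a Spearman score between human similarity ratings and embedding cosine similarities; scipy.stats.spearmanr is one common way to do it.

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical word pairs: human similarity ratings vs. cosine similarities from the embeddings.
human_scores  = np.array([9.2, 7.5, 3.1, 1.0, 8.8])
cosine_scores = np.array([0.81, 0.66, 0.35, 0.05, 0.74])

rho, p_value = spearmanr(human_scores, cosine_scores)
print(f"Spearman rho = {rho:.3f} (p = {p_value:.3f})")
```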
Experimental results for word embeddings (nearest neighbors)
Limitations of ACE
• The generator is computationally heavy.
  • Producing negatives involves a softmax, so a computation similar to the pre-approximation expression that NCE was meant to avoid creeps back in.
• Since learning word embeddings is a precomputation for downstream tasks, is the extra time justified?
  (It would help to be able to say, e.g., that it converges faster than MLE, or that downstream-task metrics improve.)
• It is unclear how many of the properties that NCE satisfies still hold.
  • NCE behaves similarly to MLE under certain conditions: https://qiita.com/Quasi-quant2010/items/a15b0d1b6428dc49c6c2
  • It is not clear whether this still holds once ACE brings in the GAN mechanism.
Summary
Summary
• An improvement to supervised learning methods that learn by contrasting observed samples with fake samples.
• Adversarial Contrastive Estimation (ACE)
  • Uses a generator network in a GAN-like setup that can propose hard negative examples to the discriminative model.
  • Entropy regularization on the generator and appropriate handling of false negatives turned out to be important for training to go well.
Impressions
• The word-embedding task yields vectors whose similarities look reasonable → could this be used for recommendation? → In fact, similar work appeared at RecSys 2018 around the same time:
  Adversarial Training of Word2Vec for Basket Completion https://arxiv.org/abs/1805.08720
• Many points about the implementation remain unclear. I would like the implementation to be released.
Supplementary slides
Relating the Skip-gram model to the paper's notation
The objective of the Skip-gram model (1/2)
• Take the negative log-likelihood over all word pairs that can be extracted from the text.
• The formulation for a single word pair is the one most often seen, but to match the paper's notation we consider all pairs:
$$-\sum_{w_t \in A} \sum_{w_c \in A'} \mathrm{freq}(w_t, w_c) \log p_{U,V}(w_c \mid w_t) \;\to\; \text{optimize } U, V \text{ to minimize this.}$$
• Generalizing, we minimize
$$L = -\sum_{w_t \in A} \sum_{w_c \in A'} p(w_t, w_c) \log p_{U,V}(w_c \mid w_t) = -\sum_{w_t \in A} p(w_t) \sum_{w_c \in A'} p(w_c \mid w_t) \log p_{U,V}(w_c \mid w_t)$$
If we set $p(w_t, w_c) \propto \mathrm{freq}(w_t, w_c)$, the two objectives are equivalent in the sense of minimization.
The objective of the Skip-gram model (2/2)
$$L = -\sum_{w_t \in A} p(w_t) \sum_{w_c \in A'} p(w_c \mid w_t) \log p_{U,V}(w_c \mid w_t) = -\sum_{w_t \in A} p(w_t) \sum_{w_c \in A'} p(w_c \mid w_t) \Bigl[\log \exp\bigl(u(w_t)^\top v(w_c)\bigr) - \log \sum_{w_{c'} \in A'} \exp\bigl(u(w_t)^\top v(w_{c'})\bigr)\Bigr]$$
The $\log \sum$ term is $O(A')$: with a large vocabulary the computation takes a long time.
Ways to reduce the cost: Noise Contrastive Estimation, Negative Sampling, etc.
Simplified Noise Contrastive Estimation (1/2)
• I introduce the simplified version of Noise Contrastive Estimation that appears in Mikolov's paper (Negative Sampling).
• For the paper introduced today, whether we use the simplified version or not is inconsequential.
• NCE and NS are explained in detail in the book 「深層学習による自然言語処理」.
Simplified Noise Contrastive Estimation (2/2)
• Train to discriminate between the context word $w_c$ of a single training example and $k$ noise context words $S' = \{\bar{w}_{c_1}, \dots, \bar{w}_{c_k}\}$.
• The objective changes from
$$L = -\sum_{w_t \in A} p(w_t) \sum_{w_c \in A'} p(w_c \mid w_t) \Bigl[\log \exp\bigl(u(w_t)^\top v(w_c)\bigr) - \log \sum_{w_{c'} \in A'} \exp\bigl(u(w_t)^\top v(w_{c'})\bigr)\Bigr]$$
to
$$L_{NS} = -\sum_{w_t \in A} p(w_t) \sum_{w_c \in A'} p(w_c \mid w_t) \Bigl[\log \sigma\bigl(u(w_t)^\top v(w_c)\bigr) + \sum_{w_{c'} \in S'} \log\bigl(1 - \sigma\bigl(u(w_t)^\top v(w_{c'})\bigr)\bigr)\Bigr]$$
(probability that the positive example occurs, and probability that the negatives (noise) do not occur).
Connecting the general form of NCE with Skip-gram
• The Skip-gram formulation shown earlier is a special case of the general form above:
$$\mathbb{E}_{p^+(x)}\Bigl[\mathbb{E}_{p(y^+ \mid x)\, p_{nce}(y^-)}\bigl[\, l_\omega(x, y^+, y^-) \,\bigr]\Bigr] \;\to\; \mathbb{E}_{p(w_t)}\Bigl[\mathbb{E}_{p(w_c \mid w_t)\, p_{nce}(w_{c'})}\bigl[\, l_{U,V}(w_t, w_c, w_{c'}) \,\bigr]\Bigr]$$
with
$$l_{U,V}(w_t, w_c, w_{c'}) = -\log \sigma\bigl(u(w_t)^\top v(w_c)\bigr) - k \log\bigl(1 - \sigma\bigl(u(w_t)^\top v(w_{c'})\bigr)\bigr)$$