ACL2018 Reading Group (ACL2018読み会) - Adversarial Contrastive Estimation
y_yammt
July 08, 2018
Research
Transcript
2018/07/08 Yuji Yamamoto (@y_yammt), ACL2018 Reading Group (@LINE Corp)
Paper presented today
• https://arxiv.org/abs/1805.03642
• Authors contributed equally.
• Apparently the result of an internship at Borealis AI (I'm envious).
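Overview
• An improvement to Noise Contrastive Estimation (NCE), which is used to estimate the parameters of models such as word embeddings.
• Brings the mechanism of Generative Adversarial Networks (GANs) into negative sampling.
• Experiments confirm that it converges faster than NCE.
• Experiments also confirm improvements on several metrics in downstream tasks.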
Outline
1. Introduction: the Skip-gram model and Noise Contrastive Estimation
2. Proposed method: Adversarial Contrastive Estimation
3. Experiments
4. Summary
The Skip-gram Model and Noise Contrastive Estimation
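What was the Skip-gram model again? (1/2)
• One way of mapping words to vectors (word embeddings).
• It produces vectors that predict well the words surrounding a given target word.
• Words are mapped to vectors: target word $w_t$, context word $w_c$.
$$p_{U,V}(w_c \mid w_t) = \frac{\exp\left(u(w_t)^\top v(w_c)\right)}{\sum_{w_{c'} \in A'} \exp\left(u(w_t)^\top v(w_{c'})\right)}$$
The denominator runs over every word in the context vocabulary $A'$.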
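What was the Skip-gram model again? (2/2)
• Words are mapped to vectors $u(\cdot), v(\cdot) \in \mathbb{R}^d$.
• Target embeddings: $U \in \mathbb{R}^{A \times d}$, $w_t \mapsto u(w_t)$.
• Context embeddings: $V \in \mathbb{R}^{A' \times d}$, $w_c \mapsto v(w_c)$.
$$p_{U,V}(w_c \mid w_t) = \frac{\exp\left(u(w_t)^\top v(w_c)\right)}{\sum_{w_{c'} \in A'} \exp\left(u(w_t)^\top v(w_{c'})\right)}$$
The denominator runs over every word in the context vocabulary $A'$.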
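The softmax above can be sketched in a few lines of plain Python. This is only an illustration under assumed toy sizes; the names `softmax_context_prob`, `A_target` and `A_context` are hypothetical and not taken from the slides or the paper.

```python
import math
import random

def softmax_context_prob(u_t, V):
    """Return p(w_c | w_t) for every context word: softmax of u(w_t)^T v(w_c)."""
    scores = [sum(a * b for a, b in zip(u_t, v_c)) for v_c in V]
    m = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)                        # normaliser: one term per context word
    return [e / z for e in exps]

rng = random.Random(0)
d, A_target, A_context = 4, 5, 6         # toy sizes (assumptions)
U = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(A_target)]   # U in R^{A x d}
V = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(A_context)]  # V in R^{A' x d}

probs = softmax_context_prob(U[2], V)    # distribution over context words for target 2
```

The normaliser `z` touches every row of `V`, which is where the dependence on the context vocabulary size comes from.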
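The Skip-gram objective
• Take the negative log-likelihood of a pair $w_t$, $w_c$ observed at some position in the text:
$$l = -\log p_{U,V}(w_c \mid w_t) = -\log \frac{\exp\left(u(w_t)^\top v(w_c)\right)}{\sum_{w_{c'} \in A'} \exp\left(u(w_t)^\top v(w_{c'})\right)}$$
→ Optimize $u, v$ so that $l$ is minimized.
• The parameters can be estimated once the partial derivatives $\frac{\partial l}{\partial u(w_t)}$ and $\frac{\partial l}{\partial v(w_c)}$ are available, but...
$$\frac{\partial l}{\partial u(w_t)} = -v(w_c) + \mathbb{E}_{p(w_{c'} \mid w_t)}\left[v(w_{c'})\right]$$
• The expectation costs $O(A')$: the computation needed for the gradient is proportional to the size of the context vocabulary ($V \in \mathbb{R}^{A' \times d}$), so it can become expensive.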
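As a sanity check on the gradient formula above, the following sketch compares the analytic expression $-v(w_c) + \mathbb{E}[v(w_{c'})]$ with a central finite difference of the loss. The vectors are made-up toy values, and the helper names (`nll`, `grad_u`) are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def nll(u_t, V, c):
    """l = -log softmax(u^T v)[c], written as log Z minus the positive score."""
    scores = [dot(u_t, v) for v in V]
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[c]

def grad_u(u_t, V, c):
    """Analytic gradient: -v(w_c) + E_{p(w_c'|w_t)}[v(w_c')]."""
    scores = [dot(u_t, v) for v in V]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    probs = [e / z for e in exps]
    # The expectation loops over all context words: this is the O(A') part
    expected_v = [sum(p * v[i] for p, v in zip(probs, V)) for i in range(len(u_t))]
    return [e - V[c][i] for i, e in enumerate(expected_v)]

# Toy values (assumptions for illustration only)
u_t = [0.5, -1.0, 0.25]
V = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0], [0.5, 0.5, 0.5], [-1.0, 2.0, 0.0]]
c = 1

analytic = grad_u(u_t, V, c)

eps = 1e-6
numeric = []
for i in range(len(u_t)):
    plus = list(u_t)
    plus[i] += eps
    minus = list(u_t)
    minus[i] -= eps
    numeric.append((nll(plus, V, c) - nll(minus, V, c)) / (2 * eps))
```

If the formula is right, `analytic` and `numeric` agree to within the finite-difference error.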
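Reducing the computation
• Noise Contrastive Estimation (NCE) and related methods.
• Here I introduce the simplified form of Noise Contrastive Estimation that appears in Mikolov's paper (Negative Sampling, NS).
• For the paper presented today, whether the simplified form is used or not is inconsequential.
• NCE and NS are explained in detail in the book 「深層学習による自然言語処理」 (Deep Learning for Natural Language Processing).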
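Simplified Noise Contrastive Estimation
• Train the model to discriminate the context word $w_c$ of one training example from $k$ noise context words $S' = \{\bar{w}_{c_1}, \cdots, \bar{w}_{c_k}\}$.
• $S'$ is a set of $k$ randomly drawn context words (not necessarily drawn from a uniform distribution).
• The objective is changed from
$$l = -\log \exp\left(u(w_t)^\top v(w_c)\right) + \log \sum_{w_{c'} \in A'} \exp\left(u(w_t)^\top v(w_{c'})\right)$$
to
$$l_{NS} = -\log \sigma\left(u(w_t)^\top v(w_c)\right) - \sum_{w_{c'} \in S'} \log\left(1 - \sigma\left(u(w_t)^\top v(w_{c'})\right)\right)$$
• The first term is the probability that the positive example occurs; the second is the probability that the negative examples (noise) do not occur.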
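A minimal sketch of the negative-sampling loss $l_{NS}$ above, assuming toy embeddings and made-up word counts. The noise distribution here uses counts raised to the 3/4 power, the choice popularized by word2vec; that choice and all names below are assumptions for illustration, not details from the slides.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def ns_loss(u_t, v_pos, v_negs):
    """l_NS = -log sigma(u.v_pos) - sum over negatives of log(1 - sigma(u.v_neg))."""
    loss = -math.log(sigmoid(dot(u_t, v_pos)))
    for v_neg in v_negs:
        loss -= math.log(1.0 - sigmoid(dot(u_t, v_neg)))
    return loss

def sample_noise_words(noise_weights, k, rng):
    """Draw k context-word indices from a noise distribution (need not be uniform)."""
    return rng.choices(range(len(noise_weights)), weights=noise_weights, k=k)

# Toy data (assumptions): 5 context words with made-up counts and 2-d embeddings
counts = [50, 30, 10, 5, 5]
noise_weights = [c ** 0.75 for c in counts]
V = [[0.1, 0.3], [-0.2, 0.4], [0.5, -0.1], [0.0, 0.2], [-0.3, -0.3]]
u_t = [0.2, -0.1]

rng = random.Random(0)
k = 3
neg_ids = sample_noise_words(noise_weights, k, rng)
loss = ns_loss(u_t, V[2], [V[i] for i in neg_ids])
```

Each evaluation needs only $k + 1$ dot products instead of one per context word, which is the point of the approximation.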
Adversarial Contrastive Estimation (ACE)
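Revisiting simplified NCE
$$l_{NS} = -\log \sigma\left(u(w_t)^\top v(w_c)\right) - \sum_{w_{c'} \in S'} \log\left(1 - \sigma\left(u(w_t)^\top v(w_{c'})\right)\right)$$
• The first term is the probability that the positive example occurs; the second is the probability that the negative examples (noise) do not occur. $S'$ is a set of $k$ randomly drawn context words.
• The negatives are produced without looking at the target word $w_t$, so they may end up being easy to tell apart from the positive example.
→ We want to be able to generate harder negatives.
→ Bring in the mechanism of Generative Adversarial Networks.
[Figure: $w_t$ and $w_c$ mapped to vectors, with negative samples $\bar{w}_{c_1}$, $\bar{w}_{c_2}$; annotation: "concentrate more"]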
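Rewriting NCE in a slightly more general form
• Let $\omega$ be the parameters to optimize (← $U, V$).
• Given a target $x$ (← $w_t$),
• let $y^+$ be the outcome that was observed (positive example, ← $w_c$),
• and let $y^-$ be an outcome that acts as noise (negative example, ← $w_{c'}$).
The loss function is then
$$L(\omega; x) = \mathbb{E}_{p(y^+ \mid x)\, p_{nce}(y^-)}\; l_\omega(x, y^+, y^-)$$
← to be minimized. The negative $y^-$ is generated independently of $x$; $l_\omega$ is the loss function for one triple.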
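Adversarial Contrastive Estimation
• Loss function of the proposed method:
$$L(\omega, \theta; x) = \lambda\, \mathbb{E}_{p(y^+ \mid x)\, p_{nce}(y^-)}\; l_\omega(x, y^+, y^-) + (1 - \lambda)\, \mathbb{E}_{p(y^+ \mid x)\, g_\theta(y^- \mid x)}\; l_\omega(x, y^+, y^-)$$
In the second term the negative is generated from the target $x$.
• Optimization (GAN-style minimax game):
$$\min_\omega \max_\theta\; \mathbb{E}_{p^+(x)}\, L(\omega, \theta; x)$$
• Generator ($\max_\theta$): "I'll serve up difficult negatives."
• Discriminator ($\min_\omega$): "I'll tell positives and negatives apart properly."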
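The two expectations in the ACE loss can be read as a mixture over where the negative comes from: with probability $\lambda$ it is drawn from $p_{nce}$, which ignores $x$, and otherwise from the generator $g_\theta(\cdot \mid x)$. The sketch below only illustrates that sampling step. The generator is assumed to be a softmax over a hypothetical score table `G`, and how $\theta$ is actually updated is not covered here.

```python
import math
import random

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def sample_negative(x, lam, noise_probs, generator_scores, rng):
    """Draw one negative y-: from p_nce with probability lam, else from g_theta(.|x)."""
    candidates = range(len(noise_probs))
    if rng.random() < lam:
        # NCE branch: the noise distribution does not look at x
        return rng.choices(candidates, weights=noise_probs, k=1)[0], "nce"
    # Generator branch: a softmax over scores that depend on x
    g = softmax(generator_scores(x))
    return rng.choices(candidates, weights=g, k=1)[0], "generator"

# Toy setup (all values are assumptions for illustration)
noise_probs = [0.4, 0.3, 0.2, 0.1]
G = [[0.0, 2.0, -1.0, 0.5],
     [1.0, 0.0, 0.0, -2.0]]              # hypothetical generator scores, one row per x

def generator_scores(x):
    return G[x]

rng = random.Random(0)
draws = [sample_negative(0, 0.5, noise_probs, generator_scores, rng) for _ in range(1000)]
share_from_generator = sum(1 for _, src in draws if src == "generator") / len(draws)
```

Setting `lam=1.0` recovers plain NCE-style sampling, while `lam=0.0` uses the generator alone.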
Finer points of ACE
• Entropy regularization of the generator.
• Special handling for the case where a false negative (an actual positive example) is drawn as noise.
• And so on.
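The slide only names these two ingredients, so the following is a sketch of one plausible form of each, not the paper's exact recipe. It assumes a hinge penalty that activates when the generator's entropy falls below a threshold `c`, and a filter that drops sampled negatives that are actually observed pairs. All function names are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def entropy_penalty(probs, c):
    """Hinge penalty: positive only when the generator's entropy drops below c."""
    return max(0.0, c - entropy(probs))

def drop_false_negatives(x, sampled_negatives, observed_pairs):
    """Remove sampled 'negatives' y for which (x, y) is an observed positive pair."""
    return [y for y in sampled_negatives if (x, y) not in observed_pairs]

# Toy values (assumptions for illustration)
uniform = [0.25, 0.25, 0.25, 0.25]
collapsed = [1.0, 0.0, 0.0, 0.0]         # generator that always proposes the same word
threshold = math.log(2)

penalty_uniform = entropy_penalty(uniform, threshold)
penalty_collapsed = entropy_penalty(collapsed, threshold)

observed_pairs = {(0, 2)}                # hypothetical training pairs (x, y+)
kept = drop_false_negatives(0, [1, 2, 3], observed_pairs)
```

The penalty discourages the generator from collapsing onto a handful of negatives, and the filter keeps true positives from being pushed down as if they were noise.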
Experiments
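Overview of the experimental tasks
1. Word embeddings
• For word pairs, evaluate the rank correlation between human-assigned similarity ratings and the similarity given by the word embeddings.
• Results are shown on the following slides.
2. Hypernym prediction
• Given a word pair (word1, word2), predict whether "word1 is a word2" holds.
• e.g. (New York, city) → True
3. Knowledge graph embeddings
• Learn from relational data (entity1, relation, entity2) and predict missing links (a.k.a. link prediction).
• http://letra418.hatenablog.com/entry/2017/07/24/223257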
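The word-embedding evaluation in task 1 can be sketched as follows: compute cosine similarities for each word pair, then take the Spearman rank correlation against the human ratings. The embeddings and ratings below are invented toy values, and the helper functions are a plain-Python illustration rather than the evaluation code used in the paper.

```python
import math

def ranks(values):
    """Average ranks (1-based), so tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for pos in range(i, j + 1):
            r[order[pos]] = avg
        i = j + 1
    return r

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def spearman(a, b):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

# Toy embeddings and human ratings (all invented for illustration)
emb = {"cat": [1.0, 0.2], "dog": [0.9, 0.3], "car": [-0.2, 1.0], "truck": [-0.1, 0.9]}
pairs = [("cat", "dog", 9.0), ("car", "truck", 8.5), ("cat", "car", 1.5), ("dog", "truck", 2.0)]

human = [score for _, _, score in pairs]
model = [cosine(emb[a], emb[b]) for a, b, _ in pairs]
rho = spearman(human, model)
```

Because only ranks matter, the score is insensitive to the scale of the cosine similarities.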
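Word embedding results (Spearman score)
• Trained with a single pass over the English Wikipedia.
• For word pairs, evaluate the rank correlation between human-assigned similarity ratings and the similarity given by the word embeddings.
• ADV: negatives are generated by the generator only (λ=0). ACE: generator and NS.
• What is an "iteration"? (What is being solved in each iteration?)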
Word embedding results (Nearest neighbors)
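Limitations of ACE
• The generator is computationally heavy.
• Producing negatives involves a softmax (a computation similar to the expression before the NCE approximation creeps back in).
• Word embedding training is a precomputation for downstream tasks, so isn't taking extra time justified? (It would be good to be able to say that it converges faster than MLE, or that downstream-task metrics improve.)
• It is unclear to what extent the properties satisfied by NCE still hold.
• Under certain conditions NCE behaves similarly to MLE. https://qiita.com/Quasi-quant2010/items/a15b0d1b6428dc49c6c2
• With the GAN mechanism added in ACE, it is unclear whether this can still be said.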
Summary
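Summary
• An improvement to supervised learning methods that learn by contrasting observed samples with fake samples.
• Adversarial Contrastive Estimation (ACE)
• Uses a generator network, in a GAN-like setting, that can propose hard negatives to the discriminative model.
• Entropy regularization of the generator and appropriate handling of false negatives turned out to be important for training to go well.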
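Impressions
• On the word embedding task the vectors obtained look reasonable as a measure of similarity → could this be used for recommendation? → In fact, similar work appears at RecSys 2018 (at almost the same time): Adversarial Training of Word2Vec for Basket Completion, https://arxiv.org/abs/1805.08720
• Much about the implementation is unclear. I would like the implementation to be released.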
Supplementary slides
Relating the Skip-gram Model to the Paper's Notation
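The Skip-gram objective (1/2)
• Take the negative log-likelihood over the word pairs that can occur in the text:
$$-\sum_{w_t \in A} \sum_{w_c \in A'} \mathrm{freq}(w_t, w_c) \log p_{U,V}(w_c \mid w_t)$$
→ Optimize $U, V$ so that this is minimized.
• The formulation for a single word pair is fairly common, but to match the paper's notation I consider all pairs here.
• Generalizing,
$$L = -\sum_{w_t \in A} \sum_{w_c \in A'} p(w_t, w_c) \log p_{U,V}(w_c \mid w_t) = -\sum_{w_t \in A} p(w_t) \sum_{w_c \in A'} p(w_c \mid w_t) \log p_{U,V}(w_c \mid w_t)$$
→ minimize.
• If we set $p(w_t, w_c) \propto \mathrm{freq}(w_t, w_c)$, the two are equivalent as minimization problems.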
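The equivalence claimed on this slide can be spelled out in one line. Writing $N$ for the total pair count (a symbol introduced here for illustration, not used on the slide) and taking $p(w_t, w_c) = \mathrm{freq}(w_t, w_c) / N$:

```latex
-\sum_{w_t \in A} \sum_{w_c \in A'} \mathrm{freq}(w_t, w_c) \log p_{U,V}(w_c \mid w_t)
  = -N \sum_{w_t \in A} \sum_{w_c \in A'} p(w_t, w_c) \log p_{U,V}(w_c \mid w_t)
  = N \cdot L,
\qquad N = \sum_{w_t \in A} \sum_{w_c \in A'} \mathrm{freq}(w_t, w_c).
```

Since $N > 0$ does not depend on $U, V$, both objectives have the same minimizers.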
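The Skip-gram objective (2/2)
$$L = -\sum_{w_t \in A} p(w_t) \sum_{w_c \in A'} p(w_c \mid w_t) \log p_{U,V}(w_c \mid w_t) = -\sum_{w_t \in A} p(w_t) \sum_{w_c \in A'} p(w_c \mid w_t) \left[ \log \exp\left(u(w_t)^\top v(w_c)\right) - \log \sum_{w_{c'} \in A'} \exp\left(u(w_t)^\top v(w_{c'})\right) \right]$$
• The second term inside the brackets costs $O(A')$ ($V \in \mathbb{R}^{A' \times d}$): with a large vocabulary the computation takes time.
• To reduce the computation: Noise Contrastive Estimation, Negative Sampling, etc.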
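Simplified Noise Contrastive Estimation (1/2)
• Here I introduce the simplified form of Noise Contrastive Estimation that appears in Mikolov's paper (Negative Sampling).
• For the paper presented today, whether the simplified form is used or not is inconsequential.
• NCE and NS are explained in detail in the book 「深層学習による自然言語処理」 (Deep Learning for Natural Language Processing).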
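Simplified Noise Contrastive Estimation (2/2)
• Train the model to discriminate the context word $w_c$ of one training example from $k$ noise context words $S' = \{\bar{w}_{c_1}, \cdots, \bar{w}_{c_k}\}$.
• The objective is changed from
$$L = -\sum_{w_t \in A} p(w_t) \sum_{w_c \in A'} p(w_c \mid w_t) \left[ \log \exp\left(u(w_t)^\top v(w_c)\right) - \log \sum_{w_{c'} \in A'} \exp\left(u(w_t)^\top v(w_{c'})\right) \right]$$
to
$$L_{NS} = -\sum_{w_t \in A} p(w_t) \sum_{w_c \in A'} p(w_c \mid w_t) \left[ \log \sigma\left(u(w_t)^\top v(w_c)\right) + \sum_{w_{c'} \in S'} \log\left(1 - \sigma\left(u(w_t)^\top v(w_{c'})\right)\right) \right]$$
• The first term is the probability that the positive example occurs; the second is the probability that the negative examples (noise) do not occur.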
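Relating the general form of NCE to Skip-gram
• The Skip-gram formulation shown earlier is a special case of the general form:
$$\mathbb{E}_{p^+(x)}\left[\mathbb{E}_{p(y^+ \mid x)\, p_{nce}(y^-)}\; l_\omega(x, y^+, y^-)\right]$$
$$\mathbb{E}_{p(w_t)}\left[\mathbb{E}_{p(w_c \mid w_t)\, p_{nce}(w_{c'})}\; l_{U,V}(w_t, w_c, w_{c'})\right]$$
$$l_{U,V}(w_t, w_c, w_{c'}) = -\log \sigma\left(u(w_t)^\top v(w_c)\right) - k \log\left(1 - \sigma\left(u(w_t)^\top v(w_{c'})\right)\right)$$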
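The factor $k$ in $l_{U,V}$ can be recovered from the sum over $S'$ in the negative-sampling loss, under the assumption (made here for illustration) that the $k$ noise words are drawn independently from $p_{nce}$. By linearity of expectation:

```latex
\mathbb{E}_{S'}\left[\sum_{w_{c'} \in S'} \log\left(1 - \sigma\left(u(w_t)^\top v(w_{c'})\right)\right)\right]
  = \sum_{i=1}^{k} \mathbb{E}_{\bar{w}_{c_i} \sim p_{nce}}\left[\log\left(1 - \sigma\left(u(w_t)^\top v(\bar{w}_{c_i})\right)\right)\right]
  = k\, \mathbb{E}_{w_{c'} \sim p_{nce}}\left[\log\left(1 - \sigma\left(u(w_t)^\top v(w_{c'})\right)\right)\right].
```

So a single negative $w_{c'}$ weighted by $k$ has the same expected loss as a sum over $k$ sampled negatives.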