Kosuke Miyoshi
September 02, 2020

# Representation Learning with Contrastive Predictive Coding

## Transcript

2. ### Contents
- Why I read CPC
- Mutual information
- Density ratio estimation
- CPC
  - Representation learning by maximizing mutual information through density ratio estimation
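3. ### My motivation
- I want to work out how to solve the problems I could not solve in AnimalAI
- Internal model / spatial reasoning
  - Moving around even when blindfolded
  - A bird's-eye-view map
- Dynamics of the environment
  - Predicting how a ball will move
- Causal reasoning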

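6. ### Goal and two approaches

Goal: learn a representation of the time-series context c through self-supervised learning.

Assumption: if c can predict future observations x, then c is a good representation, because such a c has captured the global structure of the time series.

- Model the observation (generative) model p(x|c). This requires building a decoder.
- Maximize the mutual information I(x; c) without using an observation model. No decoder is needed.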

Mutual information
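10. ### Mutual information

[Figure: two joint distributions over A ∈ {a0, a1} and B ∈ {b0, b1}]

$$I(A, B) = \sum_A \sum_B P(A, B)\log\frac{P(A, B)}{P(A)P(B)}$$

- Minimum: A and B are independent, so P(A, B) = P(A)P(B). Every log term is log 1 = 0, and therefore I(A, B) = 0. Knowing the outcome of A tells us nothing about B.
- Maximum: the conditional distribution P(B|A) is deterministic and one-to-one, so knowing the outcome of A tells us B.

In the maximum case, only the matched pairs contribute (P(A|B) = P(B|A) = 1 for those pairs):

$$I(A, B) = \sum_A \sum_B P(A\mid B)P(B)\log\frac{P(B\mid A)P(A)}{P(A)P(B)} = -\sum_B P(B)\log P(B) = H(B) = H(A)$$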
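The two extreme cases can be checked numerically. The sketch below is a minimal example that assumes a uniform binary A and B. It computes I(A, B) in nats for a hypothetical independent joint table and a hypothetical deterministic (diagonal) joint table. The helper name `mutual_information` is illustrative only.

```python
import numpy as np

def mutual_information(joint):
    """I(A, B) in nats for a discrete joint table joint[a, b] = P(A=a, B=b)."""
    joint = np.asarray(joint, dtype=float)
    p_a = joint.sum(axis=1, keepdims=True)  # marginal P(A), shape (|A|, 1)
    p_b = joint.sum(axis=0, keepdims=True)  # marginal P(B), shape (1, |B|)
    mask = joint > 0                        # terms with P(A, B) = 0 contribute 0
    ratio = joint[mask] / (p_a * p_b)[mask]
    return float(np.sum(joint[mask] * np.log(ratio)))

# Minimum: A and B independent, P(A, B) = P(A)P(B)
independent = np.array([[0.25, 0.25],
                        [0.25, 0.25]])
# Maximum: B is a one-to-one function of A (a0 -> b0, a1 -> b1)
deterministic = np.array([[0.5, 0.0],
                          [0.0, 0.5]])

mi_min = mutual_information(independent)
mi_max = mutual_information(deterministic)
p_b_det = deterministic.sum(axis=0)
h_b = float(-np.sum(p_b_det * np.log(p_b_det)))  # H(B) of the marginal
print(mi_min, mi_max, h_b)  # 0 for the independent case; log 2 for both of the others
```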
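12. ### Mutual information between x and c

$$I(x; c) = \sum_x \sum_c p(x, c)\log\frac{p(x, c)}{p(x)p(c)} = \sum_x \sum_c p(x, c)\log\frac{p(x\mid c)p(c)}{p(x)p(c)} = \sum_x \sum_c p(x, c)\log\frac{p(x\mid c)}{p(x)}$$

We want to learn the representation of the context c so that it maximizes this mutual information. Written this way, the mutual information contains the density ratio p(x|c)/p(x).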
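To see that the density-ratio form is the same quantity, the sketch below evaluates both forms on a small joint table p(x, c). The table is hypothetical and randomly generated, with 4 values of x and 3 values of c.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical joint distribution p(x, c): 4 values of x, 3 values of c
joint = rng.random((4, 3))
joint /= joint.sum()

p_x = joint.sum(axis=1, keepdims=True)  # p(x), shape (4, 1)
p_c = joint.sum(axis=0, keepdims=True)  # p(c), shape (1, 3)
p_x_given_c = joint / p_c               # p(x | c): each column sums to 1

# Form 1: log p(x, c) / (p(x) p(c))
mi_joint = np.sum(joint * np.log(joint / (p_x * p_c)))
# Form 2: log of the density ratio p(x | c) / p(x)
density_ratio = p_x_given_c / p_x
mi_ratio = np.sum(joint * np.log(density_ratio))
print(mi_joint, mi_ratio)  # equal up to floating-point error
```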

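14. ### Density ratio estimation

$$r(x) = \frac{p_A(x)}{p_B(x)}$$

- Premise: we have samples from two probability densities, $\{x_i^A\}_{i=1}^{n_A}$ ($n_A$ samples, assigned label A) and $\{x_j^B\}_{j=1}^{n_B}$ ($n_B$ samples, assigned label B). The distributions themselves are unknown.
- Goal: estimate the ratio $r(x)$ of the two probability densities using a classifier trained on the samples.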
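15. ### Approximating the density ratio with a classifier

Write $p_A(x) = p(x\mid y = A)$ and $p_B(x) = p(x\mid y = B)$. Bayes' rule then gives

$$r(x) = \frac{p_A(x)}{p_B(x)} = \frac{p(x\mid y = A)}{p(x\mid y = B)} = \frac{p(y = A\mid x)p(x)/p(y = A)}{p(y = B\mid x)p(x)/p(y = B)} = \frac{p(y = B)}{p(y = A)}\frac{p(y = A\mid x)}{p(y = B\mid x)}$$

The class priors are estimated by the sample proportions, $p(y = B)/p(y = A) \approx n_B/n_A$, and the posteriors come from a trained classifier:

$$\hat r(x) = \frac{n_B}{n_A}\frac{\hat p(y = A\mid x)}{\hat p(y = B\mid x)} = \frac{n_B}{n_A}\frac{\hat p(y = A\mid x)}{1 - \hat p(y = A\mid x)}$$

If we train a classifier to decide whether a sample came from distribution A or from distribution B, its output probability can be used to compute the density ratio.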
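The recipe above can be tried on a toy problem. The sketch below is a minimal version that makes several assumptions:

- The two densities are hypothetical Gaussians, $p_A = N(1, 1)$ and $p_B = N(0, 1)$. Their true ratio is $\exp(x - 1/2)$.
- The classifier is a plain logistic regression fitted by Newton's method.
- The sample sizes are deliberately unequal so that the $n_B/n_A$ correction matters.

The helper names are illustrative.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def fit_logistic(x, y, iters=25):
    """Logistic regression p(y=A | x) = sigmoid(w*x + b), fitted by Newton's method."""
    X = np.column_stack([x, np.ones_like(x)])  # features [x, 1]
    theta = np.zeros(2)
    for _ in range(iters):
        p = sigmoid(X @ theta)
        grad = X.T @ (p - y) / len(y)                       # gradient of mean log loss
        hess = (X * (p * (1 - p))[:, None]).T @ X / len(y)  # Hessian of mean log loss
        theta -= np.linalg.solve(hess, grad)
    return theta

def estimate_ratio(x, theta, n_a, n_b):
    """r_hat(x) = (n_B / n_A) * p_hat(y=A|x) / (1 - p_hat(y=A|x))."""
    p_a = sigmoid(theta[0] * x + theta[1])
    return (n_b / n_a) * p_a / (1.0 - p_a)

rng = np.random.default_rng(0)
n_a, n_b = 10_000, 30_000          # unequal sizes, so n_B / n_A != 1
x_a = rng.normal(1.0, 1.0, n_a)    # samples from p_A = N(1, 1), label A -> y = 1
x_b = rng.normal(0.0, 1.0, n_b)    # samples from p_B = N(0, 1), label B -> y = 0
x = np.concatenate([x_a, x_b])
y = np.concatenate([np.ones(n_a), np.zeros(n_b)])

theta = fit_logistic(x, y)
grid = np.array([-1.0, 0.0, 1.0, 2.0])
r_hat = estimate_ratio(grid, theta, n_a, n_b)
r_true = np.exp(grid - 0.5)        # N(1,1) / N(0,1) = exp(x - 1/2)
print(np.round(r_hat, 2), np.round(r_true, 2))
```

Without the $n_B/n_A$ factor, the estimate would be off by a constant factor of 1/3 here, because the classifier's odds also absorb the class imbalance.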

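17. ### Maximizing I(z, c) instead of I(x, c)

Rather than computing the mutual information between x and c directly, consider maximizing the mutual information between c and z, the encoding of x.

I(x, c) ≥ I(z, c)

Because z is computed from x alone, z cannot carry more information about c than x does (the data processing inequality). Maximizing I(z, c) therefore pushes up a lower bound on I(x, c).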
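The inequality can be illustrated with a discrete toy example. The sketch below assumes a hypothetical random joint p(x, c) and a hypothetical deterministic "encoder" g that merges pairs of x-values into one code. It then compares I(x, c) with I(z, c).

```python
import numpy as np

def mutual_information(joint):
    """Mutual information in nats for a discrete joint table joint[row, col]."""
    p_row = joint.sum(axis=1, keepdims=True)
    p_col = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / (p_row * p_col)[mask])))

rng = np.random.default_rng(1)
# Hypothetical joint p(x, c): x takes 6 values, c takes 4 values
joint_xc = rng.random((6, 4))
joint_xc /= joint_xc.sum()

# Hypothetical deterministic encoder z = g(x): merges the 6 x-values into 3 codes
g = np.array([0, 0, 1, 1, 2, 2])
joint_zc = np.zeros((3, 4))
np.add.at(joint_zc, g, joint_xc)  # p(z, c) = sum of p(x, c) over x with g(x) = z

i_xc = mutual_information(joint_xc)
i_zc = mutual_information(joint_zc)
print(i_xc, i_zc)  # I(x, c) >= I(z, c)
```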
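18. ### Modeling the density ratio

$$f_k(x_{t+k}, c_t) \propto \frac{p(x_{t+k}\mid c_t)}{p(x_{t+k})}$$

$$f_k(x_{t+k}, c_t) = \exp\left(z_{t+k}^{\top} W_k c_t\right)$$

- The density ratio is modeled by a function f of z and c.
- A density ratio is always positive, hence the exponential.
- k is the number of steps ahead being predicted.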
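The log-bilinear score $z^{\top} W_k c$ can be computed for every (context, candidate) pair in a batch with a single matrix product. The sketch below is a minimal version under several assumptions:

- The encoder outputs z and the contexts c are random arrays, not real networks.
- The sizes are hypothetical.
- Each sequence's future z serves as a candidate for every context in the batch.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, z_dim, c_dim = 8, 16, 32                     # hypothetical sizes

z_future = rng.normal(size=(batch, z_dim))          # stand-in for z_{t+k} per sequence
c_t = rng.normal(size=(batch, c_dim))               # stand-in for context c_t per sequence
W_k = rng.normal(scale=0.1, size=(z_dim, c_dim))    # one matrix W_k per prediction step k

# log f_k[i, j] = z_j^T W_k c_i   (row i = context, column j = candidate)
log_f = c_t @ W_k.T @ z_future.T                    # shape (batch, batch)
f = np.exp(log_f)                                   # strictly positive, like a density ratio
```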
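19. ### The loss $\mathcal{L}_N$

$$\mathcal{L}_N = -\mathbb{E}_X\left[\log\frac{f_k(x_{t+k}, c_t)}{\sum_{x_j\in X} f_k(x_j, c_t)}\right], \qquad X = \{x_1, \dots, x_N\}, \qquad f_k(x_{t+k}, c_t) = \exp\left(z_{t+k}^{\top} W_k c_t\right)$$

- The numerator is f at the positive sample. The denominator is the sum of f over the positive and the negative samples.
- X contains one positive sample drawn from $p(x_{t+k}\mid c_t)$ and N−1 negative samples drawn from $p(x_{t+k})$.
- The loss has the form of a softmax cross-entropy loss for a categorical distribution that picks the positive sample out of N.
- Optimizing the loss:
  - (a) makes $f_k(x_{t+k}, c_t)$ proportional to $p(x_{t+k}\mid c_t)/p(x_{t+k})$;
  - (b) maximizes the mutual information, since $I(x_{t+k}, c_t) \ge \log(N) - \mathcal{L}_N$.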
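In the paper this loss is called InfoNCE. The sketch below is a minimal NumPy version that takes a score matrix `log_f[i, j]` = log f_k(x_j, c_i). It assumes that the positive for row i sits in column i and that the other N−1 columns act as negatives. The two calls show the extremes: uninformative scores give exactly log N, while scores that strongly favour the positive drive the loss toward 0.

```python
import numpy as np

def logsumexp(a, axis):
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def cpc_loss(log_f):
    """L_N from log_f[i, j] = log f_k(x_j, c_i); the positive for row i is column i."""
    n = log_f.shape[0]
    log_softmax = log_f - logsumexp(log_f, axis=1)[:, None]
    return -np.mean(log_softmax[np.arange(n), np.arange(n)])

N = 16
# Uninformative scores: every candidate equally likely, so L_N = log N
print(cpc_loss(np.zeros((N, N))), np.log(N))
# Scores that strongly favour the diagonal: L_N approaches 0
print(cpc_loss(10.0 * np.eye(N)))
```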
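20. ### (a) Why f_k becomes the density ratio: the case N = 2

Example with N = 2: given samples $x_0$ and $x_1$, what is the probability that $x_0$ is the positive sample? There are two possibilities for which sample is positive:

- $x_0$ positive, $x_1$ negative: this occurs with probability $p(x_0\mid c_t)p(x_1)$
- $x_1$ positive, $x_0$ negative: this occurs with probability $p(x_1\mid c_t)p(x_0)$

Dividing the numerator and denominator by $p(x_0)p(x_1)$:

$$\frac{p(x_0\mid c_t)p(x_1)}{p(x_0\mid c_t)p(x_1) + p(x_1\mid c_t)p(x_0)} = \frac{\frac{p(x_0\mid c_t)}{p(x_0)}}{\frac{p(x_0\mid c_t)}{p(x_0)} + \frac{p(x_1\mid c_t)}{p(x_1)}}$$

This is the explanation of (a): $f_k(x_{t+k}, c_t)$ becomes $p(x_{t+k}\mid c_t)/p(x_{t+k})$.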
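21. ### (a) Generalizing to N samples

Generalizing the previous slide to one positive sample and N−1 negative samples, the probability that sample $x_i$ is the positive sample is

$$p(d = i\mid X, c_t) = \frac{p(x_i\mid c_t)\prod_{l\ne i} p(x_l)}{\sum_{j=1}^{N} p(x_j\mid c_t)\prod_{l\ne j} p(x_l)} = \frac{p(x_i\mid c_t)\prod_{l\ne i} p(x_l) \big/ \prod_{l} p(x_l)}{\sum_{j=1}^{N} \left\{ p(x_j\mid c_t)\prod_{l\ne j} p(x_l) \big/ \prod_{l} p(x_l) \right\}} = \frac{\frac{p(x_i\mid c_t)}{p(x_i)}}{\sum_{j=1}^{N} \frac{p(x_j\mid c_t)}{p(x_j)}}$$

Compare this with the loss:

$$\mathcal{L}_N = -\mathbb{E}_X\left[\log\frac{f_k(x_{t+k}, c_t)}{\sum_{x_j\in X} f_k(x_j, c_t)}\right]$$

Optimizing the loss drives the softmax output for $x_{t+k}$ toward the probability that $x_{t+k}$ is the positive sample. Hence $f_k(x_{t+k}, c_t)$ becomes proportional to $p(x_{t+k}\mid c_t)/p(x_{t+k})$.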
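The identity above can be checked by brute force on a discrete toy problem. The sketch below makes the following assumptions:

- x takes K values.
- $p(x)$ and $p(x\mid c_t)$ are hypothetical random distributions. Only positivity matters for this algebraic check.
- The candidates form a random set of size N.

It compares the product form of $p(d = i\mid X, c_t)$ with the normalised density ratios. It also shows that scaling all ratios by a constant leaves the result unchanged, which is why "proportional to" is enough.

```python
import numpy as np

rng = np.random.default_rng(0)
K, N = 5, 4                                # hypothetical: x has K values, N candidates
p_x = rng.dirichlet(np.ones(K))            # marginal p(x)
p_x_given_c = rng.dirichlet(np.ones(K))    # conditional p(x | c_t) for one fixed context
X = rng.integers(0, K, size=N)             # candidate set {x_1, ..., x_N}

# Product form: p(x_i | c_t) * prod_{l != i} p(x_l), normalised over i
unnorm = np.array([p_x_given_c[X[i]] * np.prod(np.delete(p_x[X], i)) for i in range(N)])
posterior = unnorm / unnorm.sum()

# Normalised density ratios p(x_i | c_t) / p(x_i)
ratio = p_x_given_c[X] / p_x[X]
softmax_of_ratio = ratio / ratio.sum()

# A constant factor cancels in the normalisation
scaled = 5.0 * ratio
softmax_scaled = scaled / scaled.sum()
print(np.round(posterior, 4), np.round(softmax_of_ratio, 4))
```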
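22. ### (b) Why minimizing the loss maximizes the mutual information

$X = \{x_1, \dots, x_N\}$ contains one positive sample and N−1 negative samples $X_{\text{neg}}$. Substituting the optimal $f_k$ (the density ratio) into the loss:

$$
\begin{aligned}
\mathcal{L}_N^{\text{opt}} &= -\mathbb{E}_X \log\left[\frac{\frac{p(x_{t+k}\mid c_t)}{p(x_{t+k})}}{\frac{p(x_{t+k}\mid c_t)}{p(x_{t+k})} + \sum_{x_j\in X_{\text{neg}}}\frac{p(x_j\mid c_t)}{p(x_j)}}\right] \\
&= \mathbb{E}_X \log\left[\frac{\frac{p(x_{t+k}\mid c_t)}{p(x_{t+k})} + \sum_{x_j\in X_{\text{neg}}}\frac{p(x_j\mid c_t)}{p(x_j)}}{\frac{p(x_{t+k}\mid c_t)}{p(x_{t+k})}}\right] \\
&= \mathbb{E}_X \log\left[1 + \frac{p(x_{t+k})}{p(x_{t+k}\mid c_t)}\sum_{x_j\in X_{\text{neg}}}\frac{p(x_j\mid c_t)}{p(x_j)}\right] \\
&\approx \mathbb{E}_X \log\left[1 + \frac{p(x_{t+k})}{p(x_{t+k}\mid c_t)}(N-1)\,\mathbb{E}_{x_j}\frac{p(x_j\mid c_t)}{p(x_j)}\right] \\
&= \mathbb{E}_X \log\left[1 + \frac{p(x_{t+k})}{p(x_{t+k}\mid c_t)}(N-1)\right] \\
&= \mathbb{E}_X \log\left[\frac{p(x_{t+k})}{p(x_{t+k}\mid c_t)}N + \left\{1 - \frac{p(x_{t+k})}{p(x_{t+k}\mid c_t)}\right\}\right] \\
&\ge \mathbb{E}_X \log\left[\frac{p(x_{t+k})}{p(x_{t+k}\mid c_t)}N\right] \\
&= -I(x_{t+k}, c_t) + \log(N)
\end{aligned}
$$

- The ≈ step: the sum of $p(x|c)/p(x)$ over the N−1 samples drawn from $p(x)$ is replaced by (N−1) times its expectation, and that expectation equals 1:
  $$\mathbb{E}_{x_j}\frac{p(x_j\mid c_t)}{p(x_j)} = \int p(x_j)\frac{p(x_j\mid c_t)}{p(x_j)}\,dx_j = \int p(x_j\mid c_t)\,dx_j = 1$$
- The ≥ step drops the term in braces; after taking the expectation, the result becomes smaller.
- The final step takes the expectation over $p(x_{t+k}, c_t)$, using $I(x_{t+k}, c_t) = \int p(x_{t+k}, c_t)\log\frac{p(x_{t+k}\mid c_t)}{p(x_{t+k})}$.

Hence $I(x_{t+k}, c_t) \ge \log(N) - \mathcal{L}_N^{\text{opt}}$, so minimizing the loss maximizes (a lower bound on) the mutual information.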
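A Monte Carlo check of the bound, under these assumptions:

- (x, c) is a hypothetical bivariate standard Gaussian pair with correlation `rho`. In this case both the true mutual information, $-\tfrac12\log(1-\rho^2)$, and the optimal critic $p(x\mid c)/p(x)$ are known in closed form.
- The other pairs in each batch serve as the N−1 negatives.

With the optimal critic, $\log N - \mathcal{L}_N$ should land at or slightly below the true mutual information.

```python
import numpy as np

def logsumexp_rows(a):
    m = a.max(axis=1, keepdims=True)
    return m[:, 0] + np.log(np.exp(a - m).sum(axis=1))

rho = 0.8                               # hypothetical correlation of (x, c)
true_mi = -0.5 * np.log(1 - rho**2)     # I(x; c) for a standard bivariate Gaussian

def log_density_ratio(x, c):
    """log p(x | c) - log p(x) with x | c ~ N(rho*c, 1 - rho^2) and x ~ N(0, 1)."""
    var = 1 - rho**2
    return -(x - rho * c) ** 2 / (2 * var) + x**2 / 2 - 0.5 * np.log(var)

rng = np.random.default_rng(0)
N, n_batches = 64, 200
losses = []
for _ in range(n_batches):
    c = rng.normal(size=N)
    x = rho * c + np.sqrt(1 - rho**2) * rng.normal(size=N)  # x_i is the positive for c_i
    # Optimal critic: log_f[i, j] = log p(x_j | c_i) / p(x_j); x_j (j != i) are negatives
    log_f = log_density_ratio(x[None, :], c[:, None])
    losses.append(-np.mean(np.diag(log_f) - logsumexp_rows(log_f)))

loss = np.mean(losses)
bound = np.log(N) - loss
print(f"I(x;c) = {true_mi:.3f}, log N - L_N = {bound:.3f}")
```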
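23. ### Summary of the loss

$$\mathcal{L}_N = -\mathbb{E}_X\left[\log\frac{f_k(x_{t+k}, c_t)}{\sum_{x_j\in X} f_k(x_j, c_t)}\right], \qquad X = \{x_1, \dots, x_N\}$$

- We optimize a softmax cross-entropy loss for a categorical distribution that picks the positive sample out of N samples. There is one positive sample from $p(x_{t+k}\mid c_t)$ and there are N−1 negative samples from $p(x_{t+k})$.
- $f_k$ becomes proportional to the density ratio $p(x_{t+k}\mid c_t)/p(x_{t+k})$, so the density ratio is estimated correctly.
- $I(x_{t+k}, c_t) \ge \log(N) - \mathcal{L}_N$, so lowering the loss raises the mutual information.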
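24. ### Intuitive understanding

[Figure: the encoder (enc) maps inputs to z_true (the positive sample) and z_noise (the negative samples). The context c is mapped through W to z_pred. Training makes z_pred^T z_true large and z_pred^T z_noise small.]

- If z_pred were simply trained to predict z_true with an MSE loss, the lack of a decoder would let z collapse into a trivial representation that is easy to predict (for example, all zeros).
- The loss instead pushes the model toward the actual probability that a given candidate is the positive sample. This imposes a constraint that makes z_pred^T z_true large and z_pred^T z_noise small, which prevents the collapse.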
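The collapse argument can be made concrete. The sketch below compares an MSE objective with a contrastive objective under these simplifying assumptions:

- The prediction z_pred is taken to be identical to its target, skipping W and c.
- The other rows of the batch act as negatives.
- The codes are hypothetical random vectors.

An all-zeros encoder scores a perfect MSE of 0. Under the contrastive loss, the same encoder is no better than guessing (loss = log N).

```python
import numpy as np

rng = np.random.default_rng(0)
N, dim = 16, 32

def mse_loss(z_pred, z_true):
    return np.mean((z_pred - z_true) ** 2)

def contrastive_loss(z_pred, z_cand):
    """Softmax cross-entropy over z_pred^T z; row i's positive is column i."""
    scores = z_pred @ z_cand.T
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    log_softmax = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_softmax))

# Collapsed encoder: every input mapped to the same vector (all zeros)
z_collapsed = np.zeros((N, dim))
print(mse_loss(z_collapsed, z_collapsed))                      # MSE is perfectly satisfied
print(contrastive_loss(z_collapsed, z_collapsed), np.log(N))   # equal to log N

# Informative, distinct codes: the positive stands out from the negatives
z_info = rng.normal(size=(N, dim))
print(contrastive_loss(z_info, z_info))                        # well below log N
```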

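26. ### Audio

Phone classification and speaker classification: a linear classifier is trained on the representation c that was learned self-supervised with CPC.

- Raw PCM audio is encoded into z with a convolutional encoder, in windows measured in milliseconds
- A GRU RNN encodes c_t from the sequence of z
- CPC predicts k steps ahead
- N: the number of negative samples (?)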

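34. ### Summary
- Mutual information is maximized through density ratio estimation that is framed as a classification problem
- Abstract representations can be learned without an observation model (decoder)
- Low computational cost
- Applicable to many domains: audio, vision, and others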