Representation Learning with Contrastive Predictive Coding

Kosuke Miyoshi

September 02, 2020

Transcript

  1. Contents: why I read CPC; mutual information; density ratio estimation; CPC

    Representation learning by maximizing mutual information through density ratio estimation
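  2. Goal: learn a representation of the time-series context $c$ with self-supervised learning

    Assumption: if the future $x$ can be predicted from $c$, then $c$ is a good representation ($c$ has captured the global structure of the time series). Two approaches:
    - Model the observation (generative) model $p(x \mid c)$: build a decoder.
    - Maximize the mutual information $I(x; c)$ without using an observation model: do not build a decoder.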
  3. Mutual information

    $$I(A, B) = \int_A \int_B P(A, B) \log \frac{P(A, B)}{P(A)P(B)}$$

    Minimum: when $P(A, B) = P(A)P(B)$, knowing the outcome of $A$ tells us nothing about $B$, and $I(A, B) = 0$.

    Maximum: when the conditional distribution $P(B \mid A)$ is deterministic, knowing the outcome of $A$ tells us $B$, and

    $$I(A, B) = \int_A \int_B P(A \mid B)P(B) \log \frac{P(B \mid A)P(A)}{P(A)P(B)} = -\int_B P(B) \log P(B) = H(B) = H(A)$$
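The two extreme cases of mutual information on slide 3 can be checked numerically. The following is a minimal sketch, assuming small discrete distributions with made-up probabilities; the helper names `mutual_information` and `entropy` are hypothetical and not part of the deck.

```python
import numpy as np

def mutual_information(joint):
    """I(A;B) in nats for a discrete joint distribution given as a 2-D array."""
    joint = np.asarray(joint, dtype=float)
    p_a = joint.sum(axis=1, keepdims=True)   # marginal P(A), shape (n_a, 1)
    p_b = joint.sum(axis=0, keepdims=True)   # marginal P(B), shape (1, n_b)
    mask = joint > 0                         # terms with P(A, B) = 0 contribute 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / (p_a * p_b)[mask])))

def entropy(p):
    """Shannon entropy in nats of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

# Minimum case: A and B independent, P(A, B) = P(A) P(B) (illustrative values).
independent = np.outer([0.3, 0.7], [0.4, 0.6])

# Maximum case: B is a deterministic function of A (illustrative values).
deterministic = np.array([[0.3, 0.0],
                          [0.0, 0.7]])

mi_independent = mutual_information(independent)
mi_deterministic = mutual_information(deterministic)
h_b = entropy(deterministic.sum(axis=0))
```

In this sketch `mi_independent` comes out as zero up to floating-point error, and `mi_deterministic` matches `h_b`, the entropy of $B$.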
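  4. Goal and approaches (slide 2 shown again)

    Goal: learn a representation of the time-series context $c$ with self-supervised learning, assuming that $c$ is a good representation if the future $x$ can be predicted from it. Approaches: model the observation model $p(x \mid c)$ (build a decoder), or maximize the mutual information $I(x; c)$ without an observation model (do not build a decoder).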
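  5. Mutual information between $x$ and the context $c$

    $$I(x; c) = \sum_x \sum_c p(x, c) \log \frac{p(x, c)}{p(x)p(c)} = \sum_x \sum_c p(x, c) \log \frac{p(x \mid c)p(c)}{p(x)p(c)} = \sum_x \sum_c p(x, c) \log \frac{p(x \mid c)}{p(x)}$$

    We want to learn the representation of $c$ so that this mutual information is maximized. A density ratio, $p(x \mid c) / p(x)$, appears inside the mutual information.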
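  6. Density ratio estimation

    Premise: samples from two probability densities are available, but the distributions themselves are unknown.
    - $n_A$ samples $\{x_i^A\}_{i=1}^{n_A}$ from $p_A(x)$, assigned label $A$
    - $n_B$ samples $\{x_j^B\}_{j=1}^{n_B}$ from $p_B(x)$, assigned label $B$

    Goal: estimate the ratio of the two densities, $r(x) = p_A(x) / p_B(x)$, using a classifier trained on the samples.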
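  7. Approximating the density ratio by training a classifier

    With $p_A(x) = p(x \mid y = A)$ and $p_B(x) = p(x \mid y = B)$, Bayes' rule gives

    $$r(x) = \frac{p_A(x)}{p_B(x)} = \frac{p(x \mid y = A)}{p(x \mid y = B)} = \frac{p(y = A \mid x)p(x) / p(y = A)}{p(y = B \mid x)p(x) / p(y = B)} = \frac{p(y = B)}{p(y = A)} \cdot \frac{p(y = A \mid x)}{p(y = B \mid x)}$$

    $$\hat{r}(x) = \frac{n_B}{n_A} \cdot \frac{\hat{p}(y = A \mid x)}{\hat{p}(y = B \mid x)} = \frac{n_B}{n_A} \cdot \frac{\hat{p}(y = A \mid x)}{1 - \hat{p}(y = A \mid x)}$$

    If we train a classifier that decides whether a sample came from distribution $A$ or from distribution $B$, its output probability can be used to estimate the density ratio.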
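The classifier-based estimate $\hat{r}(x)$ from slide 7 can be tried on a toy problem. This is a sketch under stated assumptions, not the deck's own code: $p_A = \mathcal{N}(0, 1)$ and $p_B = \mathcal{N}(1, 1)$ are chosen so that the true ratio $\exp(0.5 - x)$ is known, and the classifier is a plain logistic regression fitted with Newton's method in NumPy. The names `estimated_ratio` and `true_ratio` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_a, n_b = 20000, 10000
x_a = rng.normal(0.0, 1.0, n_a)   # samples from p_A = N(0, 1) (assumed for illustration)
x_b = rng.normal(1.0, 1.0, n_b)   # samples from p_B = N(1, 1) (assumed for illustration)

x = np.concatenate([x_a, x_b])
y = np.concatenate([np.ones(n_a), np.zeros(n_b)])   # 1 = label A, 0 = label B
features = np.stack([x, np.ones_like(x)], axis=1)   # columns: x and a bias term

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

# Fit logistic regression p(y = A | x) with a few Newton steps.
theta = np.zeros(2)
for _ in range(25):
    p = sigmoid(features @ theta)
    grad = features.T @ (p - y) / len(y)
    hess = (features * (p * (1.0 - p))[:, None]).T @ features / len(y)
    theta -= np.linalg.solve(hess, grad)

def estimated_ratio(x_query):
    """r_hat(x) = (n_B / n_A) * p_hat(y=A|x) / (1 - p_hat(y=A|x))."""
    x_query = np.asarray(x_query, dtype=float)
    p_a_given_x = sigmoid(theta[0] * x_query + theta[1])
    return (n_b / n_a) * p_a_given_x / (1.0 - p_a_given_x)

def true_ratio(x_query):
    """Closed-form p_A(x) / p_B(x) for the two Gaussians assumed above."""
    return np.exp(0.5 - np.asarray(x_query, dtype=float))
```

The sample sizes are deliberately unequal so that the $n_B / n_A$ correction matters; comparing `estimated_ratio(q)` with `true_ratio(q)` at a few query points shows how close the classifier output gets to the true density ratio.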
  8. CPC

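  9. Modeling the density ratio

    The density ratio between $p(x \mid c)$ and $p(x)$ is modeled by a function $f$ of $z$ and $c$:

    $$f_k(x_{t+k}, c_t) \propto \frac{p(x_{t+k} \mid c_t)}{p(x_{t+k})}, \qquad f_k(x_{t+k}, c_t) = \exp\left(z_{t+k}^T W_k c_t\right)$$

    Here $k$ is the number of steps ahead being predicted. The $\exp$ is used because a density ratio is always positive.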
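  10. Loss to optimize

    $$\mathcal{L}_N = -\mathbb{E}_X \left[ \log \frac{f_k(x_{t+k}, c_t)}{\sum_{x_j \in X} f_k(x_j, c_t)} \right], \qquad f_k(x_{t+k}, c_t) = \exp\left(z_{t+k}^T W_k c_t\right)$$

    $X = \{x_1, \cdots, x_N\}$ contains one positive sample drawn from $p(x_{t+k} \mid c_t)$ and $N - 1$ negative samples drawn from $p(x_{t+k})$. The numerator is $f$ at the positive sample; the denominator is the sum of $f$ over the positive and negative samples.
    - The loss has the form of a softmax cross-entropy loss for a categorical distribution that picks the positive sample out of the $N$ samples.
    - Optimizing the loss has two effects:
      - (a) $f_k(x_{t+k}, c_t)$ becomes proportional to $p(x_{t+k} \mid c_t) / p(x_{t+k})$.
      - (b) The mutual information is maximized: $I(x_{t+k}, c_t) \geq \log(N) - \mathcal{L}_N$.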
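The loss on slide 10 is a softmax cross-entropy over the $N$ candidates, with $z^T W_k c_t$ acting as the logit. The sketch below evaluates it for a single set $X$ in NumPy; averaging over many sets would approximate the expectation $\mathbb{E}_X$. The function name `info_nce_loss`, the array shapes, and the toy inputs are assumptions made for illustration.

```python
import numpy as np

def info_nce_loss(z_candidates, c_t, w_k, positive_index=0):
    """-log( f(x_pos, c_t) / sum_j f(x_j, c_t) ) with f = exp(z^T W_k c_t).

    z_candidates: (N, dim_z) encodings of the N candidate samples
    c_t:          (dim_c,)   context vector
    w_k:          (dim_z, dim_c) weight matrix for prediction step k
    """
    scores = z_candidates @ (w_k @ c_t)        # log f_k for each candidate
    scores = scores - scores.max()             # shift for numerical stability
    log_softmax = scores - np.log(np.exp(scores).sum())
    return float(-log_softmax[positive_index])

# With W_k = 0 every candidate gets the same score, so the loss is log(N).
z_uniform = np.eye(4)
loss_uniform = info_nce_loss(z_uniform, np.ones(2), np.zeros((4, 2)))

# If W_k c_t points at the positive encoding, the loss approaches 0.
w_confident = np.zeros((4, 2))
w_confident[0] = [5.0, 5.0]                    # W_k c_t = [10, 0, 0, 0]
loss_confident = info_nce_loss(np.eye(4), np.ones(2), w_confident)
```

The two toy cases bracket the behaviour: an uninformative $f_k$ gives a loss of $\log(N)$, and the loss falls toward zero as $f_k$ scores the positive sample far above the negatives.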
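  11. Why (a) holds: example with $N = 2$

    Given samples $x_0, x_1$, what is the probability that $x_0$ is the positive sample? There are two possibilities for which sample is the positive one:
    - $x_0$ is positive and $x_1$ is negative, which occurs with probability $p(x_0 \mid c_t)p(x_1)$
    - $x_1$ is positive and $x_0$ is negative, which occurs with probability $p(x_1 \mid c_t)p(x_0)$

    $$\frac{p(x_0 \mid c_t)p(x_1)}{p(x_0 \mid c_t)p(x_1) + p(x_1 \mid c_t)p(x_0)} = \frac{\frac{p(x_0 \mid c_t)}{p(x_0)}}{\frac{p(x_0 \mid c_t)}{p(x_0)} + \frac{p(x_1 \mid c_t)}{p(x_1)}}$$

    The right-hand side follows by dividing the numerator and denominator by $p(x_0)p(x_1)$. This explains why $f_k(x_{t+k}, c_t)$ becomes $p(x_{t+k} \mid c_t) / p(x_{t+k})$.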
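  12. Why (a) holds: generalizing the previous slide

    With one positive sample and $N - 1$ negative samples, the probability that sample $x_i$ is the positive one is

    $$p(d = i \mid X, c_t) = \frac{p(x_i \mid c_t) \prod_{l \neq i} p(x_l)}{\sum_{j=1}^{N} \left\{ p(x_j \mid c_t) \prod_{l \neq j} p(x_l) \right\}} = \frac{\frac{p(x_i \mid c_t) \prod_{l \neq i} p(x_l)}{\prod_l p(x_l)}}{\sum_{j=1}^{N} \left\{ \frac{p(x_j \mid c_t) \prod_{l \neq j} p(x_l)}{\prod_l p(x_l)} \right\}} = \frac{\frac{p(x_i \mid c_t)}{p(x_i)}}{\sum_{j=1}^{N} \left\{ \frac{p(x_j \mid c_t)}{p(x_j)} \right\}}$$

    $$\mathcal{L}_N = -\mathbb{E}_X \left[ \log \frac{f_k(x_{t+k}, c_t)}{\sum_{x_j \in X} f_k(x_j, c_t)} \right]$$

    Optimizing the loss makes the softmax term converge to the probability that sample $x_{t+k}$ is the positive sample, so $f_k(x_{t+k}, c_t)$ becomes proportional to $p(x_{t+k} \mid c_t) / p(x_{t+k})$.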
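  13. Why (b) holds: minimizing the loss maximizes the mutual information

    $X = \{x_1, \cdots, x_N\}$ consists of one positive sample and $N - 1$ negative samples $X_{neg}$. Substituting the optimal $f_k$ into the loss:

    $$\begin{aligned}
    \mathcal{L}_N^{opt} &= -\mathbb{E}_X \log \left[ \frac{\frac{p(x_{t+k} \mid c_t)}{p(x_{t+k})}}{\frac{p(x_{t+k} \mid c_t)}{p(x_{t+k})} + \sum_{x_j \in X_{neg}} \frac{p(x_j \mid c_t)}{p(x_j)}} \right] \\
    &= \mathbb{E}_X \log \left[ \frac{\frac{p(x_{t+k} \mid c_t)}{p(x_{t+k})} + \sum_{x_j \in X_{neg}} \frac{p(x_j \mid c_t)}{p(x_j)}}{\frac{p(x_{t+k} \mid c_t)}{p(x_{t+k})}} \right] \\
    &= \mathbb{E}_X \log \left[ 1 + \frac{p(x_{t+k})}{p(x_{t+k} \mid c_t)} \sum_{x_j \in X_{neg}} \frac{p(x_j \mid c_t)}{p(x_j)} \right] \\
    &\approx \mathbb{E}_X \log \left[ 1 + \frac{p(x_{t+k})}{p(x_{t+k} \mid c_t)} (N - 1) \, \mathbb{E}_{x_j} \frac{p(x_j \mid c_t)}{p(x_j)} \right] \\
    &\approx \mathbb{E}_X \log \left[ 1 + \frac{p(x_{t+k})}{p(x_{t+k} \mid c_t)} (N - 1) \right] \\
    &= \mathbb{E}_X \log \left[ \frac{p(x_{t+k})}{p(x_{t+k} \mid c_t)} N + \left\{ 1 - \frac{p(x_{t+k})}{p(x_{t+k} \mid c_t)} \right\} \right] \\
    &\geq \mathbb{E}_X \log \left[ \frac{p(x_{t+k})}{p(x_{t+k} \mid c_t)} N \right] \\
    &= -I(x_{t+k}, c_t) + \log(N)
    \end{aligned}$$

    Notes on the steps:
    - First $\approx$: the sum is obtained by drawing $N - 1$ samples from $p(x)$, computing $p(x \mid c) / p(x)$ for each, and adding them up; this sampling is replaced by an expectation.
    - Second $\approx$: $\mathbb{E}_{x_j} \frac{p(x_j \mid c_t)}{p(x_j)} = \int p(x_j) \frac{p(x_j \mid c_t)}{p(x_j)} dx_j = \int p(x_j \mid c_t) dx_j = 1$.
    - $\geq$: once the expectation is taken, $p(x_{t+k}) / p(x_{t+k} \mid c_t)$ becomes smaller than 1, so the term in braces can be dropped.
    - Last line: the expectation is taken over $p(x_{t+k}, c_t)$, and $I(x_{t+k}, c_t) = \int p(x_{t+k}, c_t) \log \frac{p(x_{t+k} \mid c_t)}{p(x_{t+k})}$.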
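The bound $I(x_{t+k}, c_t) \geq \log(N) - \mathcal{L}_N^{opt}$ can be checked without the sampling approximation on a small discrete example by enumerating every combination of context, positive sample, and negative samples. This is a sketch only: the joint distribution `joint` below is made up for illustration, and `optimal_loss` is a hypothetical helper.

```python
import itertools
import numpy as np

# joint[x, c] = p(x, c); the values are hypothetical and sum to 1.
joint = np.array([[0.30, 0.05],
                  [0.10, 0.15],
                  [0.05, 0.35]])
p_x = joint.sum(axis=1)                 # marginal p(x)
p_c = joint.sum(axis=0)                 # marginal p(c)
p_x_given_c = joint / p_c               # column c holds p(x | c)
ratio = p_x_given_c / p_x[:, None]      # ratio[x, c] = p(x | c) / p(x)

mutual_info = float(np.sum(joint * np.log(ratio)))

def optimal_loss(n_samples):
    """Exact L_N^opt: expectation over c, one positive x and N - 1 negatives."""
    n_x, n_c = joint.shape
    loss = 0.0
    for c in range(n_c):
        for pos in range(n_x):
            for negs in itertools.product(range(n_x), repeat=n_samples - 1):
                idx = list(negs)
                # Probability of this draw: c, then positive ~ p(x|c), negatives ~ p(x).
                prob = p_c[c] * p_x_given_c[pos, c] * np.prod(p_x[idx])
                denom = ratio[pos, c] + ratio[idx, c].sum()
                loss -= prob * np.log(ratio[pos, c] / denom)
    return float(loss)

# Lower bounds log(N) - L_N^opt for a few values of N (N >= 2).
bounds = {n: float(np.log(n)) - optimal_loss(n) for n in (2, 3, 4)}
```

Each entry of `bounds` can be compared against `mutual_info`; in this toy setting the bound stays below the true mutual information for every $N$ tried.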
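  14. Effect of optimizing the loss

    $$\mathcal{L}_N = -\mathbb{E}_X \left[ \log \frac{f_k(x_{t+k}, c_t)}{\sum_{x_j \in X} f_k(x_j, c_t)} \right], \qquad X = \{x_1, \cdots, x_N\}$$

    The loss is the softmax cross-entropy loss of a categorical distribution that picks the positive sample out of $N$ samples: one positive sample from $p(x_{t+k} \mid c_t)$ and $N - 1$ negative samples from $p(x_{t+k})$. Optimizing it means:
    - Lowering the loss raises the mutual information: $I(x_{t+k}, c_t) \geq \log(N) - \mathcal{L}_N$.
    - $f_k$ becomes proportional to the density ratio $p(x_{t+k} \mid c_t) / p(x_{t+k})$, so the density ratio is estimated correctly.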
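  15. Intuitive understanding

    [Figure: encoders (enc) map the positive sample to $z_{true}$ and the negative samples to $z_{noise}$; $z_{pred}$ is obtained from the context $c$ through $W$. $z_{pred}^T z_{true}$ is pushed to be large and $z_{pred}^T z_{noise}$ to be small.]
    - If $c$ simply predicted $z$ with an MSE loss, then because there is no decoder, $z$ would be encoded into a trivial representation that is easy to predict (for example, all zeros).
    - To bring the model closer to the actual probability that the prediction target is the positive sample, a constraint acts that makes $z_{pred}^T z_{true}$ large and $z_{pred}^T z_{noise}$ small, which prevents this collapse.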