Representation Learning with Contrastive Predictive Coding


Kosuke Miyoshi

September 02, 2020

Transcript

  1. Representation Learning with Contrastive Predictive Coding / Kosuke Miyoshi / RL colloquium version

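  2. Contents
     • Why I read CPC
     • Mutual information
     • Density ratio estimation
     • CPC
     • Representation learning by maximizing mutual information using density ratio estimation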
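  3. My motivation
     • I want to work out how to solve the problems that could not be solved in AnimalAI
     • Internal model, spatial reasoning
     • Being able to move even when blindfolded
     • Bird's-eye-view map
     • Dynamics of the environment
     • Being able to predict how a ball will move
     • Causal reasoning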
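  4. Shaping Belief States with Generative Environment Models for RL (Karl Gregor et al.)
     • Acquires a bird's-eye map representation through long-term prediction
     • One variant uses a generative model
     • One variant uses Contrastive Predictive Coding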

  5. Representation Learning with Contrastive Predictive Coding (Aaron van den Oord et al.)

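  6. What we want to do: learn a representation of the time-series context c by self-supervised learning.
     Assumptions:
     • If the future x can be predicted from c, then c is a good representation
     • Such a c has captured the global structure of the time series
     Approaches (x: observation, c: context):
     • Build a decoder: model the observation (generative) model $p(x|c)$
     • Do not build a decoder: maximize the mutual information $I(x; c)$ without using an observation model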
  7. Mutual information

  8. Entropy = degree of uncertainty. Panel labels: entropy $H$ large; entropy $H$ small.

  9. Mutual information
     $$I(A, B) = H(A) - H(A|B)$$
     How much the uncertainty of A is reduced by knowing B.
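  10. Mutual information
      $$I(A, B) = \int_A \int_B P(A, B) \log \frac{P(A, B)}{P(A) P(B)}$$
      Minimum: when $P(A, B) = P(A) P(B)$, knowing the outcome of A tells us nothing about B, and
      $$I(A, B) = \int_A \int_B P(A, B) \log \frac{P(A, B)}{P(A) P(B)} = 0$$
      Maximum: when the conditional distribution $P(B|A)$ is deterministic, knowing the outcome of A tells us B, and
      $$I(A, B) = \int_A \int_B P(A|B) P(B) \log \frac{P(B|A) P(A)}{P(A) P(B)} = -\int_B P(B) \log P(B) = H(B) = H(A)$$
      (Tables: joint distributions over $A \in \{a_0, a_1\}$ and $B \in \{b_0, b_1\}$ for the two cases.)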
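The two extreme cases of mutual information can be checked numerically. The following is a minimal sketch in plain Python; the two 2x2 joint tables are assumed toy values chosen for illustration, and the helper names `entropy` and `mutual_information` are hypothetical, not taken from the slides.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution given as a list."""
    return -sum(q * math.log(q) for q in p if q > 0)

def mutual_information(joint):
    """I(A, B) for a table joint[a][b]: sum of P(a,b) log P(a,b) / (P(a) P(b))."""
    p_a = [sum(row) for row in joint]
    p_b = [sum(col) for col in zip(*joint)]
    total = 0.0
    for a, row in enumerate(joint):
        for b, p_ab in enumerate(row):
            if p_ab > 0:  # terms with P(a,b) = 0 contribute nothing
                total += p_ab * math.log(p_ab / (p_a[a] * p_b[b]))
    return total

# Assumed toy tables (rows: A = a0, a1; columns: B = b0, b1).
independent = [[0.25, 0.25], [0.25, 0.25]]   # P(A, B) = P(A) P(B)
deterministic = [[0.5, 0.0], [0.0, 0.5]]     # B is fully determined by A

mi_min = mutual_information(independent)
mi_max = mutual_information(deterministic)
h_a = entropy([0.5, 0.5])
print(mi_min, mi_max, h_a)
```

In the independent table the log term vanishes everywhere, while in the deterministic table the mutual information matches the entropy of A.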
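  11. What we want to do: learn a representation of the time-series context c by self-supervised learning.
      Assumptions:
      • If the future x can be predicted from c, then c is a good representation
      • Such a c has captured the global structure of the time series
      Approaches (x: observation, c: context):
      • Build a decoder: model the observation (generative) model $p(x|c)$
      • Do not build a decoder: maximize the mutual information $I(x; c)$ without using an observation model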
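  12. $$I(x; c) = \sum_x \sum_c p(x, c) \log \frac{p(x, c)}{p(x) p(c)} = \sum_x \sum_c p(x, c) \log \frac{p(x|c) p(c)}{p(x) p(c)} = \sum_x \sum_c p(x, c) \log \frac{p(x|c)}{p(x)}$$
      We want to learn the representation of the context c so that this mutual information is maximized.
      A density ratio, $p(x|c) / p(x)$, appears inside the mutual information.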
  13. Density ratio estimation

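  14. $$r(x) = \frac{p_A(x)}{p_B(x)}$$
      Premise: samples from two probability densities are available, but the densities themselves are unknown.
      • $\{x^A_i\}_{i=1}^{n_A}$: $n_A$ samples, assigned label A
      • $\{x^B_j\}_{j=1}^{n_B}$: $n_B$ samples, assigned label B
      Goal: estimate the ratio of the two densities, $r(x) = p_A(x) / p_B(x)$, using a classifier trained on the samples.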
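  15. Write $p_A(x) = p(x|y = A)$ and $p_B(x) = p(x|y = B)$. Then
      $$r(x) = \frac{p_A(x)}{p_B(x)} = \frac{p(x|y = A)}{p(x|y = B)} = \frac{p(y = A|x) p(x) / p(y = A)}{p(y = B|x) p(x) / p(y = B)} = \frac{p(y = B)}{p(y = A)} \frac{p(y = A|x)}{p(y = B|x)}$$
      Approximation of the density ratio, after training a classifier:
      $$\hat{r}(x) = \frac{n_B}{n_A} \frac{\hat{p}(y = A|x)}{\hat{p}(y = B|x)} = \frac{n_B}{n_A} \frac{\hat{p}(y = A|x)}{1 - \hat{p}(y = A|x)}$$
      If we train a classifier to decide whether a sample came from distribution A or from distribution B, its output probability can be used to obtain the density ratio.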
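The classifier-based estimate can be tried on a toy problem. The sketch below assumes two 1-D Gaussians, $p_A = N(+1, 1)$ and $p_B = N(-1, 1)$, for which the true ratio is $\exp(2x)$, and fits a logistic regression by plain gradient descent. The densities, sample sizes, learning rate, and iteration count are all assumptions made for illustration.

```python
import math
import random

random.seed(0)
n_a, n_b = 1000, 1000
# Assumed toy densities: p_A = N(+1, 1), p_B = N(-1, 1); true log ratio is 2x.
xs_a = [random.gauss(1.0, 1.0) for _ in range(n_a)]
xs_b = [random.gauss(-1.0, 1.0) for _ in range(n_b)]
data = [(x, 1.0) for x in xs_a] + [(x, 0.0) for x in xs_b]  # label 1 = A, 0 = B

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

# Logistic regression p_hat(y = A | x) = sigmoid(w * x + b), fitted by
# full-batch gradient descent on the cross-entropy loss.
w, b = 0.0, 0.0
learning_rate = 0.5
for _ in range(400):
    grad_w = grad_b = 0.0
    for x, y in data:
        err = sigmoid(w * x + b) - y
        grad_w += err * x
        grad_b += err
    w -= learning_rate * grad_w / len(data)
    b -= learning_rate * grad_b / len(data)

def ratio_hat(x):
    """Estimated density ratio: (n_B / n_A) * p_hat / (1 - p_hat)."""
    p = sigmoid(w * x + b)
    return (n_b / n_a) * p / (1.0 - p)

def ratio_true(x):
    """Closed-form ratio for the assumed Gaussians."""
    return math.exp(2.0 * x)

print(w, b, ratio_hat(0.5), ratio_true(0.5))
```

With equal sample sizes the estimate reduces to $\exp(wx + b)$, so the fitted `w` should land near 2 and `b` near 0, up to sampling noise.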
  16. CPC

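  17. Rather than working with the mutual information between x and c directly, consider maximizing the mutual information between c and z, the encoding of x.
      $$I(x, c) \geq I(z, c)$$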
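The inequality $I(x, c) \geq I(z, c)$ is the data processing inequality for a deterministic encoder. A small exact check on discrete variables is sketched below; the joint table `p_xc` and the lossy encoder `enc` are hypothetical values chosen for illustration.

```python
import math

def mutual_information(joint):
    """Mutual information (in nats) of a discrete joint table joint[u][v]."""
    p_u = [sum(row) for row in joint]
    p_v = [sum(col) for col in zip(*joint)]
    return sum(p * math.log(p / (p_u[u] * p_v[v]))
               for u, row in enumerate(joint)
               for v, p in enumerate(row) if p > 0)

# Assumed toy joint p(x, c): x takes 4 values (rows), c takes 2 values (columns).
p_xc = [[0.30, 0.05],
        [0.15, 0.10],
        [0.05, 0.15],
        [0.02, 0.18]]

# Hypothetical lossy encoder z = enc(x) that merges x values pairwise.
enc = {0: 0, 1: 0, 2: 1, 3: 1}

# Push p(x, c) through the encoder to get p(z, c).
p_zc = [[0.0, 0.0], [0.0, 0.0]]
for x, row in enumerate(p_xc):
    for c, p in enumerate(row):
        p_zc[enc[x]][c] += p

i_xc = mutual_information(p_xc)
i_zc = mutual_information(p_zc)
print(i_xc, i_zc)
```

Maximizing $I(z, c)$ therefore pushes up a quantity that can never exceed $I(x, c)$.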
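  18. Model the density ratio as a function f of z and c (x: observation, c: context):
      $$f_k(x_{t+k}, c_t) \propto \frac{p(x_{t+k}|c_t)}{p(x_{t+k})}$$
      $$f_k(x_{t+k}, c_t) = \exp(z_{t+k}^T W_k c_t)$$
      • A density ratio is always positive, which the exponential guarantees
      • k is the number of steps ahead to predict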
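  19. The loss to optimize:
      $$\mathcal{L}_N = -\mathbb{E}_X \left[ \log \frac{f_k(x_{t+k}, c_t)}{\sum_{x_j \in X} f_k(x_j, c_t)} \right], \qquad f_k(x_{t+k}, c_t) = \exp(z_{t+k}^T W_k c_t)$$
      The numerator is f for the positive sample; the denominator is the sum of f over the positive sample and the negative samples.
      $X = \{x_1, \cdots, x_N\}$ holds one positive sample drawn from $p(x_{t+k}|c_t)$ and $N - 1$ negative samples drawn from $p(x_{t+k})$.
      • This has the form of a softmax cross-entropy loss for a categorical distribution that picks the positive sample out of the N samples
      • Optimizing the loss means that
        (a) $f_k(x_{t+k}, c_t)$ becomes proportional to $p(x_{t+k}|c_t) / p(x_{t+k})$
        (b) the mutual information is maximized: $I(x_{t+k}, c_t) \geq \log(N) - \mathcal{L}_N$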
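The loss for a single context can be sketched in a few lines of plain Python. The vectors and the matrix below are made-up toy values, and `score` and `info_nce` are hypothetical helper names; the full loss $\mathcal{L}_N$ would be the average of this quantity over many contexts and sample sets.

```python
import math

def score(z, W, c):
    """Log-bilinear score z^T W c, so that f_k = exp(score)."""
    Wc = [sum(W[i][j] * c[j] for j in range(len(c))) for i in range(len(W))]
    return sum(z_i * wc_i for z_i, wc_i in zip(z, Wc))

def info_nce(z_candidates, positive_index, W, c):
    """-log( f(positive) / sum_j f(x_j) ), computed stably via log-sum-exp."""
    scores = [score(z, W, c) for z in z_candidates]
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum - scores[positive_index]

W = [[1.0, 0.0], [0.0, 1.0]]   # assumed 2x2 W_k (identity, for illustration)
c = [1.0, 0.0]                  # assumed context vector c_t
z_pos = [2.0, 0.0]              # assumed encoding of the positive sample
z_negs = [[-1.0, 0.5], [0.0, -2.0], [-0.5, 1.0]]  # N - 1 = 3 assumed negatives

loss = info_nce([z_pos] + z_negs, 0, W, c)
# If every candidate scores the same, the loss equals log N (pure guessing).
uninformative = info_nce([[0.0, 0.0]] * 4, 0, W, c)
print(loss, uninformative, math.log(4))
```

The second call shows why $\log(N)$ appears in the bound: a model that cannot tell the positive from the negatives pays exactly $\log(N)$.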
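  20. Explaining (a): why $f_k(x_{t+k}, c_t)$ becomes $p(x_{t+k}|c_t) / p(x_{t+k})$.
      Example with N = 2: given samples $x_0, x_1$, what is the probability that $x_0$ is the positive sample?
      There are two possibilities for which sample is the positive one, $x_0$ or $x_1$:
      • $p(x_0|c_t) p(x_1)$: probability of the case where $x_0$ is positive and $x_1$ is negative
      • $p(x_1|c_t) p(x_0)$: probability of the case where $x_1$ is positive and $x_0$ is negative
      $$\frac{p(x_0|c_t) p(x_1)}{p(x_0|c_t) p(x_1) + p(x_1|c_t) p(x_0)} = \frac{p(x_0|c_t) / p(x_0)}{p(x_0|c_t) / p(x_0) + p(x_1|c_t) / p(x_1)}$$
      The right-hand side follows by dividing the numerator and the denominator by $p(x_0) p(x_1)$.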
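  21. Explaining (a), continued: generalizing the previous page to one positive sample and $N - 1$ negative samples.
      The probability that sample $x_i$ is the positive sample:
      $$p(d = i | X, c_t) = \frac{p(x_i|c_t) \prod_{l \neq i} p(x_l)}{\sum_{j=1}^N \{ p(x_j|c_t) \prod_{l \neq j} p(x_l) \}} = \frac{p(x_i|c_t) \prod_{l \neq i} p(x_l) / \prod_l p(x_l)}{\sum_{j=1}^N \{ p(x_j|c_t) \prod_{l \neq j} p(x_l) / \prod_l p(x_l) \}} = \frac{p(x_i|c_t) / p(x_i)}{\sum_{j=1}^N \{ p(x_j|c_t) / p(x_j) \}}$$
      $$\mathcal{L}_N = -\mathbb{E}_X \left[ \log \frac{f_k(x_{t+k}, c_t)}{\sum_{x_j \in X} f_k(x_j, c_t)} \right]$$
      Optimizing the loss makes the softmax term converge to the probability that sample $x_{t+k}$ is the positive sample, so $f_k(x_{t+k}, c_t)$ becomes proportional to $p(x_{t+k}|c_t) / p(x_{t+k})$.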
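The identity between the "which sample is positive" posterior and the normalized density ratios can be verified numerically. This is a minimal sketch; the discrete distributions `p_x` and `p_x_given_c` and the sample set `X` are assumed toy values.

```python
import math

# Assumed toy distributions over three possible x values, for one fixed context c_t.
p_x = [0.5, 0.3, 0.2]            # marginal p(x)
p_x_given_c = [0.1, 0.3, 0.6]    # conditional p(x | c_t)

def posterior_direct(X, i):
    """p(d = i | X, c_t) from the probability of each 'x_j is the positive' case."""
    def case(j):
        prob = p_x_given_c[X[j]]          # x_j drawn from p(x | c_t)
        for l, x_l in enumerate(X):
            if l != j:
                prob *= p_x[x_l]          # every other sample drawn from p(x)
        return prob
    return case(i) / sum(case(j) for j in range(len(X)))

def posterior_from_ratio(X, i):
    """The same posterior written as a normalized density ratio."""
    ratios = [p_x_given_c[x] / p_x[x] for x in X]
    return ratios[i] / sum(ratios)

X = [2, 0, 1, 0]   # values taken by the N = 4 observed samples
for i in range(len(X)):
    print(i, posterior_direct(X, i), posterior_from_ratio(X, i))
```

Both functions return the same value for every index, which is the step that lets a softmax over $f_k$ stand in for the posterior.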
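  22. Explaining (b): why minimizing the loss maximizes the mutual information.
      $X = \{x_1, \cdots, x_N\}$: one positive sample and $N - 1$ negative samples.
      $$\mathcal{L}_N^{opt} = -\mathbb{E}_X \log \left[ \frac{\frac{p(x_{t+k}|c_t)}{p(x_{t+k})}}{\frac{p(x_{t+k}|c_t)}{p(x_{t+k})} + \sum_{x_j \in X_{neg}} \frac{p(x_j|c_t)}{p(x_j)}} \right]$$
      $$= \mathbb{E}_X \log \left[ \frac{\frac{p(x_{t+k}|c_t)}{p(x_{t+k})} + \sum_{x_j \in X_{neg}} \frac{p(x_j|c_t)}{p(x_j)}}{\frac{p(x_{t+k}|c_t)}{p(x_{t+k})}} \right]$$
      $$= \mathbb{E}_X \log \left[ 1 + \frac{p(x_{t+k})}{p(x_{t+k}|c_t)} \sum_{x_j \in X_{neg}} \frac{p(x_j|c_t)}{p(x_j)} \right]$$
      The sum draws $N - 1$ samples from $p(x)$, evaluates $p(x|c) / p(x)$ on each, and adds them up. Replacing the sampling with an expectation:
      $$\approx \mathbb{E}_X \log \left[ 1 + \frac{p(x_{t+k})}{p(x_{t+k}|c_t)} (N - 1) \mathbb{E}_{x_j} \frac{p(x_j|c_t)}{p(x_j)} \right]$$
      $$= \mathbb{E}_X \log \left[ 1 + \frac{p(x_{t+k})}{p(x_{t+k}|c_t)} (N - 1) \right]$$
      because
      $$\mathbb{E}_{x_j} \frac{p(x_j|c_t)}{p(x_j)} = \int p(x_j) \frac{p(x_j|c_t)}{p(x_j)} dx_j = \int p(x_j|c_t) dx_j = 1$$
      Continuing:
      $$= \mathbb{E}_X \log \left[ \frac{p(x_{t+k})}{p(x_{t+k}|c_t)} N + \left\{ 1 - \frac{p(x_{t+k})}{p(x_{t+k}|c_t)} \right\} \right]$$
      $$\geq \mathbb{E}_X \log \left[ \frac{p(x_{t+k})}{p(x_{t+k}|c_t)} N \right]$$
      (Slide note on this step: once the expectation is taken, the term becomes smaller than […].)
      $$= -I(x_{t+k}, c_t) + \log(N)$$
      The expectation is taken over $p(x_{t+k}, c_t)$, with
      $$I(x_{t+k}, c_t) = \int p(x_{t+k}, c_t) \log \frac{p(x_{t+k}|c_t)}{p(x_{t+k})}$$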
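The bound $I(x_{t+k}, c_t) \geq \log(N) - \mathcal{L}_N^{opt}$ can be checked exactly on a small discrete problem by enumerating every possible sample set instead of sampling. The sketch below assumes a toy binary joint distribution `p_xc`; the function name `optimal_loss` is hypothetical.

```python
import itertools
import math

# Assumed toy joint p(x, c) over binary x (rows) and binary c (columns).
p_xc = [[0.4, 0.1],
        [0.1, 0.4]]
p_x = [sum(row) for row in p_xc]
p_c = [sum(col) for col in zip(*p_xc)]

def ratio(x, c):
    """True density ratio p(x | c) / p(x) = p(x, c) / (p(x) p(c))."""
    return p_xc[x][c] / (p_x[x] * p_c[c])

mutual_info = sum(p_xc[x][c] * math.log(ratio(x, c))
                  for x in range(2) for c in range(2))

def optimal_loss(n):
    """Exact loss with f = true ratio: positive ~ p(x | c), n - 1 negatives ~ p(x)."""
    loss = 0.0
    for c in range(2):
        for x_pos in range(2):
            for negs in itertools.product(range(2), repeat=n - 1):
                prob = p_xc[x_pos][c]     # p(c) * p(x_pos | c)
                for x in negs:
                    prob *= p_x[x]        # independent draws from the marginal
                denom = ratio(x_pos, c) + sum(ratio(x, c) for x in negs)
                loss -= prob * math.log(ratio(x_pos, c) / denom)
    return loss

for n in (2, 4, 8):
    bound = math.log(n) - optimal_loss(n)
    print(n, bound, mutual_info)
```

For each N the printed bound stays at or below the true mutual information, consistent with the inequality above.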
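  23. Recap
      $$\mathcal{L}_N = -\mathbb{E}_X \left[ \log \frac{f_k(x_{t+k}, c_t)}{\sum_{x_j \in X} f_k(x_j, c_t)} \right]$$
      • Optimizes a softmax cross-entropy loss for a categorical distribution that picks the positive sample out of N samples
      • $X = \{x_1, \cdots, x_N\}$: one positive sample from $p(x_{t+k}|c_t)$ and $N - 1$ negative samples from $p(x_{t+k})$
      • Lowering the loss raises the mutual information: $I(x_{t+k}, c_t) \geq \log(N) - \mathcal{L}_N$
      • Lowering the loss makes $f_k$ proportional to the density ratio $p(x_{t+k}|c_t) / p(x_{t+k})$, so the density ratio is estimated correctly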
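  24. Intuitive understanding
      (Diagram labels: c, W, $z_{pred}$; enc, $z_{true}$, positive sample; enc, $z_{noise}$, negative samples.)
      • Make $z_{pred}^T z_{true}$ large
      • Make $z_{pred}^T z_{noise}$ small
      • If $z_{pred}$ merely predicted $z_{true}$ under an MSE loss, then, because there is no decoder, z would be encoded into a trivial representation that is easy to predict (for example, the same value for every input)
      • Matching the actual probability that the prediction target is the positive sample imposes a constraint that makes $z_{pred}^T z_{true}$ large and $z_{pred}^T z_{noise}$ small, and this prevents the collapse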
  25. Evaluation

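  26. Audio
      Tasks: phoneme classification ([…] classes) and speaker classification ([…] classes). A linear classifier is trained on the representation c learned self-supervised with CPC.
      • PCM audio is encoded by a Conv network in units of […] msec ([…] dimensions)
      • c is encoded by a GRU RNN ([…] dimensions)
      • CPC predicts […] steps ahead
      • N = […] (that is, […] negative samples?)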
  27. Audio
      • How many steps ahead to predict
      • Where to draw the negative samples from (excl: samples from the same sequence are excluded)

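  28. Audio
      • Relationship between the prediction accuracy for the positive sample and the number of steps ahead being predicted
      • t-SNE of the representation c, colored by speaker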

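  29. Vision
      • Each […]x[…] ImageNet image is split, with overlap, into a […]x[…] grid of patches of size […]x[…]
      • Each patch is encoded by a ResNet into a […]-dimensional representation z
      • $c_t$ is formed in the style of a PixelRNN, and the patches further down the image are predicted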
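Splitting an image into an overlapping patch grid comes down to choosing a patch size and a stride. The sketch below only computes the patch corner positions. The sizes used here (a 256x256 image, 64x64 patches, stride 32) are assumptions for illustration and are not values stated in this transcript; `patch_grid` is a hypothetical helper.

```python
def patch_grid(image_size, patch_size, stride):
    """Top-left corners of all square patches that fit when sliding a window."""
    starts = range(0, image_size - patch_size + 1, stride)
    return [(top, left) for top in starts for left in starts]

# Assumed sizes: 256x256 image, 64x64 patches, stride 32 (half-patch overlap).
corners = patch_grid(256, 64, 32)
grid_side = int(len(corners) ** 0.5)
print(len(corners), grid_side, corners[0], corners[-1])
```

Under these assumed sizes each patch overlaps its neighbor by half, and each row of the grid becomes one step of the top-to-bottom prediction.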
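  30. Vision
      • Comparison with other self-supervised methods
      • The […]x[…] grid of […]-dimensional z vectors is max-pooled into a single […]-dimensional vector
      • A linear classifier is trained on that vector to predict the ImageNet label ([…] labels)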
  31. NLP
      • Omitted

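  32. Reinforcement learning
      • A […]-step prediction is added on top of the […]-step unroll of the A2C RNN (the prediction in CPC is a simple linear structure)
      • In reinforcement learning, the representation of the prediction target should also change depending on the actions that follow; this is addressed in a separate paper, CPC|Action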

  33. CPC|Action
      • The CPC prediction is made conditional on the actions
      • Neural Predictive Belief Representations (Zhaohan Daniel Guo et al.)

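  34. Summary
      • Mutual information is maximized by performing density ratio estimation via a classification problem
      • Abstract representations can be acquired without an observation model (decoder)
      • Computational cost is low
      • Applicable to a variety of domains such as Audio and Vision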