Slide 1

Recent Research on State Representation Learning and World Models, and an Introduction to the Deep Generative Model Library Pixyz
Tatsuya Matsushima (@__tmats__), first-year master's student, Graduate School of Engineering, The University of Tokyo

Slide 2

Self-introduction
Tatsuya Matsushima (@__tmats__), M1, Matsuo Lab, Department of Technology Management for Innovation, Graduate School of Engineering, The University of Tokyo
• I am interested in developing adaptive robots that can coexist with humans, and in constructively understanding life-likeness and human intelligence by building such robots.
• I recently wrote articles for Nikkei Cross Trend:
• "Matsuo Lab is watching: Google and Facebook are studying the 'embodiment' of AI" https://trend.nikkeibp.co.jp/atcl/contents/technology/00007/00001/
• "The 'state' representations essential for robot control: learning representations of the environment from data" https://trend.nikkeibp.co.jp/atcl/contents/technology/00007/00015/

Slide 3

Agenda
Part 1 (19:00-19:25): State representation learning and world models for reinforcement learning
• A survey of methods for learning state representations in reinforcement-learning problems. Ideally, such a representation is a "world model" that models the environment in some form.
Part 2 (19:30-20:00): Hands-on with the deep generative model library Pixyz
• In recent years, "world models" are often implemented with deep generative models. A tutorial on Pixyz, a library for writing deep generative models concisely.
Part 3 (20:05-20:35): Recent world-model research: GQN and TD-VAE
• An explanation of two world models published by DeepMind (UK) in 2018, "GQN" and "TD-VAE", with Pixyz implementation examples.

Slide 4

Part 1: State representation learning and world models for reinforcement learning

Slide 5

About this talk (the paper this presentation is based on)
State Representation Learning for Control: An Overview
• https://arxiv.org/abs/1802.04181 (last revised 5 Jun 2018)
• Timothée Lesort, Natalia Díaz-Rodríguez, Jean-François Goudou, David Filliat
• The authors also built a tool called S-RL Toolbox: https://github.com/araffin/robotics-rl-srl
• A review paper on learning state representations for control tasks
• An area actively studied around UC Berkeley; I feel it is not seen much in Japan
• This talk also adds papers not covered by the review

Slide 6

What is state representation learning?
Representation learning
• Learning to find abstract features from data
State representation learning (SRL)
• A state representation is a learned feature that is low-dimensional, evolves over time, and is influenced by the agent's actions
• Such representations are considered useful for robotics and control problems (cf. the curse of dimensionality)
• Example: image observations are very high-dimensional, but the objective of robot control can be expressed in far fewer dimensions; for manipulation, the 3-D position of the object
• The main theme is research on methods to find this state representation from raw observation data

Slide 7

World models
The importance of model building for intelligence [Lake+ 2016]
• Humans cannot perceive everything; instead we build internal models that model the world from information (stimuli), and these models are thought to play a major role in human intelligence
• Also called "world models"
• [DL reading group] GQN, related work, and the relation to world models: https://www.slideshare.net/DeepLearningJP2016/dlgqn-111725780
• Intelligence is the power to predict the future from past memories
• Jeff Hawkins, "On Intelligence"
• We arguably act while simulating the future with our learned internal models

Slide 8

World models
The importance of model building for intelligence [Lake+ 2016]
• A lecture at MIT by Prof. Josh Tenenbaum
• MIT AGI: Building machines that see, learn, and think like people (Josh Tenenbaum) https://www.youtube.com/watch?v=7ROelYvo8f0

Slide 9

What makes a good representation?
It must ignore the irrelevant parts of the raw observations and encode the information indispensable for reinforcement learning.
The definition of a good state representation by [Böhmer et al., 2015]:
• It has the Markov property
• It summarizes enough information to choose an action under some policy by looking only at the current state
• It can be exploited to improve the policy
• The learned value function generalizes to unseen states with similar features
• It is low-dimensional

Slide 10

Generalization of SRL
• SRL does not use the true state s_t; instead it learns a state s̃_t that approximates it
• That is, it learns the mapping s̃_t = φ(o_1:t) from the past observations o_1:t to the current state
(The slide's diagram shows observations o_t, actions a_t, rewards, the unknown true states s_t, and the learned states s̃_t, s̃_{t+1}.)

Slide 11

Approaches to SRL
There are several patterns of approach to SRL:
• Using an auto-encoder
• Using a forward model
• Using an inverse model
• Introducing prior knowledge (priors)

Slide 12

Approaches to SRL: using an auto-encoder
• Learn an encoder φ and a decoder φ⁻¹ by minimizing the reconstruction error
• While doing so, constrain the state s_t to have certain properties
• e.g., dimensionality constraints, denoising, sparsity constraints
• Encoder: s_t = φ(o_t; θ_φ); decoder: ô_t = φ⁻¹(s_t; θ_{φ⁻¹}); trained on the reconstruction error
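As a toy illustration of this objective (not from the talk; a pure-Python sketch with made-up 1-D data), a scalar linear encoder/decoder pair trained by gradient descent on the reconstruction error:

```python
# Toy 1-D linear auto-encoder: encoder s = w_e * o, decoder o_hat = w_d * s.
# Trained by gradient descent on the squared reconstruction error (o_hat - o)^2.
# All data and weights are hypothetical; this only illustrates the objective.

def train_autoencoder(observations, lr=0.01, epochs=200):
    w_e, w_d = 0.5, 0.5  # encoder / decoder weights
    for _ in range(epochs):
        for o in observations:
            s = w_e * o            # encode: state representation
            o_hat = w_d * s        # decode: reconstruction
            err = o_hat - o
            # gradients of err^2 with respect to w_d and w_e
            w_d -= lr * 2 * err * s
            w_e -= lr * 2 * err * w_d * o
    return w_e, w_d

obs = [1.0, -2.0, 0.5, 1.5]
w_e, w_d = train_autoencoder(obs)
recon_err = sum((w_d * w_e * o - o) ** 2 for o in obs)
```

After training, the product w_e * w_d approaches 1, so the reconstruction error becomes small; real SRL methods add the constraints above on s rather than letting the code be arbitrary.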

Slide 13

Approaches to SRL: using a forward model
• The forward model f predicts the next state from the state s_t and the action a_t: ŝ_{t+1} = f(s_t, a_t; θ_fwd)
• Constraints such as linearity can be placed on the forward model
• The encoder s_t = φ(o_t; θ_φ) is trained by backpropagating the prediction error of the next state
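A minimal sketch of fitting a (scalar) linear forward model by gradient descent on the one-step prediction error; the dynamics and numbers are made up for illustration:

```python
# Toy forward model s_{t+1} ~= W*s_t + U*a_t, fit by SGD on the squared
# one-step prediction error. Data comes from assumed linear dynamics.

true_W, true_U = 0.9, 0.5
actions = [1.0, -1.0, 0.5, 0.2, -0.7] * 20
states = [0.0]
for a in actions:
    states.append(true_W * states[-1] + true_U * a)

W, U, lr = 0.0, 0.0, 0.05
for _ in range(300):
    for t, a in enumerate(actions):
        pred = W * states[t] + U * a        # forward-model prediction
        err = pred - states[t + 1]          # next-state prediction error
        W -= lr * err * states[t]
        U -= lr * err * a
```

In actual SRL, the same error is backpropagated further, through the encoder that produced s_t.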

Slide 14

Approaches to SRL: using an inverse model
• Estimate the action a_t actually taken from the state s_t and the next state s_{t+1}: â_t = g(s_t, s_{t+1}; θ_inv)
• The encoder s_t = φ(o_t; θ_φ) is trained by backpropagating the prediction error of the action actually taken
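The inverse-model objective can likewise be sketched in a scalar setting (assumed dynamics, made-up data): estimate the action from a pair of consecutive states.

```python
# Toy inverse model: under assumed dynamics s_{t+1} = 0.9*s + 0.5*a, the true
# inverse is a = 2*s_{t+1} - 1.8*s. Fit a_hat = A*s_t + B*s_{t+1} by SGD on
# the action prediction error.

import random
random.seed(0)

data, s = [], 0.0
for _ in range(200):
    a = random.uniform(-1, 1)
    s_next = 0.9 * s + 0.5 * a
    data.append((s, a, s_next))
    s = s_next

A, B, lr = 0.0, 0.0, 0.1
for _ in range(200):
    for s_t, a, s_n in data:
        err = A * s_t + B * s_n - a   # error on the action actually taken
        A -= lr * err * s_t
        B -= lr * err * s_n
```

The fit recovers A close to -1.8 and B close to 2; in SRL the error is also backpropagated into the encoder.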

Slide 15

Approaches to SRL: introducing prior knowledge
• Exploit prior knowledge about specific constraints or dynamics
• e.g., temporal continuity
• The prior is defined through a loss applied to a set of states s_1:n under some condition c: Loss = ℒ_prior(s_1:n; θ_φ | c)
• This places constraints on the state space itself (with the encoder s_t = φ(o_t; θ_φ))

Slide 16

Why SRL?
• Doing end-to-end reinforcement learning directly from raw observations is costly
• SRL may let us inject good priors
• It can be extended to multimodal observations
• Solving related tasks beforehand can be exploited for transfer learning
• It enables algorithms such as evolution strategies (ES), where dimensionality directly affects search speed

Slide 17

Survey and classification of existing research

Slide 18

Classifying the research
Axes of classification:
• Learning objective
• Design of the observation and action spaces
• Evaluation metrics for state representations
• Tasks used for evaluation

Slide 19

Learning objectives
• Reconstruction of observations
• Learning a forward model
• Learning an inverse model
• Adversarial learning of features
• Exploiting rewards
• Other objectives
• Hybrid objectives

Slide 20

Learning objectives: reconstruction of observations
• Commonly used as dimensionality reduction
• e.g., PCA [Curran+ 2015], DAE, VAE [van Hoof+ 2016]
• Many methods use auto-encoders
• Using image observations directly [Mattner+ 2012]
• Constraining the representation to encode object positions, e.g., Spatial Softmax [Finn+ 2015]
• If no salient features exist in the observations, merely reconstructing them does not yield a good representation
• e.g., small items in games
• This is handled by reconstructing from different time steps or constraining the temporal evolution

Slide 21

Learning objectives: learning a forward model
• Make the state encode the information needed to predict the next state
• Often combined with observation reconstruction
• The transition in state space is often assumed to be linear: ŝ_{t+1} = W s_t + U a_t + V

Slide 22

(Example) E2C [Watter+ 2015]
Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images
• A forward model using a VAE; the transition of the state (latent representation) s_t is assumed to be linear: ŝ_{t+1} ~ N(μ = W s_t + U a_t + V, σ)
• The forward model is trained by bringing the prediction ŝ_{t+1} of the next time step close (in KL) to that state s_{t+1}
• There is also a formulation as a Kalman filter (DVBF) [Karl+ 2016]

Slide 23

(Example) World Models [Ha+ 2018]
• A forward model using a VAE and an MDN-RNN
• Vision model (V): compresses high-dimensional observations into a low-dimensional code (state) with a VAE
• Memory RNN (M): predicts the next step's code (state) from past codes
• [DL reading group] World Models: https://www.slideshare.net/DeepLearningJP2016/dlworld-models-95167842

Slide 24

Learning objectives: learning an inverse model
• Constrain the state representation so that the action taken can be estimated
• e.g., Learning to Poke by Poking [Agrawal+ 2016]
• Estimates the poke location (p_t), angle (θ_t), and length (l_t)

Slide 25

(Example) ICM [Pathak+ 2017]
Curiosity-driven Exploration by Self-supervised Prediction
• Uses the forward model's prediction error ℒ_fwd as an intrinsic reward for reinforcement learning:
 ℒ_fwd(φ̂(o_{t+1}), f̂(φ̂(o_t), a_t)) = ½ ‖f̂(φ̂(o_t), a_t) − φ̂(o_{t+1})‖²₂
• Promotes exploration when the reward from outside the agent is sparse
• A loss from an inverse model is also used:
 min_{θ_P, θ_I, θ_F} [−λ E_{π(s_t; θ_P)}[Σ_t r_t] + (1−β) ℒ_inv + β ℒ_fwd]  (external reward, inverse model, forward model)
• [DL reading group] Large-Scale Study of Curiosity-Driven Learning: https://www.slideshare.net/DeepLearningJP2016/dllargescale-study-of-curiositydriven-learning

Slide 26

Learning objectives: adversarial learning of features
• e.g., Causal InfoGAN [Kurutach+ 2018]
• Adds to the GAN objective a regularization term on the mutual information between the state and the generator's output (a pair of observations):
 min_{G,Q,M} max_D V(G, D) − λ I_VLB(G, Q), where I_VLB is the mutual-information term

Slide 27

Learning objectives: exploiting rewards
• In SRL, using rewards is not strictly necessary, but they can be used as additional information for distinguishing states
• e.g., VPN [Oh+ 2017]: also predicts the next state and its state value
(The slide's diagram shows observation → state, with predictions of the action (the paper's option o), the next state, and the next state value.)

Slide 28

Learning objectives: other objectives
• Objectives designed to reflect prior knowledge about the real world in the state space; many variants have been proposed
• Slowness prior [Lesort+ 2017, Jonschkowski+ 2017]
• Important things move slowly and continuously; abrupt changes are unlikely
• ℒ_Slowness(D, φ) = E[‖Δs_t‖²]
• Variability [Jonschkowski+ 2017]
• Relevant things move, so state representation learning should attend to what is moving
• ℒ_Variability(D, φ) = E[e^{−‖s_{t1} − s_{t2}‖}]
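These two losses are cheap to evaluate on a batch of states; a pure-Python sketch on a made-up 1-D trajectory (illustration only):

```python
# Slowness penalizes large one-step changes ||delta_s||^2; variability is
# the expected exp(-||s1 - s2||) over randomly paired states (smaller means
# the states are more spread out). Trajectory data is hypothetical.

import math, random
random.seed(0)

states = [0.1 * t + 0.01 * random.random() for t in range(50)]  # slow drift

slowness = sum((states[t + 1] - states[t]) ** 2
               for t in range(len(states) - 1)) / (len(states) - 1)

pairs = [(random.choice(states), random.choice(states)) for _ in range(100)]
variability = sum(math.exp(-abs(s1 - s2)) for s1, s2 in pairs) / len(pairs)
```

For a slowly drifting trajectory like this one, the slowness loss is small while the variability term is strictly between 0 and 1.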

Slide 29

Learning objectives: other objectives
Priors introduced in Robotic Priors [Jonschkowski+ 2015]:
• Proportionality
• Taking the same action in different states should affect the state by similar magnitudes
• ℒ_Prop(D, φ) = E[(‖Δs_{t2}‖ − ‖Δs_{t1}‖)² | a_{t1} = a_{t2}]
• Repeatability
• Taking the same action in similar states should affect the state by similar magnitudes and in the same direction
• ℒ_Rep(D, φ) = E[e^{−‖s_{t2} − s_{t1}‖²} ‖Δs_{t2} − Δs_{t1}‖² | a_{t1} = a_{t2}]
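The conditioning on a_{t1} = a_{t2} just means the expectations run over pairs of transitions that share the same action; a pure-Python sketch with made-up 1-D numbers:

```python
# Proportionality and repeatability computed over transition pairs that share
# an action. Each pair is ((s_t1, delta_s1), (s_t2, delta_s2)); states and
# deltas are hypothetical scalars.

import math

same_action_pairs = [((0.0, 0.30), (1.0, 0.32)),
                     ((0.5, -0.20), (2.0, -0.18))]

prop = sum((abs(d2) - abs(d1)) ** 2
           for (_, d1), (_, d2) in same_action_pairs) / len(same_action_pairs)

rep = sum(math.exp(-(s2 - s1) ** 2) * (d2 - d1) ** 2
          for (s1, d1), (s2, d2) in same_action_pairs) / len(same_action_pairs)
```

Repeatability downweights pairs whose states are far apart (the exp factor), so here it is strictly smaller than the proportionality term.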

Slide 30

Learning objectives: hybrid objectives
• In practice, SRL is often done by combining several of the objectives listed so far. The slide's table marks which objectives each method combines (columns: action/next-state constraints, forward model (next-state prediction), inverse model, observation reconstruction, next-observation prediction, reward):
• E2C [Watter+ 2015]: next-state constraint, forward model, observation reconstruction, next-observation prediction
• World Model [Ha+ 2018]: forward model, observation reconstruction, next-observation prediction
• ICM [Pathak+ 2017]: forward model, inverse model, reward
• Causal InfoGAN [Kurutach+ 2018]: action/next-state constraints, forward model, observation reconstruction, next-observation prediction
• VPN [Oh+ 2017]: forward model, reward
• Robotic Priors [Jonschkowski+ 2015]: action/next-state constraints, reward

Slide 31

Design of the observation, state, and action spaces
• The design of the observation, state, and action spaces affects the complexity of the problem
• How large the dimensionality is; whether actions are discrete or continuous
• Usually the designed state space has more dimensions than the true state
• For many tasks it is unclear how many dimensions the state should have, e.g., Atari
Examples (environment; observation type and dimension; state dimension; actions):
• Robotic Priors [Jonschkowski+ 2015]: slot car racing; images 16×16×3; 2; discrete (25)
• E2C [Watter+ 2015]: cart-pole; images 80×80×3; 8; discrete
• ICM [Pathak+ 2017]: Mario Bros.; images 42×42×3; 2; discrete (14)

Slide 32

Evaluation metrics for state representations
How do we evaluate the quality of a state representation?
• Have the agent actually solve reinforcement-learning tasks and check whether the representation is generalized enough to transfer across tasks
• The most common approach, but the experimental cost is high, and it is unclear which RL algorithm to evaluate with
• Hence we want intermediate ways to evaluate whether a learned state representation is good
• Using nearest neighbours
• Qualitative evaluation
• Quantitative evaluation (KNN-MSE [Lesort+ 2017]): KNN-MSE(s) = (1/k) Σ_{s' ∈ KNN(s,k)} ‖s̃ − s̃'‖²
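The slide's KNN-MSE formula can be sketched directly in pure Python (toy 1-D learned states; the full metric in the paper additionally compares neighbours found in the learned space against ground-truth coordinates):

```python
# KNN-MSE for one query state: mean squared distance to its k nearest
# neighbours among the other learned states. States are hypothetical scalars.

def knn_mse(states, i, k=2):
    s = states[i]
    neighbours = sorted((x for j, x in enumerate(states) if j != i),
                        key=lambda x: abs(x - s))[:k]
    return sum((s - x) ** 2 for x in neighbours) / k

states = [0.0, 0.1, 0.2, 1.0]
score = knn_mse(states, 0, k=2)   # neighbours of 0.0 are 0.1 and 0.2
```

A well-structured representation keeps similar states close, which keeps this score low.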

Slide 33

Evaluation metrics for state representations
How do we evaluate the quality of a state representation?
• Check whether the representation is disentangled
• disentangled metric score [Higgins+ 2016]
• Assumes the generative factors behind the data are known
• Uses the accuracy of a low-capacity classifier with small VC dimension
• Fit a regression model to the true state [Jonschkowski+ 2015]
• Evaluate the accuracy on a test set

Slide 34

Evaluation metrics for state representations
How do we evaluate the quality of a state representation? (figure)

Slide 35

Tasks used for evaluation
Standard tasks in SRL:
• Pendulum / inverted pendulum: swing up a pendulum that starts from a random position
• Cart-Pole: balance an inverted pendulum mounted on a cart
• An episode ends when the pole tilts 15° from vertical or the cart moves 2.4 units from the center

Slide 36

Tasks used for evaluation
Standard tasks in SRL:
• Video games, e.g., Atari, Doom, Super Mario Bros.
• Physics simulators, e.g., OpenAI Gym, DeepMind Lab
• Real robots, e.g., manipulation [Finn+ 2015], button pushing [Lesort+ 2015], grasping [Finn+ 2015]

Slide 37

S-RL Toolbox
A tool that addresses many issues in evaluating SRL algorithms [Raffin+ 2018]
• https://github.com/araffin/robotics-rl-srl
• Diverse features:
• 10 reinforcement-learning algorithms
• Evaluation environments with an OpenAI Gym-style interface
• Logging and visualization tools
• A hyperparameter search tool
• A dataset collected on a real Baxter robot
• A collection of SRL implementations is also included as SRL-Zoo
• https://github.com/araffin/srl-zoo
• Written in PyTorch, which is nice

Slide 38

Closing part 1

Slide 39

Impressions
• Is there research that evaluates how much uncertainty there is in a state representation?
• For example, the uncertainty of the state should differ between seeing only the first frame and seeing 20 consecutive frames
• If a policy reflecting that uncertainty could be built, might it lead to more efficient exploration?
• A MAML-like approach might be effective: run SRL across many tasks to learn good SRL parameters, then adapt to a new task few-shot
• After all, wasn't the motivation for SRL to learn representations that can be shared across many tasks in the first place?
• (Though I understand not wanting to run RL across many domains within one paper, given the experimental cost...)

Slide 40

Discussion
On learning world models vs. learning policies
• How should a policy be learned when the world model is imperfect?
• Methods that ensemble models
On the problem setting of representation learning itself
• In the end, even when the true downstream task is unknown, we assume some good representation exists, which leads back to the representation-learning problem
• This corresponds to the notion of meta-priors [Bengio+ 2013]
• Behind this difficulty there may also be the assumption, baked into the problem setting, that tasks arrive as discrete units

Slide 41

Appendix for part 1

Slide 42

References
[Agrawal+ 2016] Pulkit Agrawal, Ashvin Nair, Pieter Abbeel, Jitendra Malik, Sergey Levine (2016). Learning to Poke by Poking: Experiential Learning of Intuitive Physics. https://arxiv.org/abs/1606.07419
[Bengio+ 2013] Y. Bengio, A. Courville, and P. Vincent (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798–1828. https://ieeexplore.ieee.org/document/6472238
[Böhmer+ 2015] Böhmer, W., Springenberg, J. T., Boedecker, J., Riedmiller, M., and Obermayer, K. (2015). Autonomous learning of state representations for control: An emerging field aims to autonomously learn state representations for reinforcement learning agents from their real-world sensor observations. KI - Künstliche Intelligenz, pages 1–10. http://www.ni.tu-berlin.de/fileadmin/fg215/articles/boehmer15b.pdf
[Curran+ 2015] William Curran, Tim Brys, Matthew Taylor, William Smart (2015). Using PCA to Efficiently Represent State Spaces. https://arxiv.org/abs/1505.00322
[Finn+ 2015] Chelsea Finn, Xin Yu Tan, Yan Duan, Trevor Darrell, Sergey Levine, Pieter Abbeel (2015). Deep Spatial Autoencoders for Visuomotor Learning. https://arxiv.org/abs/1509.06113
[Ha+ 2018] David Ha, Jürgen Schmidhuber (2018). World Models. https://arxiv.org/abs/1803.10122
[Higgins+ 2016] Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, Alexander Lerchner (2016). beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework. https://openreview.net/forum?id=Sy2fzU9gl
[Jonschkowski+ 2015] Jonschkowski, R. and Brock, O. (2015). Learning state representations with robotic priors. Auton. Robots, 39(3): 407–428. http://www.robotics.tu-berlin.de/fileadmin/fg170/Publikationen_pdf/Jonschkowski-15-AURO.pdf
[Jonschkowski+ 2017] Rico Jonschkowski, Roland Hafner, Jonathan Scholz, Martin Riedmiller (2017). PVEs: Position-Velocity Encoders for Unsupervised Learning of Structured State Representations. https://arxiv.org/abs/1705.09805
[Karl+ 2016] Maximilian Karl, Maximilian Soelch, Justin Bayer, Patrick van der Smagt (2016). Deep Variational Bayes Filters: Unsupervised Learning of State Space Models from Raw Data. https://arxiv.org/abs/1605.06432

Slide 43

References
[Kurutach+ 2018] Thanard Kurutach, Aviv Tamar, Ge Yang, Stuart Russell, Pieter Abbeel (2018). Learning Plannable Representations with Causal InfoGAN. https://arxiv.org/abs/1807.09341
[Lake+ 2016] Brenden M. Lake, Tomer D. Ullman, Joshua B. Tenenbaum, Samuel J. Gershman (2016). Building Machines That Learn and Think Like People. https://arxiv.org/abs/1604.00289
[Lesort+ 2017] Timothée Lesort, Mathieu Seurin, Xinrui Li, Natalia Díaz Rodríguez, David Filliat (2017). Unsupervised state representation learning with robotic priors: a robustness benchmark. https://arxiv.org/abs/1709.05185
[Mattner+ 2012] Mattner, J., Lange, S., and Riedmiller, M. A. (2012). Learn to swing up and balance a real pole based on raw visual input data. In Neural Information Processing - 19th International Conference, ICONIP 2012, Doha, Qatar, November 12-15, 2012, Proceedings, Part V, pages 126–133.
[Oh+ 2017] Junhyuk Oh, Satinder Singh, Honglak Lee (2017). Value Prediction Network. https://arxiv.org/abs/1707.03497
[Pathak+ 2017] Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, Trevor Darrell (2017). Curiosity-driven Exploration by Self-supervised Prediction. https://arxiv.org/abs/1705.05363
[Raffin+ 2018] Antonin Raffin, Ashley Hill, René Traoré, Timothée Lesort, Natalia Díaz-Rodríguez, David Filliat (2018). S-RL Toolbox: Environments, Datasets and Evaluation Metrics for State Representation Learning. https://arxiv.org/abs/1809.09369
[van Hoof+ 2016] van Hoof, H., Chen, N., Karl, M., van der Smagt, P., and Peters, J. (2016). Stable reinforcement learning with autoencoders for tactile and visual data. In 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 3928–3934. https://ieeexplore.ieee.org/document/7759578/
[Watter+ 2015] Manuel Watter, Jost Tobias Springenberg, Joschka Boedecker, Martin Riedmiller (2015). Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images. https://arxiv.org/abs/1506.07365

Slide 44

Q&A / discussion; break

Slide 45

Part 2: Hands-on with the deep generative model library Pixyz

Slide 46

Deep generative models

Slide 47

Generative models
• An approach that models the distribution of the data
• By sampling from the model, artificial data points can be generated

Slide 48

Generative models in deep learning
Deep generative models (DGMs)
• Use neural networks for the distributions
• VAEs and GANs are the best known
• In particular, VAEs are often used to learn a low-dimensional representation (state representation) of the sequence of observations so far (part 1)
(Figures of the VAE and GAN; source: [Tschannen+ 2018])

Slide 49

VAE
Variational Autoencoder (VAE) [Kingma+ 2014]
• To learn a latent-variable model, we aim to maximize the log-likelihood of the training data
• ℒ_VAE(θ, φ) = E_{p̂(x)}[E_{q_φ(z|x)}[−log p_θ(x|z)]] + E_{p̂(x)}[D_KL(q_φ(z|x) ‖ p(z))]  (reconstruction term + KL term)
• E_{p̂(x)}[−log p_θ(x)] = ℒ_VAE(θ, φ) − E_{p̂(x)}[D_KL(q_φ(z|x) ‖ p_θ(z|x))]
• Since the KL is non-negative, −ℒ_VAE is a lower bound (ELBO) of the log-likelihood
• So we should maximize the ELBO (i.e., minimize the VAE loss ℒ_VAE)
※ The expectation under the empirical data distribution p̂(x) is written explicitly; it may look unfamiliar but this is the ordinary VAE ELBO

Slide 50

VAE
The VAE loss
ℒ_VAE(θ, φ) = E_{p̂(x)}[E_{q_φ(z|x)}[−log p_θ(x|z)]] + E_{p̂(x)}[D_KL(q_φ(z|x) ‖ p(z))]  (reconstruction term + KL term)
• The first term is estimated with samples z^(i) ~ q_φ(z|x^(i)); gradients are backpropagated via the reparametrization trick
• The second term is either computed in closed form or estimated from samples
• It has a closed form when the encoder is q_φ(z|x) = N(μ_φ(x), diag(σ_φ(x))) and the prior is p(z) = N(0, I)
• Otherwise, the distance between the distributions must be estimated from samples
• e.g., the density ratio trick used in GANs
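The closed-form case above is short enough to write out; a pure-Python sketch of the per-dimension Gaussian KL term KL(N(μ, σ²) ‖ N(0, I)) = ½ Σ (σ² + μ² − 1 − log σ²):

```python
# Closed-form KL between a diagonal-Gaussian encoder output and a standard
# normal prior, summed over latent dimensions.

import math

def gaussian_kl(mu, sigma):
    return sum(0.5 * (s * s + m * m - 1.0 - math.log(s * s))
               for m, s in zip(mu, sigma))

kl_zero = gaussian_kl([0.0, 0.0], [1.0, 1.0])   # encoder matches the prior
kl_off = gaussian_kl([1.0], [1.0])              # shifted mean: KL = 0.5
```

When the encoder equals the prior the KL is exactly zero, which is why this term acts as a regularizer pulling q_φ(z|x) toward p(z).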

Slide 51

Density-ratio estimation via adversarial learning
f-divergence
• Assuming f is a convex function with f(1) = 0, the f-divergence between p_x and p_y is defined as
 D_f(p_x ‖ p_y) = ∫ f(p_x(x) / p_y(x)) p_y(x) dx
• When f(t) = t log t, it becomes the KL divergence: D_f(p_x ‖ p_y) = D_KL(p_x ‖ p_y)
• Given samples from p_x and p_y, the f-divergence can be estimated with the density-ratio trick
• This became widely known through GANs

Slide 52

Density-ratio estimation via adversarial learning
Estimating the KL divergence with the GAN density-ratio trick
• Represent p_x and p_y as distributions conditioned on a label c ∈ {0, 1}
• That is, p_x(x) = p(x|c = 1), p_y(x) = p(x|c = 0)
• Reduce to a binary classification task: the discriminator S_η predicts the probability that its input came from the distribution p_x
• Then, taking the class probabilities as equal, the density ratio is
 p_x(x) / p_y(x) = p(x|c=1) / p(x|c=0) = p(c=1|x) / p(c=0|x) ≈ S_η(x) / (1 − S_η(x))
• Hence, given N i.i.d. samples from p_x,
 D_KL(p_x ‖ p_y) = ∫ p_x(x) log(p_x(x)/p_y(x)) dx ≈ (1/N) Σ_{i=1}^N log(S_η(x^(i)) / (1 − S_η(x^(i))))
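The Monte-Carlo form of this estimator can be checked in miniature by replacing the learned discriminator with the exact ratio for two 1-D Gaussians (an illustration only; a real GAN would learn S_η):

```python
# Monte-Carlo KL estimate via the density-ratio form. For p_x = N(1,1) and
# p_y = N(0,1), log(p_x/p_y) simplifies to x - 0.5, and the analytic KL is 0.5.

import math, random
random.seed(0)

def log_ratio(x):   # log p_x(x) - log p_y(x), equal variances cancel constants
    return -0.5 * (x - 1.0) ** 2 + 0.5 * x ** 2

N = 200000
samples = [random.gauss(1.0, 1.0) for _ in range(N)]
kl_estimate = sum(log_ratio(x) for x in samples) / N
```

With the optimal discriminator the estimator is unbiased; the GAN version inherits whatever error S_η has from training.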

Slide 53

The deep generative model library Pixyz

Slide 54

What is Pixyz?
Pixyz
• A PyTorch-based library specialized for implementing and using complex deep generative models easily
• Repository: https://github.com/masa-su/pixyz
• Documentation: https://docs.pixyz.io
• Developed by Suzuki-san of the Matsuo Lab, The University of Tokyo
• One of the organizers of this RL architecture study group
• As a library for describing deep generative models, the name is inspired by the joint distribution P(x, y, z) of random variables x, y, z

Slide 55

A hierarchical structure with three kinds of APIs
• Because the APIs do not interfere with one another, networks, distributions, and objectives can be composed and changed freely
• In existing probabilistic modeling languages, probability distributions and networks had to be described together
• e.g., Edward
(The three Pixyz APIs)

Slide 56

1. Distribution API
The API for probability distributions
• Define a network by subclassing the Distribution class
• Written in almost the same way as the classes in torch.distributions
• The factorization of a joint distribution can be written directly as a product of distributions
• A distribution composed as a product of distributions also supports sampling and likelihood computation, just like any distribution
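What "multiplying distributions" buys you can be shown in miniature. This is a conceptual pure-Python sketch, not the actual Pixyz API: composing p(z) and p(x|z) yields an object that both samples jointly and sums log-likelihoods.

```python
# Conceptual sketch of product-of-distributions composition (NOT Pixyz code).
# p(x, z) = p(x|z) * p(z): sampling chains through, log-likelihoods add.

import math, random
random.seed(0)

class Gaussian:
    """p(out | cond) = N(w * cond, 1); with cond left at 0.0 this is N(0, 1)."""
    def __init__(self, w=0.0):
        self.w = w
    def sample(self, cond=0.0):
        return random.gauss(self.w * cond, 1.0)
    def log_prob(self, value, cond=0.0):
        return -0.5 * (value - self.w * cond) ** 2 - 0.5 * math.log(2 * math.pi)

class Joint:
    def __init__(self, p_x_given_z, p_z):
        self.p_x_given_z, self.p_z = p_x_given_z, p_z
    def sample(self):
        z = self.p_z.sample()
        return self.p_x_given_z.sample(cond=z), z
    def log_prob(self, x, z):
        return self.p_z.log_prob(z) + self.p_x_given_z.log_prob(x, cond=z)

p = Joint(Gaussian(w=2.0), Gaussian())   # p(x,z) = p(x|z) p(z)
x, z = p.sample()
lp = p.log_prob(x, z)
```

In Pixyz the same composition is written as an actual `*` between Distribution objects, with PyTorch networks supplying the parameters.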

Slide 57

2. Loss API
Computes losses and bounds from Distribution objects
• Values are evaluated by calling the estimate method with data as arguments (define-and-run)
• Various losses are already defined
• e.g., negative log-likelihood (NLL), KL divergence (KullbackLeibler), ...
• Losses support arithmetic operations, so complex deep generative models can be described easily
• e.g., the objective of the M2 model [Kingma+ 2014]:
 −Σ_{x,y~p_data(x,y)} [E_{q(z|x,y)}[log p(x, z|y) / q(z|x, y)] + α log q(y|x)] − Σ_{x_u~p_data(x_u)} E_{q(z|x_u,y) q(y|x_u)}[log p(x_u, z|y) / (q(z|x_u, y) q(y|x_u))]
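The "losses support arithmetic" idea can be sketched with operator overloading (a conceptual pure-Python sketch, not the actual Pixyz Loss API): loss objects stay symbolic until estimate() is called, and + and * build composite losses.

```python
# Conceptual sketch of composable loss objects (NOT Pixyz code).

class Loss:
    def __init__(self, fn):
        self.fn = fn
    def estimate(self, data):
        return self.fn(data)
    def __add__(self, other):
        return Loss(lambda d: self.estimate(d) + other.estimate(d))
    def __mul__(self, scalar):
        return Loss(lambda d: scalar * self.estimate(d))

recon = Loss(lambda d: sum((x - 0.0) ** 2 for x in d))  # stand-in NLL term
kl = Loss(lambda d: 0.5 * len(d))                       # stand-in KL term
elbo_loss = recon + kl * 2.0    # composite loss, still unevaluated

value = elbo_loss.estimate([1.0, 2.0])
```

This is why complex objectives like the M2 loss above can be assembled term by term before any data is seen.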

Slide 58

3. Model API
Define a model by passing a Loss and an optimizer to the Model class
• Train with the train method, evaluate with the test method
Ready-made models are also provided
• For those who want to implement simple models quickly
• e.g., VAE, GAN, variational inference (VI), maximum likelihood (ML)
• https://docs.pixyz.io/en/latest/models.html

Slide 59

Pixyz hands-on

Slide 60

1. Installation
Prerequisite: PyTorch is installed
• See https://pytorch.org/get-started/locally/
• Usually `pip install torch torchvision` should be fine
1) Clone the Pixyz GitHub repository: git clone https://github.com/masa-su/pixyz.git
2) pip install: pip install -e pixyz
• Once the version stabilizes, it is planned to be registered on PyPI (so cloning will no longer be needed)

Slide 61

2. Using it
The basic flow of a Pixyz implementation
1. Define distributions
• A product of distributions can itself be treated as a distribution!
2. Define the objective and the model
• Three styles exist: the Model API, the Loss API, and the Distribution API
• Losses support arithmetic with one another!
3. Train
• If you subclass the Model class, model.train() is all you need!

Slide 62

Today's tutorial material
Prepared in the spirit of "practice makes perfect"
• https://github.com/TMats/rlarch-pixyz-tutorial
• 00: Probability distributions in Pixyz
• 01: Implement a vanilla VAE [Kingma+ 2014] with the Model API's VAE class
• 02: Implement a more complex deep generative model with the Loss API
• The M2 model [Kingma+ 2014]
• This material is being used for the first time, so questions and comments for future reference are very welcome

Slide 63

What is nice about Pixyz
It takes the best of define-by-run and define-and-run
• A library that answers the feeling of "I want to evaluate networks flexibly like in PyTorch, but I want to fix the loss in advance and check from its formula that what I wrote is correct"
• Possible because the layers for writing networks and probability distributions are separated
• As a result, you can implement the loss formula essentially as written in the paper
• When experimenting, network issues and loss issues can be debugged separately
The Loss API lets different deep generative models be mixed in a single framework
• e.g., you can take the sum of a GAN loss and a VAE loss

Slide 64

Pixyzoo
Pixyzoo
• A repository of deep generative model implementations using Pixyz
• Of course inspired by things like the GAN Zoo
• https://github.com/masa-su/pixyzoo
• Currently contains GQN, VIB, FactorVAE, and more
• We want to keep adding
• Pull requests are very welcome: fork the pixyzoo repository and send a PR

Slide 65

Appendix for part 2

Slide 66

References
[Kingma+ 2014] Diederik P. Kingma, Danilo J. Rezende, Shakir Mohamed, Max Welling (2014). Semi-Supervised Learning with Deep Generative Models. https://arxiv.org/abs/1406.5298
[Tschannen+ 2018] Michael Tschannen, Olivier Bachem, Mario Lucic (2018). Recent Advances in Autoencoder-Based Representation Learning. https://arxiv.org/abs/1812.05069

Slide 67

Q&A / discussion; break

Slide 68

Part 3: Recent world-model research: GQN and TD-VAE

Slide 69

Generative Query Network (GQN)

Slide 70

What is GQN?
Neural scene representation and rendering [Eslami+ 2018]
• S. M. Ali Eslami, Danilo J. Rezende, et al. (Science, 2018)
• Incidentally, the Science article itself is of no help for implementation; read the Supplemental
• Proposes the Generative Query Network (GQN), which generates an image from another viewpoint given images from several viewpoints (video: https://www.youtube.com/watch?time_continue=…&v=…)
• If a state representation independent of the viewpoint position can be acquired, that would be great (reason for interest #1)
• Uses a huge conditional VAE

Slide 71

What is GQN?
Neural scene representation and rendering (figure)
https://deepmind.com/blog/neural-scene-representation-and-rendering/#gif-207

Slide 72

Detailed material on GQN
[DL reading group] GQN, related work, and the relation to world models
• Suzuki-san, Matsuo Lab, The University of Tokyo: https://www.slideshare.net/DeepLearningJP2016/dlgqn-111725780
A commentary on Neural scene representation and rendering
• Kaneko-san, Aizawa Lab, The University of Tokyo: https://www.slideshare.net/MasayaKaneko/neural-scene-representation-and-rendering-d…
[DL Hacks] An implementation of the Generative Query Network with PyTorch and Pixyz
• Taniguchi-san, Matsuo Lab, The University of Tokyo: https://www.slideshare.net/DeepLearningJP2016/dlhackspytorch-pixyzgenerative-query-network-126329901

Slide 73

GQN's problem setting
Dataset
• For each of N scenes (environments), K pairs of a viewpoint and the RGB image from that viewpoint: {(x_i^k, v_i^k)} (i ∈ {1,…,N}, k ∈ {1,…,K})
• x_i^k: the k-th RGB image of the i-th scene
• v_i^k: the k-th viewpoint of the i-th scene
Problem setting
• Given M observations (the context) x_i^{1,…,M}, v_i^{1,…,M} and an arbitrary viewpoint (the query) v_i^q, predict the corresponding RGB image x_i^q
• From a finite number of image observations this cannot be predicted deterministically
• So it is solved as a probabilistic model (deep generative model) conditioned on the context

Slide 74

Preliminaries: Conditional VAE
Conditional VAE [Sohn+ 2015]
• A VAE conditioned on arbitrary information y: decoder p(x|z, y), encoder q(z|x, y)
• By modeling the prior as p(z|y), z can be inferred directly at test time
• ELBO (the negative loss):
 log p(x|y) ≥ E_{q(z|x,y)}[log (p(x|z, y) p(z|y) / q(z|x, y))] = E_{q(z|x,y)}[log p(x|z, y)] − KL[q(z|x, y) ‖ p(z|y)]
• There is also a version that uses a prior p(z) independent of y [Kingma+ 2014]
• The M2 model that appeared in the Pixyz tutorial is of this pattern
• What we want is to maximize the log-likelihood → maximize the ELBO

Slide 75

Generative Query Network
The GQN model
• A conditional VAE conditioned on the context r = f(x^{1,…,M}, v^{1,…,M}) and the query v^q
• f is a deterministic transformation (the representation network)
• Interpreted as a graphical model, with prior p(z|v^q, r), encoder q(z|x^q, v^q, r), and decoder p(x^q|z, v^q, r), the ELBO (negative loss) is
 log p(x^q|v^q, r) ≥ E_{q(z|x^q, v^q, r)}[log (p(x^q|z, v^q, r) p(z|v^q, r) / q(z|x^q, v^q, r))]
  = E_{q(z|x^q, v^q, r)}[log p(x^q|z, v^q, r)] − KL[q(z|x^q, v^q, r) ‖ p(z|v^q, r)]
• What we want is to maximize the log-likelihood → maximize the ELBO

Slide 76

Generative Query Network
The ELBO (negative loss): E_{q(z|x^q, v^q, r)}[log p(x^q|z, v^q, r)] − KL[q(z|x^q, v^q, r) ‖ p(z|v^q, r)]
Reconstruction term
• The encoder infers the latent variable z from the context r, the query v^q, and the corresponding image x^q
• Training so that the image x^q corresponding to the query is reconstructed from z
• → the latent z should end up learning some representation of the whole scene!
KL term
• Training brings the encoder and the prior close together
• → at test time, using the prior, the latent z corresponding to a query can be inferred from the context r and the query v^q alone, without the true image x^q!

Slide 77

Architectural device: the representation network
The representation network summarizes the M observations x^{1,…,M}, v^{1,…,M} into a single context r
• Each image-viewpoint pair is transformed and the results are summed: r_k = ψ(x^k, v^k), r = Σ_{k=1}^M r_k
By averaging over viewpoints, the representation is permutation invariant (independent of the viewpoint order)
• The number of viewpoints used as context can be set freely
• Modeling with an RNN would make the order matter
Is taking a sum (average) really enough?
• Recently this has in fact been debated [Wagstaff+ 2019]
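The permutation-invariance property of this sum aggregation can be demonstrated with a toy per-view encoder (ψ and the observations below are made up for illustration):

```python
# r = sum_k psi(x_k, v_k): the scene code is the same for any ordering of
# the (image, viewpoint) observations.

def psi(x, v):                 # stand-in per-view encoder
    return x * 2.0 + v

def scene_code(observations):  # observations: list of (image, viewpoint)
    return sum(psi(x, v) for x, v in observations)

views = [(1.0, 0.1), (2.0, 0.2), (3.0, 0.3)]
r1 = scene_code(views)
r2 = scene_code(list(reversed(views)))   # different order, same code
```

An RNN aggregator would not have this property, which is the slide's point about order dependence.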

Slide 78

Architectural device: using DRAW
DRAW [Gregor+ 2015]
• Splits the inference of the latent z in a VAE into multiple steps using an RNN, performed autoregressively, which increases the model's expressive power: q(z|x) = Π_{l=1}^L q_l(z_l | x, z_{<l})
• The ELBO for this case is derived in Taniguchi-san's material (pp. 11-13): https://www.slideshare.net/DeepLearningJP2016/dlhackspytorch-pixyzgenerative-query-network-126329901
• The conclusion: it is the reconstruction likelihood plus the sum of the per-step KLs
• Used for both the prior and the encoder

Slide 79

Experiments in the paper
The Room dataset
• Random square rooms with a random number (1-3) of various objects placed in them
• Wall textures: 5 kinds; floor textures: 3 kinds; object shapes: 7 kinds
• Sizes, positions, and colors are random; lighting is also random
• 20,000 kinds of scenes are rendered
• The datasets (only) are public; besides Room, several other datasets exist: https://github.com/deepmind/gqn-datasets
• Qualitatively, images from new viewpoints are predicted well (figure on the right)

Slide 80

Experiments in the paper
Scene algebra
• Using the trained network, perform addition and subtraction on the context r
• As with word2vec, the predictions follow the arithmetic, indicating a compositional representation
• Though questions remain about how compositional it really is...
• It may only work because the variation across the dataset is relatively simple

Slide 81

Implementation in Pixyz
Available in Pixyzoo: https://github.com/masa-su/pixyzoo/tree/master/GQN
• Implemented by Taniguchi-san (B4, Matsuo Lab, The University of Tokyo)
• With hyperparameters passed on directly from Eslami-san (the first author)
• DeepMind generally does not release implementations, so this should be among the most faithful ones
• The paper uses four K80s (24GB); checked locally, it just barely fits on four TitanX (12GB)
• Reducing the number of parameters does not change much
• It was even shared by the authors (Eslami-san and Rezende-san): Pixyz's DeepMind debut(?)
• https://twitter.com/arkitus/status/1072845916850274304

Slide 82

Implementation in Pixyz
Defining the distributions
• Prior and decoder: generation.py
• Encoder: inference.py

Slide 83

Implementation in Pixyz
Defining the loss and the model: model.py
• Instantiates the distributions and the DRAW component

Slide 84

Implementation in Pixyz
Currently, the DRAW part evaluates the loss one step at a time inside a for loop
Q. Can't it be written more cleanly?
A. The next version (0.0.5) will support autoregressive models
• It is scheduled to be merged into master soon and should make this much more concise (the TD-VAE introduced next already uses it)
• Even so, the implementations of networks and probability distributions are separated, so Pixyz's strengths still show

Slide 85

Discussion
What is actually acquired as the latent representation? (from the SRL point of view)
• Representations of objects, of the scene itself, and of the relation between viewpoints and observations
• How can these be extracted and used?
• What happens when transferred to the real world?
• A project training GQN on images taken in the real world: https://github.com/brettgohre/still_life_rendering_gqn
How can it be used for deciding the agent's actions (reinforcement learning)?
• What applications are possible?
The meta-learning context
• How to define tasks, and what to condition on, are becoming the issues

Slide 86

Temporal Difference Variational Auto-Encoder (TD-VAE)

Slide 87

What is TD-VAE?
Temporal Difference Variational Auto-Encoder [Gregor+ 2019]
• Karol Gregor, George Papamakarios, et al. (ICLR 2019 oral)
• Proposes a deep generative model for sequences
• Deep generative models for sequences have mostly performed inference step by step (the forward models of part 1), but TD-VAE can jump to an arbitrary step and run inference there
• Could this be used to work on temporal abstraction? (reason for interest #2)
• The keys are the introduction of a "belief state" via an RNN and the way the ELBO is decomposed

Slide 88

Detailed material on TD-VAE (there is not much compared to GQN...)
[DL reading group] Temporal Difference Variational Auto-Encoder
• Suzuki-san, Matsuo Lab, The University of Tokyo: https://www.slideshare.net/DeepLearningJP2016/dltemporaldifferencevariationalautoencoder

Slide 89

What kind of state representation is desirable?
Properties an agent's state representation should have, as mentioned in the paper:
1. It learns an abstract state representation of the data and can predict at the level of states, not observations
2. Given all observations up to some time, it learns a belief state, a deterministic code of the filtering distribution over states
• The belief state contains all the information the agent holds about the state of the world, and hence how to act optimally
3. It predicts a jumpy future several steps ahead: by enabling learning from temporally distant time points without backpropagating through the entire sequence, temporal abstraction is performed
TD-VAE is proposed as a model satisfying these properties

Slide 90

Preliminaries: autoregressive models
Autoregressive models
• A way to model sequence data x = (x_1, …, x_T)
• Using the chain rule, the likelihood factorizes into a product of conditionals (taking the log of both sides):
 log p(x_1, …, x_T) = Σ_t log p(x_t | x_1, …, x_{t−1})
• Can be implemented with an RNN: h_t = f(h_{t−1}, x_t)
• Problems
• They only predict in observation space, so they learn no compressed representation of the data
• Decoding and encoding at every step is computationally heavy
• During training the next step's data is fed in (teacher forcing), but at test time the model feeds in its own predictions, which is unstable
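The chain-rule factorization above can be made concrete with a toy first-order model (assumed, made-up transition probabilities for a binary sequence):

```python
# log p(x_1..x_T) = log p(x_1) + sum_t log p(x_t | x_{t-1})
# for a binary sequence under hypothetical transition probabilities.

import math

p0 = {0: 0.5, 1: 0.5}                                # p(x_1)
trans = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}   # p(x_t | x_{t-1})

def log_likelihood(seq):
    ll = math.log(p0[seq[0]])
    for prev, cur in zip(seq, seq[1:]):
        ll += math.log(trans[prev][cur])
    return ll

ll = log_likelihood([0, 0, 1, 1])
```

An RNN autoregressive model replaces the lookup tables with h_t = f(h_{t−1}, x_t) and a learned output distribution, but the factorized log-likelihood has the same shape.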

Slide 91

Preliminaries: state-space models
State-space models
• Model sequence data x = (x_1, …, x_T) with latent variables (states) z = (z_1, …, z_T)
• Joint distribution of x and z: p(x, z) = Π_t p(z_t | z_{t−1}) p(x_t | z_t)  (state transition × decoder)
• Encoder: q(z|x) = Π_t q(z_t | z_{t−1}, φ_t(x))
• If φ_t takes as input the sequence up to t, (x_1, …, x_t): filtering; if it takes the whole sequence x: smoothing
• ELBO (negative loss):
 log p(x) ≥ E_{z~q(z|x)}[Σ_t log p(x_t|z_t) + log p(z_t|z_{t−1}) − log q(z_t|z_{t−1}, φ_t(x))]
• Transitions are modeled between states
• No decoding/encoding at each step is needed at test time

Slide 92

Introducing the filtering distribution
In a state-space model, obtaining the state z_t requires the previous step's state z_{t−1}
• That in turn requires resampling z_{t−1}, z_{t−2}, …, z_1 one after another
Introduce the filtering distribution p(z_t | x_1, …, x_t)
• Make it depend only on the sequence of observations (x_1, …, x_t)
• Corresponds to the belief state in POMDP reinforcement learning

Slide 93

Introducing the filtering distribution
Derive the ELBO using the filtering distribution p(z_t | x_1, …, x_t)
• The log-likelihood factorizes as log p(x) = Σ_t log p(x_t | x_{<t}), and each term is lower-bounded
• Thanks to the filtering distribution, the latent variables can be represented with just the two variables z_{t−1} and z_t

Slide 94

Implementing the filtering distribution
TD-VAE implements the filtering distribution with an RNN
• Writing the belief-state variable as b_t, each step's belief state is modeled as b_t = f(b_{t−1}, x_t)
• The belief state b_t can be regarded as containing the information of the past observations (x_1, …, x_t)
• The ELBO (negative loss) is then
 E_{p_B(z_t|b_t) q(z_{t−1}|z_t, b_{t−1}, b_t)}[log p(x_t|z_t) + log p_B(z_{t−1}|b_{t−1}) + log p(z_t|z_{t−1}) − log p_B(z_t|b_t) − log q(z_{t−1}|z_t, b_{t−1}, b_t)]
 (decoder, filtering distribution, state transition, filtering distribution, encoder)

Slide 95

Jumping time steps
Extend the discussion so far from one-step transitions to multi-step transitions
• Only the notation changes; the ELBO (negative loss) is
 E_{p_B(z_{t2}|b_{t2}) q(z_{t1}|z_{t2}, b_{t1}, b_{t2})}[log p(x_{t2}|z_{t2}) + log p_B(z_{t1}|b_{t1}) + log p(z_{t2}|z_{t1}) − log p_B(z_{t2}|b_{t2}) − log q(z_{t1}|z_{t2}, b_{t1}, b_{t2})]
 (decoder, filtering distribution, state transition, filtering distribution, encoder)
• During training, the jump size δ = t2 − t1 is sampled from the range [1, D]
• δ is added as an input to the state transition: p(z_{t2} | z_{t1}, δ)
• At test time, prediction works as: x_{t1} → filtering distribution → z_{t1} → state transition → z_{t2} → decoder → x̂_{t2}

Slide 96

Training TD-VAE
The TD-VAE ELBO (negative loss):
 E_{z_{t2}~p_B(z_{t2}|b_{t2}), z_{t1}~q(z_{t1}|z_{t2}, b_{t1}, b_{t2})}[log p(x_{t2}|z_{t2}) + log p_B(z_{t1}|b_{t1}) + log p(z_{t2}|z_{t1}) − log p_B(z_{t2}|b_{t2}) − log q(z_{t1}|z_{t2}, b_{t1}, b_{t2})]
• (1) Sample z_{t2} from the filtering distribution p_B(z_{t2}|b_{t2})
• (2) Using it, sample z_{t1} from the encoder q(z_{t1}|z_{t2}, b_{t1}, b_{t2})
• The log p_B(z_{t1}|b_{t1}) and log q(z_{t1}|z_{t2}, b_{t1}, b_{t2}) terms combine into the KL divergence between the encoder and the filtering distribution: KL[q(z_{t1}|z_{t2}, b_{t1}, b_{t2}) ‖ p_B(z_{t1}|b_{t1})]

Slide 97

Implementation in Pixyz
Available in Pixyzoo: https://github.com/masa-su/pixyzoo/tree/master/TD-VAE
• Subclasses the Model class
• Uses IterativeLoss for the autoregressive part
• Requires Pixyz v0.0.5 or later
• The loss around the encoder sometimes blows up, so hyperparameter tuning may be needed
• It was hard for GQN too

Slide 98

Experiments in the paper
Partially observed MiniPacman
• With a model that predicts one step at a time, the proposed encoder is compared against other encoders:
• the TD-VAE encoder q(z_{t−1} | z_t, b_{t−1}, b_t)
• the filtering-model encoder q(z_t | z_{t−1}, b_t)
• the mean-field-model encoder q(z_t | b_t)
• Shows that the TD-VAE encoder is better in terms of ELBO and negative log-likelihood

Slide 99

Experiments in the paper
MovingMNIST
• An experiment on MNIST digits moving left and right, predicting with skipped steps
• Trained with skips of 1 to 4 steps
• Somehow this doesn't feel like the MovingMNIST I know...
• They also run experiments on DeepMind Lab
• The architecture there uses ConvDRAW [Gregor+ 2016]

Slide 100

Discussion
What "TD" means here
• The part where the encoder infers backwards in time feels like temporal difference
• There is at least a brief description in section 4.3
Abstraction along the time axis
• Temporal abstraction of sequences is a major challenge for reinforcement learning
• TD-VAE does not directly handle actions
• If temporally abstracted action primitives could be built from torque-level control sequences and exploration done over those primitives, it would likely be efficient and feel natural
The sense that everything is entrusted to TD-VAE's RNN
• The question of the RNN's expressive power

Slide 101

Appendix for part 3

Slide 102

References
[Eslami+ 2018] Eslami, S. M. Ali, Danilo Jimenez Rezende, Frédéric Besse, Fabio Viola, Ari S. Morcos, Marta Garnelo, Avraham Ruderman, Andrei A. Rusu, Ivo Danihelka, Karol Gregor, David P. Reichert, Lars Buesing, Theophane Weber, Oriol Vinyals, Dan Rosenbaum, Neil C. Rabinowitz, Helen King, Chloe Hillier, Matthew M. Botvinick, Daan Wierstra, Koray Kavukcuoglu and Demis Hassabis (2018). Neural scene representation and rendering. Science 360: 1204-1210. http://science.sciencemag.org/content/360/6394/1204
[Gregor+ 2015] Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, Daan Wierstra (2015). DRAW: A Recurrent Neural Network For Image Generation. https://arxiv.org/abs/1502.04623
[Gregor+ 2016] Karol Gregor, Frederic Besse, Danilo Jimenez Rezende, Ivo Danihelka, Daan Wierstra (2016). Towards Conceptual Compression. https://arxiv.org/abs/1604.08772
[Gregor+ 2019] Karol Gregor, George Papamakarios, Frederic Besse, Lars Buesing, Theophane Weber (2019). Temporal Difference Variational Auto-Encoder. https://openreview.net/forum?id=S1x4ghC9tQ
[Kingma+ 2014] Diederik P. Kingma, Danilo J. Rezende, Shakir Mohamed, Max Welling (2014). Semi-Supervised Learning with Deep Generative Models. https://arxiv.org/abs/1406.5298
[Sohn+ 2015] Kihyuk Sohn, Honglak Lee, and Xinchen Yan (2015). Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems (NIPS), pp. 3483–3491. https://papers.nips.cc/paper/5775-learning-structured-output-representation-using-deep-conditional-generative-models
[Tschannen+ 2018] Michael Tschannen, Olivier Bachem, Mario Lucic (2018). Recent Advances in Autoencoder-Based Representation Learning. https://arxiv.org/abs/1812.05069
[Wagstaff+ 2019] Edward Wagstaff, Fabian B. Fuchs, Martin Engelcke, Ingmar Posner, Michael Osborne (2019). On the Limitations of Representing Functions on Sets. https://arxiv.org/abs/1901.09006

Slide 103

Q&A / discussion

Slide 104

Acknowledgements
This talk was made possible in particular with the cooperation of the following people:
Masahiro Suzuki-san
• Providing the GQN and TD-VAE reading-group materials
• Pixyz development and implementation advice
Shohei Taniguchi-san
• The GQN implementation
Thank you very much!