
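32nd Reinforcement Learning Architecture Study Group (強化学習アーキテクチャ勉強会): Recent Research on State Representation Learning and World Models, and an Introduction to the Deep Generative Model Library Pixyz #rlarch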

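1) State representation learning and world models for reinforcement learning

In reinforcement learning problems the "state" tends to be treated as given, but using the agent's raw observations as the state is not always the best choice. For example, in a partially observable problem it helps if the agent remembers and uses its past observations in some form. For efficient reinforcement learning it is therefore promising to design models that learn a useful "state" representation from the agent's past observations. Models that learn such state representations and state transitions, and thereby model the agent's environment, are called "world models" [1] or "internal models". In recent years many studies have used deep generative models for state representation learning in order to handle high-dimensional inputs such as images. This talk organizes and discusses these studies, based on a review paper [2] posted to arXiv in 2018.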

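2) Hands-on with the deep generative model library Pixyz

A hands-on session on Pixyz [3], a PyTorch-based library for writing a wide variety of deep generative models concisely (a laptop on which PyTorch can be used will be handy).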

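3) Recent world-model research: GQN and TD-VAE

Two world-model studies published by DeepMind (UK) in 2018, the Generative Query Network (GQN) [4] and the Temporal Difference Variational Auto-Encoder (TD-VAE) [5], are explained together with example implementations in Pixyz. We would like to discuss applications of these models and the outlook beyond them.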

Tatsuya Matsushima

February 05, 2019

Transcript

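  1. Self-introduction: Tatsuya Matsushima (@__tmats__), M1, Matsuo Lab, Department of Technology Management for Innovation, Graduate School of Engineering, The University of Tokyo

    • I am interested in developing adaptive robots that can coexist with humans, and in constructively understanding lifelikeness and human intelligence by building such robots.
    • I recently wrote articles for Nikkei Cross Trend:
    • "Matsuo Lab's focus: Google and Facebook are researching the 'embodiment' of AI" https://trend.nikkeibp.co.jp/atcl/contents/technology/00007/00001/
    • "'State' representations matter for robot control: learning a representation of the environment from data" https://trend.nikkeibp.co.jp/atcl/contents/technology/00007/00015/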
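  2. Agenda

    Part 1: 19:00-19:25 State representation learning and world models for reinforcement learning
    • Summarizes methods for learning state representations in reinforcement learning problems. Ideally this representation is a "world model", i.e. it models the environment in some form.
    Part 2: 19:30-20:00 Hands-on with the deep generative model library Pixyz
    • In recent years "world models" are often implemented with deep generative models. A tutorial on Pixyz, a library in which deep generative models can be written concisely.
    Part 3: 20:05-20:35 Recent world-model research: GQN and TD-VAE
    • Explains the two world models "GQN" and "TD-VAE", published by DeepMind (UK) in 2018, together with example implementations in Pixyz.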
  3. About this talk (the paper it is based on): State Representation Learning for Control: An Overview

    • https://arxiv.org/abs/1802.04181 (Last revised 5 Jun 2018)
    • Timothée Lesort, Natalia Díaz-Rodríguez, Jean-François Goudou, David Filliat
    • The authors also built a tool called S-RL Toolbox: https://github.com/araffin/robotics-rl-srl
    • A review paper on learning state representations for control tasks
    • A field studied actively, mainly around UC Berkeley
    • I feel it is not seen much in Japan
    • This talk also adds other papers that the review does not cover
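  4. What is state representation learning?

    Representation learning
    • Learning that finds abstract features from data
    State representation learning (SRL)
    • A state representation is a learned feature that is low-dimensional, evolves over time, and is affected by the agent's actions
    • Such representations are considered useful for robotics and control problems (cf. the curse of dimensionality)
    • Example: image information is very high-dimensional, but the objective function of robot control can be expressed in far fewer dimensions
    • For manipulation, the 3D position of the object
    • The main research theme is methods that find this state representation from raw observation data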
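  5. World models: the importance of Model Building in intelligence [Lake+ 2016]

    • Humans cannot perceive everything; they build an internal model of the world from information (stimuli), and this model is thought to play a major role in human intelligence
    • This is also called a world model
    • [DL reading group] GQN and related work, and its relation to world models: https://www.slideshare.net/DeepLearningJP2016/dlgqn-111725780
    • The ability to predict the future from memories so far is intelligence
    • Jeff Hawkins, "On Intelligence" (Japanese edition: 考える脳・考えるコンピュータ)
    • We are thought to act while simulating the future with the learned internal model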
  6. World models: the importance of Model Building in intelligence [Lake+ 2016]

    • A lecture at MIT by Josh Tenenbaum
    • MIT AGI: Building machines that see, learn, and think like people (Josh Tenenbaum) https://www.youtube.com/watch?v=7ROelYvo8f0
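  7. What is a good representation?

    It must ignore the irrelevant parts of the raw observation and encode the information that is indispensable for reinforcement learning.
    Definition of a good state representation according to [Böhmer et al., 2015]:
    • It is Markovian
    • It summarizes enough information that an action can be selected under some policy by looking only at the current state
    • It can be used to improve the policy
    • The learned value function generalizes to unseen states with similar features
    • It is low-dimensional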
  8. Learning objectives

    • Reconstructing the observation
    • Learning a forward model
    • Learning an inverse model
    • Using adversarial learning of features
    • Using the reward
    • Other objectives
    • Hybrid objectives
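  9. Learning objectives: reconstructing the observation

    • A method commonly used for dimensionality reduction
    • Examples: PCA [Curran+ 2015], DAE, VAE [van Hoof+ 2016]
    • Many methods use an auto-encoder
    • Using image observations as they are [Mattner+ 2012]
    • Constraining the code to represent object positions, e.g. Spatial Softmax [Finn+ 2015]
    • If the relevant features are not salient in the observation, simply reconstructing the observation does not give a good representation
    • Example: small items in a game
    • This is handled by reconstructing from a different time step or by constraining the temporal evolution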
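  10. (Example) E2C [Watter+ 2015]: Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images

    • A forward model built on a VAE. The transition of the state (latent representation) $s_t$ is assumed to be linear: $\hat{s}_{t+1} \sim \mathcal{N}(\mu = W s_t + U a_t + V,\ \sigma)$
    • The forward model is learned by bringing the predicted next state $\hat{s}_{t+1}$ and the actual next state $s_{t+1}$ close in KL divergence
    • There is also a variant formulated as a Kalman filter (DVBF) [Karl+ 2016]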
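The locally linear transition on the E2C slide can be illustrated with a few lines of plain Python. This is only a sketch of the formula $\mu = W s_t + U a_t + V$: the matrices, the toy state and action values, and the helper names `linear_transition_mean` and `sample_next_state` are assumptions made for illustration and are not taken from the E2C code.

```python
import random


def linear_transition_mean(W, U, V, s, a):
    """Mean of the predicted next latent state: mu = W s + U a + V."""
    mu = []
    for i in range(len(V)):
        ws = sum(W[i][j] * s[j] for j in range(len(s)))
        ua = sum(U[i][j] * a[j] for j in range(len(a)))
        mu.append(ws + ua + V[i])
    return mu


def sample_next_state(mu, sigma, rng):
    """Draw s_hat_{t+1} ~ N(mu, diag(sigma^2)); sigma holds standard deviations."""
    return [rng.gauss(m, sd) for m, sd in zip(mu, sigma)]


# Hypothetical 2-dimensional latent state and 1-dimensional action.
W = [[1.0, 0.1], [0.0, 1.0]]
U = [[0.0], [0.5]]
V = [0.0, 0.0]
s_t = [1.0, 2.0]
a_t = [1.0]

mu = linear_transition_mean(W, U, V, s_t, a_t)
s_next = sample_next_state(mu, [0.1, 0.1], random.Random(0))
```

In E2C itself the transition parameters are produced by a network from the current latent state; here they are fixed constants to keep the example small.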
  11. (Example) World Models [Ha+ 2018]

    • A forward model that uses a VAE and an MDN-RNN
    • Vision model (V): compresses the high-dimensional observation into a low-dimensional code (state) with a VAE
    • Memory RNN (M): predicts the code (state) of the next step from past codes
    • [DL reading group] World Models: https://www.slideshare.net/DeepLearningJP2016/dlworld-models-95167842
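  12. Learning objectives: learning an inverse model

    • Constrains the state representation so that the action taken can be estimated from it
    • Example: Learning to Poke by Poking [Agrawal+ 2016]
    • Estimates the poke position ($p_t$), angle ($\theta_t$) and length ($l_t$)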
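  13. (Example) ICM [Pathak+ 2017]: Curiosity-driven Exploration by Self-supervised Prediction

    • Uses the prediction error of the forward model, $\mathcal{L}_{fwd}$, as an intrinsic reward for reinforcement learning: $\mathcal{L}_{fwd}(\hat{\phi}(o_{t+1}), \hat{f}(\hat{\phi}(o_t), a_t)) = \frac{1}{2}\|\hat{f}(\hat{\phi}(o_t), a_t) - \hat{\phi}(o_{t+1})\|_2^2$
    • Promotes exploration when the reward from outside the agent is sparse
    • The loss of the inverse model is used as well: $\min_{\theta_P,\theta_I,\theta_F}\left[-\lambda\,\mathbb{E}_{\pi(s_t;\theta_P)}[\sum_t r_t] + (1-\beta)\mathcal{L}_{inv} + \beta\mathcal{L}_{fwd}\right]$ (first term: extrinsic reward, second: inverse model, third: forward model)
    • [DL reading group] Large-Scale Study of Curiosity-Driven Learning: https://www.slideshare.net/DeepLearningJP2016/dllargescale-study-of-curiositydriven-learning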
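To make the two ICM formulas concrete, the sketch below evaluates the forward-model loss and the combined objective on hand-written numbers. The feature vectors, the policy return, and the values of `lam` and `beta` are made-up illustration values, and the function names are hypothetical.

```python
def forward_loss(phi_next_pred, phi_next):
    """L_fwd = 1/2 * || f_hat(phi(o_t), a_t) - phi(o_{t+1}) ||_2^2."""
    return 0.5 * sum((p - t) ** 2 for p, t in zip(phi_next_pred, phi_next))


def icm_objective(policy_return, l_inv, l_fwd, lam, beta):
    """-lambda * E[sum_t r_t] + (1 - beta) * L_inv + beta * L_fwd (to be minimized)."""
    return -lam * policy_return + (1.0 - beta) * l_inv + beta * l_fwd


# Predicted vs. actual features of the next observation (toy values).
l_fwd = forward_loss([1.0, 2.0], [0.0, 0.0])

# Combined objective with an assumed return of 10 and an inverse-model loss of 1.
objective = icm_objective(policy_return=10.0, l_inv=1.0, l_fwd=l_fwd, lam=0.1, beta=0.2)
```

A large `l_fwd` means the agent could not predict the consequence of its action, which is exactly the situation the intrinsic reward is meant to encourage it to explore.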
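  14. Learning objectives: other objectives

    • The objective is designed so that prior knowledge about the real world is reflected in the state space
    • Various priors have been proposed
    • Slowness prior [Lesort+ 2017, Jonschkowski+ 2017]
    • Important things move slowly and continuously, and abrupt changes are unlikely: $\mathcal{L}_{Slowness}(D, \phi) = \mathbb{E}[\|\Delta s_t\|^2]$
    • Variability [Jonschkowski+ 2017]
    • Relevant things move, so state representation learning should focus on what is moving: $\mathcal{L}_{Variability}(D, \phi) = \mathbb{E}[e^{-\|s_{t_1} - s_{t_2}\|}]$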
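  15. Learning objectives: other objectives

    • Priors introduced in Robotic Priors [Jonschkowski+ 2015]
    • Proportionality
    • When the same action is taken, even in different states, its effect on the state has a similar magnitude: $\mathcal{L}_{Prop}(D, \phi) = \mathbb{E}[(\|\Delta s_{t_2}\| - \|\Delta s_{t_1}\|)^2 \mid a_{t_1} = a_{t_2}]$
    • Repeatability
    • When the same action is taken in similar states, its effect on the state has a similar magnitude and direction: $\mathcal{L}_{Rep}(D, \phi) = \mathbb{E}[e^{-\|s_{t_2} - s_{t_1}\|^2}\,\|\Delta s_{t_2} - \Delta s_{t_1}\|^2 \mid a_{t_1} = a_{t_2}]$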
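The four prior-based losses (Slowness and Variability from the previous slide, Proportionality and Repeatability from this one) can be evaluated directly on a small batch of transitions. The sketch below assumes that the expectations are replaced by averages over a batch, that states are already encoded as short vectors, and that each transition is a hypothetical `(s_t, a_t, s_next)` tuple; none of this is taken from the original implementations.

```python
import math
from itertools import combinations


def sq_norm(v):
    return sum(x * x for x in v)


def diff(u, v):
    return [a - b for a, b in zip(u, v)]


def mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def deltas(transitions):
    """State change Delta s_t = s_{t+1} - s_t for every transition."""
    return [diff(s_next, s) for s, _, s_next in transitions]


def slowness_loss(transitions):
    return mean(sq_norm(d) for d in deltas(transitions))


def variability_loss(transitions):
    states = [s for s, _, _ in transitions]
    return mean(math.exp(-math.sqrt(sq_norm(diff(s1, s2))))
                for s1, s2 in combinations(states, 2))


def same_action_pairs(transitions):
    """Yield (s1, delta1, s2, delta2) for every pair of transitions with equal actions."""
    ds = deltas(transitions)
    for i, j in combinations(range(len(transitions)), 2):
        if transitions[i][1] == transitions[j][1]:
            yield transitions[i][0], ds[i], transitions[j][0], ds[j]


def proportionality_loss(transitions):
    return mean((math.sqrt(sq_norm(d2)) - math.sqrt(sq_norm(d1))) ** 2
                for _, d1, _, d2 in same_action_pairs(transitions))


def repeatability_loss(transitions):
    return mean(math.exp(-sq_norm(diff(s2, s1))) * sq_norm(diff(d2, d1))
                for s1, d1, s2, d2 in same_action_pairs(transitions))


# Toy batch: action 0 always moves the state by (+1, 0); action 1 moves it by (0, +2).
transitions = [
    ([0.0, 0.0], 0, [1.0, 0.0]),
    ([2.0, 0.0], 0, [3.0, 0.0]),
    ([0.0, 1.0], 1, [0.0, 3.0]),
]
```

In this toy batch the two transitions that share action 0 have identical effects, so the Proportionality and Repeatability losses are both zero, which is the behaviour the priors reward.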
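  16. Learning objectives: hybrid objectives

    • In practice, SRL is usually done by combining several of the objectives listed so far
    Columns: constraint on the action / next state; forward model (prediction of the next state); inverse model; observation reconstruction; prediction of the next observation; use of reward
    E2C [Watter+ 2015]: ✔ ✔ ✔ ✔
    World Model [Ha+ 2018]: ✔ ✔ ✔
    ICM [Pathak+ 2017]: ✔ ✔ ✔
    Causal InfoGAN [Kurutach+ 2018]: ✔ ✔ ✔ ✔
    VPN [Oh+ 2017]: ✔ ✔
    Robotic Priors [Jonschkowski+ 2015]: ✔ ✔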
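  17. Designing the observation, state and action spaces

    • The design of the observation, state and action spaces affects the complexity of the problem
    • How large the dimensionality is, and whether actions are discrete or continuous
    • Usually the state space is designed with more dimensions than the true state
    • For many tasks it is unclear how many dimensions the state should have (e.g. Atari)

    Method                               Environment       Observation type  Observation dim.  State dim.  Actions
    Robotic Priors [Jonschkowski+ 2015]  slot car racing   image             16×16×3           2           discrete (25)
    E2C [Watter+ 2015]                   cart-pole         image             80×80×3           8           discrete
    ICM [Pathak+ 2017]                   Mario Bros.       image             42×42×3           2           discrete (14)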
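  18. Evaluation metrics for state representations: how do we evaluate how good a state representation is?

    • Have the agent actually solve reinforcement learning tasks and check whether the representation is general enough to transfer between tasks
    • This is the most common method, but the experiments are costly
    • It is unclear which reinforcement learning algorithm should be used for the evaluation
    • So we want an intermediate way to evaluate whether a learned state representation is good
    • Use nearest neighbours
    • Qualitative evaluation
    • Quantitative evaluation (KNN-MSE [Lesort+ 2017]): $\mathrm{KNN\text{-}MSE}(s) = \frac{1}{k}\sum_{s' \in \mathrm{KNN}(s,k)} \|\tilde{s} - \tilde{s}'\|^2$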
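The KNN-MSE score can be sketched as follows. The example assumes that $\tilde{s}$ denotes the ground-truth state associated with a learned state $s$, so neighbours are searched in the learned space and the error is measured in the ground-truth space; the data and the function names are hypothetical.

```python
def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def knn_mse(learned, truth, index, k):
    """Mean squared ground-truth distance to the k nearest neighbours in learned space."""
    candidates = [(sq_dist(learned[index], learned[j]), j)
                  for j in range(len(learned)) if j != index]
    neighbours = [j for _, j in sorted(candidates)[:k]]
    return sum(sq_dist(truth[index], truth[j]) for j in neighbours) / k


# Learned 1-D states and the (assumed known) ground-truth states they correspond to.
learned = [[0.0], [0.1], [0.2], [5.0]]
truth = [[0.0], [1.0], [2.0], [50.0]]

score = knn_mse(learned, truth, index=0, k=2)
```

A low score means that states which are close in the learned space are also close in the true state space.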
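  19. Evaluation metrics for state representations: how do we evaluate how good a state representation is?

    • Check whether the representation is disentangled
    • Disentangled metric score [Higgins+ 2016]
    • Assumes the generative factors behind the data are known
    • Uses the accuracy of a low-capacity classifier with a small VC dimension
    • Build a regression model onto the true state [Jonschkowski+ 2015]
    • Evaluate the accuracy on a test set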
  20. Tasks used for evaluation: standard tasks in SRL

    • Video games
    • Examples: Atari, Doom, Super Mario Bros.
    • Physics simulators
    • Examples: OpenAI Gym, DeepMind Lab
    • Real robots
    • Examples: manipulation [Finn+ 2015], button pushing [Lesort+ 2015], grasping [Finn+ 2015]
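  21. S-RL Toolbox: a tool that addresses various issues in evaluating SRL algorithms [Raffin+ 2018]

    • https://github.com/araffin/robotics-rl-srl
    • A wide range of features
    • 10 reinforcement learning algorithms
    • Evaluation environments with an OpenAI Gym-style interface
    • Logger and visualization tools
    • Hyperparameter search tool
    • A dataset collected on a real Baxter robot
    • A collection of SRL implementations is also included as SRL-Zoo
    • https://github.com/araffin/srl-zoo
    • It is written in PyTorch, which is nice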
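  22. Thoughts

    • Is there research that evaluates how much uncertainty a state representation carries?
    • For example, the uncertainty of the state representation should differ between seeing only the first frame and seeing 20 consecutive frames
    • If a policy could reflect that uncertainty, might it also lead to efficient exploration?
    • A MAML-like approach may be effective: do SRL while solving many tasks, learn good SRL parameters, and then adapt to a new task few-shot
    • After all, wasn't the motivation for SRL to learn a representation that can be shared across many tasks?
    • (Admittedly the experiments are costly, so I understand not wanting to run reinforcement learning on many domains within one paper...)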
  23. References

    [Agrawal+ 2016] Pulkit Agrawal, Ashvin Nair, Pieter Abbeel, Jitendra Malik, Sergey Levine (2016). Learning to Poke by Poking: Experiential Learning of Intuitive Physics. https://arxiv.org/abs/1606.07419
    [Bengio+ 2013] Y. Bengio, A. Courville, P. Vincent (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798–1828. https://ieeexplore.ieee.org/document/6472238
    [Böhmer+ 2015] W. Böhmer, J. T. Springenberg, J. Boedecker, M. Riedmiller, K. Obermayer (2015). Autonomous learning of state representations for control: An emerging field aims to autonomously learn state representations for reinforcement learning agents from their real-world sensor observations. KI - Künstliche Intelligenz, pages 1–10. http://www.ni.tu-berlin.de/fileadmin/fg215/articles/boehmer15b.pdf
    [Curran+ 2015] William Curran, Tim Brys, Matthew Taylor, William Smart (2015). Using PCA to Efficiently Represent State Spaces. https://arxiv.org/abs/1505.00322
    [Finn+ 2015] Chelsea Finn, Xin Yu Tan, Yan Duan, Trevor Darrell, Sergey Levine, Pieter Abbeel (2015). Deep Spatial Autoencoders for Visuomotor Learning. https://arxiv.org/abs/1509.06113
    [Ha+ 2018] David Ha, Jürgen Schmidhuber (2018). World Models. https://arxiv.org/abs/1803.10122
    [Higgins+ 2016] Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, Alexander Lerchner (2016). beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework. https://openreview.net/forum?id=Sy2fzU9gl
    [Jonschkowski+ 2015] R. Jonschkowski, O. Brock (2015). Learning state representations with robotic priors. Auton. Robots, 39(3): 407–428. http://www.robotics.tu-berlin.de/fileadmin/fg170/Publikationen_pdf/Jonschkowski-15-AURO.pdf
    [Jonschkowski+ 2017] Rico Jonschkowski, Roland Hafner, Jonathan Scholz, Martin Riedmiller (2017). PVEs: Position-Velocity Encoders for Unsupervised Learning of Structured State Representations. https://arxiv.org/abs/1705.09805
    [Karl+ 2016] Maximilian Karl, Maximilian Soelch, Justin Bayer, Patrick van der Smagt (2016). Deep Variational Bayes Filters: Unsupervised Learning of State Space Models from Raw Data. https://arxiv.org/abs/1605.06432
  24. References

    [Kurutach+ 2018] Thanard Kurutach, Aviv Tamar, Ge Yang, Stuart Russell, Pieter Abbeel (2018). Learning Plannable Representations with Causal InfoGAN. https://arxiv.org/abs/1807.09341
    [Lake+ 2016] Brenden M. Lake, Tomer D. Ullman, Joshua B. Tenenbaum, Samuel J. Gershman (2016). Building Machines That Learn and Think Like People. https://arxiv.org/abs/1604.00289
    [Oh+ 2017] Junhyuk Oh, Satinder Singh, Honglak Lee (2017). Value Prediction Network. https://arxiv.org/abs/1707.03497
    [Pathak+ 2017] Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, Trevor Darrell (2017). Curiosity-driven Exploration by Self-supervised Prediction. https://arxiv.org/abs/1705.05363
    [Raffin+ 2018] Antonin Raffin, Ashley Hill, René Traoré, Timothée Lesort, Natalia Díaz-Rodríguez, David Filliat (2018). S-RL Toolbox: Environments, Datasets and Evaluation Metrics for State Representation Learning. https://arxiv.org/abs/1809.09369
    [Lesort+ 2017] Timothée Lesort, Mathieu Seurin, Xinrui Li, Natalia Díaz Rodríguez, David Filliat (2017). Unsupervised state representation learning with robotic priors: a robustness benchmark. https://arxiv.org/abs/1709.05185
    [Mattner+ 2012] J. Mattner, S. Lange, M. A. Riedmiller (2012). Learn to swing up and balance a real pole based on raw visual input data. In Neural Information Processing - 19th International Conference, ICONIP 2012, Doha, Qatar, November 12-15, 2012, Proceedings, Part V, pages 126–133. https://ieeexplore.ieee.org/document/7759578
    [van Hoof+ 2016] H. van Hoof, N. Chen, M. Karl, P. van der Smagt, J. Peters (2016). Stable reinforcement learning with autoencoders for tactile and visual data. In 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 3928–3934. https://ieeexplore.ieee.org/document/7759578/
    [Watter+ 2015] Manuel Watter, Jost Tobias Springenberg, Joschka Boedecker, Martin Riedmiller (2015). Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images. https://arxiv.org/abs/1506.07365
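  25. Generative models in deep learning: deep generative models (DGM)

    • Neural networks are used to parameterize distributions
    • VAEs and GANs are the best known
    • In particular, VAEs are often used to learn a low-dimensional representation (state representation) of the sequence of observations so far (Part 1)
    • Figures: VAE and GAN (source: [Tschannen+ 2018])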
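  26. VAE: Variational Autoencoder (VAE) [Kingma+ 2014]

    • To learn a latent variable model, the goal is to maximize the log-likelihood of the training data
    • VAE loss (reconstruction term + KL term): $\mathcal{L}_{VAE}(\theta, \phi) = \mathbb{E}_{\hat{p}(x)}[\mathbb{E}_{q_\phi(z|x)}[-\log p_\theta(x|z)]] + \mathbb{E}_{\hat{p}(x)}[D_{KL}(q_\phi(z|x)\,\|\,p(z))]$
    • It satisfies $\mathbb{E}_{\hat{p}(x)}[-\log p_\theta(x)] = \mathcal{L}_{VAE}(\theta, \phi) - \mathbb{E}_{\hat{p}(x)}[D_{KL}(q_\phi(z|x)\,\|\,p_\theta(z|x))]$
    • Since the KL divergence is non-negative, $-\mathcal{L}_{VAE}$ is a lower bound (ELBO) on the log-likelihood $\mathbb{E}_{\hat{p}(x)}[\log p_\theta(x)]$
    • So it suffices to maximize the ELBO, i.e. minimize the VAE loss $\mathcal{L}_{VAE}$
    • Note: the expectation over the empirical data distribution $\hat{p}(x)$ is written explicitly, which looks slightly unfamiliar, but this is the ordinary VAE ELBO
    • Source: [Tschannen+ 2018]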
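  27. VAE: the VAE loss

    $\mathcal{L}_{VAE}(\theta, \phi) = \mathbb{E}_{\hat{p}(x)}[\mathbb{E}_{q_\phi(z|x)}[-\log p_\theta(x|z)]] + \mathbb{E}_{\hat{p}(x)}[D_{KL}(q_\phi(z|x)\,\|\,p(z))]$ (first term: reconstruction, second term: KL)
    • The first term uses samples $z^{(i)} \sim q_\phi(z|x^{(i)})$, and gradients are backpropagated with the reparametrization trick
    • The second term is either computed in closed form or estimated from samples
    • With the encoder $q_\phi(z|x) = \mathcal{N}(\mu_\phi(x), \mathrm{diag}(\sigma_\phi(x)))$ and the prior $p(z) = \mathcal{N}(0, I)$ it can be computed in closed form
    • Otherwise the distance between the distributions has to be estimated from samples, e.g. with the density-ratio trick used in GANs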
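For the Gaussian case named on this slide, the closed-form KL term and the reparametrized sample take only a few lines. The sketch assumes that `sigma` holds per-dimension standard deviations (the slide does not say whether $\sigma_\phi$ is a standard deviation or a variance), and the function names are hypothetical.

```python
import math
import random


def kl_to_standard_normal(mu, sigma):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ), with sigma as standard deviations."""
    return 0.5 * sum(s * s + m * m - 1.0 - math.log(s * s) for m, s in zip(mu, sigma))


def reparameterize(mu, sigma, rng):
    """z = mu + sigma * eps with eps ~ N(0, I): the sample is a deterministic function of mu and sigma."""
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    return [m + s * e for m, s, e in zip(mu, sigma, eps)]


kl_same = kl_to_standard_normal([0.0, 0.0], [1.0, 1.0])  # posterior equals the prior
kl_shifted = kl_to_standard_normal([1.0], [1.0])          # mean shifted by one
z = reparameterize([0.0, 0.0], [1.0, 1.0], random.Random(0))
```

Because the noise `eps` is drawn independently of the parameters, an automatic differentiation framework can propagate gradients through `mu` and `sigma`, which is the point of the reparametrization trick.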
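  28. Density ratio estimation with adversarial learning: f-divergence

    • Let $f$ be a convex function with $f(1) = 0$. The f-divergence between $p_x$ and $p_y$ is defined as $D_f(p_x \| p_y) = \int f\!\left(\frac{p_x(x)}{p_y(x)}\right) p_y(x)\,dx$
    • With $f(t) = t \log t$ it becomes the KL divergence: $D_f(p_x \| p_y) = D_{KL}(p_x \| p_y)$
    • Given samples from $p_x$ and $p_y$, the f-divergence can be estimated with the density-ratio trick
    • The trick became widely known through GANs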
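That $f(t) = t \log t$ recovers the KL divergence can be checked numerically. The sketch below uses a discrete analogue in which the integral becomes a sum over outcomes; the two probability vectors are arbitrary illustration values and must be strictly positive.

```python
import math


def f_divergence(p, q, f):
    """Discrete analogue of D_f(p || q) = sum_x f(p(x) / q(x)) * q(x)."""
    return sum(f(pi / qi) * qi for pi, qi in zip(p, q))


def kl_divergence(p, q):
    """D_KL(p || q) = sum_x p(x) * log(p(x) / q(x))."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def f_kl(t):
    """The convex generator f(t) = t log t, which satisfies f(1) = 0."""
    return t * math.log(t)


p = [0.5, 0.3, 0.2]
q = [0.25, 0.25, 0.5]

d_f = f_divergence(p, q, f_kl)
d_kl = kl_divergence(p, q)
```

Both values agree because $f(p/q)\,q = (p/q)\log(p/q)\,q = p \log(p/q)$ term by term.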
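  29. Density ratio estimation with adversarial learning: estimating the KL divergence with the GAN density-ratio trick

    • Express $p_x$ and $p_y$ as distributions conditioned on a label $c \in \{0, 1\}$
    • That is, $p_x(x) = p(x|c=1)$ and $p_y(x) = p(x|c=0)$
    • This reduces the problem to binary classification: the discriminator $S_\eta$ predicts the probability that its input was drawn from $p_x(x)$
    • Assuming equal class probabilities, the density ratio is $\frac{p_x(x)}{p_y(x)} = \frac{p(x|c=1)}{p(x|c=0)} = \frac{p(c=1|x)}{p(c=0|x)} \approx \frac{S_\eta(x)}{1 - S_\eta(x)}$
    • Hence, given $N$ i.i.d. samples from $p_x$: $D_{KL}(p_x \| p_y) = \int p_x(x) \log\!\left(\frac{p_x(x)}{p_y(x)}\right) dx \approx \frac{1}{N}\sum_{i=1}^{N} \log\!\left(\frac{S_\eta(x^{(i)})}{1 - S_\eta(x^{(i)})}\right)$
    • Source: [Tschannen+ 2018]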
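The estimator on this slide can be tried on a case where the answer is known. The sketch below assumes $p_x = \mathcal{N}(1, 1)$ and $p_y = \mathcal{N}(0, 1)$, for which the true KL divergence is 0.5, and replaces the trained discriminator $S_\eta$ with the analytically optimal one; in practice $S_\eta$ would be a classifier fitted to samples.

```python
import math
import random


def optimal_discriminator(x):
    """S(x) = p_x(x) / (p_x(x) + p_y(x)) for p_x = N(1, 1) and p_y = N(0, 1).

    Here log p_x(x) - log p_y(x) = x - 0.5, so S is a sigmoid of (x - 0.5).
    """
    return 1.0 / (1.0 + math.exp(-(x - 0.5)))


def kl_estimate(samples, discriminator):
    """(1/N) * sum_i log( S(x_i) / (1 - S(x_i)) ) with x_i drawn from p_x."""
    total = 0.0
    for x in samples:
        s = discriminator(x)
        total += math.log(s / (1.0 - s))
    return total / len(samples)


rng = random.Random(0)
samples = [rng.gauss(1.0, 1.0) for _ in range(20000)]  # i.i.d. samples from p_x
estimate = kl_estimate(samples, optimal_discriminator)
```

With a learned discriminator the quality of the estimate depends on how close $S_\eta$ is to this optimum, which is why the slide writes the density ratio with $\approx$.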
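  30. What is Pixyz?

    • A PyTorch-based library specialized in implementing and using complex deep generative models easily
    • Repository: https://github.com/masa-su/pixyz
    • Documentation: https://docs.pixyz.io
    • Developed by Suzuki-san of the Matsuo Lab, University of Tokyo
    • One of the organizers of the Reinforcement Learning Architecture Study Group
    • As a library for describing deep generative models, it is named with the joint distribution $P(x, y, z)$ of random variables $x, y, z$ in mind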
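  31. 1. Distribution API: the API for probability distributions

    • Networks are defined by inheriting from the Distribution class
    • Written in almost the same way as the distributions in torch.distributions
    • The factorization of a joint distribution can be written directly as a product of distributions
    • A distribution built as a product of distributions is itself a distribution, so sampling and likelihood computation work on it in the same way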
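  32. 2. Loss API: computes loss functions and lower bounds from Distribution classes

    • A loss is evaluated by calling its estimate method with data as the argument (define-and-run)
    • Many losses are already defined
    • Examples: negative log-likelihood (NLL), KL divergence (KullbackLeibler), ...
    • Losses support the four arithmetic operations, so complex deep generative models are easy to write
    • Example: the M2 model [Kingma+ 2014]: $-\sum_{x,y \sim p_{data}(x,y)}\left[\mathbb{E}_{q(z|x,y)}\left[\log\frac{p(x,z|y)}{q(z|x,y)}\right] + \alpha \log q(y|x)\right] - \sum_{x_u \sim p_{data}(x_u)} \mathbb{E}_{q(z|x_u,y)q(y|x_u)}\left[\log\frac{p(x_u,z|y)}{q(z|x_u,y)\,q(y|x_u)}\right]$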
  33. 1. Installation

    Prerequisite: PyTorch is installed
    • See https://pytorch.org/get-started/locally/
    • Usually this should be enough: pip install torch torchvision
    1) Clone the Pixyz GitHub repository: git clone https://github.com/masa-su/pixyz.git
    2) Install it with pip: pip install -e pixyz
    • Pixyz is planned to be registered on PyPI once the version stabilizes (git clone will then no longer be needed)
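  34. 2. Trying it out: the basic flow of an implementation in Pixyz

    1. Define the distributions
    • A product of distributions can also be treated as a distribution
    2. Define the objective and the model
    • There are three ways to write this: the Model API, the Loss API and the Distribution API
    • Losses can be combined with the four arithmetic operations
    3. Train
    • If you inherited from the Model class, model.train() is all you need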
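  35. Today's tutorial material: prepared in the spirit of "practice makes perfect"

    • https://github.com/TMats/rlarch-pixyz-tutorial
    • 00: The probability distributions handled in Pixyz
    • 01: Implementing a vanilla VAE [Kingma+ 2014] with the VAE class of the Model API
    • 02: Implementing a more complex deep generative model with the Loss API
    • The M2 model [Kingma+ 2014]
    • This is the first time the material is being used, so questions and comments are very welcome for future reference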
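  36. Pixyzoo

    • A repository of deep generative model implementations that use Pixyz
    • Modelled, of course, on things like the GAN Zoo
    • https://github.com/masa-su/pixyzoo
    • Currently contains GQN, VIB, FactorVAE and others
    • More will be added over time
    • Pull requests are very welcome
    • Please fork the pixyzoo repository and send a pull request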
  37. References

    [Kingma+ 2014] Diederik P. Kingma, Danilo J. Rezende, Shakir Mohamed, Max Welling. Semi-Supervised Learning with Deep Generative Models. https://arxiv.org/abs/1406.5298
    [Tschannen+ 2018] Michael Tschannen, Olivier Bachem, Mario Lucic (2018). Recent Advances in Autoencoder-Based Representation Learning. https://arxiv.org/abs/1812.05069
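  38. What is GQN? Neural scene representation and rendering [Eslami+ 2018]

    • S. M. Ali Eslami, Danilo J. Rezende, et al., Science (2018)
    • Incidentally, the article in Science itself is of no help for implementation; read the Supplemental material
    • Proposes the Generative Query Network (GQN), which generates an image from a new viewpoint given images from several viewpoints
    • It would be valuable if a state representation that does not depend on the viewpoint position could be acquired (the reason for our interest)
    • Uses a very large conditional VAE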
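  39. GQN problem setting

    Dataset
    • For each of $N$ scenes (environments), $K$ pairs of a coordinate and the RGB image seen from that coordinate: $\{(x_i^k, v_i^k)\}$ ($i \in \{1, \dots, N\}$, $k \in \{1, \dots, K\}$)
    • $x_i^k$: the $k$-th RGB image of the $i$-th scene
    • $v_i^k$: the $k$-th viewpoint of the $i$-th scene
    Problem setting
    • Given $M$ observations (the context) $x_i^{1,\dots,M}, v_i^{1,\dots,M}$ and an arbitrary viewpoint (the query) $v_i^q$, predict the corresponding RGB image $x_i^q$
    • From a finite number of two-dimensional (image) observations the answer cannot be predicted deterministically
    • It is therefore solved as a probabilistic model (deep generative model) conditioned on the context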
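  40. Prerequisite: Conditional VAE [Sohn+ 2015]

    • A VAE conditioned on arbitrary information $y$
    • Modelling the prior as $p(z|y)$ makes it possible to infer $z$ directly at test time
    • Variational lower bound (ELBO, the negative loss): $\log p(x|y) \geq \mathbb{E}_{q(z|x,y)}\left[\log\frac{p(x|z,y)\,p(z|y)}{q(z|x,y)}\right] = \mathbb{E}_{q(z|x,y)}[\log p(x|z,y)] - \mathrm{KL}[q(z|x,y)\,\|\,p(z|y)]$
    • The goal is to maximize the log-likelihood, which is done by maximizing the ELBO
    • There is also a version that uses a prior $p(z)$ that does not depend on $y$ [Kingma+ 2014]
    • The M2 model in the Pixyz tutorial follows this pattern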
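  41. Generative Query Network: the GQN model

    • A conditional VAE conditioned on the context $r = f(x_i^{1,\dots,M}, v_i^{1,\dots,M})$ and the query $v_i^q$
    • $f$ is a deterministic transformation (the representation network)
    • Interpreted as a graphical model: context $r$, query $v^q$, latent variable $z$, with prior $\pi(z|v^q, r)$, encoder $q(z|x^q, v^q, r)$ and decoder $g(x^q|z, v^q, r)$
    • Variational lower bound (ELBO, the negative loss): $\log p(x^q|v^q, r) \geq \mathbb{E}_{q(z|x^q,v^q,r)}\left[\log\frac{g(x^q|z,v^q,r)\,\pi(z|v^q,r)}{q(z|x^q,v^q,r)}\right] = \mathbb{E}_{q(z|x^q,v^q,r)}[\log g(x^q|z,v^q,r)] - \mathrm{KL}[q(z|x^q,v^q,r)\,\|\,\pi(z|v^q,r)]$
    • The goal is to maximize the log-likelihood, which is done by maximizing the ELBO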
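  42. Generative Query Network: variational lower bound (the negative loss)

    $\mathbb{E}_{q(z|x^q,v^q,r)}[\log g(x^q|z,v^q,r)] - \mathrm{KL}[q(z|x^q,v^q,r)\,\|\,\pi(z|v^q,r)]$ (first term: reconstruction, second term: KL)
    • Reconstruction term: the encoder infers the latent variable $z$ from the context $r$, the query $v^q$ and the corresponding image $x^q$, and training makes the image $x^q$ for the query reconstructable from $z$
    • → The latent variable $z$ should therefore learn some representation of the scene as a whole
    • KL term: trains the encoder and the prior to be close to each other
    • → At test time, using the prior should make it possible to infer the latent variable $z$ from the context $r$ and the query $v^q$ alone, without the true image $x^q$ for the query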
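  43. Architectural device: using DRAW [Gregor+ 2015]

    • The inference of the latent variable $z$ in a VAE is split into several steps with an RNN and performed autoregressively, which increases the expressive power of the model: $q(z|x) = \prod_{l=1}^{L} q_l(z_l|x, z_{<l})$
    • The ELBO for this case is derived in Taniguchi-san's slides (pp. 11-13): https://www.slideshare.net/DeepLearningJP2016/dlhackspytorch-pixyzgenerative-query-network-126329901
    • The result is the reconstruction likelihood plus the sum of the KL terms of each step
    • Used for both the prior and the encoder
    • Prior: $\pi_\theta(z|v^q, r) = \prod_{l=1}^{L} \pi_{\theta_l}(z_l|v^q, r, z_{<l})$
    • Encoder: $q_\phi(z|x^q, v^q, r) = \prod_{l=1}^{L} q_{\phi_l}(z_l|x^q, v^q, r, z_{<l})$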
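  44. Experimental results in the paper: the Room dataset

    • A random number (1 to 3) of assorted objects placed in a random square room
    • Wall textures: 5 kinds; floor textures: 3 kinds; object shapes: 7 kinds
    • Size, position and colour are random; the lighting is random too
    • 20,000 scenes were rendered
    • The dataset (and only the dataset) is publicly available
    • Several datasets exist besides Room: https://github.com/deepmind/gqn-datasets
    • The figure (right) shows qualitatively that images from new viewpoints are predicted well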
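  45. Implementation in Pixyz: available in Pixyzoo, https://github.com/masa-su/pixyzoo/tree/master/GQN

    • Implemented by Taniguchi-san, B4 at the Matsuo Lab, University of Tokyo
    • Hyperparameters were passed on directly by Eslami (1st author)
    • DeepMind basically does not release its implementations, so this is probably the most faithful implementation available
    • The paper reportedly uses four K80 (24GB) GPUs
    • In our own check it only just fits on four TitanX (12GB) GPUs
    • Reducing the number of parameters does not affect the results much
    • The authors (Eslami, Rezende) also introduced it: https://twitter.com/arkitus/status/1072845916850274304
    • Pixyz's DeepMind debut (?)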
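  46. Discussion

    What is actually acquired as the latent representation? (from the viewpoint of state representation learning)
    • Representations of objects, of the scene itself, and of the relation between viewpoints and observations
    • How can these be extracted and used?
    • What happens when it is transferred to the real world?
    • A project that trains GQN on images taken in the real world: https://github.com/brettgohre/still_life_rendering_gqn
    How can it be used for deciding the agent's actions (reinforcement learning)?
    • What applications are possible?
    The meta-learning context
    • How to define a task and what to condition on are becoming the key questions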
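  47. What is TD-VAE? Temporal Difference Variational Auto-Encoder [Gregor+ 2019]

    • Karol Gregor, George Papamakarios, et al., ICLR 2019 (Oral)
    • Proposes a deep generative model for sequences
    • Deep generative models for sequences have mostly performed inference step by step (the forward models of Part 1), whereas TD-VAE can jump to an arbitrary step and infer there
    • Could this be used to tackle the abstraction of time series? (the reason for our interest)
    • The keys are the introduction of a "belief state" computed by an RNN and the way the ELBO is decomposed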
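  48. Prerequisite: autoregressive models

    • A way of modelling sequence data $x = (x_1, \dots, x_T)$
    • The likelihood is factorized into a product of conditional distributions by the chain rule (shown here after taking the log of both sides): $\log p(x_1, \dots, x_T) = \sum_t \log p(x_t|x_1, \dots, x_{t-1})$
    • Can be implemented with an RNN: $h_t = f(h_{t-1}, x_t)$
    • Problems
    • Prediction happens only in observation space, so no compressed representation of the data is learned
    • Decoding and encoding at every step makes the computation expensive
    • During training the data of the next step is fed in (teacher forcing), but at test time the model's own predictions are fed back, which makes it unstable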
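The chain-rule factorization with an RNN-style hidden state can be written out for a toy binary sequence. The sketch assumes a scalar hidden state, a tanh update, and fixed weights `w_h`, `w_x`, `w_o` chosen only for illustration; it is not the parameterization of any model in the talk.

```python
import math
from itertools import product


def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))


def log_likelihood(xs, w_h=0.5, w_x=1.0, w_o=2.0):
    """log p(x_1..x_T) = sum_t log p(x_t | x_1..x_{t-1}) for a binary sequence."""
    h = 0.0       # h_0: nothing observed yet
    total = 0.0
    for x in xs:
        p_one = sigmoid(w_o * h)             # p(x_t = 1 | x_<t), computed from h_{t-1}
        total += math.log(p_one if x == 1 else 1.0 - p_one)
        h = math.tanh(w_h * h + w_x * x)     # h_t = f(h_{t-1}, x_t)
    return total


# Summing the probabilities of all length-3 sequences checks that the
# factorization defines a valid distribution.
total_prob = sum(math.exp(log_likelihood(xs)) for xs in product([0, 1], repeat=3))
```

Every conditional is a prediction in observation space, so the model never has to form a compressed latent state, which is the first problem listed on the slide.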
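  49. Prerequisite: state-space models

    • A way of modelling sequence data $x = (x_1, \dots, x_T)$ with latent variables (states) $z = (z_1, \dots, z_T)$
    • Joint distribution of $x$ and $z$ (state transition and decoder): $p(x, z) = \prod_t p(z_t|z_{t-1})\,p(x_t|z_t)$
    • Encoder: $q(z|x) = \prod_t q(z_t|z_{t-1}, \phi_t(x))$
    • If $\phi_t$ takes the sequence up to $t$, $(x_1, \dots, x_t)$, as input: filtering
    • If $\phi_t$ takes the whole sequence $x$ as input: smoothing
    • Variational lower bound (ELBO, the negative loss): $\log p(x) \geq \mathbb{E}_{z \sim q(z|x)}\left[\sum_t \log p(x_t|z_t) + \log p(z_t|z_{t-1}) - \log q(z_t|z_{t-1}, \phi_t(x))\right]$
    • Models the transitions between states
    • Decoding and encoding at every step is not needed at test time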
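  50. Introducing the filtering distribution: deriving the ELBO with the filtering distribution $p(z_t|x_1, \dots, x_t)$

    • Thanks to the filtering distribution, only the two latent variables $(z_{t-1}, z_t)$ are needed
    • $\log p(x) = \sum_t \log p(x_t|x_{<t}) = \sum_t \log \int p(x_t|z_t)\,p(z_t|x_{<t})\,dz_t$
    • By Jensen's inequality: $\geq \sum_t \mathbb{E}_{q(z_t, z_{t-1}|x_{\leq t})}\left[\log\frac{p(x_t|z_t)\,p(z_{t-1}|x_{<t})\,p(z_t|z_{t-1})}{q(z_t, z_{t-1}|x_{\leq t})}\right]$
    • $= \sum_t \mathbb{E}_{q(z_t|x_{\leq t})\,q(z_{t-1}|z_t, x_{\leq t})}\left[\log p(x_t|z_t) + \log p(z_{t-1}|x_{<t}) + \log p(z_t|z_{t-1}) - \log q(z_t|x_{\leq t}) - \log q(z_{t-1}|z_t, x_{\leq t})\right]$
    • The terms are, in order: decoder, filtering distribution, state transition, filtering distribution, encoder
    • The encoder $q(z_{t-1}|z_t, x_{\leq t})$ performs inference towards the past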
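  51. Implementing the filtering distribution: TD-VAE implements the filtering distribution with an RNN

    • With $b_t$ denoting the belief state, the belief state at each step is modelled as $b_t = f(b_{t-1}, x_t)$
    • The belief state $b_t$ can be regarded as containing the information of the past observation sequence $(x_1, \dots, x_t)$
    • The variational lower bound (ELBO, the negative loss) then becomes $\mathbb{E}_{p_B(z_t|b_t)\,q(z_{t-1}|z_t, b_{t-1}, b_t)}\left[\log p(x_t|z_t) + \log p_B(z_{t-1}|b_{t-1}) + \log p(z_t|z_{t-1}) - \log p_B(z_t|b_t) - \log q(z_{t-1}|z_t, b_{t-1}, b_t)\right]$
    • The terms are, in order: decoder, filtering distribution, state transition, filtering distribution, encoder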
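  52. Jumping over time steps: extending the discussion from one-step transitions to multi-step transitions

    • Only the notation changes; the variational lower bound (ELBO, the negative loss) is $\mathbb{E}_{p_B(z_{t_2}|b_{t_2})\,q(z_{t_1}|z_{t_2}, b_{t_1}, b_{t_2})}\left[\log p(x_{t_2}|z_{t_2}) + \log p_B(z_{t_1}|b_{t_1}) + \log p(z_{t_2}|z_{t_1}) - \log p_B(z_{t_2}|b_{t_2}) - \log q(z_{t_1}|z_{t_2}, b_{t_1}, b_{t_2})\right]$
    • During training, the number of steps to jump, $\delta = t_2 - t_1$, is sampled from the range $[1, D]$
    • $\delta$ is added to the input of the state transition: $p(z_{t_2}|z_{t_1}, \delta)$
    • At test time, prediction proceeds as $x_{t_1}$ → filtering distribution → $z_{t_1}$ → state transition → $z_{t_2}$ → decoder → $\hat{x}_{t_2}$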
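  53. Training TD-VAE: the TD-VAE variational lower bound (ELBO, the negative loss)

    $\mathbb{E}_{z_{t_2} \sim p_B(z_{t_2}|b_{t_2}),\ z_{t_1} \sim q(z_{t_1}|z_{t_2}, b_{t_1}, b_{t_2})}\left[\log p(x_{t_2}|z_{t_2}) + \log p_B(z_{t_1}|b_{t_1}) + \log p(z_{t_2}|z_{t_1}) - \log p_B(z_{t_2}|b_{t_2}) - \log q(z_{t_1}|z_{t_2}, b_{t_1}, b_{t_2})\right]$
    • (1) Sample $z_{t_2}$ from the filtering distribution $p_B(z_{t_2}|b_{t_2})$
    • (2) Using it, sample $z_{t_1}$ from the encoder $q(z_{t_1}|z_{t_2}, b_{t_1}, b_{t_2})$
    • The second and fifth terms together form the KL divergence between the encoder and the filtering distribution: $\mathrm{KL}[q(z_{t_1}|z_{t_2}, b_{t_1}, b_{t_2})\,\|\,p_B(z_{t_1}|b_{t_1})]$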
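  54. Implementation in Pixyz: available in Pixyzoo, https://github.com/masa-su/pixyzoo/tree/master/TD-VAE

    • Inherits from the Model class
    • Uses IterativeLoss, which is intended for autoregressive models
    • Requires Pixyz v0.0.5 or later
    • The loss around the encoder sometimes blows up, so hyperparameter tuning may be needed
    • GQN was hard in this respect too
    • (The slide shows the code for the distributions and the loss)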
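  55. Discussion

    What "TD" means
    • The part where the encoder infers towards the past is Temporal Difference-like
    • There is some description of this in Section 4.3
    Abstraction along the time axis
    • Abstracting time series is a major challenge for reinforcement learning
    • TD-VAE does not handle actions directly
    • If temporally abstracted action primitives could be built from torque-level control sequences and exploration done over those primitives, it would likely improve exploration efficiency and also feels natural
    TD-VAE seems to leave everything to the RNN
    • The question of the RNN's expressive power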
  56. References

    [Eslami+ 2018] S. M. Ali Eslami, Danilo Jimenez Rezende, Frédéric Besse, Fabio Viola, Ari S. Morcos, Marta Garnelo, Avraham Ruderman, Andrei A. Rusu, Ivo Danihelka, Karol Gregor, David P. Reichert, Lars Buesing, Theophane Weber, Oriol Vinyals, Dan Rosenbaum, Neil C. Rabinowitz, Helen King, Chloe Hillier, Matthew M. Botvinick, Daan Wierstra, Koray Kavukcuoglu, Demis Hassabis (2018). Neural scene representation and rendering. Science 360: 1204-1210. http://science.sciencemag.org/content/360/6394/1204
    [Gregor+ 2015] Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, Daan Wierstra. DRAW: A Recurrent Neural Network For Image Generation. https://arxiv.org/abs/1502.04623
    [Gregor+ 2016] Karol Gregor, Frederic Besse, Danilo Jimenez Rezende, Ivo Danihelka, Daan Wierstra. Towards Conceptual Compression. https://arxiv.org/abs/1604.08772
    [Gregor+ 2019] Karol Gregor, George Papamakarios, Frederic Besse, Lars Buesing, Theophane Weber. Temporal Difference Variational Auto-Encoder. https://openreview.net/forum?id=S1x4ghC9tQ
    [Kingma+ 2014] Diederik P. Kingma, Danilo J. Rezende, Shakir Mohamed, Max Welling. Semi-Supervised Learning with Deep Generative Models. https://arxiv.org/abs/1406.5298
    [Sohn+ 2015] Kihyuk Sohn, Honglak Lee, Xinchen Yan (2015). Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems (NIPS), pp. 3483–3491. https://papers.nips.cc/paper/5775-learning-structured-output-representation-using-deep-conditional-generative-models
    [Tschannen+ 2018] Michael Tschannen, Olivier Bachem, Mario Lucic (2018). Recent Advances in Autoencoder-Based Representation Learning. https://arxiv.org/abs/1812.05069
    [Wagstaff+ 2019] Edward Wagstaff, Fabian B. Fuchs, Martin Engelcke, Ingmar Posner, Michael Osborne. On the Limitations of Representing Functions on Sets. https://arxiv.org/abs/1901.09006