
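32nd Reinforcement Learning Architecture Study Group: Recent Research on State Representation Learning and World Models, and an Introduction to the Deep Generative Model Library Pixyz #rlarch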


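1) State representation learning and world models for reinforcement learning

In reinforcement learning problems the "state" tends to be treated as given, but using the agent's raw observations directly is not always the best choice. In a partially observable problem, for example, it helps if the agent stores and uses its past observations in some form. For efficient reinforcement learning it is therefore promising to design models that learn a useful "state" representation from the agent's past observations. Models that learn such state representations and state transitions, and thereby model the agent's environment, are called "world models" [1] or "internal models". In recent years many studies have used deep generative models for state representation learning in order to handle high-dimensional inputs such as images. This talk organizes and discusses these studies, based on a review paper posted to arXiv in 2018 [2].

2) Hands-on with the deep generative model library Pixyz

A hands-on session with Pixyz [3], a PyTorch-based library for concisely describing a wide variety of deep generative models (a laptop that can run PyTorch will be useful).

3) Recent world model research: GQN and TD-VAE

Two world-model-related models announced by DeepMind (UK) in 2018, the Generative Query Network (GQN) [4] and the Temporal Difference Variational Auto-Encoder (TD-VAE) [5], are explained with implementation examples in Pixyz. We would also like to discuss applications of these models and the outlook beyond them.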


Tatsuya Matsushima

February 05, 2019

Transcript

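  1. Recent Research on State Representation Learning and World Models, and an Introduction to the Deep Generative Model Library Pixyz
    Tatsuya Matsushima (@__tmats__), first-year master's student, Graduate School of Engineering, The University of Tokyo
  2. About me
    Tatsuya Matsushima (@__tmats__), M1, Matsuo Lab, Department of Technology Management for Innovation, Graduate School of Engineering, The University of Tokyo
    • I am interested in developing adaptive robots that can live alongside humans, and in understanding lifelikeness and human intelligence constructively by building such robots.
    • I recently wrote articles for Nikkei xTrend:
    • "Matsuo Lab's focus: Google and Facebook are researching the 'embodiment' of AI" https://trend.nikkeibp.co.jp/atcl/contents/technology/00007/00001/
    • "The 'state' representation that matters for robot control: learning a representation of the environment from data" https://trend.nikkeibp.co.jp/atcl/contents/technology/00007/00015/
  3. Agenda
    Part 1: 19:00-19:25 State representation learning and world models for reinforcement learning
    • Summarizes methods for learning state representations in reinforcement learning problems. Ideally the representation is a "world model" that models the environment in some form.
    Part 2: 19:30-20:00 Hands-on with the deep generative model library Pixyz
    • In recent years "world models" are often implemented with deep generative models. This part is a tutorial on Pixyz, a library for writing deep generative models concisely.
    Part 3: 20:05-20:35 Recent world model research: GQN and TD-VAE
    • Explains two world models announced by DeepMind (UK) in 2018, "GQN" and "TD-VAE", with implementation examples in Pixyz.
  4. Part 1: State representation learning and world models for reinforcement learning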

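  5. About this talk (the paper it is based on)
    State Representation Learning for Control: An Overview
    • https://arxiv.org/abs/1802.04181 (Last revised 5 Jun 2018)
    • Timothée Lesort, Natalia Díaz-Rodríguez, Jean-François Goudou, David Filliat
    • The authors also built a tool called S-RL Toolbox: https://github.com/araffin/robotics-rl-srl
    • A review paper on learning state representations for control tasks
    • A field being studied actively, centered on UC Berkeley
    • It does not seem to get much attention in Japan
    • This talk also adds papers that the review does not cover
  6. What is state representation learning?
    Representation learning
    • Learning that finds abstract features in data
    State representation learning (SRL)
    • A state representation is a learned feature that is low-dimensional, evolves over time, and is influenced by the agent's actions
    • Such representations are considered useful for robotics and control problems (cf. the curse of dimensionality)
    • Example: image data is very high-dimensional, but the objective of robot control can be expressed in far fewer dimensions
    • For manipulation, the 3-D position of the object
    • The main research theme is methods that find this state representation from raw observation data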
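  7. World models
    The importance of model building in intelligence [Lake+ 2016]
    • Humans cannot perceive everything. They appear to build an internal model of the world from the information (stimuli) they receive, and this model plays a major role in human intelligence
    • This internal model is also called a world model
    • [DL reading group] GQN and related work, and their relation to world models https://www.slideshare.net/DeepLearningJP2016/dlgqn-111725780
    • Intelligence is the ability to predict the future from memories so far
    • Jeff Hawkins, "On Intelligence"
    • We are thought to act while simulating the future with a learned internal model
  8. World models
    The importance of model building in intelligence [Lake+ 2016]
    • Lecture at MIT by Josh Tenenbaum
    • MIT AGI: Building machines that see, learn, and think like people (Josh Tenenbaum) https://www.youtube.com/watch?v=7ROelYvo8f0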
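  9. What is a good representation?
    A representation must ignore the irrelevant parts of the raw observations and encode the information that is indispensable for reinforcement learning
    Definition of a good state representation by [Böhmer+ 2015]:
    • It is Markovian
    • It summarizes enough information that a policy can choose an action by looking only at the current state
    • It can be used to improve the policy
    • The learned value function generalizes to unseen states that have similar features
    • It is low-dimensional
  10. A general formulation of SRL
    • SRL learns a state $s_t$ that approximates the true state $\tilde{s}_t \in \tilde{\mathcal{S}}$ without using the true state itself
    • It learns a mapping $s_t = \phi(o_{1:t})$ from the past observations $o_{1:t}$ to the current state $s_t$
    • Diagram labels: action $a_t \in \mathcal{A}$, observations $o_t, o_{t+1} \in \mathcal{O}$, true (unknown) states $\tilde{s}_t, \tilde{s}_{t+1}$, reward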
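  11. Approaches to SRL
    SRL approaches fall into a few patterns:
    • Using an auto-encoder
    • Using a forward model
    • Using an inverse model
    • Introducing prior knowledge (priors)
  12. Approaches to SRL: using an auto-encoder
    • The encoder $\phi$ and decoder $\phi^{-1}$ are learned by minimizing the reconstruction error
    • Constraints are imposed so that the state $s_t$ has certain properties
    • Examples: a dimensionality constraint, denoising, a sparsity constraint
    • Encoder: $s_t = \phi(o_t; \theta_\phi)$. Decoder: $\hat{o}_t = \phi^{-1}(s_t; \theta_{\phi^{-1}})$. The loss is the reconstruction error between $o_t$ and $\hat{o}_t$
  13. Approaches to SRL: using a forward model
    • The forward model $f$ predicts the next state $s_{t+1}$ from the state $s_t$ and the action $a_t$
    • Constraints such as linearity can be imposed on the forward model
    • The encoder $\phi$ is trained by backpropagating the prediction error of the next state
    • Encoder: $s_t = \phi(o_t; \theta_\phi)$. Forward model: $\hat{s}_{t+1} = f(s_t, a_t; \theta_{\mathrm{fwd}})$
  14. Approaches to SRL: using an inverse model
    • The inverse model estimates the action $a_t$ that was actually taken from the state $s_t$ and the next state $s_{t+1}$
    • The encoder $\phi$ is trained by backpropagating the prediction error of the action $a_t$ that was actually taken
    • Encoder: $s_t = \phi(o_t; \theta_\phi)$. Inverse model: $\hat{a}_t = g(s_t, s_{t+1}; \theta_{\mathrm{inv}})$
  15. Approaches to SRL: introducing prior knowledge (priors)
    • Uses prior knowledge about specific constraints or dynamics
    • Example: temporal continuity
    • A prior is defined through a loss applied to a set of states $s_{1:n}$ under a condition $c$: $\mathrm{Loss} = \mathcal{L}_{\mathrm{prior}}(s_{1:n}; \theta_\phi \mid c)$
    • This places constraints on the state space itself, with the encoder $s_t = \phi(o_t; \theta_\phi)$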
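  16. Why SRL?
    • Reinforcement learning end-to-end directly from raw observations is costly
    • SRL may let us inject good priors
    • It can be extended to multimodal observations
    • Solving related tasks in advance makes the representation usable for transfer learning
    • It makes it possible to adopt algorithms such as evolution strategies (ES), whose search speed depends directly on dimensionality
  17. Introduction and classification of existing research
  18. Classifying the research
    Ways to classify:
    • The learning objective
    • The design of the observation and action spaces
    • The evaluation metric for the state representation
    • The tasks used for evaluation
  19. Learning objectives
    • Reconstructing the observation
    • Learning a forward model
    • Learning an inverse model
    • Using adversarial learning of features
    • Using rewards
    • Other objectives
    • Hybrid objectives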
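  20. Learning objectives: reconstructing the observation
    • A method often used for dimensionality reduction
    • Examples: PCA [Curran+ 2015], DAE, VAE [van Hoof+ 2016]
    • Many methods use an auto-encoder
    • Using image observations as they are [Mattner+ 2012]
    • Constraining the representation to express object positions, e.g. Spatial Softmax [Finn+ 2015]
    • If the relevant features are not salient in the observation, simply reconstructing the observation does not give a good representation
    • Example: small items in a game
    • This is handled by reconstructing from a different time step or by constraining the temporal evolution
  21. Learning objectives: learning a forward model
    • Makes the state encode the information needed to predict the next state
    • Often combined with observation reconstruction
    • The transition in state space is often assumed to be linear: $\hat{s}_{t+1} = W s_t + U a_t + V$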
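The linear transition assumption above can be sketched in a few lines of plain Python. This is a minimal illustration rather than code from any of the cited papers: the matrices `W`, `U`, the offset `V`, and the toy state and action values are all made up for the example.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def linear_forward(s_t, a_t, W, U, V):
    """Predict the next state as s_hat_{t+1} = W s_t + U a_t + V."""
    return [ws + ua + v for ws, ua, v in zip(matvec(W, s_t), matvec(U, a_t), V)]


def prediction_error(s_hat_next, s_next):
    """Squared error between predicted and observed next state.

    In SRL this error would be backpropagated into the encoder.
    """
    return sum((p - q) ** 2 for p, q in zip(s_hat_next, s_next))


# Hypothetical 2-D state (position, velocity) and 1-D action (force).
W = [[1.0, 0.1], [0.0, 1.0]]
U = [[0.0], [0.5]]
V = [0.0, 0.0]
s_hat = linear_forward([1.0, 2.0], [1.0], W, U, V)
error = prediction_error(s_hat, [1.2, 2.4])
```

With a linear model the latent dynamics stay easy to analyze and to plan with, which is the motivation for imposing this constraint on the forward model.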
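  22. (Example) E2C [Watter+ 2015]
    Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images
    • A forward model built on a VAE. The transition of the state (latent representation) $s_t$ is assumed to be linear: $\hat{s}_{t+1} \sim \mathcal{N}(\mu = W s_t + U a_t + V, \sigma)$
    • The forward model is learned by reducing the KL divergence between the predicted next state $\hat{s}_{t+1}$ and the next state $s_{t+1}$
    • There is also a formulation as a Kalman filter (DVBF) [Karl+ 2016]
  23. (Example) World Models [Ha+ 2018]
    • A forward model that uses a VAE and an MDN-RNN
    • Vision model (V): compresses high-dimensional observations into a low-dimensional code (state) with a VAE
    • Memory RNN (M): predicts the code (state) of the next step from past codes
    • [DL reading group] World Models https://www.slideshare.net/DeepLearningJP2016/dlworld-models-95167842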
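  24. Learning objectives: learning an inverse model
    • Constrains the state representation so that the action taken can be estimated
    • Example: Learning to Poke by Poking [Agrawal+ 2016]
    • Estimates the poke location ($p_t$), angle ($\theta_t$), and length ($l_t$)
  25. (Example) ICM [Pathak+ 2017]
    Curiosity-driven Exploration by Self-supervised Prediction
    • Uses the prediction error of the forward model, $\mathcal{L}_{\mathrm{fwd}}$, as an intrinsic reward for reinforcement learning
    • This promotes exploration when the extrinsic reward the agent receives is sparse
    • A loss from the inverse model is also used
    • Forward loss: $\mathcal{L}_{\mathrm{fwd}}(\hat{\phi}(o_{t+1}), \hat{f}(\hat{\phi}(o_t), a_t)) = \frac{1}{2} \| \hat{f}(\hat{\phi}(o_t), a_t) - \hat{\phi}(o_{t+1}) \|_2^2$
    • Overall objective: $\min_{\theta_P, \theta_I, \theta_F} \left[ -\lambda \mathbb{E}_{\pi(s_t; \theta_P)} \left[ \sum_t r_t \right] + (1 - \beta) \mathcal{L}_{\mathrm{inv}} + \beta \mathcal{L}_{\mathrm{fwd}} \right]$ (extrinsic reward, inverse model, forward model)
    • [DL reading group] Large-Scale Study of Curiosity-Driven Learning https://www.slideshare.net/DeepLearningJP2016/dllargescale-study-of-curiositydriven-learning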
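As a rough sketch of how the ICM quantities fit together, the forward loss and the combined objective can be written as plain functions. The feature vectors and the values chosen for `lam` and `beta` below are placeholders picked for illustration, not settings taken from [Pathak+ 2017].

```python
def forward_loss(predicted_next_features, next_features):
    """L_fwd = 1/2 * || f_hat(phi(o_t), a_t) - phi(o_{t+1}) ||_2^2.

    The same quantity serves as the intrinsic (curiosity) reward.
    """
    return 0.5 * sum((p - q) ** 2
                     for p, q in zip(predicted_next_features, next_features))


def icm_objective(expected_return, inverse_loss, fwd_loss, lam, beta):
    """-lam * E[sum_t r_t] + (1 - beta) * L_inv + beta * L_fwd, to be minimized."""
    return -lam * expected_return + (1.0 - beta) * inverse_loss + beta * fwd_loss


# Toy numbers: a poorly predicted transition yields a large curiosity reward.
l_fwd = forward_loss([1.0, 2.0], [0.0, 0.0])
total = icm_objective(expected_return=10.0, inverse_loss=1.0,
                      fwd_loss=l_fwd, lam=0.1, beta=0.2)
```

The weight `beta` trades off the inverse-model loss against the forward-model loss, while `lam` sets how much the policy return matters relative to learning the features.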
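  26. Learning objectives: adversarial learning of features
    • Example: Causal InfoGAN [Kurutach+ 2018]
    • Adds to the GAN objective a regularization term on the mutual information between the state and the generator output (a pair of observations)
    • $\min_{G, Q, \mathcal{M}} \max_{D} V(G, D) - \lambda I_{\mathrm{VLB}}(G, Q)$, where $I_{\mathrm{VLB}}$ is the mutual information term
  27. Learning objectives: using rewards
    • Rewards are not strictly necessary in SRL, but they can be used as additional information for discriminating between states
    • Example: VPN [Oh+ 2017]
    • It also predicts the next state and the value of that state
    • Diagram labels: observation, state, action (here $o$ denotes an option), next state, next state value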
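  28. Learning objectives: other objectives
    • The objective is designed so that prior knowledge about the real world is reflected in the state space
    • Various priors have been proposed
    • Slowness prior [Lesort+ 2017, Jonschkowski+ 2017]
    • Important things move slowly and continuously, and abrupt changes are unlikely: $\mathcal{L}_{\mathrm{Slowness}}(D, \phi) = \mathbb{E}\left[ \| \Delta s_t \|^2 \right]$
    • Variability [Jonschkowski+ 2017]
    • Relevant things move, so state representation learning should focus on what is moving: $\mathcal{L}_{\mathrm{Variability}}(D, \phi) = \mathbb{E}\left[ e^{-\| s_{t_1} - s_{t_2} \|} \right]$
  29. Learning objectives: other objectives
    • Priors introduced in Robotic Priors [Jonschkowski+ 2015]
    • Proportionality
    • Even in different states, the same action affects the state by a similar amount: $\mathcal{L}_{\mathrm{Prop}}(D, \phi) = \mathbb{E}\left[ \left( \| \Delta s_{t_2} \| - \| \Delta s_{t_1} \| \right)^2 \mid a_{t_1} = a_{t_2} \right]$
    • Repeatability
    • In similar states, the same action affects the state by a similar amount and in the same direction: $\mathcal{L}_{\mathrm{Rep}}(D, \phi) = \mathbb{E}\left[ e^{-\| s_{t_2} - s_{t_1} \|^2} \| \Delta s_{t_2} - \Delta s_{t_1} \|^2 \mid a_{t_1} = a_{t_2} \right]$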
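The four prior losses above can be evaluated on a small batch of transitions as follows. This is a sketch under simplifying assumptions: states are plain lists of floats, each transition is a hypothetical `(s_t, a_t, s_next)` tuple, and expectations are replaced by averages over all transitions or over all pairs with equal actions.

```python
import math
from itertools import combinations


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def diff(u, v):
    return [a - b for a, b in zip(u, v)]


def mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def delta(transition):
    """State change caused by one transition (s_t, a_t, s_next)."""
    s, _, s_next = transition
    return diff(s_next, s)


def slowness(transitions):
    return mean(norm(delta(tr)) ** 2 for tr in transitions)


def variability(transitions):
    return mean(math.exp(-norm(diff(a[0], b[0])))
                for a, b in combinations(transitions, 2))


def proportionality(transitions):
    return mean((norm(delta(b)) - norm(delta(a))) ** 2
                for a, b in combinations(transitions, 2) if a[1] == b[1])


def repeatability(transitions):
    return mean(math.exp(-norm(diff(b[0], a[0])) ** 2)
                * norm(diff(delta(b), delta(a))) ** 2
                for a, b in combinations(transitions, 2) if a[1] == b[1])


# Toy 1-D data: action 0 moves the state by +1, action 1 moves it by -1.
transitions = [([0.0], 0, [1.0]), ([2.0], 0, [3.0]), ([0.0], 1, [-1.0])]
losses = (slowness(transitions), variability(transitions),
          proportionality(transitions), repeatability(transitions))
```

In this toy data the same action always produces the same displacement, so the proportionality and repeatability losses are zero, which is exactly the situation these priors reward.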
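  30. Learning objectives: hybrid objectives
    • In practice SRL is often done by combining several of the objectives listed so far
    • Table columns: constraints on actions / next states; forward model (predicts the next state); inverse model; observation reconstruction; next-observation prediction; use of rewards
    E2C [Watter+ 2015] ✔ ✔ ✔ ✔
    World Model [Ha+ 2018] ✔ ✔ ✔
    ICM [Pathak+ 2017] ✔ ✔ ✔
    Causal InfoGAN [Kurutach+ 2018] ✔ ✔ ✔ ✔
    VPN [Oh+ 2017] ✔ ✔
    Robotic Priors [Jonschkowski+ 2015] ✔ ✔
  31. Designing the observation, state, and action spaces
    • The design of the observation, state, and action spaces affects the complexity of the problem
    • How many dimensions each space has, and whether actions are discrete or continuous
    • The state space is usually designed with more dimensions than the true state
    • For many tasks, such as Atari, it is unclear how many dimensions the state should have
    Method | Environment | Observation type | Observation dimensions | State dimensions | Actions
    Robotic Priors [Jonschkowski+ 2015] | slot car racing | image | 16×16×3 | 2 | discrete (25)
    E2C [Watter+ 2015] | cart-pole | image | 80×80×3 | 8 | discrete
    ICM [Pathak+ 2017] | Mario Bros. | image | 42×42×3 | 2 | discrete (14)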
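  32. Evaluation metrics for state representations
    How do we evaluate the quality of a state representation?
    • Have the agent actually solve reinforcement learning tasks, and check whether the representation generalizes well enough to transfer between tasks
    • This is the most common method, but the experiments are costly
    • It is unclear which reinforcement learning algorithm to use for the evaluation
    • So we want an intermediate way to judge whether a learned state representation is good
    • Use nearest neighbors
    • Qualitative evaluation
    • Quantitative evaluation (KNN-MSE [Lesort+ 2017]): $\mathrm{KNN\text{-}MSE}(s) = \frac{1}{k} \sum_{s' \in \mathrm{KNN}(s, k)} \| \tilde{s} - \tilde{s}' \|^2$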
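A brute-force version of the KNN-MSE metric is easy to write down: neighbors are searched in the learned state space, and the error is measured between the corresponding ground-truth states. The function below is an illustrative sketch, not the S-RL Toolbox implementation, and the two toy representations are invented for the example.

```python
def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def knn_mse(learned_states, true_states, k):
    """Average KNN-MSE over all samples.

    For each learned state s, find its k nearest neighbors among the other
    learned states, then average the squared distance between the matching
    ground-truth states.
    """
    scores = []
    for i, s in enumerate(learned_states):
        others = [j for j in range(len(learned_states)) if j != i]
        nearest = sorted(others,
                         key=lambda j: squared_distance(s, learned_states[j]))[:k]
        scores.append(sum(squared_distance(true_states[i], true_states[j])
                          for j in nearest) / k)
    return sum(scores) / len(scores)


true_states = [[0.0], [1.0], [2.0], [3.0]]
good_representation = [[0.0], [1.0], [2.0], [3.0]]       # preserves neighborhoods
scrambled_representation = [[0.0], [2.0], [1.0], [3.0]]  # mixes neighborhoods up
good_score = knn_mse(good_representation, true_states, k=1)
bad_score = knn_mse(scrambled_representation, true_states, k=1)
```

A lower score means that states which are close in the learned space are also close in the true state space, so the scrambled representation scores worse here.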
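  33. Evaluation metrics for state representations
    How do we evaluate the quality of a state representation?
    • Check whether the representation is disentangled
    • Disentanglement metric score [Higgins+ 2016]
    • Assumes that the generative factors behind the data are known
    • Uses the accuracy of a low-capacity classifier with a small VC dimension
    • Build a regression model onto the true state [Jonschkowski+ 2015]
    • Evaluate its accuracy on a test set
  34. Evaluation metrics for state representations
    How do we evaluate the quality of a state representation?
  35. Tasks used for evaluation
    Standard tasks in SRL
    • Pendulum and inverted pendulum
    • Swing up a pendulum that starts from a random position
    • Cart-Pole
    • Balance an inverted pendulum mounted on a cart
    • The episode ends when the pole tilts 15° from vertical or the cart moves 2.4 units from the center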
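The Cart-Pole termination rule can be stated as a one-line predicate. The function name and argument names are hypothetical, and whether the limits are inclusive is an assumption made for this sketch.

```python
def cartpole_done(pole_angle_deg, cart_position,
                  max_angle_deg=15.0, max_offset=2.4):
    """Return True when the episode should end.

    The episode ends when the pole leans more than max_angle_deg from
    vertical, or the cart is more than max_offset units from the center.
    """
    return abs(pole_angle_deg) > max_angle_deg or abs(cart_position) > max_offset


still_running = cartpole_done(pole_angle_deg=3.0, cart_position=0.5)
fell_over = cartpole_done(pole_angle_deg=-16.0, cart_position=0.5)
ran_off = cartpole_done(pole_angle_deg=0.0, cart_position=2.5)
```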
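  36. Tasks used for evaluation
    Standard tasks in SRL
    • Video games
    • Examples: Atari, Doom, Super Mario Bros.
    • Physics simulators
    • Examples: OpenAI Gym, DeepMind Lab
    • Real robots
    • Examples: manipulation [Finn+ 2015], button pressing [Lesort+ 2017], grasping [Finn+ 2015]
  37. S-RL Toolbox
    A tool that addresses many of the issues in evaluating SRL algorithms [Raffin+ 2018]
    • https://github.com/araffin/robotics-rl-srl
    • A wide range of features:
    • 10 reinforcement learning algorithms
    • Evaluation environments with an OpenAI Gym style interface
    • Logger and visualization tools
    • Hyperparameter search tools
    • Datasets collected on a real Baxter robot
    • A collection of SRL implementations is included as SRL-Zoo
    • https://github.com/araffin/srl-zoo
    • It is written in PyTorch, which is nice
  38. Closing Part 1
  39. Impressions
    • Is there research that evaluates how much uncertainty a state representation carries?
    • For example, the uncertainty of the state representation should differ between seeing only the first frame and seeing 20 consecutive frames
    • If a policy could reflect that uncertainty, would it also lead to more efficient exploration?
    • A MAML-like approach may be effective: run SRL while solving many tasks, learn good SRL parameters, then adapt to a new task few-shot
    • Wasn't the original motivation for SRL to learn a representation that can be shared across many tasks?
    • (Admittedly, experiments are costly, so it is understandable that papers avoid reinforcement learning over many domains...)
  40. Discussion
    On learning the world model and learning the policy
    • How should the policy be learned when the world model is imperfect?
    • One option is to ensemble models
    On the problem setting of representation learning itself
    • In the end this comes down to the representation learning problem: assuming that some good representation exists even when the true downstream task is unknown
    • This corresponds to the notion of a meta-prior [Bengio+ 2013]
    • Behind this difficulty there seems to be an assumption built into the problem setting itself, namely that tasks arrive as discrete units
  41. Appendix to Part 1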

  42. References
    [Agrawal+ 2016] Pulkit Agrawal, Ashvin Nair, Pieter Abbeel, Jitendra Malik, Sergey Levine (2016). Learning to Poke by Poking: Experiential Learning of Intuitive Physics. https://arxiv.org/abs/1606.07419
    [Bengio+ 2013] Y. Bengio, A. Courville, and P. Vincent (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798–1828. https://ieeexplore.ieee.org/document/6472238
    [Böhmer+ 2015] Böhmer, W., Springenberg, J. T., Boedecker, J., Riedmiller, M., and Obermayer, K. (2015). Autonomous learning of state representations for control: An emerging field aims to autonomously learn state representations for reinforcement learning agents from their real-world sensor observations. KI - Künstliche Intelligenz, pages 1–10. http://www.ni.tu-berlin.de/fileadmin/fg215/articles/boehmer15b.pdf
    [Curran+ 2015] William Curran, Tim Brys, Matthew Taylor, William Smart (2015). Using PCA to Efficiently Represent State Spaces. https://arxiv.org/abs/1505.00322
    [Finn+ 2015] Chelsea Finn, Xin Yu Tan, Yan Duan, Trevor Darrell, Sergey Levine, Pieter Abbeel (2015). Deep Spatial Autoencoders for Visuomotor Learning. https://arxiv.org/abs/1509.06113
    [Ha+ 2018] David Ha, Jürgen Schmidhuber (2018). World Models. https://arxiv.org/abs/1803.10122
    [Higgins+ 2016] Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, Alexander Lerchner (2016). beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework. https://openreview.net/forum?id=Sy2fzU9gl
    [Jonschkowski+ 2015] Jonschkowski, R. and Brock, O. (2015). Learning state representations with robotic priors. Auton. Robots, 39(3): 407–428. http://www.robotics.tu-berlin.de/fileadmin/fg170/Publikationen_pdf/Jonschkowski-15-AURO.pdf
    [Jonschkowski+ 2017] Rico Jonschkowski, Roland Hafner, Jonathan Scholz, Martin Riedmiller (2017). PVEs: Position-Velocity Encoders for Unsupervised Learning of Structured State Representations. https://arxiv.org/abs/1705.09805
    [Karl+ 2016] Maximilian Karl, Maximilian Soelch, Justin Bayer, Patrick van der Smagt. Deep Variational Bayes Filters: Unsupervised Learning of State Space Models from Raw Data. https://arxiv.org/abs/1605.06432
  43. References
    [Kurutach+ 2018] Thanard Kurutach, Aviv Tamar, Ge Yang, Stuart Russell, Pieter Abbeel (2018). Learning Plannable Representations with Causal InfoGAN. https://arxiv.org/abs/1807.09341
    [Lake+ 2016] Brenden M. Lake, Tomer D. Ullman, Joshua B. Tenenbaum, Samuel J. Gershman (2016). Building Machines That Learn and Think Like People. https://arxiv.org/abs/1604.00289
    [Oh+ 2017] Junhyuk Oh, Satinder Singh, Honglak Lee (2017). Value Prediction Network. https://arxiv.org/abs/1707.03497
    [Pathak+ 2017] Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, Trevor Darrell (2017). Curiosity-driven Exploration by Self-supervised Prediction. https://arxiv.org/abs/1705.05363
    [Raffin+ 2018] Antonin Raffin, Ashley Hill, René Traoré, Timothée Lesort, Natalia Díaz-Rodríguez, David Filliat (2018). S-RL Toolbox: Environments, Datasets and Evaluation Metrics for State Representation Learning. https://arxiv.org/abs/1809.09369
    [Lesort+ 2017] Timothée Lesort, Mathieu Seurin, Xinrui Li, Natalia Díaz Rodríguez, David Filliat (2017). Unsupervised state representation learning with robotic priors: a robustness benchmark. https://arxiv.org/abs/1709.05185
    [Mattner+ 2012] Mattner, J., Lange, S., and Riedmiller, M. A. (2012). Learn to swing up and balance a real pole based on raw visual input data. In Neural Information Processing - 19th International Conference, ICONIP 2012, Doha, Qatar, November 12-15, 2012, Proceedings, Part V, pages 126–133. https://ieeexplore.ieee.org/document/7759578
    [van Hoof+ 2016] van Hoof, H., Chen, N., Karl, M., van der Smagt, P., and Peters, J. (2016). Stable reinforcement learning with autoencoders for tactile and visual data. In 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 3928–3934. https://ieeexplore.ieee.org/document/7759578/
    [Watter+ 2015] Manuel Watter, Jost Tobias Springenberg, Joschka Boedecker, Martin Riedmiller (2015). Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images. https://arxiv.org/abs/1506.07365
  44. Q&A, discussion, and break

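  45. Part 2: Hands-on with the deep generative model library Pixyz
  46. Deep generative models
  47. Generative models
    • An approach that models the distribution of the data
    • Sampling from the model generates artificial data points
  48. Generative models in deep learning
    Deep generative model (DGM)
    • Uses neural networks for the distributions
    • VAEs and GANs are the best known
    • VAEs in particular are often used to learn a low-dimensional representation (state representation) of the sequence of observations so far (see Part 1)
    • VAE and GAN figures, source: [Tschannen+ 2018]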
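  49. VAE
    Variational Autoencoder (VAE) [Kingma+ 2014]
    • To learn a latent variable model, the aim is to maximize the log-likelihood of the training data
    • VAE loss (reconstruction term plus KL term): $\mathcal{L}_{\mathrm{VAE}}(\theta, \phi) = \mathbb{E}_{\hat{p}(x)}\left[ \mathbb{E}_{q_\phi(z|x)}\left[ -\log p_\theta(x|z) \right] \right] + \mathbb{E}_{\hat{p}(x)}\left[ D_{\mathrm{KL}}(q_\phi(z|x) \| p(z)) \right]$
    • It satisfies $\mathbb{E}_{\hat{p}(x)}\left[ -\log p_\theta(x) \right] = \mathcal{L}_{\mathrm{VAE}}(\theta, \phi) - \mathbb{E}_{\hat{p}(x)}\left[ D_{\mathrm{KL}}(q_\phi(z|x) \| p_\theta(z|x)) \right]$
    • Because the KL divergence is non-negative, $-\mathcal{L}_{\mathrm{VAE}}$ is a lower bound (the ELBO) on the log-likelihood $\mathbb{E}_{\hat{p}(x)}\left[ \log p_\theta(x) \right]$
    • So it suffices to maximize the ELBO, that is, to minimize the VAE loss $\mathcal{L}_{\mathrm{VAE}}$
    • Note: the expectation over the empirical data distribution $\hat{p}(x)$ is written explicitly, which looks slightly unfamiliar, but this is the ordinary VAE ELBO
    • Source: [Tschannen+ 2018]
  50. VAE
    The VAE loss: $\mathcal{L}_{\mathrm{VAE}}(\theta, \phi) = \mathbb{E}_{\hat{p}(x)}\left[ \mathbb{E}_{q_\phi(z|x)}\left[ -\log p_\theta(x|z) \right] \right] + \mathbb{E}_{\hat{p}(x)}\left[ D_{\mathrm{KL}}(q_\phi(z|x) \| p(z)) \right]$
    • The first term (reconstruction) uses samples $z^{(i)} \sim q_\phi(z|x^{(i)})$, and gradients are backpropagated with the reparametrization trick
    • The second term (KL) is either computed in closed form or estimated from samples
    • With the encoder $q_\phi(z|x) = \mathcal{N}(\mu_\phi(x), \mathrm{diag}(\sigma_\phi(x)))$ and the prior $p(z) = \mathcal{N}(0, I)$ it can be computed in closed form
    • Otherwise the distance between the distributions has to be estimated from samples, e.g. with the density ratio trick used in GANs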
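For the Gaussian case mentioned above, both ingredients of the VAE loss are short enough to write out by hand. The sketch below assumes that `sigma` holds per-dimension standard deviations (the slide leaves open whether $\sigma_\phi$ denotes a standard deviation or a variance), and the function names are made up for this example.

```python
import math
import random


def kl_to_standard_normal(mu, sigma):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ).

    Equals 0.5 * sum_d (sigma_d^2 + mu_d^2 - 1 - log sigma_d^2).
    """
    return 0.5 * sum(s * s + m * m - 1.0 - math.log(s * s)
                     for m, s in zip(mu, sigma))


def reparameterized_sample(mu, sigma, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I).

    The randomness lives in eps, so z is a differentiable function of
    mu and sigma; this is the reparametrization trick.
    """
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]


kl_zero = kl_to_standard_normal([0.0, 0.0], [1.0, 1.0])
kl_shifted = kl_to_standard_normal([1.0], [1.0])
z = reparameterized_sample([0.0, 1.0], [1.0, 0.5], random.Random(0))
```

When the encoder already equals the prior the KL term vanishes, and shifting the mean by one unit costs exactly 0.5 nats per dimension.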
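  51. Density ratio estimation by adversarial learning
    f-divergence
    • Let $f$ be a convex function with $f(1) = 0$. The f-divergence between $p_x$ and $p_y$ is defined as $D_f(p_x \| p_y) = \int f\left( \frac{p_x(x)}{p_y(x)} \right) p_y(x) \, dx$
    • With $f(t) = t \log t$ it becomes the KL divergence: $D_f(p_x \| p_y) = D_{\mathrm{KL}}(p_x \| p_y)$
    • Given samples from $p_x$ and $p_y$, the f-divergence can be estimated with the density-ratio trick
    • The trick became widely known through GANs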
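The claim that $f(t) = t \log t$ recovers the KL divergence can be checked numerically on discrete distributions, where the integral becomes a sum. The two probability vectors below are arbitrary examples, and the code assumes every entry of `q` is strictly positive.

```python
import math


def f_divergence(p, q, f):
    """Discrete f-divergence: sum_x f(p(x) / q(x)) * q(x)."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))


def kl_divergence(p, q):
    """Discrete KL divergence: sum_x p(x) * log(p(x) / q(x))."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def f_kl(t):
    """Generator f(t) = t log t, with the usual convention f(0) = 0."""
    return t * math.log(t) if t > 0 else 0.0


p = [0.5, 0.5]
q = [0.9, 0.1]
via_f = f_divergence(p, q, f_kl)
direct = kl_divergence(p, q)
```

Swapping in a different convex `f` with `f(1) = 0` gives other members of the family without changing `f_divergence` itself.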
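  52. Density ratio estimation by adversarial learning
    Estimating the KL divergence with the GAN density-ratio trick
    • Express $p_x$ and $p_y$ as distributions conditioned on a label $c \in \{0, 1\}$
    • That is, $p_x(x) = p(x | c = 1)$ and $p_y(x) = p(x | c = 0)$
    • This reduces the problem to binary classification: the discriminator $S_\eta$ predicts the probability that its input was drawn from $p_x(x)$
    • Assuming equal class probabilities, the density ratio is $\frac{p_x(x)}{p_y(x)} = \frac{p(x | c = 1)}{p(x | c = 0)} = \frac{p(c = 1 | x)}{p(c = 0 | x)} \approx \frac{S_\eta(x)}{1 - S_\eta(x)}$
    • Hence, given $N$ i.i.d. samples from $p_x$: $D_{\mathrm{KL}}(p_x \| p_y) = \int p_x(x) \log\left( \frac{p_x(x)}{p_y(x)} \right) dx \approx \frac{1}{N} \sum_{i=1}^{N} \log\left( \frac{S_\eta(x^{(i)})}{1 - S_\eta(x^{(i)})} \right)$
    • Source: [Tschannen+ 2018]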
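The estimator above can be tried out without training anything by plugging in the optimal discriminator for two known distributions. In the sketch below $p_x = \mathcal{N}(1, 1)$ and $p_y = \mathcal{N}(0, 1)$ are arbitrary choices whose true KL divergence is 0.5, and `oracle_discriminator` is an idealized stand-in for a trained $S_\eta$, so this demonstrates only the Monte Carlo step and not the adversarial training.

```python
import math
import random


def normal_pdf(x, mean):
    """Density of a unit-variance Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def oracle_discriminator(x):
    """Optimal S(x) = p(c=1 | x) under equal class priors."""
    px, py = normal_pdf(x, 1.0), normal_pdf(x, 0.0)
    return px / (px + py)


def kl_estimate(samples, discriminator):
    """(1/N) * sum_i log( S(x_i) / (1 - S(x_i)) ) with x_i drawn from p_x."""
    return sum(math.log(discriminator(x) / (1.0 - discriminator(x)))
               for x in samples) / len(samples)


rng = random.Random(0)
samples = [rng.gauss(1.0, 1.0) for _ in range(20000)]  # draws from p_x
estimate = kl_estimate(samples, oracle_discriminator)
```

With a learned discriminator the estimate inherits the classifier's errors, which is why the ratio is written with an approximation sign on the slide.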
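  53. The deep generative model library Pixyz
  54. What is Pixyz?
    • A PyTorch-based library specialized for easily implementing and using complex deep generative models
    • Repository: https://github.com/masa-su/pixyz
    • Documentation: https://docs.pixyz.io
    • Developed by Suzuki of the Matsuo Lab, The University of Tokyo
    • He is one of the organizers of the Reinforcement Learning Architecture Study Group
    • As a library for describing deep generative models, it is named with the joint distribution $P(x, y, z)$ of the random variables $x, y, z$ in mind
  55. The three APIs of Pixyz
    A layered structure built from three kinds of API
    • The APIs do not interfere with one another, so networks, distributions, and objectives can be composed and changed freely
    • In existing probabilistic modeling languages, the probability distributions and the networks had to be described together
    • Example: Edward
  56. 1. Distribution API
    The API for probability distributions
    • Networks are defined by inheriting from a Distribution class
    • The style is almost the same as for the distributions in torch.distributions
    • The factorization of a joint distribution can be written directly as a product of distributions
    • A distribution built as a product of distributions can likewise be sampled from and have its likelihood computed
  57. 2. Loss API
    Computes loss functions and lower bounds from Distribution classes
    • The value is evaluated by calling the estimate method with the data as arguments (define-and-run)
    • Many losses are already defined
    • Examples: negative log-likelihood (NLL), KL divergence (KullbackLeibler), ...
    • Losses support arithmetic operations, so complex deep generative models are easy to describe
    • Example: the M2 model [Kingma+ 2014]: $-\sum_{x, y \sim p_{\mathrm{data}}(x, y)} \left[ \mathbb{E}_{q(z|x,y)}\left[ \log \frac{p(x, z|y)}{q(z|x, y)} \right] + \alpha \log q(y|x) \right] - \sum_{x_u \sim p_{\mathrm{data}}(x_u)} \mathbb{E}_{q(z|x_u, y) q(y|x_u)}\left[ \log \frac{p(x_u, z|y)}{q(z|x_u, y) q(y|x_u)} \right]$
  58. 3. Model API
    A model is defined by passing a Loss and an optimizer to a Model class
    • Train with the train method and evaluate with the test method
    Ready-made models are also provided
    • For people who want a quick implementation with a simple model
    • Examples: VAE, GAN, variational inference (VI), maximum likelihood (ML) https://docs.pixyz.io/en/latest/models.html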
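  59. Pixyz hands-on
  60. 1. Installation
    Prerequisite: PyTorch is already installed
    • See https://pytorch.org/get-started/locally/
    • Normally this should be enough:
        pip install torch torchvision
    1) Clone the Pixyz GitHub repository
        git clone https://github.com/masa-su/pixyz.git
    2) pip install
        pip install -e pixyz
    • Once the version stabilizes, Pixyz is reportedly going to be registered on PyPI (which removes the need for git clone)
  61. 2. Trying it out
    The basic flow of an implementation in Pixyz
    1. Define the distributions
    • A product of distributions can also be treated as a distribution!
    2. Define the objective and the model
    • There are three ways to write it: Model API, Loss API, and Distribution API
    • Losses can be combined with arithmetic operations!
    3. Train
    • If you inherited from the Model class, model.train() is all you need!
  62. Today's tutorial material
    Prepared in the spirit of "practice makes perfect"
    • https://github.com/TMats/rlarch-pixyz-tutorial
    • 00: The probability distributions handled in Pixyz
    • 01: Implementing a vanilla VAE [Kingma+ 2014] with the VAE class of the Model API
    • 02: Implementing a more complex deep generative model with the Loss API
    • The M2 model [Kingma+ 2014]
    • This is the first time the material is being used, so questions and comments are very welcome for future reference
  63. What is nice about Pixyz
    It takes the best of define-by-run and define-and-run
    • It answers the wish: "I want to evaluate networks flexibly as in PyTorch, but I want to fix the loss up front and check against the formula that what I wrote is correct"
    • This is possible because the layer for writing networks is separated from the layer for writing probability distributions
    • As a result, the loss formula in a paper can be implemented almost by transcribing it
    • In experiments, network problems and loss problems can be investigated separately
    With the Loss API, different deep generative models can be mixed within one framework
    • Example: the sum of a GAN loss and a VAE loss can be taken
  64. Pixyzoo
    • A repository of deep generative model implementations that use Pixyz
    • Naturally modeled after things like the GAN Zoo
    • https://github.com/masa-su/pixyzoo
    • It currently contains GQN, VIB, FactorVAE, and others
    • More are to be added over time
    • Pull requests are very welcome
    • Fork the pixyzoo repository and send a pull request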
  65. Appendix to Part 2
  66. References
    [Kingma+ 2014] Diederik P. Kingma, Danilo J. Rezende, Shakir Mohamed, Max Welling. Semi-Supervised Learning with Deep Generative Models. https://arxiv.org/abs/1406.5298
    [Tschannen+ 2018] Michael Tschannen, Olivier Bachem, Mario Lucic (2018). Recent Advances in Autoencoder-Based Representation Learning. https://arxiv.org/abs/1812.05069
  67. Q&A, discussion, and break
  68. Part 3: Recent world model research: GQN and TD-VAE
  69. Generative Query Network (GQN)

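  70. What is GQN?
    Neural scene representation and rendering [Eslami+ 2018]
    • S. M. Ali Eslami, Danilo J. Rezende, et al. Science, 2018
    • Incidentally, the article in Science itself is of no help for implementation. Read the Supplemental
    • Proposes the Generative Query Network (GQN), which generates an image from a new viewpoint based on images from several viewpoints
    • https://www.youtube.com/watch?time_continue=…&v=RBJFngN…Qo
    • It would be valuable if a state representation that does not depend on the viewpoint position could be acquired (the reason for our interest)
    • Uses a very large conditional VAE
  71. What is GQN?
    Neural scene representation and rendering
    https://deepmind.com/blog/neural-scene-representation-and-rendering/#gif-207
  72. Detailed material on GQN
    [DL reading group] GQN and related work, and their relation to world models
    • Suzuki, Matsuo Lab, The University of Tokyo
    • https://www.slideshare.net/DeepLearningJP2016/dlgqn-111725780
    Commentary on "Neural scene representation and rendering"
    • Kaneko (master's student), Aizawa Lab, The University of Tokyo
    • https://www.slideshare.net/MasayaKaneko/neural-scene-representation-and-rendering-…d
    [DL Hacks] Implementing the Generative Query Network with PyTorch and Pixyz
    • Taniguchi (B4), Matsuo Lab, The University of Tokyo
    • https://www.slideshare.net/DeepLearningJP2016/dlhackspytorch-pixyzgenerative-query-network-126329901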
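  73. The GQN problem setting
    Dataset
    • For each of $N$ scenes (environments), $K$ pairs of a coordinate and the RGB image seen from that coordinate: $\{(x_i^k, v_i^k)\}$ with $i \in \{1, \ldots, N\}$, $k \in \{1, \ldots, K\}$
    • $x_i^k$: the $k$-th RGB image of the $i$-th scene
    • $v_i^k$: the $k$-th viewpoint of the $i$-th scene
    Problem setting
    • Given $M$ observations (the context) $(x_i^{1, \ldots, M}, v_i^{1, \ldots, M})$ and an arbitrary viewpoint (the query) $v_i^q$, predict the corresponding RGB image $x_i^q$
    • A finite number of two-dimensional (image) observations does not determine the answer, so it cannot be predicted deterministically
    • The problem is solved as a probabilistic model (deep generative model) conditioned on the context
  74. Background: Conditional VAE
    Conditional VAE [Sohn+ 2015]
    • A VAE conditioned on arbitrary information $y$
    • Modeling the prior as $p(z|y)$ makes it possible to infer $z$ directly at test time
    • The goal is to maximize the log-likelihood, which is done by maximizing the variational lower bound (ELBO, the negative loss): $\log p(x|y) \geq \mathbb{E}_{q(z|x,y)}\left[ \log \frac{p(x|z,y) \, p(z|y)}{q(z|x,y)} \right] = \mathbb{E}_{q(z|x,y)}\left[ \log p(x|z,y) \right] - \mathrm{KL}\left[ q(z|x,y) \| p(z|y) \right]$
    • There is also a version whose prior $p(z)$ does not depend on $y$ [Kingma+ 2014]
    • The M2 model from the Pixyz tutorial follows this pattern
    • Graphical model: encoder $q(z|x,y)$, decoder $p(x|z,y)$, prior $p(z|y)$
  75. Generative Query Network
    The GQN model
    • A Conditional VAE conditioned on the context $r = f(x_i^{1, \ldots, M}, v_i^{1, \ldots, M})$ and the query $v^q$
    • $f$ is a deterministic transformation (the representation network)
    • Interpreted as a graphical model: context $r$, query $v^q$, latent variable $z$, with prior $\pi(z|v^q, r)$, encoder $q(z|x^q, v^q, r)$, and decoder $g(x^q|z, v^q, r)$
    • The goal is to maximize the log-likelihood, which is done by maximizing the variational lower bound (ELBO, the negative loss): $\log p(x^q|v^q, r) \geq \mathbb{E}_{q(z|x^q, v^q, r)}\left[ \log \frac{g(x^q|z, v^q, r) \, \pi(z|v^q, r)}{q(z|x^q, v^q, r)} \right] = \mathbb{E}_{q(z|x^q, v^q, r)}\left[ \log g(x^q|z, v^q, r) \right] - \mathrm{KL}\left[ q(z|x^q, v^q, r) \| \pi(z|v^q, r) \right]$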
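  76. Generative Query Network
    The variational lower bound (negative loss): $\mathbb{E}_{q(z|x^q, v^q, r)}\left[ \log g(x^q|z, v^q, r) \right] - \mathrm{KL}\left[ q(z|x^q, v^q, r) \| \pi(z|v^q, r) \right]$
    • The encoder infers the latent variable $z$ from the context $r$, the query $v^q$, and the corresponding image $x^q$
    • KL term: trains the encoder and the prior to move closer together
    • As a result, using the prior at test time should let us infer the latent variable $z$ from the context $r$ and the query $v^q$ alone, without the true image $x^q$ for the query
    • Reconstruction term: trains the model so that the image $x^q$ for the query is reconstructed from the latent variable $z$
    • As a result, the latent variable $z$ should learn some representation that describes the whole scene
  77. Architectural device: the representation network
    The representation network summarizes the $M$ observations $(x_i^{1, \ldots, M}, v_i^{1, \ldots, M})$ into a single context $r$
    • Each image and coordinate pair is transformed separately, and the results are aggregated: $r^k = \psi(x^k, v^k)$, $r = \sum_{k=1}^{M} r^k$
    Aggregating over the viewpoints gives a representation that does not depend on their order (permutation invariant)
    • The number of viewpoints used for the context can be chosen freely
    • Modeling this with an RNN would make the order matter
    Is it really fine to take a sum (or mean)?
    • This has reportedly been discussed recently [Wagstaff+ 2019]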
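The order independence of the summed context can be demonstrated with a toy stand-in for the representation network. Here `psi` is a hypothetical hand-written function, not the convolutional network used in GQN; only the aggregation by summation mirrors the formula above.

```python
import math


def psi(image, viewpoint):
    """Hypothetical per-view encoder: maps one (image, viewpoint) pair to r^k."""
    brightness = sum(image) / len(image)
    return [brightness * viewpoint[0], brightness * viewpoint[1], brightness]


def aggregate(context):
    """r = sum_k psi(x^k, v^k): one context vector for any number of views."""
    r = [0.0, 0.0, 0.0]
    for image, viewpoint in context:
        r = [acc + val for acc, val in zip(r, psi(image, viewpoint))]
    return r


context = [([0.0, 1.0], (1.0, 0.0)),
           ([1.0, 1.0], (0.0, 1.0)),
           ([0.5, 0.5], (1.0, 1.0))]
r_forward = aggregate(context)
r_reversed = aggregate(list(reversed(context)))
r_two_views = aggregate(context[:2])  # a shorter context works unchanged
```

Because addition is commutative, shuffling the views leaves `r` unchanged, and the same function accepts contexts of any length. An RNN over the views would offer neither property for free.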
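  78. Architectural device: using DRAW
    DRAW [Gregor+ 2015]
    • Increases the expressive power of the model by splitting the inference of the VAE latent variable $z$ into several steps with an RNN and performing it autoregressively: $q(z|x) = \prod_{l=1}^{L} q_l(z_l | x, z_{<l})$
    • The ELBO for this case is derived in Taniguchi's slides (pp. 11-13): https://www.slideshare.net/DeepLearningJP2016/dlhackspytorch-pixyzgenerative-query-network-126329901
    • The result is the reconstruction likelihood plus the sum of the KL terms of each step
    • DRAW is used for both the prior and the encoder
    • Prior: $\pi_\theta(z | v^q, r) = \prod_{l=1}^{L} \pi_{\theta_l}(z_l | v^q, r, z_{<l})$
    • Encoder: $q_\phi(z | x^q, v^q, r) = \prod_{l=1}^{L} q_{\phi_l}(z_l | x^q, v^q, r, z_{<l})$
  79. Experimental results in the paper
    Room dataset
    • A random number (1 to 3) of various objects is placed in a random square room
    • Wall textures: 5 kinds. Floor textures: 3 kinds. Object shapes: 7 kinds
    • Size, position, and color are random. The lighting is also random
    • 20,000 different scenes are rendered
    • The datasets (and only the datasets) are public
    • Several other datasets exist besides Room: https://github.com/deepmind/gqn-datasets
    • Qualitatively, images from new viewpoints are predicted well (figure on the right)
  80. Experimental results in the paper
    Scene Algebra
    • Uses the trained network to add and subtract in the space of the context $r$
    • As with word2vec, the predictions follow the arithmetic, so the representation is compositional
    • How far the compositionality really goes remains questionable
    • It may work because the variation across the dataset is comparatively simple
  81. Implementation in Pixyz
    Available in Pixyzoo: https://github.com/masa-su/pixyzoo/tree/master/GQN
    • Implemented by Taniguchi (B4), Matsuo Lab, The University of Tokyo
    • Hyperparameters were provided directly by Eslami (first author)
    • DeepMind generally does not release its implementations, so this is probably the most faithful implementation
    • The paper reportedly uses four K80 (24GB) GPUs
    • In our own runs it barely fits on four TitanX (12GB) GPUs
    • Reducing the number of parameters does not affect the results much
    • The authors (Eslami and Rezende) also shared the implementation
    • Pixyz's DeepMind debut (?) https://twitter.com/arkitus/status/1072845916850274304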
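  82. Implementation in Pixyz
    Defining the distributions
    • Prior and decoder: generaton.py
    • Encoder: inference.py
  83. Implementation in Pixyz
    Defining the loss and the model: model.py
    • Code callouts: instances of the distributions, DRAW
  84. Implementation in Pixyz
    Currently the DRAW part evaluates the loss one step at a time inside a for loop
    Q. Can't this be written more cleanly?
    A. The next version (0.0.5) is planned to support autoregressive models
    • It is reportedly going to be merged into master soon
    • The code should become considerably more concise (the TD-VAE implementation introduced next already uses it)
    Even so, the implementation of the networks is separated from that of the probability distributions, which shows the strength of Pixyz
  85. Discussion
    What is actually acquired as the latent representation? (from the viewpoint of state representation learning)
    • Representations of objects, of the scene itself, and of the relation between viewpoints and observations
    • How can these be extracted and used?
    • What happens when the model is transferred to the real world?
    • A project that trains GQN on images taken in the real world: https://github.com/brettgohre/still_life_rendering_gqn
    How can it be used for the agent's action decisions (reinforcement learning)?
    • What applications are possible?
    The context of meta-learning
    • How to define a task and what to condition on are becoming the central questions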
  86. Temporal Difference Variational Auto-Encoder (TD-VAE) 86

  87. TD-VAEͱ͸ʁ 5FNQPSBM%JGGFSFODF7BSJBUJPOBM"VUP&ODPEFS<(SFHPS > • ,BSPM(SFHPS (FPSHF1BQBNBLBSJPT FUBM *$-30SBM  

    • ܥྻΛѻ͏ਂ૚ੜ੒ϞσϧΛఏҊͨ͠ • ܥྻΛѻ͏ਂ૚ੜ੒ϞσϧͰ͸ɼεςοϓ͝ͱʹਪ࿦Λߦ͏͜ͱ͕ओྲྀͰ͋ͬͨ
 ୈ෦ͷॱϞσϧ ͕ɼ5%7"&͸೚ҙͷεςοϓ·Ͱδϟϯϓͯ͠ਪ࿦Ͱ͖Δ • ͜ΕΛ࢖ͬͯ࣌ܥྻͷந৅ԽʹऔΓ૊Ή͜ͱ͕Ͱ͖ͳ͍ͩΖ͏͔ ண໨͍ͯ͠Δཧ༝  • 3//Λ༻͍ͨʮ৴೦ঢ়ଶʯͷಋೖͱ&-#0ͷ
 ෼ղͷ࢓ํ͕ΧΪ 87
  88. TD-VAEͷৄ͍͠ࢿྉ (2/ʹൺ΂ͯ͋Μ·Γͳ͍ʜ <%-ྠಡձ>5FNQPSBM%JGGFSFODF7BSJBUJPOBM"VUP&ODPEFS • ౦େদඌݚླ໦͞Μ
 IUUQTXXXTMJEFTIBSFOFU%FFQ-FBSOJOH+1EMUFNQPSBMEJGGFSFODFWBSJBUJPOBMBVUPFODPEFS 88

  89. ͲΜͳঢ়ଶදݱ͕޷·͍͔͠ʁ ࿦จதͰݴٴ͞Ε͍ͯΔɼΤʔδΣϯτͷঢ়ଶදݱ͕࣋ͭ΂͖ੑ࣭ σʔλͷந৅తͳঢ়ଶදݱΛֶश͠ɼ؍ଌͰ͸ͳ͘ঢ়ଶͷϨϕϧͰ༧ଌͰ͖Δ͜ͱ ͋Δ࣌ؒ·Ͱͷશͯͷ؍ଌ͕༩͑ΒΕͨ΋ͱͰɼ
 ঢ়ଶͷϑΟϧλϦϯά෼෍Λܾఆతʹίʔυͨ͠৴೦ঢ়ଶ CFMJFGTUBUF ΛֶशͰ͖Δ͜ͱ • ৴೦ঢ়ଶ͸ΤʔδΣϯτ͕࣋ͭੈքʹؔ͢Δঢ়ଶͷશͯͷ৘ใͱɼ࠷దʹߦಈ͢Δํ๏ΛؚΜ Ͱ͍Δ

    ਺εςοϓઌͷδϟϯϓͨ͠ະདྷΛ༧ଌ͢Δ͜ͱɽ
 ࣌ܥྻશͯΛޡࠩٯ఻೻ͤͣʹ࣌ؒతʹ཭Εͨ࣌఺͔ΒֶशͰ͖ΔΑ͏ʹ͢Δ͜ͱʹΑͬ ͯɼ࣌ܥྻతͳந৅ԽΛߦΘΕΔ͜ͱ ͜ΕΒͷੑ࣭Λຬͨ͢Ϟσϧͱͯ͠ɼ5%7"&ΛఏҊ 89
  90. લఏ: ࣗݾճؼϞσϧ ࣗݾճؼϞσϧ (Autoregressive Model) • ܥྻσʔλɹɹɹɹɹɹɹΛϞσϦϯά͢Δํ๏ • νΣʔϯϧʔϧΛ༻͍ͯɼ໬౓Λ৚݅෇͖෼෍ͷੵʹ෼ղ (ࣜ͸྆ลର਺Λͱͬͨ)

    • RNNΛ༻͍࣮ͯ૷Ͱ͖Δ • ໰୊఺ • ؍ଌۭؒͰ͔͠༧ଌ͠ͳ͍ͨΊɼσʔλͷѹॖͨ͠දݱΛֶश͠ͳ͍ • ֤εςοϓͰσίʔυɾΤϯίʔυΛ͢ΔͨΊܭࢉྔ͕େ͖͍ • ܇࿅࣌ʹ͸࣍ͷεςοϓͷσʔλ͕ೖͬͯ͘Δ͕(ڭࢣڧ੍)ɼςετ࣌ʹ͸ࣗ਎ͷ༧ଌΛೖྗ ͢ΔͨΊෆ҆ఆ 90 x = (x1 , …, xT) log p (x1 , …, xT) = ∑ t log p (xt |x1 , …, xt−1) ht = f (ht−1 , xt)
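The chain-rule factorization can be illustrated with a tiny hand-made conditional model over binary sequences. The rule in `cond_prob` (repeat the previous symbol with probability 0.8) is an invented stand-in for an RNN, chosen so the numbers can be verified by hand.

```python
import math
from itertools import product


def cond_prob(x_t, history):
    """Toy conditional p(x_t | x_1, ..., x_{t-1}) over symbols {0, 1}.

    A real autoregressive model would compute this from an RNN state
    h_t = f(h_{t-1}, x_t); here it only looks at the last symbol.
    """
    if not history:
        return 0.5
    p_repeat = 0.8
    return p_repeat if x_t == history[-1] else 1.0 - p_repeat


def log_likelihood(sequence):
    """log p(x_1..x_T) = sum_t log p(x_t | x_1..x_{t-1})."""
    return sum(math.log(cond_prob(x_t, sequence[:t]))
               for t, x_t in enumerate(sequence))


ll = log_likelihood((0, 0, 0))
# Sanity check: probabilities of all length-4 sequences sum to one.
total_mass = sum(math.exp(log_likelihood(seq))
                 for seq in product([0, 1], repeat=4))
```

Every term is a prediction in observation space, which is the first limitation listed above: nothing in this factorization forces the model to learn a compact latent state.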
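  91. Background: state-space models
    State-space model
    • A way to model sequence data $x = (x_1, \ldots, x_T)$ with latent variables (states) $z = (z_1, \ldots, z_T)$
    • Joint distribution of $x$ and $z$ (state transition and decoder): $p(x, z) = \prod_t p(z_t | z_{t-1}) \, p(x_t | z_t)$
    • Encoder: $q(z|x) = \prod_t q(z_t | z_{t-1}, \phi_t(x))$
    • If $\phi_t$ takes the sequence up to $t$, $(x_1, \ldots, x_t)$, as input, this is filtering. If it takes the whole sequence $x$, this is smoothing
    • Variational lower bound (ELBO, the negative loss): $\log p(x) \geq \mathbb{E}_{z \sim q(z|x)}\left[ \sum_t \log p(x_t | z_t) + \log p(z_t | z_{t-1}) - \log q(z_t | z_{t-1}, \phi_t(x)) \right]$
    • It models transitions between states
    • At test time there is no need to decode and encode at every step
  92. Introducing the filtering distribution
    In a state-space model, obtaining the state $z_t$ requires the state of the previous step $z_{t-1}$
    • This requires resampling $z_{t-1}, z_{t-2}, \ldots, z_1$ one after another
    Introduce the filtering distribution $p(z_t | x_1, \ldots, x_t)$
    • It depends only on the sequence of observations $(x_1, \ldots, x_t)$
    • It corresponds to the belief state in reinforcement learning for POMDPs
  93. Introducing the filtering distribution
    Deriving the ELBO with the filtering distribution $p(z_t | x_1, \ldots, x_t)$
    • Thanks to the filtering distribution, the latent variables can be expressed with just the two variables $(z_{t-1}, z_t)$
    • $\log p(x) = \sum_t \log p(x_t | x_{<t}) = \sum_t \log \iint p(x_t | z_t) \, p(z_t | z_{t-1}) \, p(z_{t-1} | x_{<t}) \, dz_{t-1} \, dz_t$
    • By Jensen's inequality: $\geq \sum_t \mathbb{E}_{q(z_t, z_{t-1} | x_{\leq t})}\left[ \log \frac{p(x_t | z_t) \, p(z_t | z_{t-1}) \, p(z_{t-1} | x_{<t})}{q(z_t, z_{t-1} | x_{\leq t})} \right]$
    • $= \sum_t \mathbb{E}_{q(z_t | x_{\leq t}) \, q(z_{t-1} | z_t, x_{\leq t})}\left[ \log p(x_t | z_t) + \log p(z_{t-1} | x_{<t}) + \log p(z_t | z_{t-1}) - \log q(z_t | x_{\leq t}) - \log q(z_{t-1} | z_t, x_{\leq t}) \right]$
    • Terms: $p(x_t | z_t)$ is the decoder, $p(z_{t-1} | x_{<t})$ and $q(z_t | x_{\leq t})$ are filtering distributions, $p(z_t | z_{t-1})$ is the state transition, and $q(z_{t-1} | z_t, x_{\leq t})$ is the encoder
    • The encoder performs inference toward the past
  94. Implementing the filtering distribution
    TD-VAE implements the filtering distribution with an RNN
    • The variable that represents the belief state is $b_t$, modeled at each step as $b_t = f(b_{t-1}, x_t)$
    • The belief state $b_t$ is considered to contain the information in the past observations $(x_1, \ldots, x_t)$
    • The variational lower bound (ELBO, the negative loss) then becomes $\mathbb{E}_{p_B(z_t | b_t) \, q(z_{t-1} | z_t, b_{t-1}, b_t)}\left[ \log p(x_t | z_t) + \log p_B(z_{t-1} | b_{t-1}) + \log p(z_t | z_{t-1}) - \log p_B(z_t | b_t) - \log q(z_{t-1} | z_t, b_{t-1}, b_t) \right]$
    • Terms: $p(x_t | z_t)$ is the decoder, $p_B$ is the filtering distribution, $p(z_t | z_{t-1})$ is the state transition, and $q(z_{t-1} | z_t, b_{t-1}, b_t)$ is the encoder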
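  95. Jumping over time steps
    Extend the discussion so far from one-step transitions to transitions over several steps
    • Only the notation changes. The variational lower bound (ELBO, the negative loss) is $\mathbb{E}_{p_B(z_{t_2} | b_{t_2}) \, q(z_{t_1} | z_{t_2}, b_{t_1}, b_{t_2})}\left[ \log p(x_{t_2} | z_{t_2}) + \log p_B(z_{t_1} | b_{t_1}) + \log p(z_{t_2} | z_{t_1}) - \log p_B(z_{t_2} | b_{t_2}) - \log q(z_{t_1} | z_{t_2}, b_{t_1}, b_{t_2}) \right]$
    • During training, the number of steps to jump, $\delta = t_2 - t_1$, is sampled from the range $[1, D]$
    • $\delta$ is added to the input of the state transition: $p(z_{t_2} | z_{t_1}, \delta)$
    • At test time, prediction proceeds as $x_{t_1}$ → filtering distribution → $z_{t_1}$ → state transition → $z_{t_2}$ → decoder → $\hat{x}_{t_2}$
  96. Training TD-VAE
    The TD-VAE variational lower bound (ELBO, the negative loss): $\mathbb{E}_{z_{t_2} \sim p_B(z_{t_2} | b_{t_2}), \, z_{t_1} \sim q(z_{t_1} | z_{t_2}, b_{t_1}, b_{t_2})}\left[ \log p(x_{t_2} | z_{t_2}) + \log p_B(z_{t_1} | b_{t_1}) + \log p(z_{t_2} | z_{t_1}) - \log p_B(z_{t_2} | b_{t_2}) - \log q(z_{t_1} | z_{t_2}, b_{t_1}, b_{t_2}) \right]$
    • (1) Sample $z_{t_2}$ from the filtering distribution $p_B(z_{t_2} | b_{t_2})$
    • (2) Using it, sample $z_{t_1}$ from the encoder $q(z_{t_1} | z_{t_2}, b_{t_1}, b_{t_2})$
    • The second and fifth terms together form the (negative) KL divergence between the encoder and the filtering distribution, $\mathrm{KL}\left[ q(z_{t_1} | z_{t_2}, b_{t_1}, b_{t_2}) \| p_B(z_{t_1} | b_{t_1}) \right]$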
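The two sampling steps and the five log-density terms can be traced in a one-dimensional toy version. Every distribution below is a unit-variance Gaussian with a hand-picked mean, and the belief values `b_t1`, `b_t2` are used directly as means. These are assumptions made purely to keep the sketch runnable; the real model uses neural networks for every component.

```python
import math
import random


def log_normal(x, mean, std=1.0):
    """Log density of a 1-D Gaussian."""
    return (-0.5 * math.log(2.0 * math.pi) - math.log(std)
            - 0.5 * ((x - mean) / std) ** 2)


def td_vae_elbo_sample(x_t2, b_t1, b_t2, rng):
    """Single-sample estimate of the TD-VAE lower bound (toy distributions)."""
    # (1) Sample z_t2 from the filtering (belief) distribution p_B(z_t2 | b_t2).
    z_t2 = rng.gauss(b_t2, 1.0)
    # (2) Sample z_t1 from the encoder q(z_t1 | z_t2, b_t1, b_t2), which
    #     infers backward in time; its mean here is a hypothetical choice.
    q_mean = 0.5 * (z_t2 + b_t1)
    z_t1 = rng.gauss(q_mean, 1.0)
    return (log_normal(x_t2, z_t2)       # decoder          log p(x_t2 | z_t2)
            + log_normal(z_t1, b_t1)     # belief at t1     log p_B(z_t1 | b_t1)
            + log_normal(z_t2, z_t1)     # state transition log p(z_t2 | z_t1)
            - log_normal(z_t2, b_t2)     # belief at t2     log p_B(z_t2 | b_t2)
            - log_normal(z_t1, q_mean))  # encoder          log q(z_t1 | ...)


value = td_vae_elbo_sample(x_t2=0.3, b_t1=0.0, b_t2=0.2, rng=random.Random(0))
```

Training would average this quantity over sampled pairs $(t_1, t_2)$ and maximize it with respect to the parameters of all five components.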
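  97. Implementation in Pixyz
    Available in Pixyzoo: https://github.com/masa-su/pixyzoo/tree/master/TD-VAE
    • Inherits from the Model class
    • Uses IterativeLoss, which is meant for autoregressive models
    • Requires Pixyz v0.0.5 or later
    • The loss around the encoder sometimes blows up, so hyperparameter tuning may be needed
    • GQN was also difficult in this respect
    • Code callouts: distributions, loss
  98. Experiments in the paper
    Partially observed MiniPacman
    • With a model that predicts one step at a time, the encoder of the proposed method is compared with other encoders
    • TD-VAE encoder: $q(z_{t-1} | z_t, b_{t-1}, b_t)$
    • Filtering model encoder: $q(z_t | z_{t-1}, b_t)$
    • Mean-field model encoder: $q(z_t | b_t)$
    • The TD-VAE encoder is shown to be better in terms of both the ELBO and the negative log-likelihood
  99. Experiments in the paper
    MovingMNIST
    • An experiment on MNIST digits that move left and right, where the model predicts while skipping steps
    • Trained with jumps of between 1 and 4 steps
    • This does not seem to be the MovingMNIST that most people know
    • There are also experiments using DeepMind Lab
    • The architecture reportedly uses ConvDRAW [Gregor+ 2016]
  100. Discussion
    What "TD" means
    • The part where the encoder infers toward the past is what resembles temporal difference learning
    • Section 4.3 of the paper does describe this
    Abstraction along the time axis
    • Abstracting time series is a major challenge for reinforcement learning
    • TD-VAE does not handle actions directly
    • If temporally abstracted action primitives could be built from torque-level control sequences, and exploration could run over those primitives, it would likely be more efficient and also feels natural
    The sense that everything is entrusted to the RNN in TD-VAE
    • The question of the RNN's expressive power
  101. Appendix to Part 3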

  102. References
    [Eslami+ 2018] Eslami, S. M. Ali, Danilo Jimenez Rezende, Frédéric Besse, Fabio Viola, Ari S. Morcos, Marta Garnelo, Avraham Ruderman, Andrei A. Rusu, Ivo Danihelka, Karol Gregor, David P. Reichert, Lars Buesing, Theophane Weber, Oriol Vinyals, Dan Rosenbaum, Neil C. Rabinowitz, Helen King, Chloe Hillier, Matthew M Botvinick, Daan Wierstra, Koray Kavukcuoglu and Demis Hassabis. "Neural scene representation and rendering." Science 360 (2018): 1204-1210. http://science.sciencemag.org/content/360/6394/1204
    [Gregor+ 2015] Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, Daan Wierstra. DRAW: A Recurrent Neural Network For Image Generation. https://arxiv.org/abs/1502.04623
    [Gregor+ 2016] Karol Gregor, Frederic Besse, Danilo Jimenez Rezende, Ivo Danihelka, Daan Wierstra. Towards Conceptual Compression. https://arxiv.org/abs/1604.08772
    [Gregor+ 2019] Karol Gregor, George Papamakarios, Frederic Besse, Lars Buesing, Theophane Weber. Temporal Difference Variational Auto-Encoder. https://openreview.net/forum?id=S1x4ghC9tQ
    [Kingma+ 2014] Diederik P. Kingma, Danilo J. Rezende, Shakir Mohamed, Max Welling. Semi-Supervised Learning with Deep Generative Models. https://arxiv.org/abs/1406.5298
    [Sohn+ 2015] Kihyuk Sohn, Honglak Lee, and Xinchen Yan. Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems (NIPS), pp. 3483–3491, 2015. https://papers.nips.cc/paper/5775-learning-structured-output-representation-using-deep-conditional-generative-models
    [Tschannen+ 2018] Michael Tschannen, Olivier Bachem, Mario Lucic (2018). Recent Advances in Autoencoder-Based Representation Learning. https://arxiv.org/abs/1812.05069
    [Wagstaff+ 2019] Edward Wagstaff, Fabian B. Fuchs, Martin Engelcke, Ingmar Posner, Michael Osborne. On the Limitations of Representing Functions on Sets. https://arxiv.org/abs/1901.09006
  103. Q&A and discussion
  104. Acknowledgments
    The following people were especially helpful in preparing this talk:
    Masahiro Suzuki
    • Provided the reading-group material on GQN and TD-VAE
    • Pixyz development and advice on the implementation
    Shohei Taniguchi
    • GQN implementation
    Thank you very much