
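32nd Reinforcement Learning Architecture Study Group: Recent Research on State Representation Learning and World Models, and an Introduction to the Deep Generative Model Library Pixyz #rlarch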


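1) State representation learning and world models for reinforcement learning

In reinforcement learning problems the "state" tends to be treated as given, but using the agent's raw observations as the state is not always the best choice. In a partially observable problem, for example, it helps for the agent to remember and use its past observations in some form. For efficient reinforcement learning it is therefore promising to design models that learn a useful "state" representation from the agent's past observations. Models that learn such state representations and state transitions, and thereby model the agent's environment, are called "world models" [1] or "internal models". In recent years, many studies have used deep generative models for state representation learning in order to handle high-dimensional inputs such as images. This talk organizes and discusses that work, based on a review paper posted to arXiv in 2018 [2].

2) Hands-on with the deep generative model library Pixyz

A hands-on session with Pixyz [3], a PyTorch-based library for writing a wide range of deep generative models concisely (a laptop on which PyTorch runs will be handy).

3) Recent world model research: GQN and TD-VAE

Two world-model studies published by DeepMind (UK) in 2018, the Generative Query Network (GQN) [4] and the Temporal Difference Variational Auto-Encoder (TD-VAE) [5], are explained with implementation examples in Pixyz. Applications of these models and future directions are also discussed.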

Tatsuya Matsushima

February 05, 2019

Transcript

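  1. Recent Research on State Representation Learning and World Models, and an Introduction to the Deep Generative Model Library Pixyz
    The University of Tokyo, Graduate School of Engineering, first-year master's student
    Tatsuya Matsushima
    @__tmats__

  2. Self-Introduction
    The University of Tokyo, Graduate School of Engineering, Department of Technology Management for Innovation, Matsuo Lab
    M1, Tatsuya Matsushima
    • I am interested in developing adaptive robots that can coexist with humans, and in understanding life-likeness and human intelligence constructively by building such robots.
    • I recently wrote articles for Nikkei xTREND:
    • "Matsuo Lab's pick: Google and Facebook are researching the 'embodiment' of AI"
      https://trend.nikkeibp.co.jp/atcl/contents/technology/00007/00001/
    • "'State' representations matter for robot control: learning a representation of the environment from data"
      https://trend.nikkeibp.co.jp/atcl/contents/technology/00007/00015/
    @__tmats__

  3. Agenda
    Part 1: 19:00-19:25 State representation learning and world models for reinforcement learning
    • Summarizes methods for learning state representations in reinforcement learning problems.
      Ideally the representation is a "world model", that is, a model of the environment in some form.
    Part 2: 19:30-20:00 Hands-on with the deep generative model library Pixyz
    • In recent years, "world models" have often been implemented with deep generative models.
      A tutorial on Pixyz, a library for writing deep generative models concisely.
    Part 3: 20:05-20:35 Recent world model research: GQN and TD-VAE
    • Explains two world models published by DeepMind (UK) in 2018, "GQN" and "TD-VAE",
      with implementation examples in Pixyz.

  4. Part 1: State Representation Learning and World Models for Reinforcement Learning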

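  5. About This Talk
    (The paper this talk is based on)
    State Representation Learning for Control: An Overview
    • https://arxiv.org/abs/1802.04181 (Last revised 5 Jun 2018)
    • Timothée Lesort, Natalia Díaz-Rodríguez, Jean-François Goudou, David Filliat
    • The authors have also built a tool called S-RL Toolbox: https://github.com/araffin/robotics-rl-srl
    • A review paper on learning state representations for control tasks
    • An area studied actively, mainly at UC Berkeley
    • It does not seem to get much attention in Japan
    • This talk also adds other papers that the review does not cover

  6. What Is State Representation Learning?
    Representation learning
    • Learning to find abstract features from data
    State representation learning (SRL)
    • A state representation is a learned feature that is low-dimensional, evolves over time, and is affected by the agent's actions
    • Such representations are considered useful for robotics and control problems (cf. the curse of dimensionality)
    • Example: image data is very high-dimensional, but the objective of a robot's control task can be expressed in far fewer dimensions
      • For manipulation, the 3D position of the object
    • The main research theme is finding this state representation from raw observation data

  7. World Models
    The importance of model building in intelligence [Lake+ 2016]
    • Humans cannot perceive everything. They build an internal model of the world from information (stimuli), and this model is thought to play a major role in human intelligence
      • Also called a world model
      • [DL Reading Group] GQN and related work, and its relation to world models
        https://www.slideshare.net/DeepLearningJP2016/dlgqn-111725780
    • Intelligence is the ability to predict the future from memories so far
      • Jeff Hawkins, "On Intelligence"
    • We are thought to act while simulating the future with a learned internal model

  8. World Models
    The importance of model building in intelligence [Lake+ 2016]
    • Lecture at MIT by Prof. Josh Tenenbaum
      • MIT AGI: Building machines that see, learn, and think like people (Josh Tenenbaum)
        https://www.youtube.com/watch?v=7ROelYvo8f0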

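  9. What Makes a Good Representation?
    The representation must ignore the irrelevant parts of the raw observation and encode the information that is indispensable for reinforcement learning.
    Definition of a good state representation by [Böhmer+ 2015]
    • It is Markovian
      • It summarizes enough information that a policy can choose an action by looking only at the current state
    • It can be used to improve the policy
    • It lets the learned value function generalize to unseen states with similar features
    • It is low-dimensional

  10. A General View of SRL
    SRL does not use the true state $\tilde{s}_t$; it learns a state $s_t$ that approximates it.
    • It learns a mapping $s_t = \phi(o_{1:t})$ from past observations $o_{1:t}$ to the current state $s_t$
    [Figure: the environment has an unknown true state $\tilde{s}_t \in \tilde{\mathcal{S}}$ that moves to $\tilde{s}_{t+1}$; the agent takes action $a_t$, receives observations $o_t$, $o_{t+1}$ and a reward]

  11. Approaches to SRL
    SRL approaches follow a few patterns
    • Using an auto-encoder
    • Using a forward model
    • Using an inverse model
    • Introducing prior knowledge (priors)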

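  12. Approaches to SRL
    Using an auto-encoder
    • An encoder $\phi$ and a decoder $\phi^{-1}$ are learned by minimizing the reconstruction error
    • Constraints are imposed so that the state $s_t$ has certain properties
      • Examples: a dimensionality constraint, denoising, a sparsity constraint
    Encoder: $s_t = \phi(o_t; \theta_\phi)$
    Decoder: $\hat{o}_t = \phi^{-1}(s_t; \theta_{\phi^{-1}})$
    Loss: the reconstruction error between $o_t$ and $\hat{o}_t$

  13. Approaches to SRL
    Using a forward model
    • The forward model $f$ predicts the next state $s_{t+1}$ from the state $s_t$ and the action $a_t$
    • Constraints such as linearity can be imposed on the forward model
    • The encoder $\phi$ is trained by backpropagating the prediction error of the next state
    Encoder: $s_t = \phi(o_t; \theta_\phi)$
    Forward model: $\hat{s}_{t+1} = f(s_t, a_t; \theta_{fwd})$
    Loss: the prediction error of the next state

  14. Approaches to SRL
    Using an inverse model
    • The action $a_t$ that was actually taken is estimated from the state $s_t$ and the next state $s_{t+1}$
    • The encoder $\phi$ is trained by backpropagating the prediction error of the action $a_t$ that was actually taken
    Encoder: $s_t = \phi(o_t; \theta_\phi)$
    Inverse model: $\hat{a}_t = g(s_t, s_{t+1}; \theta_{inv})$
    Loss: the prediction error of the action actually taken

  15. Approaches to SRL
    Introducing prior knowledge (priors)
    • Uses prior knowledge about specific constraints or dynamics
      • Example: temporal continuity
    • A prior is defined through a loss applied to a set of states $s_{1:n}$ under some condition $c$
    Encoder: $s_t = \phi(o_t; \theta_\phi)$
    $\mathrm{Loss} = \mathcal{L}_{prior}(s_{1:n}; \theta_\phi \mid c)$
    The constraint is placed on the state space itself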
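The three learned-model approaches above (auto-encoder, forward model, inverse model) each give the encoder a different training signal. The following is a minimal sketch, assuming scalar observations and hand-picked linear maps; `phi`, `phi_inv`, `f`, `g` and their default coefficients are hypothetical stand-ins for neural networks, not anything taken from the review paper.

```python
# Toy sketch of three SRL training signals. All maps are hypothetical
# scalar linear functions chosen so the arithmetic is easy to follow.

def phi(o, w=0.5):
    """Encoder: observation -> state."""
    return w * o

def phi_inv(s, w=2.0):
    """Decoder: state -> reconstructed observation."""
    return w * s

def f(s, a, W=1.0, U=1.0, V=0.0):
    """Forward model: predicts the next state from (state, action)."""
    return W * s + U * a + V

def g(s, s_next):
    """Inverse model: estimates the action that led from s to s_next."""
    return s_next - s

def srl_losses(o_t, a_t, o_next):
    """Return (reconstruction, forward, inverse) squared errors for one transition."""
    s_t, s_next = phi(o_t), phi(o_next)
    recon = (phi_inv(s_t) - o_t) ** 2      # auto-encoder signal
    fwd = (f(s_t, a_t) - s_next) ** 2      # forward-model signal
    inv = (g(s_t, s_next) - a_t) ** 2      # inverse-model signal
    return recon, fwd, inv

consistent = srl_losses(o_t=2.0, a_t=1.0, o_next=4.0)
mismatched = srl_losses(o_t=2.0, a_t=0.5, o_next=4.0)
```

In the first call the transition agrees with the toy dynamics, so all three errors vanish; in the second, the action does not explain the state change, and the forward and inverse errors become non-zero while the reconstruction error stays at zero. In practice each error would be backpropagated into the encoder parameters.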

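  16. Why SRL?
    Why should we think about SRL?
    • Reinforcement learning directly end-to-end from raw observations is expensive
    • SRL may let us build in good priors
    • It can be extended to multimodal observations
    • Solving related tasks in advance makes the representation usable for transfer learning
    • It makes it possible to use algorithms whose search speed depends directly on dimensionality, such as evolution strategies (ES)

  17. Overview and Classification of Existing Work

  18. Classifying the Research
    Axes of classification
    • Learning objective
    • Design of the observation and action spaces
    • Evaluation metrics for state representations
    • Tasks used for evaluation

  19. Learning Objectives
    • Reconstructing the observation
    • Learning a forward model
    • Learning an inverse model
    • Adversarial learning of features
    • Using the reward
    • Other objectives
    • Hybrid objectives

  20. Learning Objectives
    Reconstructing the observation
    • A common approach to dimensionality reduction
      • Examples: PCA [Curran+ 2015], DAE, VAE [van Hoof+ 2016]
    • Many methods use an auto-encoder
      • Using image observations as they are [Mattner+ 2012]
      • Constraining the representation to encode object positions, e.g. Spatial Softmax [Finn+ 2015]
    • If the observation has no salient features, simply reconstructing it does not give a good representation
      • Example: small items in a game
      • Addressed by reconstructing from a different time step, or by constraining the temporal evolution

  21. Learning Objectives
    Learning a forward model
    • The state is made to encode the information needed to predict the next state
    • Often combined with observation reconstruction
    • The transition in state space is often assumed to be linear:
    $\hat{s}_{t+1} = W s_t + U a_t + V$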

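  22. (Example) E2C [Watter+ 2015]
    Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images
    • A forward model based on a VAE. The transition of the state (latent representation) $s_t$ is assumed to be linear.
    • The forward model is trained by minimizing the KL divergence between the predicted next-step state $\hat{s}_{t+1}$ and the state $s_{t+1}$
    • There is also a formulation as a Kalman filter (DVBF) [Karl+ 2016]
    Linear transition: $\hat{s}_{t+1} \sim \mathcal{N}(\mu = W s_t + U a_t + V,\ \sigma)$

  23. (Example) World Models [Ha+ 2018]
    • A forward model built from a VAE and an MDN-RNN
    • Vision model (V): compresses high-dimensional observations into a low-dimensional code (state) with a VAE
    • Memory RNN (M): predicts the next-step code (state) from past codes
    [DL Reading Group] World Models
    https://www.slideshare.net/DeepLearningJP2016/dlworld-models-95167842

  24. Learning Objectives
    Learning an inverse model
    • The state representation is constrained so that the action taken can be estimated from it
    • Example: Learning to Poke by Poking [Agrawal+ 2016]
      • Estimates the poke location ($p_t$), angle ($\theta_t$), and distance ($l_t$)

  25. (Example) ICM [Pathak+ 2017]
    Curiosity-driven Exploration by Self-supervised Prediction
    • Uses the prediction error of the forward model, $\mathcal{L}_{fwd}$, as an intrinsic reward for reinforcement learning
      • Encourages exploration when the extrinsic reward is sparse
    • Also uses a loss from an inverse model
    [DL Reading Group] Large-Scale Study of Curiosity-Driven Learning
    https://www.slideshare.net/DeepLearningJP2016/dllargescale-study-of-curiositydriven-learning
    $\mathcal{L}_{fwd}\left(\hat{\phi}(o_{t+1}), \hat{f}(\hat{\phi}(o_t), a_t)\right) = \frac{1}{2}\left\| \hat{f}(\hat{\phi}(o_t), a_t) - \hat{\phi}(o_{t+1}) \right\|_2^2$
    $\min_{\theta_P, \theta_I, \theta_F} \left[ -\lambda\, \mathbb{E}_{\pi(s_t; \theta_P)}\left[\sum_t r_t\right] + (1 - \beta)\mathcal{L}_{inv} + \beta \mathcal{L}_{fwd} \right]$
    (the first term is the extrinsic reward, the second the inverse model, the third the forward model)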
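The ICM formulas above combine a forward-model error (used as curiosity) with an inverse-model loss and the policy's return. The following is a minimal numeric sketch of those two expressions; the feature vectors and the values of `lam` and `beta` are placeholder assumptions for illustration, and the feature encoder and models themselves are omitted.

```python
def forward_loss(pred_next_feat, next_feat):
    """L_fwd: half the squared L2 distance between predicted and actual next features."""
    return 0.5 * sum((p - q) ** 2 for p, q in zip(pred_next_feat, next_feat))

def icm_objective(expected_return, l_inv, l_fwd, lam=0.1, beta=0.2):
    """-lam * E[sum_t r_t] + (1 - beta) * L_inv + beta * L_fwd (to be minimized)."""
    return -lam * expected_return + (1.0 - beta) * l_inv + beta * l_fwd

# A large forward error means the transition was hard to predict,
# so it doubles as a curiosity bonus for the agent.
l_fwd = forward_loss([1.0, 2.0], [0.0, 0.0])
objective = icm_objective(expected_return=10.0, l_inv=1.0, l_fwd=l_fwd)
```

Here `beta` trades off the inverse and forward losses, and `lam` weights the policy-gradient term against the model-learning terms.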

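  26. Learning Objectives
    Adversarial learning of features
    • Example: Causal InfoGAN [Kurutach+ 2018]
      • Adds to the GAN objective a regularization term on the mutual information between the states and the generator's output (a pair of observations)
    $\min_{G, Q, \mathcal{M}} \max_{D}\ V(G, D) - \lambda I_{VLB}(G, Q)$
    ($I_{VLB}$ is the mutual information term)

  27. Learning Objectives
    Using the reward
    • SRL does not necessarily require the reward, but it can be used as additional information for distinguishing states
    • Example: VPN [Oh+ 2017]
      • Also predicts the next state and its state value
    [Figure labels: observation, state, action (o stands for option), next state, next state value]

  28. Learning Objectives
    Other objectives
    • The objective is designed so that prior knowledge about the real world is reflected in the state space
    • Many variants have been proposed
    • Slowness prior [Lesort+ 2017, Jonschkowski+ 2017]
      • Important things move slowly and continuously; abrupt changes are unlikely
      $\mathcal{L}_{Slowness}(D, \phi) = \mathbb{E}\left[ \left\| \Delta s_t \right\|^2 \right]$
    • Variability [Jonschkowski+ 2017]
      • Relevant things move, so state representation learning should focus on what is moving
      $\mathcal{L}_{Variability}(D, \phi) = \mathbb{E}\left[ e^{-\left\| s_{t_1} - s_{t_2} \right\|} \right]$

  29. Learning Objectives
    Other objectives
    • Priors introduced in Robotic Priors [Jonschkowski+ 2015]
    • Proportionality
      • When the same action is taken, even in different states, its effect on the state has the same magnitude
      $\mathcal{L}_{Prop}(D, \phi) = \mathbb{E}\left[ \left( \left\| \Delta s_{t_2} \right\| - \left\| \Delta s_{t_1} \right\| \right)^2 \mid a_{t_1} = a_{t_2} \right]$
    • Repeatability
      • When the same action is taken in similar states, its effect on the state has the same magnitude and the same direction
      $\mathcal{L}_{Rep}(D, \phi) = \mathbb{E}\left[ e^{-\left\| s_{t_2} - s_{t_1} \right\|^2} \left\| \Delta s_{t_2} - \Delta s_{t_1} \right\|^2 \mid a_{t_1} = a_{t_2} \right]$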
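The four prior losses above can be computed directly from a short trajectory of learned states. The following is a minimal sketch, assuming states are plain lists of floats, $\Delta s_t = s_{t+1} - s_t$, and expectations are replaced by averages over the available steps or pairs; the helper names and the example trajectory are illustrative only.

```python
import math

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def sq_norm(v):
    return sum(x * x for x in v)

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def deltas(states):
    """Delta s_t = s_{t+1} - s_t for every step."""
    return [sub(nxt, cur) for cur, nxt in zip(states, states[1:])]

def same_action_pairs(actions):
    """All (t1, t2) with t1 < t2 where the same action was taken."""
    n = len(actions)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if actions[i] == actions[j]]

def slowness(states):
    return mean(sq_norm(d) for d in deltas(states))

def variability(states, pairs):
    return mean(math.exp(-math.sqrt(sq_norm(sub(states[i], states[j]))))
                for i, j in pairs)

def proportionality(states, actions):
    d = deltas(states)
    return mean((math.sqrt(sq_norm(d[j])) - math.sqrt(sq_norm(d[i]))) ** 2
                for i, j in same_action_pairs(actions))

def repeatability(states, actions):
    d = deltas(states)
    return mean(math.exp(-sq_norm(sub(states[j], states[i])))
                * sq_norm(sub(d[j], d[i]))
                for i, j in same_action_pairs(actions))

# Action 0 moves the state by (1, 0) twice; action 1 moves it by (0, 2).
states = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [2.0, 2.0]]
actions = [0, 0, 1]
```

For this trajectory the repeated action has an identical effect both times, so the proportionality and repeatability losses are zero, while the slowness loss penalizes the larger final jump.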

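  30. Learning Objectives
    Hybrid objectives
    • In practice, SRL usually combines several of the objectives listed so far
    Columns: constraint on action / next state | forward model (prediction of the next state) | inverse model | observation reconstruction | prediction of the next observation | use of reward
    E2C [Watter+ 2015]: ✔ ✔ ✔ ✔
    World Model [Ha+ 2018]: ✔ ✔ ✔
    ICM [Pathak+ 2017]: ✔ ✔ ✔
    Causal InfoGAN [Kurutach+ 2018]: ✔ ✔ ✔ ✔
    VPN [Oh+ 2017]: ✔ ✔
    Robotic Priors [Jonschkowski+ 2015]: ✔ ✔

  31. Designing the Observation, State, and Action Spaces
    • The design of the observation, state, and action spaces affects the complexity of the problem
      • How many dimensions each has, and whether actions are discrete or continuous
    • The state space is usually designed with more dimensions than the true state
    • For many tasks it is unclear how many dimensions the state should have (e.g. Atari)
    Method | Environment | Observation type | Observation dimensions | State dimensions | Actions
    Robotic Priors [Jonschkowski+ 2015] | slot car racing | image | 16×16×3 | 2 | discrete (25)
    E2C [Watter+ 2015] | cart-pole | image | 80×80×3 | 8 | discrete
    ICM [Pathak+ 2017] | Mario Bros. | image | 42×42×3 | 2 | discrete (14)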

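  32. Evaluation Metrics for State Representations
    How do we evaluate the quality of a state representation?
    • Have the agent actually solve reinforcement learning tasks and check whether the representation generalizes well enough to transfer between tasks
      • The most common method, but experiments are expensive
      • It is unclear which reinforcement learning algorithm should be used for the evaluation
    • So an intermediate way to evaluate whether a learned state representation is good is desirable
    • Use nearest neighbors
      • Qualitative evaluation
      • Quantitative evaluation (KNN-MSE [Lesort+ 2017])
    $\mathrm{KNN\text{-}MSE}(s) = \frac{1}{k} \sum_{s' \in \mathrm{KNN}(s, k)} \left\| \tilde{s} - \tilde{s}' \right\|^2$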
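The KNN-MSE formula above looks up neighbors in the learned state space and measures how far apart those neighbors are in the true state space. The following is a minimal brute-force sketch, assuming small lists of states and a squared Euclidean distance; the function names and example data are hypothetical.

```python
def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def knn_mse(index, learned_states, true_states, k):
    """Average squared true-state distance to the k nearest neighbors,
    where neighbors are found in the *learned* state space."""
    others = [j for j in range(len(learned_states)) if j != index]
    neighbors = sorted(
        others,
        key=lambda j: sq_dist(learned_states[index], learned_states[j]),
    )[:k]
    return sum(sq_dist(true_states[index], true_states[j]) for j in neighbors) / k

learned = [[0.0], [0.1], [0.2], [5.0]]
true = [[0.0], [1.0], [2.0], [9.0]]
score = knn_mse(0, learned, true, k=2)  # neighbors of sample 0 are samples 1 and 2
```

A low score means that points which are close in the learned representation are also close in the ground-truth state, which is the property the metric is meant to check.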

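  33. Evaluation Metrics for State Representations
    How do we evaluate the quality of a state representation?
    • Check whether the representation is disentangled
      • Disentangled metric score [Higgins+ 2016]
      • Assumes the generative factors behind the data are known
      • Uses the accuracy of a low-capacity classifier with a small VC dimension
    • Build a regression model to the true state [Jonschkowski+ 2015]
      • Evaluate its accuracy on a test set

  34. Evaluation Metrics for State Representations
    How do we evaluate the quality of a state representation?

  35. Tasks Used for Evaluation
    Standard tasks in SRL
    • Pendulum / inverted pendulum
      • Swing up a pendulum that starts from a random position
    • Cart-Pole
      • Balance an inverted pendulum mounted on a cart
      • The episode ends when the pole tilts 15° from vertical or the cart moves 2.4 units from the center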
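The Cart-Pole termination rule above is easy to state as a predicate. The following is a minimal sketch using the thresholds quoted on the slide (15° and 2.4 units); the function name and argument conventions are assumptions, and actual simulator implementations may use different limits.

```python
import math

def episode_done(cart_position, pole_angle_rad,
                 position_limit=2.4, angle_limit_deg=15.0):
    """True when the cart leaves the track or the pole tilts too far."""
    too_far = abs(cart_position) > position_limit
    too_tilted = abs(math.degrees(pole_angle_rad)) > angle_limit_deg
    return too_far or too_tilted

still_running = episode_done(0.0, 0.0)
off_track = episode_done(2.5, 0.0)
fallen = episode_done(0.0, math.radians(20.0))
```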

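  36. Tasks Used for Evaluation
    Standard tasks in SRL
    • Video games
      • Examples: Atari, Doom, Super Mario Bros.
    • Physics simulators
      • Examples: OpenAI Gym, DeepMind Lab
    • Real robots
      • Examples: manipulation [Finn+ 2015], button pressing [Lesort+ 2015], grasping [Finn+ 2015]

  37. S-RL Toolbox
    A tool that addresses various issues in evaluating SRL algorithms [Raffin+ 2018]
    • https://github.com/araffin/robotics-rl-srl
    • A wide range of features
      • 10 reinforcement learning algorithms
      • Evaluation environments with an OpenAI Gym style interface
      • Logging and visualization tools
      • Hyperparameter search tools
      • Datasets collected on a real Baxter robot
    • A collection of SRL implementations is also included as SRL-Zoo
      • https://github.com/araffin/srl-zoo
      • Conveniently, it is written in PyTorch

  38. Closing Part 1

  39. Thoughts
    • Is there research that evaluates how much uncertainty a state representation carries?
      • For example, the uncertainty of the state representation should differ between seeing only the first frame and seeing 20 consecutive frames
      • If a policy could reflect that uncertainty, might it also lead to efficient exploration?
    • A MAML-like approach may be effective: run SRL on many tasks, learn good SRL parameters, then adapt to a new task few-shot
      • After all, wasn't the motivation for SRL to learn a representation that can be shared across many tasks?
      • (Admittedly, experiments are expensive, so it is understandable that papers avoid reinforcement learning across many domains...)

  40. Discussion
    On learning the world model and learning the policy
    • How should the policy be learned when the world model is imperfect?
      • Ensembling models is one approach
    On the problem setting of representation learning itself
    • In the end, even when the true downstream task is unknown, one assumes that some good representation exists, and so arrives at the representation learning problem
      • This corresponds to the concept of meta-priors [Bengio+ 2013]
    • Behind this difficulty there seems to be an assumption in the problem setting itself, namely that tasks arrive as discrete units

  41. Appendix to Part 1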

  42. References
    [Agrawal+ 2016] Pulkit Agrawal, Ashvin Nair, Pieter Abbeel, Jitendra Malik, Sergey Levine (2016). Learning to Poke by Poking:
    Experiential Learning of Intuitive Physics. https://arxiv.org/abs/1606.07419
    [Bengio+ 2013] Y. Bengio, A. Courville, and P. Vincent, “Representation learning: A review and new perspectives,” IEEE Transactions on
    Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798–1828, 2013. https://ieeexplore.ieee.org/document/6472238
    [Böhmer+ 2015] Böhmer, W., Springenberg, J. T., Boedecker, J., Riedmiller, M., and Obermayer, K. (2015). Autonomous learning of state
    representations for control: An emerging field aims to autonomously learn state representations for reinforcement learning agents from
    their real-world sensor observations. KI - Künstliche Intelligenz, pages 1–10. http://www.ni.tu-berlin.de/fileadmin/fg215/articles/
    boehmer15b.pdf
    [Curran+ 2015] William Curran, Tim Brys, Matthew Taylor, William Smart (2015). Using PCA to Efficiently Represent State Spaces. https://
    arxiv.org/abs/1505.00322
    [Finn+ 2015] Chelsea Finn, Xin Yu Tan, Yan Duan, Trevor Darrell, Sergey Levine, Pieter Abbeel (2015). Deep Spatial Autoencoders for
    Visuomotor Learning. https://arxiv.org/abs/1509.06113
    [Ha+ 2018] David Ha, Jürgen Schmidhuber (2018). World Models. https://arxiv.org/abs/1803.10122
    [Higgins+ 2016] Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir
    Mohamed, Alexander Lerchner (2016). beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework. https://
    openreview.net/forum?id=Sy2fzU9gl
    [Jonschkowski+ 2015] Jonschkowski, R. and Brock, O. (2015). Learning state representations with robotic priors. Auton. Robots, 39(3):
    407–428. http://www.robotics.tu-berlin.de/fileadmin/fg170/Publikationen_pdf/Jonschkowski-15-AURO.pdf
    [Jonschkowski+ 2017] Rico Jonschkowski, Roland Hafner, Jonathan Scholz, Martin Riedmiller (2017). PVEs: Position-Velocity Encoders
    for Unsupervised Learning of Structured State Representations. https://arxiv.org/abs/1705.09805
    [Karl+ 2016] Maximilian Karl, Maximilian Soelch, Justin Bayer, Patrick van der Smagt. Deep Variational Bayes Filters: Unsupervised
    Learning of State Space Models from Raw Data. https://arxiv.org/abs/1605.06432

  43. References
    [Kurutach+ 2018] Thanard Kurutach, Aviv Tamar, Ge Yang, Stuart Russell, Pieter Abbeel (2018). Learning Plannable
    Representations with Causal InfoGAN. https://arxiv.org/abs/1807.09341
    [Lake+ 2016] Brenden M. Lake, Tomer D. Ullman, Joshua B. Tenenbaum, Samuel J. Gershman (2016). Building Machines That Learn and
    Think Like People. https://arxiv.org/abs/1604.00289
    [Oh+ 2017] Junhyuk Oh, Satinder Singh, Honglak Lee (2017). Value Prediction Network. https://arxiv.org/abs/1707.03497
    [Pathak+ 2017] Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, Trevor Darrell (2017). Curiosity-driven Exploration by Self-
    supervised Prediction. https://arxiv.org/abs/1705.05363
    [Raffin+ 2018] Antonin Raffin, Ashley Hill, René Traoré, Timothée Lesort, Natalia Díaz-Rodríguez, David Filliat (2018). S-RL
    Toolbox: Environments, Datasets and Evaluation Metrics for State Representation Learning. https://arxiv.org/abs/1809.09369
    [Lesort+ 2017] Timothée Lesort, Mathieu Seurin, Xinrui Li, Natalia Díaz Rodríguez, David Filliat (2017). Unsupervised state
    representation learning with robotic priors: a robustness benchmark. https://arxiv.org/abs/1709.05185
    [Mattner+ 2012] Mattner, J., Lange, S., and Riedmiller, M. A. (2012). Learn to swing up and balance a real pole based on raw
    visual input data. In Neural Information Processing - 19th International Conference, ICONIP 2012, Doha, Qatar, November
    12-15, 2012, Proceedings, Part V, pages 126–133. https://ieeexplore.ieee.org/document/7759578
    [van Hoof+ 2016] van Hoof, H., Chen, N., Karl, M., van der Smagt, P., and Peters, J. (2016). Stable reinforcement learning with
    autoencoders for tactile and visual data. In 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS),
    pages 3928–3934. https://ieeexplore.ieee.org/document/7759578/
    [Watter+ 2015] Manuel Watter, Jost Tobias Springenberg, Joschka Boedecker, Martin Riedmiller (2015). Embed to Control: A
    Locally Linear Latent Dynamics Model for Control from Raw Images. https://arxiv.org/abs/1506.07365

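  44. Q&A and Discussion
    Break

  45. Part 2: Hands-on with the Deep Generative Model Library Pixyz

  46. Deep Generative Models

  47. Generative Models
    Generative models
    • An approach that models the distribution of the data
    • Sampling from the model generates artificial data points

  48. Generative Models in Deep Learning
    Deep generative model (DGM)
    • Uses neural networks for the distributions
    • VAEs and GANs are the best known
    • In particular, VAEs are widely used to learn low-dimensional representations (state representations) of the sequence of observations so far (Part 1)
    [Figures: VAE and GAN diagrams. Source: [Tschannen+ 2018]]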

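  49. VAE
    Variational Autoencoder (VAE) [Kingma+ 2014]
    • To learn a latent variable model, the goal is to maximize the log-likelihood of the training data
    $\mathcal{L}_{VAE}(\theta, \phi) = \mathbb{E}_{\hat{p}(x)}\left[\mathbb{E}_{q_\phi(z|x)}\left[-\log p_\theta(x|z)\right]\right] + \mathbb{E}_{\hat{p}(x)}\left[D_{KL}\left(q_\phi(z|x) \,\|\, p(z)\right)\right]$
    (first term: reconstruction; second term: KL term)
    $\mathbb{E}_{\hat{p}(x)}\left[-\log p_\theta(x)\right] = \mathcal{L}_{VAE}(\theta, \phi) - \mathbb{E}_{\hat{p}(x)}\left[D_{KL}\left(q_\phi(z|x) \,\|\, p_\theta(z|x)\right)\right]$
    • Since the KL divergence is non-negative, $-\mathcal{L}_{VAE}$ is a lower bound (ELBO) on the log-likelihood $\mathbb{E}_{\hat{p}(x)}\left[\log p_\theta(x)\right]$
    • So it suffices to maximize the ELBO (that is, to minimize the VAE loss $\mathcal{L}_{VAE}$)
    ※ The expectation over the empirical data distribution $\hat{p}(x)$ is written explicitly, which looks slightly unfamiliar, but this is the ordinary VAE ELBO
    Source: [Tschannen+ 2018]

  50. VAE
    The VAE loss
    $\mathcal{L}_{VAE}(\theta, \phi) = \mathbb{E}_{\hat{p}(x)}\left[\mathbb{E}_{q_\phi(z|x)}\left[-\log p_\theta(x|z)\right]\right] + \mathbb{E}_{\hat{p}(x)}\left[D_{KL}\left(q_\phi(z|x) \,\|\, p(z)\right)\right]$
    (first term: reconstruction; second term: KL term)
    • The first term uses samples $z^{(i)} \sim q_\phi(z|x^{(i)})$, and gradients are backpropagated with the reparametrization trick
    • The second term is either computed in closed form or estimated from samples
      • It can be computed in closed form when the encoder is $q_\phi(z|x) = \mathcal{N}\left(\mu_\phi(x), \mathrm{diag}(\sigma_\phi(x))\right)$ and the prior is $p(z) = \mathcal{N}(0, I)$
      • Otherwise, the distance between the distributions has to be estimated from samples
        Example: the density ratio trick used in GANs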
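The two ingredients named above, the closed-form KL term for a diagonal Gaussian encoder against a standard normal prior and the reparametrization trick, can be written in a few lines. The following is a minimal sketch in plain Python, assuming `sigma` holds standard deviations (not variances); the function names are illustrative.

```python
import math
import random

def kl_diag_gaussian_to_standard(mu, sigma):
    """KL( N(mu, diag(sigma^2)) || N(0, I) )
    = 0.5 * sum( sigma^2 + mu^2 - 1 - log sigma^2 )."""
    return 0.5 * sum(s * s + m * m - 1.0 - math.log(s * s)
                     for m, s in zip(mu, sigma))

def reparameterize(mu, sigma, rng):
    """z = mu + sigma * eps with eps ~ N(0, I).
    The randomness sits in eps, so z is a deterministic, differentiable
    function of (mu, sigma)."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]

kl_same = kl_diag_gaussian_to_standard([0.0, 0.0], [1.0, 1.0])   # identical distributions
kl_shifted = kl_diag_gaussian_to_standard([1.0], [1.0])          # mean shifted by 1
z = reparameterize([0.0, 1.0], [1.0, 0.5], random.Random(0))
```

With an automatic differentiation framework, the same `mu + sigma * eps` expression lets gradients of the reconstruction term flow back into the encoder outputs.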

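  51. Density Ratio Estimation by Adversarial Learning
    f-divergence
    • Let $f$ be a convex function with $f(1) = 0$. The f-divergence between $p_x$ and $p_y$ is defined as
      $D_f(p_x \,\|\, p_y) = \int f\left(\frac{p_x(x)}{p_y(x)}\right) p_y(x)\, dx$
    • With $f(t) = t \log t$ it becomes the KL divergence: $D_f(p_x \,\|\, p_y) = D_{KL}(p_x \,\|\, p_y)$
    • Given samples from $p_x$ and $p_y$, the f-divergence can be estimated with the density-ratio trick
      • This became widely known through GANs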
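The claim that $f(t) = t \log t$ recovers the KL divergence can be checked numerically. The following is a minimal sketch for discrete distributions, where the integral in the definition becomes a sum; the two example distributions are arbitrary.

```python
import math

def f_divergence(p, q, f):
    """Discrete analogue of D_f(p || q) = sum_x f(p(x) / q(x)) * q(x)."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

def kl_divergence(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def f_kl(t):
    """Generator of the KL divergence; note f_kl(1) == 0."""
    return t * math.log(t)

p = [0.5, 0.5]
q = [0.9, 0.1]
d_f = f_divergence(p, q, f_kl)
d_kl = kl_divergence(p, q)
```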

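  52. Density Ratio Estimation by Adversarial Learning
    Estimating the KL divergence with the density-ratio trick used in GANs
    • Express $p_x$ and $p_y$ as distributions conditioned on a label $c \in \{0, 1\}$
      • That is, $p_x(x) = p(x|c = 1)$ and $p_y(x) = p(x|c = 0)$
    • This reduces to a binary classification task: the discriminator $S_\eta$ predicts the probability that its input was drawn from $p_x(x)$
    • Assuming equal class probabilities, the density ratio is
      $\frac{p_x(x)}{p_y(x)} = \frac{p(x|c = 1)}{p(x|c = 0)} = \frac{p(c = 1|x)}{p(c = 0|x)} \approx \frac{S_\eta(x)}{1 - S_\eta(x)}$
    • Therefore, given $N$ i.i.d. samples from $p_x$,
      $D_{KL}(p_x \,\|\, p_y) = \int p_x(x) \log\left(\frac{p_x(x)}{p_y(x)}\right) dx \approx \frac{1}{N} \sum_{i=1}^{N} \log\left(\frac{S_\eta(x^{(i)})}{1 - S_\eta(x^{(i)})}\right)$
    Source: [Tschannen+ 2018]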
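The estimator above can be tried on a case where the answer is known. The following is a minimal sketch, assuming $p_x = \mathcal{N}(1, 1)$ and $p_y = \mathcal{N}(0, 1)$, and substituting the Bayes-optimal discriminator for a trained $S_\eta$; in a real GAN setting the discriminator would be a learned classifier and the estimate would only be as good as that classifier.

```python
import math
import random

def optimal_discriminator(x, mu_x=1.0, mu_y=0.0):
    """Bayes-optimal S(x) = p_x(x) / (p_x(x) + p_y(x)) for unit-variance
    Gaussians with equal class priors."""
    log_ratio = -0.5 * (x - mu_x) ** 2 + 0.5 * (x - mu_y) ** 2
    return 1.0 / (1.0 + math.exp(-log_ratio))

def kl_estimate(samples, discriminator):
    """(1/N) * sum_i log( S(x_i) / (1 - S(x_i)) ), with x_i drawn from p_x."""
    total = 0.0
    for x in samples:
        s = discriminator(x)
        total += math.log(s / (1.0 - s))
    return total / len(samples)

rng = random.Random(0)
samples = [rng.gauss(1.0, 1.0) for _ in range(20000)]  # draws from p_x
estimate = kl_estimate(samples, optimal_discriminator)
exact = 0.5  # KL(N(1,1) || N(0,1)) = (mu_x - mu_y)^2 / 2
```

The Monte Carlo estimate approaches the exact value as the number of samples grows.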

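  53. The Deep Generative Model Library Pixyz

  54. What Is Pixyz?
    Pixyz
    • A PyTorch-based library specialized for implementing and using complex deep generative models easily
    • Repository: https://github.com/masa-su/pixyz
    • Documentation: https://docs.pixyz.io
    • Developed by Suzuki-san of Matsuo Lab, The University of Tokyo
      • One of the organizers of the Reinforcement Learning Architecture Study Group
    • As a library for describing deep generative models, it is named with the joint distribution $P(x, y, z)$ of random variables $x, y, z$ in mind

  55. The Three APIs of Pixyz
    A layered structure built from three kinds of API
    • Because the APIs do not interfere with each other, networks, distributions, and objectives can be composed and changed freely
    • Existing probabilistic modeling languages required describing the probability distributions and the networks together
      • Example: Edward

  56. 1. Distribution API
    The API for probability distributions
    • Define a network by inheriting from the Distribution class
      • Written almost the same way as the distributions in torch.distributions
    • The factorization of a joint distribution can be written directly as a product of distributions
      • A distribution built as a product of distributions also supports sampling and likelihood computation, like any other distribution

  57. 2. Loss API
    Computes loss functions and lower bounds from Distribution classes
    • Calling the estimate method with data as the argument evaluates the value (define-and-run)
    • Many losses are already defined
      • Examples: negative log-likelihood (NLL), KL divergence (KullbackLeibler), ...
    • Arithmetic between losses is supported, so complex deep generative models are easy to describe
    Example: the M2 model [Kingma+ 2014]
    $-\sum_{x, y \sim p_{data}(x, y)} \left[ \mathbb{E}_{q(z|x,y)}\left[\log \frac{p(x, z|y)}{q(z|x, y)}\right] + \alpha \log q(y|x) \right] - \sum_{x_u \sim p_{data}(x_u)} \mathbb{E}_{q(z|x_u, y)\, q(y|x_u)}\left[\log \frac{p(x_u, z|y)}{q(z|x_u, y)\, q(y|x_u)}\right]$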
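The idea of building a loss as a symbolic expression first and evaluating it on data later can be illustrated without the library. The following is a toy sketch in plain Python; the `Term` class and its `evaluate` method are hypothetical and are not Pixyz's classes or method names, and the dictionary of precomputed values stands in for what a real library would compute from distributions and data.

```python
class Term:
    """A loss term that carries both a formula string and a way to evaluate it."""

    def __init__(self, fn, text):
        self.fn = fn        # callable: data dict -> float
        self.text = text    # human-readable formula

    def __add__(self, other):
        return Term(lambda d: self.fn(d) + other.fn(d),
                    f"({self.text} + {other.text})")

    def __rmul__(self, coefficient):
        return Term(lambda d: coefficient * self.fn(d),
                    f"{coefficient} * {self.text}")

    def evaluate(self, data):
        return self.fn(data)

    def __str__(self):
        return self.text

# Define the objective first ...
reconstruction = Term(lambda d: d["recon"], "E_q[-log p(x|z)]")
kl_term = Term(lambda d: d["kl"], "KL[q(z|x) || p(z)]")
loss = reconstruction + 2.0 * kl_term

# ... inspect the formula, then evaluate it on data.
formula = str(loss)
value = loss.evaluate({"recon": 1.5, "kl": 0.25})
```

Printing the formula before running anything is what makes it possible to compare an implemented loss against the equation in a paper.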

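  58. 3. Model API
    Define a model by passing a loss and an optimizer to the Model class
    • Train with the train method, evaluate with the test method
    Ready-made models are also provided
    • For people who want a quick implementation with a simple model
    • Examples: VAE, GAN, variational inference (VI), maximum likelihood (ML) https://docs.pixyz.io/en/latest/models.html

  59. Pixyz Hands-on

  60. 1. Installation
    Prerequisite: PyTorch is installed
    • See https://pytorch.org/get-started/locally/
    • In most cases the following should be enough:
      pip install torch torchvision
    1) Clone the Pixyz GitHub repository
      git clone https://github.com/masa-su/pixyz.git
    2) pip install
      pip install -e pixyz
    • Once the version stabilizes, the plan is to register it on PyPI (so that git clone is no longer needed)

  61. 2. Trying It Out
    The basic flow of an implementation with Pixyz
    1. Define the distributions
      • A product of distributions can itself be treated as a distribution!
    2. Define the objective and the model
      • There are three ways to write it: the Model API, the Loss API, and the Distribution API
      • Arithmetic between losses is supported!
    3. Train
      • When inheriting from the Model class, model.train() is all it takes!

  62. Today's Tutorial Material
    Prepared in the spirit of "practice makes perfect"
    • https://github.com/TMats/rlarch-pixyz-tutorial
      • 00: Probability distributions in Pixyz
      • 01: Implementing a vanilla VAE [Kingma+ 2014] with the VAE class of the Model API
      • 02: Implementing a more complex deep generative model with the Loss API
        • The M2 model [Kingma+ 2014]
    • This is the first time this material is being used, so questions and comments are very welcome for future reference

  63. What Is Nice About Pixyz
    It takes the best of both define-by-run and define-and-run
    • A library for those who think: "I want to evaluate networks flexibly as in PyTorch, but I want to fix the loss up front and check from the formula that what I wrote is correct"
      • This works because the layer for writing networks is separated from the layer for writing probability distributions
    • As a result, the loss formula in a paper can be implemented almost by copying it as written
    • In experiments, problems in the network and problems in the loss can be investigated separately
    With the Loss API, different deep generative models can be mixed within a single framework
    • Example: the sum of a GAN loss and a VAE loss can be taken

  64. Pixyzoo
    Pixyzoo
    • A repository of deep generative model implementations that use Pixyz
      • Modeled, of course, on things like the GAN Zoo
      • https://github.com/masa-su/pixyzoo
    • Currently contains GQN, VIB, FactorVAE, and others
      • More are to be added over time
    • Pull requests are very welcome
      • Fork the pixyzoo repository and send a pull request

  65. Appendix to Part 2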

  66. References
    [Kingma+ 2014] Diederik P. Kingma, Danilo J. Rezende, Shakir Mohamed, Max Welling. Semi-Supervised Learning with Deep
    Generative Models. https://arxiv.org/abs/1406.5298
    [Tschannen+ 2018] Michael Tschannen, Olivier Bachem, Mario Lucic (2018). Recent Advances in Autoencoder-Based
    Representation Learning. https://arxiv.org/abs/1812.05069

  67. Q&A and Discussion
    Break

  68. Part 3: Recent World Model Research: GQN and TD-VAE

  69. Generative Query Network (GQN)

  70. What Is GQN?
    Neural scene representation and rendering [Eslami+ 2018]
    • S. M. Ali Eslami, Danilo J. Rezende, et al., Science
      • Note that the main Science article is of no help for implementation; read the Supplemental
    • Proposes the Generative Query Network (GQN), which generates the image seen from a new viewpoint from images taken at several viewpoints
      https://www.youtube.com/watch?…
    • It would be valuable if a state representation that does not depend on the viewpoint position could be acquired (the reason for the interest here)
    • Uses a very large conditional VAE

  71. What Is GQN?
    Neural scene representation and rendering
    https://deepmind.com/blog/neural-scene-representation-and-rendering/#gif-207

  72. Detailed Material on GQN
    [DL Reading Group] GQN and related work, and its relation to world models
    • Suzuki-san, Matsuo Lab, The University of Tokyo
      https://www.slideshare.net/DeepLearningJP2016/dlgqn-111725780
    A commentary on "Neural scene representation and rendering"
    • Kaneko-san (master's student), Aizawa Lab, The University of Tokyo
      https://www.slideshare.net/MasayaKaneko/…
    [DLHacks] Implementing the Generative Query Network with PyTorch and Pixyz
    • Taniguchi-san (B4), Matsuo Lab, The University of Tokyo
      https://www.slideshare.net/DeepLearningJP2016/dlhackspytorch-pixyzgenerative-query-network-126329901

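  73. The GQN Problem Setting
    Dataset
    • For each of $N$ scenes (environments), $K$ pairs of a coordinate and the RGB image taken from that coordinate:
      $\{(x_i^k, v_i^k)\}$ $(i \in \{1, \dots, N\},\ k \in \{1, \dots, K\})$
      • $x_i^k$: the $k$-th RGB image of the $i$-th scene
      • $v_i^k$: the $k$-th viewpoint of the $i$-th scene
    Problem setting
    • Given $M$ observations (the context) $x_i^{1,\dots,M}, v_i^{1,\dots,M}$ and an arbitrary viewpoint (the query) $v_i^q$,
      predict the corresponding RGB image $x_i^q$
    • A finite number of two-dimensional (image) observations does not determine the answer, so it cannot be predicted deterministically
    • It is solved as a probabilistic model (a deep generative model) conditioned on the context

  74. Prerequisite: Conditional VAE
    Conditional VAE [Sohn+ 2015]
    • A VAE conditioned on arbitrary information $y$
    • Modeling the prior as $p(z|y)$ makes it possible to infer $z$ directly at test time
    [Graphical model: inference model $q(z|x, y)$, generative model $p(x|z, y)$, prior $p(z|y)$]
    Variational lower bound (ELBO, the negative loss):
    $\log p(x|y) \geq \mathbb{E}_{q(z|x,y)}\left[\log \frac{p(x|z, y)\, p(z|y)}{q(z|x, y)}\right] = \mathbb{E}_{q(z|x,y)}\left[\log p(x|z, y)\right] - KL\left[q(z|x, y) \,\|\, p(z|y)\right]$
    The goal is to maximize the log-likelihood, which is done by maximizing the ELBO
    • There is also a version that uses a prior $p(z)$ that does not depend on $y$ [Kingma+ 2014]
      • The M2 model from the Pixyz tutorial follows this pattern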

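  75. Generative Query Network
    The GQN model
    • A conditional VAE conditioned on the context $r = f(x_i^{1,\dots,M}, v_i^{1,\dots,M})$ and the query $v_i^q$
    • $f$ is a deterministic transformation (the representation network)
    Interpretation as a graphical model: context $r$, query $v^q$, latent variable $z$, prior $\pi(z|v^q, r)$, encoder $q(z|x^q, v^q, r)$, decoder $g(x^q|z, v^q, r)$
    Variational lower bound (ELBO, the negative loss):
    $\log p(x^q|v^q, r) \geq \mathbb{E}_{q(z|x^q, v^q, r)}\left[\log \frac{g(x^q|z, v^q, r)\, \pi(z|v^q, r)}{q(z|x^q, v^q, r)}\right] = \mathbb{E}_{q(z|x^q, v^q, r)}\left[\log g(x^q|z, v^q, r)\right] - KL\left[q(z|x^q, v^q, r) \,\|\, \pi(z|v^q, r)\right]$
    The goal is to maximize the log-likelihood, which is done by maximizing the ELBO

  76. Generative Query Network
    Variational lower bound (the negative loss)
    $\mathbb{E}_{q(z|x^q, v^q, r)}\left[\log g(x^q|z, v^q, r)\right] - KL\left[q(z|x^q, v^q, r) \,\|\, \pi(z|v^q, r)\right]$
    Reconstruction term
    • The encoder infers the latent variable $z$ from the context $r$, the query $v^q$, and the corresponding image $x^q$.
      Training makes the image $x^q$ for the query reconstructable from $z$
      → the latent variable $z$ should come to hold some representation of the scene as a whole
    KL term
    • Training brings the encoder and the prior close together
      → at test time, using the prior, the latent variable $z$ can be inferred from the context $r$ and the query $v^q$ alone,
        even without the true image $x^q$ for the query

  77. Architectural Details: The Representation Network
    The representation network summarizes the $M$ observations $x_i^{1,\dots,M}, v_i^{1,\dots,M}$ into a single context $r$
    • Each image-coordinate pair is transformed separately, and the results are averaged (summed):
      $r^k = \psi(x^k, v^k)$, $r = \sum_{k=1}^{M} r^k$
    Aggregating over viewpoints gives a representation that does not depend on the order of the viewpoints (permutation invariant)
    • The number of viewpoints used for the context can be chosen freely
    • Modeling this with an RNN would make the order matter
    Is it really acceptable to take a sum (average)?
    • This has apparently been discussed recently [Wagstaff+ 2019]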
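The permutation invariance of the summed context $r = \sum_k \psi(x^k, v^k)$ can be checked on toy data. The following is a minimal sketch in which `psi` is a hypothetical element-wise stand-in for the per-view network (in GQN it is a convolutional network), and images and viewpoints are short lists of floats.

```python
def psi(image, viewpoint):
    """Hypothetical per-view feature: element-wise sum of image and viewpoint."""
    return [x + v for x, v in zip(image, viewpoint)]

def aggregate(views):
    """r = sum_k psi(x^k, v^k); works for any number of views."""
    features = [psi(image, viewpoint) for image, viewpoint in views]
    return [sum(column) for column in zip(*features)]

views = [
    ([1.0, 2.0], [0.5, 0.5]),
    ([3.0, 4.0], [1.0, 0.0]),
    ([0.0, 1.0], [0.5, 1.5]),
]
r = aggregate(views)
r_reversed = aggregate(list(reversed(views)))
r_two_views = aggregate(views[:2])  # fewer context views still gives a valid r
```

Because addition is commutative, the order of the context views does not change `r`, and the same function accepts contexts of different sizes.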

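  78. Architectural Details: Using DRAW
    DRAW [Gregor+ 2015]
    • Inference of the latent variable $z$ in a VAE is split into several steps with an RNN and carried out autoregressively, which increases the expressive power of the model:
      $q(z|x) = \prod_{l=1}^{L} q_l(z_l \mid x, z_{<l})$
    • The ELBO for this case is derived in Taniguchi-san's slides (p. 11-13)
      https://www.slideshare.net/DeepLearningJP2016/dlhackspytorch-pixyzgenerative-query-network-126329901
      • In short, it is the reconstruction likelihood plus the sum of the KL terms of each step
    • Used for both the prior and the encoder
      Prior: $\pi_\theta(z|v^q, r) = \prod_{l=1}^{L} \pi_{\theta_l}(z_l \mid v^q, r, z_{<l})$
      Encoder: $q_\phi(z|x^q, v^q, r) = \prod_{l=1}^{L} q_{\phi_l}(z_l \mid x^q, v^q, r, z_{<l})$

  79. Experimental Results in the Paper
    The Room dataset
    • A random number (1 to 3) of various objects placed in a random square room
      • Wall textures: 5 kinds; floor textures: 3 kinds; object shapes: 7 kinds
      • Size, position, and color are random. Lighting is also random
    • 20,000 different scenes are rendered
    • The dataset (and only the dataset) is public
      • Besides Room there are several other datasets
        https://github.com/deepmind/gqn-datasets
    • Qualitatively, images at new viewpoints are predicted well (figure on the right)

  80. Experimental Results in the Paper
    Scene algebra
    • Using the trained network, addition and subtraction are performed on the context $r$
    • As with word2vec, the predictions follow the arithmetic, which suggests a compositional representation
      • How compositional it really is remains an open question...
      • It may work because the variation across the whole dataset is relatively simple

  81. Implementation with Pixyz
    Available in Pixyzoo: https://github.com/masa-su/pixyzoo/tree/master/GQN
    • Implemented by Taniguchi-san (B4), Matsuo Lab, The University of Tokyo
    • Hyperparameters were obtained directly from Eslami-san (the first author)
      • DeepMind generally does not release implementations, so this is probably the most faithful implementation available
    • The paper reports using four K80 (24GB) GPUs
      • Checked locally: it barely fits on four TitanX (12GB) GPUs
      • Reducing the number of parameters does not affect the results much
    • The authors (Eslami-san and Rezende-san) also kindly shared it
      • Pixyz's DeepMind debut (?)
    https://twitter.com/arkitus/status/1072845916850274304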

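  82. Implementation with Pixyz
    Defining the distributions
    • Prior and decoder: generaton.py
    • Encoder: inference.py

  83. Implementation with Pixyz
    Defining the loss and the model
    • model.py
    • Instances of the distributions
    • DRAW

  84. Implementation with Pixyz
    Currently, in the DRAW part, the loss is evaluated step by step inside a for loop
    Q. Can't this be written more cleanly?
    A. The next version (0.0.5) is planned to support autoregressive models
    • It is expected to be merged into master soon
    • The code should become much more concise (the TD-VAE implementation introduced next already uses it)
    Even so, the implementations of the networks and of the probability distributions are separated, so the strengths of Pixyz still show

  85. Discussion
    What is actually acquired as the latent representation? (from the viewpoint of state representation learning)
    • Representations of objects, of the scene itself, and of the relation between viewpoints and observations
    • How can these be extracted and used?
    • What happens when it is transferred to the real world?
      • A project that trains GQN on images taken in the real world
        https://github.com/brettgohre/still_life_rendering_gqn
    How can it be used for deciding an agent's actions (reinforcement learning)?
    • What applications are possible?
    The meta-learning context
    • How to define a task and what to condition on are becoming the key questions

  86. Temporal Difference Variational Auto-Encoder (TD-VAE)

  87. What Is TD-VAE?
    Temporal Difference Variational Auto-Encoder [Gregor+]
    • Karol Gregor, George Papamakarios, et al., ICLR (Oral)
    • Proposes a deep generative model for sequences
    • In deep generative models for sequences, inference has mostly been done step by step
      (the forward models of Part 1), whereas TD-VAE can jump to an arbitrary step and infer there
    • Could this be used to work on temporal abstraction of time series? (the reason for the interest here)
    • The keys are the introduction of a "belief state" using an RNN and the way the ELBO is decomposed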

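  88. Detailed Material on TD-VAE
    There is not much compared with GQN...
    [DL Reading Group] Temporal Difference Variational Auto-Encoder
    • Suzuki-san, Matsuo Lab, The University of Tokyo
      https://www.slideshare.net/DeepLearningJP2016/…

  89. What Kind of State Representation Is Desirable?
    Properties that an agent's state representation should have, as stated in the paper
    • It learns an abstract state representation of the data and can predict at the level of states rather than observations
    • Given all observations up to some time, it can learn a belief state that deterministically encodes the filtering distribution of the state
      • The belief state contains all the information the agent has about the state of the world, and hence about how to act optimally
    • It predicts a jumped-ahead future several steps away.
      By learning from temporally separated time points without backpropagating through the whole time series, it achieves temporal abstraction
    TD-VAE is proposed as a model that satisfies these properties

  90. Prerequisite: Autoregressive Models
    Autoregressive model
    • A way of modeling sequence data $x = (x_1, \dots, x_T)$
    • The likelihood is factorized into a product of conditional distributions with the chain rule (shown here after taking the logarithm of both sides):
      $\log p(x_1, \dots, x_T) = \sum_t \log p(x_t \mid x_1, \dots, x_{t-1})$
    • It can be implemented with an RNN: $h_t = f(h_{t-1}, x_t)$
    • Problems
      • It predicts only in observation space, so it does not learn a compressed representation of the data
      • It decodes and encodes at every step, so the computational cost is high
      • During training the true next-step data is fed in (teacher forcing), but at test time the model's own predictions are fed in, which makes it unstable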
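The chain-rule factorization above can be verified on a small discrete example. The following is a minimal sketch for binary sequences, assuming a hypothetical conditional given by Laplace's rule of succession (the probability of a 1 depends on how many 1s came before); any valid conditional would work, and an RNN would play this role in practice.

```python
import math
from itertools import product

def p_next_is_one(history):
    """Hypothetical conditional p(x_t = 1 | x_1, ..., x_{t-1})."""
    return (1 + sum(history)) / (2 + len(history))

def log_likelihood(xs):
    """log p(x_1..x_T) = sum_t log p(x_t | x_1..x_{t-1})."""
    total = 0.0
    for t, x in enumerate(xs):
        p_one = p_next_is_one(xs[:t])
        total += math.log(p_one if x == 1 else 1.0 - p_one)
    return total

# p(1, 1, 0) = 1/2 * 2/3 * 1/4 = 1/12
prob_110 = math.exp(log_likelihood([1, 1, 0]))

# The factorized probabilities of all length-3 sequences sum to one.
total_prob = sum(math.exp(log_likelihood(seq)) for seq in product([0, 1], repeat=3))
```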

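  91. Prerequisite: State-Space Models
    State-space model
    • A way of modeling sequence data $x = (x_1, \dots, x_T)$ with latent variables (states) $z = (z_1, \dots, z_T)$
    • Joint distribution of $x$ and $z$: $p(x, z) = \prod_t p(z_t|z_{t-1})\, p(x_t|z_t)$
      ($p(z_t|z_{t-1})$ is the state transition, $p(x_t|z_t)$ the decoder)
    • Encoder: $q(z|x) = \prod_t q(z_t \mid z_{t-1}, \phi_t(x))$
      • When $\phi_t$ takes the sequence up to $t$, $(x_1, \dots, x_t)$, as input: filtering
      • When it takes the whole sequence $x$ as input: smoothing
    • Variational lower bound (ELBO, the negative loss):
      $\log p(x) \geq \mathbb{E}_{z \sim q(z|x)}\left[\sum_t \log p(x_t|z_t) + \log p(z_t|z_{t-1}) - \log q(z_t \mid z_{t-1}, \phi_t(x))\right]$
    • Transitions are modeled between states
    • At test time there is no need to decode and encode at every step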
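The factorization $p(x, z) = \prod_t p(z_t|z_{t-1})\, p(x_t|z_t)$ translates directly into a sum of log-densities. The following is a minimal sketch, assuming a one-dimensional linear-Gaussian model with unit variances, a fixed initial state `z0`, and a transition coefficient `a`; all of these are illustrative choices rather than anything specified on the slide.

```python
import math

def log_normal(x, mean, std=1.0):
    """Log-density of N(mean, std^2) at x."""
    return -0.5 * math.log(2.0 * math.pi * std * std) - 0.5 * ((x - mean) / std) ** 2

def joint_log_prob(xs, zs, a=0.9, z0=0.0):
    """log p(x, z) = sum_t [ log p(z_t | z_{t-1}) + log p(x_t | z_t) ]."""
    total, z_prev = 0.0, z0
    for x_t, z_t in zip(xs, zs):
        total += log_normal(z_t, a * z_prev)  # state transition p(z_t | z_{t-1})
        total += log_normal(x_t, z_t)         # decoder p(x_t | z_t)
        z_prev = z_t
    return total

at_mode = joint_log_prob([0.0, 0.0], [0.0, 0.0])
off_mode = joint_log_prob([1.0, 0.0], [0.0, 0.0])
```

Each time step contributes one transition term and one decoder term, which is the same pair of terms that appears inside the ELBO above.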

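  92. Introducing the filtering distribution
    In a state-space model, obtaining the state $z_t$ requires the state of the previous step, $z_{t-1}$.
    • That in turn requires resampling $z_{t-1}, z_{t-2}, \dots, z_1$ one after another
    Introduce the filtering distribution $p(z_t \mid x_1, \dots, x_t)$.
    • It is made to depend only on the observation sequence $(x_1, \dots, x_t)$
    • It corresponds to the belief state in reinforcement learning for POMDPs
    [Figure: graphical models over $x_{t-1}$, $z_{t-1}$, $x_t$, $z_t$, without and with the filtering distribution $p(z_t \mid x_1, \dots, x_t)$]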
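As an illustration of a filtering distribution that depends only on the observations, the sketch below runs the standard forward recursion on a hypothetical 2-state model with binary observations (the tables `INIT`, `TRANS`, and `EMIT` are made-up numbers). TD-VAE itself learns this distribution with a neural network; the discrete case is used here only because it can be checked against brute-force enumeration.

```python
import itertools

# Hypothetical 2-state model with binary observations (illustrative numbers).
INIT = [0.6, 0.4]                     # p(z_1)
TRANS = [[0.9, 0.1], [0.2, 0.8]]      # TRANS[i][j] = p(z_t = j | z_{t-1} = i)
EMIT = [[0.8, 0.2], [0.3, 0.7]]       # EMIT[j][x] = p(x_t = x | z_t = j)


def filtering(xs):
    """Return [p(z_t | x_1..x_t) for each t] using the forward recursion."""
    beliefs, belief = [], None
    for x in xs:
        if belief is None:
            predicted = INIT
        else:
            # Predict: push the previous belief through the transition.
            predicted = [sum(belief[i] * TRANS[i][j] for i in range(2))
                         for j in range(2)]
        # Update: weight by the likelihood of the new observation, then normalize.
        weights = [predicted[j] * EMIT[j][x] for j in range(2)]
        norm = sum(weights)
        belief = [w / norm for w in weights]
        beliefs.append(belief)
    return beliefs


def brute_force_last(xs):
    """p(z_T | x_1..x_T) by summing the joint over all state paths."""
    mass = [0.0, 0.0]
    for zs in itertools.product(range(2), repeat=len(xs)):
        p, prev = 1.0, None
        for x, z in zip(xs, zs):
            p *= (INIT[z] if prev is None else TRANS[prev][z]) * EMIT[z][x]
            prev = z
        mass[zs[-1]] += p
    norm = sum(mass)
    return [m / norm for m in mass]


xs = [0, 1, 1, 0, 1]
beliefs = filtering(xs)
reference = brute_force_last(xs)
```

The recursion never samples or stores earlier states: the current belief is a sufficient summary of the past observations, which is the role the belief state plays in a POMDP.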

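  93. Introducing the filtering distribution
    Introduce the filtering distribution $p(z_t \mid x_1, \dots, x_t)$ and derive the ELBO.
    • With the filtering distribution, the latent variables can be expressed with only the pair $(z_{t-1}, z_t)$
    $$\log p(x) = \sum_t \log p(x_t \mid x_{<t}) = \sum_t \log \int p(x_t \mid z_t)\, p(z_t \mid x_{<t})\, dz_t$$
    Expanding $p(z_t \mid x_{<t}) = \int p(z_t \mid z_{t-1})\, p(z_{t-1} \mid x_{<t})\, dz_{t-1}$ and applying Jensen's inequality:
    $$\log p(x) \ge \sum_t \mathbb{E}_{q(z_t, z_{t-1} \mid x_{\le t})}\left[\log \frac{p(x_t \mid z_t)\, p(z_t \mid z_{t-1})\, p(z_{t-1} \mid x_{<t})}{q(z_t, z_{t-1} \mid x_{\le t})}\right]$$
    $$= \sum_t \mathbb{E}_{q(z_t \mid x_{\le t})\, q(z_{t-1} \mid z_t, x_{\le t})}\big[\log p(x_t \mid z_t) + \log p(z_{t-1} \mid x_{<t}) + \log p(z_t \mid z_{t-1}) - \log q(z_t \mid x_{\le t}) - \log q(z_{t-1} \mid z_t, x_{\le t})\big]$$
    • Decoder: $p(x_t \mid z_t)$
    • Filtering distributions: $p(z_{t-1} \mid x_{<t})$ and $q(z_t \mid x_{\le t})$
    • State transition: $p(z_t \mid z_{t-1})$
    • Encoder: $q(z_{t-1} \mid z_t, x_{\le t})$
    The encoder performs inference backward in time.
    [Figure: graphical model over $x_{t-1}$, $z_{t-1}$, $x_t$, $z_t$]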
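As a side note on how tight the per-step bound is, the standard variational identity can be written out for this factorization. The symbol $\mathcal{L}_t$ is introduced here only for illustration and denotes the term for step $t$ of the bound above.

```latex
\begin{aligned}
\mathcal{L}_t &= \mathbb{E}_{q(z_{t-1}, z_t \mid x_{\le t})}
  \left[ \log \frac{p(x_t \mid z_t)\, p(z_t \mid z_{t-1})\, p(z_{t-1} \mid x_{<t})}
                   {q(z_{t-1}, z_t \mid x_{\le t})} \right], \\
\log p(x_t \mid x_{<t}) - \mathcal{L}_t
  &= \mathrm{KL}\!\left[\, q(z_{t-1}, z_t \mid x_{\le t}) \,\big\|\, p(z_{t-1}, z_t \mid x_{\le t}) \,\right] \;\ge\; 0 .
\end{aligned}
```

The bound is therefore tight exactly when the product of the filtering distribution $q(z_t \mid x_{\le t})$ and the backward encoder $q(z_{t-1} \mid z_t, x_{\le t})$ matches the true posterior over the pair $(z_{t-1}, z_t)$.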

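  94. Implementing the filtering distribution
    TD-VAE implements the filtering distribution with an RNN.
    • Let $b_t$ denote the belief state; the belief state at each step is modeled as $b_t = f(b_{t-1}, x_t)$
    • The belief state $b_t$ can be regarded as containing the information in the past observation sequence $(x_1, \dots, x_t)$
    • The variational lower bound, ELBO (the negative of the loss), then becomes
    $$\mathbb{E}_{p_B(z_t \mid b_t)\, q(z_{t-1} \mid z_t, b_{t-1}, b_t)}\big[\log p(x_t \mid z_t) + \log p_B(z_{t-1} \mid b_{t-1}) + \log p(z_t \mid z_{t-1}) - \log p_B(z_t \mid b_t) - \log q(z_{t-1} \mid z_t, b_{t-1}, b_t)\big]$$
    • Decoder: $p(x_t \mid z_t)$
    • Filtering distributions: $p_B(z_{t-1} \mid b_{t-1})$ and $p_B(z_t \mid b_t)$
    • State transition: $p(z_t \mid z_{t-1})$
    • Encoder: $q(z_{t-1} \mid z_t, b_{t-1}, b_t)$
    [Figure: graphical model over $x_{t-1}$, $z_{t-1}$, $b_{t-1}$, $x_t$, $z_t$, $b_t$]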
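The belief recurrence and the five terms of the one-step bound can be sketched in plain Python. This is a toy under loud assumptions, not the Pixyz or paper implementation: every variable is one-dimensional, every distribution is a Gaussian with a fixed standard deviation, and the weights inside `belief_update` and the encoder mean `q_mean` are made-up numbers.

```python
import math
import random


def log_normal(v, mean, std):
    """Log density of a 1-D Gaussian."""
    return (-0.5 * math.log(2 * math.pi) - math.log(std)
            - 0.5 * ((v - mean) / std) ** 2)


def belief_update(b_prev, x):
    """b_t = f(b_{t-1}, x_t), here a single tanh unit with made-up weights."""
    return math.tanh(0.7 * b_prev + 0.5 * x)


def elbo_terms(x_t, b_prev, b_t, rng):
    """Single-sample estimate of the five terms of the one-step bound."""
    z_t = rng.gauss(b_t, 1.0)                          # z_t ~ p_B(z_t | b_t)
    q_mean = 0.5 * b_prev + 0.25 * b_t + 0.25 * z_t    # hypothetical encoder mean
    z_prev = rng.gauss(q_mean, 0.8)                    # z_{t-1} ~ q(z_{t-1} | z_t, b_{t-1}, b_t)
    return {
        "decoder": log_normal(x_t, z_t, 1.0),               # log p(x_t | z_t)
        "filter_prev": log_normal(z_prev, b_prev, 1.0),     # log p_B(z_{t-1} | b_{t-1})
        "transition": log_normal(z_t, 0.9 * z_prev, 0.5),   # log p(z_t | z_{t-1})
        "filter_t": -log_normal(z_t, b_t, 1.0),             # -log p_B(z_t | b_t)
        "encoder": -log_normal(z_prev, q_mean, 0.8),        # -log q(z_{t-1} | z_t, b_{t-1}, b_t)
    }


rng = random.Random(0)
xs = [0.2, -0.1, 0.4, 0.9]

beliefs = [0.0]                      # b_0
for x in xs:
    beliefs.append(belief_update(beliefs[-1], x))

# Per-step estimates for t = 1..T; beliefs[t - 1] is b_{t-1}, beliefs[t] is b_t.
per_step = [elbo_terms(xs[t - 1], beliefs[t - 1], beliefs[t], rng)
            for t in range(1, len(xs) + 1)]
elbo = sum(sum(terms.values()) for terms in per_step)
```

Note the sampling order: $z_t$ is drawn first from the filtering distribution, and $z_{t-1}$ is then inferred backward from it, matching the expectation subscript in the bound.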

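  95. Jumping over time steps
    Extend the discussion so far from one-step transitions to transitions over several steps.
    • Only the notation changes; the variational lower bound, ELBO (the negative of the loss), is
    $$\mathbb{E}_{p_B(z_{t_2} \mid b_{t_2})\, q(z_{t_1} \mid z_{t_2}, b_{t_1}, b_{t_2})}\big[\log p(x_{t_2} \mid z_{t_2}) + \log p_B(z_{t_1} \mid b_{t_1}) + \log p(z_{t_2} \mid z_{t_1}) - \log p_B(z_{t_2} \mid b_{t_2}) - \log q(z_{t_1} \mid z_{t_2}, b_{t_1}, b_{t_2})\big]$$
    • Decoder: $p(x_{t_2} \mid z_{t_2})$
    • Filtering distributions: $p_B(z_{t_1} \mid b_{t_1})$ and $p_B(z_{t_2} \mid b_{t_2})$
    • State transition: $p(z_{t_2} \mid z_{t_1})$
    • Encoder: $q(z_{t_1} \mid z_{t_2}, b_{t_1}, b_{t_2})$
    • During training, the number of steps to jump, $\delta = t_2 - t_1$, is sampled from the range $[1, D]$
    • $\delta$ is added to the input of the state transition: $p(z_{t_2} \mid z_{t_1}, \delta)$
    • At test time, prediction proceeds as $x_{t_1}$ → filtering distribution → $z_{t_1}$ → state transition → $z_{t_2}$ → decoder → $\hat{x}_{t_2}$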
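A minimal sketch of how a training pair $(t_1, t_2)$ with $\delta = t_2 - t_1 \in [1, D]$ might be drawn. The uniform distribution over both $\delta$ and $t_1$ is an assumption made here; the slide only states the range. `sample_jump` is a hypothetical helper name, and indices are 0-based.

```python
import random


def sample_jump(seq_len, max_jump, rng):
    """Pick t1 and delta in [1, D] so that t2 = t1 + delta stays in the sequence."""
    if seq_len <= max_jump:
        raise ValueError("sequence must be longer than the largest jump")
    delta = rng.randint(1, max_jump)           # inclusive on both ends
    t1 = rng.randrange(0, seq_len - delta)     # largest t1 gives t2 = seq_len - 1
    return t1, t1 + delta, delta


rng = random.Random(0)
samples = [sample_jump(seq_len=20, max_jump=4, rng=rng) for _ in range(1000)]
```

Each sampled triple would then select the beliefs $b_{t_1}$, $b_{t_2}$ and the target $x_{t_2}$ for one evaluation of the bound, with `delta` passed to the transition model as an extra input.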

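  96. Training TD-VAE
    Variational lower bound of TD-VAE, ELBO (the negative of the loss):
    $$\mathbb{E}_{z_{t_2} \sim p_B(z_{t_2} \mid b_{t_2}),\; z_{t_1} \sim q(z_{t_1} \mid z_{t_2}, b_{t_1}, b_{t_2})}\big[\log p(x_{t_2} \mid z_{t_2}) + \log p_B(z_{t_1} \mid b_{t_1}) + \log p(z_{t_2} \mid z_{t_1}) - \log p_B(z_{t_2} \mid b_{t_2}) - \log q(z_{t_1} \mid z_{t_2}, b_{t_1}, b_{t_2})\big]$$
    • ① Sample $z_{t_2}$ from the filtering distribution $p_B(z_{t_2} \mid b_{t_2})$
    • ② Using it, sample $z_{t_1}$ from the encoder $q(z_{t_1} \mid z_{t_2}, b_{t_1}, b_{t_2})$
    • The second and fifth terms together form the (negative) KL divergence between the encoder and the filtering distribution: $\mathrm{KL}\big[q(z_{t_1} \mid z_{t_2}, b_{t_1}, b_{t_2}) \,\|\, p_B(z_{t_1} \mid b_{t_1})\big]$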
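When both the encoder and the filtering distribution are Gaussian, the KL term can be computed in closed form instead of being estimated from samples. The sketch below assumes 1-D Gaussians with illustrative parameters and checks the closed form against a Monte Carlo estimate of $\mathbb{E}_q[\log q - \log p_B]$; the Gaussian assumption is mine and is not stated on the slide.

```python
import math
import random


def log_normal(v, mean, std):
    """Log density of a 1-D Gaussian."""
    return (-0.5 * math.log(2 * math.pi) - math.log(std)
            - 0.5 * ((v - mean) / std) ** 2)


def kl_normal(mean_q, std_q, mean_p, std_p):
    """Closed-form KL[N(mean_q, std_q^2) || N(mean_p, std_p^2)]."""
    return (math.log(std_p / std_q)
            + (std_q ** 2 + (mean_q - mean_p) ** 2) / (2 * std_p ** 2)
            - 0.5)


def mc_kl(mean_q, std_q, mean_p, std_p, n, rng):
    """Monte Carlo estimate of E_q[log q(z) - log p(z)] with z ~ q."""
    total = 0.0
    for _ in range(n):
        z = rng.gauss(mean_q, std_q)
        total += log_normal(z, mean_q, std_q) - log_normal(z, mean_p, std_p)
    return total / n


rng = random.Random(0)
# q plays the role of the encoder, p the role of p_B(z_{t1} | b_{t1}).
analytic = kl_normal(0.3, 0.8, 0.0, 1.0)
estimate = mc_kl(0.3, 0.8, 0.0, 1.0, n=20000, rng=rng)
```

Using the closed form removes the sampling noise of these two terms from the gradient, which is a common design choice in VAE-style losses.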

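  97. Implementation with Pixyz
    Available in Pixyzoo: https://github.com/masa-su/pixyzoo/tree/master/TD-VAE
    • Inherits the Model class
    • Uses IterativeLoss, which is intended for autoregressive models
    • Requires Pixyz v0.0.5 or later
    • The loss around the encoder sometimes blows up, so hyperparameter tuning may be needed
    • GQN was hard to get working too
    [Figure: code listing annotated with "distributions" and "Loss"]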

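  98. Experiments in the paper
    Partially observed MiniPacman
    • Using a model that predicts one step at a time, the encoder of the proposed method is compared with other encoders:
    • TD-VAE encoder: $q(z_{t-1} \mid z_t, b_{t-1}, b_t)$
    • Filtering model encoder: $q(z_t \mid z_{t-1}, b_t)$
    • Mean-field model encoder: $q(z_t \mid b_t)$
    • The results show that the TD-VAE encoder is better in terms of both ELBO and negative log-likelihood
    [Figure: graphical model over $x_{t-1}$, $z_{t-1}$, $b_{t-1}$, $x_t$, $z_t$, $b_t$]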

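  99. Experiments in the paper
    MovingMNIST
    • An experiment on MNIST with digits moving left and right, in which the model predicts while skipping steps
    • Trained with jumps of between 1 and 4 steps
    • This does not look like the MovingMNIST I know...
    • There are also experiments using DeepMind Lab
    • The architecture reportedly uses ConvDRAW [Gregor+ 2016]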

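  100. Discussion
    What "TD" means
    • The part where the encoder infers toward the past looks Temporal Difference-like
    • There is at least a description of this in Section 4.3
    Abstraction along the time axis
    • Temporal abstraction is a major challenge for reinforcement learning
    • TD-VAE does not handle actions directly
    • If temporally abstracted action primitives could be built from time series of torque-level control, and exploration carried out over those primitives, that seems both better for exploration efficiency and more natural
    It feels as if everything is entrusted to the RNN in TD-VAE
    • The issue of the RNN's representational power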

  101. Appendix to Part 3

  102. References
    [Eslami+ 2018] S. M. Ali Eslami, Danilo Jimenez Rezende, Frédéric Besse, Fabio Viola, Ari S. Morcos, Marta Garnelo, Avraham Ruderman, Andrei A. Rusu, Ivo Danihelka, Karol Gregor, David P. Reichert, Lars Buesing, Theophane Weber, Oriol Vinyals, Dan Rosenbaum, Neil C. Rabinowitz, Helen King, Chloe Hillier, Matthew M. Botvinick, Daan Wierstra, Koray Kavukcuoglu, Demis Hassabis. Neural scene representation and rendering. Science 360 (2018): 1204-1210. http://science.sciencemag.org/content/360/6394/1204
    [Gregor+ 2015] Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, Daan Wierstra. DRAW: A Recurrent Neural Network For Image Generation. https://arxiv.org/abs/1502.04623
    [Gregor+ 2016] Karol Gregor, Frederic Besse, Danilo Jimenez Rezende, Ivo Danihelka, Daan Wierstra. Towards Conceptual Compression. https://arxiv.org/abs/1604.08772
    [Gregor+ 2019] Karol Gregor, George Papamakarios, Frederic Besse, Lars Buesing, Theophane Weber. Temporal Difference Variational Auto-Encoder. https://openreview.net/forum?id=S1x4ghC9tQ
    [Kingma+ 2014] Diederik P. Kingma, Danilo J. Rezende, Shakir Mohamed, Max Welling. Semi-Supervised Learning with Deep Generative Models. https://arxiv.org/abs/1406.5298
    [Sohn+ 2015] Kihyuk Sohn, Honglak Lee, Xinchen Yan. Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems (NIPS), pp. 3483–3491, 2015. https://papers.nips.cc/paper/5775-learning-structured-output-representation-using-deep-conditional-generative-models
    [Tschannen+ 2018] Michael Tschannen, Olivier Bachem, Mario Lucic. Recent Advances in Autoencoder-Based Representation Learning. 2018. https://arxiv.org/abs/1812.05069
    [Wagstaff+ 2019] Edward Wagstaff, Fabian B. Fuchs, Martin Engelcke, Ingmar Posner, Michael Osborne. On the Limitations of Representing Functions on Sets. https://arxiv.org/abs/1901.09006

  103. Q&A and discussion

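  104. Acknowledgments
    In preparing this presentation, I received help in particular from the following people.
    Masahiro Suzuki (鈴木雅大)
    • Provided reading-group material on GQN and TD-VAE
    • Pixyz development and advice on the implementations
    Shohei Taniguchi (谷口尚平)
    • GQN implementation
    Thank you very much.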