
A Review of MERLIN (RL Architecture Seminar)

Title: Unsupervised Predictive Memory in a Goal-Directed Agent
https://arxiv.org/abs/1803.10760

Yuma Kajihara

August 21, 2018


Transcript

  1. About this presentation
    • Paper title: "Unsupervised Predictive Memory in a Goal-Directed Agent" (arXiv:1803.10760), 2018/03/28.
    • Authors: Greg Wayne, et al. (DeepMind). Greg Wayne is a co-author of the Neural Turing Machine (NTM) and the Differentiable Neural Computer (DNC), and appears to hold a Ph.D. in neuroscience (http://columbia.academia.edu/GregWayne).
    • Contents:
    - A mechanism for retaining experience over long periods (using an external memory).
    - The conversion of complex observations into suitable latent representations and the policy are learned end to end.
    - Overwhelms LSTM-based models on 3D spatial exploration tasks.
    • The paper brings in many findings from cognitive neuroscience, and keeping them in mind is necessary for understanding the model, so in this talk I will reference as many of those neuroscience studies as possible. (I am still a novice at reinforcement learning, so I would be grateful for corrections if anything is wrong.)

  2. Origins of reinforcement learning
    • The other origin (the mathematical one): optimal control.
    - Markov Decision Process (MDP) (1950s): a probabilistic model of a dynamical system whose state transitions satisfy the Markov property (the next state depends only on the current state).
    - Bellman equation (Bellman, 1957): a recursive equation for the state value that holds under the optimal policy in an MDP.
    • Reinforcement learning took shape as a sampling-based method for controlling stochastic systems in uncertain environments (the two update rules listed below are sketched in code right after this slide).
    - TD (Temporal Difference) learning (Sutton, 1984): updates the state-value function V(x) step by step, using the mismatch in the expected reward as an error signal.
    - Q-learning (Watkins, 1989): learns to take, in each state, the action whose action-value Q(x) is highest among the available actions.
    • At this stage these are purely methods for learning a policy inside a black-box system; nothing is said about concrete mechanisms in the brain.

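As referenced above, here is a minimal tabular sketch of the two update rules, assuming small discrete state and action spaces; the step size and discount values are illustrative choices, not from the slides:

```python
import numpy as np

def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.99):
    """TD(0): move V(s) toward the bootstrapped target r + gamma * V(s')."""
    td_error = r + gamma * V[s_next] - V[s]
    V[s] += alpha * td_error
    return td_error

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Q-learning: the target uses the greedy (max-value) action in the next state."""
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])

# toy usage with 5 states and 2 actions
V = np.zeros(5)
Q = np.zeros((5, 2))
td0_update(V, s=0, r=1.0, s_next=1)
q_learning_update(Q, s=0, a=1, r=1.0, s_next=1)
```
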
  3. Basal ganglia-neocortex loops
    • There are thought to be mainly four loops, and they appear to run in parallel.
    - Motor loop: control of the muscles.
    - Prefrontal loop: action planning.
    - Oculomotor loop: control of saccadic eye movements.
    - Limbic loop: motivation of behavior and emotion; the most intrinsic (instinctive) of the four.
    (Figure borrowed from http://www.actioforma.net/kokikawa/Evolutional_aspects/Evolutional_aspects.html)

  4. Proposed model: MERLIN
    • As the name Memory, Reinforcement Learning and Inference (MERLIN) suggests, it consists of three main parts:
    - An inference component (Memory Based Predictor) that embeds the current observation into a latent space and, using the external memory, predicts its own action at the previous step and the next reward.
    - An external memory mechanism (Memory) that stores the encoded observations.
    - A Policy LSTM that selects actions from the encoded current observation and past information.
    (Figure: block diagram of MERLIN, showing the Environment, the Memory Based Predictor, the Memory, and the Policy LSTM.)

  5. The memory mechanism (almost the same as the NTM's)
    • Values are stored in an m × n matrix M. (The picture to have in mind: there are m addresses, each of which can store a vector of length n.)
    • Writing: for the vector to be written (of length n/2, denoted z_t below), the memory matrix M is updated as
      M_t = M_{t-1} + v_t^{wr} [z_t, 0]^T + v_t^{ret} [0, z_t]^T,
      v_t^{ret} = γ v_{t-1}^{ret} + (1 − γ) v_{t-1}^{wr}.
    - How the write weights v^{wr} are computed is described later.
    • Reading: for a key k and the current memory matrix M, the vector m is read out as (so-called soft attention)
      c_i = cos(k, M_i),  w_i = exp(β c_i) / Σ_j exp(β c_j),  m = Σ_i w_i M_i.
    (A numpy sketch of these read/write rules follows this slide.)

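A minimal numpy sketch of the read and write rules above; this is my own illustration rather than the authors' code, and the sharpness beta, the decay gamma, and the one-hot choice of write weights are assumptions:

```python
import numpy as np

def cosine(k, M):
    """Cosine similarity between a key k (length n) and every row of M (m x n)."""
    return M @ k / (np.linalg.norm(M, axis=1) * np.linalg.norm(k) + 1e-8)

def memory_read(M, k, beta=5.0):
    """Soft-attention read: softmax over scaled cosine similarities, then a weighted sum of rows."""
    c = beta * cosine(k, M)
    w = np.exp(c - c.max())
    w /= w.sum()
    return w @ M                       # read vector of length n

def memory_write(M, v_wr, v_ret, z, gamma=0.95):
    """Additive write: z goes into the first half of the row width with weights v_wr,
    and into the second half with the retroactive weights v_ret."""
    pad = np.zeros_like(z)
    M = M + np.outer(v_wr, np.concatenate([z, pad])) \
          + np.outer(v_ret, np.concatenate([pad, z]))
    v_ret_next = gamma * v_ret + (1.0 - gamma) * v_wr
    return M, v_ret_next

# toy usage: 8 slots, row width 2 * len(z)
z = np.random.randn(4)
M = np.zeros((8, 8))
v_wr = np.eye(8)[0]                    # assumption: write into the first slot
M, v_ret = memory_write(M, v_wr, np.zeros(8), z)
m = memory_read(M, k=np.concatenate([z, np.zeros(4)]))
```
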
  6. Memory Based Predictor
    • Basically a Conditional Variational AutoEncoder.
    (Figure: dataflow between the MBP, the Policy LSTM, and the Memory. The encoded z_t is used for action selection and is also stored in the external memory, which holds z_1, …, z_{t-1} together with a_1, …, a_{t-1}. The model has a prior p(z_t | z_1, …, z_{t-1}; a_1, …, a_{t-1}), a posterior that additionally conditions on the current observation, and a decoder p(o_t | z_t). The observation is o_t = (I_t, v_t, T_t, r_{t-1}, a_{t-1}): image, velocity, text, previous reward, previous action.)

  7. Memory Based Predictor
    • Basically a Conditional Variational AutoEncoder.
    • It also predicts the reward to be obtained next (i.e. it acts as the critic), so the decoder becomes p(o_t, R_t | z_t).
    (Figure: the same dataflow as the previous slide, with the return prediction added alongside the observation reconstruction.)

  8. Memory Based Predictor
    • Basically a Conditional Variational AutoEncoder.
    • Objective: maximize the marginal likelihood via its variational lower bound,
      log p(o_1, …, o_T ; a_1, …, a_T)
        ≥ Σ_{t=1}^{T} E_{q(z_{1:t-1} | o_{1:t-1}) q(z_t | z_{1:t-1}, o_{1:t})} [ log p(o_t, R_t | z_t)
            − D_KL( q(z_t | z_{1:t-1}; o_{1:t}) || p(z_t | z_{1:t-1}; a_{1:t-1}) ) ],
      where the first term is the reconstruction error, the KL term updates the prior, and the average over the steps is taken. (The derivation here seems a little dubious to me...)
    (A code sketch of the per-step bound follows this slide.)

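A per-timestep sketch of evaluating this bound for diagonal Gaussians, assuming (for illustration only) a unit-variance Gaussian decoder; the paper instead uses the weighted, modality-specific reconstruction losses shown on the MBP Loss slide below:

```python
import numpy as np

def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL( N(mu_q, diag exp(logvar_q)) || N(mu_p, diag exp(logvar_p)) )."""
    return 0.5 * np.sum(
        logvar_p - logvar_q
        + (np.exp(logvar_q) + (mu_q - mu_p) ** 2) / np.exp(logvar_p)
        - 1.0
    )

def step_bound(o, o_recon, mu_post, logvar_post, mu_prior, logvar_prior):
    """One term of the lower bound: reconstruction log-likelihood minus the KL to the prior."""
    recon_loglik = -0.5 * np.sum((o - o_recon) ** 2)   # unit-variance Gaussian decoder (assumption)
    return recon_loglik - gaussian_kl(mu_post, logvar_post, mu_prior, logvar_prior)

# toy usage with a 3-dimensional latent/observation
o = np.array([1.0, 0.0, -1.0])
print(step_bound(o, o * 0.9, np.zeros(3), np.zeros(3), np.zeros(3), np.zeros(3)))
```
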
  9. Overall picture of the MBP
    (Figure: the full MBP architecture. CNN/MLP encoders for each input modality feed an MBP LSTM; a prior N(μ_t^{prior}, Σ_t^{prior}) and a posterior N(μ_t^{post}, Σ_t^{post}) are combined through the reparameterization trick to sample z_t; z_t is written to the Memory, read back with keys [M_t k_1, …], decoded through MLPs, and passed to the Policy LSTM. Details are described later.)

  10. Parameter updates
    • Two losses are defined (the value-based and policy-based parts, in actor-critic terms):
    - MBP Loss: the MBP's VLB (variational lower bound) plus the value-based term.
    - Policy Loss: the policy gradient for the Policy LSTM (policy based).

  11. MBP Loss
    • The loss splits into two terms (the same VLB as before, averaged over the steps).
    • The KL distance between normal distributions:
      D_KL( q(z_t | z_{1:t-1}; o_{1:t}) || p(z_t | z_{1:t-1}; a_{1:t-1}) ) = D_KL( N(μ_t^{post}, Σ_t^{post}) || N(μ_t^{prior}, Σ_t^{prior}) )
      (perhaps this can also be interpreted as Bayesian surprise?)
    • The reconstruction term is split into the input reconstruction errors and an expected-return error:
      E_q[ log p(o_t, R_t | z_t) ] ≡ −{ α_img L_img + α_vel L_vel + α_act L_act + α_reward L_reward + α_text L_text } − α_return L_return.
    (A small sketch of assembling this weighted sum follows this slide.)

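As mentioned above, a tiny sketch of assembling the weighted decomposition; the per-modality loss values, the weights alpha, and the KL scalar are placeholders (the paper's actual coefficients are not reproduced here):

```python
# placeholder per-modality reconstruction losses for one timestep
losses = {"img": 0.80, "vel": 0.10, "act": 0.05, "reward": 0.02, "text": 0.30, "return": 0.50}
alpha  = {"img": 1.0,  "vel": 1.0,  "act": 1.0,  "reward": 1.0,  "text": 1.0,  "return": 0.4}
kl_term = 0.7   # stand-in for D_KL(posterior || prior), computed as in the earlier sketch

recon_term = sum(alpha[k] * losses[k] for k in ("img", "vel", "act", "reward", "text"))
mbp_loss_t = recon_term + alpha["return"] * losses["return"] + kl_term
print(mbp_loss_t)
```
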
  12. Expected-return error
    • A truncation window of length ν is set (T: total number of steps, t: current step).
      R_t = r_t + γ r_{t+1} + γ^2 r_{t+2} + … + γ^{ν−1} r_{t+ν−1} + γ^{ν} V^π(z_{t+ν}, log m_{t+ν})   if t < T − ν,
      R_t = r_t + γ r_{t+1} + γ^2 r_{t+2} + … + γ^{T−t} r_T   if T − ν ≤ t,
      where the bootstrap value V^π is passed through a gradient stop.
    • The loss regresses both the state-value function and the advantage onto this return:
      L_return = E[ ( R_t − V^π(z_t, log m_t) )^2 + ( R_t − StopGradient(V^π(z_t, log m_t)) − A^π(z_t, a_t) )^2 ],
      where StopGradient(V^π) + A^π(z_t, a_t) can be regarded as an action-value function.
    (A numpy sketch of this return and loss follows this slide.)

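A numpy sketch of the truncated, bootstrapped return and the two-part regression loss above, assuming an illustrative window length nu and discount; with plain numpy there is no autograd, so the stop-gradient in the second term is only indicated by a comment:

```python
import numpy as np

def truncated_return(rewards, values, t, nu, gamma=0.96):
    """R_t over a window of length nu, bootstrapped with the value estimate
    if the window ends before the episode does."""
    T = len(rewards)
    end = min(t + nu, T)
    R = sum(gamma ** (i - t) * rewards[i] for i in range(t, end))
    if end < T:                        # window truncated before the episode end: bootstrap
        R += gamma ** nu * values[end]  # treated as gradient-stopped
    return R

def return_loss(R, v, adv):
    """(R - V)^2 plus (R - stop_grad(V) - A)^2; stop_grad(V) + A plays the role of Q."""
    return (R - v) ** 2 + (R - v - adv) ** 2

rewards = np.array([0.0, 0.0, 1.0, 0.0, 0.0])
values  = np.array([0.1, 0.2, 0.9, 0.1, 0.0])
R0 = truncated_return(rewards, values, t=0, nu=2)
print(R0, return_loss(R0, values[0], adv=0.0))
```
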
  13. Policy Loss
    • The state-value function learned in the MBP is reused as is.
    • Bootstrap parameters: γ, λ.
    • TD error: δ_t ≡ r_t + γ V^π(z_{t+1}, log m_{t+1}) − V^π(z_t, log m_t).
    • The policy parameters θ are updated with Generalised Advantage Estimation:
      θ ← θ + α Σ_t ( Σ_{s≥t} (γλ)^{s−t} δ_s ) ∇_θ log π_θ(a_t | h_t, m_t),
      with the inner sum taken within the truncation window.
    • The pseudocode in the paper adds an entropy term so that the policy is less likely to get stuck in local optima.
    (A sketch of the GAE computation and this loss follows this slide.)

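A sketch of the GAE-based update described above, written as a scalar loss over a short trajectory rather than the explicit parameter update on the slide; the discount, lambda, and entropy coefficient are illustrative values:

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.96, lam=0.9):
    """delta_t = r_t + gamma * V_{t+1} - V_t, accumulated backwards with weight gamma * lambda."""
    deltas = rewards + gamma * np.append(values[1:], 0.0) - values
    advantages = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = deltas[t] + gamma * lam * running
        advantages[t] = running
    return advantages

def policy_loss(log_probs, entropies, advantages, ent_coef=0.01):
    """Negative policy-gradient objective with an entropy bonus against premature convergence."""
    return -np.mean(log_probs * advantages + ent_coef * entropies)

# toy 3-step trajectory
rewards = np.array([0.0, 0.0, 1.0])
values = np.array([0.2, 0.5, 0.8])
adv = gae_advantages(rewards, values)
print(policy_loss(np.log([0.5, 0.6, 0.9]), np.array([0.7, 0.6, 0.3]), adv))
```
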
  14. Putting it all together
    • Memory Based Predictor (MBP): a Conditional VAE that embeds the input received from the environment as a result of the previous action into a latent space. Because the value read from Memory at the previous step is also fed into the model, it can be conditioned on all observations and actions so far. The paper's central claim is that this latent representation is very effective when deciding the policy. The MBP also predicts the expected return, and that readout can be regarded as a state-value function (it works as the critic).
    • Policy LSTM: an LSTM that takes the vector encoded by the MBP and the value read from Memory as input, and outputs a probability distribution over actions (it works as the actor).
    • Memory: simplified compared with the NTM and DNC versions (in the DNC a neural network controls even the write weights, whereas here they are determined mechanically). It is easiest to understand as a set of attention maps. The main reason for using soft attention is that it is differentiable.
    • The whole thing is a fully end-to-end model.

  15. Background for introducing the MBP: Predictive Coding
    • Predictive coding: the hypothesis that, in the brain, higher-level systems transmit predictions of the neural activity of lower-level systems (feedback), while lower-level systems transmit the error between prediction and observation back to the higher levels (feedforward).
    • For the visual cortex, the computational model of Rao et al. is very well known: it suggests the existence of a scene-generating model that receives feedback via error signals [Rao et al, Nat Neuro 1999].
    • More recent work examines not only momentary prediction errors within the visual cortex, but also correlations between stimulus patterns in the visual cortex and in the hippocampus, which is tied to memory [Hindy et al, Nat Neuro 2016]. (The external memory built into MERLIN is influenced by this line of research.)

  16. Background for introducing the MBP: Predictive Coding
    • Among deep learning models that build in predictive coding, PredNet is well known [Lotter et al, ICLR 2017].
    • DeepMind's recent Contrastive Predictive Coding [Oord, NIPS 2018] seems to have little to do with predictive coding in the neuroscientific sense.
    • In MERLIN, the loop of deriving the posterior from the previous step's prior and the current observation, and then updating the prior through the LSTM, resembles Rao's model; perhaps this is what corresponds to the idea of predictive coding?

  17. Background for introducing the MBP: spatial representation in the hippocampus
    • The computational model of Gluck and Myers [1993]: (probably) the first study to argue, using an autoencoder [Hinton, 1989], that the hippocampus learns compressed representations of input stimuli without supervision. (The model is almost the same as a restricted Boltzmann machine.)
    • DeepMind has recently published a string of computational-neuroscience studies on spatial encoding, essentially arguing that a compressed representation that predicts the next stimulus, the Successor Representation, is effective (a small tabular sketch of the SR follows this slide):
    - The hippocampus as a predictive map [Stachenfeld, Nat Neuro 2017]
    - The successor representation in human reinforcement learning [Momennejad, Nat Human 2017]
    • The model in this year's much-discussed grid-like navigation work [Banino, Nature 2018] also separates an LSTM that acquires the representation from a Policy LSTM; in that sense it is very similar to MERLIN.
    • MERLIN suggests the importance of a latent-space representation conditioned on the agent's own past actions.

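As mentioned in the list above, a minimal tabular sketch of the successor-representation idea (a hypothetical 4-state ring environment; the step size and discount are arbitrary):

```python
import numpy as np

def sr_td_update(M, s, s_next, alpha=0.1, gamma=0.95):
    """TD update of the successor matrix M(s, .): expected discounted future occupancy of each state."""
    onehot = np.eye(M.shape[0])[s]
    M[s] += alpha * (onehot + gamma * M[s_next] - M[s])
    return M

# toy usage: 4 states visited in a loop 0 -> 1 -> 2 -> 3 -> 0 -> ...
M = np.zeros((4, 4))
for _ in range(200):
    for s in range(4):
        M = sr_td_update(M, s, (s + 1) % 4)

# given a reward vector w over states, values follow as V = M @ w
print(M @ np.array([0.0, 0.0, 0.0, 1.0]))
```
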
  18. Deep generative models with an external memory
    • The Kanerva Machine [Wu, ICLR 2018]: in one sentence, a Conditional VAE conditioned on an external memory, but "reading" from and "writing" to that memory can themselves be expressed entirely as probabilistic inference.
    • In MERLIN, by contrast, writing is a mechanical operation with no room for learning.
    (Figure labels: Generative model, Reading inference, Writing inference.)

  19.

  20.

  21.

  22. Summary
    • MERLIN: an end-to-end model that combines as many of the major recent deep-learning techniques as possible (VAE, memory-augmented neural networks, A3C, and so on), grounded in predictive coding and hippocampal theory. It learns a world model and a policy at the same time.
    • All of the tasks treated here are goal-directed, i.e. tasks in which a consolidated reward is obtained by reaching a goal. As the paper also notes, the original motivation was to model an instinct that organisms need in order to survive: having an experience that matters for survival and then, after a long delay, recalling it accurately and using it for action selection. In that sense, the external memory can be said to be doing its job well.

  23. References
    1. https://www.chiikunote.com/entry/conditioning
    2. R. S. Sutton. "Learning to Predict by the Methods of Temporal Differences," 1988.
    3. C. J. C. H. Watkins. "Learning from delayed rewards," 1989.
    4. http://discovermagazine.com/2015/may/17-resetting-the-addictive-brain
    5. W. Schultz, P. Dayan, P. R. Montague. "A neural substrate of prediction and reward," 1997.
    6. K. Doya. "Metalearning and neuromodulation," 2002.
    7. http://www.actioforma.net/kokikawa/Evolutional_aspects/Evolutional_aspects.html
    8. V. Mnih, A. Puigdomènech Badia, M. Mirza, A. Graves, T. P. Lillicrap, T. Harley, D. Silver, K. Kavukcuoglu. "Asynchronous Methods for Deep Reinforcement Learning," 2016.
    9. J. Schulman, P. Moritz, S. Levine, M. Jordan, P. Abbeel. "High-Dimensional Continuous Control Using Generalized Advantage Estimation," 2015.
    10. A. Graves, G. Wayne, I. Danihelka. "Neural Turing Machines," 2014.
    11. A. Graves et al. "Hybrid computing using a neural network with dynamic external memory," 2016.
    12. R. P. N. Rao, D. H. Ballard. "Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects," 1999.
    13. N. C. Hindy, F. Y. Ng, N. B. Turk-Browne. "Linking pattern completion in the hippocampus to predictive coding in visual cortex," 2016.
    14. W. Lotter, G. Kreiman, D. Cox. "Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning," 2016.
    15. A. van den Oord, Y. Li, O. Vinyals. "Representation Learning with Contrastive Predictive Coding," 2018.
    16. K. J. Friston, S. Kiebel. "Predictive coding under the free-energy principle," 2009.
    17. K. J. Friston, J. Daunizeau, S. J. Kiebel. "Reinforcement Learning or Active Inference?," 2009.
    18. K. J. Friston. "The free-energy principle: a unified brain theory?," 2010.
    19. A. Clark. "Whatever next? Predictive brains, situated agents, and the future of cognitive science," 2013.
    20. M. Biehl, C. Guckelsberger, C. Salge, S. C. Smith, D. Polani. "Expanding the Active Inference Landscape: More Intrinsic Motivations in the Perception-Action Loop," 2018.
    21. M. A. Gluck, C. E. Myers. "Hippocampal mediation of stimulus representation: A computational theory," 1993.
    22. G. E. Hinton, R. R. Salakhutdinov. "Reducing the Dimensionality of Data with Neural Networks," 2006.
    23. K. L. Stachenfeld, M. M. Botvinick, S. J. Gershman. "The hippocampus as a predictive map," 2017.
    24. I. Momennejad, E. M. Russek, J. H. Cheong, M. M. Botvinick, N. D. Daw, S. J. Gershman. "The successor representation in human reinforcement learning," 2017.
    25. A. Banino et al. "Vector-based navigation using grid-like representations in artificial agents," 2018.
    26. D. Ha, J. Schmidhuber. "World Models," 2018.
    27. J. Schmidhuber. "On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models," 2015.
    28. Y. Wu, G. Wayne, A. Graves, T. Lillicrap. "The Kanerva Machine: A Generative Distributed Memory," 2018.
    29. W. Zaremba, I. Sutskever. "Reinforcement Learning Neural Turing Machines - Revised," 2015.