Slide 1

Curiosity-Driven Exploration
Hokkaido University, School of Engineering, Department of Mechanical and Intelligent Systems Engineering
大田 康就

Slide 2

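The reinforcement learning loop and its problem
• The agent acts on the environment and receives rewards in return.
• In many tasks, however, states that actually yield a reward are reached only rarely.
  ex. In a maze, a reward is given only when the agent sees the goal.
→ Introduce an intrinsic reward to make exploration more efficient.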
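A minimal toy sketch of why sparse rewards make exploration hard. The setup is hypothetical and not from the slides: a uniformly random agent walks along a 1-D corridor and receives the only extrinsic reward at the far end. The function `reward_hit_rate` (an illustrative name) estimates how often an episode yields any reward at all; the longer the corridor, the rarer that becomes.

```python
import random


def reward_hit_rate(corridor_length: int, max_steps: int = 200,
                    episodes: int = 500, seed: int = 0) -> float:
    """Fraction of episodes in which a uniformly random agent reaches the goal.

    Hypothetical toy environment: the agent starts at cell 0 of a 1-D corridor
    and receives an extrinsic reward of 1 only on reaching the last cell.
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    hits = 0
    for _ in range(episodes):
        position = 0
        for _ in range(max_steps):
            # Random policy: step left or right; the wall at cell 0 blocks moves to -1.
            position = max(0, position + rng.choice((-1, 1)))
            if position == corridor_length - 1:
                hits += 1  # the only non-zero extrinsic reward in the episode
                break
    return hits / episodes


for length in (5, 20, 80):
    print(f"corridor length {length:>2}: reward seen in "
          f"{reward_hit_rate(length):.1%} of episodes")
```

With a random policy the agent almost never observes a reward in the long corridor, so there is nothing to learn from; this is the situation an intrinsic reward is meant to address.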

Slide 3

Rewards in reinforcement learning
• Extrinsic reward: provided by the environment itself.
  ex. reaching the goal location, dying, etc.
• Intrinsic reward: generated by the agent itself.
  ex. novelty, curiosity, etc.
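The original paper, as we understand it, trains the policy on the sum of the two rewards, $r_t = r^e_t + r^i_t$, where $r^e_t$ is zero on most steps of a sparse-reward task. A minimal sketch under that assumption, using a made-up trajectory in which the environment pays out only once:

```python
def total_reward(extrinsic: float, intrinsic: float) -> float:
    """Reward used to train the policy: r_t = r^e_t + r^i_t."""
    return extrinsic + intrinsic


# Made-up trajectory: the environment rewards only the final step,
# while the agent's curiosity bonus is non-zero on every step.
extrinsic_rewards = [0.0, 0.0, 0.0, 1.0]
intrinsic_rewards = [0.30, 0.12, 0.05, 0.02]
rewards = [total_reward(e, i) for e, i in zip(extrinsic_rewards, intrinsic_rewards)]
print(rewards)
```

Every step now carries some learning signal, even though the environment itself rewards only the last one.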

Slide 4

Curiosity-driven exploration
Two separate neural networks are trained:
1. Inverse dynamics model
2. Forward dynamics model
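For reference, the original paper trains the two models jointly with the policy. As we recall its formulation, with $\theta_P$ the policy parameters, $L_I$ the inverse-model loss, and $L_F = \frac{1}{2}\lVert \hat{\phi}(s_{t+1}) - \phi(s_{t+1}) \rVert_2^2$ the forward-model loss, the overall objective is:

```latex
\min_{\theta_P,\,\theta_I,\,\theta_F}
\left[
  -\lambda\, \mathbb{E}_{\pi(s_t;\theta_P)}\!\left[\sum_t r_t\right]
  + (1-\beta)\, L_I
  + \beta\, L_F
\right],
\qquad 0 \le \beta \le 1,\quad \lambda > 0
```

Here $\beta$ trades off the inverse-model loss against the forward-model loss, and $\lambda$ weights the policy objective against learning the intrinsic reward. Since $r^i_t = \frac{\eta}{2}\lVert \hat{\phi}(s_{t+1}) - \phi(s_{t+1}) \rVert_2^2$, the intrinsic reward is simply $\eta L_F$.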

Slide 5

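Optimizing the feature encoding
Inverse dynamics model
• After observing $s_t$ and $s_{t+1}$, encode them as feature vectors $\phi(s_t)$ and $\phi(s_{t+1})$ with a CNN or similar network.
• From these two consecutive states, predict the action $a_t$ taken between them: $\hat{a}_t = g(s_t, s_{t+1}; \theta_I)$, i.e. $s_t, s_{t+1} \to a_t$.
• Minimize a loss function that measures the mismatch between the actual action $a_t$ and the predicted action $\hat{a}_t$.
• The real goal is to optimize the encoder $\phi$: because $\phi$ only has to support predicting the agent's own action, it has no incentive to encode parts of the scene the agent cannot affect.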
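A minimal pure-Python sketch of the inverse model's prediction and loss. Assumptions not taken from the slides: actions are discrete, $g$ is reduced to a single linear layer plus softmax instead of the 256-unit hidden layer, the loss is cross-entropy, and the 2-dimensional features, the four action names, and the weights are hypothetical. No training step is shown.

```python
import math

ACTIONS = ["right", "left", "up", "down"]  # hypothetical 4-action space


def inverse_model(phi_t, phi_t1, weights, biases):
    """Sketch of g: predict a distribution over actions from phi(s_t), phi(s_{t+1}).

    One linear layer + softmax; `weights` holds one row per action, each row of
    length len(phi_t) + len(phi_t1).
    """
    x = list(phi_t) + list(phi_t1)  # concatenate the two feature vectors
    logits = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(weights, biases)]
    m = max(logits)  # subtract the max before exp for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def inverse_loss(action_probs, true_action):
    """Cross-entropy between the predicted distribution and the action actually taken."""
    return -math.log(action_probs[true_action])


# Made-up weights: "right" scores high when feature 0 increases, "up" when feature 1 increases.
weights = [
    [-1, 0, 1, 0],   # right
    [1, 0, -1, 0],   # left
    [0, -1, 0, 1],   # up
    [0, 1, 0, -1],   # down
]
biases = [0.0, 0.0, 0.0, 0.0]

phi_t, phi_t1 = [0.5, -1.0], [1.0, -1.0]  # feature 0 went up by 0.5
probs = inverse_model(phi_t, phi_t1, weights, biases)
predicted = ACTIONS[probs.index(max(probs))]
print(predicted, round(inverse_loss(probs, ACTIONS.index("right")), 3))
```

In the full model, gradients of this loss update both $\theta_I$ and the encoder that produces $\phi$, which is how minimizing it shapes the feature space.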

Slide 6

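Introducing curiosity
Forward dynamics model
• From the state and action $s_t, a_t$, predict the next state $s_{t+1}$ in feature space, using the feature vector $\phi$ obtained from the inverse model: $\hat{\phi}(s_{t+1}) = f(\phi(s_t), a_t; \theta_F)$.
• Minimize an error function that measures the mismatch between the actual next state $\phi(s_{t+1})$ and the predicted one $\hat{\phi}(s_{t+1})$.
• The intrinsic reward grows the harder the next state is to predict:
  $r^i_t = \frac{\eta}{2} \lVert \hat{\phi}(s_{t+1}) - \phi(s_{t+1}) \rVert_2^2$, where $\eta > 0$ is a scaling factor.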
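A minimal pure-Python sketch of the forward model and the intrinsic reward formula $r^i_t = \frac{\eta}{2}\lVert \hat{\phi}(s_{t+1}) - \phi(s_{t+1}) \rVert_2^2$. Assumptions for illustration only: $f$ is a single linear layer, the action is fed in as a one-hot vector, and the weights (under which the hypothetical action "right" adds 0.5 to the first feature) are made up.

```python
def one_hot(index, size):
    """Encode a discrete action as a one-hot list."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def forward_model(phi_t, action, n_actions, weights, biases):
    """Sketch of f: predict phi(s_{t+1}) from [phi(s_t), one_hot(a_t)] with one linear layer."""
    x = list(phi_t) + one_hot(action, n_actions)
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def intrinsic_reward(phi_pred, phi_next, eta):
    """r^i_t = (eta / 2) * ||phi_hat(s_{t+1}) - phi(s_{t+1})||_2^2."""
    return 0.5 * eta * sum((p - q) ** 2 for p, q in zip(phi_pred, phi_next))


# Made-up weights: copy phi(s_t), then "right"/"left" shift feature 0 by +/-0.5
# and "up"/"down" shift feature 1 by +/-0.5 (actions: 0=right, 1=left, 2=up, 3=down).
weights = [
    [1.0, 0.0, 0.5, -0.5, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 0.5, -0.5],
]
biases = [0.0, 0.0]

phi_t = [0.5, -1.0]
phi_pred = forward_model(phi_t, action=0, n_actions=4, weights=weights, biases=biases)
print(phi_pred)                                          # → [1.0, -1.0]
print(intrinsic_reward(phi_pred, [1.0, -1.0], eta=1.0))  # → 0.0 (predicted exactly)
print(intrinsic_reward(phi_pred, [1.0, 0.0], eta=1.0))   # → 0.5 (surprising transition)
```

A correctly predicted transition earns no curiosity bonus, while a surprising one earns a positive reward, which is the behavior the slide describes.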

Slide 7

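Structure of the Intrinsic Curiosity Module (ICM)
• Convolutional layers: 4
• Filters: 32 per layer
• Kernel size: 3x3
• Fully connected layers, forward model: 288 + 1 → 256 → 288
• Fully connected layers, inverse model: 288 × 2 → 256 → 4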
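Where does 288 come from? A small arithmetic check, assuming (as in the original paper, not stated on this slide) 42x42 input frames and convolutions with stride 2 and padding 1. Each 3x3 layer roughly halves the side length, and the final 3x3 map with 32 filters flattens to 288 features.

```python
def conv_output_size(size: int, kernel: int = 3, stride: int = 2, padding: int = 1) -> int:
    """Spatial size after one convolution: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def spatial_sizes(input_size: int = 42, n_layers: int = 4) -> list[int]:
    """Side length of the feature map after each of the encoder's conv layers."""
    sizes = []
    size = input_size
    for _ in range(n_layers):
        size = conv_output_size(size)
        sizes.append(size)
    return sizes


def icm_feature_dim(input_size: int = 42, n_layers: int = 4, n_filters: int = 32) -> int:
    """Length of the flattened feature vector phi(s)."""
    side = spatial_sizes(input_size, n_layers)[-1]
    return side * side * n_filters


phi_dim = icm_feature_dim()
print(spatial_sizes())                       # → [21, 11, 6, 3]
print(phi_dim)                               # → 288
print("inverse model input:", 2 * phi_dim)   # → inverse model input: 576
```

The 576-dimensional inverse-model input is just $\phi(s_t)$ and $\phi(s_{t+1})$ concatenated, matching the "288 × 2" on the slide.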

Slide 8

Comparison of PPO+Curiosity and PPO (Pyramids environment)

Slide 9

References
• Original paper
  https://pathak22.github.io/noreward-rl/resources/icml17.pdf
• Unity Blog
  https://blogs.unity3d.com/jp/2018/06/26/solving-sparse-reward-tasks-with-curiosity/
• Slides from the 44th CV study group ("Reinforcement Learning Paper Reading Session")
  https://www.slideshare.net/takmin/curiosity-driven-exploration