Slide 1

Slide 1 text

Deriving the Bellman Equation
MathCoding: Let's Talk About Reinforcement Learning

Slide 2

Slide 2 text

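Contents
Introduction
Introducing value functions
Deriving the Bellman equation from the definition of the value function
Viewing the Bellman equation through backup diagrams
Introducing optimal value functions
Viewing the optimal value functions through backup diagrams
Summary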

Slide 3

Slide 3 text

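What is the goal of reinforcement learning?

"That all of what we mean by goals and purposes can be well thought of as maximization of the expected value of the cumulative sum of a received scalar signal (called reward)." (Sutton & Barto, Reward Hypothesis)

Maximization of the cumulative reward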

Slide 4

Slide 4 text

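Cumulative reward (return)

Trajectory: $S_0 \xrightarrow{\pi} A_0 \to R_1 \to S_1 \xrightarrow{\pi} A_1 \to R_2 \to S_2 \xrightarrow{\pi} \cdots$

$$
\begin{aligned}
G_t &\doteq R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots \\
    &= \sum_{k=0}^{\infty} \gamma^k R_{t+1+k} \\
    &= R_{t+1} + \gamma \sum_{k=0}^{\infty} \gamma^k R_{t+1+k+1} \\
    &= R_{t+1} + \gamma G_{t+1} \qquad \leftarrow \text{focus on this}
\end{aligned}
$$

$\gamma \in [0, 1]$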
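The recursion $G_t = R_{t+1} + \gamma G_{t+1}$ can be tried out numerically. The following is a minimal sketch, assuming a finite episode (so that the return after the last reward is zero); the function name `discounted_returns` and the reward values are made up for illustration and do not come from the slides.

```python
def discounted_returns(rewards, gamma):
    """Return [G_0, ..., G_{T-1}] for rewards [R_1, ..., R_T], assuming G_T = 0."""
    returns = [0.0] * len(rewards)
    g_next = 0.0  # assumption: the episode ends, so G_T = 0
    for t in reversed(range(len(rewards))):
        # G_t = R_{t+1} + gamma * G_{t+1}; rewards[t] holds R_{t+1}
        g_next = rewards[t] + gamma * g_next
        returns[t] = g_next
    return returns


rewards = [1.0, 0.0, 2.0]  # hypothetical R_1, R_2, R_3
gamma = 0.5
returns = discounted_returns(rewards, gamma)

# Direct evaluation of the sum over gamma^k * R_{k+1} for G_0, for comparison
g0_direct = sum(gamma**k * r for k, r in enumerate(rewards))
```

Working backwards from the end of the episode computes every $G_t$ in a single pass, which is the same recursive structure that the Bellman equation exploits.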

Slide 5

Slide 5 text

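Bellman's principle of optimality

A problem with recursive structure

An optimal policy has the property that, whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.
Reference: Bellman, Chap. III, Principle of Optimality

→ May be solvable with dynamic programming (DP)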

Slide 6

Slide 6 text

Contents (repeated); next section: Introducing value functions

Slide 7

Slide 7 text

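Notation

・Policy
  $\pi(a \mid s) \doteq \Pr(A_t = a \mid S_t = s)$
  → probability of selecting action $a$ in state $s$
・State transition probability (the environment's dynamics)
  $p(s' \mid s, a) \doteq \Pr(S_{t+1} = s' \mid S_t = s, A_t = a)$
  → probability of selecting action $a$ in state $s$ and moving to next state $s'$
・Immediate reward (reward function)
  $r(s, a, s') \doteq \mathbb{E}[R_{t+1} \mid S_t = s, A_t = a, S_{t+1} = s']$
  → expected immediate reward when action $a$ is selected in state $s$ and the next state is $s'$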

Slide 8

Slide 8 text

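Value functions

・State value function
  $V_\pi(s) \doteq \mathbb{E}_\pi[G_t \mid S_t = s]$
・State-action value function
  $Q_\pi(s, a) \doteq \mathbb{E}_\pi[G_t \mid S_t = s, A_t = a]$

Relationship between the two:

$$
\begin{aligned}
V_\pi(s) &\doteq \mathbb{E}_\pi[G_t \mid S_t = s] \\
         &= \sum_a \pi(a \mid s)\, \mathbb{E}_\pi[G_t \mid S_t = s, A_t = a] \\
         &= \sum_a \pi(a \mid s)\, Q_\pi(s, a) \qquad \leftarrow \text{result}
\end{aligned}
$$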
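As a small numeric illustration of the relationship $V_\pi(s) = \sum_a \pi(a \mid s) Q_\pi(s, a)$, the sketch below stores the policy and the action values as arrays indexed by `[s, a]`. All numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import numpy as np

# Hypothetical policy pi[s, a] for two states and two actions (rows sum to 1)
pi = np.array([[0.3, 0.7],
               [1.0, 0.0]])
# Hypothetical action values Q_pi[s, a]
q = np.array([[10.0, 20.0],
              [5.0, -1.0]])

# V_pi(s) = sum_a pi(a|s) * Q_pi(s, a), computed for every state at once
v = (pi * q).sum(axis=1)
# state 0: 0.3 * 10 + 0.7 * 20 = 17, state 1: 1.0 * 5 + 0.0 * (-1) = 5
```

The state value is a policy-weighted average of the action values, so it always lies between the smallest and largest $Q_\pi(s, a)$ among the actions the policy can select.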

Slide 9

Slide 9 text

Contents (repeated); next section: Deriving the Bellman equation from the definition of the value function

Slide 10

Slide 10 text

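$$
\begin{aligned}
V_\pi(s) &\doteq \mathbb{E}_\pi[G_t \mid S_t = s] \\
 &= \mathbb{E}_\pi[R_{t+1} + \gamma G_{t+1} \mid S_t = s] \\
 &= \sum_a \pi(a \mid s) \sum_{s'} p(s' \mid s, a)\, \mathbb{E}_\pi[R_{t+1} + \gamma G_{t+1} \mid S_t = s, A_t = a, S_{t+1} = s'] \\
 &= \sum_a \pi(a \mid s) \sum_{s'} p(s' \mid s, a) \left( r(s, a, s') + \gamma\, \mathbb{E}_\pi[G_{t+1} \mid S_t = s, A_t = a, S_{t+1} = s'] \right) \\
 &= \sum_a \pi(a \mid s) \sum_{s'} p(s' \mid s, a) \left( r(s, a, s') + \gamma V_\pi(s') \right) \qquad \leftarrow \text{result}
\end{aligned}
$$

Bellman equation for $V_\pi(s)$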
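Because the Bellman equation for $V_\pi$ is linear in $V_\pi$, it can be solved directly when the state space is small. The sketch below assumes a hypothetical 2-state, 2-action MDP (the arrays `p`, `r`, `pi` and the value of `gamma` are invented for illustration), solves the linear system, and then evaluates the right-hand side of the equation to confirm that both sides agree.

```python
import numpy as np

gamma = 0.9
# Hypothetical MDP; arrays are indexed as [s, a, s']
p = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.0, 1.0]]])  # p(s'|s, a)
r = np.array([[[1.0, 0.0], [0.0, 2.0]],
              [[0.0, 1.0], [0.0, 0.5]]])  # r(s, a, s')
pi = np.array([[0.5, 0.5],
               [0.2, 0.8]])               # pi(a|s), indexed as [s, a]

# Average the dynamics and the reward over the policy
p_pi = np.einsum("sa,sat->st", pi, p)        # P_pi[s, s']
r_pi = np.einsum("sa,sat,sat->s", pi, p, r)  # expected one-step reward per state

# V = r_pi + gamma * P_pi V  is equivalent to  (I - gamma * P_pi) V = r_pi
v = np.linalg.solve(np.eye(2) - gamma * p_pi, r_pi)

# Right-hand side of the Bellman equation, written term by term
rhs = np.einsum("sa,sat,sat->s", pi, p, r + gamma * v[None, None, :])
```

Solving the system exactly costs roughly cubic time in the number of states, so this direct approach is only practical for small problems.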

Slide 11

Slide 11 text

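$$
\begin{aligned}
Q_\pi(s, a) &\doteq \mathbb{E}_\pi[G_t \mid S_t = s, A_t = a] \\
 &= \mathbb{E}_\pi[R_{t+1} + \gamma G_{t+1} \mid S_t = s, A_t = a] \\
 &= \sum_{s'} p(s' \mid s, a)\, \mathbb{E}_\pi[R_{t+1} + \gamma G_{t+1} \mid S_t = s, A_t = a, S_{t+1} = s'] \\
 &= \sum_{s'} p(s' \mid s, a) \left( r(s, a, s') + \gamma\, \mathbb{E}_\pi[G_{t+1} \mid S_t = s, A_t = a, S_{t+1} = s'] \right) \\
 &= \sum_{s'} p(s' \mid s, a) \left( r(s, a, s') + \gamma V_\pi(s') \right) \qquad \leftarrow \text{result} \\
 &\qquad \downarrow \text{substitute the result } V_\pi(s') = \sum_{a'} \pi(a' \mid s')\, Q_\pi(s', a') \\
 &= \sum_{s'} p(s' \mid s, a) \left( r(s, a, s') + \gamma \sum_{a'} \pi(a' \mid s')\, Q_\pi(s', a') \right) \qquad \leftarrow \text{result}
\end{aligned}
$$

Bellman equation for $Q_\pi(s, a)$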
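The Bellman equation for $Q_\pi$ can also be used as an update rule: applying its right-hand side repeatedly to an arbitrary starting table converges to $Q_\pi$ when $\gamma < 1$. The sketch below is one way to do this, assuming a hypothetical 2-state, 2-action MDP; the arrays, `gamma`, and the iteration count are illustrative choices, not values from the slides.

```python
import numpy as np

gamma = 0.9
# Hypothetical MDP; arrays are indexed as [s, a, s']
p = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.0, 1.0]]])  # p(s'|s, a)
r = np.array([[[1.0, 0.0], [0.0, 2.0]],
              [[0.0, 1.0], [0.0, 0.5]]])  # r(s, a, s')
pi = np.array([[0.5, 0.5],
               [0.2, 0.8]])               # pi(a|s), indexed as [s, a]

q = np.zeros((2, 2))
for _ in range(1000):
    # Inner sum: sum_a' pi(a'|s') Q(s', a'), one number per next state s'
    v_next = (pi * q).sum(axis=1)
    # Outer sum: sum_s' p(s'|s, a) * (r(s, a, s') + gamma * v_next[s'])
    q = np.einsum("sat,sat->sa", p, r + gamma * v_next[None, None, :])

# Check both forms of the equation on the converged table
v = (pi * q).sum(axis=1)
q_rhs = np.einsum("sat,sat->sa", p, r + gamma * v[None, None, :])
v_rhs = np.einsum("sa,sat,sat->s", pi, p, r + gamma * v[None, None, :])
```

Each sweep shrinks the error by a factor of at most `gamma`, so 1000 sweeps is far more than this small example needs.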

Slide 12

Slide 12 text

Contents (repeated); next section: Viewing the Bellman equation through backup diagrams

Slide 13

Slide 13 text

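Backup diagram
・Represents a sequence of states and actions as a diagram
・○ denotes a state; ● denotes an action (or a state-action pair)
・Used when computing the value of the root node
・Shows which elements make up the value of the root node (the topmost node)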

Slide 14

Slide 14 text

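Checking the Bellman equation with a backup diagram

$$V_\pi(s) = \sum_a \pi(a \mid s)\, Q_\pi(s, a) \qquad \rightarrow \text{result (restated)}$$

[Diagram: root state $s$ with value $V_\pi(s)$; branches weighted by $\pi(a \mid s)$ lead to actions $a_1$, $a_2$ with values $Q_\pi(s, a_1)$, $Q_\pi(s, a_2)$.]

$$Q_\pi(s, a) = \sum_{s'} p(s' \mid s, a) \left( r(s, a, s') + \gamma V_\pi(s') \right) \qquad \rightarrow \text{result (restated)}$$

[Diagram: root state-action pair $(s, a)$ with value $Q_\pi(s, a)$; branches weighted by $p(s' \mid s, a)$ lead to next states $s'_1$, $s'_2$ with rewards $r(s, a, s'_1)$, $r(s, a, s'_2)$ and values $V_\pi(s'_1)$, $V_\pi(s'_2)$.]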

Slide 15

Slide 15 text

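Checking the Bellman equation with a backup diagram

$$
\begin{aligned}
V_\pi(s) &= \sum_a \pi(a \mid s)\, Q_\pi(s, a) \\
 &\qquad \downarrow \text{substitute the result for } Q_\pi(s, a) \\
 &= \sum_a \pi(a \mid s) \sum_{s'} p(s' \mid s, a) \left( r(s, a, s') + \gamma V_\pi(s') \right) \qquad \rightarrow \text{result (restated)}
\end{aligned}
$$

[Diagram: root $V_\pi(s)$; branches weighted by $\pi(a \mid s)$ lead to $Q_\pi(s, a)$; from there, branches weighted by $p$ with reward $r(s, a, s')$ lead to $V_\pi(s')$.]

$$
\begin{aligned}
Q_\pi(s, a) &= \sum_{s'} p(s' \mid s, a) \left( r(s, a, s') + \gamma V_\pi(s') \right) \\
 &= \sum_{s'} p(s' \mid s, a) \left( r(s, a, s') + \gamma \sum_{a'} \pi(a' \mid s')\, Q_\pi(s', a') \right) \qquad \rightarrow \text{result (restated)}
\end{aligned}
$$

[Diagram: root $Q_\pi(s, a)$; branches weighted by $p$ with reward $r(s, a, s')$ lead to $V_\pi(s')$; from there, branches weighted by $\pi$ lead to $Q_\pi(s', a')$.]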

Slide 16

Slide 16 text

Contents (repeated); next section: Introducing optimal value functions

Slide 17

Slide 17 text

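Optimal value functions

$$V^*(s) = \max_\pi V_\pi(s) \quad \text{for any } s \in \mathcal{S}$$

$$Q^*(s, a) = \max_\pi Q_\pi(s, a) \quad \text{for any } s \in \mathcal{S},\ a \in \mathcal{A}$$

・At least one policy $\pi$ satisfying this relationship exists (an optimal policy)
・This $\pi$ achieves the maximization of the return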
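For a very small MDP, the definition $V^*(s) = \max_\pi V_\pi(s)$ can be checked by brute force. The sketch below enumerates only deterministic policies, which is a simplifying assumption (the maximum in the definition ranges over all policies, but for a finite MDP a deterministic optimal policy is known to exist). The MDP arrays and the helper `evaluate` are hypothetical.

```python
import itertools
import numpy as np

gamma = 0.9
n_states, n_actions = 2, 2
# Hypothetical MDP; arrays are indexed as [s, a, s']
p = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.0, 1.0]]])  # p(s'|s, a)
r = np.array([[[1.0, 0.0], [0.0, 2.0]],
              [[0.0, 1.0], [0.0, 0.5]]])  # r(s, a, s')


def evaluate(policy):
    """V_pi for a deterministic policy given as one action index per state."""
    states = np.arange(n_states)
    actions = np.asarray(policy)
    p_pi = p[states, actions]                       # P_pi[s, s']
    r_pi = (p_pi * r[states, actions]).sum(axis=1)  # expected one-step reward
    return np.linalg.solve(np.eye(n_states) - gamma * p_pi, r_pi)


policies = list(itertools.product(range(n_actions), repeat=n_states))
values = np.array([evaluate(pol) for pol in policies])  # shape (policies, states)

v_star = values.max(axis=0)  # maximum over policies, taken state by state
# Policies that attain the maximum in every state simultaneously
best = [pol for pol, v in zip(policies, values) if np.allclose(v, v_star)]
```

That `best` is non-empty illustrates the first bullet above: a single policy attains the state-wise maximum everywhere. The number of deterministic policies grows as `n_actions ** n_states`, so enumeration is only a teaching device.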

Slide 18

Slide 18 text

Contents (repeated); next section: Viewing the optimal value functions through backup diagrams

Slide 19

Slide 19 text

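Checking the Bellman optimality equation with a backup diagram

The policy-weighted sum $\sum_a \pi(a \mid s)$ is replaced by $\max_a$:

$$V^*(s) = \max_a Q^*(s, a)$$

$$Q^*(s, a) = \sum_{s'} p(s' \mid s, a) \left( r(s, a, s') + \gamma V^*(s') \right)$$

$$V^*(s) = \max_a \sum_{s'} p(s' \mid s, a) \left( r(s, a, s') + \gamma V^*(s') \right)$$

$$Q^*(s, a) = \sum_{s'} p(s' \mid s, a) \left( r(s, a, s') + \gamma \max_{a'} Q^*(s', a') \right)$$

[Diagram: backup diagrams for $V^*$ and $Q^*$, with a $\max_a$ arc over the action branches at each state node.]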
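Using the Bellman optimality equation for $V^*$ as an update rule gives value iteration, a standard dynamic programming method. The sketch below is a minimal version, assuming a hypothetical 2-state, 2-action MDP; the arrays, `gamma`, and the iteration count are illustrative.

```python
import numpy as np

gamma = 0.9
# Hypothetical MDP; arrays are indexed as [s, a, s']
p = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.0, 1.0]]])  # p(s'|s, a)
r = np.array([[[1.0, 0.0], [0.0, 2.0]],
              [[0.0, 1.0], [0.0, 0.5]]])  # r(s, a, s')

v = np.zeros(2)
for _ in range(1000):
    # Q(s, a) = sum_s' p(s'|s, a) * (r(s, a, s') + gamma * V(s'))
    q = np.einsum("sat,sat->sa", p, r + gamma * v[None, None, :])
    # V(s) = max_a Q(s, a)
    v = q.max(axis=1)

# Recompute Q from the converged V and read off a greedy policy
q_star = np.einsum("sat,sat->sa", p, r + gamma * v[None, None, :])
greedy_policy = q_star.argmax(axis=1)  # one action index per state
```

Acting greedily with respect to the converged `q_star` gives an optimal policy for this MDP, which connects the equations back to the optimal policy mentioned in the definition of the optimal value functions.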

Slide 20

Slide 20 text

Contents (repeated); next section: Summary

Slide 21

Slide 21 text

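Summary
・What we did
  → Derived the Bellman equations
・When these equations can be solved, an optimal solution is obtained
  → Dynamic Programming, Bellman's principle of optimality
・Usually, running DP is infeasible for a variety of reasons
  → This is where sampling-based methods come in (Monte Carlo methods, TD methods, etc.)
・However, many algorithms can be understood as approximate solution methods for the Bellman equations
  → Differences in behavior, such as MC vs. TD or Sarsa vs. Q-learning, can also be understood from the viewpoint of the Bellman equations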
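The last point can be made concrete by looking at the update targets of Sarsa and Q-learning. In the sketch below, the Sarsa target is a one-sample version of the right-hand side of the Bellman equation for $Q_\pi$, while the Q-learning target is a one-sample version of the Bellman optimality equation for $Q^*$. The function names, the table values, and the sampled transition are all hypothetical.

```python
import numpy as np


def sarsa_target(q, reward, next_state, next_action, gamma):
    """Sampled Bellman equation for Q_pi: uses the action actually taken next."""
    return reward + gamma * q[next_state, next_action]


def q_learning_target(q, reward, next_state, gamma):
    """Sampled Bellman optimality equation for Q*: uses the max over next actions."""
    return reward + gamma * q[next_state].max()


def td_update(q, state, action, target, alpha):
    """Move Q(s, a) toward the target by step size alpha (modifies q in place)."""
    q[state, action] += alpha * (target - q[state, action])


q = np.array([[0.0, 0.0],
              [1.0, 3.0]])  # hypothetical current estimates Q[s, a]
s, a, reward, s_next, a_next = 0, 1, 1.0, 1, 0  # hypothetical sampled transition
gamma, alpha = 0.9, 0.5

t_sarsa = sarsa_target(q, reward, s_next, a_next, gamma)  # 1 + 0.9 * Q[1, 0]
t_qlearn = q_learning_target(q, reward, s_next, gamma)    # 1 + 0.9 * max_a Q[1, a]
td_update(q, s, a, t_sarsa, alpha)
```

The two targets differ only in how the next state's value is summarized: by the sampled next action (Sarsa) or by the maximum over actions (Q-learning), which mirrors the difference between the $\sum_{a'} \pi(a' \mid s')$ and $\max_{a'}$ terms in the two Bellman equations.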

Slide 22

Slide 22 text

Thank you for your attention.