これからの強化学習 (Reinforcement Learning from Now On) 2.6

moyomot
May 19, 2017

Transcript

  1. これからの強化学習 (Reinforcement Learning from Now On)
    2.6 Risk-Sensitive Reinforcement Learning
    GUNOSY Data Mining Study Group #121

  2. INTRODUCTION
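    Problems that the reinforcement learning covered so far cannot solve
    ▸ Reinforcement learning aims to maximize the expected sum of rewards (the return)
    ▸ Some cases cannot be formulated as an expectation maximization (or minimization) problem
    ▸ These are cases where a low-probability event causes a large loss and the user cares about avoiding that risk
    ▸ Standard reinforcement learning has no mechanism that actively avoids the risk of a large negative reward
    ▸ In settings such as stock investment, profit must be increased while avoiding large losses that occur with small probability
    ▸ The reason is that the return carries no information other than its expected value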

  3. INTRODUCTION
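    Expected value of a lottery
    ▸ With high probability, you win 1 cent
    ▸ Many people would probably judge that the gain is small and the risk of losing 100 dollars is large
    ▸ http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.45.8264&rep=rep1&type=pdf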
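The lottery above can be made concrete with a small calculation. This is only a sketch: the slide gives the two outcomes (win 1 cent, lose 100 dollars) but not their probabilities, so the loss probability `p_loss=0.00005` below is an assumed value chosen for illustration, and `lottery_stats` is a hypothetical helper name.

```python
def lottery_stats(p_loss, gain=0.01, loss=100.0):
    """Return (mean, variance, worst case) of a two-outcome lottery.

    With probability 1 - p_loss the player wins `gain` dollars,
    with probability p_loss the player loses `loss` dollars.
    """
    p_gain = 1.0 - p_loss
    mean = p_gain * gain - p_loss * loss
    variance = p_gain * (gain - mean) ** 2 + p_loss * (-loss - mean) ** 2
    return mean, variance, -loss


# Assumed loss probability (not given on the slide).
mean, variance, worst = lottery_stats(p_loss=0.00005)
print(f"expected value: {mean:.7f} dollars")
print(f"variance:       {variance:.4f}")
print(f"worst case:     {worst:.2f} dollars")
```

Under this assumed probability the expected value is positive, so a pure expectation maximizer would accept the lottery, even though the worst case is a loss of 100 dollars. The expected value alone does not show that tail.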

  4. INTRODUCTION
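    Contents
    ▸ 2.6.1 Review of reinforcement learning (omitted)
    ▸ 2.6.2 Risk-sensitive reinforcement learning methods
    ▸ A kind of worst-case evaluation
    ▸ Nonlinearizing the utility function or the temporal-difference (TD) error
    ▸ Introducing risk measures other than the return
    ▸ 2.6.3 Return distribution estimation for risk-sensitive reinforcement learning
    ▸ Once the probability distribution of the return is known, risk measures such as Value-at-Risk can be computed and decisions can be based on them
    ▸ 2.6.4 Conclusion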

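  5. 2.6.2 Risk-sensitive reinforcement learning methods
    A kind of worst-case evaluation
    ▸ Methods that extend Q-learning
    ▸ Q-learning (review)
    ▸ Bellman equation
    ▸ TD (temporal-difference) learning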
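As a reminder of what the worst-case methods extend, here is a minimal sketch of the standard tabular Q-learning update driven by the TD error. The table layout (a list of per-state action-value lists), the step size `alpha`, and the discount `gamma` are illustrative assumptions, not values from the slides.

```python
def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Apply one Q-learning step in place and return the TD error.

    Q is a list of lists: Q[state][action] holds the current estimate.
    """
    td_target = r + gamma * max(Q[s_next])   # sampled Bellman optimality target
    td_error = td_target - Q[s][a]           # temporal-difference error
    Q[s][a] += alpha * td_error
    return td_error


# Two states, two actions, all estimates start at zero.
Q = [[0.0, 0.0], [0.0, 0.0]]
td_error = q_learning_update(Q, s=0, a=1, r=1.0, s_next=1)
print(td_error, Q[0][1])
```

The update moves `Q[s][a]` toward a sampled target, so it tracks the expected return and says nothing about how bad the worst outcomes are.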

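  6. 2.6.2 Risk-sensitive reinforcement learning methods
    Q-hat learning: an extension based on the maximin policy (Heger)
    ▸ What is maximin?
    ▸ A strategy that decides so as to maximize the smallest payoff that can be anticipated
    ▸ A formulation of this idea
    ▸ Keeps the risk of a large loss to a minimum
    ▸ Advantage: the TD learning of Q-learning can be used
    Seki (関) vs. Morimoto (森本)   Strategy A   Strategy B
    Strategy A                     100          -100
    Strategy B                     10           -10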
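The payoff table above can be used to work through the maximin rule. The sketch below assumes that rows are the decision maker's own strategies and columns are the opponent's (the transcript does not say which is which). It also includes `q_hat_update`, a hypothetical helper showing one common way to write a worst-case update for rewards in the spirit of Q-hat learning: start from an optimistic table and only ever lower an entry toward the worst observed target. It is an illustration, not a reproduction of Heger's exact algorithm.

```python
# Payoffs from the slide: payoff[own_strategy][opponent_strategy].
payoff = {
    "A": {"A": 100, "B": -100},
    "B": {"A": 10, "B": -10},
}


def maximin(payoff):
    """Pick the strategy whose worst-case payoff is largest."""
    worst = {own: min(row.values()) for own, row in payoff.items()}
    best = max(worst, key=worst.get)
    return best, worst[best]


def q_hat_update(Q_hat, s, a, r, s_next, gamma=0.9):
    """Worst-case style update: keep the minimum target seen so far.

    Assumes Q_hat was initialized optimistically (at or above any
    achievable return), so entries can only decrease.
    """
    target = r + gamma * max(Q_hat[s_next])
    Q_hat[s][a] = min(Q_hat[s][a], target)


choice, guaranteed = maximin(payoff)
print(choice, guaranteed)

Q_hat = [[10.0, 10.0], [10.0, 10.0]]               # optimistic start
q_hat_update(Q_hat, s=0, a=0, r=-1.0, s_next=1)    # bad outcome lowers the entry
q_hat_update(Q_hat, s=0, a=0, r=5.0, s_next=1)     # good outcome leaves it unchanged
print(Q_hat[0][0])
```

Strategy A has the larger best case, but its worst case is -100, so maximin prefers strategy B with a guaranteed payoff of -10. The update is as simple as the Q-learning one, which is the advantage the slide points out.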

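  7. 2.6.2 Risk-sensitive reinforcement learning methods
    Approaches that make the utility function or the TD error nonlinear
    ▸ One approach uses, as the risk measure, the nonlinear utility functions used in finance and control theory
    ▸ With this approach, one cannot derive a Bellman equation and perform TD learning
    ▸ Another approach applies a nonlinear transformation to the TD error so that it reflects the user's risk preference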
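To illustrate the TD-error transformation idea, the sketch below uses one well-known piecewise-linear form, attributed to Mihatsch and Neuneier, in which positive and negative TD errors are weighted differently by a parameter `kappa` in (-1, 1). The slide does not say which transformation it has in mind, so this choice and the function names are assumptions.

```python
def transform_td_error(td_error, kappa):
    """Weight positive and negative TD errors asymmetrically.

    kappa > 0 shrinks good surprises and amplifies bad ones (risk-averse),
    kappa < 0 does the opposite (risk-seeking), kappa = 0 is plain TD.
    """
    if not -1.0 < kappa < 1.0:
        raise ValueError("kappa must lie strictly between -1 and 1")
    if td_error > 0:
        return (1.0 - kappa) * td_error
    return (1.0 + kappa) * td_error


def risk_sensitive_q_update(Q, s, a, r, s_next, kappa=0.5, alpha=0.1, gamma=0.9):
    """Q-learning step that uses the transformed TD error."""
    td_error = r + gamma * max(Q[s_next]) - Q[s][a]
    Q[s][a] += alpha * transform_td_error(td_error, kappa)
    return td_error


print(transform_td_error(2.0, kappa=0.5))    # good surprise is damped
print(transform_td_error(-2.0, kappa=0.5))   # bad surprise is amplified
```

Because only the TD error is transformed, the usual TD machinery stays in place, which is what separates this approach from the utility-function approach above.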

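  8. 2.6.2 Risk-sensitive reinforcement learning methods
    Approaches that introduce risk measures other than the return
    ▸ An approach that takes into account risk factors not directly related to the reward
    ▸ Introduces a risk function ρ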

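  9. 2.6.3 Return distribution estimation for risk-sensitive reinforcement learning
    Estimating the return distribution is the key
    ▸ Risk measures are derived from the return distribution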
    ▸ http://latent-dynamics.net/02/09_Morimura.ppt.pdf
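As a sketch of how risk measures can be read off a return distribution, the code below computes an empirical Value-at-Risk and a conditional Value-at-Risk from return samples. Conventions for these measures vary. Here, as an assumption, VaR at level `alpha` is the lower `alpha`-quantile of the return (higher returns are better), and CVaR is the mean of the returns at or below that quantile. The sample data are made up for illustration.

```python
import math


def value_at_risk(returns, alpha=0.05):
    """Empirical lower alpha-quantile of the return samples."""
    ordered = sorted(returns)
    index = max(0, math.ceil(alpha * len(ordered)) - 1)
    return ordered[index]


def conditional_value_at_risk(returns, alpha=0.05):
    """Mean of the samples at or below the VaR (ties may widen the tail)."""
    var = value_at_risk(returns, alpha)
    tail = [g for g in returns if g <= var]
    return sum(tail) / len(tail)


# Made-up samples: the risky policy has the higher mean but a bad tail.
risky = [2.0] * 90 + [-5.0] * 10
safe = [1.0] * 100

for name, samples in (("risky", risky), ("safe", safe)):
    mean = sum(samples) / len(samples)
    print(name, mean, value_at_risk(samples), conditional_value_at_risk(samples))
```

Ranking by the mean prefers the risky policy, while ranking by either tail measure prefers the safe one. That is the kind of decision the return distribution makes possible.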

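  10. Approaches to return distribution estimation
    2.6.3 Return distribution estimation for risk-sensitive reinforcement learning
    ▸ Simulation approach
    ▸ If state s and action a are recorded and T is made sufficiently large, many return samples are collected and the return distribution can be estimated
    ▸ The computational cost is enormous
    ▸ Analytical approach
    ▸ The distributional Bellman equation, which solves for the return distribution analytically
    ▸ A nonparametric return distribution estimation algorithm that solves the distributional Bellman equation with particle smoothing
    ▸ https://pdfs.semanticscholar.org/1ec2/6e05c2577154213e1668ddd374e4da663309.pdf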
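The simulation approach can be sketched as plain Monte Carlo rollouts: fix a starting action, follow a policy, and collect one discounted return per rollout. Everything concrete below is a hypothetical stand-in, since the slides do not specify an environment: the single-state toy dynamics in `step`, the reward values, the horizon, and the number of episodes.

```python
import random


def step(action, rng):
    """Hypothetical single-state toy problem.

    Action 0 is safe (reward 1.0); action 1 is risky
    (reward 2.0, but -10.0 with probability 0.1).
    """
    if action == 1:
        return -10.0 if rng.random() < 0.1 else 2.0
    return 1.0


def sample_return(first_action, policy, horizon, gamma, rng):
    """Discounted return of one rollout that starts with first_action."""
    total, discount, action = 0.0, 1.0, first_action
    for _ in range(horizon):
        total += discount * step(action, rng)
        discount *= gamma
        action = policy()
    return total


def estimate_return_samples(first_action, policy, n_episodes=1000,
                            horizon=50, gamma=0.9, seed=0):
    """Collect return samples; their histogram estimates the distribution."""
    rng = random.Random(seed)
    return [sample_return(first_action, policy, horizon, gamma, rng)
            for _ in range(n_episodes)]


def always_safe():
    return 0


safe_samples = estimate_return_samples(0, always_safe)
risky_samples = estimate_return_samples(1, always_safe)
print(min(safe_samples), max(safe_samples))
print(min(risky_samples), max(risky_samples))
```

Even in this toy problem each sample costs a full rollout, and a separate batch is needed for every state-action pair of interest, which is why the slide calls the computational cost enormous.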

  11. Distributional Bellman equation
    2.6.3 Return distribution estimation for risk-sensitive reinforcement learning
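The equation shown on this slide is not part of the transcript. For orientation only, one commonly used way of writing a distributional Bellman equation is given below. The notation is an assumption and may differ from the slide: Z is the random return, R the random reward, P the transition kernel, π the policy, and the symbol over the equals sign denotes equality in distribution.

```latex
\[
  Z^{\pi}(s, a) \stackrel{D}{=} R(s, a) + \gamma \, Z^{\pi}(S', A'),
  \qquad S' \sim P(\cdot \mid s, a), \quad A' \sim \pi(\cdot \mid S')
\]
\[
  Q^{\pi}(s, a) = \mathbb{E}\!\left[ Z^{\pi}(s, a) \right]
\]
```

Taking the expectation of both sides of the first relation gives the ordinary Bellman equation for the action-value function, so the distributional form strictly carries more information.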

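  12. Nonparametric return distribution estimation
    2.6.3 Return distribution estimation for risk-sensitive reinforcement learning
    ▸ The return distribution is approximated with particles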
    ▸ http://latent-dynamics.net/02/09_Morimura.ppt.pdf
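The idea of representing a return distribution by particles can be sketched as follows. This is not the particle-smoothing algorithm from the cited work. It is a simplified, assumed scheme in which each state keeps a fixed-size set of return samples, and after a transition one particle of the current state is overwritten by a bootstrapped sample `r + gamma * eta_next`. The function name and the toy self-loop problem are hypothetical.

```python
import random


def particle_backup(particles, s, r, s_next, gamma, rng):
    """Overwrite one particle of state s with a bootstrapped return sample."""
    eta_next = rng.choice(particles[s_next])      # sample a successor return
    slot = rng.randrange(len(particles[s]))       # particle to replace
    particles[s][slot] = r + gamma * eta_next


# Toy check: one state that loops to itself with reward 1.0 and gamma 0.5.
# The true return is deterministic and equals 1 / (1 - 0.5) = 2.0.
rng = random.Random(0)
particles = {"s0": [0.0] * 20}
for _ in range(5000):
    particle_backup(particles, "s0", r=1.0, s_next="s0", gamma=0.5, rng=rng)

print(min(particles["s0"]), max(particles["s0"]))
```

With stochastic rewards the particle set would spread out instead of collapsing to a point, and quantile-based risk measures could then be computed directly from the particles of a state.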
