Soft Actor-Critic Explained

K.Takiguchi

April 28, 2018

Transcript

  1. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

    Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine. NIPS 2017. Keio Machine Learning Seminar
  2. [Japanese slide text lost in extraction; the only legible terms are "mnist" and "AlphaGo".]
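  3. (2/2)

    $R_t = \mathbb{E}\left[\sum_{k=t}^{\infty} \gamma^{k-t}\, r(s_k, a_k)\right]$
    $Q^{\pi}(s_t, a_t) = \mathbb{E}_{\pi,p}\left[r_t + \gamma\, Q^{\pi}(s_{t+1}, \pi(s_{t+1}))\right]$
    $V^{\pi}(s_t) = \mathbb{E}_{\pi,p}\left[r_t + \gamma\, V^{\pi}(s_{t+1})\right]$
    $\pi(s_t) = \arg\max_{a_t} Q^{\pi}(s_t, a_t)$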
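
    The recursions on this slide can be illustrated with a small tabular example. This is only a sketch under stated assumptions: the two-state MDP (`P`, `R`), the fixed deterministic policy, and the helper names `discounted_return` and `evaluate_q` are all hypothetical and do not come from the slides.

```python
import numpy as np

def discounted_return(rewards, gamma):
    """Discounted sum of a finite reward sequence: sum_k gamma^k * r_k."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total

def evaluate_q(P, R, policy, gamma, iters=500):
    """Repeat the backup Q(s,a) <- R[s,a] + gamma * sum_s' P[s,a,s'] * Q(s', pi(s'))."""
    n_states, _ = R.shape
    Q = np.zeros_like(R, dtype=float)
    for _ in range(iters):
        # V(s) = Q(s, pi(s)) for a deterministic policy
        V = Q[np.arange(n_states), policy]
        Q = R + gamma * (P @ V)
    return Q

# Hypothetical MDP: action 0 always leads to state 0, action 1 to state 1.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[1.0, 0.0], [0.0, 1.0]]])  # shape (state, action, next_state)
R = np.array([[0.0, 1.0],
              [0.0, 2.0]])                # shape (state, action)
gamma = 0.9
policy = np.array([1, 1])                 # assumed policy: always take action 1

Q = evaluate_q(P, R, policy, gamma)
greedy = Q.argmax(axis=1)                 # pi(s) = argmax_a Q(s, a)
print(Q)
print(greedy)
```

    Acting greedily with respect to the evaluated Q, as in the last line of the slide, gives an improved (or equal) deterministic policy; Soft Actor-Critic replaces this hard argmax with a stochastic policy.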
  4. Actor-Critic

    [Diagram: Environment, Actor (Policy), Critic]
  5. [Figure with three labeled parameter settings: μ = −1, σ² = 0.5; μ = 0, σ² = 1; μ = 1, σ² = 3.0]
  6. Soft Actor-Critic • Maximum Entropy Reinforcement Learning

    [Formula not recoverable from extraction]
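
    The slide's own formula is garbled in this transcript. For orientation, the maximum entropy objective as it is usually stated for Soft Actor-Critic is given below. This is the standard form from the SAC paper and not necessarily the exact expression shown on the slide; the temperature $\alpha$ and the state-action marginal $\rho_\pi$ follow that paper's notation.

```latex
J(\pi) = \sum_{t=0}^{T} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
         \left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s_t)\big)
  = -\, \mathbb{E}_{a \sim \pi(\cdot \mid s_t)} \left[ \log \pi(a \mid s_t) \right]
```

    Setting $\alpha = 0$ recovers the ordinary expected-return objective; larger $\alpha$ rewards more stochastic policies.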
  7. [Japanese slide text lost in extraction; the only legible term is "DDPG".]