Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Soft Actor-Critic 解説
Search
Sponsored
·
Ship Features Fearlessly
Turn features on and off without deploys. Used by thousands of Ruby developers.
→
K.Takiguchi
April 28, 2018
Technology
620
0
Share
Soft Actor-Critic 解説
Soft Actor-Critic 解説
K.Takiguchi
April 28, 2018
Other Decks in Technology
See All in Technology
Sansan Engineering Unit 紹介資料
sansan33
PRO
1
4.3k
Revisiting [CLS] and Patch Token Interaction in Vision Transformers
yu4u
0
330
AIを共同作業者にして書籍を執筆する方法 / How to Write a Book with AI as a Co-Creator
ama_ch
2
130
ハーネスエンジニアリングをやりすぎた話 ~そのハーネスは解体された~
gotalab555
2
1.2k
最近の技術系の話題で気になったもの色々(IoT系以外も) / IoTLT 花見予定会(たぶんBBQ) @都立潮風公園バーベキュー広場
you
PRO
1
220
レビューしきれない?それは「全て人力でのレビュー」だからではないでしょうか
amixedcolor
0
290
KGDC_13_Amazon Q Developerで挑む! 13事例から見えたAX組織変革の最前線_公開情報
kikugawa
0
110
AI時代のガードレールとしてのAPIガバナンス
nagix
0
210
Do Ruby::Box dream of Modular Monolith?
joker1007
1
310
ワールドカフェI /チューターを改良する / World Café I and Improving the Tutors
ks91
PRO
0
280
明日からドヤれる!超マニアックなAWSセキュリティTips10連発 / 10 Ultra-Niche AWS Security Tips
yuj1osm
0
540
AIが書いたコードを信じられない問題 〜レビュー負荷を下げるために変えたこと〜 / The AI Code Trust Gap: Reducing the Review Burden
bitkey
PRO
3
850
Featured
See All Featured
Connecting the Dots Between Site Speed, User Experience & Your Business [WebExpo 2025]
tammyeverts
11
890
My Coaching Mixtape
mlcsv
0
98
The AI Revolution Will Not Be Monopolized: How open-source beats economies of scale, even for LLMs
inesmontani
PRO
3
3.3k
Abbi's Birthday
coloredviolet
2
7k
Into the Great Unknown - MozCon
thekraken
40
2.3k
Technical Leadership for Architectural Decision Making
baasie
3
320
AI: The stuff that nobody shows you
jnunemaker
PRO
6
570
Game over? The fight for quality and originality in the time of robots
wayneb77
1
160
Building an army of robots
kneath
306
46k
Typedesign – Prime Four
hannesfritz
42
3k
Rails Girls Zürich Keynote
gr2m
96
14k
The Success of Rails: Ensuring Growth for the Next 100 Years
eileencodes
47
8k
Transcript
Soft Actor-Critic: Off-Policy Maximum Deep Reinforcement Learning with a Stochastic
Actor Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine NIPS 2017 Keio Machine Learning Seminar
3
4 48.: • >,$ & KJ .:6 •
MGLIC-0ICmnistD+AF9)@F9 48 .: • .:6 • M$%(?=#&.: 3*.: • /2B 6E1 7H;.:6 • M'!<5"AIAlphaGoD
(1/2) 4 Environment Agent Action Reward
State
(2/2) !" = $ ∑ &'"
( ) &*" + ,& , .& 5 /0 ," , ." = $0,1 +" + )/0 ,"34 , 5 ,"34 60 ," = $0,1 +" + )60(,"34 ) 5 ," = argmax> Q0 sA , aA
Actor Critic 4 Policy Critic
Environment Actor Critic
7 State Action
σ μ σ μ
8 ! = −1 %& = 0.5 ! =
0 %& = 1 ! = 1 %& = 3.0
Soft Actor-Critic 7
Soft Actor-Critic 10
Maximum Entropy Reinforcement Learning • Soft Actor-Critic Soft Q-Learning 11
Soft Actor-Critic • Maximum Entropy Reinforcement Learning "
12 # # " # $! log $% &' = )* (&' )
Soft Actor-Critic Algorithm 13
14
half-cheetah 15
Another Tasks 16
17
18 Pendulum-v0 MountainCarContinuous-v0
• 0/) DDPGME* : $1,3' • 0-08
7 9 =; $14 $1@6<! • 2?+.(%&$1) 05#"> 19