Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Soft Actor-Critic 解説
Search
K.Takiguchi
April 28, 2018
Technology
0
600
Soft Actor-Critic 解説
Soft Actor-Critic 解説
K.Takiguchi
April 28, 2018
Tweet
Share
Other Decks in Technology
See All in Technology
Webhook best practices for rock solid and resilient deployments
glaforge
2
300
2026年、サーバーレスの現在地 -「制約と戦う技術」から「当たり前の実行基盤」へ- /serverless2026
slsops
2
260
Context Engineeringの取り組み
nutslove
0
360
日本の85%が使う公共SaaSは、どう育ったのか
taketakekaho
1
230
CDK対応したAWS DevOps Agentを試そう_20260201
masakiokuda
1
340
SREのプラクティスを用いた3領域同時 マネジメントへの挑戦 〜SRE・情シス・セキュリティを統合した チーム運営術〜
coconala_engineer
2
670
What happened to RubyGems and what can we learn?
mikemcquaid
0
300
Contract One Engineering Unit 紹介資料
sansan33
PRO
0
13k
Kiro IDEのドキュメントを全部読んだので地味だけどちょっと嬉しい機能を紹介する
khmoryz
0
200
Data Hubグループ 紹介資料
sansan33
PRO
0
2.7k
Embedded SREの終わりを設計する 「なんとなく」から計画的な自立支援へ
sansantech
PRO
3
2.5k
超初心者からでも大丈夫!オープンソース半導体の楽しみ方〜今こそ!オレオレチップをつくろう〜
keropiyo
0
110
Featured
See All Featured
16th Malabo Montpellier Forum Presentation
akademiya2063
PRO
0
51
Facilitating Awesome Meetings
lara
57
6.8k
Making Projects Easy
brettharned
120
6.6k
ReactJS: Keep Simple. Everything can be a component!
pedronauck
666
130k
How to make the Groovebox
asonas
2
1.9k
How To Speak Unicorn (iThemes Webinar)
marktimemedia
1
380
The browser strikes back
jonoalderson
0
390
What Being in a Rock Band Can Teach Us About Real World SEO
427marketing
0
170
Performance Is Good for Brains [We Love Speed 2024]
tammyeverts
12
1.4k
Leadership Guide Workshop - DevTernity 2021
reverentgeek
1
200
Efficient Content Optimization with Google Search Console & Apps Script
katarinadahlin
PRO
1
320
Navigating Weather and Climate Data
rabernat
0
110
Transcript
Soft Actor-Critic: Off-Policy Maximum Deep Reinforcement Learning with a Stochastic
Actor Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine NIPS 2017 Keio Machine Learning Seminar
3
4 48.: • >,$ & KJ .:6 •
MGLIC-0ICmnistD+AF9)@F9 48 .: • .:6 • M$%(?=#&.: 3*.: • /2B 6E1 7H;.:6 • M'!<5"AIAlphaGoD
(1/2) 4 Environment Agent Action Reward
State
(2/2) !" = $ ∑ &'"
( ) &*" + ,& , .& 5 /0 ," , ." = $0,1 +" + )/0 ,"34 , 5 ,"34 60 ," = $0,1 +" + )60(,"34 ) 5 ," = argmax> Q0 sA , aA
Actor Critic 4 Policy Critic
Environment Actor Critic
7 State Action
σ μ σ μ
8 ! = −1 %& = 0.5 ! =
0 %& = 1 ! = 1 %& = 3.0
Soft Actor-Critic 7
Soft Actor-Critic 10
Maximum Entropy Reinforcement Learning • Soft Actor-Critic Soft Q-Learning 11
Soft Actor-Critic • Maximum Entropy Reinforcement Learning "
12 # # " # $! log $% &' = )* (&' )
Soft Actor-Critic Algorithm 13
14
half-cheetah 15
Another Tasks 16
17
18 Pendulum-v0 MountainCarContinuous-v0
• 0/) DDPGME* : $1,3' • 0-08
7 9 =; $14 $1@6<! • 2?+.(%&$1) 05#"> 19