Soft Actor-Critic: An Explanation
K.Takiguchi
April 28, 2018
Transcript
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine
NIPS 2017
Keio Machine Learning Seminar
[Slide text lost in extraction; the only legible fragments are "mnist" and "AlphaGo".]
Reinforcement learning (1/2)
[Diagram: the Agent sends an Action to the Environment, which returns a Reward and the next State.]
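Reinforcement learning (2/2)
$R_t = \mathbb{E}\left[ \sum_{k=t}^{\infty} \gamma^{k-t}\, r(s_k, a_k) \right]$
$Q^{\pi}(s_t, a_t) = \mathbb{E}_{\pi}\left[ r_t + \gamma\, Q^{\pi}\big(s_{t+1}, \pi(s_{t+1})\big) \right]$
$V^{\pi}(s_t) = \mathbb{E}_{\pi}\left[ r_t + \gamma\, V^{\pi}(s_{t+1}) \right]$
$\pi(s_t) = \arg\max_{a_t} Q^{\pi}(s_t, a_t)$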
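The discounted return on this slide, R_t = Σ_{k≥t} γ^(k−t) r_k, can be computed for a finite episode with a single backward pass. The sketch below is only an illustration: the function name `discounted_returns` and the sample reward list are assumptions, not part of the deck.

```python
def discounted_returns(rewards, gamma):
    """Return [R_0, R_1, ...] with R_t = sum over k >= t of gamma**(k - t) * rewards[k]."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the following step:
    # R_t = r_t + gamma * R_{t+1}
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    # Hypothetical three-step episode with a reward of 1 at every step.
    print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```

The backward recursion is the same one-step relation that appears inside the Q and V equations, applied to a single sampled trajectory instead of an expectation.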
Actor-Critic: Policy, Critic
[Diagram: Environment, Actor, Critic]
[Diagram: State → σ, μ → Action]
[Figure: Gaussian distributions with μ = −1, σ² = 0.5; μ = 0, σ² = 1; μ = 1, σ² = 3.0]
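A stochastic actor of this kind outputs the parameters of a Gaussian and draws the action from it. The following is a minimal sketch using the three parameter pairs from the slide; it assumes that the second number in each pair is the variance σ² (the symbol is garbled in the transcript), and the helper names `sample_action` and `gaussian_log_prob` are hypothetical.

```python
import math
import random


def sample_action(mu, var, rng):
    """Draw one action from a Gaussian with mean mu and variance var."""
    return rng.gauss(mu, math.sqrt(var))


def gaussian_log_prob(x, mu, var):
    """Log density of x under a Gaussian with mean mu and variance var."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs draw the same actions
    # (mean, variance) pairs; reading the second value as a variance is an assumption.
    for mu, var in [(-1.0, 0.5), (0.0, 1.0), (1.0, 3.0)]:
        action = sample_action(mu, var, rng)
        print(mu, var, action, gaussian_log_prob(action, mu, var))
```

A larger variance spreads the samples further from the mean and lowers the peak log density, which is the exploration behaviour that the entropy term in Soft Actor-Critic rewards.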
Soft Actor-Critic
Maximum Entropy Reinforcement Learning
• Soft Actor-Critic, Soft Q-Learning
Soft Actor-Critic
• Maximum Entropy Reinforcement Learning
[Equation lost in extraction; only a "log" term is legible.]
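For reference, the maximum entropy objective as it is usually written for Soft Actor-Critic is sketched below. The notation (ρ_π for the state-action distribution induced by the policy, α for the temperature that weights the entropy term) follows the published paper rather than the slide, so whether the slide showed exactly this form is an assumption.

```latex
\[
J(\pi) = \sum_{t=0}^{T} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
  \left[ r(s_t, a_t) + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s_t)\big)
  = -\,\mathbb{E}_{a \sim \pi(\cdot \mid s_t)} \left[ \log \pi(a \mid s_t) \right]
\]
```

Setting α = 0 recovers the standard expected-return objective, so the entropy bonus is the only difference from conventional reinforcement learning.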
Soft Actor-Critic Algorithm
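The algorithm slide itself did not survive extraction, so the following is not a reproduction of it. It is a small sketch of one quantity that soft actor-critic methods rely on, the soft state value V(s) = E_a[Q(s, a) − α log π(a|s)], written for a discrete action set so that it can be checked by hand. The discrete setting, the temperature `alpha`, and all function names are assumptions made for illustration.

```python
import math


def soft_value(q_values, probs, alpha):
    """Soft state value: sum over a of pi(a) * (Q(a) - alpha * log pi(a))."""
    return sum(p * (q - alpha * math.log(p))
               for q, p in zip(q_values, probs) if p > 0.0)


def boltzmann_policy(q_values, alpha):
    """Policy proportional to exp(Q / alpha), computed stably by subtracting the max."""
    m = max(q_values)
    weights = [math.exp((q - m) / alpha) for q in q_values]
    z = sum(weights)
    return [w / z for w in weights]


def log_sum_exp_value(q_values, alpha):
    """alpha * log(sum over a of exp(Q(a) / alpha)), the soft maximum of Q."""
    m = max(q_values)
    return m + alpha * math.log(sum(math.exp((q - m) / alpha) for q in q_values))


if __name__ == "__main__":
    q = [1.0, 2.0, 0.5]  # hypothetical Q-values for three actions
    alpha = 0.5          # hypothetical temperature
    pi = boltzmann_policy(q, alpha)
    # Under the Boltzmann policy the soft value equals the soft maximum of Q.
    print(soft_value(q, pi, alpha), log_sum_exp_value(q, alpha))
    # A greedy policy has zero entropy, so its soft value is just max Q.
    print(soft_value(q, [0.0, 1.0, 0.0], alpha))
```

The comparison shows why the entropy term matters: the policy proportional to exp(Q/α) reaches a higher soft value than the greedy policy, even though the greedy policy has the higher expected Q.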
half-cheetah
Other Tasks
Pendulum-v0, MountainCarContinuous-v0
[Bulleted slide text lost in extraction; the only legible fragment is "DDPG".]