Soft Actor-Critic Explained
K.Takiguchi
April 28, 2018
Transcript
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine
NIPS 2017
Keio Machine Learning Seminar
[Slide text lost in extraction. The surviving terms are "mnist" and "AlphaGo", apparently cited as examples.]
(1/2)
[Diagram: the Agent sends an Action to the Environment; the Environment returns a Reward and a State.]
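(2/2)
$R_t = \mathbb{E}\left[ \sum_{k=t}^{T} \gamma^{k-t} \, r(s_k, a_k) \right]$
$Q^{\pi}(s_t, a_t) = \mathbb{E}_{\pi}\left[ r_t + \gamma \, Q^{\pi}(s_{t+1}, \pi(s_{t+1})) \right]$
$V^{\pi}(s_t) = \mathbb{E}_{\pi}\left[ r_t + \gamma \, V^{\pi}(s_{t+1}) \right]$
$\pi(s_t) = \arg\max_{a} Q^{\pi}(s_t, a)$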
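As a small worked example of the discounted return on this slide, the sketch below accumulates rewards from the last step backwards, using the same recursion that appears in the value-function equations (return at t = reward at t + γ × return at t+1). The function name `discounted_return`, the reward list, and the value of γ are made up for illustration and do not come from the deck.

```python
def discounted_return(rewards, gamma):
    """Return sum_k gamma**k * rewards[k], accumulated from the last step backwards."""
    total = 0.0
    for r in reversed(rewards):
        # Bellman-style recursion: R_t = r_t + gamma * R_{t+1}
        total = r + gamma * total
    return total


# Hypothetical three-step episode with gamma = 0.5:
# 1.0 + 0.5 * 0.0 + 0.25 * 2.0 = 1.5
print(discounted_return([1.0, 0.0, 2.0], 0.5))  # prints 1.5
```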
Actor-Critic: Policy (Actor) and Critic
[Diagram: Environment, Actor, Critic]
[Diagram: State, outputs labeled μ and σ, Action]
[Plots: Gaussians with μ = −1, σ² = 0.5; μ = 0, σ² = 1; μ = 1, σ² = 3.0]
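Reading the three parameter pairs on this slide as the mean μ and variance σ² of one-dimensional Gaussians (an interpretation of the extracted labels, not something the transcript states), the sketch below samples one action from each distribution and prints its differential entropy. It is meant only to illustrate that a wider Gaussian policy has higher entropy; the helper names `gaussian_entropy` and `sample_action` are hypothetical.

```python
import math
import random


def gaussian_entropy(variance):
    """Differential entropy (in nats) of a 1-D Gaussian with the given variance."""
    return 0.5 * math.log(2.0 * math.pi * math.e * variance)


def sample_action(mu, variance, rng):
    """Draw one action from N(mu, variance); random.gauss takes the standard deviation."""
    return rng.gauss(mu, math.sqrt(variance))


rng = random.Random(0)  # fixed seed so repeated runs draw the same samples
for mu, var in [(-1.0, 0.5), (0.0, 1.0), (1.0, 3.0)]:
    action = sample_action(mu, var, rng)
    print(f"mu={mu:+.1f} var={var:.1f} "
          f"entropy={gaussian_entropy(var):.3f} sample={action:+.3f}")
```

The entropy depends only on the variance, so it grows from the first pair to the third regardless of the mean.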
Soft Actor-Critic
Maximum Entropy Reinforcement Learning
• Soft Actor-Critic, Soft Q-Learning
Soft Actor-Critic
• Maximum Entropy Reinforcement Learning
[Equations on this slide were lost in extraction.]
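For orientation, the maximum entropy objective is usually written as below. The notation (the temperature α, the state-action distribution ρ_π, and the horizon T) follows the Soft Actor-Critic paper and is an assumption here, since the slide's own symbols are not legible.

```latex
J(\pi) = \sum_{t=0}^{T} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
  \left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s_t)\big)
  = -\,\mathbb{E}_{a \sim \pi(\cdot \mid s_t)} \left[ \log \pi(a \mid s_t) \right]
```

Setting α = 0 recovers the standard expected-return objective; a larger α rewards policies that stay more random.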
Soft Actor-Critic Algorithm
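The algorithm listing itself is not present in the transcript. As a rough sketch only, the regression targets used by the value-network variant of Soft Actor-Critic described in the paper can be written as below. All function names, the `done` flag, the explicit `alpha` parameter, and the numbers are assumptions made for illustration; this is not the deck's code and not a full training loop.

```python
def soft_value_target(q_sa, log_pi, alpha=1.0):
    """Target for V(s): Q(s, a) - alpha * log pi(a|s), with a sampled from the current policy."""
    return q_sa - alpha * log_pi


def soft_q_target(reward, next_value, gamma, done):
    """Target for Q(s, a): r + gamma * V_target(s'), with no bootstrap on terminal steps."""
    return reward + gamma * (0.0 if done else next_value)


def polyak_update(target_param, online_param, tau):
    """Move a target-network parameter a fraction tau towards the online parameter."""
    return (1.0 - tau) * target_param + tau * online_param


# Hypothetical numbers for a single transition.
print(soft_value_target(q_sa=2.0, log_pi=-1.0))                          # prints 3.0
print(soft_q_target(reward=0.5, next_value=2.0, gamma=0.5, done=False))  # prints 1.5
print(polyak_update(target_param=0.0, online_param=1.0, tau=0.5))        # prints 0.5
```

The value target is larger when the sampled action had low probability (very negative log π), which is how the entropy bonus enters the updates.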
half-cheetah
Other Tasks
Pendulum-v0, MountainCarContinuous-v0
[Closing slide: three bullet points whose text was lost in extraction. The only legible term is DDPG.]