$30 off During Our Annual Pro Sale. View Details »
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Soft Actor-Critic 解説
Search
K.Takiguchi
April 28, 2018
Technology
0
590
Soft Actor-Critic 解説
Soft Actor-Critic 解説
K.Takiguchi
April 28, 2018
Tweet
Share
Other Decks in Technology
See All in Technology
Snowflake導入から1年、LayerXのデータ活用の現在 / One Year into Snowflake: How LayerX Uses Data Today
civitaspo
0
2.3k
Connection-based OAuthから学ぶOAuth for AI Agents
flatt_security
0
360
モダンデータスタックの理想と現実の間で~1.3億人Vポイントデータ基盤の現在地とこれから~
taromatsui_cccmkhd
2
260
『君の名は』と聞く君の名は。 / Your name, you who asks for mine.
nttcom
1
120
AI駆動開発ライフサイクル(AI-DLC)の始め方
ryansbcho79
0
160
「もしもデータ基盤開発で『強くてニューゲーム』ができたなら今の僕はどんなデータ基盤を作っただろう」
aeonpeople
0
240
M&Aで拡大し続けるGENDAのデータ活用を促すためのDatabricks権限管理 / AEON TECH HUB #22
genda
0
230
事業の財務責任に向き合うリクルートデータプラットフォームのFinOps
recruitengineers
PRO
2
200
障害対応訓練、その前に
coconala_engineer
0
190
意外と知らない状態遷移テストの世界
nihonbuson
PRO
1
240
re:Invent2025 3つの Frontier Agents を紹介 / introducing-3-frontier-agents
tomoki10
0
400
日本Rubyの会: これまでとこれから
snoozer05
PRO
5
230
Featured
See All Featured
New Earth Scene 8
popppiees
0
1.2k
Making the Leap to Tech Lead
cromwellryan
135
9.7k
Automating Front-end Workflow
addyosmani
1371
200k
Noah Learner - AI + Me: how we built a GSC Bulk Export data pipeline
techseoconnect
PRO
0
73
What the history of the web can teach us about the future of AI
inesmontani
PRO
0
370
Being A Developer After 40
akosma
91
590k
DBのスキルで生き残る技術 - AI時代におけるテーブル設計の勘所
soudai
PRO
60
37k
Embracing the Ebb and Flow
colly
88
4.9k
Save Time (by Creating Custom Rails Generators)
garrettdimon
PRO
32
1.8k
I Don’t Have Time: Getting Over the Fear to Launch Your Podcast
jcasabona
34
2.6k
Designing for Timeless Needs
cassininazir
0
93
Unsuck your backbone
ammeep
671
58k
Transcript
Soft Actor-Critic: Off-Policy Maximum Deep Reinforcement Learning with a Stochastic
Actor Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine NIPS 2017 Keio Machine Learning Seminar
3
4 48.: • >,$ & KJ .:6 •
MGLIC-0ICmnistD+AF9)@F9 48 .: • .:6 • M$%(?=#&.: 3*.: • /2B 6E1 7H;.:6 • M'!<5"AIAlphaGoD
(1/2) 4 Environment Agent Action Reward
State
(2/2) !" = $ ∑ &'"
( ) &*" + ,& , .& 5 /0 ," , ." = $0,1 +" + )/0 ,"34 , 5 ,"34 60 ," = $0,1 +" + )60(,"34 ) 5 ," = argmax> Q0 sA , aA
Actor Critic 4 Policy Critic
Environment Actor Critic
7 State Action
σ μ σ μ
8 ! = −1 %& = 0.5 ! =
0 %& = 1 ! = 1 %& = 3.0
Soft Actor-Critic 7
Soft Actor-Critic 10
Maximum Entropy Reinforcement Learning • Soft Actor-Critic Soft Q-Learning 11
Soft Actor-Critic • Maximum Entropy Reinforcement Learning "
12 # # " # $! log $% &' = )* (&' )
Soft Actor-Critic Algorithm 13
14
half-cheetah 15
Another Tasks 16
17
18 Pendulum-v0 MountainCarContinuous-v0
• 0/) DDPGME* : $1,3' • 0-08
7 9 =; $14 $1@6<! • 2?+.(%&$1) 05#"> 19