Lock in $30 Savings on PRO—Offer Ends Soon! ⏳
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Soft Actor-Critic 解説
Search
K.Takiguchi
April 28, 2018
Technology
0
590
Soft Actor-Critic 解説
Soft Actor-Critic 解説
K.Takiguchi
April 28, 2018
Tweet
Share
Other Decks in Technology
See All in Technology
Fashion×AI「似合う」を届けるためのWEARのAI戦略
zozotech
PRO
2
1.1k
マイクロサービスへの5年間 ぶっちゃけ何をしてどうなったか
joker1007
17
7.3k
2025年 開発生産「可能」性向上報告 サイロ解消からチームが能動性を獲得するまで/ 20251216 Naoki Takahashi
shift_evolve
PRO
2
210
Amazon Connect アップデート! AIエージェントにMCPツールを設定してみた!
ysuzuki
0
120
Bedrock AgentCore Memoryの新機能 (Episode) を試してみた / try Bedrock AgentCore Memory Episodic functionarity
hoshi7_n
2
1.4k
障害対応訓練、その前に
coconala_engineer
0
150
Amazon Quick Suite で始める手軽な AI エージェント
shimy
1
1.4k
20251218_AIを活用した開発生産性向上の全社的な取り組みの進め方について / How to proceed with company-wide initiatives to improve development productivity using AI
yayoi_dd
0
530
モダンデータスタックの理想と現実の間で~1.3億人Vポイントデータ基盤の現在地とこれから~
taromatsui_cccmkhd
1
220
SREには開発組織全体で向き合う
koh_naga
0
400
フィッシュボウルのやり方 / How to do a fishbowl
pauli
2
340
ペアーズにおけるAIエージェント 基盤とText to SQLツールの紹介
hisamouna
2
1.3k
Featured
See All Featured
RailsConf 2023
tenderlove
30
1.3k
The Mindset for Success: Future Career Progression
greggifford
PRO
0
180
Ruling the World: When Life Gets Gamed
codingconduct
0
93
Winning Ecommerce Organic Search in an AI Era - #searchnstuff2025
aleyda
0
1.8k
Dominate Local Search Results - an insider guide to GBP, reviews, and Local SEO
greggifford
PRO
0
13
Automating Front-end Workflow
addyosmani
1371
200k
Groundhog Day: Seeking Process in Gaming for Health
codingconduct
0
62
Odyssey Design
rkendrick25
PRO
0
430
Understanding Cognitive Biases in Performance Measurement
bluesmoon
32
2.8k
<Decoding/> the Language of Devs - We Love SEO 2024
nikkihalliwell
0
98
Mind Mapping
helmedeiros
PRO
0
36
The Power of CSS Pseudo Elements
geoffreycrofte
80
6.1k
Transcript
Soft Actor-Critic: Off-Policy Maximum Deep Reinforcement Learning with a Stochastic
Actor Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine NIPS 2017 Keio Machine Learning Seminar
3
4 48.: • >,$ & KJ .:6 •
MGLIC-0ICmnistD+AF9)@F9 48 .: • .:6 • M$%(?=#&.: 3*.: • /2B 6E1 7H;.:6 • M'!<5"AIAlphaGoD
(1/2) 4 Environment Agent Action Reward
State
(2/2) !" = $ ∑ &'"
( ) &*" + ,& , .& 5 /0 ," , ." = $0,1 +" + )/0 ,"34 , 5 ,"34 60 ," = $0,1 +" + )60(,"34 ) 5 ," = argmax> Q0 sA , aA
Actor Critic 4 Policy Critic
Environment Actor Critic
7 State Action
σ μ σ μ
8 ! = −1 %& = 0.5 ! =
0 %& = 1 ! = 1 %& = 3.0
Soft Actor-Critic 7
Soft Actor-Critic 10
Maximum Entropy Reinforcement Learning • Soft Actor-Critic Soft Q-Learning 11
Soft Actor-Critic • Maximum Entropy Reinforcement Learning "
12 # # " # $! log $% &' = )* (&' )
Soft Actor-Critic Algorithm 13
14
half-cheetah 15
Another Tasks 16
17
18 Pendulum-v0 MountainCarContinuous-v0
• 0/) DDPGME* : $1,3' • 0-08
7 9 =; $14 $1@6<! • 2?+.(%&$1) 05#"> 19