Soft Actor-Critic Explained
K.Takiguchi
April 28, 2018
Transcript
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine
NIPS 2017
Keio Machine Learning Seminar
[Slide text lost in extraction; the only legible terms are "mnist" and "AlphaGo".]
(1/2)
Figure: interaction loop between Agent and Environment, labeled Action, Reward, and State.
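(2/2)
$R_t = \mathbb{E}\left[\sum_{k=t}^{\infty} \gamma^{k-t}\, r(s_k, a_k)\right]$
$Q^{\pi}(s_t, a_t) = \mathbb{E}_{\pi}\left[r_t + \gamma\, Q^{\pi}(s_{t+1}, \pi(s_{t+1}))\right]$
$V^{\pi}(s_t) = \mathbb{E}_{\pi}\left[r_t + \gamma\, V^{\pi}(s_{t+1})\right]$
$\pi(s_t) = \arg\max_{a_t} Q^{\pi}(s_t, a_t)$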
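As a minimal sketch of the discounted return on this slide, the function below computes $R_t = \sum_{k \ge t} \gamma^{k-t} r_k$ for every step of one finite episode. The function name and the use of a plain reward list (instead of $r(s_k, a_k)$ evaluated on states and actions) are assumptions made for illustration; the slide itself takes an expectation over an infinite horizon.

```python
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """Return R_t = sum_{k>=t} gamma^(k-t) * r_k for each step t of a finite episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the next one: R_t = r_t + gamma * R_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    # Hypothetical three-step episode with a reward of 1.0 at every step.
    print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))
```

The backward recursion is the same one-step structure that appears in the Bellman equations for $Q^{\pi}$ and $V^{\pi}$: the value at step $t$ is the immediate reward plus the discounted value at step $t+1$.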
Actor-Critic
Figure: Actor-Critic architecture with an Environment, an Actor (Policy), and a Critic, labeled State and Action.
Figure: outputs labeled μ and σ.
Figure: three parameter settings: μ = −1, σ² = 0.5; μ = 0, σ² = 1; μ = 1, σ² = 3.0.
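The three parameter settings on this slide can be explored with a short sketch of a one-dimensional Gaussian policy. It assumes the second parameter on the slide is the variance σ² (the symbol did not survive extraction cleanly), and the function names are hypothetical. It samples an action, evaluates its log-probability, and computes the closed-form entropy, which grows with the variance.

```python
import math
import random


def gaussian_log_prob(action: float, mu: float, variance: float) -> float:
    """Log-density of a 1-D Gaussian N(mu, variance) evaluated at `action`."""
    return -0.5 * math.log(2.0 * math.pi * variance) - (action - mu) ** 2 / (2.0 * variance)


def gaussian_entropy(variance: float) -> float:
    """Differential entropy of a 1-D Gaussian: 0.5 * log(2 * pi * e * variance)."""
    return 0.5 * math.log(2.0 * math.pi * math.e * variance)


def sample_action(mu: float, variance: float, rng: random.Random) -> float:
    """Draw one action; random.gauss expects the standard deviation, not the variance."""
    return rng.gauss(mu, math.sqrt(variance))


if __name__ == "__main__":
    rng = random.Random(0)
    # Parameter pairs as read from the slide (variance reading is an assumption).
    for mu, variance in [(-1.0, 0.5), (0.0, 1.0), (1.0, 3.0)]:
        a = sample_action(mu, variance, rng)
        print(mu, variance, a, gaussian_log_prob(a, mu, variance), gaussian_entropy(variance))
```

A wider Gaussian has higher entropy, which is the quantity that maximum entropy reinforcement learning rewards in addition to the task reward.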
Soft Actor-Critic
Maximum Entropy Reinforcement Learning
• Soft Actor-Critic, Soft Q-Learning
Soft Actor-Critic
• Maximum Entropy Reinforcement Learning
[Equation lost in extraction; the legible fragment contains a log term.]
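For reference, the maximum entropy objective as stated in the Soft Actor-Critic paper by Haarnoja et al. is given below. This is the paper's formulation, not a reconstruction of the slide, whose own notation may differ; $\alpha$ is the temperature that weights the entropy term against the reward and $\rho_\pi$ is the state-action distribution induced by the policy.

```latex
J(\pi) = \sum_{t=0}^{T} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
  \left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s_t)\big)
  = \mathbb{E}_{a_t \sim \pi(\cdot \mid s_t)} \left[ -\log \pi(a_t \mid s_t) \right]
```

Setting $\alpha = 0$ recovers the standard expected-return objective.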
Soft Actor-Critic Algorithm
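The algorithm slide itself did not survive extraction. As a rough sketch of two quantities that appear in the Soft Actor-Critic paper, the code below estimates the soft state value $V(s) \approx \mathbb{E}_{a \sim \pi}[Q(s, a) - \alpha \log \pi(a \mid s)]$ from sampled actions and forms the soft Q target $r + \gamma V(s')$. The function names, the explicit `alpha` argument, and the `done` flag are assumptions for illustration; the networks, replay buffer, and gradient steps of the full algorithm are omitted.

```python
from typing import Sequence


def soft_value(q_values: Sequence[float], log_probs: Sequence[float], alpha: float = 1.0) -> float:
    """Monte Carlo estimate of V(s) = E_a[Q(s, a) - alpha * log pi(a | s)]."""
    if len(q_values) != len(log_probs) or len(q_values) == 0:
        raise ValueError("q_values and log_probs must be non-empty and of equal length")
    total = sum(q - alpha * lp for q, lp in zip(q_values, log_probs))
    return total / len(q_values)


def soft_q_target(reward: float, gamma: float, next_value: float, done: bool = False) -> float:
    """Target for Q(s, a): r + gamma * V(s'), with no bootstrap on terminal transitions."""
    return reward + (0.0 if done else gamma * next_value)


if __name__ == "__main__":
    # Hypothetical Q-values and log-probabilities for two sampled actions.
    v_next = soft_value([1.0, 3.0], [-1.0, -1.0], alpha=0.5)
    print(v_next, soft_q_target(reward=1.0, gamma=0.5, next_value=v_next))
```

Because $\log \pi$ is subtracted, actions the policy considers unlikely contribute a bonus, so the value estimate favors policies that keep their action distribution spread out.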
half-cheetah
Other Tasks
Pendulum-v0, MountainCarContinuous-v0
[Closing bullet points lost in extraction; the only legible term is "DDPG".]