DRL for Combinatorial Optimization
newzy
November 24, 2021
Transcript
POMO: Policy Optimization with Multiple Optima for Reinforcement Learning
Kwon, Yeong-Dae, et al. NeurIPS, 2020, vol. 33
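[2/26] Overview
• An end-to-end approximate method for combinatorial optimization based on deep reinforcement learning.
• Compared with existing deep RL methods, it substantially improves both computation time and solution accuracy.
• Evaluated on the traveling salesman problem and other problems.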
Introduction
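[4/26] Combinatorial optimization
• Finds an optimal combination, as in problems such as the traveling salesman problem, vehicle routing (delivery planning), and the knapsack problem.
• Exact methods: optimal accuracy, slow computation.
• Approximate methods: near-optimal accuracy, fast computation.
https://onl.tw/vzkASMX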
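To make the accuracy/speed trade-off concrete, here is a minimal sketch on a tiny random TSP instance. The instance and function names are illustrative assumptions, and so is the choice of nearest-neighbor as the approximate method; none of this is taken from the slides. The exact method enumerates all (n-1)! tours, while the heuristic runs in O(n^2) steps but may return a longer tour.

```python
import itertools
import math
import random


def tour_length(coords, tour):
    """Length of the closed tour that visits `tour` in order and returns to its start."""
    return sum(math.dist(coords[tour[k]], coords[tour[(k + 1) % len(tour)]])
               for k in range(len(tour)))


def exact_tsp(coords):
    """Exact method: enumerate every tour with node 0 fixed first, (n-1)! tours."""
    n = len(coords)
    best_rest = min(itertools.permutations(range(1, n)),
                    key=lambda rest: tour_length(coords, (0,) + rest))
    return [0, *best_rest]


def nearest_neighbor_tsp(coords, start=0):
    """Approximate method: always move to the closest unvisited node, O(n^2) steps."""
    unvisited = set(range(len(coords))) - {start}
    tour = [start]
    while unvisited:
        last = coords[tour[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(last, coords[j]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


rng = random.Random(0)
coords = [(rng.random(), rng.random()) for _ in range(8)]
opt_len = tour_length(coords, exact_tsp(coords))
nn_len = tour_length(coords, nearest_neighbor_tsp(coords))
print(f"exact: {opt_len:.3f}  nearest neighbor: {nn_len:.3f}")
```

The exact search is already 5,040 tours at 8 nodes and grows factorially, which is why learned approximate solvers such as the one in this paper target larger instances.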
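[5/26] Reinforcement learning (RL)
• RL: a method for solving sequential decision-making problems. The goal is to find a policy that maximizes the cumulative reward.
• To set up a problem, the set of states, the set of actions, and the reward function must be defined.
https://onl.tw/98fQVvW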
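As an illustration of "states, actions, and reward function", here is one hedged way to cast a TSP as an RL problem. The encoding is a hypothetical choice for this sketch:

- the state is the current node plus the set of visited nodes;
- the actions are the unvisited nodes;
- the reward is the negative distance travelled.

The cumulative reward of a full episode is then minus the tour length.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TSPState:
    """Hypothetical state encoding: current node plus the set of visited nodes."""
    current: int
    visited: frozenset


def legal_actions(state, n):
    """Action set: every node that has not been visited yet."""
    return [j for j in range(n) if j not in state.visited]


def step(state, action, coords):
    """Transition to `action`; the reward is the negative distance travelled."""
    reward = -math.dist(coords[state.current], coords[action])
    return TSPState(action, state.visited | {action}), reward


coords = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
state = TSPState(current=0, visited=frozenset({0}))
cumulative_reward = 0.0
while legal_actions(state, len(coords)):
    action = legal_actions(state, len(coords))[0]   # placeholder policy: lowest-index node
    state, reward = step(state, action, coords)
    cumulative_reward += reward
# Returning to the start closes the tour, so the return equals -(tour length).
cumulative_reward -= math.dist(coords[state.current], coords[0])
print(cumulative_reward)  # -4.0 for this unit square
```

Maximizing this cumulative reward is the same as minimizing tour length, so a policy that maximizes return solves the TSP.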
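[6/26] Policy-based REINFORCE
• Policy $\pi(s)$: a function that outputs an action $a$ in state $s$.
• $\pi_\theta$: a policy parameterized by $\theta$.
• Policy update rule, with learning rate $\alpha$ and objective function $J(\pi_\theta)$:
$$\theta \leftarrow \theta + \alpha \nabla_\theta J(\pi_\theta)$$
• Policy gradient, with expectation $\mathbb{E}$, return $R_t$, and baseline $b(s)$:
$$\nabla_\theta J(\pi_\theta) = \mathbb{E}_{\pi_\theta}\left[\nabla_\theta \log \pi_\theta(a \mid s)\,\bigl(R_t - b(s)\bigr)\right]$$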
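A minimal sketch of the update rule and policy gradient above, under stated assumptions:

- The problem is a hypothetical one-step problem where action $a$ always returns the fixed reward `rewards[a]`.
- The policy is a softmax over one logit per action.
- The baseline $b(s)$ is a running mean of observed returns. This is one common choice, not necessarily the one used in the paper.

For a softmax policy, $\nabla_\theta \log \pi_\theta(a)$ with respect to the logits is the one-hot vector of $a$ minus $\pi_\theta$.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_with_baseline(rewards, steps=2000, alpha=0.1, beta=0.05, seed=0):
    """REINFORCE with a baseline on a hypothetical one-step problem:
    choosing action a yields the fixed reward rewards[a]."""
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)   # policy parameters: one logit per action
    baseline = 0.0                 # b(s): running mean of observed returns (an assumption)
    for _ in range(steps):
        pi = softmax(theta)
        a = rng.choices(range(len(pi)), weights=pi)[0]   # sample a ~ pi_theta
        advantage = rewards[a] - baseline                # R_t - b(s)
        for k in range(len(theta)):
            # For a softmax policy, d/d theta_k of log pi_theta(a) = 1[k == a] - pi_k.
            grad_log_pi = (1.0 if k == a else 0.0) - pi[k]
            theta[k] += alpha * advantage * grad_log_pi   # theta <- theta + alpha * grad J
        baseline += beta * (rewards[a] - baseline)
    return softmax(theta)


probs = reinforce_with_baseline([1.0, 2.0, 5.0])
print([round(p, 3) for p in probs])
```

Subtracting the baseline does not change the expected gradient. It centers the returns, which lowers the variance of the estimate. POMO's shared baseline on a later slide is another way of choosing $b(s)$.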
Related work
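[8/26] Pointer Networks (Vinyals et al., 2015)
A network used for combinatorial optimization.
• Selects input elements without repetition to generate an output sequence.
• Consists of an encoder that extracts features from the input, and a decoder that uses the encoder's output to produce the route that forms the answer.
• Uses LSTMs for both the encoder and the decoder.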
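As a hedged illustration of "selecting without repetition", the sketch below shows masked pointer-style decoding with additive attention, $u_j = v^\top \tanh(W_1 e_j + W_2 q)$. It differs from the original model in several ways, all of which are assumptions of this sketch:

- The weights and the encoder outputs are random stand-ins.
- The decoder state is a hypothetical substitute for the LSTM: the mean embedding at the first step, then the embedding of the last selected input.
- The function names are illustrative.

```python
import math
import random


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def pointer_scores(enc, query, W1, W2, v):
    """Additive attention used as a pointer: u_j = v . tanh(W1 e_j + W2 q)."""
    q = matvec(W2, query)
    return [sum(vk * math.tanh(a + b) for vk, a, b in zip(v, matvec(W1, e), q))
            for e in enc]


def greedy_pointer_decode(enc, W1, W2, v):
    """Point at one input per step, masking inputs already chosen, so the output
    is a permutation of the inputs (each selected exactly once)."""
    chosen = []
    # Hypothetical stand-in for the LSTM decoder state: start from the mean embedding,
    # then feed back the embedding of the last selected input.
    query = [sum(col) / len(enc) for col in zip(*enc)]
    for _ in range(len(enc)):
        scores = pointer_scores(enc, query, W1, W2, v)
        masked = [-math.inf if j in chosen else s for j, s in enumerate(scores)]
        j = max(range(len(enc)), key=masked.__getitem__)
        chosen.append(j)
        query = enc[j]
    return chosen


rng = random.Random(0)
d, n = 4, 6


def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


enc = rand_matrix(n, d)   # stand-in for encoder outputs e_1..e_n
order = greedy_pointer_decode(enc, rand_matrix(d, d), rand_matrix(d, d), rand_matrix(1, d)[0])
print(order)
```

The masking step is what guarantees a valid permutation, which is exactly the "no repetition" constraint of a TSP tour.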
[9/26] Attention Model (Kool et al., 2019)
An improved version of Pointer Networks.
• Like Pointer Networks, the model uses an encoder and a decoder.
• Drops the LSTM and adopts multi-head attention instead.
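To show what replacing the LSTM with multi-head attention means for a set of nodes, here is a minimal multi-head self-attention sketch in pure Python. It is a simplification, not the Attention Model's actual layer:

- The projections are random, and the output projection, residual connections, normalization, and feed-forward sublayers are all omitted.
- The sketch adds no positional information, so reordering the input nodes only reorders the outputs. The check at the end confirms this numerically.

```python
import math
import random


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def multi_head_self_attention(X, Wq, Wk, Wv, n_heads):
    """X: n x d node embeddings; Wq, Wk, Wv: d x d projections.
    Each head attends over a d // n_heads slice of the projected features."""
    d = len(X[0])
    dh = d // n_heads
    Q, K, V = matmul(X, Wq), matmul(X, Wk), matmul(X, Wv)
    out = [[0.0] * d for _ in X]
    for h in range(n_heads):
        cols = range(h * dh, (h + 1) * dh)
        for i, q in enumerate(Q):
            # Scaled dot-product attention weights of node i over all nodes, within head h.
            w = softmax([sum(q[c] * k[c] for c in cols) / math.sqrt(dh) for k in K])
            for c in cols:
                out[i][c] = sum(w[j] * V[j][c] for j in range(len(V)))
    return out


rng = random.Random(0)
n, d = 5, 8


def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


X = rand_matrix(n, d)
Wq, Wk, Wv = (rand_matrix(d, d) for _ in range(3))
out = multi_head_self_attention(X, Wq, Wk, Wv, n_heads=2)

perm = [3, 0, 4, 1, 2]
out_perm = multi_head_self_attention([X[p] for p in perm], Wq, Wk, Wv, n_heads=2)
# Permutation equivariance: row i of out_perm should equal row perm[i] of out.
equivariant = all(abs(a - b) < 1e-9
                  for p, row in zip(perm, out_perm) for a, b in zip(out[p], row))
print(equivariant)
```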
Method
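[11/26] Idea behind the method in this paper
The first action strongly influences the actions the agent takes afterward.
The method exploits the symmetry often found in combinatorial optimization problems.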
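[12/26] POMO
• Uses REINFORCE with baseline, a typical policy-gradient-based RL algorithm.
• Specifies multiple distinct starting actions and obtains multiple action sequences (trajectories).
• Does not use a <START> token.
(Figure: conventional approach vs. POMO)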
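A hedged sketch of the multiple-starting-action idea for an n-node TSP. It produces N = n trajectories, and trajectory i is forced to start at node i, so no <START> token is needed to choose the first node. The stochastic policy is a hypothetical stand-in that prefers nearby nodes; in POMO it would be the learned network. Function names are illustrative.

```python
import math
import random


def tour_length(coords, tour):
    return sum(math.dist(coords[tour[k]], coords[tour[(k + 1) % len(tour)]])
               for k in range(len(tour)))


def sample_next(coords, tour, unvisited, rng):
    """Hypothetical stand-in for the learned policy: closer nodes are more likely."""
    last = coords[tour[-1]]
    candidates = sorted(unvisited)
    weights = [math.exp(-5.0 * math.dist(last, coords[j])) for j in candidates]
    return rng.choices(candidates, weights=weights)[0]


def multi_start_rollouts(coords, rng):
    """N = n trajectories: trajectory i is forced to start at node i."""
    n = len(coords)
    trajectories = []
    for first in range(n):
        tour, unvisited = [first], set(range(n)) - {first}
        while unvisited:
            nxt = sample_next(coords, tour, unvisited, rng)
            tour.append(nxt)
            unvisited.remove(nxt)
        trajectories.append(tour)
    return trajectories


rng = random.Random(0)
coords = [(rng.random(), rng.random()) for _ in range(5)]
trajectories = multi_start_rollouts(coords, rng)
returns = [-tour_length(coords, t) for t in trajectories]   # R(tau^i) = -(tour length)
for t, r in zip(trajectories, returns):
    print(t, round(r, 3))
```

Because a TSP tour is a cycle, every start node can reach an optimal tour. The N rollouts therefore explore different but equally valid "optima" of the same instance.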
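[13/26] POMO
$$\nabla_\theta J(\theta) \approx \frac{1}{N}\sum_{i=1}^{N}\bigl(R(\boldsymbol{\tau}^i) - b^i(s)\bigr)\,\nabla_\theta \log p_\theta(\boldsymbol{\tau}^i \mid s)$$
where
$$p_\theta(\boldsymbol{\tau}^i \mid s) \equiv \prod_{t=2}^{M} p_\theta\bigl(a_t^i \mid s, a_{1:t-1}^i\bigr)$$
(the product starts at $t = 2$ because the first action $a_1^i$ is fixed by the chosen starting point).
Trajectories: $\boldsymbol{\tau}^i = (a_1^i, a_2^i, \ldots, a_M^i)$ for $i = 1, 2, \ldots, N$
Shared baseline: $b^i(s) = b_{\text{shared}}(s) = \frac{1}{N}\sum_{j=1}^{N} R(\boldsymbol{\tau}^j)$ for $i = 1, 2, \ldots, N$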
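A minimal sketch of the quantities in the POMO estimator, using plain floats instead of an autograd framework. The function names are illustrative.

- `trajectory_log_prob` sums per-step log-probabilities from $t = 2$, skipping the forced first action.
- `shared_baseline_advantages` subtracts $b_{\text{shared}}(s)$, the mean return over the N trajectories.
- `pomo_surrogate_loss` is a loss whose gradient, with advantages treated as constants, equals minus the estimator above.

In practice the log-probabilities would be framework tensors so that the gradient can be taken automatically.

```python
def trajectory_log_prob(step_log_probs):
    """log p_theta(tau^i | s): sum of per-step log-probabilities for t = 2..M.
    step_log_probs[0] belongs to the forced first action a_1^i and is excluded."""
    return sum(step_log_probs[1:])


def shared_baseline_advantages(returns):
    """b_shared(s) = (1/N) sum_j R(tau^j); advantage_i = R(tau^i) - b_shared(s)."""
    b_shared = sum(returns) / len(returns)
    return [r - b_shared for r in returns]


def pomo_surrogate_loss(returns, log_probs):
    """Loss whose gradient (advantages held constant) is minus the POMO estimator:
    -(1/N) sum_i (R(tau^i) - b(s)) * log p_theta(tau^i | s)."""
    advantages = shared_baseline_advantages(returns)
    return -sum(a * lp for a, lp in zip(advantages, log_probs)) / len(returns)


returns = [-5.0, -4.0, -6.0]                 # R(tau^i) = -(tour length), N = 3
print(shared_baseline_advantages(returns))   # [0.0, 1.0, -1.0]
```

With these returns, the trajectory that is shorter than the group average gets a positive advantage and is reinforced. The longer one is discouraged. No separate critic network or greedy-rollout baseline is needed.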
[14/26] Pseudocode of the training procedure
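[15/26] Instance Augmentation: an inference technique
• Inspired by data augmentation in image processing.
• The coordinates used here lie in the unit square in x and y (the first quadrant), and the eight transformations that map this square onto itself are used.
(Figure: the Instance Augmentation used here)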
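The eight transformations of the unit square can be written out directly. This is a sketch assuming the standard list of flips, swaps, and rotations used for ×8 augmentation in the POMO paper. Each map preserves pairwise distances, so every transformed copy of a TSP instance has the same optimal tour length; the function name is illustrative.

```python
import math

# The eight symmetries of the unit square (identity, reflections, rotations) as maps on (x, y).
SYMMETRIES = [
    lambda x, y: (x, y),
    lambda x, y: (y, x),
    lambda x, y: (x, 1 - y),
    lambda x, y: (y, 1 - x),
    lambda x, y: (1 - x, y),
    lambda x, y: (1 - y, x),
    lambda x, y: (1 - x, 1 - y),
    lambda x, y: (1 - y, 1 - x),
]


def augment_x8(coords):
    """Return eight transformed copies of an instance whose points lie in [0, 1]^2.
    Every map preserves distances, so each copy has the same optimal tour."""
    return [[f(x, y) for x, y in coords] for f in SYMMETRIES]


coords = [(0.1, 0.3), (0.8, 0.5), (0.4, 0.9)]
instances = augment_x8(coords)
print(len(instances))  # 8
```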
[16/26] Pseudocode of the inference procedure
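Since the inference pseudocode itself is not in the transcript, here is a hedged sketch of how multiple starts and ×8 augmentation might combine at inference time. The sketch rolls out greedily from every start node on every augmented copy and keeps the shortest tour. The greedy policy is a hypothetical nearest-neighbor stand-in for argmax decoding of a trained model, and all names are illustrative.

```python
import math


def tour_length(coords, tour):
    return sum(math.dist(coords[tour[k]], coords[tour[(k + 1) % len(tour)]])
               for k in range(len(tour)))


def greedy_policy(coords, start):
    """Hypothetical stand-in for greedy (argmax) decoding: nearest unvisited node."""
    tour, unvisited = [start], set(range(len(coords))) - {start}
    while unvisited:
        last = coords[tour[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(last, coords[j]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


SYMMETRIES = [
    lambda x, y: (x, y), lambda x, y: (y, x),
    lambda x, y: (x, 1 - y), lambda x, y: (y, 1 - x),
    lambda x, y: (1 - x, y), lambda x, y: (1 - y, x),
    lambda x, y: (1 - x, 1 - y), lambda x, y: (1 - y, 1 - x),
]


def pomo_inference(coords, policy=greedy_policy):
    """Roll out from every start node on every augmented copy; keep the best tour."""
    best_tour, best_len = None, math.inf
    for f in SYMMETRIES:
        augmented = [f(x, y) for x, y in coords]
        for start in range(len(coords)):
            tour = policy(augmented, start)
            # Node indices are unchanged by the symmetry, so score on the original instance.
            length = tour_length(coords, tour)
            if length < best_len:
                best_tour, best_len = tour, length
    return best_tour, best_len


coords = [(0.1, 0.3), (0.8, 0.5), (0.4, 0.9), (0.6, 0.1), (0.2, 0.7)]
best_tour, best_len = pomo_inference(coords)
print(best_tour, round(best_len, 3))
```

Nearest-neighbor depends only on distances, so here the eight copies give essentially the same tours. A learned network is not exactly invariant to these transformations, which is why augmentation can yield different, and sometimes better, tours.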
Experiments
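[18/26] Experiments
Setup
• POMO was applied to the following problems, and the results were compared with other representative methods: the traveling salesman problem, the capacitated vehicle routing problem, and the knapsack problem.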
[19/26] Learning curves: traveling salesman problem
(Figure panels: 50 nodes and 100 nodes)
[20/26] Traveling salesman problem (TSP)
[21/26] Traveling salesman problem (TSP)
[22/26] Capacitated vehicle routing problem (CVRP)
[23/26] Knapsack problem (KP)
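[24/26] Summary of the experiments
• On three combinatorial optimization problems with different settings, effective results were obtained with the same training method and the same neural network architecture.
• Both POMO, as a training and inference method, and Instance Augmentation, as an inference method, were confirmed to be effective.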
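[25/26] Conclusion
This paper presented a method that exploits symmetry in combinatorial optimization to improve the sample efficiency and accuracy of RL and to shorten inference time.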
ࢀߟจݙ ,XPO :FPOH%BF FUBM10.01PMJDZ0QUJNJ[BUJPOXJUI .VMUJQMF0QUJNBGPS3FJOGPSDFNFOU-FBSOJOH "EWBODFTJO /FVSBM*OGPSNBUJPO1SPDFTTJOH4ZTUFNT
,PPM 8PVUFS )FSLF WBO)PPG BOE.BY8FMMJOH"UUFOUJPO -FBSOUP4PMWF3PVUJOH1SPCMFNT *OUFSOBUJPOBM$POGFSFODF PO-FBSOJOH3FQSFTFOUBUJPOT 7JOZBMT 0SJPM .FJSF 'PSUVOBUP BOE/BWEFFQ+BJUMZ1PJOUFS /FUXPSLT "EWBODFTJO/FVSBM*OGPSNBUJPO1SPDFTTJOH 4ZTUFNT 26/26