DRL for Combinatorial Optimization
newzy
November 24, 2021
Research
POMO: Policy Optimization with Multiple Optima for Reinforcement Learning
Kwon, Yeong-Dae, et al. NeurIPS 2020, vol. 33
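Overview
• An end-to-end approximate solution method for combinatorial optimization using deep reinforcement learning.
• Compared with existing deep reinforcement learning methods, it greatly improves both computation time and accuracy.
• Evaluated on the traveling salesman problem and other problems.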
Introduction
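Combinatorial Optimization
• The task of finding an optimal combination, as in the traveling salesman problem, vehicle routing (delivery planning), and the knapsack problem.
• Exact methods: optimal accuracy, but slow computation.
• Approximate methods: near-optimal accuracy, with fast computation.
Source: https://onl.tw/vzkASMX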
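The exact-versus-approximate trade-off can be made concrete with a minimal Python sketch on a tiny TSP instance. The function names are illustrative and do not come from the slides. The exact method enumerates all (n-1)! tours that start at node 0, so it is optimal but its cost explodes with n. The nearest-neighbor heuristic is a classic approximate method that runs in O(n²) time but can return a longer tour.

```python
import itertools
import math

Point = tuple[float, float]


def tour_length(points: list[Point], order: list[int]) -> float:
    """Length of the closed tour that visits points in `order` and returns to the start."""
    n = len(order)
    return sum(math.dist(points[order[k]], points[order[(k + 1) % n]]) for k in range(n))


def exact_tsp(points: list[Point]) -> list[int]:
    """Exact method: try every tour that starts at node 0. Optimal, but (n-1)! candidates."""
    rest = range(1, len(points))
    best = min(itertools.permutations(rest), key=lambda p: tour_length(points, [0, *p]))
    return [0, *best]


def nearest_neighbor_tsp(points: list[Point], start: int = 0) -> list[int]:
    """Approximate method: always move to the closest unvisited node. Fast, not always optimal."""
    order = [start]
    unvisited = set(range(len(points))) - {start}
    while unvisited:
        last = points[order[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(last, points[j]))
        order.append(nxt)
        unvisited.remove(nxt)
    return order
```

The brute force becomes infeasible almost immediately: 11 nodes already give 10! = 3,628,800 candidate tours. Learned approximate solvers such as POMO are aimed at this gap.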
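Reinforcement Learning (RL)
• RL: a method for solving sequential decision-making problems. The goal is to find a policy that maximizes the cumulative reward.
• To set up a problem, a state set, an action set, and a reward function must be defined.
Source: https://onl.tw/98fQVvW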
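TSP can be cast into this state/action/reward setup in several ways. The sketch below shows one common formulation and is an assumption for illustration; the `TSPEnv` class is hypothetical and is not taken from the slides. The state is the partial tour and an action is the next unvisited node. The reward is zero until the tour closes, and at that point it equals minus the tour length. Maximizing the cumulative reward is therefore the same as minimizing the tour length.

```python
import math
from dataclasses import dataclass, field


@dataclass
class TSPEnv:
    """Hypothetical minimal TSP environment.

    state  = the partial tour (list of visited node indices)
    action = index of the next unvisited node
    reward = 0 on every step, except -(closed tour length) on the final step
    """
    points: list[tuple[float, float]]
    tour: list[int] = field(default_factory=list)

    def reset(self, start: int) -> list[int]:
        self.tour = [start]
        return list(self.tour)

    def legal_actions(self) -> list[int]:
        return [j for j in range(len(self.points)) if j not in self.tour]

    def step(self, action: int) -> tuple[list[int], float, bool]:
        if action in self.tour:
            raise ValueError(f"node {action} already visited")
        self.tour.append(action)
        done = len(self.tour) == len(self.points)
        reward = 0.0
        if done:  # sparse reward: only the completed tour is scored
            n = len(self.tour)
            length = sum(
                math.dist(self.points[self.tour[k]], self.points[self.tour[(k + 1) % n]])
                for k in range(n)
            )
            reward = -length
        return list(self.tour), reward, done
```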
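Policy-Based REINFORCE
• Policy $\pi(s)$: a function that outputs an action $a$ in state $s$.
• $\pi_\theta$: a policy parameterized by $\theta$.
• Policy update rule, with learning rate $\alpha$ and objective function $J(\pi_\theta)$:
$$\theta \leftarrow \theta + \alpha \nabla_\theta J(\pi_\theta)$$
• Policy gradient, with expectation $\mathbb{E}$, return $R_t$, and baseline $b(s)$:
$$\nabla_\theta J(\pi_\theta) = \mathbb{E}_{\pi_\theta}\left[\nabla_\theta \log \pi_\theta(a \mid s) \cdot \bigl(R_t - b(s)\bigr)\right]$$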
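The update rule and gradient above can be checked on the smallest possible case: a single state with a few actions (a bandit), where the policy is a softmax over the parameters θ. This sketch makes two assumptions. Each action has a deterministic reward, and the baseline b(s) is an exponential moving average of past returns. That baseline is one common choice, not necessarily the one used in the paper. For a softmax policy, ∇_θ log π_θ(a) is the one-hot vector of a minus the probability vector.

```python
import math
import random


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(rewards: list[float], steps: int = 2000, lr: float = 0.1,
                     baseline_decay: float = 0.05, seed: int = 0) -> list[float]:
    """REINFORCE with a baseline on a single-state problem; returns final action probabilities."""
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)
    b = 0.0  # assumed baseline b(s): moving average of observed returns
    for _ in range(steps):
        probs = softmax(theta)
        a = rng.choices(range(len(theta)), weights=probs)[0]  # sample a ~ pi_theta
        R = rewards[a]  # deterministic return of action a (assumption)
        advantage = R - b
        # theta <- theta + lr * (R - b) * grad log pi_theta(a), with
        # d/d theta_k log pi_theta(a) = 1[k == a] - pi_theta(k)
        for k in range(len(theta)):
            grad_log = (1.0 if k == a else 0.0) - probs[k]
            theta[k] += lr * advantage * grad_log
        b += baseline_decay * (R - b)  # baseline updated only after the gradient step
    return softmax(theta)
```

The baseline is fixed before the action is sampled. Subtracting it therefore leaves the expected gradient unchanged and only reduces its variance.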
Related Work
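Pointer Networks: a network used for combinatorial optimization
• Selects input elements without repetition to generate an output sequence.
• Consists of an encoder, which extracts features from the input, and a decoder, which uses the encoder's output to produce the route that forms the answer.
• Uses LSTMs for both the encoder and the decoder.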
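The "select without repetition" mechanism can be sketched with NumPy in three steps. First, score every input with additive attention, u_j = vᵀ tanh(W₁e_j + W₂d). Next, set the scores of inputs that were already chosen to −∞. Finally, take a softmax over the inputs that remain. This sketch is simplified and untrained: the weights are random, and the LSTM decoder is replaced by a hypothetical rule that uses the last chosen input's embedding as the next query.

```python
import numpy as np


def pointer_scores(enc: np.ndarray, dec: np.ndarray, W1: np.ndarray,
                   W2: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Additive pointer scores u_j = v^T tanh(W1 e_j + W2 d), one per input element."""
    return np.tanh(enc @ W1.T + dec @ W2.T) @ v  # (n, hidden) @ (hidden,) -> (n,)


def greedy_pointer_decode(enc: np.ndarray, rng: np.random.Generator, hidden: int = 16) -> list[int]:
    n, d = enc.shape
    W1 = rng.normal(size=(hidden, d))  # random, untrained weights
    W2 = rng.normal(size=(hidden, d))
    v = rng.normal(size=hidden)
    visited = np.zeros(n, dtype=bool)
    dec = enc.mean(axis=0)  # hypothetical initial decoder state (stand-in for the LSTM)
    order = []
    for _ in range(n):
        u = pointer_scores(enc, dec, W1, W2, v)
        u[visited] = -np.inf              # mask: each input can be pointed to only once
        probs = np.exp(u - u.max())       # softmax over the remaining inputs
        probs /= probs.sum()
        j = int(np.argmax(probs))         # greedy choice
        order.append(j)
        visited[j] = True
        dec = enc[j]  # hypothetical: the next query is the chosen input's embedding
    return order
```

Because of the mask, the decoded order is always a permutation of the inputs, which is exactly the form a TSP tour needs.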
Attention Model: an improved version of Pointer Networks
• Like Pointer Networks, the model uses an encoder and a decoder.
• Drops the LSTM and adopts multi-head attention instead.
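For reference, here is a compact NumPy sketch of multi-head self-attention: scaled dot-product attention in each head, followed by concatenation and an output projection. The weight matrices are random placeholders. Other parts of the Attention Model's encoder, such as skip connections, normalization, and feed-forward sublayers, are omitted.

```python
import numpy as np


def multi_head_attention(x: np.ndarray, Wq: np.ndarray, Wk: np.ndarray,
                         Wv: np.ndarray, Wo: np.ndarray, n_heads: int) -> np.ndarray:
    """Self-attention over n node embeddings x of shape (n, d); returns shape (n, d)."""
    n, d = x.shape
    if d % n_heads != 0:
        raise ValueError("embedding size must be divisible by the number of heads")
    dh = d // n_heads

    def split(t: np.ndarray) -> np.ndarray:  # (n, d) -> (heads, n, dh)
        return t.reshape(n, n_heads, dh).transpose(1, 0, 2)

    q, k, v = split(x @ Wq), split(x @ Wk), split(x @ Wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)   # (heads, n, n)
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the attended nodes
    heads = weights @ v                               # (heads, n, dh)
    concat = heads.transpose(1, 0, 2).reshape(n, d)   # concatenate the heads
    return concat @ Wo
```

Without positional encodings, self-attention is permutation-equivariant: reordering the input nodes reorders the outputs in the same way. This suits inputs that are unordered sets of nodes, whereas an LSTM reads them as a sequence.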
Method
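Key Idea of This Paper
• The first action has a large influence on the agent's subsequent actions.
• The method exploits the symmetry often found in combinatorial optimization problems. For example, the same TSP tour can be written starting from any of its nodes.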
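POMO
• REINFORCE with baseline: uses a typical policy-gradient-based RL algorithm.
• Specifies multiple different starting actions and obtains multiple action sequences (trajectories).
• Does not use a <START> token.
(Figure: conventional approach vs. POMO)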
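The "multiple different starting actions" idea can be sketched for TSP as follows. Instead of a single rollout that begins from a <START> token, run N rollouts and force the first action of rollout i to be node i. The sampling policy below is a hypothetical stand-in for the trained network: a softmax over negative distances to the unvisited nodes. Its temperature of 0.1 is an assumption that suits coordinates in the unit square.

```python
import math
import random


def sample_tour(points: list[tuple[float, float]], start: int,
                rng: random.Random, temperature: float = 0.1) -> list[int]:
    """Roll out one trajectory whose first action is forced to `start`.

    Hypothetical stand-in policy: p(next = j) is proportional to
    exp(-dist(last, j) / temperature) over the unvisited nodes j.
    """
    tour = [start]
    unvisited = [j for j in range(len(points)) if j != start]
    while unvisited:
        last = points[tour[-1]]
        weights = [math.exp(-math.dist(last, points[j]) / temperature) for j in unvisited]
        nxt = rng.choices(unvisited, weights=weights)[0]
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def pomo_rollouts(points: list[tuple[float, float]], seed: int = 0) -> list[list[int]]:
    """N = number of nodes rollouts, one per distinct start node (no <START> token)."""
    rng = random.Random(seed)
    return [sample_tour(points, start, rng) for start in range(len(points))]
```

Every rollout is a complete, valid tour. Because each one starts from a different node, the N trajectories explore different parts of the solution space.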
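$$\nabla_\theta J(\theta) \approx \frac{1}{N}\sum_{i=1}^{N}\bigl(R(\boldsymbol{\tau}^i) - b^i(s)\bigr)\,\nabla_\theta \log p_\theta(\boldsymbol{\tau}^i \mid s), \quad \text{where } p_\theta(\boldsymbol{\tau}^i \mid s) \equiv \prod_{t=2}^{M} p_\theta(a_t^i \mid s, a_{1:t-1}^i)$$
Trajectory: $\boldsymbol{\tau}^i = (a_1^i, a_2^i, \ldots, a_M^i)$ for $i = 1, 2, \ldots, N$
Shared baseline: $b^i(s) = b_{\text{shared}}(s) = \frac{1}{N}\sum_{j=1}^{N} R(\boldsymbol{\tau}^j)$ for $i = 1, 2, \ldots, N$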
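The shared-baseline estimator can be written directly in NumPy. In this sketch the per-trajectory gradients ∇_θ log p_θ(τ^i | s) are given as inputs; in practice an autodiff framework would compute them. For TSP, the return R(τ^i) would be minus the tour length. The function name `pomo_gradient` is illustrative.

```python
import numpy as np


def pomo_gradient(returns: np.ndarray, grad_log_probs: np.ndarray) -> np.ndarray:
    """Estimate grad_theta J(theta) for one instance s.

    returns:        shape (N,), R(tau^i) for the N rollouts of the same instance
    grad_log_probs: shape (N, P), row i = grad_theta log p_theta(tau^i | s)
    """
    b_shared = returns.mean()            # shared baseline: mean return over the N rollouts
    advantages = returns - b_shared      # R(tau^i) - b^i(s)
    # (1/N) * sum_i advantage_i * grad_theta log p_theta(tau^i | s)
    return advantages @ grad_log_probs / len(returns)
```

The advantages sum to zero. Rollouts that beat the instance average are reinforced and worse ones are suppressed, so no separate critic network is needed.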
Pseudocode for the Training Part
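Instance Augmentation: an inference technique
• Inspired by data augmentation in image processing.
• The coordinates used here lie in the unit square in the first quadrant ($0 \le x, y \le 1$), and transformations of that square are used to create augmented instances.
(Figure: the instance augmentation used in this work)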
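The POMO paper lists eight transformations of the unit square for this purpose: (x, y), (y, x), (x, 1−y), (y, 1−x), (1−x, y), (1−y, x), (1−x, 1−y), (1−y, 1−x). Below is a minimal sketch that produces all eight variants of an instance; the helper names are illustrative. Each transformation is a reflection or rotation of the square. Pairwise distances are therefore preserved, and so is the length of any tour.

```python
import math

# The eight symmetries of the unit square (the dihedral group of order 8).
TRANSFORMS = [
    lambda x, y: (x, y),
    lambda x, y: (y, x),
    lambda x, y: (x, 1 - y),
    lambda x, y: (y, 1 - x),
    lambda x, y: (1 - x, y),
    lambda x, y: (1 - y, x),
    lambda x, y: (1 - x, 1 - y),
    lambda x, y: (1 - y, 1 - x),
]


def augment_8(points: list[tuple[float, float]]) -> list[list[tuple[float, float]]]:
    """Return the 8 transformed copies of an instance with coordinates in [0, 1]^2."""
    return [[f(x, y) for x, y in points] for f in TRANSFORMS]


def tour_length(points: list[tuple[float, float]], order: list[int]) -> float:
    n = len(order)
    return sum(math.dist(points[order[k]], points[order[(k + 1) % n]]) for k in range(n))
```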
Pseudocode for the Inference Part
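The following sketch shows one way to read POMO inference; it rests on the assumptions stated here. For each of the 8 augmented copies, run one greedy rollout from every start node. Then return the shortest tour, measured on the original coordinates. The transformations keep node indices, so any tour found on a copy is also a valid tour of the original instance. The `greedy_rollout` function is hypothetical: it stands in for the trained policy's greedy decoding by always moving to the nearest unvisited node.

```python
import math

# Eight symmetries of the unit square, as in the POMO paper's instance augmentation.
TRANSFORMS = [
    lambda x, y: (x, y), lambda x, y: (y, x),
    lambda x, y: (x, 1 - y), lambda x, y: (y, 1 - x),
    lambda x, y: (1 - x, y), lambda x, y: (1 - y, x),
    lambda x, y: (1 - x, 1 - y), lambda x, y: (1 - y, 1 - x),
]


def tour_length(points, order):
    n = len(order)
    return sum(math.dist(points[order[k]], points[order[(k + 1) % n]]) for k in range(n))


def greedy_rollout(points, start):
    """Hypothetical stand-in for the policy's greedy decoding: nearest unvisited node."""
    order = [start]
    unvisited = [j for j in range(len(points)) if j != start]
    while unvisited:
        last = points[order[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(last, points[j]))
        order.append(nxt)
        unvisited.remove(nxt)
    return order


def pomo_inference(points):
    """Try every (augmentation, start node) pair and keep the shortest tour."""
    best_order, best_len = None, math.inf
    for f in TRANSFORMS:
        aug = [f(x, y) for x, y in points]        # one augmented copy
        for start in range(len(points)):          # N rollouts, one per start node
            order = greedy_rollout(aug, start)
            length = tour_length(points, order)   # score on the original instance
            if length < best_len:
                best_order, best_len = order, length
    return best_order, best_len
```

With this stand-in, augmentation cannot help: nearest neighbor depends only on distances, and the transformations preserve them. The benefit reported for instance augmentation comes from a learned policy. Such a policy is not exactly invariant, so it can return different tours on different copies.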
Experiments
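Experiments: Setup
• POMO is used to solve the following problems, and the results are compared with other representative methods: the traveling salesman problem, the capacitated vehicle routing problem, and the knapsack problem.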
Learning Curves: Traveling Salesman Problem (50 nodes and 100 nodes)
Traveling Salesman Problem (TSP)
Capacitated Vehicle Routing Problem (CVRP)
Knapsack Problem (KP)
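Summary of Experiments
• Effective results were obtained on three combinatorial optimization problems with different settings, using the same training method and the same neural-network architecture throughout.
• The experiments confirmed that POMO, as a training and inference method, and instance augmentation, as an inference method, are both effective.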
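Conclusion
This paper introduced a method for combinatorial optimization that exploits symmetry to improve the sample efficiency and accuracy of RL and to shorten inference time.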
References
Kwon, Yeong-Dae, et al. "POMO: Policy Optimization with Multiple Optima for Reinforcement Learning." Advances in Neural Information Processing Systems, vol. 33, 2020.
Kool, Wouter, Herke van Hoof, and Max Welling. "Attention, Learn to Solve Routing Problems!" International Conference on Learning Representations.
Vinyals, Oriol, Meire Fortunato, and Navdeep Jaitly. "Pointer Networks." Advances in Neural Information Processing Systems.