DRL Combinatorial Optimization
newzy
November 24, 2021
Transcript
POMO: Policy Optimization with Multiple Optima for Reinforcement Learning
Kwon, Yeong-Dae, et al. NeurIPS, 2020, vol. 33
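Overview
• An end-to-end approximate solution method for combinatorial optimization based on deep reinforcement learning.
• Compared with existing deep reinforcement learning methods, it substantially improves both computation time and accuracy.
• Validated on problems such as the traveling salesman problem.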
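Introduction
Combinatorial optimization problems
• Problems that ask for an optimal combination, typified by the traveling salesman problem, the vehicle routing problem, and the knapsack problem.

                     Accuracy       Computation time
Exact methods        Optimal        Slow
Approximate methods  Near-optimal   Fast

https://onl.tw/vzkASMX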
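The trade-off in the table can be illustrated on a tiny traveling salesman instance. The sketch below is not from the slides: the six coordinates, the function names, and the choice of nearest-neighbor as the approximate method are all assumptions made for illustration.

```python
import itertools
import math

def tour_length(points, order):
    """Length of the closed tour that visits points in the given order."""
    n = len(order)
    return sum(
        math.dist(points[order[k]], points[order[(k + 1) % n]])
        for k in range(n)
    )

def exact_tsp(points):
    """Exact method: enumerate every tour that starts at node 0 (factorial time)."""
    rest = range(1, len(points))
    best = min(
        itertools.permutations(rest),
        key=lambda p: tour_length(points, (0,) + p),
    )
    return (0,) + best

def nearest_neighbor_tsp(points):
    """Approximate method: always move to the closest unvisited node."""
    order, unvisited = [0], set(range(1, len(points)))
    while unvisited:
        here = points[order[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(here, points[j]))
        order.append(nxt)
        unvisited.remove(nxt)
    return tuple(order)

# Hypothetical instance with six nodes in the unit square.
points = [(0.1, 0.2), (0.9, 0.3), (0.5, 0.8), (0.3, 0.6), (0.7, 0.1), (0.2, 0.9)]
exact = tour_length(points, exact_tsp(points))
approx = tour_length(points, nearest_neighbor_tsp(points))
print(f"exact: {exact:.3f}, nearest neighbor: {approx:.3f}")
```

The exact search already enumerates 120 tours for six nodes and grows factorially, while the heuristic stays cheap but is only guaranteed to be no better than the optimum.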
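Reinforcement Learning (RL)
• RL: a method for solving sequential decision-making problems. The goal is to find a policy that maximizes the cumulative reward.
• The problem setting requires defining a state set, an action set, and a reward function.
https://onl.tw/98fQVvW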
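To make the three ingredients concrete, here is one way a traveling salesman instance could be cast as a sequential decision problem. This is a sketch under assumptions, not the formulation used in the slides: the class name `TinyTSPEnv`, its method names, and the per-step negative-distance reward are all hypothetical.

```python
import math

class TinyTSPEnv:
    """Hypothetical toy environment. State = (current node, visited set)."""

    def __init__(self, points):
        self.points = points
        self.reset()

    def reset(self, start=0):
        self.start = start
        self.current = start
        self.visited = {start}
        return self.current, frozenset(self.visited)

    def actions(self):
        """Action set in the current state: every node not yet visited."""
        return [j for j in range(len(self.points)) if j not in self.visited]

    def step(self, action):
        """Move to `action`; the reward is the negative distance travelled."""
        if action in self.visited:
            raise ValueError("node already visited")
        reward = -math.dist(self.points[self.current], self.points[action])
        self.current = action
        self.visited.add(action)
        done = len(self.visited) == len(self.points)
        if done:  # close the tour by returning to the start node
            reward -= math.dist(self.points[action], self.points[self.start])
        return (self.current, frozenset(self.visited)), reward, done

# Four corners of the unit square, visited in index order.
env = TinyTSPEnv([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)])
env.reset()
total, done = 0.0, False
while not done:
    _, r, done = env.step(env.actions()[0])  # fixed policy: lowest index first
    total += r
print(total)  # cumulative reward = negative tour length
```

With this reward, maximizing the cumulative reward is the same as minimizing the tour length.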
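REINFORCE (policy-based)
• Policy $\pi(s)$: a function that outputs the action $a$ to take in state $s$.
• $\pi_\theta$: a policy parameterized by $\theta$.
• Policy update, with learning rate $\alpha$ and objective function $J(\pi_\theta)$:
$$\theta \leftarrow \theta + \alpha \nabla_\theta J(\pi_\theta)$$
• Policy gradient, with expectation $\mathbb{E}$, return $R_t$, and baseline $b(s)$:
$$\nabla_\theta J(\pi_\theta) = \mathbb{E}_{\pi_\theta}\left[\nabla_\theta \log \pi_\theta \cdot \left(R_t - b(s)\right)\right]$$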
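The update rule and the policy gradient can be tried on a problem small enough to run in pure Python. The sketch below assumes a stateless three-action problem with a softmax policy and a running-average baseline; the reward means, noise level, step count, and function names are hypothetical choices, not values from the slides.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_bandit(mean_rewards, steps=5000, alpha=0.1, seed=0):
    """REINFORCE with a baseline on a stateless problem (one logit per action)."""
    rng = random.Random(seed)
    theta = [0.0] * len(mean_rewards)
    baseline = 0.0  # running average of observed returns, used as b
    for t in range(1, steps + 1):
        probs = softmax(theta)
        a = rng.choices(range(len(theta)), weights=probs)[0]
        ret = mean_rewards[a] + rng.gauss(0.0, 0.1)  # noisy return R
        advantage = ret - baseline
        # For a softmax policy: d/d theta_k log pi(a) = 1[k == a] - pi(k).
        for k in range(len(theta)):
            grad_log = (1.0 if k == a else 0.0) - probs[k]
            theta[k] += alpha * grad_log * advantage  # theta <- theta + alpha * grad
        baseline += (ret - baseline) / t
    return softmax(theta)

probs = reinforce_bandit([0.2, 0.5, 1.0])
print([round(p, 3) for p in probs])
```

Subtracting the baseline leaves the expected gradient unchanged but reduces its variance, which is the role $b(s)$ plays in the formula.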
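Prior work
Pointer Networks
A network used for combinatorial optimization problems.
• Selects input elements without repetition and generates an output sequence.
• Consists of an encoder, which extracts features from the input, and a decoder, which uses the encoder output to produce the route that forms the answer.
• Uses LSTMs for both the encoder and the decoder.
Attention Model
An improved version of Pointer Networks.
• Like Pointer Networks, the model uses an encoder and a decoder.
• Drops the LSTM and adopts multi-head attention.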
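Method
Idea behind the paper's method
• The first action strongly influences the agent's subsequent actions.
• Exploit the symmetry that is common in combinatorial optimization problems.
POMO
• REINFORCE with Baseline: uses a typical policy-gradient RL algorithm.
• Specifies multiple different starting actions and obtains multiple action sequences (trajectories).
• Does not use a <START> token.
[Figure: conventional approach vs. POMO]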
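The multi-start idea can be sketched without a neural network. In the code below a greedy nearest-neighbor rule stands in for the learned policy, which is an assumption made only to keep the example self-contained; the point is that every node is forced as the first action, so N nodes yield N trajectories and no start token is needed. All names and coordinates are hypothetical.

```python
import math

def rollout_from(points, start):
    """Stand-in for the learned policy: greedy nearest-neighbor from `start`."""
    order, unvisited = [start], set(range(len(points))) - {start}
    while unvisited:
        here = points[order[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(here, points[j]))
        order.append(nxt)
        unvisited.remove(nxt)
    return order

def tour_reward(points, order):
    """Reward of a trajectory: negative length of the closed tour."""
    n = len(order)
    return -sum(
        math.dist(points[order[k]], points[order[(k + 1) % n]]) for k in range(n)
    )

def multi_start_rollouts(points):
    """One trajectory per possible first action (one per node)."""
    trajectories = [rollout_from(points, s) for s in range(len(points))]
    rewards = [tour_reward(points, t) for t in trajectories]
    return trajectories, rewards

points = [(0.1, 0.2), (0.9, 0.3), (0.5, 0.8), (0.3, 0.6), (0.7, 0.1), (0.2, 0.9)]
trajectories, rewards = multi_start_rollouts(points)
best = max(range(len(rewards)), key=rewards.__getitem__)
print(trajectories[best], round(-rewards[best], 3))
```

Because a tour is a cycle, each start node can in principle lead to the same optimal tour, which is the symmetry the method relies on.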
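POMO
$$\nabla_\theta J(\theta) \approx \frac{1}{N} \sum_{i=1}^{N} \left(R(\boldsymbol{\tau}^i) - b^i(s)\right) \nabla_\theta \log p_\theta(\boldsymbol{\tau}^i \mid s)$$
$$\text{where } p_\theta(\boldsymbol{\tau}^i \mid s) \equiv \prod_{t=2}^{M} p_\theta\left(a_t^i \mid s, a_{1:t-1}^i\right)$$
Trajectories:
$$\boldsymbol{\tau}^i = \left(a_1^i, a_2^i, \dots, a_M^i\right) \quad \text{for } i = 1, 2, \dots, N$$
Shared baseline:
$$b^i(s) = b_{\text{shared}}(s) = \frac{1}{N} \sum_{j=1}^{N} R(\boldsymbol{\tau}^j) \quad \text{for } i = 1, 2, \dots, N$$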
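The pieces of this estimator can be computed by hand for a small batch of rollouts. The sketch below is an illustration only: the function names and the four reward values are hypothetical, and the surrogate loss is one common way to obtain the estimator through automatic differentiation, not necessarily how the authors implement it.

```python
import math

def trajectory_log_prob(step_probs):
    """log p(tau | s): sum of log-probabilities of the actions a_2 .. a_M.

    step_probs[k] is the probability the policy assigned to the action
    actually taken at step k + 2; the first action a_1 is fixed, not sampled.
    """
    return sum(math.log(p) for p in step_probs)

def pomo_advantages(rewards):
    """R(tau^i) - b_shared(s), where b_shared is the mean reward of all N rollouts."""
    baseline = sum(rewards) / len(rewards)
    return [r - baseline for r in rewards]

def pomo_surrogate_loss(rewards, log_probs):
    """Scalar whose gradient w.r.t. theta is minus the estimator,
    provided the advantages are treated as constants."""
    adv = pomo_advantages(rewards)
    return -sum(a * lp for a, lp in zip(adv, log_probs)) / len(rewards)

# Hypothetical rewards (negative tour lengths) of N = 4 rollouts on one instance.
rewards = [-4.0, -4.4, -3.8, -4.2]
adv = pomo_advantages(rewards)
log_probs = [trajectory_log_prob([0.5, 0.5, 1.0]) for _ in rewards]
print([round(a, 3) for a in adv], round(pomo_surrogate_loss(rewards, log_probs), 6))
```

The advantages always sum to zero, so rollouts that beat the batch average are reinforced and those below it are suppressed, without a separately learned critic.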
[Figure: pseudocode for the training procedure]
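Instance Augmentation: an inference technique
• Inspired by data augmentation in image processing.
• The coordinates used here lie in the 1 x 1 unit square (first quadrant).
[Figure: the instance used here and its Instance Augmentation]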
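The transcript does not list the transformations themselves. A minimal sketch, assuming the eight flips and coordinate swaps that map the unit square onto itself, shows why such copies are usable at inference time: every copy has the same tour lengths as the original, so a tour found on any copy is equally good on the original instance. The function names and the sample coordinates are hypothetical.

```python
import math

def augment_8(points):
    """Eight copies of an instance under flips and swaps of the unit square
    (assumed set of transformations; each keeps coordinates inside [0, 1])."""
    transforms = [
        lambda x, y: (x, y),
        lambda x, y: (y, x),
        lambda x, y: (x, 1 - y),
        lambda x, y: (y, 1 - x),
        lambda x, y: (1 - x, y),
        lambda x, y: (1 - y, x),
        lambda x, y: (1 - x, 1 - y),
        lambda x, y: (1 - y, 1 - x),
    ]
    return [[f(x, y) for x, y in points] for f in transforms]

def tour_length(points, order):
    """Length of the closed tour that visits points in the given order."""
    n = len(order)
    return sum(
        math.dist(points[order[k]], points[order[(k + 1) % n]]) for k in range(n)
    )

points = [(0.1, 0.2), (0.9, 0.3), (0.5, 0.8), (0.3, 0.6)]
order = [0, 1, 2, 3]
augmented = augment_8(points)
lengths = [tour_length(p, order) for p in augmented]
print([round(length, 6) for length in lengths])  # same length for every copy
```

A learned policy is generally not invariant to these transformations, so running it on each copy gives several different candidate tours, and the best one can be kept.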
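[Figure: pseudocode for the inference procedure]
Experiments
Experimental setup
• POMO was used to solve the following problems, and the results were compared with other representative methods:
  Traveling salesman problem
  Capacitated vehicle routing problem
  Knapsack problem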
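[Figure: learning curves for the traveling salesman problem; panels: 50 nodes, 100 nodes]
[Figure: traveling salesman problem (TSP), two slides]
[Figure: capacitated vehicle routing problem (CVRP)]
[Figure: knapsack problem (KP)]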
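Summary of the experiments
• For three combinatorial optimization problems with different settings, the same training method and NN architecture produced effective results.
• Both POMO, as a training and inference method, and Instance Augmentation, as an inference method, were confirmed to be effective.
Conclusion
This paper introduced a method that exploits symmetry in combinatorial optimization problems to improve the sample efficiency and accuracy of RL and to shorten inference time.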
References
Kwon, Yeong-Dae, et al. "POMO: Policy Optimization with Multiple Optima for Reinforcement Learning." Advances in Neural Information Processing Systems, vol. 33, 2020.
Kool, Wouter, Herke van Hoof, and Max Welling. "Attention, Learn to Solve Routing Problems!" International Conference on Learning Representations.
Vinyals, Oriol, Meire Fortunato, and Navdeep Jaitly. "Pointer Networks." Advances in Neural Information Processing Systems.