[ICML 2021 Paper Reading Session] Revisiting Rainbow: Promoting more Insightful and Inclusive Deep Reinforcement Learning Research
ninohira
August 18, 2021
Event URL
https://line.connpass.com/event/221309/
Paper URL
https://arxiv.org/abs/2011.14826
Transcript
ICML 2021 Paper Reading Session
Revisiting Rainbow: Promoting more Insightful and Inclusive Deep Reinforcement Learning Research
Masato Ninohira, LINE Corporation, 2021.08
Contents
• Paper selection process
• Paper content
• Impressions
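My first paper-reading session: how I chose the paper
• Signed up on impulse
• Which paper should I cover? For a start, look for papers published by Google
• The approach: ride on the shoulders of the giant that is Google
• Found the "Google at ICML" list → reinforcement learning → picked this paper, which has an accompanying blog post
• Reinforcement learning is a field I can read to some extent
• Settled on this paper because the Google AI Blog has an explanatory post on it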
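Introduction and background
Rainbow is a well-known reinforcement learning method
• It adds the following algorithms, current at the time, to DQN
• Double Q-learning, Prioritized experience replay, Dueling network, Multi-step learning, Distributional RL, Noisy nets
• An acquaintance likened it to the mid-boss rush in Rockman (Mega Man)
It is hard for small research groups to reproduce it or to compete with it
• Running the […] Arcade Learning Environment (ALE) games takes […] GPU days on a P100
Being able to show that an algorithm has value in a small-scale environment matters to future researchers
• In light of this, we wish to investigate three questions:
• Would Hessel et al. have arrived at the same qualitative conclusions, had they run their experiments on a set of smaller-scale experiments? (Would small environments have given the same results?)
• Do the results of Hessel et al. generalize well to non-ALE environments, or are their results overly-specific to the chosen benchmark? (Do they hold outside the Arcade Learning Environment?)
• Is there scientific value in conducting empirical research in reinforcement learning when restricting oneself to small- to mid-scale environments? (Is empirical work limited to small and mid scale scientifically worthwhile?)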
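As an illustration of one of the Rainbow components listed above, here is a minimal sketch of the double Q-learning target for a single transition. It assumes plain lists of Q-values stand in for the outputs of the online and target networks. The function name `double_q_target` and its parameters are hypothetical, chosen for this example rather than taken from the paper or any library.

```python
def double_q_target(reward, gamma, done, q_online_next, q_target_next):
    """Double Q-learning target for one transition.

    q_online_next and q_target_next are lists of Q-values for the next
    state (one entry per action) from the online and target networks.
    """
    if done:
        # No bootstrapping past a terminal state.
        return reward
    # The online network selects the greedy action ...
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    # ... and the target network evaluates that action.
    return reward + gamma * q_target_next[best_action]


if __name__ == "__main__":
    print(double_q_target(1.0, 0.5, False, [0.1, 0.9], [4.0, 2.0]))
```

Separating action selection from action evaluation is what distinguishes this from the standard DQN target, which would take the maximum of `q_target_next` directly.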
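Revisiting Rainbow
Not all of the […] algorithms are used
Environments
• Classic small environments: CartPole, Acrobot, LunarLander, and MountainCar
• Under […] hours on a CPU
• Anyone who has tried them will know they are truly small
• ALE: MinAtar (Asterix, Breakout, Freeway, Seaquest, SpaceInvaders)
• A simplified version of ALE
• […] to […] hours on a P100; even so, quick compared with the original Atari games
Results
• In general, adding some of the components is better than DQN alone
• In some cases, adding Distributional RL on its own does not improve results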
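Beyond the Rainbow
Examining network architectures and batch sizes
• What was tested: hyperparameters (number of layers, number of hidden units, batch size)
• Results: accuracy drops with […] to […] layers; units: […] or more; batch size: […] or more
Examining distribution parameterizations
• What was tested: the following methods, known as relatives of Distributional RL
• Distributional Reinforcement Learning with Quantile Regression (QR-DQN)
• Implicit quantile networks for distributional reinforcement learning (IQN)
• Results
• QR-DQN
• Classic environments: the Rainbow methods generally help, whereas adding QR to Rainbow makes things worse
• MinAtar: Rainbow […] QRainbow; good in the case with image inputs, so QR may be effective for networks with conv layers
• IQN
• Classic environments: the effect of the Rainbow methods varies by environment
• MinAtar: Rainbow […] IRainbow; the same result as for QR-DQN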
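QR-DQN fits quantiles of the return distribution with a quantile regression loss. The sketch below shows the plain, un-smoothed pinball loss for a single quantile level. QR-DQN as published uses a Huber-smoothed variant over many quantiles, so this is a simplified illustration only. The names `pinball_loss` and `mean_pinball_loss` are hypothetical.

```python
def pinball_loss(tau, target, prediction):
    """Quantile regression (pinball) loss for a quantile level tau in (0, 1)."""
    u = target - prediction
    # Under-estimates (u > 0) are weighted by tau,
    # over-estimates (u < 0) by (1 - tau).
    return u * (tau - (1.0 if u < 0 else 0.0))


def mean_pinball_loss(tau, samples, prediction):
    """Average pinball loss of one predicted quantile against return samples."""
    return sum(pinball_loss(tau, s, prediction) for s in samples) / len(samples)


if __name__ == "__main__":
    # For tau = 0.9, under-estimating costs more than over-estimating.
    print(pinball_loss(0.9, 1.0, 0.0), pinball_loss(0.9, 0.0, 1.0))
```

The asymmetric weighting is what pushes a prediction toward the `tau`-th quantile of the samples rather than toward their mean.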
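Beyond the Rainbow
Munchausen Reinforcement Learning (M-DQN)
• Key characteristics: the use of stochastic policies; augmenting the reward with the scaled log-policy
• Results
• DQN
• Classic environments: better without Munchausen
• MinAtar: DQN […] M-DQN, but IQN […] M-IQN
• Rainbow
• Classic environments: M-Rainbow […] Rainbow […] M-IRainbow
• MinAtar: hard to say, but Munchausen seems to help on Asterix, Breakout, and SpaceInvaders
Reevaluating the Huber loss
• What was tested: the original paper uses the Huber loss with the RMSProp optimizer; what happens with MSE and Adam instead?
• Results
• Huber loss + RMSProp […] MSE + Adam
• With RMSProp fixed: MSE […] Huber loss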
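The two losses compared above can be made concrete with a small sketch that treats each as a function of a single TD error. The 0.5 factor on the squared error and the threshold `delta=1.0` are conventions assumed here, not values taken from the slides, and the function names are hypothetical.

```python
def mse_loss(td_error):
    """Squared-error loss on a TD error (0.5 factor is an assumed convention)."""
    return 0.5 * td_error ** 2


def huber_loss(td_error, delta=1.0):
    """Huber loss: quadratic within delta of zero, linear beyond it."""
    abs_error = abs(td_error)
    if abs_error <= delta:
        return 0.5 * td_error ** 2
    # Linear branch, offset so the two pieces meet at abs_error == delta.
    return delta * (abs_error - 0.5 * delta)


if __name__ == "__main__":
    for err in (0.5, 3.0):
        print(err, mse_loss(err), huber_loss(err))
```

For errors no larger than `delta` the two losses coincide; beyond it the Huber loss grows linearly, so large TD errors contribute a bounded gradient, whereas under MSE the gradient keeps growing with the error.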
Putting it all together: Rainbow flavors
• DQN […] Rainbow, Beyond methods (QRainbow, IRainbow)
• Among Rainbow and the Beyond methods, no single one can be called the best across the board
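Environment properties
Observations on each environment; details are given in […]
• Classic environments
• CartPole: because the task is so simple, it is sensitive to how learning is configured, which makes it suited to testing optimizer stability; noisy nets and MSE are effective
• LunarLander: suited to testing Distributional RL methods
• Acrobot, MountainCar: rewards are scarce, which makes them suited to testing exploration strategies
• MinAtar: allows CNN + RL to be tested more cheaply than with raw Atari
• With the original Atari, the compute cost of the general image-recognition style CNN part is too high
• Seaquest, Freeway: partial observability and reward sparsity make them suited to testing exploration methods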
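Conclusion
A variety of experiments were carried out in a limited compute environment
• Revisiting something is far easier than discovering it, but the intent of this work is to argue for the relevance and significance of empirical research in small- and medium-scale environments
• Validation in low-compute environments adds research value
The authors are not criticizing large-scale benchmarks; they argue that small-scale benchmarks also have value
• They are not calling for less emphasis on large-scale benchmarks
• They only ask researchers to consider small-scale environments as a valuable tool in their investigations, and reviewers not to dismiss empirical work that focuses on smaller problems
• They believe this gives a clearer picture of the research landscape and lowers the barriers for newcomers from diverse and underprivileged communities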
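Impressions
• I was already somewhat familiar with Rainbow, so the paper was an easy read
• It is interesting that the paper challenges the situation in which you cannot even reach the starting line of research without compute resources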