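Top recent ⑤ Teaching with Commentaries
Effective training of deep neural networks can be challenging, and many open questions
remain about how best to train these models. Recently developed methods for improving
neural network training examine teaching: providing learned information during the
training process to improve downstream model performance. This paper takes a step
toward extending the scope of teaching. It proposes a flexible teaching framework using
commentaries, meta-learned information that helps training on a particular task or
dataset. It presents an efficient, scalable gradient-based method for learning
commentaries that builds on recent work on implicit differentiation. It explores diverse
applications, from learning weights for individual training examples, to parameterizing
label-dependent data augmentation policies, to representing attention masks that
highlight salient image regions. In these settings, the authors find that commentaries
can improve training speed and/or performance and can provide fundamental insights
about the dataset and the training process.
http://arxiv.org/abs/2011.03037v1
Google Research / MIT / University of Toronto
→ Devised and demonstrated a general-purpose algorithm for building a model (a
commentary model) that assists training by additionally learning teaching information.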
Top10 Recent
1. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
2. Do Wide and Deep Networks Learn the Same Things? Uncovering How Neural Network
Representations Vary with Width and Depth
3. RelationNet++: Bridging Visual Representations for Object Detection via Transformer Decoder
4. Intriguing Properties of Contrastive Losses
5. Teaching with Commentaries
6. A Review of Uncertainty Quantification in Deep Learning: Techniques, Applications and
Challenges
7. Learning Invariances in Neural Networks
8. Underspecification Presents Challenges for Credibility in Modern Machine Learning
9. Ridge Rider: Finding Diverse Solutions by Following Eigenvectors of the Hessian
10. Training Generative Adversarial Networks by Solving Ordinary Differential Equations
Slide 16
Slide 16 text
Top10 Hype
1. Fourier Neural Operator for Parametric Partial Differential Equations
2. Do Wide and Deep Networks Learn the Same Things? Uncovering How Neural Network
Representations Vary with Width and Depth
3. Viewmaker Networks: Learning Views for Unsupervised Representation Learning
4. Large-scale multilingual audio visual dubbing
5. Text-to-Image Generation Grounded by Fine-Grained User Attention
6. Self Normalizing Flows
7. An Attack on InstaHide: Is Private Learning Possible with Instance Encoding?
8. Hyperparameter Ensembles for Robustness and Uncertainty Quantification
9. The geometry of integration in text classification RNNs
10. Scaling Laws for Autoregressive Generative Modeling
Slide 17
Slide 17 text
Top recent: Best10
Slide 18
Slide 18 text
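① An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
While the Transformer architecture has become the de facto standard for natural
language processing tasks, its applications to computer vision remain limited. In
vision, attention is either applied in conjunction with convolutional networks or used
to replace certain components of convolutional networks while keeping their overall
structure in place. The authors show that this reliance on CNNs is not necessary and
that a pure transformer applied to sequences of image patches can perform very well on
image classification tasks. When pre-trained on large amounts of data and transferred
to multiple mid-sized or small image recognition benchmarks (ImageNet, CIFAR-100, VTAB,
etc.), the Vision Transformer (ViT) attains excellent results compared to
state-of-the-art convolutional networks while requiring substantially fewer
computational resources to train.
http://arxiv.org/abs/2010.11929v1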
(Duplicate of an earlier month's list)
Google Research
Slide 19
Slide 19 text
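② Do Wide and Deep Networks Learn the Same Things? Uncovering How Neural Network
Representations Vary with Width and Depth
A key factor in the success of deep neural networks is the ability to scale models to
improve performance by varying the architecture's depth and width. This simple property
of neural network design has produced highly effective architectures for a variety of
tasks. Nevertheless, understanding of how depth and width affect the learned
representations is limited. This paper studies that fundamental question. It begins by
investigating how varying depth and width affects a model's hidden representations, and
finds a characteristic block structure in the hidden representations of larger-capacity
(wider or deeper) models. The authors demonstrate that this block structure arises when
model capacity is large relative to the size of the training set, and that it indicates
the underlying layers are preserving and propagating the dominant principal component of
their representations. This discovery has important consequences for the features
learned by different models: representations outside the block structure are often
similar across architectures of differing width and depth, but the block structure is
unique to each model. Analyzing the output predictions of different architectures, the
authors find that even when overall accuracy is similar, wide and deep models show
distinctive error patterns and variations across classes.
http://arxiv.org/abs/2010.15327v1
Google Research
→ Analyzed models of differing width and depth and examined the characteristics of each.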
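④ Intriguing Properties of Contrastive Losses
Contrastive loss and its variants have recently become very popular for learning visual
representations without supervision. This work first generalizes the standard
cross-entropy-based contrastive loss to a broader family of losses that share the
abstract form $\mathcal{L}_{\text{alignment}} + \lambda \mathcal{L}_{\text{distribution}}$,
where hidden representations are encouraged to (1) be aligned under some
transformations/augmentations and (2) match a high-entropy prior distribution. The
authors show that various instantiations of the generalized loss perform similarly in
the presence of a multi-layer non-linear projection head, and that the temperature
scaling (τ) widely used in the standard contrastive loss is inversely related to the
weighting (λ) between the two loss terms. They then study an intriguing phenomenon:
feature suppression among competing features shared across augmented views, such as
"color" versus "object class". They construct datasets with explicit, controllable
competing features and show that, in contrastive learning, a few bits of easy-to-learn
shared features can suppress, and even fully prevent, the learning of other competing
features. Interestingly, this property is much less detrimental in autoencoders based on
a reconstruction loss. Existing contrastive learning methods rely critically on data
augmentation to favor certain sets of features over others, whereas one might wish that
a network would learn all competing features as far as its capacity allows.
http://arxiv.org/abs/2011.02803v1
Google Research
→ A study of the properties of contrastive learning. Trains on datasets that combine two
competing features and examines how they interfere with each other.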
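For readers unfamiliar with the "standard contrastive loss based on cross-entropy" that the abstract starts from, the following is a minimal NumPy sketch of a temperature-scaled cross-entropy loss over two augmented views (often called NT-Xent). It is not the paper's generalized loss; the function name `nt_xent_loss`, the default temperature, and the array shapes are assumptions made for illustration.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """Temperature-scaled cross-entropy contrastive loss (illustrative sketch).

    z1, z2: arrays of shape (n, d). Row i of z1 and row i of z2 are assumed
    to be embeddings of two augmented views of the same example.
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit norm -> cosine similarity
    logits = z @ z.T / temperature                    # (2n, 2n) similarity logits
    np.fill_diagonal(logits, -np.inf)                 # a view is never its own pair

    # The positive for row i is row i + n, and vice versa.
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Numerically stable row-wise log-softmax.
    row_max = logits.max(axis=1, keepdims=True)
    log_norm = np.log(np.exp(logits - row_max).sum(axis=1, keepdims=True))
    log_prob = logits - row_max - log_norm

    return -log_prob[np.arange(2 * n), positives].mean()
```

Lowering `temperature` sharpens the softmax over negatives; the abstract's claim is that this knob plays a role inversely related to the weight λ in the generalized form.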
Slide 23
Slide 23 text
No content
Slide 24
Slide 24 text
⑤ Teaching with Commentaries
Pick-up paper
Slide 25
Slide 25 text
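⑥ A Review of Uncertainty Quantification in Deep Learning: Techniques, Applications and
Challenges
Uncertainty quantification (UQ) plays a pivotal role in reducing uncertainty in both
optimization and decision-making processes. It can be applied to a variety of real-world
problems in science and engineering. Bayesian approximation and ensemble learning
techniques are the two most widely used UQ methods in the literature. Researchers have
proposed various UQ methods and examined their performance in applications such as
computer vision (e.g., self-driving cars and object detection), image processing (e.g.,
image restoration), medical image analysis (e.g., medical image classification and
segmentation), natural language processing (e.g., text classification, social media
text, and recidivism risk scoring), and bioinformatics. This study reviews recent
advances in UQ methods used in deep learning. It also surveys the application of these
methods in reinforcement learning (RL). It then outlines several important applications
of UQ methods. Finally, it briefly highlights the fundamental research challenges facing
UQ methods and discusses future research directions in the field.
http://arxiv.org/abs/2011.06225v3
IEEE
→ A comprehensive review paper, from IEEE, of UQ methods across deep learning in general.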
Slide 26
Slide 26 text
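⑦ Learning Invariances in Neural Networks
Invariance to translation has given convolutional neural networks powerful
generalization properties. However, we often do not know in advance which invariances
are present in the data, or to what extent a model should be invariant to a given
symmetry group. The authors show how to learn invariances and equivariances by
parameterizing a distribution over augmentations and optimizing the training loss
simultaneously with respect to the network parameters and the augmentation parameters.
With this simple procedure, the correct set and extent of invariances for image
classification, regression, segmentation, and molecular property prediction can be
recovered from a large space of augmentations using the training data alone.
http://arxiv.org/abs/2010.11882v1
New York University
→ Built a general-purpose framework for determining the range of augmentations.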
Slide 27
Slide 27 text
No content
Slide 28
Slide 28 text
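⑧ Underspecification Presents Challenges for Credibility in Modern Machine Learning
ML models often exhibit unexpectedly poor behavior when deployed in the real world. The
authors identify underspecification as a key reason for these failures. An ML pipeline
is underspecified when it can return many predictors with equivalently strong held-out
performance in the training domain. Underspecification is common in modern ML pipelines,
such as those based on deep learning. Predictors returned by underspecified pipelines
are often treated as equivalent on the basis of their training-domain performance, but
the authors show that such predictors can behave very differently in deployment domains.
This ambiguity can lead to instability and poor model behavior in practice, and it is a
failure mode distinct from previously identified issues that arise from structural
mismatch between training and deployment domains. Using examples from computer vision,
medical imaging, natural language processing, clinical risk prediction based on
electronic health records, and medical genomics, the authors show that the problem
appears in a wide variety of practical ML pipelines. Their results show the need to
account explicitly for underspecification in any modeling pipeline intended for
real-world deployment, in any domain.
http://arxiv.org/abs/2011.03395v1
Google
→ Presents cases where applying ML models in the real world causes trouble (many from the
medical field). The point is that, for practical use, it is very important to understand
what a model has learned and what it has failed to learn when training and using it.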
Slide 29
Slide 29 text
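⑨ Ridge Rider: Finding Diverse Solutions by Following Eigenvectors of the Hessian
Over the last decade, a single algorithm has changed many facets of our lives. In an era
of ever-decreasing loss functions, SGD and its various descendants have become the go-to
optimization tool in machine learning and a key ingredient in the success of deep neural
networks (DNNs). SGD is guaranteed (under loose assumptions) to converge to a local
optimum, but in some cases it matters which local optimum is found, and this is often
context-dependent. Such cases arise frequently in machine learning, from shape-versus-
texture features to ensemble methods and zero-shot coordination. In these settings there
are solutions that SGD on a "standard" loss function will not find, because it converges
to the "easy" solutions instead. This paper proposes a different approach. Rather than
following the gradient, which corresponds to a locally greedy direction, it follows the
eigenvectors of the Hessian, called ridges. By iteratively following ridges and
branching among them, the method traverses the loss surface effectively and finds
qualitatively different solutions. The authors show, both theoretically and
experimentally, that their method, called Ridge Rider (RR), offers a promising direction
for a variety of challenging problems.
http://arxiv.org/abs/2011.06505v1
University of Oxford / Google Research
→ Created a new optimization algorithm, Ridge Rider (RR).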
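The core idea of following Hessian eigenvectors instead of the gradient can be illustrated on a toy problem. The sketch below is not the paper's Ridge Rider algorithm: it takes a single step along each negative-curvature eigenvector at a saddle point (in both directions) and then runs plain gradient descent, which is enough to reach two different minima. The toy loss, the function names, and the step sizes are all assumptions chosen for illustration.

```python
import numpy as np

# Toy loss with a saddle at (0, 0) and two minima at (+1, 0) and (-1, 0).
def loss(p):
    x, y = p
    return 0.25 * x**4 - 0.5 * x**2 + 0.5 * y**2

def grad(p):
    x, y = p
    return np.array([x**3 - x, y])

def hessian(p):
    x, _ = p
    return np.array([[3.0 * x**2 - 1.0, 0.0],
                     [0.0, 1.0]])

def follow_ridges(start, step=0.1, n_steps=200):
    """Branch along negative-curvature eigenvectors, then descend.

    At `start` the gradient is assumed to be (near) zero, so gradient descent
    alone would not move. Each eigenvector with a negative eigenvalue gives
    two branches (+v and -v), and each branch is refined by gradient descent.
    """
    eigvals, eigvecs = np.linalg.eigh(hessian(start))
    solutions = []
    for k in np.where(eigvals < 0)[0]:
        for sign in (1.0, -1.0):
            p = start + sign * step * eigvecs[:, k]  # leave the saddle along a ridge
            for _ in range(n_steps):
                p = p - step * grad(p)               # ordinary gradient descent
            solutions.append(p)
    return solutions
```

Calling `follow_ridges(np.zeros(2))` returns two points, one near each minimum, whereas gradient descent started exactly at the saddle stays put because the gradient there is zero.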
② Do Wide and Deep Networks Learn the Same Things? Uncovering How Neural Network
Representations Vary with Width and Depth
http://arxiv.org/abs/2010.15327v1
(Duplicate of the Recent list)
Google Research
Slide 34
Slide 34 text
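③ Viewmaker Networks: Learning Views for Unsupervised Representation Learning
Many recent methods for unsupervised representation learning involve training models to
be invariant to different "views", that is, transformed versions of an input. However,
designing these views requires considerable expertise and experimentation, which hinders
the wide adoption of unsupervised representation learning across domains and modalities.
To address this, the authors propose viewmaker networks. The network is trained jointly
with an encoder network and produces adversarial $\ell_p$ perturbations of the input.
Applied to CIFAR-10, the learned views reach transfer accuracy comparable to the
well-studied augmentations used in the SimCLR model. The learned views significantly
outperform baseline augmentations in the speech (+9% absolute) and wearable sensor (+17%
absolute) domains. The authors also show how combining viewmaker views with handcrafted
views improves robustness to common image corruptions. The method demonstrates that
learned views are a promising way to reduce the expertise and effort that unsupervised
learning requires, and suggests that its benefits could extend to a much wider range of
domains.
http://arxiv.org/abs/2010.07432v1
Stanford University
→ Developed a "Viewmaker" model that automatically creates views for contrastive learning
without expert knowledge; it proved effective on speech and sensor data.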
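⑧ Hyperparameter Ensembles for Robustness and Uncertainty Quantification
Ensembles over neural network weights trained from different random initializations,
known as deep ensembles, achieve state-of-the-art accuracy and calibration. The recently
introduced batch ensembles provide a more parameter-efficient drop-in replacement. This
paper designs ensembles not only over weights but also over hyperparameters, improving
the state of the art in both settings. For the best performance independent of budget,
the authors propose hyper-deep ensembles. Their strong performance highlights the
benefit of combining models that have both weight diversity and hyperparameter
diversity. The authors further propose a parameter-efficient version, hyper-batch
ensembles, built on the layer structure of batch ensembles and self-tuning networks. The
computational and memory costs of this method are notably lower than those of typical
ensembles. On image classification with MLP, LeNet, ResNet 20, and Wide ResNet 28-10
architectures, the method improves on both deep ensembles and batch ensembles.
http://arxiv.org/abs/2006.13570v2
Google Research
→ An improvement to ensembling.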
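As a reminder of what "ensembling" means operationally here, the sketch below shows only the combination step: averaging the predicted class probabilities of several independently trained members. How the members differ (random initialization only, as in deep ensembles, or also hyperparameters, as in the paper) is outside the sketch, and the function names and array layout are assumptions for illustration.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def ensemble_predict(member_logits):
    """Average per-member class probabilities.

    member_logits: array-like of shape (n_members, n_examples, n_classes),
    one set of logits per ensemble member (hypothetical layout).
    Returns an array of shape (n_examples, n_classes).
    """
    probs = softmax(np.asarray(member_logits, dtype=float))
    return probs.mean(axis=0)
```

When members disagree, the averaged distribution is flatter than any single member's, which is one intuition for why ensembles tend to be better calibrated.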
Slide 53
Slide 53 text
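⑨ The geometry of integration in text classification RNNs
Although recurrent neural networks (RNNs) are widely applied to a variety of tasks, a
unified understanding of how RNNs solve these tasks is still lacking. In particular, it
is unclear what dynamical patterns arise in trained RNNs and how those patterns depend
on the training dataset or task. This work addresses these questions in the context of a
specific natural language processing task: text classification. Using tools from
dynamical systems analysis, the authors study recurrent networks trained on both natural
and synthetic text classification tasks. They find that the dynamics of these trained
RNNs are both interpretable and low-dimensional. Specifically, regardless of
architecture or dataset, RNNs accumulate evidence for each class as they process the
text, using a low-dimensional attractor manifold as the underlying mechanism. Moreover,
the dimensionality and geometry of the attractor manifold are determined by the
structure of the training dataset; in particular, the authors describe how simple word
statistics computed on the training dataset can be used to predict these properties.
The observations span multiple architectures and datasets, reflecting a common mechanism
that RNNs employ to perform text classification. To the extent that integrating evidence
toward a decision is a common computational primitive, this work lays a foundation for
using dynamical systems techniques to study the inner workings of RNNs.
http://arxiv.org/abs/2010.15114v1
University of Washington / Google
→ A research paper analyzing how RNNs solve their tasks.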
Slide 54
Slide 54 text
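⑩ Scaling Laws for Autoregressive Generative Modeling
The authors identify empirical scaling laws for the cross-entropy loss in four domains:
generative image modeling, video modeling, multimodal image models, and mathematical
problem solving. In all cases, autoregressive Transformers improve smoothly in
performance as model size and compute budget increase, following a power law plus a
constant. The optimal model size also depends on the compute budget through a power law,
with exponents that are nearly universal across all data domains. The cross-entropy loss
has an information-theoretic interpretation as
$S(\mathrm{True}) + D_{\mathrm{KL}}(\mathrm{True}\,\|\,\mathrm{Model})$, and the
empirical scaling laws suggest a prediction for both the entropy of the true data
distribution and the KL divergence between the true and model distributions. Under this
interpretation, billion-parameter Transformers are nearly perfect models of the
YFCC100M image distribution downsampled to $8 \times 8$ resolution, and for other
resolutions one can predict the model size needed to reach any given reducible loss
(that is, $D_{\mathrm{KL}}$) in nats/image. The authors also find several additional
scaling laws in specific domains: (a) they identify a scaling relation for the mutual
information between captions and images in multimodal models and show how to answer the
question "Is a picture worth a thousand words?"; (b) for mathematical problem solving,
they identify scaling laws for model performance when extrapolating beyond the training
distribution. These results indicate that scaling laws have important implications for
neural network performance, including on downstream tasks.
http://arxiv.org/abs/2010.14701v2
OpenAI
→ To find efficient Transformer model sizes (parameters, layers, depth), measured and
analyzed performance by model size across a variety of generative tasks.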
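The "power law plus a constant" form mentioned in the abstract can be made concrete with a small fitting sketch. It assumes the functional form loss ≈ irreducible + (x0 / x)^alpha and, as a simplification, that the irreducible term is already known, so the remaining fit is a straight line in log-log space. The symbols `alpha` and `x0`, the function name, and this fitting procedure are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def fit_power_law(x, loss, irreducible):
    """Fit loss ~= irreducible + (x0 / x)**alpha by log-log linear regression.

    x: model sizes or compute budgets (positive values).
    loss: measured losses, each strictly greater than `irreducible`.
    Returns (alpha, x0).
    """
    reducible = np.asarray(loss, dtype=float) - irreducible
    # log(reducible) = alpha * log(x0) - alpha * log(x)
    slope, intercept = np.polyfit(np.log(x), np.log(reducible), deg=1)
    alpha = -slope
    x0 = np.exp(intercept / alpha)
    return alpha, x0

def predict_loss(x, alpha, x0, irreducible):
    """Evaluate the fitted scaling law at new sizes x."""
    return irreducible + (x0 / np.asarray(x, dtype=float)) ** alpha
```

In the abstract's interpretation, the constant corresponds to the entropy of the true distribution and the power-law term to the reducible loss, so a fit of this shape is what allows forecasting the model size needed for a target reducible loss.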