
AI Latest Papers Reading Group, May 2021


Slides for the "AI Latest Papers Reading Group, May 2021" [online, streamed over Zoom] (formerly the DL study group),
a roundup of the papers trending on arXiv over the past month:
https://deeplearning-b.connpass.com/event/209977/

M.Inomata

May 12, 2021

Transcript

  1. About the speaker: M. Inomata, CEO and developer at tech vein Inc.
     Twitter: @ino2222 / https://www.techvein.com

  2. Top10 Recent
     1. EfficientNetV2: Smaller Models and Faster Training ← PickUp!
     2. An Empirical Study of Training Self-Supervised Vision Transformers
     3. Cross-validation: what does it estimate and how well does it do it?
     4. GANcraft: Unsupervised 3D Neural Rendering of Minecraft Worlds
     5. StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery
     6. LocalViT: Bringing Locality to Vision Transformers
     7. Keyword Transformer: A Self-Attention Model for Keyword Spotting
     8. Multiscale Vision Transformers
     9. SiT: Self-supervised vIsion Transformer
     10. Self-supervised Video Object Segmentation by Motion Grouping

  3. Top10 Hype
     1. Minimum-Distortion Embedding
     2. StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery
     3. RepVGG: Making VGG-style ConvNets Great Again
     4. Representation Learning for Networks in Biology and Medicine: Advancements, Challenges, and Opportunities
     5. Cross-validation: what does it estimate and how well does it do it?
     6. Factors of Influence for Transfer Learning across Diverse Appearance Domains and Task Types
     7. Why Do Local Methods Solve Nonconvex Problems?
     8. Scaling Scaling Laws with Board Games
     9. Vision Transformers for Dense Prediction
     10. EfficientNetV2: Smaller Models and Faster Training

  4. Top recent ① EfficientNetV2: Smaller Models and Faster Training
     This paper introduces EfficientNetV2, a new family of convolutional networks with faster training and better parameter efficiency than previous models. The models were developed with training-aware neural architecture search combined with scaling, jointly optimizing training speed and parameter efficiency, over a search space enriched with new operations such as Fused-MBConv. Experiments show that EfficientNetV2 trains much faster than state-of-the-art models while being up to 6.8x smaller. Training can be accelerated further by progressively increasing the image size during training, but this often hurts accuracy; to compensate, the authors adaptively adjust regularization (e.g. dropout and data augmentation) as well, achieving both fast training and good accuracy. With this progressive learning, EfficientNetV2 substantially outperforms previous models on ImageNet and on the CIFAR/Cars/Flowers datasets. Pretrained on the same ImageNet21k, it reaches 87.3% top-1 accuracy on ImageNet ILSVRC2012, beating a recent ViT by 2.0% while training 5-11x faster with the same compute. Code will be released at https://github.com/google/automl/efficientnetv2
     http://arxiv.org/abs/2104.00298v1 — Google Research, Brain Team
     → The announcement of EfficientNetV2: EfficientNet made lighter and faster so it can be trained practically.

  5. Improvements over EfficientNetV1
     • A trick that shrinks the training image size (train at smaller sizes first) → progressive training (see the sketch below)
     • Remove the model's bottlenecks → Fused-MBConv
     • Rework how the model is scaled (B0-B7) → changed scaling rules and a smaller maximum image size

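To make the progressive-training idea concrete, here is a minimal sketch of a loop that grows the image size together with the regularization strength across stages. The stage values, the toy model, and the random batches are illustrative assumptions for this summary, not the paper's actual schedule or code.

```python
import torch
import torch.nn as nn

# EfficientNetV2-style progressive learning: image size, dropout, and augmentation
# strength all grow together. Stage values below are illustrative only.
stages = [
    dict(image_size=128, dropout=0.1, randaug_magnitude=5),
    dict(image_size=192, dropout=0.2, randaug_magnitude=10),
    dict(image_size=256, dropout=0.3, randaug_magnitude=15),
]

model = nn.Sequential(                      # toy stand-in for EfficientNetV2
    nn.Conv2d(3, 8, 3, stride=2, padding=1),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Dropout(p=0.1), nn.Linear(8, 10),
)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for stage in stages:
    model[3].p = stage["dropout"]           # adapt regularization per stage
    for step in range(10):                  # toy loop over random batches
        x = torch.randn(4, 3, stage["image_size"], stage["image_size"])
        y = torch.randint(0, 10, (4,))
        # a real pipeline would apply RandAugment(magnitude=stage["randaug_magnitude"]) here
        loss = nn.functional.cross_entropy(model(x), y)
        opt.zero_grad(); loss.backward(); opt.step()
    print(f"finished stage at image_size={stage['image_size']}, loss={loss.item():.3f}")
```
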
  6. EfficientNetV2: impressions
     • Even though V2 is "lighter", you still need hardware that can train EfficientNet-B3/B4.
     • Will ResNet-RS and EfficientNetV2 become the baselines for ConvNets going forward?
     • Looking forward to the official source release.

  7. ② An Empirical Study of Training Self-Supervised Vision Transformers
     This paper does not describe a novel method. Instead, given recent progress in computer vision, it studies a straightforward, incremental, but must-know baseline: self-supervised learning for Vision Transformers (ViT). Training recipes for standard convolutional networks are mature and robust, but recipes for ViT have yet to be established, especially in self-supervised scenarios where training is harder. Going back to basics, the authors investigate the effects of several fundamental components of training self-supervised ViTs. They find that instability is a major issue that degrades accuracy and that it can be hidden behind apparently good results; such results are in fact partial failures and improve once training is made more stable. ViT results are benchmarked in MoCo v3 and several other self-supervised frameworks, with ablations across many aspects, and the paper discusses the currently positive evidence as well as challenges and open questions, aiming to provide useful data points and experience for future research.
     http://arxiv.org/abs/2104.02057v2 — Facebook AI Research (FAIR)
     → A hyperparameter study of ViT: the authors built the ViT-based MoCo v3 framework and measured how ViT's parameters affect performance.

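One stabilization the paper reports is keeping the patch-projection layer fixed at its random initialization. A minimal sketch of what that looks like, using a toy ViT-like encoder written for this summary rather than the authors' code:

```python
import torch
import torch.nn as nn

# Toy ViT-style encoder: a patch-projection conv followed by transformer blocks.
class TinyViT(nn.Module):
    def __init__(self, dim=64, patch=16):
        super().__init__()
        self.patch_proj = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, x):
        tokens = self.patch_proj(x).flatten(2).transpose(1, 2)  # (B, N, dim)
        return self.blocks(tokens).mean(dim=1)                  # pooled features

model = TinyViT()
# The trick reported in the paper: freeze the patch projection at its random init,
# so gradients never update it; this was found to stabilize self-supervised training.
for p in model.patch_proj.parameters():
    p.requires_grad = False

feats = model(torch.randn(2, 3, 64, 64))
print(feats.shape)  # torch.Size([2, 64])
```
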
  8. ③ Cross-validation: what does it estimate and how well does it do it?
     Cross-validation is widely used to estimate prediction error, but its behavior is complex and not fully understood. Ideally one would like to think that it estimates the prediction error of the model fitted to the training data at hand. The authors prove that for the linear model fit by ordinary least squares this is not the case: cross-validation instead estimates the average prediction error of models fit to other, unseen training sets drawn from the same population. They further show that this phenomenon occurs for most common estimates of prediction error, including data splitting, the bootstrap, and Mallows' Cp. Next, the standard confidence intervals for prediction error derived from cross-validation can have coverage far below the desired level: because each data point is used for both training and testing, the accuracies measured in each fold are correlated and the usual variance estimate is too small. To estimate this variance more accurately, the authors introduce a nested cross-validation scheme and show empirically that the correction yields intervals with approximately correct coverage in many examples where ordinary cross-validation fails. Finally, the analysis shows that when building confidence intervals for prediction accuracy with simple data splitting, the model should not be refit on the combined data, since doing so invalidates the intervals.
     http://arxiv.org/abs/2104.00673v2 — UC Berkeley & Stanford University
     → An improvement to cross-validation. Plain cross-validation's intervals cover less than their nominal level; introducing nested cross-validation (NCV) fixes this problem.

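A minimal sketch of the nested layout, where an inner cross-validation runs inside each outer training fold so the extra fold-to-fold variability can be measured. This is a generic nested cross-validation with scikit-learn, not the paper's exact interval construction:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

outer = KFold(n_splits=5, shuffle=True, random_state=0)
outer_errors = []
for train_idx, test_idx in outer.split(X):
    # Inner CV on the outer-training data: its spread reflects the fold-to-fold
    # variability that plain CV confidence intervals ignore.
    inner_scores = cross_val_score(
        LogisticRegression(max_iter=1000),
        X[train_idx], y[train_idx],
        cv=KFold(n_splits=5, shuffle=True, random_state=1),
    )
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    outer_err = 1.0 - model.score(X[test_idx], y[test_idx])
    outer_errors.append(outer_err)
    print(f"outer-fold error {outer_err:.3f}, inner CV error {1 - inner_scores.mean():.3f}")

print("mean outer error:", np.mean(outer_errors))
```
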
  9. ④ GANcraft: Unsupervised 3D Neural Rendering of Minecraft Worlds
     GANcraft is an unsupervised neural rendering framework for generating photorealistic images of large 3D block worlds such as those created in Minecraft. The method takes a semantic block world as input, where each block carries a semantic label such as dirt, grass, or water. The world is represented as a continuous volumetric function, and the model is trained to render view-consistent photorealistic images for a user-controlled camera. Since there are no paired ground-truth real images for the block world, the authors devise a training technique based on pseudo-ground truth and adversarial training. This contrasts with prior work on neural rendering for view synthesis, which requires ground-truth images to estimate scene geometry and view-dependent appearance. In addition to the camera trajectory, GANcraft lets the user control both the scene semantics and the output style. Experiments against strong baselines demonstrate the effectiveness of GANcraft on this new task of photorealistic 3D block-world synthesis. Project website: https://nvlabs.github.io/GANcraft/
     http://arxiv.org/abs/2104.07659v1 — NVIDIA
     → Released GANcraft, which generates landscape photographs from Minecraft's 3D block data.

  10. ⑤ StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery
     Inspired by StyleGAN's ability to generate highly realistic images in many domains, much recent work has focused on understanding how to use StyleGAN's latent space to manipulate generated and real images. Discovering semantically meaningful latent manipulations, however, requires painstaking human examination of the many degrees of freedom, or an annotated image collection for each desired manipulation. This work explores leveraging the recently introduced CLIP (Contrastive Language-Image Pre-training) model to build a text-based interface for StyleGAN image manipulation that needs no such manual effort. The authors first introduce an optimization scheme that uses a CLIP-based loss to modify an input latent vector in response to a user-provided text prompt. Next, they describe a latent mapper that infers a text-guided latent manipulation step for a given input image, enabling faster and more stable text-based manipulation. Finally, they present a method for mapping text prompts to input-agnostic directions in StyleGAN's style space, enabling interactive text-driven image manipulation. Extensive results and comparisons demonstrate the effectiveness of the approach.
     http://arxiv.org/abs/2103.17249v1 — Hebrew University, Tel Aviv University, Adobe Research
     → Combined CLIP and StyleGAN into StyleCLIP, which lets you manipulate images with text.

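The first of the three approaches, optimizing a latent vector against a CLIP-based loss, can be sketched as below. The `generator` and `clip_model` arguments are assumed to be provided pretrained models (the encode_image/encode_text calls follow the interface of the openai CLIP package), and the extra regularization terms used in the paper are omitted:

```python
import torch

def text_driven_edit(generator, clip_model, w_init, text_tokens, steps=200, lr=0.05):
    """Optimize a StyleGAN latent so the rendered image matches a text prompt.

    `generator(w)` is assumed to return an image batch in the format expected by
    `clip_model.encode_image`; both models are assumed pretrained and frozen.
    """
    w = w_init.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    with torch.no_grad():
        text_feat = clip_model.encode_text(text_tokens)
        text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

    for _ in range(steps):
        img = generator(w)
        img_feat = clip_model.encode_image(img)
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
        # CLIP loss: 1 - cosine similarity between image and prompt embeddings.
        loss = 1.0 - (img_feat * text_feat).sum(dim=-1).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    return w.detach()
```
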
  11. ⑥ LocalViT: Bringing Locality to Vision Transformers
     This work studies how to introduce locality mechanisms into vision transformers. The transformer originates in machine translation and is particularly good at modeling long-range dependencies within long sequences. Global interaction between token embeddings is modeled well by the transformer's self-attention, but a locality mechanism for exchanging information within a local region is missing. Locality is essential for images, since it relates to structures such as lines, edges, shapes, and even objects. The authors add locality to vision transformers by introducing a depth-wise convolution into the feed-forward network. This seemingly simple solution is inspired by comparing feed-forward networks with inverted residual blocks. The importance of the locality mechanism is validated in two ways: 1) a wide range of design choices (activation function, layer placement, expansion ratio) are available for incorporating it, and all proper choices improve on the baseline; 2) the same mechanism is successfully applied to four vision transformers, showing that the locality concept generalizes. In particular, on ImageNet2012 classification, the locality-enhanced DeiT-T and PVT-T outperform their baselines by 2.6% and 3.1% with a negligible increase in parameters and compute. Code: https://github.com/ofsoundof/LocalViT
     http://arxiv.org/abs/2104.05707v1 — ETH Zurich, KU Leuven
     → An improvement to ViT: adding a mechanism that lets ViT exploit local spatial information within the image raised performance.

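The core change, a depth-wise convolution inside the transformer's feed-forward network so that tokens exchange information with their spatial neighbours, can be sketched as follows (a simplified module written for this summary, not the authors' code):

```python
import torch
import torch.nn as nn

class LocalityFeedForward(nn.Module):
    """Feed-forward block with a 3x3 depth-wise convolution between the two
    pointwise layers, so each token mixes with its spatial neighbours."""

    def __init__(self, dim=192, expansion=4):
        super().__init__()
        hidden = dim * expansion
        self.fc1 = nn.Linear(dim, hidden)
        self.dwconv = nn.Conv2d(hidden, hidden, kernel_size=3, padding=1, groups=hidden)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(hidden, dim)

    def forward(self, tokens, h, w):
        # tokens: (B, N, dim) with N == h * w (class token excluded for simplicity)
        x = self.act(self.fc1(tokens))                        # (B, N, hidden)
        x = x.transpose(1, 2).reshape(x.size(0), -1, h, w)    # back to a 2D grid
        x = self.act(self.dwconv(x))                          # local mixing
        x = x.flatten(2).transpose(1, 2)                      # (B, N, hidden)
        return self.fc2(x)

# usage: a 14x14 patch grid of 192-dim tokens
ffn = LocalityFeedForward()
out = ffn(torch.randn(2, 14 * 14, 192), h=14, w=14)
print(out.shape)  # torch.Size([2, 196, 192])
```
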
  12. ⑦ Keyword Transformer: A Self-Attention Model for Keyword Spotting
     The Transformer architecture has been successful in many domains, including natural language processing, computer vision, and speech recognition. In keyword spotting, self-attention has mainly been used on top of convolutional or recurrent encoders. This work investigates a range of ways to adapt the Transformer to keyword spotting and introduces the Keyword Transformer (KWT), a fully self-attentional architecture that exceeds state-of-the-art performance on multiple tasks without any pre-training or additional data. Surprisingly, this simple architecture outperforms more complex models that mix convolutional, recurrent, and attention layers. KWT can be used as a drop-in replacement for those models, setting two new benchmark records on the Google Speech Commands dataset with 98.6% accuracy on the 12-command task and 97.7% on the 35-command task.
     http://arxiv.org/abs/2104.00769v2 — Arm ML Research Lab, Lund University
     → Speech recognition with a Transformer: the keyword-spotting task familiar from smart speakers is solved with high accuracy using a Transformer.

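KWT feeds a mel-spectrogram of the utterance to a standard transformer encoder, with each time frame acting as a token. A rough sketch of that front end; torchaudio is assumed to be available, and the layer sizes are illustrative:

```python
import torch
import torch.nn as nn
import torchaudio

# 1 second of 16 kHz audio -> 40-bin mel-spectrogram -> one token per time frame.
waveform = torch.randn(1, 16000)
mel = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=40)(waveform)
tokens = mel.squeeze(0).transpose(0, 1).unsqueeze(0)   # (1, time_frames, 40)

proj = nn.Linear(40, 128)                              # embed each frame
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=128, nhead=4, batch_first=True), num_layers=2
)
logits = nn.Linear(128, 12)(encoder(proj(tokens)).mean(dim=1))  # 12 keyword classes
print(logits.shape)  # torch.Size([1, 12])
```
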
  13. ⑧ Multiscale Vision Transformers
     The authors present Multiscale Vision Transformers (MViT) for video and image recognition, connecting the classic idea of multiscale feature hierarchies with transformer models. Multiscale transformers have several channel-resolution scale stages: starting from the input resolution and a small channel dimension, the stages hierarchically expand channel capacity while reducing spatial resolution. This creates a multiscale pyramid of features, with early layers operating at high spatial resolution to model simple low-level visual information and deeper layers holding spatially coarse but complex, high-dimensional features. Evaluating this architectural prior for the dense nature of visual signals on a variety of video recognition tasks, MViT outperforms concurrent vision transformers that rely on large-scale external pre-training and cost 5-10x more in compute and parameters. Removing the temporal dimension and applying the model to image classification, it also outperforms prior vision transformers. Code: https://github.com/facebookresearch/SlowFast
     http://arxiv.org/abs/2104.11227v1 — Facebook AI Research, UC Berkeley
     → An improvement to ViT: building a multiscale feature hierarchy over the transformer's spatial and channel dimensions outperforms single-scale transformers. The functionality was added to the existing PySlowFast library.

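The building block behind the multiscale hierarchy is pooling attention: queries, keys, and values are spatially pooled so that later stages operate at lower resolution. A simplified single-head version written for this summary (the real MViT uses multi-head attention, separate pooling operators, and channel expansion):

```python
import torch
import torch.nn as nn

class PoolingAttention(nn.Module):
    """Single-head attention where keys/values (and optionally queries) are pooled,
    reducing the spatial resolution of the attention computation."""

    def __init__(self, dim=64, q_stride=2, kv_stride=2):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.pool_q = nn.MaxPool2d(q_stride)
        self.pool_kv = nn.MaxPool2d(kv_stride)
        self.scale = dim ** -0.5

    def _pool(self, x, pool, h, w):
        # (B, h*w, dim) -> pool on the 2D grid -> (B, h'*w', dim)
        b, _, d = x.shape
        x = x.transpose(1, 2).reshape(b, d, h, w)
        x = pool(x)
        return x.flatten(2).transpose(1, 2)

    def forward(self, x, h, w):
        q = self._pool(self.q(x), self.pool_q, h, w)   # pooled queries -> lower-res output
        k = self._pool(self.k(x), self.pool_kv, h, w)
        v = self._pool(self.v(x), self.pool_kv, h, w)
        attn = torch.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)
        return attn @ v                                 # (B, (h/q_stride)*(w/q_stride), dim)

out = PoolingAttention()(torch.randn(2, 16 * 16, 64), h=16, w=16)
print(out.shape)  # torch.Size([2, 64, 64])
```
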
  14. ⑨ SiT: Self-supervised vIsion Transformer
     Self-supervised learning is attracting growing attention in computer vision after recently closing much of the gap with supervised learning; in natural language processing, self-supervised learning and transformers are already the methods of choice, and the recent literature suggests transformers are becoming increasingly popular in vision as well. So far, vision transformers have been shown to work well when pretrained either on large-scale supervised data or with some form of co-supervision such as a teacher network, and these supervised pretrained models achieve very good downstream results with minimal changes. This work investigates the merits of self-supervised learning for pretraining image/vision transformers that are then used for downstream classification. The authors propose the Self-supervised vIsion Transformer (SiT) and discuss several self-supervised training mechanisms for obtaining a pretext model. SiT's architectural flexibility allows it to be used as an autoencoder and to handle multiple self-supervised tasks seamlessly. They show that a pretrained SiT can be fine-tuned for downstream classification on small datasets of a few thousand images rather than several million. Evaluated on standard datasets with common protocols, the approach demonstrates the strength of transformers and their suitability for self-supervised learning, outperforming existing self-supervised methods by a large margin; SiT is also well suited to few-shot learning, and training only a linear classifier on top of its features shows that it learns useful representations. Pretraining, fine-tuning, and evaluation code: https://github.com/Sara-Ahmed/SiT
     http://arxiv.org/abs/2104.03602v1 — IEEE
     → The announcement of SiT, a self-supervised learning Transformer, by IEEE.

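SiT's flexibility comes from attaching several self-supervised heads to one encoder and summing their losses. The sketch below combines a reconstruction head and a rotation-prediction head on a toy encoder; the heads, the corruption, and the loss weighting are illustrative assumptions, and the contrastive objective is omitted:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.GELU())  # toy stand-in for the ViT
recon_head = nn.Linear(256, 3 * 32 * 32)    # reconstruct the (corrupted) input image
rot_head = nn.Linear(256, 4)                # predict rotation: 0/90/180/270 degrees

images = torch.randn(8, 3, 32, 32)
rot_labels = torch.randint(0, 4, (8,))
rotated = torch.stack([torch.rot90(img, int(k), dims=(1, 2)) for img, k in zip(images, rot_labels)])
corrupted = rotated + 0.3 * torch.randn_like(rotated)         # simple noise corruption

feats = encoder(corrupted)
loss = (
    F.l1_loss(recon_head(feats), rotated.flatten(1))          # reconstruction pretext task
    + F.cross_entropy(rot_head(feats), rot_labels)            # rotation-prediction pretext task
)
loss.backward()
print(float(loss))
```
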
  15. ⑩ Self-supervised Video Object Segmentation by Motion Grouping
     Animals have evolved highly functional visual systems for understanding motion, which assists perception even in complex environments. This paper works toward a computer vision system that can segment objects by exploiting motion cues, i.e. motion segmentation. The contributions are: first, a simple variant of the Transformer is introduced that segments optical-flow frames into primary objects and background. Second, the architecture is trained in a self-supervised manner, without any manual annotations. Third, the critical components of the method are analyzed, with thorough ablation studies validating their necessity. Fourth, the proposed architecture is evaluated on public benchmarks (DAVIS2016, SegTrackv2, FBMS59): despite using only optical flow as input, the approach achieves results superior or comparable to previous state-of-the-art self-supervised methods while being an order of magnitude faster. On the challenging camouflage dataset MoCA, it significantly outperforms other self-supervised approaches and compares favorably with the top supervised approach, highlighting the importance of motion cues and a potential bias toward visual appearance in existing video segmentation models.
     http://arxiv.org/abs/2104.07658v1 — University of Oxford
     → Self-supervised learning on video: using only the motion of objects in the video as a cue, self-supervised video object segmentation detects camouflaged animals quickly and accurately.

  16. Top hype ① Minimum-Distortion Embedding
     The paper considers the vector embedding problem: given a finite set of items, assign a representative vector to each one. Data indicate that some pairs of items are similar and, optionally, that some other pairs are dissimilar. For similar pairs the corresponding vectors should be near each other, and for dissimilar pairs they should not be, measured in Euclidean distance. The authors formalize this by introducing distortion functions defined on some pairs of items; the goal is to choose an embedding that minimizes the total distortion subject to constraints, which they call the minimum-distortion embedding (MDE) problem. The MDE framework is simple but general: it includes spectral embedding, principal component analysis, multidimensional scaling, dimensionality-reduction methods such as Isomap and UMAP, force-directed layout, and more, as well as new embeddings, and it provides a principled way to validate historical and new embeddings alike. The authors develop a projected quasi-Newton method that approximately solves MDE problems and scales to large datasets, implemented in the open-source Python package PyMDE. In PyMDE, users can choose distortion functions and constraints from a library or specify custom ones, making it easy to experiment with different embeddings; the software scales to datasets with millions of items and tens of millions of distortion functions. The method is demonstrated by computing embeddings for several real-world datasets, including images, an academic co-authorship network, US county demographic data, and single-cell mRNA transcriptomes.
     http://arxiv.org/abs/2103.02559v2 — Stanford University
     → Foundational work related to contrastive learning and similar methods: a generalization and formalization of algorithms that represent pairwise similarity between items as vectors, plus an introduction to the PyMDE framework.

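PyMDE's quickstart interface looks roughly like this; the example uses randomly generated data, and `pymde.preserve_neighbors` is the package's documented entry point for neighborhood-preserving embeddings:

```python
import torch
import pymde

# 500 items in 50 dimensions; embed them into 2D by preserving local neighborhoods.
data = torch.randn(500, 50)
mde = pymde.preserve_neighbors(data, embedding_dim=2, verbose=True)
embedding = mde.embed()          # solves the minimum-distortion embedding problem
print(embedding.shape)           # torch.Size([500, 2])
```
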
  17. ③ RepVGG: Making VGG-style ConvNets Great Again
     The paper presents a simple but powerful convolutional network architecture with a VGG-like inference-time body composed of nothing but a stack of 3x3 convolutions and ReLU, while the training-time model has a multi-branch topology. This decoupling of the training-time and inference-time architectures is realized by a structural re-parameterization technique, hence the name RepVGG. On ImageNet, RepVGG reaches over 80% top-1 accuracy, which, to the authors' knowledge, is a first for a plain model. On an NVIDIA 1080Ti GPU, RepVGG models run 83% faster than ResNet-50 and 101% faster than ResNet-101 with higher accuracy, and show a favorable accuracy-speed trade-off compared with state-of-the-art models such as EfficientNet and RegNet. Code and trained models: https://github.com/megvii-model/RepVGG
     http://arxiv.org/abs/2101.03697v3 — Tsinghua University, Megvii, Hong Kong University of Science and Technology
     → An improvement to VGG: today's models have excessive FLOP cost and are more complex than necessary, so the authors improved VGG to match state-of-the-art performance.

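The structural re-parameterization can be illustrated by fusing a 3x3 branch, a 1x1 branch, and an identity branch into a single 3x3 convolution. This is a simplified sketch that skips the batch-norm fusion RepVGG also performs:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

channels = 8
conv3 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=True)
conv1 = nn.Conv2d(channels, channels, kernel_size=1, bias=True)

with torch.no_grad():
    # Merge the training-time branches (3x3 + 1x1 + identity) into one 3x3 kernel.
    fused_w = conv3.weight.clone()
    fused_w += F.pad(conv1.weight, [1, 1, 1, 1])   # place the 1x1 kernel at the 3x3 center
    for c in range(channels):
        fused_w[c, c, 1, 1] += 1.0                 # identity branch as a 3x3 kernel
    fused_b = conv3.bias + conv1.bias              # the identity branch adds no bias

x = torch.randn(2, channels, 16, 16)
multi_branch = conv3(x) + conv1(x) + x                         # training-time topology
single_branch = F.conv2d(x, fused_w, fused_b, padding=1)       # inference-time plain conv
print(torch.allclose(multi_branch, single_branch, atol=1e-5))  # True
```
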
  18. ④ Representation Learning for Networks in Biology and Medicine: Advancements, Challenges, and Opportunities
     With the success of representation learning in providing powerful predictions and data insights, representation learning techniques are rapidly expanding into the modeling, analysis, and learning of networks. Biomedical networks are universal descriptors of systems of interacting elements, from protein interactions and disease networks all the way to healthcare systems and scientific knowledge. This review argues that long-standing principles of network biology and medicine, while often left unspoken in machine learning research, can provide the conceptual grounding for representation learning, explain its current successes and limitations, and inform future advances. The paper synthesizes a spectrum of algorithmic approaches that, at their core, exploit topological features to embed networks into compact vector spaces, and provides a taxonomy of the biomedical areas most likely to benefit from algorithmic innovation. Representation learning techniques are becoming essential for identifying causal variants underlying complex traits, disentangling the behavior of single cells and their impact on health, and diagnosing and treating diseases with safe and effective medicines.
     http://arxiv.org/abs/2104.04883v1 — Harvard Medical School
     → A review of graph representation learning in biology and medicine.

  19. ⑥ Factors of Influence for Transfer Learning across Diverse Appearance Domains and Task Types
     Transfer learning reuses knowledge learned on a source task to help learn a target task. A simple form of it is standard practice in today's state-of-the-art computer vision models: pretrain an image-classification model on the ILSVRC dataset, then fine-tune on an arbitrary target task. However, systematic studies of transfer learning have been limited, and the circumstances under which it can be expected to work are not fully understood. This paper carries out an extensive experimental study of transfer learning across vastly different image domains (consumer photos, autonomous driving, aerial imagery, underwater, indoor scenes, synthetic, close-ups) and task types (semantic segmentation, object detection, depth estimation, keypoint detection). Importantly, all of these are complex, structured-output tasks relevant to modern computer vision applications. In total, more than 1200 transfer experiments were run, including many where source and target differ in image domain, task type, or both. The experiments are analyzed systematically to understand the impact of image domain, task type, and dataset size on transfer performance, yielding several insights and concrete recommendations for practitioners.
     http://arxiv.org/abs/2103.13318v1 — Google Research
     → A survey of transfer learning across image domains.

  20. ⑦ Why Do Local Methods Solve Nonconvex Problems?
     Non-convex optimization is ubiquitous in modern machine learning. Researchers devise non-convex objective functions and optimize them with off-the-shelf optimizers such as stochastic gradient descent and its variants, which exploit the local geometry and update iteratively. Even though solving non-convex functions is NP-hard in the worst case, in practice optimization quality is rarely an issue: optimizers are widely believed to find approximate global minima. Researchers have hypothesized a unified explanation for this intriguing phenomenon: most local minima of the objectives used in practice are approximately global minima. This work rigorously formalizes that hypothesis for concrete instances of machine learning problems.
     http://arxiv.org/abs/2103.13462v1 — Stanford University
     → A study of why training can be optimized successfully with off-the-shelf optimizers.

  21. ⑧ Scaling Scaling Laws with Board Games
     The largest machine learning experiments now require resources far beyond the budget of all but a few institutions. Fortunately, it has recently been shown that the results of these huge experiments can often be extrapolated from the results of a sequence of far smaller and cheaper experiments. This work shows that the extrapolation can be done not only over model size but over problem size as well. By running a sequence of experiments with AlphaZero and Hex, the author shows that the performance achievable with a fixed amount of compute degrades predictably as the game becomes larger and harder. Alongside this main result, the paper shows that the test-time and train-time compute available to an agent can be traded off against each other while maintaining performance.
     http://arxiv.org/abs/2104.03113v2 — Andy Jones (London)
     → Useful for estimating costs when applying machine learning in the real world. Using AlphaZero and Hex as a case study of board-game AI algorithms, the paper summarizes how training and compute costs change with the target performance and the size of the problem.

  22. ⑨ Vision Transformers for Dense Prediction
     The paper introduces the dense vision transformer, an architecture that uses a vision transformer instead of a convolutional network as the backbone for dense prediction tasks. Tokens from various stages of the vision transformer are assembled into image-like representations at various resolutions and progressively combined into full-resolution predictions by a convolutional decoder. The transformer backbone processes representations at a constant, relatively high resolution and has a global receptive field at every stage. These properties allow the dense vision transformer to produce finer-grained and more globally coherent predictions than fully-convolutional networks. Experiments show substantial improvements on dense prediction tasks, especially when a large amount of training data is available: for monocular depth estimation, relative performance improves by up to 28% over a state-of-the-art fully-convolutional network, and for semantic segmentation the model sets a new state of the art on ADE20K with 49.02% mIoU. The architecture can also be fine-tuned on smaller datasets such as NYUv2, KITTI, and Pascal Context, where it again sets the new state of the art. Models: https://github.com/intel-isl/DPT
     http://arxiv.org/abs/2103.13413v1 — Intel Labs
     → Semantic segmentation with ViT; as examples, it achieved good results on monocular depth estimation and segmentation from single photos.

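The decoder's key step is reassembling the transformer's token sequence into an image-like feature map before convolutional fusion. A minimal sketch of that reassembly, simplified relative to the paper's multi-stage version:

```python
import torch
import torch.nn as nn

def reassemble(tokens, grid_h, grid_w, out_channels):
    """Turn a ViT token sequence (class token first) into a 2D feature map."""
    b, n, dim = tokens.shape
    patch_tokens = tokens[:, 1:, :]                            # drop the class token
    fmap = patch_tokens.transpose(1, 2).reshape(b, dim, grid_h, grid_w)
    return nn.Conv2d(dim, out_channels, kernel_size=1)(fmap)   # project to decoder width

# 224x224 input with 16x16 patches -> 14x14 grid of 768-dim tokens (+1 class token)
tokens = torch.randn(2, 1 + 14 * 14, 768)
features = reassemble(tokens, grid_h=14, grid_w=14, out_channels=256)
print(features.shape)  # torch.Size([2, 256, 14, 14])
```
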