Slide 1

Slide 1 text

AI Latest Papers Reading Group, May 2021. tech vein Inc., Mitsuhiro Inomata

Slide 2

Slide 2 text

Self-introduction
Mitsuhiro Inomata
tech vein Inc., Representative Director and developer
twitter: @ino2222
https://www.techvein.com

Slide 3

Slide 3 text

About the Facebook group
https://www.facebook.com/groups/…

Slide 4

Slide 4 text

Agenda
An introduction to arxiv.org papers from the past month, picked from Arxiv Sanity (arxiv-sanity.com).
• The paper that caught my attention the most
• Top 10 list of "top recent" papers
• Top 10 list of "top hype" papers

Slide 5

Slide 5 text

Arxiv Sanity? https://www.arxiv-sanity.com/top

Slide 6

Slide 6 text

Table of contents

Slide 7

Slide 7 text

Top10 Recent
1. EfficientNetV2: Smaller Models and Faster Training ← PickUp!
2. An Empirical Study of Training Self-Supervised Vision Transformers
3. Cross-validation: what does it estimate and how well does it do it?
4. GANcraft: Unsupervised 3D Neural Rendering of Minecraft Worlds
5. StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery
6. LocalViT: Bringing Locality to Vision Transformers
7. Keyword Transformer: A Self-Attention Model for Keyword Spotting
8. Multiscale Vision Transformers
9. SiT: Self-supervised vIsion Transformer
10. Self-supervised Video Object Segmentation by Motion Grouping

Slide 8

Slide 8 text

Top10 Hype
1. Minimum-Distortion Embedding
2. StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery
3. RepVGG: Making VGG-style ConvNets Great Again
4. Representation Learning for Networks in Biology and Medicine: Advancements, Challenges, and Opportunities
5. Cross-validation: what does it estimate and how well does it do it?
6. Factors of Influence for Transfer Learning across Diverse Appearance Domains and Task Types
7. Why Do Local Methods Solve Nonconvex Problems?
8. Scaling Scaling Laws with Board Games
9. Vision Transformers for Dense Prediction
10. EfficientNetV2: Smaller Models and Faster Training

Slide 9

Slide 9 text

Pickup paper

Slide 10

Slide 10 text

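Top recent ① EfficientNetV2: Smaller Models and Faster Training

This paper introduces EfficientNetV2, a new family of convolutional networks that have faster training speed and better parameter efficiency than previous models. To develop this family of models, we use a combination of training-aware neural architecture search and scaling to jointly optimize training speed and parameter efficiency. The models were searched from a search space enriched with new ops such as Fused-MBConv. Our experiments show that EfficientNetV2 models train much faster than state-of-the-art models while being up to 6.8x smaller. Training can be further sped up by progressively increasing the image size during training, but this often causes a drop in accuracy. To compensate for this accuracy drop, we propose to adaptively adjust regularization (e.g., dropout and data augmentation) as well, so that we achieve both fast training and good accuracy. With progressive learning, EfficientNetV2 significantly outperforms previous models on the ImageNet and CIFAR/Cars/Flowers datasets. By pretraining on the same ImageNet21k, our EfficientNetV2 achieves 87.3% top-1 accuracy on ImageNet ILSVRC2012, outperforming the recent ViT by 2.0% accuracy while training 5x to 11x faster using the same computing resources. Code will be available at https://github.com/google/automl/efficientnetv2 .

http://arxiv.org/abs/2104.00298v1
Google Research, Brain Team.
→ Announcement of EfficientNetV2. EfficientNet made lighter and faster so that it can be trained in practice.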

Slide 11

Slide 11 text

Faster training and a smaller model size

Slide 12

Slide 12 text

Review: ResNet-RS (2021.3) by Google Brain
A ResNet with improved training and scaling methods.
https://arxiv.org/abs/…

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

Improvements over EfficientNetV1
• Reduce the training image size
  → Progressive Training
• Remove the bottleneck in the model
  → Fused-MBConv
• Change how the model is scaled (B0~B7)
  → New scaling rule, and a smaller maximum image size

Slide 15

Slide 15 text

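Progressive Training
Train on small images first and increase the size step by step for efficiency.
+ Adjust the augmentation level along the way to prevent the accuracy drop.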
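The schedule on this slide can be sketched as a simple linear interpolation: each training stage gets a larger image size and stronger regularization than the one before. This is only a minimal sketch. The function name, the number of stages, and the size and dropout ranges are illustrative assumptions, not values taken from the slides.

```python
def progressive_schedule(num_stages, size_range, dropout_range):
    """Linearly interpolate image size and dropout rate across training stages.

    All concrete values passed in are illustrative assumptions.
    """
    size_start, size_end = size_range
    drop_start, drop_end = dropout_range
    schedule = []
    for stage in range(num_stages):
        # Fraction of the way through training: 0.0 at the first stage, 1.0 at the last.
        t = stage / (num_stages - 1) if num_stages > 1 else 1.0
        image_size = int(round(size_start + (size_end - size_start) * t))
        dropout = round(drop_start + (drop_end - drop_start) * t, 3)
        schedule.append((image_size, dropout))
    return schedule


if __name__ == "__main__":
    # Hypothetical ranges: images grow from 128 to 300 px, dropout from 0.1 to 0.3.
    for stage, (size, dropout) in enumerate(
        progressive_schedule(4, (128, 300), (0.1, 0.3))
    ):
        print(f"stage {stage}: image size {size}, dropout {dropout}")
```

Small images with weak regularization make the early stages cheap, and the stronger regularization at larger sizes is what is meant to offset the accuracy drop.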

Slide 16

Slide 16 text

MBConv → Fused-MBConv
The depthwise conv in the early part of the model was a bottleneck, so part of the model (stage1-5) was replaced with a regular conv.
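To make the trade-off concrete, here is a rough weight count for the two block types. It is a sketch under simplifying assumptions: biases, batch norm, and squeeze-and-excitation are ignored, and the channel count and expansion ratio are hypothetical.

```python
def mbconv_params(c_in, c_out, expand=4, k=3):
    """Weights in an MBConv block: 1x1 expand -> kxk depthwise -> 1x1 project."""
    hidden = c_in * expand
    expand_1x1 = c_in * hidden      # pointwise expansion
    depthwise = k * k * hidden      # one kxk filter per hidden channel
    project_1x1 = hidden * c_out    # pointwise projection
    return expand_1x1 + depthwise + project_1x1


def fused_mbconv_params(c_in, c_out, expand=4, k=3):
    """Weights in a Fused-MBConv block: one regular kxk conv -> 1x1 project."""
    hidden = c_in * expand
    fused_conv = k * k * c_in * hidden  # replaces the 1x1 expansion and the depthwise conv
    project_1x1 = hidden * c_out
    return fused_conv + project_1x1


if __name__ == "__main__":
    # Hypothetical early-stage block with 24 input and output channels.
    print("MBConv      :", mbconv_params(24, 24))
    print("Fused-MBConv:", fused_mbconv_params(24, 24))
```

The fused block has more weights, so the gain is not in parameter count. The EfficientNetV2 paper's argument is that regular convolutions use accelerators better than depthwise ones in the early, high-resolution stages, which is why only part of the network is replaced.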

Slide 17

Slide 17 text

Changes to how the model is scaled
• Limit the inference image size to a maximum of 480
• Scale gradually so that the later stages of the model get more layers

Slide 18

Slide 18 text

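EfficientNet V2: impressions
• Even though V2 is lightweight, you still need hardware that can train EfficientNet-B3~B4.
• Will ResNet-RS and EfficientNet V2 become the ConvNet baselines from now on?
• Looking forward to the official source release.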

Slide 19

Slide 19 text

Top recent: Best10

Slide 20

Slide 20 text

① EfficientNetV2: Smaller Models and Faster Training
pickup

Slide 21

Slide 21 text

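② An Empirical Study of Training Self-Supervised Vision Transformers

This paper does not describe a novel method. Instead, it studies a straightforward, incremental, yet must-know baseline given the recent progress in computer vision: self-supervised learning for Vision Transformers (ViT). While the training recipes for standard convolutional networks are highly mature and robust, the recipes for ViT are yet to be built, especially in self-supervised scenarios where training becomes more challenging. In this work, we go back to basics and investigate the effects of several fundamental components for training self-supervised ViT. We observe that instability is a major issue that degrades accuracy, and that it can be hidden by apparently good results. We reveal that these results are in fact partial failures, and that they can be improved when training is made more stable. We benchmark ViT results in MoCo v3 and several other self-supervised frameworks, with ablations in various aspects. We discuss the currently positive evidence as well as challenges and open questions. We hope that this work will provide useful data points and experience for future research.

http://arxiv.org/abs/2104.02057v2
Facebook AI Research (FAIR)
→ A hyperparameter study of ViT. They built a ViT-based MoCo v3 framework and compared and measured the ViT parameters that affect performance.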

Slide 22

Slide 22 text

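Measured with a (dense) kNN monitor while varying batch size, learning rate, and optimizer.
If a setting is too aggressive, dips (sudden drops) appear and performance falls.
If a setting is too conservative, no dips appear but the model is undertrained.
It is important to tune the balance while watching the kNN curve.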

Slide 23

Slide 23 text

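③ Cross-validation: what does it estimate and how well does it do it?

Cross-validation is a widely used technique to estimate prediction error, but its behavior is complex and not fully understood. Ideally, one would like to think that cross-validation estimates the prediction error for the model at hand, fit to the training data. We prove that this is not the case for the linear model fit by ordinary least squares; rather, it estimates the average prediction error of models fit on other unseen training sets drawn from the same population. We further show that this phenomenon occurs for most popular estimates of prediction error, including data splitting, bootstrapping, and Mallow's Cp. Next, the standard confidence intervals for prediction error derived from cross-validation may have coverage far below the desired level. Because each data point is used for both training and testing, there are correlations among the measured accuracies for each fold, and so the usual estimate of variance is too small. We introduce a nested cross-validation scheme to estimate this variance more accurately, and show empirically that this modification leads to intervals with approximately correct coverage in many examples where traditional cross-validation intervals fail. Lastly, our analysis also shows that when producing confidence intervals for prediction accuracy with simple data splitting, one should not re-fit the model on the combined data, since this invalidates the confidence intervals.

http://arxiv.org/abs/2104.00673v2
University of California, Berkeley & Stanford University
→ An improvement to cross-validation. With plain cross-validation, the confidence intervals for prediction error cover the true value less often than their nominal level. Introducing nested cross-validation (NCV) mitigates this problem.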
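For reference, the "standard" interval that the abstract criticizes can be sketched as follows: pool the held-out errors from all folds, then take mean ± z × standard error as if the errors were independent. This is a minimal sketch of that naive interval only, not the paper's nested scheme. The mean predictor is a stand-in model chosen to keep the example dependency-free, and the function name is an assumption.

```python
import math
import random


def kfold_naive_interval(y, k=5, z=1.96, seed=0):
    """K-fold CV estimate of squared error with the naive normal interval.

    The "model" is just the training-set mean (a stand-in assumption).
    """
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]

    errors = []
    for fold in folds:
        held_out = set(fold)
        train = [y[i] for i in idx if i not in held_out]
        prediction = sum(train) / len(train)  # fit the stand-in model
        errors.extend((y[i] - prediction) ** 2 for i in fold)

    n = len(errors)
    mean_err = sum(errors) / n
    # Treats every held-out error as independent, which is the assumption
    # the paper argues makes this variance estimate too small.
    variance = sum((e - mean_err) ** 2 for e in errors) / (n - 1)
    half_width = z * math.sqrt(variance / n)
    return mean_err, (mean_err - half_width, mean_err + half_width)


if __name__ == "__main__":
    rng = random.Random(42)
    data = [rng.gauss(0.0, 1.0) for _ in range(100)]
    estimate, interval = kfold_naive_interval(data)
    print("CV error estimate:", estimate)
    print("naive 95% interval:", interval)
```

Because every point is used for both training and testing across folds, the per-point errors are correlated, and according to the paper this is the reason the interval above tends to be too narrow.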

Slide 24

Slide 24 text

No content

Slide 25

Slide 25 text

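④ GANcraft: Unsupervised 3D Neural Rendering of Minecraft Worlds

GANcraft is an unsupervised neural rendering framework for generating photorealistic images of large 3D block worlds such as those created in Minecraft. The method takes a semantic block world as input, where each block is assigned a semantic label such as dirt, grass, or water. It represents the world as a continuous volumetric function and trains the model to render view-consistent photorealistic images for a user-controlled camera. In the absence of paired ground-truth real images for the block world, the authors devise a training technique based on pseudo-ground truth and adversarial training. This stands in contrast to prior work on neural rendering for view synthesis, which requires ground-truth images to estimate scene geometry and view-dependent appearance. In addition to the camera trajectory, GANcraft allows user control over both scene semantics and output style. Experimental results with comparison to strong baselines show the effectiveness of GANcraft on this novel task of photorealistic 3D block world synthesis. The project website is available at https://nvlabs.github.io/GANcraft/ .

http://arxiv.org/abs/2104.07659v1
NVIDIA
→ Released GANcraft, which generates landscape photos from Minecraft 3D structure data.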

Slide 26

Slide 26 text

No content

Slide 27

Slide 27 text

No content

Slide 28

Slide 28 text

No content

Slide 29

Slide 29 text

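⑤ StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery

Inspired by the ability of StyleGAN to generate highly realistic images in a variety of domains, much recent work has focused on understanding how to use the latent spaces of StyleGAN to manipulate generated and real images. However, discovering semantically meaningful latent manipulations typically involves painstaking human examination of the many degrees of freedom, or an annotated collection of images for each desired manipulation. In this work, we explore leveraging the recently introduced Contrastive Language-Image Pre-training (CLIP) models to develop a text-based interface for StyleGAN image manipulation that does not require such manual effort. We first introduce an optimization scheme that uses a CLIP-based loss to modify an input latent vector in response to a user-provided text prompt. Next, we describe a latent mapper that infers a text-guided latent manipulation step for a given input image, allowing faster and more stable text-based manipulation. Finally, we present a method for mapping text prompts to input-agnostic directions in StyleGAN's style space, enabling interactive text-driven image manipulation. Extensive results and comparisons demonstrate the effectiveness of our approaches.

http://arxiv.org/abs/2103.17249v1
Hebrew University, Tel Aviv University, Adobe Research
→ Combined CLIP and StyleGAN to build StyleCLIP, which can manipulate images with text.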

Slide 30

Slide 30 text

No content

Slide 31

Slide 31 text

Reference: CLIP
Pre-trained on pairs of text and images, it classifies unseen images zero-shot. (The classes are known.)
https://openai.com/blog/clip

Slide 32

Slide 32 text

A distinctive feature is that, for any unseen image, it infers a sentence of the form "a photo of a ~".
https://openai.com/blog/clip
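The zero-shot step can be sketched as: build one prompt per known class, embed the prompts and the image, and pick the class whose prompt embedding is most similar to the image embedding. The embeddings below are toy 3-dimensional vectors made up for illustration, since no real CLIP encoder is called here. The function names and the temperature value of 100 are assumptions as well.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(image_emb, text_embs, temperature=100.0):
    """Softmax over scaled cosine similarities between image and prompt embeddings."""
    logits = {label: temperature * cosine(image_emb, emb)
              for label, emb in text_embs.items()}
    top = max(logits.values())
    exps = {label: math.exp(v - top) for label, v in logits.items()}
    total = sum(exps.values())
    return {label: v / total for label, v in exps.items()}


if __name__ == "__main__":
    labels = ("dog", "cat", "car")
    prompts = {label: f"a photo of a {label}" for label in labels}
    # Toy stand-ins for the text encoder output of each prompt (made up).
    text_embs = {"dog": [0.9, 0.1, 0.0],
                 "cat": [0.1, 0.9, 0.0],
                 "car": [0.0, 0.1, 0.9]}
    # Toy stand-in for the image encoder output (made up).
    image_emb = [0.8, 0.3, 0.1]

    probs = zero_shot_classify(image_emb, text_embs)
    best = max(probs, key=probs.get)
    print(prompts[best], probs[best])
```

The set of classes has to be known in advance because the prompts are generated from it. Only the images are unseen.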

Slide 33

Slide 33 text

https://openai.com/blog/clip

Slide 34

Slide 34 text

StyleCLIP

Slide 35

Slide 35 text

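⑥ LocalViT: Bringing Locality to Vision Transformers

We study how to introduce locality mechanisms into vision transformers. The transformer network originates from machine translation and is particularly good at modelling long-range dependencies within a long sequence. Although the global interaction between token embeddings is modelled well by the self-attention mechanism of transformers, what is lacking is a locality mechanism for information exchange within a local region. Yet locality is essential for images, since it pertains to structures like lines, edges, shapes, and even objects. We add locality to vision transformers by introducing depth-wise convolution into the feed-forward network. This seemingly simple solution is inspired by the comparison between feed-forward networks and inverted residual blocks. The importance of locality mechanisms is validated in two ways: 1) a wide range of design choices (activation function, layer placement, expansion ratio) are available for incorporating locality mechanisms, and all proper choices lead to a performance gain over the baseline, and 2) the same locality mechanism is successfully applied to 4 vision transformers, which shows the generality of the locality concept. In particular, for ImageNet2012 classification, the locality-enhanced transformers outperform the baselines DeiT-T and PVT-T by 2.6% and 3.1% with a negligible increase in the number of parameters and computational effort. Code is available at https://github.com/ofsoundof/LocalViT .

http://arxiv.org/abs/2104.05707v1
ETH Zurich, KU Leuven
→ An improvement to ViT. Adding a mechanism that gives ViT local spatial information within the image improved performance.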

Slide 36

Slide 36 text

No content

Slide 37

Slide 37 text

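⑦ Keyword Transformer: A Self-Attention Model for Keyword Spotting

The Transformer architecture has been successful across many domains, including natural language processing, computer vision, and speech recognition. In keyword spotting, self-attention has primarily been used on top of convolutional or recurrent encoders. We investigate a range of ways to adapt the Transformer architecture to keyword spotting and introduce the Keyword Transformer (KWT), a fully self-attentional architecture that exceeds state-of-the-art performance across multiple tasks without any pre-training or additional data. Surprisingly, this simple architecture outperforms more complex models that mix convolutional, recurrent, and attentive layers. KWT can be used as a drop-in replacement for these models, setting two new benchmark records on the Google Speech Commands dataset with 98.6% and 97.7% accuracy on the 12-command and 35-command tasks respectively.

http://arxiv.org/abs/2104.00769v2
Arm ML Research Lab, Lund University
→ Speech recognition with Transformers. The keyword recognition task from audio, common in smart speakers and similar devices, was solved with high accuracy using a Transformer.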

Slide 38

Slide 38 text

No content

Slide 39

Slide 39 text

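⑧ Multiscale Vision Transformers

We present Multiscale Vision Transformers (MViT) for video and image recognition, by connecting the seminal idea of multiscale feature hierarchies with transformer models. Multiscale Transformers have several channel-resolution scale stages. Starting from the input resolution and a small channel dimension, the stages hierarchically expand the channel capacity while reducing the spatial resolution. This creates a multiscale pyramid of features, with early layers operating at high spatial resolution to model simple low-level visual information, and deeper layers operating on spatially coarse but complex, high-dimensional features. We evaluate this fundamental architectural prior for modeling the dense nature of visual signals on a variety of video recognition tasks, where it outperforms concurrent vision transformers that rely on large-scale external pre-training and are 5-10x more costly in computation and parameters. We further remove the temporal dimension and apply our model to image classification, where it outperforms prior work on vision transformers. Code is available at https://github.com/facebookresearch/SlowFast .

http://arxiv.org/abs/2104.11227v1
Facebook AI Research, University of California, Berkeley
→ An improvement to ViT. Building a multiscale feature hierarchy over the transformer's spatial and channel dimensions improved performance over single-scale transformers. The functionality was added to the existing library PySlowFast.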
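The channel-resolution pyramid described in the abstract can be sketched as a simple stage table. This is a hedged illustration: the halve-resolution and double-channels rule, the starting values, and the function name are assumptions chosen for clarity rather than the exact configuration of any MViT variant.

```python
def multiscale_stages(resolution, channels, num_stages):
    """List (spatial resolution, channel width) per stage.

    Assumes each stage transition halves the resolution per side and
    doubles the channel width (an illustrative rule).
    """
    stages = []
    for _ in range(num_stages):
        stages.append((resolution, channels))
        resolution //= 2   # coarser spatial grid in deeper stages
        channels *= 2      # wider features in deeper stages
    return stages


if __name__ == "__main__":
    # Hypothetical starting point: 56x56 tokens with 96 channels.
    for i, (res, ch) in enumerate(multiscale_stages(56, 96, 4), start=1):
        print(f"stage {i}: {res}x{res} tokens ({res * res} total), {ch} channels")
```

Early stages pay for many tokens with narrow features, and late stages pay for wide features with few tokens. A single-scale ViT, by contrast, keeps both fixed throughout.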

Slide 40

Slide 40 text

No content

Slide 41

Slide 41 text

PySlowFast: video classification models
https://github.com/facebookresearch/SlowFast

Slide 42

Slide 42 text

Reference: SlowFast (2018~2019)
https://arxiv.org/abs/…

Slide 43

Slide 43 text

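⑨ SiT: Self-supervised vIsion Transformer

Self-supervised learning methods are gaining increasing traction in computer vision due to their recent success in reducing the gap with supervised learning. In natural language processing (NLP), self-supervised learning and transformers are already the methods of choice. The recent literature suggests that transformers are becoming increasingly popular in computer vision as well. So far, vision transformers have been shown to work well when pretrained either on large-scale supervised data or with some kind of co-supervision, e.g. a teacher network. These supervised pretrained vision transformers achieve very good results in downstream tasks with minimal changes. In this work we investigate the merits of self-supervised learning for pretraining image/vision transformers and then using them for downstream classification tasks. We propose Self-supervised vIsion Transformers (SiT) and discuss several self-supervised training mechanisms to obtain a pretext model. The architectural flexibility of SiT allows us to use it as an autoencoder and to work with multiple self-supervised tasks seamlessly. We show that a pretrained SiT can be finetuned for a downstream classification task on small-scale datasets, consisting of a few thousand images rather than several million. The proposed approach is evaluated on standard datasets using common protocols. The results demonstrate the strength of transformers and their suitability for self-supervised learning. We outperformed existing self-supervised learning methods by a large margin. We also observed that SiT is good for few-shot learning, and showed that it learns useful representations by simply training a linear classifier on top of the features learned by SiT. Pretraining, finetuning, and evaluation code will be available at https://github.com/Sara-Ahmed/SiT .

http://arxiv.org/abs/2104.03602v1
IEEE
→ Announcement by IEEE of SiT, a self-supervised Transformer.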

Slide 44

Slide 44 text

No content

Slide 45

Slide 45 text

Adopts two kinds of image processing methods

Slide 46

Slide 46 text

No content

Slide 47

Slide 47 text

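⑩ Self-supervised Video Object Segmentation by Motion Grouping

Animals have evolved highly functional visual systems to understand motion, assisting perception even in complex environments. In this paper, we work towards developing a computer vision system able to segment objects by exploiting motion cues, i.e. motion segmentation. We make the following contributions. First, we introduce a simple variant of the Transformer to segment optical flow frames into primary objects and the background. Second, we train the architecture in a self-supervised manner, i.e. without using any manual annotations. Third, we analyze several critical components of our method and conduct thorough ablation studies to validate their necessity. Fourth, we evaluate the proposed architecture on public benchmarks (DAVIS2016, SegTrackv2, FBMS59). Despite using only optical flow as input, our approach achieves superior or comparable results to previous state-of-the-art self-supervised methods, while being an order of magnitude faster. We additionally evaluate on a challenging camouflage dataset (MoCA), significantly outperforming the other self-supervised approaches and comparing favourably to the top supervised approach, highlighting the importance of motion cues and the potential bias towards visual appearance in existing video segmentation models.

http://arxiv.org/abs/2104.07658v1
University of Oxford
→ A study of self-supervised learning on video. Using only the motion of objects in the video as a cue, self-supervised object segmentation from video data detected camouflaged animals quickly and accurately.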

Slide 48

Slide 48 text

No content

Slide 49

Slide 49 text

Top hype: Best10

Slide 50

Slide 50 text

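① Minimum-Distortion Embedding

We consider the vector embedding problem. We are given a finite set of items, with the goal of assigning a representative vector to each one. We are given data indicating that some pairs of items are similar, and optionally, that some other pairs are dissimilar. For pairs of similar items, we want the corresponding vectors to be near each other, and for dissimilar pairs, we want the corresponding vectors to not be near each other, measured in Euclidean distance. We formalize this by introducing distortion functions, defined for some pairs of the items. Our goal is to choose an embedding that minimizes the total distortion, subject to the constraints. We call this the minimum-distortion embedding (MDE) problem.
The MDE framework is simple but general. It includes a wide variety of embedding methods, such as spectral embedding, principal component analysis, multidimensional scaling, dimensionality reduction methods (like Isomap and UMAP), force-directed layout, and others. It also includes new embeddings, and provides principled ways of validating historical and new embeddings alike.
We develop a projected quasi-Newton method that approximately solves MDE problems and scales to large data sets. We implement this method in PyMDE, an open-source Python package. In PyMDE, users can select from a library of distortion functions and constraints or specify custom ones, making it easy to experiment rapidly with different embeddings. The software scales to data sets with millions of items and tens of millions of distortion functions. To demonstrate our method, we compute embeddings for several real-world data sets, including images, an academic co-author network, US county demographic data, and single-cell mRNA transcriptomes.

http://arxiv.org/abs/2103.02559v2
Stanford University
→ Foundational research related to contrastive learning and similar methods. It generalizes and formalizes algorithms that represent the similarity between the elements of a set as vectors, and introduces the PyMDE framework.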
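The objective in the abstract can be written as an average of per-pair distortion functions applied to Euclidean distances. The sketch below evaluates that average for a given embedding. It does not call PyMDE. The quadratic penalty for similar pairs and the hinge-style penalty for dissimilar pairs are illustrative choices made here, and the function name and `margin` parameter are assumptions.

```python
import math


def average_distortion(embedding, similar, dissimilar, margin=1.0):
    """Average distortion of an embedding over the given item pairs.

    embedding: dict mapping item -> coordinate tuple.
    similar / dissimilar: lists of (item, item) pairs.
    Both penalty functions are illustrative choices.
    """
    total = 0.0
    for i, j in similar:
        d = math.dist(embedding[i], embedding[j])
        total += d ** 2                       # similar items: penalize distance
    for i, j in dissimilar:
        d = math.dist(embedding[i], embedding[j])
        total += max(0.0, margin - d) ** 2    # dissimilar items: penalize closeness
    return total / (len(similar) + len(dissimilar))


if __name__ == "__main__":
    similar = [("a", "b")]
    dissimilar = [("a", "c")]
    good = {"a": (0.0, 0.0), "b": (0.1, 0.0), "c": (2.0, 0.0)}
    bad = {"a": (0.0, 0.0), "b": (2.0, 0.0), "c": (0.1, 0.0)}
    print("good embedding:", average_distortion(good, similar, dissimilar))
    print("bad embedding: ", average_distortion(bad, similar, dissimilar))
```

An MDE solver searches over embeddings for the one with the lowest such value under the chosen constraints. Swapping the two penalty functions for others is what recovers the different classical methods listed in the abstract.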

Slide 51

Slide 51 text

No content

Slide 52

Slide 52 text

② StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery
Duplicate of the recent list

Slide 53

Slide 53 text

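③ RepVGG: Making VGG-style ConvNets Great Again

We present a simple but powerful architecture of convolutional neural network, which has a VGG-like inference-time body composed of nothing but a stack of 3x3 convolutions and ReLU, while the training-time model has a multi-branch topology. Such decoupling of the training-time and inference-time architecture is realized by a structural re-parameterization technique, so the model is named RepVGG. On ImageNet, RepVGG reaches over 80% top-1 accuracy, which is the first time for a plain model, to the best of our knowledge. On an NVIDIA 1080Ti GPU, RepVGG models run 83% faster than ResNet-50 or 101% faster than ResNet-101 with higher accuracy, and show a favorable accuracy-speed trade-off compared to state-of-the-art models like EfficientNet and RegNet. The code and trained models are available at https://github.com/megvii-model/RepVGG .

http://arxiv.org/abs/2101.03697v3
Tsinghua University, Megvii, Hong Kong University of Science and Technology
→ An improvement to VGG. Today's models have excessive FLOP cost or are more complex than necessary, so VGG was improved to reach performance on par with state-of-the-art models.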
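The re-parameterization idea can be checked on a toy case: a 3x3 branch, a 1x1 branch, and an identity branch applied to the same input sum to exactly what a single merged 3x3 kernel produces. This sketch assumes a single channel and no batch norm (the real method also folds batch norm into each branch before merging), and all function names are illustrative.

```python
def conv2d_same(x, kernel):
    """Single-channel 2D cross-correlation with zero padding (odd square kernel)."""
    k = len(kernel)
    pad = k // 2
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for dr in range(k):
                for dc in range(k):
                    rr, cc = r + dr - pad, c + dc - pad
                    if 0 <= rr < h and 0 <= cc < w:
                        acc += kernel[dr][dc] * x[rr][cc]
            out[r][c] = acc
    return out


def multi_branch(x, k3, k1):
    """Training-time form: 3x3 branch + 1x1 branch + identity branch."""
    y3 = conv2d_same(x, k3)
    return [[y3[r][c] + k1 * x[r][c] + x[r][c] for c in range(len(x[0]))]
            for r in range(len(x))]


def merge_branches(k3, k1):
    """Inference-time form: fold the 1x1 weight and the identity into the 3x3 centre."""
    merged = [row[:] for row in k3]
    merged[1][1] += k1 + 1.0
    return merged


if __name__ == "__main__":
    x = [[1.0, 2.0, 0.0], [0.0, -1.0, 3.0], [2.0, 1.0, 1.0]]
    k3 = [[0.1, 0.0, -0.2], [0.3, 0.5, 0.0], [-0.1, 0.2, 0.4]]
    k1 = 0.7
    a = multi_branch(x, k3, k1)
    b = conv2d_same(x, merge_branches(k3, k1))
    print(max(abs(a[r][c] - b[r][c]) for r in range(3) for c in range(3)))
```

Because convolution is linear, the merge is exact, so the network can be trained with branches and deployed as a plain stack of 3x3 convolutions.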

Slide 54

Slide 54 text

RepVGG model structure

Slide 55

Slide 55 text

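④ Representation Learning for Networks in Biology and Medicine: Advancements, Challenges, and Opportunities

With the remarkable success of representation learning in providing powerful predictions and data insights, representation learning techniques have expanded rapidly into modeling, analysis, and learning with networks. Biomedical networks are universal descriptors of systems of interacting elements, from protein interactions to disease networks, all the way to healthcare systems and scientific knowledge. In this review, we put forward the observation that long-standing principles of network biology and medicine, while often unspoken in machine learning research, can provide the conceptual grounding for representation learning, explain its current successes and limitations, and inform future advances. We synthesize a spectrum of algorithmic approaches that, at their core, leverage topological features to embed networks into compact vector spaces. We also provide a taxonomy of biomedical areas that are likely to benefit most from algorithmic innovation. Representation learning techniques are becoming essential for identifying causal variants underlying complex traits, disentangling behaviors of single cells and their impact on health, and diagnosing and treating diseases with safe and effective medicines.

http://arxiv.org/abs/2104.04883v1
Harvard Medical School
→ A review of graph representation learning in biology and medicine.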

Slide 56

Slide 56 text

No content

Slide 57

Slide 57 text

No content

Slide 58

Slide 58 text

⑤ Cross-validation: what does it estimate and how well does it do it?
Duplicate of the recent list

Slide 59

Slide 59 text

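⑥ Factors of Influence for Transfer Learning across Diverse Appearance Domains and Task Types

Transfer learning re-uses knowledge learned on a source task to help learn a target task. A simple form of transfer learning is common in current state-of-the-art computer vision models, i.e. pre-training a model for image classification on the ILSVRC dataset and then fine-tuning on any target task. However, previous systematic studies of transfer learning have been limited, and the circumstances in which it is expected to work are not fully understood. In this paper we carry out an extensive experimental exploration of transfer learning across vastly different image domains (consumer photos, autonomous driving, aerial imagery, underwater, indoor scenes, synthetic, close-ups) and task types (semantic segmentation, object detection, depth estimation, keypoint detection). Importantly, these are all complex, structured-output task types relevant to modern computer vision applications. In total we carry out over 1200 transfer experiments, including many where the source and target come from different image domains, task types, or both. We systematically analyze these experiments to understand the impact of image domain, task type, and dataset size on transfer learning performance. Our study leads to several insights and concrete recommendations for practitioners.

http://arxiv.org/abs/2103.13318v1
Google Research
→ A survey-style study of transfer learning across image domains.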

Slide 60

Slide 60 text

No content

Slide 61

Slide 61 text

① Which dataset was used for pre-training, and ② which dataset was it transferred to?

Slide 62

Slide 62 text

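Summary
• The image domain matters most. For the best results, transfer from a task that includes the same image domain.
• Even when the domain is not the same, transferring from a broad domain has almost no negative effect. Using a large-scale dataset is mostly safe, but sometimes it brings no benefit. (Example: transfer from COCO in general)
• Depending on the relationship between the source and target task types, transfer across task types can also be beneficial. (Examples: Driving → Aerial, Consumer → Indoor)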

Slide 63

Slide 63 text

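⑦ Why Do Local Methods Solve Nonconvex Problems?

Non-convex optimization is ubiquitous in modern machine learning. Researchers devise non-convex objective functions and optimize them using off-the-shelf optimizers such as stochastic gradient descent and its variants, which leverage the local geometry and update iteratively. Even though solving non-convex functions is NP-hard in the worst case, the optimization quality in practice is often not an issue, because optimizers are largely believed to find approximate global minima. Researchers hypothesize a unified explanation for this intriguing phenomenon: most of the local minima of the practically used objectives are approximately global minima. We rigorously formalize this hypothesis for concrete instances of machine learning problems.

http://arxiv.org/abs/2103.13462v1
Stanford University
→ A study of why training can be optimized successfully with off-the-shelf optimizers.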

Slide 64

Slide 64 text

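⑧ Scaling Scaling Laws with Board Games

The largest experiments in machine learning now require resources far beyond the budget of all but a few institutions. Fortunately, it has recently been shown that the results of these huge experiments can often be extrapolated from the results of a sequence of far smaller, cheaper experiments. In this work, we show that the extrapolation can be done based not only on the size of the model but on the size of the problem as well. By conducting a sequence of experiments using AlphaZero and Hex, we show that the performance achievable with a fixed amount of compute degrades predictably as the game gets larger and harder. Along with our main result, we further show that the test-time and train-time compute available to an agent can be traded off while maintaining performance.

http://arxiv.org/abs/2104.03113v2
Andy Jones (London)
→ Research that helps estimate costs when applying machine learning to real-world problems. It studies board-game AI algorithms using AlphaZero and Hex as the example, and summarizes how training cost and compute cost change with the required performance and the size of the problem.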

Slide 65

Slide 65 text

Hex (board game): https://ja.wikipedia.org/wiki/ヘックス_(ボードゲーム)

Slide 66

Slide 66 text

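(For AlphaZero) among algorithms with the same strength (rating), training compute time and inference compute time are inversely related: spending more on one lets you spend less on the other.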

Slide 67

Slide 67 text

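⑨ Vision Transformers for Dense Prediction

We introduce dense vision transformers, an architecture that leverages vision transformers in place of convolutional networks as a backbone for dense prediction tasks. We assemble tokens from various stages of the vision transformer into image-like representations at various resolutions and progressively combine them into full-resolution predictions using a convolutional decoder. The transformer backbone processes representations at a constant and relatively high resolution and has a global receptive field at every stage. These properties allow the dense vision transformer to provide finer-grained and more globally coherent predictions than fully convolutional networks. Our experiments show that this architecture yields substantial improvements on dense prediction tasks, especially when a large amount of training data is available. For monocular depth estimation, we observe an improvement of up to 28% in relative performance compared to a state-of-the-art fully convolutional network. When applied to semantic segmentation, dense vision transformers set a new state of the art on ADE20K with 49.02% mIoU. We further show that the architecture can be fine-tuned on smaller datasets such as NYUv2, KITTI, and Pascal Context, where it also sets the new state of the art. Our models are available at https://github.com/intel-isl/DPT .

http://arxiv.org/abs/2103.13413v1
Intel Labs
→ Semantic segmentation with ViT. As examples, it achieved good results on monocular depth estimation and segmentation.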
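The mIoU figure quoted for ADE20K is a mean of per-class intersection-over-union scores. A minimal sketch of that metric on flattened label lists follows. Skipping classes that appear in neither the prediction nor the target is a common convention assumed here, and the function name and toy labels are illustrative.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes for flattened label lists.

    Classes absent from both pred and target are skipped (an assumed convention).
    """
    ious = []
    for cls in range(num_classes):
        intersection = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
        union = sum(1 for p, t in zip(pred, target) if p == cls or t == cls)
        if union > 0:
            ious.append(intersection / union)
    return sum(ious) / len(ious)


if __name__ == "__main__":
    # Toy per-pixel labels for a 6-pixel "image" with 3 classes.
    pred = [0, 0, 1, 1, 2, 2]
    target = [0, 1, 1, 1, 2, 0]
    print(f"mIoU = {mean_iou(pred, target, 3):.3f}")
```

Because each class counts equally regardless of how many pixels it covers, mIoU rewards getting small or rare classes right, which is part of why it is the usual metric for semantic segmentation.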

Slide 68

Slide 68 text

No content

Slide 69

Slide 69 text

⑩ EfficientNetV2: Smaller Models and Faster Training
Duplicate of the recent list

Slide 70

Slide 70 text

DeepL Translator (deepl.com) https://www.deepl.com/en/translator