AI Latest Papers Reading Group, May 2021

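AI Latest Papers Reading Group, May 2021 [Online, streamed on Zoom] (formerly the DL Study Group)
A roundup of the papers that were popular on arXiv over the past month.
These are the presentation slides for https://deeplearning-b.connpass.com/event/209977/.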

M.Inomata

May 12, 2021
Transcript

  1. AI Latest Papers Reading Group, May 2021
    tech vein Inc., Mitsuhiro Inomata

  2. Self-introduction
    Mitsuhiro Inomata

    CEO and developer, tech vein Inc.

    twitter: @ino2222
    https://www.techvein.com

  3. Introducing our Facebook group
    https://www.facebook.com/groups/…

  4. Agenda
    Papers from arxiv.org over the past month, picked from Arxiv Sanity (arxiv-sanity.com):

    • The paper I found most interesting
    • Top 10 list of papers from "top recent"
    • Top 10 list of papers from "top hype"

  5. Arxiv Sanity?
    https://www.arxiv-sanity.com/top

  6. Contents

  7. Top10 Recent
    1. EfficientNetV2: Smaller Models and Faster Training ← PickUp!
    2. An Empirical Study of Training Self-Supervised Vision Transformers
    3. Cross-validation: what does it estimate and how well does it do it?
    4. GANcraft: Unsupervised 3D Neural Rendering of Minecraft Worlds
    5. StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery
    6. LocalViT: Bringing Locality to Vision Transformers
    7. Keyword Transformer: A Self-Attention Model for Keyword Spotting
    8. Multiscale Vision Transformers
    9. SiT: Self-supervised vIsion Transformer
    10. Self-supervised Video Object Segmentation by Motion Grouping

  8. Top10 Hype
    1. Minimum-Distortion Embedding
    2. StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery
    3. RepVGG: Making VGG-style ConvNets Great Again
    4. Representation Learning for Networks in Biology and Medicine: Advancements, Challenges, and Opportunities
    5. Cross-validation: what does it estimate and how well does it do it?
    6. Factors of Influence for Transfer Learning across Diverse Appearance Domains and Task Types
    7. Why Do Local Methods Solve Nonconvex Problems?
    8. Scaling Scaling Laws with Board Games
    9. Vision Transformers for Dense Prediction
    10. EfficientNetV2: Smaller Models and Faster Training

  9. Pick-up paper

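  10. Top recent ① EfficientNetV2: Smaller models and faster training
    (Original: EfficientNetV2: Smaller Models and Faster Training)
    This paper introduces EfficientNetV2, a new family of convolutional networks with faster training speed and better parameter efficiency than previous models. To develop this family of models, we used a combination of training-aware neural architecture search and scaling to jointly optimize training speed and parameter efficiency. The models were searched from a search space enriched with new ops such as Fused-MBConv. Our experiments show that EfficientNetV2 models train much faster than state-of-the-art models while being up to 6.8x smaller. Training can be sped up further by progressively increasing the image size during training, but this often causes a drop in accuracy. To compensate for this drop, we propose adaptively adjusting regularization such as dropout and data augmentation as well, achieving both fast training and good accuracy. With progressive learning, EfficientNetV2 significantly outperforms previous models on ImageNet and the CIFAR/Cars/Flowers datasets. By pre-training on the same ImageNet21k, our EfficientNetV2 achieves 87.3% top-1 accuracy on ImageNet ILSVRC2012, outperforming the recent ViT by 2.0% while training 5x to 11x faster with the same computing resources. Code will be available at https://github.com/google/automl/efficientnetv2.

    http://arxiv.org/abs/2104.00298v1
    Google Research, Brain Team.
    → Announcement of EfficientNetV2.
      EfficientNet made lighter and faster so that it can be trained practically.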

  11. Faster training and a smaller model size

  12. Review: ResNet-RS (2021.3) by Google Brain
    A ResNet with improved training methods and scaling methods
    https://arxiv.org/abs/…

  14. Improvements over EfficientNetV1
    • Reducing the training image size
      → Progressive Training
    • Removing the model's bottleneck
      → Fused-MBConv
    • Changing how the model is scaled (B0-B7)
      → New scaling rules & a smaller maximum image size

  15. Progressive Training
    Train on small images first and move to larger ones, for efficiency.
    + Adjust the augmentation level to prevent a drop in accuracy.

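The progressive-training idea can be sketched as a schedule that interpolates both the image size and the regularization strength, from small images with weak regularization in the first stage to large images with strong regularization in the last, which is how the EfficientNetV2 paper describes its adaptive regularization. The function names, the choice of dropout and RandAugment magnitude as the regularizers, and the numeric ranges are illustrative assumptions, not the paper's tuned settings.

```python
from dataclasses import dataclass


@dataclass
class StageConfig:
    image_size: int
    dropout: float
    randaug_magnitude: float


def lerp(lo, hi, t):
    """Linear interpolation between lo (t=0) and hi (t=1)."""
    return lo + (hi - lo) * t


def progressive_schedule(num_stages, size_range, dropout_range, randaug_range):
    """Per-stage training settings for progressive learning (hypothetical helper).

    Early stages use small images with weak regularization; later stages use
    larger images with stronger regularization. Each stage would be trained
    for an equal share of the total training steps.
    """
    if num_stages < 2:
        raise ValueError("need at least two stages to interpolate")
    schedule = []
    for i in range(num_stages):
        t = i / (num_stages - 1)  # 0.0 at the first stage, 1.0 at the last
        schedule.append(
            StageConfig(
                image_size=round(lerp(*size_range, t)),
                dropout=lerp(*dropout_range, t),
                randaug_magnitude=lerp(*randaug_range, t),
            )
        )
    return schedule


if __name__ == "__main__":
    # Assumed ranges: image size 128->300, dropout 0.1->0.3, RandAugment 5->15.
    for stage, cfg in enumerate(progressive_schedule(4, (128, 300), (0.1, 0.3), (5, 15))):
        print(stage, cfg)
```

Tying regularization to image size is the point of the design: the paper argues that small images call for weaker regularization and large images for stronger regularization, so raising only the image size would leave the early or late stages mis-regularized.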
  16. MBConv → Fused-MBConv
    The depthwise convolutions in the early part of the model were a bottleneck,
    so some of the blocks (stages 1-5) were replaced with regular convolutions.

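A rough weight count shows why this swap is a trade-off. The sketch counts convolution weights only (no batch norm, biases, or squeeze-and-excitation) for an MBConv block (1x1 expansion, 3x3 depthwise, 1x1 projection) and a Fused-MBConv block (one regular 3x3 convolution, then a 1x1 projection). The expansion ratio of 4, the channel counts, and the helper names are assumptions for illustration. The fused block has several times more weights; the EfficientNetV2 paper's argument is that it can still train faster in early stages because depthwise convolutions often cannot fully use modern accelerators.

```python
def mbconv_weights(c_in, c_out, expand=4, k=3):
    """Convolution weights in an MBConv block (SE, BatchNorm and biases ignored)."""
    hidden = c_in * expand
    expand_1x1 = c_in * hidden     # 1x1 pointwise expansion
    depthwise = k * k * hidden     # k x k depthwise: one k x k filter per channel
    project_1x1 = hidden * c_out   # 1x1 pointwise projection
    return expand_1x1 + depthwise + project_1x1


def fused_mbconv_weights(c_in, c_out, expand=4, k=3):
    """Convolution weights in a Fused-MBConv block: the 1x1 expansion and the
    depthwise conv are replaced by a single regular k x k convolution."""
    hidden = c_in * expand
    fused_kxk = k * k * c_in * hidden  # regular k x k conv, c_in -> hidden channels
    project_1x1 = hidden * c_out
    return fused_kxk + project_1x1


def multiply_adds(weights, height, width):
    """Multiply-adds for stride-1 convs: each weight is used once per output pixel."""
    return weights * height * width


if __name__ == "__main__":
    # Hypothetical early-stage block: 24 -> 24 channels on a 112 x 112 feature map.
    for name, fn in [("MBConv", mbconv_weights), ("Fused-MBConv", fused_mbconv_weights)]:
        w = fn(24, 24)
        print(f"{name}: {w} weights, {multiply_adds(w, 112, 112):,} multiply-adds")
```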
  17. Improving how the model is scaled
    • Cap the inference image size at a maximum of 480
    • Scale gradually so that the later stages of the model get more layers

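The resolution cap can be illustrated on top of the compound scaling rule from the original EfficientNet paper, where depth, width, and resolution grow as α^φ, β^φ, γ^φ with α = 1.2, β = 1.1, γ = 1.15. This is a sketch under those assumptions: the 224-pixel base is EfficientNet-B0's resolution, the published B1-B7 resolutions do not follow this formula exactly, and EfficientNetV2's extra later-stage layers are not modeled. The function name is hypothetical.

```python
import math

# Compound-scaling coefficients reported for EfficientNet (V1).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution


def compound_scale(phi, base_resolution=224, max_resolution=None):
    """Return (depth_multiplier, width_multiplier, resolution) for coefficient phi.

    If max_resolution is set, the resolution is clamped to it, mimicking the
    EfficientNetV2 rule of capping the inference image size (480 on the slide).
    """
    depth = ALPHA ** phi
    width = BETA ** phi
    resolution = math.ceil(base_resolution * GAMMA ** phi)
    if max_resolution is not None:
        resolution = min(resolution, max_resolution)
    return depth, width, resolution


if __name__ == "__main__":
    for phi in range(8):
        uncapped = compound_scale(phi)[2]
        capped = compound_scale(phi, max_resolution=480)[2]
        print(f"phi={phi}: resolution {uncapped} -> {capped}")
```

With the cap in place, the largest models stop growing in resolution and only keep growing in depth and width, which is where the memory and training-speed savings come from.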
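  18. Impressions of EfficientNet V2
    • Even though V2 is lighter, you still need hardware capable of training EfficientNet B3-B4.
    • Will ResNet-RS and EfficientNet V2 become the future baselines for ConvNets?
    • Looking forward to the official source code release.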

  19. Top recent: Best10

  20. ① EfficientNetV2: Smaller models and faster training
    (Original: EfficientNetV2: Smaller Models and Faster Training)
    Pick-up paper

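  21. ② An empirical study of training self-supervised vision transformers
    (Original: An Empirical Study of Training Self-Supervised Vision Transformers)
    This paper does not describe a new method. Instead, given recent progress in computer vision, it studies a straightforward, incremental, yet must-know baseline: self-supervised learning for Vision Transformers (ViT). While training recipes for standard convolutional networks are highly mature and robust, recipes for ViT have yet to be established, and training is especially challenging in self-supervised scenarios. In this work, we go back to basics and investigate the effects of several fundamental components for training self-supervised ViT. We found that instability is a major issue that degrades accuracy, and that it can be hidden by apparently good results. We show that these results are indeed partial failures, and that they can be improved when training is made more stable. We benchmark ViT results in MoCo v3 and several other self-supervised frameworks, with ablations in various aspects. We discuss the current positive evidence as well as challenges and open questions. We hope this work provides useful data points and experience for future research.
    http://arxiv.org/abs/2104.02057v2
    Facebook AI Research (FAIR)
    → A study of ViT hyperparameters. Built a ViT-based MoCo v3 framework and compared
    the ViT parameters that affect performance.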

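  22. Measured with a (dense) kNN monitor while varying the batch size, learning rate, and optimizer.
    If the settings are too aggressive, dips (sudden drops) appear
    and performance degrades.
    If they are too conservative, no dips appear, but the model ends up undertrained.
    The key is to balance the settings while watching the kNN accuracy.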

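A kNN monitor classifies each validation image by looking up its nearest neighbours among training-set features produced by the current encoder, so a sudden dip in that accuracy flags unstable training. The sketch below is a generic version under assumed choices (cosine similarity, unweighted majority vote, toy 2-D features, hypothetical function names); it is not the exact monitor used in the MoCo v3 paper.

```python
import math
from collections import Counter


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def knn_accuracy(train_feats, train_labels, test_feats, test_labels, k=5):
    """Predict each test label by majority vote over the k training features with
    the highest cosine similarity, and return the fraction predicted correctly."""
    train = [normalize(f) for f in train_feats]
    correct = 0
    for feat, label in zip(test_feats, test_labels):
        q = normalize(feat)
        sims = [sum(a * b for a, b in zip(q, t)) for t in train]
        nearest = sorted(range(len(train)), key=sims.__getitem__, reverse=True)[:k]
        vote = Counter(train_labels[i] for i in nearest).most_common(1)[0][0]
        correct += vote == label
    return correct / len(test_feats)


if __name__ == "__main__":
    # Toy features standing in for encoder outputs at one training checkpoint.
    train = [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05], [0.0, 1.0], [0.1, 0.9], [0.05, 0.95]]
    labels = [0, 0, 0, 1, 1, 1]
    print(knn_accuracy(train, labels, [[1.0, 0.02], [0.02, 1.0]], [0, 1], k=3))
```

In a training loop this would be re-run on fresh features every few epochs; because no classifier is trained, a drop in the number reflects the representation itself rather than a probe.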
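  23. ③ Cross-validation: what does it estimate, and how well does it work?
    (Original: Cross-validation: what does it estimate and how well does it do it?)
    Cross-validation is a widely used technique for estimating prediction error, but its behavior is complex and not fully understood. Ideally, one would like to think that cross-validation estimates the prediction error of the model at hand, fit to the training data. We prove that this is not the case for the linear model fit by ordinary least squares; rather, it estimates the average prediction error of models fit on other unseen training sets drawn from the same population. We further show that this phenomenon also occurs for most popular estimates of prediction error, including data splitting, bootstrapping, and Mallow's Cp. Next, the standard confidence intervals for prediction error derived from cross-validation may have coverage far below the desired level. Because each data point is used for both training and testing, the accuracies measured on the individual folds are correlated, so the usual variance estimate is too small. To estimate this variance more accurately, we introduce a nested cross-validation scheme and show empirically that this modification yields intervals with approximately correct coverage in many examples where traditional cross-validation intervals fail. Finally, our analysis also shows that when constructing confidence intervals for prediction accuracy with simple data splitting, one should not refit the model on the combined data, because doing so invalidates the confidence intervals.
    http://arxiv.org/abs/2104.00673v2
    UC Berkeley & Stanford University
    → An improvement to cross-validation. The confidence intervals from naive cross-validation
    had coverage below the nominal level; introducing nested cross-validation (NCV)
    fixed this problem.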

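For reference, the standard procedure the paper analyzes looks roughly like this: K-fold cross-validation of an ordinary least squares fit, followed by the naive interval err ± z·sd/√n that treats the n held-out losses as independent. The paper's point is that this interval can be too narrow; its nested cross-validation correction is not implemented here. The 1-D linear model, squared-error loss, K = 5, z = 1.645 (about 90%), and the function names are assumptions for the sketch.

```python
import math
import random
import statistics


def fit_ols_1d(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def kfold_cv(xs, ys, k=5, seed=0, z=1.645):
    """K-fold CV estimate of squared prediction error plus the naive interval.

    The interval err +/- z * sd / sqrt(n) treats the n held-out losses as
    independent, which is the assumption the paper shows can make it too narrow.
    """
    n = len(xs)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    losses = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        a, b = fit_ols_1d([xs[i] for i in train], [ys[i] for i in train])
        losses.extend((ys[i] - (a + b * xs[i])) ** 2 for i in fold)
    err = statistics.fmean(losses)
    se = statistics.stdev(losses) / math.sqrt(n)
    return err, (err - z * se, err + z * se)


if __name__ == "__main__":
    rng = random.Random(1)
    xs = [i / 10 for i in range(100)]
    ys = [2 * x + 1 + rng.gauss(0, 1) for x in xs]
    err, (lo, hi) = kfold_cv(xs, ys)
    print(f"CV error {err:.3f}, naive 90% interval ({lo:.3f}, {hi:.3f})")
```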
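  25. ④ GANcraft: Unsupervised 3D neural rendering of Minecraft worlds
    (Original: GANcraft: Unsupervised 3D Neural Rendering of Minecraft Worlds)
    GANcraft is an unsupervised neural rendering framework for generating photorealistic images of large 3D block worlds such as those in Minecraft. The method takes a semantic block world as input, in which each block is assigned a semantic label such as dirt, grass, or water. We represent the world as a continuous volumetric function and train the model to render view-consistent photorealistic images for a user-controlled camera. Because there are no paired ground-truth real images for a block world, we devised a training technique based on pseudo-ground truth and adversarial training. This contrasts with prior work on neural rendering for view synthesis, which requires ground-truth images to estimate scene geometry and view-dependent appearance. In addition to the camera trajectory, GANcraft lets the user control both the scene semantics and the output style. Experimental results compared against strong baselines demonstrate the effectiveness of GANcraft on the new task of photorealistic 3D block world synthesis. The project website is https://nvlabs.github.io/GANcraft/.
    http://arxiv.org/abs/2104.07659v1
    NVIDIA
    → Released GANcraft, which generates landscape photos
    from Minecraft 3D structure data.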

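  29. ⑤ StyleCLIP: Text-driven manipulation of StyleGAN images
    (Original: StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery)
    Inspired by StyleGAN's ability to generate highly realistic images in a wide range of domains, much recent work has focused on understanding how to use StyleGAN's latent spaces to manipulate generated and real images. However, discovering semantically meaningful latent manipulations typically requires painstaking human examination of the many degrees of freedom, or collecting and annotating images for each desired manipulation. In this work, we explore leveraging the recently introduced CLIP (Contrastive Language-Image Pre-training) models to develop a text-based interface for StyleGAN image manipulation that does not require such manual effort. We first introduce an optimization scheme that uses a CLIP-based loss to modify an input latent vector in response to a user-provided text prompt. Next, we describe a latent mapper that infers a text-guided latent manipulation step for a given input image, enabling faster and more stable text-based manipulation. Finally, we present a method for mapping text prompts to input-agnostic directions in StyleGAN's style space, enabling interactive text-driven image manipulation. Extensive results and comparisons demonstrate the effectiveness of our approaches.
    http://arxiv.org/abs/2103.17249v1
    Hebrew University of Jerusalem, Tel Aviv University, Adobe Research
    → Combined CLIP and StyleGAN to build StyleCLIP,
    which lets you manipulate images with text.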

  31. Reference: CLIP
    Pre-trained on pairs of text and images, it classifies unseen images
    zero-shot (the set of classes is known).
    https://openai.com/blog/clip

  32. A distinctive feature: for an arbitrary unseen image, it infers which caption
    of the form "a photo of a ___" describes it.
    https://openai.com/blog/clip

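The "a photo of a ___" trick can be sketched as follows: build one prompt per known class, embed each prompt with the text encoder, and pick the class whose prompt embedding is most cosine-similar to the image embedding. The real CLIP encoders are replaced here by a hypothetical lookup table of toy vectors, and the softmax temperature and function names are assumed; only the prompt-and-compare logic is the point.

```python
import math

PROMPT_TEMPLATE = "a photo of a {}"


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def softmax(scores, temperature):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_embedding, class_names, embed_text, temperature=0.01):
    """Return (best_class, {class: probability}) by comparing the image embedding
    with the embedding of one "a photo of a {class}" prompt per class."""
    prompts = [PROMPT_TEMPLATE.format(name) for name in class_names]
    sims = [cosine(image_embedding, embed_text(p)) for p in prompts]
    probs = softmax(sims, temperature)
    best = max(range(len(class_names)), key=probs.__getitem__)
    return class_names[best], dict(zip(class_names, probs))


# Hypothetical toy "text encoder": fixed vectors, not real CLIP outputs.
TOY_TEXT_EMBEDDINGS = {
    "a photo of a dog": [0.9, 0.1, 0.0],
    "a photo of a cat": [0.1, 0.9, 0.1],
    "a photo of a car": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    label, probs = zero_shot_classify([0.8, 0.2, 0.0], ["dog", "cat", "car"],
                                      TOY_TEXT_EMBEDDINGS.__getitem__)
    print(label, probs)
```

Because the classes enter only through the prompt strings, adding a new class means adding a new prompt, with no retraining; this is why the class set must be known in advance even though the images are unseen.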
  33. https://openai.com/blog/clip

  34. StyleCLIP

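  35. ⑥ LocalViT: Bringing locality to vision transformers
    (Original: LocalViT: Bringing Locality to Vision Transformers)
    This work studies how to introduce locality mechanisms into vision transformers. The transformer network originates from machine translation and is particularly good at modeling long-range dependencies within long sequences. Although the global interaction between token embeddings can be modeled well by the transformer's self-attention mechanism, it lacks a locality mechanism for exchanging information within a local region. Yet locality is essential for images, because it pertains to structures such as lines, edges, shapes, and even objects. We add locality to vision transformers by introducing depthwise convolution into the feed-forward network. This seemingly simple solution is inspired by a comparison between feed-forward networks and inverted residual blocks. The importance of the locality mechanism is validated in two ways: 1) a wide range of design choices (activation function, layer placement, expansion ratio) are available for incorporating it, and all proper choices lead to a performance gain over the baseline; 2) the same locality mechanism is successfully applied to four vision transformers, showing that the locality concept generalizes. In particular, on ImageNet2012 classification, the locality-enhanced transformers outperform the DeiT-T and PVT-T baselines by 2.6% and 3.1%, with a negligible increase in parameters and computation. Code is available at https://github.com/ofsoundof/LocalViT.
    http://arxiv.org/abs/2104.05707v1
    ETH Zurich, KU Leuven
    → An improvement to ViT. Adding a mechanism that gives ViT local spatial information
    within the image improved performance.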

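The core LocalViT change can be sketched in plain Python: put the patch tokens back on their 2-D grid and run a 3x3 depthwise convolution between the two linear (1x1) layers of the feed-forward network, so each token mixes with its spatial neighbours. This is a minimal sketch assuming GELU activations, no class token, no normalization layers, and hand-supplied weights; the function names are hypothetical, and the paper itself compares several activation and placement choices.

```python
import math


def gelu(x):
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def linear(tokens, weight):
    """Per-token linear map (equivalent to a 1x1 convolution); weight is out x in."""
    return [[sum(w * t for w, t in zip(row, tok)) for row in weight] for tok in tokens]


def depthwise_conv3x3(tokens, height, width, kernels):
    """3x3 depthwise convolution over row-major tokens laid out on a height x width grid.

    kernels holds one 3x3 kernel per channel (no mixing across channels);
    zero padding keeps the number of tokens unchanged.
    """
    channels = len(tokens[0])
    if len(tokens) != height * width or len(kernels) != channels:
        raise ValueError("shape mismatch")
    out = []
    for r in range(height):
        for c in range(width):
            vec = []
            for ch in range(channels):
                acc = 0.0
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if 0 <= rr < height and 0 <= cc < width:
                            acc += kernels[ch][dr + 1][dc + 1] * tokens[rr * width + cc][ch]
                vec.append(acc)
            out.append(vec)
    return out


def local_ffn(tokens, height, width, w_expand, kernels, w_project):
    """1x1 expand -> GELU -> 3x3 depthwise -> GELU -> 1x1 project."""
    hidden = [[gelu(v) for v in tok] for tok in linear(tokens, w_expand)]
    hidden = depthwise_conv3x3(hidden, height, width, kernels)
    hidden = [[gelu(v) for v in tok] for tok in hidden]
    return linear(hidden, w_project)
```

With a plain transformer feed-forward network each token is processed independently; the depthwise step is the only place where neighbouring patches exchange information, which is the locality the paper adds.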
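  37. ⑦ Keyword Transformer: A self-attention model for keyword spotting
    (Original: Keyword Transformer: A Self-Attention Model for Keyword Spotting)
    The Transformer architecture has been successful in many domains, including natural language processing, computer vision, and speech recognition. In keyword spotting, self-attention has mainly been used on top of convolutional or recurrent encoders. In this work, we investigate a range of ways to adapt the Transformer architecture to keyword spotting and introduce the Keyword Transformer (KWT), a fully self-attentional architecture that exceeds state-of-the-art performance on multiple tasks without any pre-training or additional data. Surprisingly, this simple architecture outperforms more complex models that mix convolutional, recurrent, and attentive layers. KWT can be used as a drop-in replacement for these models and set two new benchmark records on the Google Speech Commands dataset: 98.6% accuracy on the 12-command task and 97.7% on the 35-command task.
    http://arxiv.org/abs/2104.00769v2
    Arm ML Research Lab, Lund University
    → Speech recognition with Transformers. Keyword recognition from speech, a common task
    in smart speakers and similar devices, was achieved with high accuracy using a Transformer.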

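  39. ⑧ Multiscale vision transformers
    (Original: Multiscale Vision Transformers)
    We present Multiscale Vision Transformers (MViT) for video and image recognition, by connecting the seminal idea of multiscale feature hierarchies with transformer models. Multiscale Transformers have several channel-resolution scale stages. Starting from the input resolution and a small channel dimension, the stages hierarchically expand the channel capacity while reducing the spatial resolution. This creates a multiscale pyramid of features in which early layers operate at high spatial resolution to model simple low-level visual information, and deeper layers operate on spatially coarse but complex, high-dimensional features. We evaluated this fundamental architectural prior for modeling the dense nature of visual signals on a variety of video recognition tasks, where it outperforms existing vision transformers that rely on large-scale external pre-training and are 5-10x more costly in computation and parameters. Furthermore, when we removed the temporal dimension and applied the model to image classification, it outperformed prior vision transformers. Code is available at https://github.com/facebookresearch/SlowFast.
    http://arxiv.org/abs/2104.11227v1
    Facebook AI Research, UC Berkeley
    → An improvement to ViT. Building a multiscale feature hierarchy over the transformer's spatial
    and channel dimensions improved performance over single-scale transformers.
    The functionality was added to the existing PySlowFast library.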

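The channel-resolution stages can be tabulated with a few lines of arithmetic. The sketch assumes MViT-B-like settings (a 16-frame 224x224 clip, a patch-embedding stride of 2 in time and 4 in space, 96 starting channels, four stages) and the rule that each new stage halves the height and width while doubling the channel count; these values and the function name are illustrative, not copied from the paper's configuration tables.

```python
def multiscale_stages(frames, height, width, t_stride=2, s_stride=4,
                      base_channels=96, num_stages=4):
    """Return (T, H, W, C, tokens) for each stage of an MViT-style pyramid.

    After patch embedding, every new stage halves H and W (4x fewer tokens)
    and doubles the channel dimension C; the temporal size T is kept.
    """
    t, h, w = frames // t_stride, height // s_stride, width // s_stride
    c = base_channels
    stages = []
    for _ in range(num_stages):
        stages.append((t, h, w, c, t * h * w))
        h, w, c = h // 2, w // 2, c * 2
    return stages


if __name__ == "__main__":
    for i, (t, h, w, c, n) in enumerate(multiscale_stages(16, 224, 224), start=1):
        print(f"stage {i}: {t}x{h}x{w} = {n} tokens, {c} channels")
```

The table makes the design visible: early stages have many tokens with few channels (cheap per token, fine detail), later stages have few tokens with many channels, which keeps self-attention affordable where the sequence is longest.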
  41. PySlowFast
    Video classification models
    https://github.com/facebookresearch/SlowFast

  42. Reference: SlowFast (2018-2019)
    https://arxiv.org/abs/…

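  43. ⑨ SiT: Self-supervised vIsion Transformer
    (Original: SiT: Self-supervised vIsion Transformer)
    Self-supervised learning methods are attracting increasing attention in computer vision because of their recent success in closing the gap with supervised learning. In natural language processing (NLP), self-supervised learning and transformers are already the methods of choice. Recent literature suggests that transformers are becoming increasingly popular in computer vision as well. So far, vision transformers have been shown to work well when pre-trained either on large-scale supervised data or with some form of co-supervision, such as a teacher network. These supervised pre-trained vision transformers achieve very good results on downstream tasks with minimal changes. In this work, we investigate the merits of self-supervised learning for pre-training image/vision transformers and then using them for downstream classification tasks. We propose Self-supervised vIsion Transformers (SiT) and discuss several self-supervised training mechanisms for obtaining a pretext model. The architectural flexibility of SiT allows it to be used as an autoencoder and to handle multiple self-supervised tasks seamlessly. We show that a pre-trained SiT can be fine-tuned for a downstream classification task on small-scale datasets consisting of a few thousand images rather than millions. The proposed approach was evaluated on standard datasets using common protocols. The results demonstrate the strength of transformers and their suitability for self-supervised learning. We outperformed existing self-supervised learning methods by a large margin. We also observed that SiT is well suited to few-shot learning, and showed that it learns useful representations by simply training a linear classifier on top of the features learned by SiT. Pre-training, fine-tuning, and evaluation code will be available at https://github.com/Sara-Ahmed/SiT.
    http://arxiv.org/abs/2104.03602v1
    IEEE
    → Announcement by IEEE of SiT, a Transformer trained with self-supervised learning.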

  45. Uses two types of image transformations

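  47. ⑩ Self-supervised video object segmentation by motion grouping
    (Original: Self-supervised Video Object Segmentation by Motion Grouping)
    Animals have evolved highly functional visual systems to understand motion, which assist perception even in complex environments. In this paper, we work toward developing a computer vision system that can segment objects by exploiting motion cues, i.e., motion segmentation. We make the following contributions. First, we introduce a simple variant of the Transformer that segments optical flow frames into primary objects and the background. Second, we train this architecture in a self-supervised manner, without any manual annotations. Third, we analyze several critical components of our method and conduct thorough ablation studies to validate their necessity. Fourth, we evaluate the proposed architecture on public benchmarks (DAVIS2016, SegTrackv2, FBMS59). Despite using only optical flow as input, our approach achieves results superior or comparable to previous state-of-the-art self-supervised methods while being an order of magnitude faster. We additionally evaluated on a challenging camouflage dataset (MoCA), where our approach significantly outperforms other self-supervised approaches and compares favorably with the top supervised approach, highlighting the importance of motion cues and the potential bias toward visual appearance in existing video segmentation models.
    http://arxiv.org/abs/2104.07658v1
    University of Oxford
    → Research on self-supervised learning for video. Using only the motion of objects in a video
    as a cue, object segmentation was learned from video data with self-supervision,
    and camouflaged animals in videos could be detected quickly and accurately.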

  49. Top hype: Best10

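  50. ① Minimum-distortion embedding
    (Original: Minimum-Distortion Embedding)
    We consider the vector embedding problem. Given a finite set of items, the goal is to assign a representative vector to each one. We are given data indicating that some pairs of items are similar and, optionally, that some other pairs are dissimilar. For pairs of similar items we want the corresponding vectors to be near each other, and for dissimilar pairs we want them not to be near each other, as measured by Euclidean distance. We formalize this by introducing distortion functions defined on some pairs of items. Our goal is to choose an embedding that minimizes the total distortion, subject to constraints. We call this the minimum-distortion embedding (MDE) problem. The MDE framework is simple but general. It includes a wide variety of embedding methods, such as spectral embedding, principal component analysis, multidimensional scaling, dimensionality reduction methods like Isomap and UMAP, and force-directed layout. It also includes new embeddings, and provides principled ways of validating historical and new embeddings alike. We developed a projected quasi-Newton method that approximately solves MDE problems and scales to large datasets. The method is implemented in PyMDE, an open-source Python package. In PyMDE, users can choose from a library of distortion functions and constraints or specify custom ones, making it easy to experiment with various embeddings. The software scales to datasets with millions of items and tens of millions of distortion functions. To demonstrate our method, we computed embeddings for several real-world datasets, including images, an academic co-author network, US county demographic data, and single-cell mRNA transcriptomes.
    http://arxiv.org/abs/2103.02559v2
    Stanford University
    → Foundational research related to contrastive learning and similar methods. A generalization and
    formalization of algorithms that represent the similarity between elements of a set as vectors,
    and an introduction to the PyMDE framework.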

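The MDE formulation can be made concrete in a few lines: an embedding assigns a vector to each item, a distortion function turns each pair's Euclidean distance into a penalty, and the objective is the total over the chosen pairs, evaluated under a constraint. This is not PyMDE's API: the function names are hypothetical, the quadratic distortion is simply the most basic attractive choice, and the per-axis standardization is a simplified stand-in for a standardization constraint (zero mean, identity covariance). Without some constraint, placing every item at the same point would give zero distortion, which is why the constraint matters.

```python
import math


def total_distortion(embedding, pairs, distortion):
    """Sum distortion(d_ij) over the given (i, j) pairs, where d_ij is the
    Euclidean distance between the vectors assigned to items i and j."""
    return sum(distortion(math.dist(embedding[i], embedding[j])) for i, j in pairs)


def quadratic(d):
    """Simplest attractive distortion: penalize similar items for being far apart."""
    return d * d


def standardize(embedding):
    """Give every coordinate zero mean and unit variance.

    This per-axis version is a simplification: a full standardization
    constraint would also make the coordinates uncorrelated.
    """
    n, dim = len(embedding), len(embedding[0])
    out = [list(v) for v in embedding]
    for k in range(dim):
        col = [v[k] for v in embedding]
        mean = sum(col) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / n) or 1.0
        for row, x in zip(out, col):
            row[k] = (x - mean) / std
    return out


if __name__ == "__main__":
    chain = [(0, 1), (1, 2), (2, 3)]  # items 0-1, 1-2 and 2-3 are "similar"
    ordered = standardize([[0.0], [1.0], [2.0], [3.0]])
    shuffled = standardize([[0.0], [2.0], [1.0], [3.0]])
    print(total_distortion(ordered, chain, quadratic))   # lower: neighbours stay close
    print(total_distortion(shuffled, chain, quadratic))  # higher
```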
  52. ② StyleCLIP: Text-driven manipulation of StyleGAN images
    (Original: StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery)
    Duplicate of the recent list

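  53. ③ RepVGG: Making VGG-style ConvNets great again
    (Original: RepVGG: Making VGG-style ConvNets Great Again)
    We propose a simple but powerful convolutional neural network architecture. It has a VGG-like inference-time body composed of nothing but a stack of 3x3 convolutions and ReLUs, while the training-time model has a multi-branch topology. This decoupling of the training-time and inference-time architectures is realized by a structural re-parameterization technique, hence the name RepVGG. On ImageNet, RepVGG reaches over 80% top-1 accuracy, which, to the best of our knowledge, is a first for a plain model. On an NVIDIA 1080Ti GPU, RepVGG models run 83% faster than ResNet-50 and 101% faster than ResNet-101 with higher accuracy, and show a favorable accuracy-speed trade-off compared with state-of-the-art models such as EfficientNet and RegNet. The code and trained models are available at https://github.com/megvii-model/RepVGG.
    http://arxiv.org/abs/2101.03697v3
    Tsinghua University, Megvii, Hong Kong University of Science and Technology
    → An improvement to VGG. Modern models have excessive FLOP costs or are more complex than necessary,
    so VGG was improved to reach performance on par with state-of-the-art models.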

  54. RepVGG model architecture

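Structural re-parameterization can be checked numerically on the smallest possible case: one input channel, one output channel, stride 1. Each training-time branch (3x3 conv + BN, 1x1 conv + BN, identity + BN) is first folded into a kernel and a bias, the 1x1 and identity kernels are zero-padded to 3x3, and the three are summed into a single 3x3 convolution. This single-channel, pure-Python version and its function names are a sketch for illustration, not the code in the RepVGG repository; for multi-channel layers the identity kernel becomes a 1 at the centre of each channel's own filter.

```python
import math


def conv2d_same(x, kernel, bias=0.0):
    """Single-channel 2-D convolution (cross-correlation, stride 1) with zero
    padding so the output has the same height and width as x."""
    h, w, k = len(x), len(x[0]), len(kernel)
    p = k // 2
    out = [[bias] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            for i in range(k):
                for j in range(k):
                    rr, cc = r + i - p, c + j - p
                    if 0 <= rr < h and 0 <= cc < w:
                        out[r][c] += kernel[i][j] * x[rr][cc]
    return out


def fold_bn(kernel, gamma, beta, mean, var, eps=1e-5):
    """Fold an inference-mode BatchNorm into the preceding conv; returns (kernel, bias)."""
    scale = gamma / math.sqrt(var + eps)
    return [[v * scale for v in row] for row in kernel], beta - mean * scale


def pad_to_3x3(kernel_1x1):
    """Place a 1x1 kernel at the centre of an otherwise zero 3x3 kernel."""
    return [[0.0, 0.0, 0.0], [0.0, kernel_1x1[0][0], 0.0], [0.0, 0.0, 0.0]]


def reparameterize(k3, bn3, k1, bn1, bn_id):
    """Merge the 3x3+BN, 1x1+BN and identity+BN branches into one 3x3 conv.

    Each bn_* argument is a (gamma, beta, running_mean, running_var) tuple.
    """
    w3, b3 = fold_bn(k3, *bn3)
    w1, b1 = fold_bn(pad_to_3x3(k1), *bn1)
    wid, bid = fold_bn(pad_to_3x3([[1.0]]), *bn_id)  # identity = 1x1 conv with weight 1
    merged = [[w3[i][j] + w1[i][j] + wid[i][j] for j in range(3)] for i in range(3)]
    return merged, b3 + b1 + bid
```

Because convolution and inference-mode BN are both linear, summing the branch outputs equals one convolution with the summed kernel and bias, so the deployed network is a plain stack of 3x3 convolutions and ReLUs.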
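  55. ④ Representation learning for networks in biology and medicine: advancements, challenges, and opportunities
    (Original: Representation Learning for Networks in Biology and Medicine: Advancements, Challenges, and Opportunities)
    With the remarkable success of representation learning in providing powerful predictions and data insights, representation learning methods are rapidly expanding to the modeling, analysis, and learning of networks. Biomedical networks are universal descriptors of systems of interacting elements, from protein interactions to disease networks, all the way to healthcare systems and scientific knowledge. In this review, we put forward the view that long-standing principles of network biology and medicine, while often unspoken in machine learning research, can provide the conceptual grounding for representation learning, explain its current successes and limitations, and inform future advances. We synthesize a spectrum of algorithmic approaches that, at their core, leverage topological features to embed networks into compact vector spaces. We also provide a taxonomy of biomedical areas that are likely to benefit most from algorithmic innovation. Representation learning techniques are becoming essential for identifying causal variants underlying complex traits, disentangling the behavior of single cells and its impact on health, and diagnosing and treating diseases with safe and effective medicines.
    http://arxiv.org/abs/2104.04883v1
    Harvard Medical School
    → A review of graph representation learning in biology and medicine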

  58. ⑤ Cross-validation: what does it estimate, and how well does it work?
    (Original: Cross-validation: what does it estimate and how well does it do it?)

    Duplicate of the recent list

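  59. ⑥ Factors that influence transfer learning across diverse appearance domains and task types
    (Original: Factors of Influence for Transfer Learning across Diverse Appearance Domains and Task Types)
    Transfer learning re-uses knowledge learned on a source task to help learn a target task. A simple form of transfer learning, pre-training an image classification model on the ILSVRC dataset and then fine-tuning it on an arbitrary target task, is common in current state-of-the-art computer vision models. However, previous systematic studies of transfer learning have been limited, and the circumstances in which it is expected to work are not fully understood. In this paper, we carry out an extensive experimental exploration of transfer learning across vastly different image domains (consumer photos, autonomous driving, aerial imagery, underwater, indoor scenes, synthetic, close-ups) and task types (semantic segmentation, object detection, depth estimation, keypoint detection). Importantly, these are all complex, structured-output tasks relevant to modern computer vision applications. In total, we carried out over 1,200 transfer experiments, including many in which the source and target come from different image domains, task types, or both. We systematically analyze these experiments to understand the impact of image domain, task type, and dataset size on transfer learning performance. Our study leads to several insights and concrete recommendations for practitioners.
    http://arxiv.org/abs/2103.13318v1
    Google Research
    → An empirical study of transfer learning across image domains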

  61. ① Which dataset was used for pre-training, and which dataset was the model transferred to?

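  62. Summary
    • The image domain matters most. For the best results, transfer from a task
    that includes the same image domain.

    • Even when the domain differs, transfer from a broad domain almost never has a negative
    effect. Using a large-scale dataset is usually fine, but sometimes it brings
    no benefit (e.g., transfer from COCO in general).

    • Depending on the relationship between the source and target task types, transfer across
    task types can also be beneficial (e.g., Driving → Aerial,
    Consumer → Indoor).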

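  63. ⑦ Why can local methods solve non-convex problems?
    (Original: Why Do Local Methods Solve Nonconvex Problems?)
    Non-convex optimization is ubiquitous in modern machine learning. Researchers devise non-convex objective functions and optimize them with off-the-shelf optimizers such as stochastic gradient descent and its variants, which exploit the local geometry and update iteratively. Even though solving non-convex functions is NP-hard in the worst case, the quality of the optimization is often not an issue in practice, because optimizers are widely believed to find approximate global minima. Researchers have hypothesized a unified explanation for this intriguing phenomenon: most local minima of the objectives used in practice are approximately global minima. This work rigorously formalizes this hypothesis for concrete instances of machine learning problems.
    http://arxiv.org/abs/2103.13462v1
    Stanford University
    → Research on why training with optimizers manages to optimize successfully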

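  64. ⑧ Scaling laws with board games
    (Original: Scaling Scaling Laws with Board Games)
    The largest machine learning experiments require resources far beyond the budget of all but a few institutions. Fortunately, it has recently been shown that the results of such large experiments can often be extrapolated from a sequence of much smaller, cheaper experiments. This work shows that the extrapolation can be based not only on the size of the model but also on the size of the problem. Through a sequence of experiments with AlphaZero and Hex, we show that the performance achievable with a fixed amount of compute degrades predictably as the game gets larger and harder. In addition to this main result, we show that the test-time and train-time compute available to an agent can be traded off while maintaining performance.
    http://arxiv.org/abs/2104.03113v2
    Andy Jones (London)
    → Research useful for estimating costs when applying machine learning in the real world. Studied
    board-game AI algorithms, using AlphaZero and Hex as examples, and summarized how training and
    compute costs change depending on the required performance and the size of the problem.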

  65. https://ja.wikipedia.org/wiki/ヘックス_(ボードゲーム)

  66. (In AlphaZero) for algorithms of the same strength (rating),
    training compute time and inference compute time are inversely proportional.

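  67. ⑨ Vision transformers for dense prediction
    (Original: Vision Transformers for Dense Prediction)
    We introduce dense vision transformers, an architecture that uses vision transformers instead of convolutional networks as the backbone for dense prediction tasks. Tokens from various stages of the vision transformer are assembled into image-like representations at various resolutions and progressively combined into full-resolution predictions using a convolutional decoder. The transformer backbone processes representations at a constant, relatively high resolution and has a global receptive field at every stage. These properties allow the dense vision transformer to produce finer-grained and more globally coherent predictions than fully convolutional networks. Our experiments show that this architecture yields substantial improvements on dense prediction tasks, especially when a large amount of training data is available. For monocular depth estimation, we observe an improvement of up to 28% in relative performance compared with a state-of-the-art fully convolutional network. When applied to semantic segmentation, dense vision transformers set a new state of the art on ADE20K with 49.02% mIoU. We further show that the architecture can be fine-tuned on smaller datasets such as NYUv2, KITTI, and Pascal Context, where it also sets a new state of the art. Our models are available at https://github.com/intel-isl/DPT.
    http://arxiv.org/abs/2103.13413v1
    Intel Labs
    → Semantic segmentation with ViT.
    For example, it achieved good results in depth estimation and segmentation from single (monocular) photos.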

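The step "assemble tokens into image-like representations" can be sketched as three small operations: read (handle the extra readout/class token), concatenate (put the patch tokens back on their 2-D grid; with 16x16 patches, a 384x384 input gives a 24x24 grid), and resample (change the resolution). The names loosely follow the paper's Reassemble description, but the functions here are hypothetical, and nearest-neighbour upsampling is swapped in for the learned convolutional resampling used in DPT.

```python
def read(tokens, mode="ignore"):
    """tokens[0] is the readout (class) token; the rest are patch tokens.

    "ignore" drops the readout token; "add" adds it to every patch token.
    """
    readout, patches = tokens[0], tokens[1:]
    if mode == "ignore":
        return [list(p) for p in patches]
    if mode == "add":
        return [[p + r for p, r in zip(patch, readout)] for patch in patches]
    raise ValueError(f"unknown read mode: {mode}")


def concatenate(patches, grid_h, grid_w):
    """Put row-major patch tokens back on their grid_h x grid_w spatial grid."""
    if len(patches) != grid_h * grid_w:
        raise ValueError("token count does not match the grid size")
    return [patches[r * grid_w:(r + 1) * grid_w] for r in range(grid_h)]


def upsample_nearest(grid, factor):
    """Nearest-neighbour upsampling by an integer factor (a stand-in for the
    learned convolutional resampling in DPT)."""
    out = []
    for row in grid:
        wide = [tok for tok in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


if __name__ == "__main__":
    # 1 readout token + a 2x3 grid of 2-dimensional patch tokens (toy values).
    tokens = [[100.0, 100.0]] + [[float(i), float(-i)] for i in range(6)]
    grid = concatenate(read(tokens), 2, 3)
    for row in upsample_nearest(grid, 2):
        print(row)
```

Applying these steps to tokens taken from several transformer layers, each resampled to a different resolution, yields the multi-resolution feature maps that the convolutional decoder then fuses.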
  69. ⑩ EfficientNetV2: Smaller models and faster training
    (Original: EfficientNetV2: Smaller Models and Faster Training)

    Duplicate of the recent list

  70. DeepL Translator (deepl.com)
    https://www.deepl.com/en/translator