
20220112_AI Study Session

M.Inomata
January 11, 2022

Transcript

  1. Latest AI Papers Reading Group, January 2022
    tech vein Inc., Mitsuhiro Inomata

  2. Self-Introduction
    Mitsuhiro Inomata

    tech vein Inc. / DeepRad Inc.

    Representative Director of each company, and developer

    twitter: @ino2222

  3. Introducing Our Facebook Group
    https://www.facebook.com/groups/…

  4. Agenda
    Papers posted to arxiv.org in the past month, picked up from Arxiv Sanity (arxiv-sanity.com):

    • The paper that interested me most

    • Top 10 list of "top recent" papers

    • Top 10 list of "top hype" papers

  5. Arxiv Sanity?
    https://www.arxiv-sanity.com/top

  6. Contents

  7. Top 10 Recent
    1. Plenoxels: Radiance Fields without Neural Networks

    2. GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models

    3. Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks

    4. Masked Feature Prediction for Self-Supervised Visual Pre-Training

    5. Self-attention Does Not Need $O(n^2)$ Memory

    6. Mega-NeRF: Scalable Construction of Large-Scale NeRFs for Virtual Fly-Throughs

    7. Exploring the Equivalence of Siamese Self-Supervised Learning via A Unified Gradient Framework

    8. Improving language models by retrieving from trillions of tokens

    9. BEVT: BERT Pretraining of Video Transformers

    10. SLIP: Self-supervision meets Language-Image Pre-training

  8. Top 10 Hype
    1. Critical Sentence Identification in Legal Cases Using Multi-Class Classification

    2. Seeking Sinhala Sentiment: Predicting Facebook Reactions of Sinhala Posts

    3. Plenoxels: Radiance Fields without Neural Networks

    4. Show Your Work: Scratchpads for Intermediate Computation with Language Models

    5. BANMo: Building Animatable 3D Neural Models from Many Casual Videos

    6. GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models

    7. Efficient Geometry-aware 3D Generative Adversarial Networks (Pick-up)

    8. Player of Games

    9. NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation

    10. FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimization

  9. Top 10 Recent (by theme)
    1. Plenoxels: Radiance Fields without Neural Networks

    2. GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models

    3. Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks

    4. Masked Feature Prediction for Self-Supervised Visual Pre-Training

    5. Self-attention Does Not Need $O(n^2)$ Memory

    6. Mega-NeRF: Scalable Construction of Large-Scale NeRFs for Virtual Fly-Throughs

    7. Exploring the Equivalence of Siamese Self-Supervised Learning via A Unified Gradient Framework

    8. Improving language models by retrieving from trillions of tokens

    9. BEVT: BERT Pretraining of Video Transformers

    10. SLIP: Self-supervision meets Language-Image Pre-training

    Theme tags shown on the slide: CLIP ×2, NeRF ×2, Video ×2, NLP ×1, SSL ×3, Transformer ×4

  10. Top 10 Hype (by theme)
    1. Critical Sentence Identification in Legal Cases Using Multi-Class Classification

    2. Seeking Sinhala Sentiment: Predicting Facebook Reactions of Sinhala Posts

    3. Plenoxels: Radiance Fields without Neural Networks

    4. Show Your Work: Scratchpads for Intermediate Computation with Language Models

    5. BANMo: Building Animatable 3D Neural Models from Many Casual Videos

    6. GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models

    7. Efficient Geometry-aware 3D Generative Adversarial Networks

    8. Player of Games

    9. NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation

    10. FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimization

    Theme tags shown on the slide: CLIP ×2, NeRF ×3, Video ×1, NLP ×4, Transformer ×1, GAN ×2

  11. Pick up

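  12. Top Hype 7. Efficient Geometry-aware 3D Generative Adversarial Networks

    Unsupervised generation of high-quality, multi-view-consistent images and 3D shapes from single-view 2D
    photographs has been a long-standing challenge. Existing 3D GANs are either compute-intensive or make
    approximations that are not 3D-consistent. The former limits the quality and resolution of the generated images,
    and the latter hurts multi-view consistency and shape quality. In this work, we improve the computational
    efficiency and image quality of 3D GANs without relying too heavily on these approximations. To this end, we
    introduce an expressive hybrid explicit-implicit network architecture. Together with other design choices, it
    synthesizes high-resolution, multi-view-consistent images in real time and also produces high-quality 3D shapes.
    By decoupling feature generation from neural rendering, our framework can leverage state-of-the-art 2D CNN
    generators such as StyleGAN2 and inherit their efficiency and expressiveness. Experiments on FFHQ and AFHQ Cats,
    among others, demonstrate state-of-the-art 3D-aware synthesis.
    • Purpose: improve GANs that generate 3D-aware images from 2D photographs
    • Result: a state-of-the-art GAN that generates images by processing human and animal face photos in 3D
    • Method: StyleGAN2 + volume renderer
    • Name: EG3D (Efficient Geometry-aware 3D Generative Adversarial Networks)
    • Author affiliations: Stanford University, NVIDIA
    http://arxiv.org/abs/2112.07945v1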

  14. Demo video (see URL)
    https://matthew-a-chan.github.io/EG3D

  15. EG3D GAN Framework

    (based on StyleGAN2)

  16. Tri-Plane: from images (or features) seen from three directions,
    it estimates the density and color at each coordinate, as NeRF does
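
The tri-plane lookup on this slide can be sketched in plain Python. This is a minimal illustration, not EG3D's code. It assumes that each plane is a small grid of feature vectors indexed `plane[row][col]`, that point coordinates are normalized to [0, 1], and that the three bilinearly sampled features are summed. The lightweight decoder that turns the aggregated feature into density and color is omitted. `bilinear` and `sample_triplane` are hypothetical helper names.

```python
from typing import List, Sequence

Plane = List[List[List[float]]]  # plane[row][col] -> feature vector


def bilinear(plane: Plane, u: float, v: float) -> List[float]:
    """Bilinearly sample a feature plane at normalized coordinates u, v in [0, 1]."""
    h, w = len(plane), len(plane[0])
    x, y = u * (w - 1), v * (h - 1)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    out = []
    for c in range(len(plane[0][0])):
        top = plane[y0][x0][c] * (1 - fx) + plane[y0][x1][c] * fx
        bottom = plane[y1][x0][c] * (1 - fx) + plane[y1][x1][c] * fx
        out.append(top * (1 - fy) + bottom * fy)
    return out


def sample_triplane(planes: Sequence[Plane], point: Sequence[float]) -> List[float]:
    """Project (x, y, z) onto the xy, xz and yz planes, sample each, and sum the features."""
    x, y, z = point
    coords = [(x, y), (x, z), (y, z)]
    feats = [bilinear(plane, a, b) for plane, (a, b) in zip(planes, coords)]
    return [sum(channel) for channel in zip(*feats)]


ones = [[[1.0], [1.0]], [[1.0], [1.0]]]
print(sample_triplane([ones, ones, ones], (0.5, 0.5, 0.5)))  # [3.0]
```

Projecting onto three planes makes memory grow quadratically with resolution rather than cubically. This is the main attraction compared with a dense voxel grid.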

  17. [Review] NeRF
    https://arxiv.org/abs/…
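
NeRF turns the densities and colors sampled along a camera ray into a pixel color by alpha compositing:

$C = \sum_i T_i (1 - e^{-\sigma_i \delta_i}) c_i$, with $T_i = \prod_{j<i} e^{-\sigma_j \delta_j}$.

The sketch below evaluates that sum for one ray. It assumes that the per-sample densities `sigmas`, RGB `colors` and segment lengths `deltas` are already given; in NeRF they come from the MLP. `render_ray` is an illustrative name.

```python
import math
from typing import List, Sequence, Tuple

RGB = Tuple[float, float, float]


def render_ray(sigmas: Sequence[float], colors: Sequence[RGB],
               deltas: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Composite samples front to back; returns the pixel RGB and per-sample weights."""
    transmittance = 1.0  # T_i: fraction of light that reaches sample i unblocked
    rgb = [0.0, 0.0, 0.0]
    weights = []
    for sigma, color, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        weight = transmittance * alpha
        weights.append(weight)
        for k in range(3):
            rgb[k] += weight * color[k]
        transmittance *= 1.0 - alpha  # equals exp(-sigma * delta)
    return rgb, weights


# An empty sample followed by an opaque green one: the pixel is green.
rgb, weights = render_ray([0.0, 1e6], [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)], [0.5, 0.5])
print(rgb)  # [0.0, 1.0, 0.0]
```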

  18. NeRF vs Tri-Plane

    Tri-Plane is lighter and higher-performing than Mip-NeRF

  19. Model size and speed: NeRF vs. Tri-Plane

    (SSO: single-scene overfitting to images from three directions; GAN: StyleGAN feature data for the three directions)

  20. After generating a low-resolution image,
    a super-resolution module raises the resolution

  21. Fast and high-performing.

    26-36 FPS on an RTX 3090 GPU

  22. Top recent: Best10

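  23. 1. Plenoxels: Radiance Fields without Neural Networks

    We introduce Plenoxels (plenoptic voxels), a system for photorealistic view synthesis. Plenoxels represent a
    scene as a sparse 3D grid with spherical harmonics. This representation can be optimized from calibrated
    images with gradient methods and regularization, without any neural components. On standard benchmark tasks,
    Plenoxels are optimized two orders of magnitude faster than Neural Radiance Fields, with no loss in visual
    quality.
    • Purpose: improve view-synthesis systems
    • Result: a model with the same accuracy as NeRF that trains about two orders of magnitude faster
    • Method: re-implemented the NeRF algorithm without a neural network (TV regularization, spherical harmonics)
    • Name: Plenoxels (plenoptic voxels)
    • Author affiliation: UC Berkeley
    http://arxiv.org/abs/2112.05131v1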

  24. Plenoxels: gains speed by handling
    radiance fields without a neural network
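
Plenoxels replace NeRF's MLP with values stored on a voxel grid: a density plus spherical-harmonic color coefficients. These values are interpolated trilinearly at each sample point. Below is a minimal sketch of that lookup for one scalar channel. For simplicity it assumes a dense grid and index-space coordinates, whereas the actual system uses a sparse grid. `trilinear` is a hypothetical helper name.

```python
from typing import List, Sequence

Grid3 = List[List[List[float]]]  # grid[i][j][k]: one scalar per voxel corner


def trilinear(grid: Grid3, p: Sequence[float]) -> float:
    """Trilinearly interpolate a dense scalar grid at a continuous index p = (x, y, z)."""
    sizes = (len(grid), len(grid[0]), len(grid[0][0]))
    corner, frac = [], []
    for coord, size in zip(p, sizes):
        i0 = min(int(coord), size - 2)  # lower corner, kept inside the grid
        corner.append(i0)
        frac.append(coord - i0)
    (i, j, k), (fx, fy, fz) = corner, frac
    value = 0.0
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                w = ((fx if di else 1.0 - fx)
                     * (fy if dj else 1.0 - fy)
                     * (fz if dk else 1.0 - fz))
                value += w * grid[i + di][j + dj][k + dk]
    return value


# A grid storing i + j + k is linear, so interpolation reproduces it exactly.
g3 = [[[float(i + j + k) for k in range(3)] for j in range(3)] for i in range(3)]
print(trilinear(g3, (1.5, 0.25, 2.0)))  # 3.75
```

The same interpolation would be applied to every stored channel. The interpolated density and colors then feed the same ray compositing that NeRF uses. Optimization therefore adjusts grid values directly (plus a regularizer), and there are no network weights.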

  28. Published on GitHub
    Setup reference article: https://pythonrepo.com/repo/sxyu-svox2-python-deep-learning
    https://github.com/sarafridov/plenoxels

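  29. 2. GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models

    Diffusion models have recently been shown to generate high-quality synthetic images, especially when paired
    with a guidance technique that trades off diversity for fidelity. We explore diffusion models for the problem
    of text-conditional image synthesis and compare two guidance strategies: CLIP guidance and classifier-free
    guidance. We find that human evaluators prefer the latter for both photorealism and caption similarity, and
    that it often produces photorealistic samples. Human evaluators also prefer samples from a 3.5-billion-parameter
    text-conditional diffusion model with classifier-free guidance over samples from DALL-E. Additionally, we find
    that the model can be fine-tuned to perform image inpainting, enabling powerful text-driven image editing.
    We train a smaller model on a filtered dataset and release the code and weights at
    https://github.com/openai/glide-text2im.
    • Purpose: improve text-to-image models
    • Result: a high-performance model that can generate images from text and edit parts of them

      The full-size model is too capable to release safely, so only a smaller model, trained on a dataset
      filtered to prevent misuse, was published.

    • Method: CLIP + Diffusion Model
    • Name: GLIDE (Guided Language to Image Diffusion for Generation and Editing)
    • Author affiliation: OpenAI
    http://arxiv.org/abs/2112.10741v2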
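
Human evaluators preferred classifier-free guidance in GLIDE. It runs the same diffusion model with and without the caption and extrapolates between the two predictions:

$\hat\epsilon = \epsilon(x_t \mid \varnothing) + s\,(\epsilon(x_t \mid c) - \epsilon(x_t \mid \varnothing))$.

Below is a minimal sketch. It assumes a hypothetical `model(x_t, t, caption)` callable that returns a noise prediction and accepts `None` for the empty caption. `toy_model` is a stand-in, not a real network.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical noise predictor: (x_t, t, caption or None) -> predicted noise
EpsModel = Callable[[Sequence[float], int, Optional[str]], List[float]]


def guided_eps(model: EpsModel, x_t: Sequence[float], t: int,
               caption: str, scale: float) -> List[float]:
    """Classifier-free guidance: eps_uncond + scale * (eps_cond - eps_uncond)."""
    eps_cond = model(x_t, t, caption)  # prediction conditioned on the text
    eps_uncond = model(x_t, t, None)  # same model with the caption dropped
    return [u + scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]


def toy_model(x_t: Sequence[float], t: int, caption: Optional[str]) -> List[float]:
    """Stand-in model: predicts 1.0 everywhere with a caption, 0.0 without."""
    return [1.0 if caption else 0.0 for _ in x_t]


print(guided_eps(toy_model, [0.0, 0.0], t=10, caption="a corgi", scale=3.0))  # [3.0, 3.0]
```

With `scale = 1` this reduces to the ordinary conditional prediction. Values above 1 push samples toward the caption at the cost of diversity, which is the trade-off mentioned in the abstract.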

  30. Diffusion Model

    A data-generation model that learns to predict the noise added at each discrete step. It can be applied to
    multiple tasks without task-specific training.
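
The training signal behind "learning to predict noise" works as follows:

1. Pick a step t.
2. Corrupt the clean data with Gaussian noise according to a schedule.
3. Train the model to recover that noise.

The sketch below assumes the DDPM-style linear beta schedule (1e-4 to 0.02) purely for illustration. GLIDE's actual schedule and parameterization may differ, and the noise predictor itself is left out.

```python
import math
import random
from typing import List, Sequence, Tuple


def alpha_bars(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Cumulative products of (1 - beta_t) for a linear beta schedule (num_steps >= 2)."""
    out, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        out.append(prod)
    return out


def noisy_sample(x0: Sequence[float], t: int, abar: Sequence[float],
                 rng: random.Random) -> Tuple[List[float], List[float]]:
    """Draw x_t ~ N(sqrt(abar_t) * x_0, (1 - abar_t) * I); return (x_t, eps)."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a = abar[t]
    x_t = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return x_t, eps


def eps_mse(eps: Sequence[float], eps_pred: Sequence[float]) -> float:
    """Training loss: mean squared error between the true and predicted noise."""
    return sum((a - b) ** 2 for a, b in zip(eps, eps_pred)) / len(eps)


abar = alpha_bars(1000)
x_t, eps = noisy_sample([1.0, -2.0], t=500, abar=abar, rng=random.Random(0))
# A trained model receives (x_t, 500) and is optimized so that eps_mse(eps, prediction) is small.
```

Sampling runs the learned denoiser backwards from pure noise, one step at a time. Because generation is an iterative process, guidance and inpainting masks can be applied at each step.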

  31. CLIP

    Scores how well an image matches a text; used to steer text-to-image generation
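
CLIP learns an image encoder and a text encoder whose embeddings agree for matching pairs. Training uses a symmetric contrastive loss over a batch. Below is a minimal sketch on precomputed embeddings. The encoders are omitted, and the temperature is fixed at 0.07 as an assumption (CLIP learns it).

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def clip_loss(img: Sequence[Sequence[float]], txt: Sequence[Sequence[float]],
              temperature: float = 0.07) -> float:
    """Symmetric InfoNCE over a batch: (image i, text i) is the positive pair."""
    img_n = [normalize(v) for v in img]
    txt_n = [normalize(v) for v in txt]
    n = len(img_n)
    logits = [[sum(a * b for a, b in zip(img_n[i], txt_n[j])) / temperature
               for j in range(n)] for i in range(n)]

    def xent(rows: List[List[float]]) -> float:
        # mean cross-entropy with the diagonal entry as the correct class
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            lse = m + math.log(sum(math.exp(x - m) for x in row))
            total += lse - row[i]
        return total / len(rows)

    cols = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (xent(logits) + xent(cols))


imgs = [[1.0, 0.0], [0.0, 1.0]]
print(clip_loss(imgs, [[1.0, 0.0], [0.0, 1.0]]) < clip_loss(imgs, [[0.0, 1.0], [1.0, 0.0]]))  # True
```

At inference, the same cosine similarity serves as the "CLIP score". GLIDE's CLIP guidance and FuseDream's latent-space optimization both build on it.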

  32. GLIDE=CLIP + Diffusion Model

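  34. Text-to-image generation +

    text-driven partial editing
    <Image generation> A cozy living room
    <Masked edit> A painting of a corgi on the wall above the sofa
    <Masked edit> A round coffee table in front of the sofa
    <Masked edit> A vase of flowers on the coffee table
    <Masked edit> The sofa is in the corner of the room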

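  36. 3. Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks

    Biological intelligence systems in animals perceive the world by integrating information from different
    modalities and processing it simultaneously for various tasks. Current machine learning research, by contrast,
    follows a task-specific paradigm. This leads to inefficient collaboration between tasks and a high marginal
    cost of developing perception models for new tasks. In this paper, we present a generic perception architecture
    named Uni-Perceiver, which processes a variety of modalities and tasks with unified modeling and shared
    parameters. Specifically, Uni-Perceiver encodes different task inputs and targets from arbitrary modalities into
    a unified representation space. It does so with a modality-agnostic Transformer encoder and lightweight
    modality-specific tokenizers. Different perception tasks are modeled with the same formulation: for each input,
    find the maximum-likelihood target through the similarity of their representations. The model is pre-trained on
    several uni-modal and multi-modal tasks. It is then evaluated on a variety of downstream tasks, including novel
    tasks that did not appear during pre-training. Results show that the pre-trained model achieves reasonable
    performance even on novel tasks without any tuning. Prompt tuning on 1% of the downstream task data improves
    performance to a level close to state-of-the-art methods. Fine-tuning on the full data delivers results on par
    with or better than state-of-the-art methods. Code will be released.
    • Purpose: improve multi-task training and use of Transformers
    • Result: Uni-Perceiver, a Transformer that supports diverse input and output formats
    • Method: Transformer + modality-specific tokenizers
    • Name: Uni-Perceiver
    • Author affiliations: SenseTime Research, Xi'an Jiaotong University, The Chinese University of Hong Kong
    http://arxiv.org/abs/2112.01522v1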
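
Uni-Perceiver casts every task as picking the target whose representation is most similar to the input's. Assume the shared encoder has already embedded the input and each candidate target (class names, captions, and so on). The decision step then reduces to the sketch below. Cosine similarity and the helper names are assumptions for illustration.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def predict(input_emb: Sequence[float], candidate_embs: Sequence[Sequence[float]]) -> int:
    """Return the index of the candidate target most similar to the input."""
    return max(range(len(candidate_embs)), key=lambda i: cosine(input_emb, candidate_embs[i]))


# Hypothetical embeddings for an input and three candidate targets.
print(predict([1.0, 0.1], [[0.0, 1.0], [1.0, 0.0], [-1.0, 0.0]]))  # 1
```

Because a new task only needs a new set of candidate targets, the same model can be tried zero-shot on tasks it never saw in pre-training.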

  38. Images, text and video are tokenized into
    a format that can be processed in a common way

  39. A model pre-trained on multiple modalities.

    It can be tuned for, and used on, multiple tasks

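  40. 4. Masked Feature Prediction for Self-Supervised Visual Pre-Training

    We present Masked Feature Prediction (MaskFeat) for self-supervised pre-training of video models. The method
    first randomly masks out a portion of the input sequence and then predicts the features of the masked regions.
    We study five different types of features. Histograms of Oriented Gradients (HOG), a hand-crafted feature
    descriptor, works particularly well in terms of both performance and efficiency. We also observe that the local
    contrast normalization in HOG is essential for good results, in line with earlier work that used HOG for visual
    recognition. Our approach learns abundant visual knowledge and can drive large-scale Transformer-based models.
    MaskFeat is pre-trained on unlabeled videos without extra model weights or supervision. It achieves
    unprecedented results: 86.7% with MViT-L on Kinetics-400, 88.3% on Kinetics-600, 80.4% on Kinetics-700,
    38.8 mAP on AVA, and 75.0% on SSv2. MaskFeat also generalizes to image input, which can be interpreted as a
    video with a single frame, and obtains competitive results on ImageNet.
    • Purpose: learn masked-image completion on video
    • Result: MaskFeat, a self-supervised video learning method
    • Method: pre-train by directly regressing the features (HOG) of masked content
    • Name: MaskFeat (Masked Feature Prediction)
    • Author affiliations: Facebook AI Research, Johns Hopkins University
    http://arxiv.org/abs/2112.09133v1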

  41. Predicts the HOG (histogram of oriented gradients)
    of the masked video

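  42. Pixel prediction vs. HOG prediction

    Pixel prediction struggles to guess the colors of textures and to predict complex structures,

    whereas HOG deals only with features (local gradient structure), so it is easier to predict well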
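
MaskFeat's target for a masked region is its HOG descriptor rather than its raw pixels. Below is a minimal sketch of a single-cell HOG histogram and of a loss restricted to masked positions. It simplifies standard HOG: there is no bin interpolation, and each cell is normalized on its own instead of by block-wise local contrast normalization. `hog_cell` and `masked_l2` are hypothetical names.

```python
import math
from typing import List, Sequence


def hog_cell(patch: Sequence[Sequence[float]], bins: int = 9) -> List[float]:
    """L2-normalized histogram of unsigned gradient orientations (0-180 degrees) for one cell."""
    hist = [0.0] * bins
    for y in range(1, len(patch) - 1):
        for x in range(1, len(patch[0]) - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]  # central differences
            gy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // (180.0 / bins)) % bins] += magnitude  # vote by edge strength
    norm = math.sqrt(sum(v * v for v in hist)) + 1e-6
    return [v / norm for v in hist]


def masked_l2(pred: Sequence[float], target: Sequence[float], mask: Sequence[bool]) -> float:
    """Mean squared error over masked positions only (unmasked positions are ignored)."""
    errors = [(p - t) ** 2 for p, t, m in zip(pred, target, mask) if m]
    return sum(errors) / len(errors)


vertical_edge = [[0, 0, 1, 1]] * 4
hist = hog_cell(vertical_edge)
print(max(range(9), key=lambda i: hist[i]))  # 0 (purely horizontal gradient)
```

The histogram records only edge orientations weighted by strength. Two patches with different colors but the same structure therefore produce the same target, which is the intuition behind the slide above.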

  43. Video is masked in cubes spanning three dimensions
    (height, width + time) for training

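  45. 5. Self-attention Does Not Need $O(n^2)$ Memory

    We present a very simple algorithm for attention that requires O(1) memory with respect to sequence length,
    and an extension to self-attention that requires O(log n) memory. This contrasts with the frequently stated
    belief that self-attention requires O(n^2) memory. The time complexity is still O(n^2), but on modern
    accelerators device memory, rather than compute capability, is often the limiting factor. Reducing the memory
    requirements of attention therefore allows processing of longer sequences than might otherwise be feasible. We
    provide a practical implementation for accelerators that requires O(√n) memory, is numerically stable, and
    runs within a few percent of the runtime of the standard attention implementation. We also show how to
    differentiate the function while remaining memory-efficient. For sequence length 16384, the memory overhead of
    self-attention is reduced by 59x for inference and by 32x for differentiation.
    • Purpose: reduce the memory footprint of self-attention models
    • Result: reduced the memory requirement of self-attention from O(n^2) to O(√n)
    • Method: partially changed the memory-hungry part of the Transformer implementation
    • Name: none
    • Author affiliation: Google Research
    http://arxiv.org/abs/2112.05682v2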
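
The memory saving comes from never materializing the full row of attention scores. Keys and values are processed in chunks while a running maximum and running sums are kept, rescaled whenever a larger score appears. Below is a minimal single-query sketch in plain Python. It is not the paper's accelerator implementation, and the function names and `chunk` parameter are illustrative.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def naive_attention(q: Sequence[float], keys: Matrix, values: Matrix) -> List[float]:
    """Standard softmax attention for one query: materializes all n scores at once."""
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    return [sum(w * v[d] for w, v in zip(weights, values)) / z for d in range(len(values[0]))]


def chunked_attention(q: Sequence[float], keys: Matrix, values: Matrix,
                      chunk: int = 2) -> List[float]:
    """Same output, but only `chunk` scores are held in memory at a time."""
    dim = len(values[0])
    running_max = -math.inf
    denom = 0.0  # running sum of exp(score - running_max)
    acc = [0.0] * dim  # running sum of exp(score - running_max) * value
    for start in range(0, len(keys), chunk):
        scores = [sum(a * b for a, b in zip(q, k)) for k in keys[start:start + chunk]]
        new_max = max(running_max, max(scores))
        rescale = math.exp(running_max - new_max)  # 0.0 on the first chunk
        denom *= rescale
        acc = [a * rescale for a in acc]
        for s, v in zip(scores, values[start:start + chunk]):
            w = math.exp(s - new_max)
            denom += w
            for d in range(dim):
                acc[d] += w * v[d]
        running_max = new_max
    return [a / denom for a in acc]


q = [0.5, -1.0]
keys = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 0.5], [0.3, 0.3]]
values = [[1.0], [2.0], [3.0], [4.0], [5.0]]
print(abs(naive_attention(q, keys, values)[0] - chunked_attention(q, keys, values)[0]) < 1e-9)  # True
```

Only `chunk` scores exist at any moment, regardless of sequence length. In the paper, chunking the queries as well, with suitable chunk sizes, is what yields the O(√n) figure for the practical implementation.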

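  47. 6. Mega-NeRF: Scalable Construction of Large-Scale NeRFs for Virtual Fly-Throughs

    We explore how to use neural radiance fields (NeRFs) to build interactive 3D environments from large-scale
    visual captures, collected primarily by drones, that span buildings or even multiple city blocks. NeRFs have
    traditionally been evaluated on single-object scenes. This setting, in contrast, poses several challenges:
    (1) thousands of images with varying lighting conditions must be incorporated, each capturing only a small
    subset of the scene; (2) model capacity and ray-sampling requirements are prohibitively large, beyond what can
    be naively trained on a single GPU; and (3) the number of possible viewpoints is arbitrarily large, so it is
    infeasible to precompute all relevant information beforehand, as real-time NeRF renderers typically do. To
    address these challenges, we first analyze visibility statistics for large-scale scenes. This analysis
    motivates a sparse network structure whose parameters are specialized to different regions of the scene. We
    then introduce a simple geometric clustering algorithm. It partitions training images (or rather pixels) into
    different NeRF submodules that can be trained in parallel. We evaluate our approach on the Quad 6k and
    UrbanScene3D datasets and on scenes from our own drone footage. It improves PSNR by over 11% on average while
    training 3x faster. We then evaluate recent fast NeRF renderers on top of Mega-NeRF and introduce a novel
    method that exploits temporal coherence. Our technique achieves a 40x speedup over conventional NeRF rendering
    while staying within 0.5 dB in PSNR quality, exceeding the fidelity of existing fast renderers.
    • Purpose: build interactive, large-scale NeRF 3D environments
    • Result: a NeRF for large-scale spaces that trains 3x faster with better quality (PSNR +11% on average)
    • Method: partition space (e.g., in drone-footage datasets) and train NeRF submodules in parallel
    • Name: Mega-NeRF
    • Author affiliations: Carnegie Mellon University, Argo AI (a Pittsburgh-based autonomous-driving startup)

    http://arxiv.org/abs/2112.10703v1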
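
The geometric clustering step can be sketched under the following assumptions:

- The submodule centroids are fixed in advance.
- Each centroid owns the region of space closest to it.
- A training pixel is assigned to every submodule whose region its ray passes through between the near and far bounds.

The function names and the uniform sampling along the ray are illustrative choices, not Mega-NeRF's code.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Vec = Tuple[float, float, float]


def nearest_centroid(p: Sequence[float], centroids: Sequence[Vec]) -> int:
    """Index of the centroid closest to point p (a Voronoi-cell lookup)."""
    return min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))


def partition_pixels(rays: Sequence[Tuple[Vec, Vec]], centroids: Sequence[Vec],
                     near: float, far: float, samples: int = 32) -> Dict[int, List[int]]:
    """Assign each pixel (ray origin, direction) to every submodule its ray passes through."""
    buckets: Dict[int, List[int]] = {i: [] for i in range(len(centroids))}
    for pixel, (origin, direction) in enumerate(rays):
        touched: Set[int] = set()
        for s in range(samples):
            t = near + (far - near) * s / (samples - 1)
            point = tuple(o + t * d for o, d in zip(origin, direction))
            touched.add(nearest_centroid(point, centroids))
        for c in touched:
            buckets[c].append(pixel)
    return buckets


centroids = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
rays = [((-2.0, 0.0, 0.0), (0.0, 1.0, 0.0)),  # stays in the left cell
        ((-2.0, 0.0, 0.0), (1.0, 0.0, 0.0)),  # crosses both cells
        ((2.0, 0.0, 0.0), (0.0, 0.0, 1.0))]   # stays in the right cell
print(partition_pixels(rays, centroids, near=0.0, far=4.0))  # {0: [0, 1], 1: [1, 2]}
```

Each bucket can then train its own NeRF submodule on a separate GPU, which is where the parallel training speedup comes from.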

  48. Differences in training methods among NeRF, NeRF++ and Mega-NeRF

  51. Accuracy ↑  Training speed ↑↑↑

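  52. 7. Exploring the Equivalence of Siamese Self-Supervised Learning via A Unified Gradient Framework

    Self-supervised learning has shown great potential for extracting powerful visual representations without
    human annotations. Various works approach self-supervised learning from different perspectives:
    (1) contrastive learning methods (e.g., MoCo, SimCLR) use both positive and negative samples to guide the
    training direction; (2) asymmetric network methods (e.g., BYOL, SimSiam) get rid of negative samples by
    introducing a predictor network and the stop-gradient operation; (3) feature decorrelation methods (e.g.,
    Barlow Twins, VICReg) instead aim to reduce the redundancy between feature dimensions. Because of their
    different motivations, these methods appear to have quite different loss functions. Their final accuracies
    also vary, and different works use different networks and tricks. In this work, we show that these methods can
    be unified into the same form. Instead of comparing their loss functions, we derive a unified formula through
    gradient analysis. We then conduct fair and detailed experiments to compare their performance. There is little
    gap between these methods, and the use of a momentum encoder is the key factor for boosting performance. From
    this unified framework we propose UniGrad, a simple but effective gradient form for self-supervised learning.
    It requires neither a memory bank nor a predictor network, yet achieves state-of-the-art performance and
    easily adopts other training strategies. Extensive experiments on linear evaluation and many downstream tasks
    also show its effectiveness. Code will be released.
    • Purpose: unify self-supervised learning algorithms
    • Method: analyze each self-supervised learning method
    • Result: derived a common formula for self-supervised learning, with accuracy nearly equal to existing methods
    • Name: UniGrad
    • Author affiliations: Tsinghua University, SenseTime Research, Zhejiang University, Beijing Academy of Artificial Intelligence
    http://arxiv.org/abs/2112.05141v1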

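  53. The main differences are:
    • similarity computation for positive samples
    • similarity computation for negative samples
    • loss computation
    → UniGrad formulates them in a single form.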

  54. Across the methods,

    performance (linear eval) is nearly the same

  55. For every method, adding a momentum
    encoder improved performance by about +2%

  56. Building on the unified form, adding data augmentation

    lets it surpass existing methods

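  57. 8. Improving language models by retrieving from trillions of tokens

    We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus,
    based on local similarity with preceding tokens. With a 2-trillion-token database, our Retrieval-Enhanced
    Transformer (RETRO) performs comparably to GPT-3 and Jurassic-1 on the Pile, despite using 25x fewer
    parameters. After fine-tuning, RETRO's performance carries over to downstream knowledge-intensive tasks such as
    question answering. RETRO combines a frozen BERT retriever, a differentiable encoder and a chunked
    cross-attention mechanism. Together, these let it predict tokens based on an order of magnitude more data than
    is typically consumed during training. We typically train RETRO from scratch, but we can also rapidly RETROfit
    pre-trained transformers with retrieval and still achieve good performance. Our work opens up new avenues for
    improving language models through explicit memory at unprecedented scale.
    • Purpose: improve auto-regressive language models such as GPT-3 and Jurassic-1
    • Result: implemented RETRO, which matches existing state-of-the-art language models with 25x fewer parameters
    • Method: frozen BERT retriever, differentiable encoder, chunked cross-attention
    • Name: RETRO (Retrieval-Enhanced Transformer)
    • Author affiliation: DeepMind
    http://arxiv.org/abs/2112.04426v1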
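
RETRO splits the input into fixed-size chunks (64 tokens in the paper). For each chunk, it retrieves the nearest-neighbour chunks from the database by comparing frozen BERT embeddings. Below is a toy sketch of that retrieval step. It assumes bag-of-words count vectors with cosine similarity as a stand-in for BERT embeddings, and brute-force search instead of an approximate nearest-neighbour index. The chunk size of 4 is only for the example.

```python
import math
from collections import Counter
from typing import List, Sequence


def split_chunks(tokens: Sequence[str], size: int) -> List[List[str]]:
    """Split a token sequence into consecutive chunks of `size` tokens."""
    return [list(tokens[i:i + size]) for i in range(0, len(tokens), size)]


def embed(chunk: Sequence[str]) -> Counter:
    """Toy stand-in for a frozen BERT embedding: bag-of-words counts."""
    return Counter(chunk)


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(tokens: Sequence[str], database: Sequence[Sequence[str]],
             size: int = 4, k: int = 1) -> List[List[Sequence[str]]]:
    """Return, for each input chunk, its k most similar database chunks."""
    db_embeddings = [embed(c) for c in database]
    neighbours = []
    for chunk in split_chunks(tokens, size):
        e = embed(chunk)
        ranked = sorted(range(len(database)), key=lambda i: -cosine(e, db_embeddings[i]))
        neighbours.append([database[i] for i in ranked[:k]])
    return neighbours


db = [["the", "cat", "sat", "down"], ["stock", "prices", "fell", "today"]]
print(retrieve("the cat sat on stock prices rose again".split(), db))
# [[['the', 'cat', 'sat', 'down']], [['stock', 'prices', 'fell', 'today']]]
```

In RETRO the retrieved chunks are then encoded and consumed through chunked cross-attention. That part of the model is not shown.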

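  58. Left: number of parameters vs. performance (RETRO[OFF], i.e., zero retrieval, matches the baseline)

    Center: number of retrieved tokens vs. performance

    Right: number of nearest neighbours vs. performance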

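  59. 9. BEVT: BERT Pretraining of Video Transformers

    This paper studies BERT pretraining of video Transformers. Given the recent success of BERT pretraining for
    image Transformers, this is a straightforward but worthwhile extension to study. We introduce BEVT, which
    decouples video representation learning into spatial representation learning and temporal dynamics learning.
    Specifically, BEVT first performs masked image modeling on image data. It then performs masked image modeling
    jointly with masked video modeling on video data. This design is motivated by two observations.
    1) Transformers learned on image data provide decent spatial priors. These priors can ease the learning of
    video Transformers, which are often computationally intensive when trained from scratch. 2) The discriminative
    clues needed for correct predictions, i.e., spatial and temporal information, vary among videos because of
    large intra-class and inter-class variations. We conduct extensive experiments on three challenging video
    benchmarks, where BEVT achieves very promising results. On Kinetics 400, recognition mostly relies on
    discriminative spatial representations, and BEVT achieves results comparable to strong supervised baselines.
    Something-Something-V2 and Diving 48 contain videos that rely on temporal dynamics. On these, BEVT clearly
    outperforms all alternative baselines and achieves state-of-the-art top-1 accuracies of 70.6% and 86.7%,
    respectively.
    • Purpose: BERT-style pre-training of video Transformers
    • Result: top accuracy on Something-Something-V2 and Diving 48
    • Method: jointly train an image-masking BERT model and a video-masking BERT model
    • Name: BEVT
    • Author affiliations: School of Computer Science, Fudan University; Shanghai Key Lab of Intelligent Information Processing; Microsoft Cloud + AI
    http://arxiv.org/abs/2112.01529v1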

  60. Conceptual diagram of the model architecture

  61. BEVT framework

  62. Top-1 on Something-Something-V2 and Diving 48

  63. …or so I thought, but on Something-Something-V2,

    BEVT actually lost to MViT-L + MaskFeat (70.6% vs. 75.0%).

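  64. 10. SLIP: Self-supervision meets Language-Image Pre-training

    Recent work has shown that self-supervised pre-training outperforms supervised learning on challenging visual
    recognition tasks. CLIP, a new approach to learning with language supervision, has shown promising performance
    on a wide variety of benchmarks. In this work, we explore whether self-supervised learning can aid the use of
    language supervision for visual representation learning. We introduce SLIP, a multi-task learning framework
    that combines self-supervised learning and CLIP pre-training. After pre-training with Vision Transformers, we
    thoroughly evaluate representation quality. We compare performance against both CLIP and self-supervised
    learning in three settings: zero-shot transfer, linear classification, and end-to-end fine-tuning. On ImageNet
    and other datasets, SLIP improves accuracy by a large margin. We further validate this result with experiments
    on different model sizes, training schedules, and pre-training datasets. SLIP enjoys the benefits of both
    self-supervision (+8.1% linear accuracy) and language supervision (+5.2% zero-shot accuracy).
    • Purpose: investigate whether image self-supervised learning also benefits CLIP's language-supervised learning
    • Method: CLIP + self-supervised learning (SimCLR)
    • Result: zero-shot accuracy improved
    • Name: SLIP
    • Author affiliations: UC Berkeley, Facebook AI Research (FAIR)
    http://arxiv.org/abs/2112.12750v1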

  68. Top hype: Best10

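  69. 1. Critical Sentence Identification in Legal Cases Using Multi-Class Classification

    The legal domain inherently contains a vast amount of data in text format. Meeting the domain's analytical
    needs therefore requires natural language processing (NLP). Advances in NLP are spreading into various domains,
    including law, in the form of practical applications and academic research. For legal professionals,
    identifying the critical sentences, facts, and arguments in a legal case is a tedious task. In this research,
    we explore the use of sentence embeddings for multi-class classification. The goal is to identify the critical
    sentences in a legal case from the perspective of the main parties to the case. In addition, we define a
    task-specific loss function to improve on the accuracy achievable with plain categorical cross-entropy loss.
    • Purpose: support legal professionals in analyzing case law
    • Result: implemented a model that classifies critical sentences in court cases into multiple classes
    • Method: BERT + a custom loss function
    • Name: Semantic Similarity Score (SSS)
    • Author affiliation: University of Moratuwa
    http://arxiv.org/abs/2111.05721v2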
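
The abstract's baseline loss is categorical cross-entropy over sentence classes. The sketch below writes it out and adds optional per-class weights. The weights only illustrate one way to tailor a loss to a task. They are an assumption, not the paper's task-specific loss, and the class indices are hypothetical.

```python
import math
from typing import Optional, Sequence


def categorical_cross_entropy(logits: Sequence[Sequence[float]], labels: Sequence[int],
                              class_weights: Optional[Sequence[float]] = None) -> float:
    """Mean (optionally class-weighted) cross-entropy over a batch of sentences."""
    total, norm = 0.0, 0.0
    for row, y in zip(logits, labels):
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))  # stable log-sum-exp
        w = 1.0 if class_weights is None else class_weights[y]
        total += w * (log_z - row[y])  # negative log-probability of the true class
        norm += w
    return total / norm


print(categorical_cross_entropy([[0.0, 0.0, 0.0]], [1]))  # about 1.0986, i.e. log(3)
```

With weights, mistakes on an up-weighted class cost more. This is a common way to compensate when one class of sentences is rare.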

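  71. 2. Seeking Sinhala Sentiment: Predicting Facebook Reactions of Sinhala Posts

    On Facebook, users can record their reactions to text through a typology of emotions. Because the network is
    so large, it has become a major source of annotated sentiment data. This paper uses millions of such reactions,
    drawn from a decade of Facebook posts centered on the Sri Lankan context. It uses them to model an "eye of the
    beholder" approach to sentiment detection for online Sinhala text. Three types of sentiment-analysis models are
    built: one using a limited subset of reactions, one using all reactions, and one that derives a
    positive/negative star-rating value. We then compute and discuss how well these models capture the reactions of
    observers. The analysis shows that, for Sinhala content, binary classification of reactions is significantly
    more accurate than the other approaches. Furthermore, including similar reactions hinders accurate prediction of
    the other reactions.
    • Purpose: an experimental attempt to close the resource gap for Sinhala, a low-resource regional language
    • Result: implemented an emotion-prediction model for Sinhala
    • Method: use a raw corpus of Sinhala Facebook posts as training data
    • Name: none
    • Author affiliations: University of Moratuwa, LIRNEasia
    http://arxiv.org/abs/2112.00468v1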

  73. 3. Plenoxels: Radiance Fields without Neural Networks

    Duplicate (see Top Recent #1)
    http://arxiv.org/abs/2112.05131v1

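  74. 4. Show Your Work: Scratchpads for Intermediate Computation with Language Models

    Large pre-trained language models perform remarkably well on tasks that can be done "in one pass", such as
    generating realistic text or synthesizing computer programs. However, they struggle with tasks that require
    unbounded multi-step computation, such as adding integers or executing programs. Surprisingly, these same
    models can perform complex multi-step computations, even in the few-shot regime. This works when they are asked
    to perform the operation "step by step" and show the results of intermediate computations. In particular, we
    train Transformers to perform multi-step computations by having them emit intermediate computation steps into
    a "scratchpad". We test a series of increasingly complex tasks, from long addition to the execution of arbitrary
    programs. On these tasks, scratchpads dramatically improve the ability of language models to perform
    multi-step computations.
    • Purpose: improve learning of computational tasks, such as executing programs
    • Method: train a Transformer to write intermediate results into a scratchpad, one step at a time
    • Result: code the model fails to execute when predicting the result in one go can be executed correctly step by step
    • Name: none
    • Author affiliations: MIT, Google Research (Brain Team, Blueshift Team)
    http://arxiv.org/abs/2112.00114v1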

  75. Examples of direct training vs. step-by-step (scratchpad) training

  76. Example: arithmetic (addition)
    • Show the result of adding digit by digit (C is the carry)
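
The addition example can be mimicked with a small generator of step-by-step traces. It writes one line per digit, right to left, and shows the carry C. The trace format is an illustrative assumption, not the exact scratchpad format used in the paper.

```python
from typing import List


def addition_scratchpad(a: int, b: int) -> List[str]:
    """Digit-by-digit addition trace for non-negative integers, right to left."""
    xs, ys = str(a)[::-1], str(b)[::-1]
    carry, digits, lines = 0, [], []
    for i in range(max(len(xs), len(ys))):
        x = int(xs[i]) if i < len(xs) else 0
        y = int(ys[i]) if i < len(ys) else 0
        s = x + y + carry
        digits.append(str(s % 10))
        lines.append(f"{x} + {y} + carry {carry} = {s}: write {s % 10}, C = {s // 10}")
        carry = s // 10
    if carry:
        digits.append(str(carry))  # final carry becomes the leading digit
    lines.append("result: " + "".join(reversed(digits)))
    return lines


print("\n".join(addition_scratchpad(29, 57)))
# 9 + 7 + carry 0 = 16: write 6, C = 1
# 2 + 5 + carry 1 = 8: write 8, C = 0
# result: 86
```

With targets like these, each line requires only one small, local computation, rather than producing the whole sum at once.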

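  77. Example: arithmetic (polynomial evaluation)
    • Show the result of substituting x into each term and evaluating it

    • Compared with direct training, the fine-tuned result improved from 31.8% to 50.7%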

  78. Example: Python programs
    • Compared with direct training, the fine-tuned result improved from 20% to 41.5%

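  79. 5. BANMo: Building Animatable 3D Neural Models from Many Casual Videos

    Prior work on articulated 3D shape reconstruction often relies on specialized sensors (e.g., synchronized
    multi-camera systems) or pre-built 3D deformable models (e.g., SMAL or SMPL). Such methods cannot scale to
    diverse sets of objects in the wild. BANMo needs neither a specialized sensor nor a pre-defined template shape.
    Using a differentiable rendering framework, it builds high-fidelity, articulated 3D models from many monocular
    casual videos, including shape and animatable skinning weights. Many videos provide more coverage of camera
    views and object articulations. However, they make it hard to establish correspondences across scenes with
    different backgrounds, lighting conditions, and so on. Our key insight is to merge three schools of thought:
    (1) classic deformable shape models that use articulated bones and blend skinning, (2) volumetric neural
    radiance fields (NeRFs) that are amenable to gradient-based optimization, and (3) canonical embeddings that
    generate correspondences between pixels and an articulated model. We introduce neural blend skinning models
    that allow for differentiable and invertible articulated deformations. Combined with canonical embeddings,
    these models establish dense correspondences across videos that can be self-supervised with cycle consistency.
    On real and synthetic data, BANMo produces higher-fidelity 3D reconstructions of humans and animals than prior
    work. It can also render realistic images from novel viewpoints and poses. Project webpage: banmo-www.github.io
    • Purpose: create 3D data from ordinary videos
    • Result: developed a model that creates 3D data of humans and quadrupeds (such as cats)
    • Method: DensePose CSE training on multiple videos of the same person or animal
    • Name: BANMo
    • Author affiliations: Meta AI, Carnegie Mellon University, Meta Reality Labs
    http://arxiv.org/abs/2112.12761v2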

  80. Creates animatable 3D data from ordinary video
    footage, without special cameras

  83. The more video frames it learns from,
    the more accurate the 3D model becomes
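
BANMo builds on classic blend skinning: a point moves by a weighted average of per-bone rigid transforms,

$x' = \sum_b w_b (R_b x + t_b)$.

Below is a minimal sketch of that classic part. It assumes skinning weights given by a softmax over negative squared distances to bone centers. BANMo's neural blend skinning is more elaborate, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence

Vec3 = Sequence[float]
Mat3 = Sequence[Sequence[float]]


def skinning_weights(x: Vec3, centers: Sequence[Vec3], temperature: float = 1.0) -> List[float]:
    """Softmax over negative squared distances to bone centers (isotropic simplification)."""
    logits = [-sum((a - c) ** 2 for a, c in zip(x, ctr)) / temperature for ctr in centers]
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def blend_skin(x: Vec3, rotations: Sequence[Mat3], translations: Sequence[Vec3],
               weights: Sequence[float]) -> List[float]:
    """Linear blend skinning: x' = sum_b w_b * (R_b @ x + t_b), weights summing to 1."""
    out = [0.0, 0.0, 0.0]
    for rot, t, w in zip(rotations, translations, weights):
        for i in range(3):
            out[i] += w * (sum(rot[i][j] * x[j] for j in range(3)) + t[i])
    return out


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(blend_skin((1.0, 2.0, 3.0), [identity, identity],
                 [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)], [0.5, 0.5]))  # [1.5, 2.5, 3.0]
```

Every step is differentiable in the weights and transforms. This is why a skinning model like this can be optimized jointly with a NeRF through a differentiable renderer.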

  84. 6. GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models

    Duplicate (see Top Recent #2)
    http://arxiv.org/abs/2112.10741v2

  85. 7. Efficient Geometry-aware 3D Generative Adversarial Networks

    Covered in the Pick-up section
    http://arxiv.org/abs/2112.07945v1

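  86. 8. Player of Games

    Games have long served as benchmarks for progress in artificial intelligence. Recently, approaches that use
    search and learning have shown strong performance across a set of perfect-information games. Approaches that
    use game-theoretic reasoning and learning have shown strong performance in specific imperfect-information poker
    variants. We introduce Player of Games, a general-purpose algorithm that unifies previous approaches by
    combining guided search, self-play learning, and game-theoretic reasoning. Player of Games is the first
    algorithm to achieve strong empirical performance in large perfect- and imperfect-information games. This is an
    important step toward truly general algorithms for arbitrary environments. Player of Games performs strongly in
    chess and Go. It beats the strongest openly available agent in heads-up no-limit Texas hold'em poker (Slumbot).
    It also defeats the state-of-the-art agent in Scotland Yard, an imperfect-information game that illustrates the
    value of guided search, learning, and game-theoretic reasoning.
    • Purpose: research on general-purpose game algorithms applicable to both perfect- and imperfect-information games
    • Result: PoG, a unified algorithm combining search, learning, and game-theoretic reasoning
    • Method: GT-CFR (growing-tree counterfactual regret minimization) + sound self-play
    • Name: PoG (Player of Games)
    • Author affiliation: DeepMind
    http://arxiv.org/abs/2112.03178v1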

  87. GT-CFR: growing-tree counterfactual regret
    minimization
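
GT-CFR grows a search tree and runs counterfactual regret minimization on it. The update at the core of every CFR variant is regret matching. Below is a minimal self-play sketch of regret matching on rock-paper-scissors. Because it is a one-shot game, there is no tree and no counterfactual weighting. The payoff table and the biased starting regrets are illustrative assumptions.

```python
from typing import List, Sequence

# Payoff for (my action, opponent action); actions: rock, paper, scissors.
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]


def regret_matching(cumulative_regrets: Sequence[float]) -> List[float]:
    """Play each action in proportion to its positive cumulative regret (uniform if none)."""
    positive = [max(r, 0.0) for r in cumulative_regrets]
    total = sum(positive)
    n = len(positive)
    return [p / total for p in positive] if total > 0 else [1.0 / n] * n


def self_play(iterations: int, initial_regrets: Sequence[float]) -> List[float]:
    """Both players run regret matching; returns player 0's average strategy."""
    regrets = [list(initial_regrets), [0.0, 0.0, 0.0]]
    strategy_sum = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        strategies = [regret_matching(r) for r in regrets]
        for p in range(2):
            opp = strategies[1 - p]
            # expected payoff of each pure action against the opponent's current mix
            values = [sum(PAYOFF[a][b] * opp[b] for b in range(3)) for a in range(3)]
            ev = sum(s * v for s, v in zip(strategies[p], values))
            for a in range(3):
                regrets[p][a] += values[a] - ev
        for a in range(3):
            strategy_sum[a] += strategies[0][a]
    total = sum(strategy_sum)
    return [s / total for s in strategy_sum]


print(regret_matching([1.0, -2.0, 3.0]))  # [0.25, 0.0, 0.75]
print(self_play(10000, [1.0, 0.0, 0.0]))  # each entry close to 1/3
```

The current strategies keep cycling, but the average strategy approaches the uniform Nash equilibrium.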

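  89. 9. NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation

    Data augmentation is an important component both in evaluating the robustness of natural language processing
    (NLP) models and in increasing the diversity of their training data. In this paper, we present NL-Augmenter, a
    new participatory, Python-based natural language augmentation framework. It supports the creation of both
    transformations (modifications to the data) and filters (data splits according to specific features). We
    describe the framework and an initial set of 117 transformations and 23 filters for a variety of natural
    language tasks. We demonstrate the efficacy of NL-Augmenter by using several of its transformations to analyze
    the robustness of popular natural language models. The infrastructure, data cards, and robustness analysis
    results are publicly available in the NL-Augmenter repository (https://github.com/GEM-benchmark/NL-Augmenter).
    • Purpose: improve robustness evaluation in NLP and increase data diversity
    • Result: released a participatory NLP augmentation framework
    • Method: provides task-specific transformations and filters that split data by feature
    • Name: NL-Augmenter
    • Author affiliations: Google Brain, Google Research, and many others (including the University of Tokyo)
    http://arxiv.org/abs/2112.02721v1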

  90. Example: augmentations of "John likes expensive Italian pizzas."
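
NL-Augmenter separates transformations, which produce modified copies of an example, from filters, which select examples that have some property. Below is a plain-Python sketch of one of each, applied to the slide's sentence. It does not use NL-Augmenter's actual classes or interfaces, and the synonym table is a toy assumption.

```python
import random
from typing import Callable, Dict, List

SYNONYMS: Dict[str, List[str]] = {  # toy dictionary, for illustration only
    "likes": ["loves", "enjoys"],
    "expensive": ["pricey", "costly"],
    "pizzas": ["pies"],
}


def synonym_substitution(sentence: str, rng: random.Random, prob: float = 0.5) -> str:
    """Transformation: replace each listed word with a random synonym with probability prob."""
    out = []
    for word in sentence.split():
        if word in SYNONYMS and rng.random() < prob:
            out.append(rng.choice(SYNONYMS[word]))
        else:
            out.append(word)
    return " ".join(out)


def length_filter(max_words: int) -> Callable[[str], bool]:
    """Filter: keep only examples with at most max_words words."""
    return lambda sentence: len(sentence.split()) <= max_words


sentence = "John likes expensive Italian pizzas"
print(synonym_substitution(sentence, random.Random(0), prob=1.0))
print(length_filter(4)(sentence))  # False
```

Running a model on transformed copies measures its robustness. Filters let you report results separately on the subsets that share a given property.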

  91. Transformations (117+ types)

  92. Filters (23+ types)

  93. https://gem-benchmark.com

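  94. 10. FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimization

    Generating images from natural-language instructions is an intriguing yet highly challenging task. We approach
    text-to-image generation by combining the power of a pre-trained CLIP representation with an off-the-shelf image
    generator (GAN). We optimize in the GAN's latent space to find images that achieve the maximum CLIP score for
    the given input text. Traditional methods train text-to-image models from scratch. The CLIP+GAN approach, by
    contrast, is training-free and zero-shot, and it is easily customized with different generators. However,
    optimizing the CLIP score in the GAN space is a highly challenging optimization problem. Off-the-shelf
    optimizers such as Adam fail to produce satisfying results. In this work, we propose the FuseDream pipeline,
    which improves the CLIP+GAN approach with three key techniques. 1) An AugCLIP score robustifies the CLIP
    objective by applying random augmentations to the image. 2) A novel initialization and over-parameterization
    strategy lets the optimizer efficiently navigate the non-convex landscape of the GAN space. 3) A composed
    generation technique uses a novel bi-level optimization formulation to compose multiple images, extending the
    GAN space and overcoming data bias. Prompted with different input texts, FuseDream generates high-quality
    images with varied objects, backgrounds, and artistic styles. It can even depict novel counterfactual concepts
    that do not appear in the training data of the GAN we use. Quantitatively, FuseDream's images achieve
    top-level Inception and FID scores on the MS COCO dataset without additional architecture design or training.
    Our code is publicly available at https://github.com/gnobitab/FuseDream.
    • Purpose: improve CLIP+GAN, a model that generates images from natural language
    • Result: obtained top-level Inception and FID scores on the MS COCO dataset
    • Method: added the AugCLIP score, over-parameterization, and bi-level optimization
    • Name: FuseDream
    • Author affiliations: University of Texas at Austin, University of California San Diego
    http://arxiv.org/abs/2112.01573v1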
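
The AugCLIP score averages the CLIP score over randomly augmented copies of the image, so the optimizer cannot exploit a single view. Below is a minimal sketch. It assumes a hypothetical `score(image, text)` callable in place of CLIP, and an illustrative flip-plus-crop augmentation. FuseDream's own augmentation set may differ.

```python
import random
from typing import Callable, List

Image = List[List[float]]
Scorer = Callable[[Image, str], float]  # hypothetical stand-in for a CLIP similarity


def random_augment(img: Image, rng: random.Random) -> Image:
    """Random horizontal flip, then a random crop keeping 3/4 of each side."""
    if rng.random() < 0.5:
        img = [row[::-1] for row in img]
    h, w = len(img), len(img[0])
    ch, cw = max(1, 3 * h // 4), max(1, 3 * w // 4)
    y, x = rng.randint(0, h - ch), rng.randint(0, w - cw)
    return [row[x:x + cw] for row in img[y:y + ch]]


def aug_clip_score(img: Image, text: str, score: Scorer, n: int, rng: random.Random) -> float:
    """AugCLIP-style objective: mean score over n randomly augmented copies of the image."""
    return sum(score(random_augment(img, rng), text) for _ in range(n)) / n


def mean_pixel(img: Image, text: str) -> float:
    """Toy scorer that ignores the text; used only to exercise the code."""
    return sum(map(sum, img)) / (len(img) * len(img[0]))


flat = [[0.5] * 8 for _ in range(8)]
print(aug_clip_score(flat, "a photo of a cat", mean_pixel, n=4, rng=random.Random(0)))  # 0.5
```

In FuseDream, this averaged score is maximized over the GAN's latent variables instead of the raw single-view CLIP score.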

  95. Comparison with CLIP+GAN and BigSleep

  97. DeepL Translator (deepl.com)
    https://www.deepl.com/en/translator