Slide 1

Slide 1 text

AI Latest Papers Reading Group, January 2022. tech vein Inc., Mitsuhiro Inomata

Slide 2

Slide 2 text

Self-introduction: Mitsuhiro Inomata. Representative Director of both tech vein Inc. and DeepRad Inc., and also a developer. Twitter: @ino2222

Slide 3

Slide 3 text

Introducing the Facebook group: https://www.facebook.com/groups/…

Slide 4

Slide 4 text

Agenda
Papers posted to arxiv.org over the past month, picked up via Arxiv Sanity (arxiv-sanity.com):
• The paper I found most interesting
• Top 10 list of "top recent" papers
• Top 10 list of "top hype" papers

Slide 5

Slide 5 text

What is Arxiv Sanity? https://www.arxiv-sanity.com/top

Slide 6

Slide 6 text

Table of Contents

Slide 7

Slide 7 text

Top10 Recent
1. Plenoxels: Radiance Fields without Neural Networks
2. GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
3. Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks
4. Masked Feature Prediction for Self-Supervised Visual Pre-Training
5. Self-attention Does Not Need $O(n^2)$ Memory
6. Mega-NeRF: Scalable Construction of Large-Scale NeRFs for Virtual Fly-Throughs
7. Exploring the Equivalence of Siamese Self-Supervised Learning via A Unified Gradient Framework
8. Improving language models by retrieving from trillions of tokens
9. BEVT: BERT Pretraining of Video Transformers
10. SLIP: Self-supervision meets Language-Image Pre-training

Slide 8

Slide 8 text

Top10 Hype
1. Critical Sentence Identification in Legal Cases Using Multi-Class Classification
2. Seeking Sinhala Sentiment: Predicting Facebook Reactions of Sinhala Posts
3. Plenoxels: Radiance Fields without Neural Networks
4. Show Your Work: Scratchpads for Intermediate Computation with Language Models
5. BANMo: Building Animatable 3D Neural Models from Many Casual Videos
6. GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
7. Efficient Geometry-aware 3D Generative Adversarial Networks (Pick Up)
8. Player of Games
9. NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation
10. FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimization

Slide 9

Slide 9 text

Top10 Recent (by theme)
1. Plenoxels: Radiance Fields without Neural Networks
2. GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
3. Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks
4. Masked Feature Prediction for Self-Supervised Visual Pre-Training
5. Self-attention Does Not Need $O(n^2)$ Memory
6. Mega-NeRF: Scalable Construction of Large-Scale NeRFs for Virtual Fly-Throughs
7. Exploring the Equivalence of Siamese Self-Supervised Learning via A Unified Gradient Framework
8. Improving language models by retrieving from trillions of tokens
9. BEVT: BERT Pretraining of Video Transformers
10. SLIP: Self-supervision meets Language-Image Pre-training
Theme tags: CLIP, CLIP, NeRF, NeRF, Video, Video, NLP, SSL, SSL, SSL, Transformer, Transformer, Transformer, Transformer

Slide 10

Slide 10 text

Top10 Hype (by theme)
1. Critical Sentence Identification in Legal Cases Using Multi-Class Classification
2. Seeking Sinhala Sentiment: Predicting Facebook Reactions of Sinhala Posts
3. Plenoxels: Radiance Fields without Neural Networks
4. Show Your Work: Scratchpads for Intermediate Computation with Language Models
5. BANMo: Building Animatable 3D Neural Models from Many Casual Videos
6. GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
7. Efficient Geometry-aware 3D Generative Adversarial Networks
8. Player of Games
9. NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation
10. FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimization
Theme tags: CLIP, NeRF, Video, NLP, NLP, NLP, NLP, Transformer, GAN, GAN, NeRF, CLIP, NeRF

Slide 11

Slide 11 text

Pick up

Slide 12

Slide 12 text

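Top Hype 7. Efficient Geometry-aware 3D Generative Adversarial Networks
Unsupervised generation of high-quality, multi-view-consistent images and 3D shapes using only collections of single-view 2D photographs has been a long-standing challenge. Existing 3D GANs are either compute-intensive or make approximations that are not 3D-consistent; the former limits the quality and resolution of the generated images, and the latter hurts multi-view consistency and shape quality. In this work, we improve the computational efficiency and image quality of 3D GANs without relying too heavily on these approximations. To this end, we introduce an expressive hybrid explicit-implicit network architecture that, together with other design choices, not only synthesizes high-resolution multi-view-consistent images in real time but also produces high-quality 3D geometry. By decoupling feature generation from neural rendering, our framework can leverage state-of-the-art 2D CNN generators such as StyleGAN2 and inherit their efficiency and expressiveness. We demonstrate state-of-the-art 3D-aware synthesis with FFHQ and AFHQ Cats, among other experiments.
• Goal: improve GANs that generate 3D-aware 2D images from 2D photographs
• Results: a state-of-the-art GAN that treats human and animal face photos in 3D when generating images
• Method: StyleGAN2 + volume renderer
• Name: EG3D (Efficient Geometry-aware 3D Generative Adversarial Networks)
• Affiliations: Stanford University, NVIDIA
http://arxiv.org/abs/2112.07945v1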

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

Demo video (see URL): https://matthew-a-chan.github.io/EG3D

Slide 15

Slide 15 text

EG3D GAN Framework (based on StyleGAN2)

Slide 16

Slide 16 text

Tri-Plane: estimates the density and color at each coordinate, as NeRF does, from images (or features) taken along three directions
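A minimal sketch of a tri-plane lookup, assuming three axis-aligned (C, H, W) feature planes and coordinates normalized to [-1, 1]: a 3D point is projected onto the xy, xz and yz planes, each plane is sampled bilinearly, and the three feature vectors are summed before a small decoder produces density and color. The random planes, the plane names, and `tiny_decoder` are hypothetical stand-ins; in EG3D the planes come from a StyleGAN2 generator and the decoder output feeds a renderer rather than going straight to RGB.

```python
import numpy as np

def bilinear_sample(plane, u, v):
    """Bilinearly sample a (C, H, W) feature plane at continuous coordinates u, v in [-1, 1]."""
    c, h, w = plane.shape
    x = (u + 1.0) * 0.5 * (w - 1)          # map [-1, 1] to column coordinates
    y = (v + 1.0) * 0.5 * (h - 1)          # map [-1, 1] to row coordinates
    x0 = int(np.clip(np.floor(x), 0, w - 2))
    y0 = int(np.clip(np.floor(y), 0, h - 2))
    dx, dy = x - x0, y - y0
    top = (1 - dx) * plane[:, y0, x0] + dx * plane[:, y0, x0 + 1]
    bottom = (1 - dx) * plane[:, y0 + 1, x0] + dx * plane[:, y0 + 1, x0 + 1]
    return (1 - dy) * top + dy * bottom

def triplane_features(planes, point):
    """Project a 3D point onto the xy, xz and yz planes and sum the three sampled features."""
    x, y, z = point
    return (bilinear_sample(planes["xy"], x, y)
            + bilinear_sample(planes["xz"], x, z)
            + bilinear_sample(planes["yz"], y, z))

def tiny_decoder(features, w1, w2):
    """Hypothetical small MLP: aggregated features -> (density, rgb)."""
    hidden = np.maximum(features @ w1, 0.0)
    out = hidden @ w2
    density = np.logaddexp(0.0, out[0])          # softplus keeps density non-negative
    rgb = 1.0 / (1.0 + np.exp(-out[1:4]))        # sigmoid keeps colour in [0, 1]
    return density, rgb

# Usage with random planes standing in for generator output (8 channels, 32x32 each).
rng = np.random.default_rng(0)
planes = {name: rng.normal(size=(8, 32, 32)) for name in ("xy", "xz", "yz")}
feat = triplane_features(planes, (0.1, -0.4, 0.7))
density, rgb = tiny_decoder(feat, rng.normal(size=(8, 16)), rng.normal(size=(16, 4)))
```

Because each lookup is three 2D samples plus a tiny MLP, querying a point is much cheaper than running a large coordinate MLP as in NeRF, which is the efficiency argument made on the following slides.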

Slide 17

Slide 17 text

[Review] NeRF https://arxiv.org/abs/…
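As a refresher on how NeRF turns per-point density and color into a pixel, here is a small sketch of the standard volume-rendering quadrature along one ray: each sample's opacity is 1 - exp(-sigma * delta), and its weight is that opacity times the transmittance left over from earlier samples. The array shapes are assumptions for illustration.

```python
import numpy as np

def render_ray(densities, colors, deltas):
    """Composite N samples along one ray (NeRF-style quadrature).

    densities: (N,) sigma_i >= 0; colors: (N, 3) rgb_i; deltas: (N,) distance to the next sample.
    """
    alphas = 1.0 - np.exp(-densities * deltas)                     # opacity of each segment
    # transmittance T_i = prod_{j<i} (1 - alpha_j): fraction of light that reaches sample i
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alphas)[:-1]])
    weights = trans * alphas
    rgb = (weights[:, None] * colors).sum(axis=0)
    return rgb, weights
```

A fully transparent first sample followed by an opaque one yields exactly the opaque sample's color, and everything behind it gets zero weight.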

Slide 18

Slide 18 text

NeRF vs. Tri-Plane: Tri-Plane is lighter and higher-performing than Mip-NeRF

Slide 19

Slide 19 text

Comparison of model size and speed between NeRF and Tri-Plane (SSO: images from three directions; GAN: StyleGAN feature data for three directions)

Slide 20

Slide 20 text

After a low-resolution image is generated, a super-resolution module raises the resolution

Slide 21

Slide 21 text

Fast and high quality: 26 to 36 FPS on an RTX 3090 GPU

Slide 22

Slide 22 text

Top recent: Best10

Slide 23

Slide 23 text

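1. Plenoxels: Radiance Fields without Neural Networks
We introduce Plenoxels (plenoptic voxels), a system for photorealistic view synthesis. Plenoxels represent a scene as a sparse 3D grid with spherical harmonics. This representation can be optimized from calibrated images via gradient methods and regularization, without any neural components. On standard benchmark tasks, Plenoxels are optimized two orders of magnitude faster than Neural Radiance Fields, with no loss in visual quality.
• Goal: improve view-synthesis systems
• Results: a model with the same accuracy as NeRF that trains two orders of magnitude faster
• Method: re-implemented the NeRF approach without a neural network, using total variation (TV) regularization and spherical harmonics (SH)
• Name: Plenoxels (plenoptic voxels)
• Affiliations: UC Berkeley
http://arxiv.org/abs/2112.05131v1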

Slide 24

Slide 24 text

Plenoxels: speeds things up by processing radiance fields without a neural network
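To make "no neural network" concrete, here is a sketch of a Plenoxels-style query: density and spherical-harmonic (SH) coefficients are stored at voxel corners, trilinearly interpolated, and the color is the SH basis evaluated at the view direction. The sketch assumes degree-1 SH (4 coefficients per color channel; the paper uses more), a dense grid instead of a sparse one, a channel layout of [density, R coeffs, G coeffs, B coeffs], and simple clamping of the outputs.

```python
import numpy as np

C0 = 0.28209479177387814   # real SH basis constant, degree 0
C1 = 0.4886025119029199    # real SH basis constant, degree 1

def sh_basis_deg1(d):
    """Degree-0 and degree-1 real SH basis values for a view direction d."""
    x, y, z = d / np.linalg.norm(d)
    return np.array([C0, -C1 * y, C1 * z, -C1 * x])

def trilinear(grid, p):
    """grid: (X, Y, Z, K) values at voxel corners; p: continuous index coordinates."""
    i0 = np.clip(np.floor(p).astype(int), 0, np.array(grid.shape[:3]) - 2)
    f = p - i0
    out = np.zeros(grid.shape[3])
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((f[0] if dx else 1 - f[0])
                     * (f[1] if dy else 1 - f[1])
                     * (f[2] if dz else 1 - f[2]))
                out += w * grid[i0[0] + dx, i0[1] + dy, i0[2] + dz]
    return out

def query(grid, p, view_dir):
    """Return (density, rgb) at point p seen from view_dir; only interpolation, no MLP."""
    v = trilinear(grid, np.asarray(p, dtype=float))
    density = max(v[0], 0.0)
    sh = v[1:].reshape(3, 4)                       # one row of SH coefficients per channel
    rgb = np.clip(sh @ sh_basis_deg1(np.asarray(view_dir, dtype=float)), 0.0, 1.0)
    return density, rgb
```

Since every stored value is a plain array entry, gradient descent (with TV regularization in the paper) can update the grid directly, which is where the speed-up over training an MLP comes from.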

Slide 25

Slide 25 text

No content

Slide 26

Slide 26 text

No content

Slide 27

Slide 27 text

No content

Slide 28

Slide 28 text

Published on GitHub. Setup guide: https://pythonrepo.com/repo/sxyu-svox…-python-deep-learning
https://github.com/sarafridov/plenoxels

Slide 29

Slide 29 text

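2. GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
Diffusion models have recently been shown to generate high-quality synthetic images, especially when paired with a guidance technique that trades off diversity for fidelity. We explore diffusion models for the problem of text-conditional image synthesis and compare two guidance strategies: CLIP guidance and classifier-free guidance. We find that the latter is preferred by human evaluators for both photorealism and caption similarity, and often produces photorealistic samples. Samples from a 3.5-billion-parameter text-conditional diffusion model using classifier-free guidance are preferred by human evaluators over those from DALL-E. We also find that the model can be fine-tuned to perform image inpainting, enabling powerful text-driven image editing. We train a smaller model on a filtered dataset and release the code and weights at https://github.com/openai/glide-text2im.
• Goal: improve text-to-image models
• Results: a high-performance model that can generate images from text and edit parts of them. Because the full-size model is considered too capable, only a small model trained on a dataset filtered to prevent misuse was released
• Method: CLIP + diffusion model
• Name: GLIDE (Guided Language to Image Diffusion for Generation and Editing)
• Affiliations: OpenAI
http://arxiv.org/abs/2112.10741v2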
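The GLIDE abstract compares CLIP guidance with classifier-free guidance. The latter needs no separate classifier: the same network predicts noise with and without the caption, and the two predictions are extrapolated with a guidance scale s as eps_uncond + s * (eps_cond - eps_uncond). The sketch below shows only that combination step; `toy_model` is a hypothetical stand-in for the diffusion network, and passing `None` plays the role of the empty caption.

```python
import numpy as np

def classifier_free_guidance(eps_uncond, eps_cond, scale):
    """Combine two noise predictions from the same model (with and without the caption).

    scale = 1 reproduces the plain conditional prediction; scale > 1 pushes further in the
    direction suggested by the caption, trading diversity for fidelity.
    """
    return eps_uncond + scale * (eps_cond - eps_uncond)

def toy_model(x_t, t, caption_embedding):
    """Hypothetical stand-in for the network's noise prediction (None = empty caption)."""
    shift = 0.0 if caption_embedding is None else caption_embedding.mean()
    return 0.1 * x_t + shift

x_t = np.ones(4)
caption = np.array([0.2, 0.4])
eps = classifier_free_guidance(toy_model(x_t, 10, None), toy_model(x_t, 10, caption), scale=3.0)
```

The guided `eps` then replaces the ordinary noise prediction inside each sampling step.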

Slide 30

Slide 30 text

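Diffusion model: a data-generating model that learns to predict the noise at each step of a discrete noising process, and can be applied to multiple tasks without task-specific training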
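The "noise prediction over discrete steps" on the diffusion-model slide can be written down in a few lines. This is a sketch assuming the linear beta schedule from the original DDPM setup (GLIDE's own schedule may differ): a clean sample x0 is jumped directly to step t as sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise, and the network is trained with a squared error to recover that noise. `predict_noise` is any callable you pass in.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)        # linear noise schedule (assumed, DDPM-style)
alpha_bars = np.cumprod(1.0 - betas)      # fraction of signal variance kept after t steps

def q_sample(x0, t, noise):
    """Jump straight to step t of the forward (noising) process."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise

def denoising_loss(predict_noise, x0, rng):
    """One Monte Carlo sample of the simple noise-prediction objective."""
    t = rng.integers(0, T)
    noise = rng.standard_normal(x0.shape)
    x_t = q_sample(x0, t, noise)
    return np.mean((predict_noise(x_t, t) - noise) ** 2)
```

By the last step almost no signal remains, so generation can start from pure noise and repeatedly subtract predicted noise.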

Slide 31

Slide 31 text

CLIP Text to image

Slide 32

Slide 32 text

GLIDE=CLIP + Diffusion Model

Slide 33

Slide 33 text

No content

Slide 34

Slide 34 text

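Text-to-image generation + text-driven partial editing
[Generate] a cozy living room
[Masked edit] a painting of a corgi on the wall above the sofa
[Masked edit] a round coffee table in front of the sofa
[Masked edit] a vase of flowers on the coffee table
[Masked edit] the sofa in the corner of the room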

Slide 35

Slide 35 text

No content

Slide 36

Slide 36 text

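3. Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks
The biological intelligence systems of animals perceive the world by integrating information from different modalities and processing it simultaneously for various tasks. Current machine learning research, by contrast, follows a task-specific paradigm, which leads to inefficient collaboration between tasks and high marginal costs for developing perception models for new tasks. This paper presents Uni-Perceiver, a generic perception architecture that processes a variety of modalities and tasks with unified modeling and shared parameters. Specifically, Uni-Perceiver encodes different task inputs and targets from arbitrary modalities into a unified representation space, using a modality-agnostic Transformer encoder and lightweight modality-specific tokenizers. Different perception tasks are modeled with the same formulation: finding the maximum-likelihood target for each input through the similarity of their representations. The model is pre-trained on several uni-modal and multi-modal tasks and evaluated on a variety of downstream tasks, including novel tasks that did not appear during pre-training. The results show that the pre-trained model achieves reasonable performance even on novel tasks without any tuning. Prompt tuning on 1% of the downstream task data brings performance close to state-of-the-art methods, and fine-tuning on the full data yields results on par with or better than the state of the art. The code will be released.
• Goal: improve multi-task training and use of Transformers
• Results: Uni-Perceiver, a Transformer that handles diverse input and output formats
• Method: Transformer + modality-specific tokenizers
• Name: Uni-Perceiver
• Affiliations: SenseTime Research, Xi'an Jiaotong University, The Chinese University of Hong Kong
http://arxiv.org/abs/2112.01522v1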
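The shared formulation ("find the maximum-likelihood target through representation similarity") can be sketched as follows. Assumptions: tokens are already embedded vectors, `encode` is a hypothetical mean-pool-and-project encoder standing in for the shared Transformer, and the temperature value is illustrative. Inputs and every candidate target (class names, captions, and so on) go through the same encoder, and a softmax over cosine similarities gives the prediction.

```python
import numpy as np

def encode(tokens, w):
    """Hypothetical shared encoder: mean-pool token embeddings, project, L2-normalise."""
    h = tokens.mean(axis=0) @ w
    return h / np.linalg.norm(h)

def predict(input_tokens, candidate_targets, w, temperature=0.07):
    """Pick the candidate whose representation is most similar to the input's."""
    q = encode(input_tokens, w)
    keys = np.stack([encode(c, w) for c in candidate_targets])
    logits = keys @ q / temperature          # cosine similarity scaled by a temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(np.argmax(probs)), probs

rng = np.random.default_rng(0)
w = rng.normal(size=(16, 8))
inp = rng.normal(size=(4, 16))
candidates = [rng.normal(size=(3, 16)), inp.copy(), rng.normal(size=(5, 16))]
best, probs = predict(inp, candidates, w)
```

Because a new task only means a new set of candidate targets, the same parameters can be applied zero-shot, which matches the slide's claim about unseen tasks.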

Slide 37

Slide 37 text

No content

Slide 38

Slide 38 text

Tokenizes images, text, and video into a common form that one model can process

Slide 39

Slide 39 text

A model pre-trained on multiple modalities; it can be tuned for and used on multiple tasks

Slide 40

Slide 40 text

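4. Masked Feature Prediction for Self-Supervised Visual Pre-Training
We present Masked Feature Prediction (MaskFeat) for self-supervised pre-training of video models. Our approach first randomly masks out a portion of the input sequence and then predicts the features of the masked regions. After studying five types of features, we find that Histograms of Oriented Gradients (HOG), a hand-crafted feature descriptor, works particularly well in terms of both performance and efficiency. We also observe that the local contrast normalization in HOG is essential for good results, in line with earlier work that used HOG for visual recognition. Our approach can learn abundant visual knowledge and drive large-scale Transformer-based models. Without extra model weights or supervision, MaskFeat pre-trained on unlabeled videos achieves unprecedented results: 86.7% with MViT-L on Kinetics-400, 88.3% on Kinetics-600, 80.4% on Kinetics-700, 38.8 mAP on AVA, and 75.0% on SSv2. MaskFeat further generalizes to image input, which can be interpreted as a single-frame video, and obtains competitive results on ImageNet.
• Goal: learn masked image completion on video
• Results: MaskFeat, a self-supervised video learning method
• Method: pre-training by directly regressing the features (HOG) of masked content
• Name: MaskFeat (Masked Feature Prediction)
• Affiliations: Facebook AI Research, Johns Hopkins University
http://arxiv.org/abs/2112.09133v1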

Slide 41

Slide 41 text

Predicts the HOG (histogram of oriented gradients) of the masked video
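A simplified HOG target for one patch, to show what MaskFeat regresses instead of raw pixels: gradients are binned by unsigned orientation, weighted by magnitude, and the histogram is L2-normalized. The single-cell layout, 9 bins, and whole-patch normalization are simplifying assumptions; the paper's exact cell and block configuration is not reproduced.

```python
import numpy as np

def hog_cell_histogram(patch, n_bins=9, eps=1e-6):
    """Orientation histogram of one grayscale patch, weighted by gradient magnitude.

    Unsigned orientations (0 to 180 degrees) and one L2 normalisation over the whole patch
    are simplifications of a full HOG descriptor.
    """
    gy, gx = np.gradient(patch.astype(float))          # derivatives along rows, then columns
    magnitude = np.hypot(gx, gy)
    orientation = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    bins = np.minimum((orientation / (180.0 / n_bins)).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=magnitude.ravel(), minlength=n_bins)
    return hist / (np.linalg.norm(hist) + eps)         # local contrast normalisation
```

Thanks to the normalization, a patch and a brighter or higher-contrast copy of it produce (almost) the same target, one reason the paper finds the normalization important.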

Slide 42

Slide 42 text

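Pixel prediction vs. HOG prediction: pixel prediction struggles to infer texture colors and complex structures, whereas HOG deals only with features, which makes it easier to predict well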

Slide 43

Slide 43 text

Video is masked in cubes spanning three dimensions (height, width, time) during training
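A sketch of cube masking over a (time, height, width) grid of video patch tokens: whole space-time cubes are switched on until roughly the target fraction of tokens is hidden. The fixed cube size (2, 4, 4) and the 40% default ratio are assumptions for illustration; the masked tokens' HOG features would then be the regression targets.

```python
import numpy as np

def cube_mask(t, h, w, mask_ratio=0.4, cube=(2, 4, 4), rng=None):
    """Boolean mask over a (t, h, w) grid of video patch tokens; True means masked."""
    if rng is None:
        rng = np.random.default_rng()
    mask = np.zeros((t, h, w), dtype=bool)
    target = int(mask_ratio * mask.size)
    ct, ch, cw = cube
    while mask.sum() < target:
        # choose a random cube position that fits inside the grid and mask it entirely
        t0 = rng.integers(0, t - ct + 1)
        y0 = rng.integers(0, h - ch + 1)
        x0 = rng.integers(0, w - cw + 1)
        mask[t0:t0 + ct, y0:y0 + ch, x0:x0 + cw] = True
    return mask
```

Masking contiguous cubes, rather than scattered single tokens, stops the model from simply copying a masked patch from the same location in the neighboring frame.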

Slide 44

Slide 44 text

No content

Slide 45

Slide 45 text

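5. Self-attention Does Not Need $O(n^2)$ Memory
We present a very simple algorithm for attention that requires O(1) memory with respect to sequence length, and an extension to self-attention that requires O(log n) memory. This contrasts with the frequently stated belief that self-attention requires O(n^2) memory. While the time complexity is still O(n^2), on modern accelerators device memory rather than compute capability is often the limiting factor. Reducing the memory requirements of attention therefore allows processing longer sequences than would otherwise be feasible. We provide a practical implementation for accelerators that requires O(√n) memory, is numerically stable, and runs within a few percent of the standard attention implementation. We also show how to differentiate the function while remaining memory-efficient. For a sequence length of 16384, the memory overhead of self-attention is reduced by 59x for inference and by 32x for differentiation.
• Goal: reduce the memory cost of self-attention models
• Results: cut the memory requirement of attention in Transformers from O(n^2) to O(√n)
• Method: partially changed the memory-hungry parts of the standard attention implementation
• Name: none
• Affiliations: Google Research
http://arxiv.org/abs/2112.05682v2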
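The core trick can be sketched in NumPy: walk over the keys in chunks, keep a running maximum, a running softmax denominator, and a running weighted sum of values, and rescale the partial sums whenever the maximum grows. Only a (queries x chunk) block of scores exists at any time, yet the result matches ordinary softmax attention. This sketch chunks keys only; the paper's O(√n) version also chunks queries and is written for accelerators, which is not reproduced here.

```python
import numpy as np

def standard_attention(q, k, v):
    """Reference: materialises the full (n_q, n_k) score matrix."""
    s = q @ k.T / np.sqrt(q.shape[-1])
    p = np.exp(s - s.max(axis=1, keepdims=True))
    return (p / p.sum(axis=1, keepdims=True)) @ v

def chunked_attention(q, k, v, key_chunk=128):
    """Same result, but only a (n_q, key_chunk) block of scores exists at a time."""
    n_q, d = q.shape
    running_max = np.full((n_q, 1), -np.inf)
    running_sum = np.zeros((n_q, 1))
    acc = np.zeros((n_q, v.shape[1]))
    for start in range(0, k.shape[0], key_chunk):
        s = q @ k[start:start + key_chunk].T / np.sqrt(d)
        new_max = np.maximum(running_max, s.max(axis=1, keepdims=True))
        correction = np.exp(running_max - new_max)     # rescale earlier partial sums
        p = np.exp(s - new_max)                        # numerically stable exponentials
        running_sum = running_sum * correction + p.sum(axis=1, keepdims=True)
        acc = acc * correction + p @ v[start:start + key_chunk]
        running_max = new_max
    return acc / running_sum
```

The running maximum is what keeps the method numerically stable: no exponential ever sees a positive argument, so nothing overflows regardless of chunk order.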

Slide 46

Slide 46 text

No content

Slide 47

Slide 47 text

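6. Mega-NeRF: Scalable Construction of Large-Scale NeRFs for Virtual Fly-Throughs
We explore how to use neural radiance fields (NeRFs) to build interactive 3D environments from large-scale visual captures spanning buildings or even multiple city blocks, collected primarily by drones. In contrast to the single-object scenes on which NeRFs have traditionally been evaluated, this setting poses several challenges: (1) thousands of images with varying lighting conditions must be incorporated, each capturing only a small subset of the scene; (2) model capacity and ray-sampling requirements are prohibitively high, beyond what can naively be trained on a single GPU; and (3) the number of possible viewpoints is arbitrarily large, so precomputing all relevant information beforehand (as real-time NeRF renderers typically do) is infeasible. To address these challenges, we first analyze visibility statistics for large-scale scenes, which motivates a sparse network structure whose parameters are specialized to different regions of the scene. We then introduce a simple geometric clustering algorithm that partitions training images (or rather pixels) into different NeRF submodules that can be trained in parallel. Evaluated on scenes from the Quad 6k and UrbanScene3D datasets and on our own drone footage, our approach trains 3x faster while improving PSNR by over 11% on average. We then evaluate recent fast NeRF renderers on top of Mega-NeRF and introduce a novel method that exploits temporal coherence. Our technique achieves a 40x speedup over conventional NeRF rendering while staying within 0.5 dB in PSNR, exceeding the fidelity of existing fast renderers.
• Goal: build interactive large-scale NeRF 3D environments
• Results: Mega-NeRF, a NeRF for large spaces that at least matches NeRF quality while training 3x faster
• Method: partitioned space and trained NeRF submodules in parallel on drone footage and other datasets
• Name: Mega-NeRF
• Affiliations: Carnegie Mellon University, Argo AI (a Pittsburgh-based autonomous-driving startup)
http://arxiv.org/abs/2112.10703v1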
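A rough sketch of the spatial partitioning idea, under simplifying assumptions: the scene is split into cells around fixed centroids, each 3D point belongs to its nearest centroid, and a training ray (pixel) is assigned to every submodule whose cell its sample points pass through. The uniform sampling, the near/far bounds, and these helper names are illustrative, not Mega-NeRF's actual clustering code.

```python
import numpy as np

def nearest_centroid(points, centroids):
    """Index of the closest centroid for each 3D point (the spatial cell it belongs to)."""
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

def assign_rays(origins, directions, centroids, near=0.0, far=1.0, n_samples=64):
    """For every ray (pixel), list the submodules whose cell the ray passes through."""
    ts = np.linspace(near, far, n_samples)
    assignment = []
    for o, d in zip(origins, directions):
        pts = o[None, :] + ts[:, None] * d[None, :]     # sample points along the ray
        assignment.append(sorted(set(nearest_centroid(pts, centroids).tolist())))
    return assignment
```

Once each pixel is routed this way, every submodule sees only its own subset of training data, so the submodules can be trained independently and in parallel on separate GPUs.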

Slide 48

Slide 48 text

Differences in the training schemes of NeRF, NeRF++, and Mega-NeRF

Slide 49

Slide 49 text

No content

Slide 50

Slide 50 text

No content

Slide 51

Slide 51 text

Accuracy ↑, training speed ↑↑↑

Slide 52

Slide 52 text

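7. Exploring the Equivalence of Siamese Self-Supervised Learning via A Unified Gradient Framework
Self-supervised learning has shown great potential for extracting powerful visual representations without human annotations, and many works approach it from different perspectives: (1) contrastive learning methods (e.g., MoCo, SimCLR) use both positive and negative samples to guide the training direction; (2) asymmetric network methods (e.g., BYOL, SimSiam) get rid of negative samples by introducing a predictor network and a stop-gradient operation; (3) feature decorrelation methods (e.g., Barlow Twins, VICReg) instead aim to reduce the redundancy between feature dimensions. Because of their different motivations, these methods appear to use quite different loss functions. Their reported accuracies also vary, since different works use different networks and tricks. In this work, we show that these methods can be unified into the same form: instead of comparing their loss functions, we derive a unified formula through gradient analysis. We also run fair and detailed experiments to compare their performance. It turns out that there is little gap between these methods, and that using a momentum encoder is the key factor for boosting performance. From this unified framework, we propose UniGrad, a simple but effective gradient form for self-supervised learning. It requires neither a memory bank nor a predictor network, yet achieves state-of-the-art performance and easily adopts other training strategies. Extensive experiments on linear evaluation and many downstream tasks also show its effectiveness. The code will be released.
• Goal: unify self-supervised learning algorithms
• Method: analyze each self-supervised learning method
• Results: derived a common formula for self-supervised learning, with accuracy roughly on par with existing methods
• Name: UniGrad
• Affiliations: Tsinghua University, SenseTime Research, Zhejiang University, Beijing Academy of Artificial Intelligence
http://arxiv.org/abs/2112.05141v1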

Slide 53

Slide 53 text

The main differences are:
• positive-pair similarity computation
• negative-pair similarity computation
• loss computation
→ formalized in UniGrad.

Slide 54

Slide 54 text

Across the methods, performance (linear evaluation) is nearly the same

Slide 55

Slide 55 text

For every method, adding a momentum encoder improved performance by +2%
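A momentum encoder is simply a second copy of the network whose weights follow the trained ("online") encoder as an exponential moving average, theta_target = m * theta_target + (1 - m) * theta_online, and it receives no gradients of its own. The sketch below assumes weights stored in a dict of NumPy arrays; the momentum value m is illustrative (values close to 1, such as 0.99, are typical).

```python
import numpy as np

def momentum_update(online_params, target_params, m=0.99):
    """Return new momentum-encoder weights: an exponential moving average of the online weights.

    Only the online encoder is trained by gradient descent; the momentum encoder follows it
    slowly through this rule, which gives the loss a more stable target.
    """
    return {name: m * target_params[name] + (1.0 - m) * online_params[name]
            for name in target_params}
```

Called once per training step, the target drifts toward the online weights; with m = 0.9 the gap shrinks by a factor of 0.9 each step.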

Slide 56

Slide 56 text

Building on the unified form and adding data augmentation, UniGrad surpasses existing methods

Slide 57

Slide 57 text

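8. Improving language models by retrieving from trillions of tokens
We enhance autoregressive language models by conditioning them on document chunks retrieved from a large corpus, based on local similarity with the preceding tokens. With a 2-trillion-token database, our Retrieval-Enhanced Transformer (RETRO) obtains performance comparable to GPT-3 and Jurassic-1 on the Pile, despite using 25x fewer parameters. After fine-tuning, RETRO's performance carries over to downstream knowledge-intensive tasks such as question answering. RETRO combines a frozen BERT retriever, a differentiable encoder, and a chunked cross-attention mechanism to predict tokens based on an order of magnitude more data than is typically consumed during training. RETRO is typically trained from scratch, but pre-trained Transformers can also be rapidly "RETROfitted" with retrieval and still achieve good performance. Our work opens up new avenues for improving language models through explicit memory at unprecedented scale.
• Goal: improve autoregressive language models such as GPT-3 and Jurassic-1
• Results: implemented RETRO, which matches state-of-the-art language models while using 25x fewer parameters
• Method: frozen BERT retriever, differentiable encoder, chunked cross-attention
• Name: RETRO (Retrieval-Enhanced Transformer)
• Affiliations: DeepMind
http://arxiv.org/abs/2112.04426v1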
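The retrieval half of RETRO can be sketched as two steps: split the text into fixed-length chunks, embed each chunk, and look up its k nearest database chunks by L2 distance. This sketch assumes 64-token chunks and treats embeddings as given vectors (RETRO computes them with a frozen BERT and searches a very large database with an approximate nearest-neighbor index). The chunked cross-attention that consumes the neighbors is not shown.

```python
import numpy as np

CHUNK = 64   # chunk length in tokens (assumed default)

def split_chunks(tokens, size=CHUNK):
    """Split a token sequence into consecutive fixed-size chunks (the last may be shorter)."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]

def retrieve_neighbours(chunk_embeddings, db_embeddings, k=2):
    """For each query chunk, indices of the k nearest database chunks by squared L2 distance."""
    d2 = ((chunk_embeddings[:, None, :] - db_embeddings[None, :, :]) ** 2).sum(axis=-1)
    return np.argsort(d2, axis=1)[:, :k]
```

Because the retriever is frozen, database embeddings can be computed once offline, and the language model only has to learn how to use the retrieved neighbors.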

Slide 58

Slide 58 text

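Left: parameter count vs. performance (RETRO [OFF] uses no retrieval, i.e. equivalent to the baseline). Center: retrieval database size (tokens) vs. performance. Right: number of retrieved neighbors vs. performance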

Slide 59

Slide 59 text

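9. BEVT: BERT Pretraining of Video Transformers
This paper studies BERT pre-training of video Transformers. It is a straightforward extension, but one worth studying given the recent success of BERT pre-training for image Transformers. We introduce BEVT, which decouples video representation learning into spatial representation learning and temporal dynamics learning. Specifically, BEVT first performs masked image modeling on image data, and then performs masked image modeling jointly with masked video modeling on video data. This design is motivated by two observations: 1) Transformers learned on image datasets provide decent spatial priors that ease the learning of video Transformers, which are often computationally intensive to train from scratch; 2) the discriminative clues needed for correct predictions, i.e. spatial and temporal information, vary across videos because of large intra-class and inter-class variations. We conduct extensive experiments on three challenging video benchmarks, where BEVT achieves very promising results. On Kinetics 400, where recognition relies mostly on discriminative spatial representations, BEVT achieves results comparable to strong supervised baselines. On Something-Something-V2 and Diving 48, which contain videos that rely on temporal dynamics, BEVT clearly outperforms all alternative baselines and achieves state-of-the-art top-1 accuracies of 70.6% and 86.7%, respectively.
• Goal: study BERT-style pre-training of video Transformers
• Results: top accuracy on Something-Something-V2 and Diving 48
• Method: jointly train an image-masking BERT model and a video-masking BERT model
• Name: BEVT
• Affiliations: Fudan University School of Computer Science, Shanghai Key Lab of Intelligent Information Processing, Microsoft Cloud + AI
http://arxiv.org/abs/2112.01529v1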

Slide 60

Slide 60 text

Conceptual diagram of the model architecture

Slide 61

Slide 61 text

BEVT framework
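A sketch of the BERT-style objective behind the framework, under stated assumptions: each image or video patch has a discrete "visual token" id from some pre-trained tokenizer, the model outputs logits over that vocabulary, and the loss is cross-entropy counted only at masked positions. The joint objective simply adds the image and video terms; the `video_weight` parameter is a hypothetical knob, not a value taken from the paper.

```python
import numpy as np

def masked_token_loss(logits, target_ids, mask):
    """Cross-entropy over visual-token ids, averaged over masked positions only."""
    logits = logits - logits.max(axis=-1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    nll = -np.take_along_axis(log_probs, target_ids[..., None], axis=-1)[..., 0]
    return (nll * mask).sum() / max(mask.sum(), 1)

def bevt_style_loss(img_logits, img_ids, img_mask, vid_logits, vid_ids, vid_mask, video_weight=1.0):
    """Joint objective: masked image modeling + masked video modeling (weighting is assumed)."""
    return (masked_token_loss(img_logits, img_ids, img_mask)
            + video_weight * masked_token_loss(vid_logits, vid_ids, vid_mask))
```

Keeping the image term during video training is what lets the spatial prior learned from images survive while the video stream learns temporal dynamics.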

Slide 62

Slide 62 text

Best top-1 results on Something-Something-V2 and Diving 48

Slide 63

Slide 63 text

…or so I thought, but on Something-Something-V2, BEVT actually loses to MViT-L + MaskFeat.


Slide 64

Slide 64 text

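10. SLIP: Self-supervision meets Language-Image Pre-training
Recent work has shown that self-supervised pre-training leads to improvements over supervised learning on challenging visual recognition tasks. CLIP, an exciting new approach to learning with language supervision, shows promising performance on a wide variety of benchmarks. In this work, we explore whether self-supervised learning can aid the use of language supervision for visual representation learning. We introduce SLIP, a multi-task learning framework that combines self-supervised learning and CLIP pre-training. After pre-training Vision Transformers, we thoroughly evaluate representation quality and compare performance against both CLIP and self-supervised learning in three distinct settings: zero-shot transfer, linear classification, and end-to-end fine-tuning. Across ImageNet and a battery of additional datasets, we find that SLIP improves accuracy by a large margin. We further validate these results with experiments on different model sizes, training schedules, and pre-training datasets. The findings show that SLIP gets the best of both worlds: better performance than self-supervision (+8.1% linear accuracy) and than language supervision (+5.2% zero-shot accuracy).
• Goal: investigate whether self-supervised learning on images also benefits CLIP's language-supervised learning
• Method: CLIP + self-supervised learning (SimCLR)
• Results: improved zero-shot accuracy
• Name: SLIP
• Affiliations: UC Berkeley, Facebook AI Research (FAIR)
http://arxiv.org/abs/2112.12750v1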
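SLIP's multi-task objective can be sketched as a sum of two contrastive losses: CLIP's symmetric InfoNCE between matched image and text embeddings, plus a SimCLR-style NT-Xent loss between two augmented views of the same images. The temperatures and the `ssl_scale` weight below are assumptions for illustration, and plain NumPy arrays stand in for encoder outputs.

```python
import numpy as np

def log_softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def l2n(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def clip_loss(img, txt, temperature=0.07):
    """Symmetric InfoNCE: the i-th image should match the i-th caption and vice versa."""
    logits = l2n(img) @ l2n(txt).T / temperature
    idx = np.arange(len(img))
    return -(log_softmax(logits, 1)[idx, idx].mean() + log_softmax(logits, 0)[idx, idx].mean()) / 2

def simclr_loss(z1, z2, temperature=0.1):
    """NT-Xent between two augmented views of the same images."""
    z = l2n(np.concatenate([z1, z2]))
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)                 # a sample is never its own candidate
    n = len(z1)
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    return -log_softmax(sim, 1)[np.arange(2 * n), positives].mean()

def slip_loss(img, txt, view1, view2, ssl_scale=1.0):
    """Language supervision + self-supervision on the same image encoder (weight assumed)."""
    return clip_loss(img, txt) + ssl_scale * simclr_loss(view1, view2)
```

Both terms share the image encoder in SLIP, so the self-supervised term acts as an extra training signal for the representation that CLIP's zero-shot classifier relies on.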

Slide 65

Slide 65 text

No content

Slide 66

Slide 66 text

No content

Slide 67

Slide 67 text

No content

Slide 68

Slide 68 text

Top hype: Best10

Slide 69

Slide 69 text

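1. Critical Sentence Identification in Legal Cases Using Multi-Class Classification
By its nature, the legal domain contains a vast amount of data in text form, so meeting its analytically demanding needs requires applying natural language processing (NLP). Advances in NLP are spreading into many domains, including law, in the form of practical applications and academic research. For legal professionals, identifying the critical sentences, facts, and arguments in a legal case is a tedious task. In this research, we explore using sentence embeddings for multi-class classification to identify critical sentences in a legal case from the perspective of the main parties involved. In addition, we define a task-specific loss function to improve on the accuracy achievable with plain categorical cross-entropy loss.
• Goal: support legal professionals in analyzing case law
• Results: a model that classifies the critical sentences of legal cases into multiple classes
• Method: BERT + a custom loss function
• Name: Semantic Similarity Score (SSS)
• Affiliations: University of Moratuwa
http://arxiv.org/abs/2111.05721v2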

Slide 70

Slide 70 text

No content

Slide 71

Slide 71 text

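2. Seeking Sinhala Sentiment: Predicting Facebook Reactions of Sinhala Posts
Facebook lets its users record their reactions to text through a typology of emotions. Taken at scale, the network is therefore a prime dataset of annotated sentiment data. This paper uses millions of such reactions, drawn from a decade of Facebook post data centered on Sri Lanka, to model an "eye of the beholder" approach to sentiment detection for online Sinhala text content. Three sentiment analysis models are built, considering a limited subset of reactions, all reactions, and a derived positive/negative star rating, respectively. The effectiveness of these models in capturing observers' reactions is then computed and discussed. The analysis shows that, for Sinhala content, binary classification of reactions is significantly more accurate than the other approaches. Furthermore, including the "Like" reaction hinders the ability to accurately predict the other reactions.
• Goal: an experimental attempt to close the resource gap for Sinhala, a low-resource regional language
• Results: a sentiment prediction model for Sinhala
• Method: use a raw corpus of Sinhala Facebook posts as training data
• Name: none
• Affiliations: University of Moratuwa, LIRNEasia
http://arxiv.org/abs/2112.00468v1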

Slide 72

Slide 72 text

No content

Slide 73

Slide 73 text

3. Plenoxels: Radiance Fields without Neural Networks
Duplicate (same paper as Top Recent #1).
http://arxiv.org/abs/2112.05131v1

Slide 74

Slide 74 text

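4. Show Your Work: Scratchpads for Intermediate Computation with Language Models
Large pre-trained language models perform remarkably well on tasks that can be done "in one pass", such as generating realistic text or synthesizing computer programs. However, they struggle with tasks that require unbounded multi-step computation, such as adding integers or executing programs. Surprisingly, we find that these same models can perform complex multi-step computations, even in the few-shot regime, when asked to perform the operation "step by step" while showing the results of intermediate computations. In particular, we train Transformers to perform multi-step computations by asking them to emit intermediate computation steps into a "scratchpad". On a series of increasingly complex tasks, ranging from long addition to the execution of arbitrary programs, we show that scratchpads dramatically improve the ability of language models to perform multi-step computations.
• Goal: improve learning of computational tasks such as executing programs
• Method: train a Transformer to write intermediate results into a scratchpad one step at a time
• Results: source code whose result the model fails to predict in one shot can still be executed correctly step by step
• Name: none
• Affiliations: MIT, Google Research (Brain Team, Blueshift Team)
http://arxiv.org/abs/2112.00114v1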

Slide 75

Slide 75 text

Examples of direct training and step-by-step (scratchpad) training

Slide 76

Slide 76 text

Example: arithmetic (addition)
• Shows the result of adding each digit (C is the carry)
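To illustrate what a scratchpad target for long addition looks like, here is a small generator that adds two integers digit by digit and logs each step with its carry, the kind of intermediate trace a model is trained to emit before the final answer. The line format is a hypothetical illustration, not the exact format used in the paper.

```python
def addition_scratchpad(a: int, b: int) -> tuple[list[str], int]:
    """Add two non-negative integers digit by digit, logging each step as a scratchpad line."""
    xs, ys = str(a)[::-1], str(b)[::-1]       # least-significant digit first
    carry, digits, lines = 0, [], []
    for i in range(max(len(xs), len(ys))):
        x = int(xs[i]) if i < len(xs) else 0
        y = int(ys[i]) if i < len(ys) else 0
        total = x + y + carry
        # C before the "=" is the incoming carry, C after the digit is the outgoing carry
        lines.append(f"{x} + {y} + C{carry} = {total} -> digit {total % 10}, C{total // 10}")
        digit, carry = total % 10, total // 10
        digits.append(str(digit))
    if carry:
        digits.append(str(carry))
        lines.append(f"final carry -> digit {carry}")
    return lines, int("".join(reversed(digits)))
```

For example, `addition_scratchpad(29, 57)` returns the two lines `9 + 7 + C0 = 16 -> digit 6, C1` and `2 + 5 + C1 = 8 -> digit 8, C0`, with the result 86. Each line depends only on the previous carry, which is exactly the short-range dependency a Transformer handles well.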

Slide 77

Slide 77 text

Example: arithmetic (polynomial evaluation)
• Shows the result of substituting x into each term
• Compared with direct prediction, the fine-tuned result improved from 31.8% to 50.7%
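The same idea for polynomial evaluation: each scratchpad line records one term with x substituted, plus the running total, so the final value is reached through small, checkable steps. The coefficient ordering (coeffs[i] multiplies x**i) and the line format are assumptions for illustration.

```python
def polynomial_scratchpad(coeffs, x):
    """Evaluate sum(coeffs[i] * x**i) term by term, logging each intermediate value."""
    lines, total = [], 0
    for power, c in enumerate(coeffs):
        term = c * x ** power
        total += term
        lines.append(f"{c}*{x}^{power} = {term}, running total {total}")
    return lines, total
```

For 1 - 2x + 3x^2 at x = 2, the trace has three lines ending in a running total of 9.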

Slide 78

Slide 78 text

Example: Python programs
• Compared with direct prediction, the fine-tuned result improved from 20% to 41.5%

Slide 79

Slide 79 text

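5. BANMo: Building Animatable 3D Neural Models from Many Casual Videos
Prior work on articulated 3D shape reconstruction often relies on specialized sensors (e.g., synchronized multi-camera systems) or pre-built 3D deformable models (e.g., SMAL or SMPL). Such methods cannot scale to diverse sets of objects in the wild. BANMo is a method that requires neither a specialized sensor nor a pre-defined template shape. Within a differentiable rendering framework, BANMo builds high-fidelity, articulated 3D models (including shape and animatable skinning weights) from many monocular casual videos. Using many videos provides more coverage of camera views and object articulations, but establishing correspondences across scenes with different backgrounds, lighting conditions, and so on is a significant challenge. Our key insight is to merge three schools of thought: (1) classic deformable shape models that use articulated bones and blend skinning, (2) volumetric neural radiance fields (NeRFs) that are amenable to gradient-based optimization, and (3) canonical embeddings that generate correspondences between pixels and an articulated model. We introduce neural blend skinning models that enable differentiable and invertible articulated deformations. Combined with canonical embeddings, such models let us establish dense correspondences across videos that can be self-supervised with cycle consistency. On real and synthetic data, BANMo produces higher-fidelity 3D reconstructions of humans and animals than prior work, and can render realistic images from novel viewpoints and poses. Project page: banmo-www.github.io
• Goal: create 3D models from ordinary videos
• Results: a model that builds 3D models of humans and quadrupeds (such as cats)
• Method: DensePose CSE learning from multiple videos of the same person or animal
• Name: BANMo
• Affiliations: Meta AI, Carnegie Mellon University, Meta Reality Labs
http://arxiv.org/abs/2112.12761v2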
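The "classic" component BANMo builds on is linear blend skinning: each point is moved by every bone's rigid transform, and the results are blended with per-point skinning weights, x' = sum_b w_b (R_b x + t_b). The sketch shows that formula plus a hypothetical distance-based softmax for the weights; BANMo itself learns its skinning weights (and makes the deformation invertible), which this does not reproduce.

```python
import numpy as np

def linear_blend_skinning(points, weights, rotations, translations):
    """Deform rest-pose points by a weighted blend of per-bone rigid transforms.

    points: (N, 3); weights: (N, B) with rows summing to 1; rotations: (B, 3, 3); translations: (B, 3).
    """
    per_bone = np.einsum("bij,nj->nbi", rotations, points) + translations[None]   # (N, B, 3)
    return np.einsum("nb,nbi->ni", weights, per_bone)

def skinning_weights(points, bone_centers, scale=0.1):
    """Hypothetical soft assignment of points to bones by squared distance (softmax)."""
    d2 = ((points[:, None, :] - bone_centers[None]) ** 2).sum(axis=-1)
    logits = -d2 / scale
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    return w / w.sum(axis=1, keepdims=True)
```

Because every operation is differentiable, the bone transforms and weights can be optimized through a renderer from video frames, which is how the method avoids a hand-made template.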

Slide 80

Slide 80 text

Builds animatable 3D data from ordinary video footage, without special cameras

Slide 81

Slide 81 text

No content

Slide 82

Slide 82 text

No content

Slide 83

Slide 83 text

The more video frames it learns from, the more accurate the 3D model becomes

Slide 84

Slide 84 text

6. GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
Duplicate (same paper as Top Recent #2).
http://arxiv.org/abs/2112.10741v2

Slide 85

Slide 85 text

7. Efficient Geometry-aware 3D Generative Adversarial Networks
Covered in the Pick Up section.
http://arxiv.org/abs/2112.07945v1

Slide 86

Slide 86 text

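8. Player of Games
Games have long served as benchmarks of progress in artificial intelligence. Recently, approaches using search and learning have shown strong performance across a set of perfect-information games, and approaches using game-theoretic reasoning and learning have shown strong performance in specific imperfect-information poker variants. We introduce Player of Games, a general-purpose algorithm that unifies previous approaches by combining guided search, self-play learning, and game-theoretic reasoning. Player of Games is the first algorithm to achieve strong empirical performance in large perfect- and imperfect-information games, an important step toward truly general algorithms for arbitrary environments. Player of Games performs strongly in chess and Go, beats the strongest openly available agent in heads-up no-limit Texas hold'em poker (Slumbot), and defeats the state-of-the-art agent in Scotland Yard, an imperfect-information game that illustrates the value of guided search, learning, and game-theoretic reasoning.
• Goal: research a general-purpose game algorithm applicable to both perfect- and imperfect-information games
• Results: PoG, a unified algorithm that combines search, learning, and game-theoretic reasoning
• Method: GT-CFR (growing-tree counterfactual regret minimization) + sound self-play
• Name: PoG (Player of Games)
• Affiliations: DeepMind
http://arxiv.org/abs/2112.03178v1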

Slide 87

Slide 87 text

GT-CFR: growing-tree counterfactual regret minimization
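The building block underneath CFR-family methods such as GT-CFR is regret matching: each player plays every action in proportion to its positive cumulative regret, and in two-player zero-sum games the average strategy converges to a Nash equilibrium. The sketch runs it on rock-paper-scissors in self-play, where the equilibrium is uniform (1/3 each). It is a toy illustration of regret matching only; the tree growth, search, and value networks of GT-CFR are not shown, and the asymmetric starting regrets are an arbitrary choice so the run does not begin at equilibrium.

```python
import numpy as np

# PAYOFF[a, b]: utility for player 0 playing a against player 1 playing b (rock, paper, scissors).
PAYOFF = np.array([[0.0, -1.0, 1.0],
                   [1.0, 0.0, -1.0],
                   [-1.0, 1.0, 0.0]])

def regret_matching(regret_sum):
    """Play each action in proportion to its positive cumulative regret (uniform if none)."""
    positive = np.maximum(regret_sum, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    return np.full(len(regret_sum), 1.0 / len(regret_sum))

def self_play(iterations=20000):
    """Both players run regret matching against each other; return their average strategies."""
    regrets = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0])]   # asymmetric start
    strategy_sums = [np.zeros(3), np.zeros(3)]
    for _ in range(iterations):
        s0, s1 = regret_matching(regrets[0]), regret_matching(regrets[1])
        strategy_sums[0] += s0
        strategy_sums[1] += s1
        u0 = PAYOFF @ s1              # value of each action for player 0 against s1
        u1 = -(s0 @ PAYOFF)           # value of each action for player 1 (zero-sum game)
        regrets[0] += u0 - s0 @ u0    # regret = action value minus current expected value
        regrets[1] += u1 - s1 @ u1
    return [s / s.sum() for s in strategy_sums]
```

The current strategies keep cycling, but the averages settle near (1/3, 1/3, 1/3); that "average, not last iterate" property is why CFR-style algorithms track strategy sums.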

Slide 88

Slide 88 text

No content

Slide 89

Slide 89 text

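9. NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation
Data augmentation is an important component both for evaluating the robustness of models in natural language processing (NLP) and for increasing the diversity of the data they are trained on. This paper introduces NL-Augmenter, a new participatory, Python-based natural language augmentation framework that supports creating both transformations (modifications to the data) and filters (data splits according to specific features). We describe the framework and an initial set of 117 transformations and 23 filters for a variety of natural language tasks. We demonstrate NL-Augmenter's efficacy by using several of its transformations to analyze the robustness of popular natural language models. The infrastructure, data cards, and robustness analysis results are publicly available in the NL-Augmenter repository: https://github.com/GEM-benchmark/NL-Augmenter
• Goal: improve robustness evaluation in NLP and increase data diversity
• Results: released a participatory natural-language augmentation framework
• Method: provides task-specific transformations and filters for splitting data by feature
• Name: NL-Augmenter
• Affiliations: Google Brain, Google Research, and many others (including the University of Tokyo)
http://arxiv.org/abs/2112.02721v1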

Slide 90

Slide 90 text

Example: augmentations of "John likes expensive Italian pizzas"

Slide 91

Slide 91 text

Transformations (117+ types)

Slide 92

Slide 92 text

Filters (23+ types)
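To make the transformation/filter distinction concrete, here is a toy pair in plain Python. A transformation rewrites an example (here, synonym substitution from a tiny hypothetical lexicon), while a filter only decides whether an example belongs to a subset (here, by sentence length). These functions are illustrations of the two concepts, not NL-Augmenter's actual classes or API.

```python
import random

# Toy lexicon for illustration only (hypothetical, not from NL-Augmenter).
SYNONYMS = {"expensive": ["pricey", "costly"], "likes": ["enjoys", "loves"]}

def synonym_substitution(sentence, seed=0):
    """Transformation: rewrite the sentence while (ideally) keeping its label unchanged."""
    rng = random.Random(seed)
    return " ".join(rng.choice(SYNONYMS[w]) if w in SYNONYMS else w for w in sentence.split())

def length_filter(sentence, min_words=4):
    """Filter: select examples with a given property (here, at least min_words words)."""
    return len(sentence.split()) >= min_words
```

Applied to "John likes expensive Italian pizzas", the transformation swaps "likes" and "expensive" for synonyms while leaving the rest intact, and a robustness check would compare a model's predictions on the original and transformed sentences.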

Slide 93

Slide 93 text

https://gem-benchmark.com

Slide 94

Slide 94 text

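10. FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimization
Generating images from natural-language instructions is an intriguing yet highly challenging task. We approach text-to-image generation by combining the power of a pre-trained CLIP representation with an off-the-shelf image generator (GAN), optimizing in the GAN's latent space to find images that achieve the maximum CLIP score for the given input text. Compared with traditional methods that train text-to-image generative models from scratch, the CLIP+GAN approach is training-free and zero-shot, and can easily be customized with different generators. However, optimizing the CLIP score in the GAN space poses a highly challenging optimization problem, and off-the-shelf optimizers such as Adam fail to yield satisfying results. In this work, we propose the FuseDream pipeline, which improves the CLIP+GAN approach with three key techniques: 1) an AugCLIP score, which makes the CLIP objective robust by applying random augmentations to the image; 2) a novel initialization and over-parameterization strategy for optimization, which efficiently navigates the non-convex landscape of the GAN space; 3) a composed generation technique that uses a novel bi-level optimization formulation to compose multiple images, extending the GAN space and overcoming data bias. Prompted with different input texts, FuseDream can generate high-quality images with varied objects, backgrounds, and artistic styles, and even novel counterfactual concepts that do not appear in the training data of the GAN we use. Quantitatively, images generated by FuseDream achieve top-level Inception and FID scores on the MS COCO dataset, without additional architecture design or training. Our code is publicly available at https://github.com/gnobitab/FuseDream.
• Goal: improve CLIP+GAN, an approach for generating images from natural language
• Results: top-level Inception and FID scores on the MS COCO dataset
• Method: adds the AugCLIP score, over-parameterization, and bi-level optimization
• Name: FuseDream
• Affiliations: The University of Texas at Austin, University of California San Diego
http://arxiv.org/abs/2112.01573v1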
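A sketch of the AugCLIP idea: instead of scoring one image against the text, average the cosine similarity over several randomly augmented views, so the GAN latent optimization cannot "fool" the score with adversarial pixel patterns. Assumptions: random crop (resized back by nearest-neighbor indexing) and horizontal flip as the augmentations, eight views, and a random linear projection as a hypothetical stand-in for CLIP's image encoder. The latent-space optimization loop itself is not shown.

```python
import numpy as np

def random_augment(image, rng):
    """Random crop resized back to the input size (nearest neighbour), then a random horizontal flip."""
    h, w, _ = image.shape
    ch, cw = rng.integers(h // 2, h + 1), rng.integers(w // 2, w + 1)
    y0, x0 = rng.integers(0, h - ch + 1), rng.integers(0, w - cw + 1)
    crop = image[y0:y0 + ch, x0:x0 + cw]
    rows = np.arange(h) * ch // h     # nearest-neighbour resize indices
    cols = np.arange(w) * cw // w
    out = crop[rows][:, cols]
    return out[:, ::-1] if rng.random() < 0.5 else out

def aug_clip_score(image, text_embedding, image_encoder, rng, n_aug=8):
    """Mean cosine similarity between the text embedding and several augmented views of the image."""
    t = text_embedding / np.linalg.norm(text_embedding)
    scores = []
    for _ in range(n_aug):
        e = image_encoder(random_augment(image, rng))
        scores.append(e @ t / np.linalg.norm(e))
    return float(np.mean(scores))

# Usage with a random linear projection standing in for CLIP's image encoder (hypothetical).
rng = np.random.default_rng(0)
projection = rng.normal(size=(16 * 16 * 3, 32))

def toy_image_encoder(img):
    return img.reshape(-1) @ projection

score = aug_clip_score(rng.random((16, 16, 3)), rng.normal(size=32), toy_image_encoder, rng)
```

In the real pipeline this score would be maximized with respect to the GAN latent, so an image only scores well if it matches the text from many crops and orientations at once.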

Slide 95

Slide 95 text

Comparison with CLIP+GAN and BigSleep

Slide 96

Slide 96 text

No content

Slide 97

Slide 97 text

DeepL Translator (deepl.com) https://www.deepl.com/en/translator