Slide 1

Slide 1 text

Waseda University, Shigeo Morishima Laboratory, D3 (third-year PhD student), Hideki Tsunashima
StyleNeRF: A Style-based 3D Aware Generator for High-resolution Image Synthesis
A fairly thorough walkthrough

Slide 2

Slide 2 text

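Self-introduction
Hideki Tsunashima (Twitter: @maguroIsland)
- Affiliation: Waseda University, third-year PhD student, Shigeo Morishima Laboratory
- Research topics
  - Master's program: reducing the computation and parameter count of deep image generation models
  - PhD year 1: unsupervised foreground-background decomposition for multiple objects
  - PhD year 2 onward: Embodied AI
  - AIST (formerly): virtual try-on that does not require paired clothing-person data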

Slide 3

Slide 3 text

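Introduction
Figures in this deck that are taken from papers come from the reference cited on the corresponding page.
My (Tsunashima's) understanding is shaky in places, so the deck may contain mistakes. Please bear this in mind, and please get in touch if you find something that needs correcting.
A fair number of equations appear. I try to explain them as carefully as I can, but comments and questions are welcome wherever something is hard to follow.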

Slide 4

Slide 4 text

Contents
- TL;DR
- Background and objective
- Method
- Results
- Outlook

Slide 5

Slide 5 text

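TL;DR (overview)
A NeRF + GAN that can generate high-resolution images.
A paper that pays off several times over: many separate insights in a single paper.
<Problem with NeRF + GAN methods>
- Computational cost tends to blow up when generating high-resolution images
<Key points of this paper>
- An upsampler that produces no artifacts
- Distillation to reduce computational cost
- Revisiting Progressive Growing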

Slide 6

Slide 6 text

TL;DR (overview)
A NeRF + GAN that can generate high-resolution images.
A paper that pays off several times over: many separate insights in a single paper.

Slide 7

Slide 7 text

Contents
- TL;DR
- Background and objective
  - NeRF
  - Objective
- Method
- Results
- Outlook

Slide 8

Slide 8 text

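NeRF
NeRF [1] learns the radiance (RGB) and density (alpha value) at each 3D coordinate as seen from each viewpoint, so that after training it can render the scene from arbitrary viewpoints.
The video is from the authors' project page [2].
This summary skips a great deal, so the following slides explain in detail the items needed to understand StyleNeRF.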

Slide 9

Slide 9 text

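NeRF
NeRF renders images with a technique called volume rendering.
At the risk of oversimplifying, volume rendering casts rays from a viewpoint and draws whatever is visible along them (the interior of an object is not visible, so it is not drawn).
Following the equation makes this clear (equation from StyleNeRF [3]).
Equation annotation: the image after rendering.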

Slide 10

Slide 10 text

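NeRF
NeRF renders images with a technique called volume rendering.
At the risk of oversimplifying, volume rendering casts rays from a viewpoint and draws whatever is visible along them (the interior of an object is not visible, so it is not drawn).
Following the equation makes this clear (equation from StyleNeRF [3]).
Equation annotations:
- The image after rendering.
- σ_w(·) is the density. r(s) is the ray, r(s) = o + s d with 0 ≤ s < t, where o is the origin of the viewpoint and d is the viewing direction expressed as angles (θ, φ).
- Qualitatively, once the ray has passed through a high-density region the exp term becomes small. This suppresses the contribution of the density σ_w(r(t)) at deeper points, which is how the equation expresses that the interior of an object is not visible.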

Slide 11

Slide 11 text

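NeRF
NeRF renders images with a technique called volume rendering.
At the risk of oversimplifying, volume rendering casts rays from a viewpoint and draws whatever is visible along them (the interior of an object is not visible, so it is not drawn).
Following the equation makes this clear (equation from StyleNeRF [3]).
Equation annotations:
- The image after rendering.
- The radiance c_w(r(t), d) is weighted by p_w(t), so the interior of an object does not show up in the rendered image.
- σ_w(·) is the density. r(s) is the ray, r(s) = o + s d with 0 ≤ s < t, where o is the origin of the viewpoint and d is the viewing direction expressed as angles (θ, φ).
- Qualitatively, once the ray has passed through a high-density region the exp term becomes small. This suppresses the contribution of the density σ_w(r(t)) at deeper points, which is how the equation expresses that the interior of an object is not visible.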
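For reference, the equation that these annotations describe can be written out as below. It is reconstructed from the symbol definitions on this slide and the standard NeRF volume rendering form, not copied from the slide image, so the exact notation (in particular the superscript label) should be treated as an assumption.

```latex
I^{\mathrm{NeRF}}_{w}(\boldsymbol{r})
  = \int_{0}^{\infty} p_{w}(t)\, c_{w}\bigl(\boldsymbol{r}(t), \boldsymbol{d}\bigr)\, dt,
\qquad
p_{w}(t)
  = \exp\!\left(-\int_{0}^{t} \sigma_{w}\bigl(\boldsymbol{r}(s)\bigr)\, ds\right)
    \sigma_{w}\bigl(\boldsymbol{r}(t)\bigr)
```

The exp factor is the fraction of light that survives up to distance t. It multiplies the local density, so a point behind a dense surface receives almost no weight.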

Slide 12

Slide 12 text

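NeRF
The equation integrates over the region of interest, taking the viewpoint as the origin.
The interval is written as [0, ∞], but because of p_w(t) only values up to the target region [0, t] actually contribute.
In short, the 3D coordinates of a region containing the object and the background (e.g. 64x64x64) are aggregated into 2D.
Incidentally, c_w(·) and σ_w(·) are parameterized by an MLP with weights w.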
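The aggregation along one ray can be sketched numerically as follows. This is a minimal sketch using the usual discretized form of NeRF volume rendering (per-sample opacity and accumulated transmittance); the function name `render_ray` and the toy densities and colors are assumptions made for illustration, not values from the paper.

```python
import math


def render_ray(sigmas, colors, deltas):
    """Discretized volume rendering along a single ray.

    sigmas: density at each sample, ordered from near to far
    colors: (r, g, b) radiance at each sample
    deltas: length of the ray segment each sample represents
    Returns the rendered pixel and the per-sample weights.
    """
    transmittance = 1.0  # fraction of light that has survived so far
    pixel = [0.0, 0.0, 0.0]
    weights = []
    for sigma, color, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha          # discrete counterpart of p_w(t)
        weights.append(weight)
        for k in range(3):
            pixel[k] += weight * color[k]
        transmittance *= 1.0 - alpha
    return pixel, weights


# Toy ray: empty space, then a dense red surface, then a green interior.
sigmas = [0.0, 50.0, 50.0]
colors = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
deltas = [1.0, 1.0, 1.0]
pixel, weights = render_ray(sigmas, colors, deltas)
```

In this toy example nearly all of the weight lands on the surface sample, so the green interior behind it contributes almost nothing to the pixel.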

Slide 13

Slide 13 text

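NeRF
In NeRF, radiance and density change a lot under tiny changes in coordinate or viewpoint.
Neural networks are poor at approximating such high-frequency functions, where a small change in input causes a large change in output.
NeRF therefore adopts Positional Encoding [4]: the coordinate and viewpoint are embedded with high-frequency Fourier features before being fed in, so the network itself only has to approximate a low-frequency function (equation from StyleNeRF [3]).
Equation annotations:
- Fourier features.
- x is a variable; the coordinate x and the viewpoint d are substituted for it.
Note: [4] has the first author of NeRF as a co-first author. It was published a few months after NeRF and analyzes why this kind of encoding works.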
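A minimal sketch of such an encoding is shown below. The frequencies 2^k without a factor of π, the number of frequencies, and the function names are assumptions chosen for illustration; the original NeRF and StyleNeRF implementations may differ in these details.

```python
import math


def positional_encoding(x, num_freqs):
    """Encode a scalar as [sin(2^0 x), cos(2^0 x), ..., sin(2^(L-1) x), cos(2^(L-1) x)]."""
    features = []
    for k in range(num_freqs):
        freq = 2.0 ** k
        features.append(math.sin(freq * x))
        features.append(math.cos(freq * x))
    return features


def encode_point(point, num_freqs):
    """Encode every coordinate of a point and concatenate the results."""
    encoded = []
    for coord in point:
        encoded.extend(positional_encoding(coord, num_freqs))
    return encoded


# Two nearby points: the highest-frequency features differ far more than
# the raw coordinates do, which is what makes fine detail easier to fit.
a = encode_point((0.10, 0.20, 0.30), num_freqs=10)
b = encode_point((0.11, 0.20, 0.30), num_freqs=10)
largest_gap = max(abs(u - v) for u, v in zip(a, b))
```

A 3D point with 10 frequencies becomes a 60-dimensional input (3 coordinates, 10 frequencies, sin and cos each).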

Slide 14

Slide 14 text

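Objective
In existing work that makes the 3D pose controllable, generating high-resolution images of 512^2 or more has been difficult.
- NeRF-based methods: the computational cost grows with resolution, so training is not feasible in the first place.
- HoloGAN, which is not NeRF-based: training is unstable to begin with.
The objectives are therefore:
- Enable high-resolution image generation with controllable 3D pose
- Reduce the computational cost of high-resolution image generation with controllable 3D pose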

Slide 15

Slide 15 text

Contents
- TL;DR
- Background and objective
- Method
- Results
- Outlook

Slide 16

Slide 16 text

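Method
The items in the first part of this list are the main topics of the paper.
- An artifact-free upsampler
- Approximating volume rendering to reduce computational cost
- Revisiting Progressive Growing
- Distillation to reduce computational cost
- Dropping the view-condition input to the radiance prediction network
- Moving the noise input from 2D to 3D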

Slide 17

Slide 17 text

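Method
The items in the first part of this list are the main topics of the paper.
- An artifact-free upsampler
- Approximating volume rendering to reduce computational cost
- Revisiting Progressive Growing
- Distillation to reduce computational cost
- Dropping the view-condition input to the radiance prediction network
- Moving the noise input from 2D to 3D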

Slide 18

Slide 18 text

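Method
- An artifact-free upsampler
- Approximating volume rendering to reduce computational cost
- Revisiting Progressive Growing
- Distillation to reduce computational cost
- Dropping the view-condition input to the radiance prediction network
- Moving the noise input from 2D to 3D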

Slide 19

Slide 19 text

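Approximating volume rendering to reduce computational cost
Starting from the NeRF volume rendering equation above, we look at the new formulation used in StyleNeRF.
The things to focus on are c_w(·) and σ_w(·), which are parameterized by an MLP with weights w.
Equation annotations:
- A two-layer MLP
- An MLP that takes the coordinate x (explained later)
- Positional Encoding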

Slide 20

Slide 20 text

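Approximating volume rendering to reduce computational cost
Starting from the NeRF volume rendering equation above, we look at the new formulation used in StyleNeRF.
The things to focus on are c_w(·) and σ_w(·), which are parameterized by an MLP with weights w.
Equation annotations:
- Each g_w is a fully connected (FC) layer
- Positional Encoding
- f is the StyleGAN2 mapping network: an MLP that converts the noise z into a feature w that is easier to work with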

Slide 21

Slide 21 text

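Approximating volume rendering to reduce computational cost
Let us first collect the pieces we need.
- p_w(·): the density-based weight applied to the radiance
- h: a two-layer MLP
- φ_w^n(·): an n-layer NN
- r(t) = o + d t: the ray, given by the viewpoint origin o, the viewing direction d, and the distance t from the origin
- ξ(·): Positional Encoding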
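Putting the symbols from this slide and the previous one together, the formulation can be sketched as below. This is my reconstruction of how the pieces compose, based on the annotations (g_w are FC layers, f is the mapping network, h is a two-layer MLP); the subscripts on h and the exact form should be checked against StyleNeRF [3] before relying on them.

```latex
\begin{aligned}
\phi^{n}_{w}(\boldsymbol{x}) &= g^{n}_{w} \circ g^{n-1}_{w} \circ \cdots \circ g^{1}_{w} \circ \xi(\boldsymbol{x}),
  \qquad w = f(z) \\
\sigma_{w}(\boldsymbol{x}) &= h_{\sigma} \circ \phi^{n_{\sigma}}_{w}(\boldsymbol{x}) \\
c_{w}(\boldsymbol{x}, \boldsymbol{d}) &= h_{c} \circ \bigl[\phi^{n_{c}}_{w}(\boldsymbol{x}),\ \xi(\boldsymbol{d})\bigr]
\end{aligned}
```

Read this way, density uses the first n_σ layers and radiance uses the first n_c layers of the same style-conditioned network.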

Slide 22

Slide 22 text

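Approximating volume rendering to reduce computational cost
- p_w(·): the density-based weight applied to the radiance
- h: a two-layer MLP
- φ_w^n(·): an n-layer NN
- r(t) = o + d t: the ray, given by the viewpoint origin o, the viewing direction d, and the distance t from the origin
- ξ(·): Positional Encoding
Equation annotations:
- The image after rendering.
- h and φ_w^{n_c, n_σ}(·) are moved outside the integral.
- 𝒜(·) contains the density network, which is why the superscripts are n_c, n_σ (described on the next page).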

Slide 23

Slide 23 text

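Approximating volume rendering to reduce computational cost
Because the weight concentrates near the object surface, the two-layer MLP h and the radiance/density MLP φ_w^{n_c, n_σ}(·) can be treated as locally linear functions there.
The expectation (integral) of a linear function equals the function evaluated at the expected point, so moving them outside the integral is a good approximation.
Note: my apologies. I believe this has to do with implicit function representations, but I (Tsunashima) do not fully understand the reasoning here.
By moving h and φ_w^{n_c, n_σ}(·) outside, we can obtain the radiance field at low resolution first and then increase the resolution (a figure makes this easier to see, so it is covered later).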
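The linearity property this slide relies on can be checked numerically. The sketch below only illustrates the general fact (for an affine function, averaging then applying equals applying then averaging); the sample positions, weights, and functions are arbitrary values chosen for the example and have nothing to do with the actual networks.

```python
def weighted_mean(values, weights):
    """Expectation of `values` under (unnormalized) `weights`."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


def affine(x):
    """Stand-in for a function that is linear in the region of interest."""
    return 3.0 * x + 1.0


def nonlinear(x):
    """Stand-in for a function that is clearly not linear."""
    return x * x


# Sample positions along a ray and weights concentrated near t = 2.
ts = [1.0, 2.0, 3.0, 4.0]
weights = [0.1, 0.6, 0.2, 0.1]

mean_t = weighted_mean(ts, weights)

# Affine case: "integrate, then apply" matches "apply, then integrate".
mean_of_affine = weighted_mean([affine(t) for t in ts], weights)
affine_of_mean = affine(mean_t)

# Nonlinear case: the two orders disagree, so the swap is only approximate.
mean_of_nonlinear = weighted_mean([nonlinear(t) for t in ts], weights)
nonlinear_of_mean = nonlinear(mean_t)
```

The more the weights concentrate on a small region, the closer a smooth function is to linear over that region, and the smaller the gap in the nonlinear case becomes.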

Slide 24

Slide 24 text

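Method
- An artifact-free upsampler
- Approximating volume rendering to reduce computational cost
- Revisiting Progressive Growing
- Distillation to reduce computational cost
- Dropping the view-condition input to the radiance prediction network
- Moving the noise input from 2D to 3D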

Slide 25

Slide 25 text

An artifact-free upsampler
StyleNeRF considered the following three upsamplers:
- Pixel Shuffle [5]
- LIIF [6]
  Both are learnable upsamplers. They produce checkerboard-like artifacts and textures that stick to the image plane. The checkerboard artifact is easy to see in [7], and texture sticking in [8].
- Bilinear
  An upsampler with no learned parameters. Bilinear upsampling acts as a low-pass filter, so its smooth output avoids the artifacts above, but bubble-like artifacts appear instead.

Slide 26

Slide 26 text

An artifact-free upsampler
StyleNeRF proposes combining the learned pixel shuffle with the smoothing of a bilinear upsampler to get the best of both.
The idea is very simple: apply pixel shuffle, then smooth the result with a low-pass filter.
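The two steps can be sketched in plain Python as below. This is a simplified sketch, assuming a single output channel, a fixed [1, 2, 1] / 4 separable blur, and edge clamping; the learnable layers that produce the input channels are omitted, and the kernel StyleNeRF actually uses may differ.

```python
def pixel_shuffle(channels, r):
    """Rearrange r*r feature maps of size H x W into one map of size (H*r) x (W*r)."""
    h, w = len(channels[0]), len(channels[0][0])
    out = [[0.0] * (w * r) for _ in range(h * r)]
    for y in range(h):
        for x in range(w):
            for i in range(r):
                for j in range(r):
                    out[y * r + i][x * r + j] = channels[i * r + j][y][x]
    return out


def blur(image, kernel=(0.25, 0.5, 0.25)):
    """Separable 3-tap low-pass filter; out-of-range pixels are clamped to the edge."""
    h, w = len(image), len(image[0])

    def clamp(v, size):
        return max(0, min(v, size - 1))

    # Horizontal pass, then vertical pass.
    rows = [[sum(kernel[k] * image[y][clamp(x + k - 1, w)] for k in range(3))
             for x in range(w)] for y in range(h)]
    return [[sum(kernel[k] * rows[clamp(y + k - 1, h)][x] for k in range(3))
             for x in range(w)] for y in range(h)]


def upsample(channels, r=2):
    """Pixel shuffle followed by a low-pass filter."""
    return blur(pixel_shuffle(channels, r))


# Four 2x2 channels whose values interleave into a checkerboard after shuffling.
ones = [[1.0, 1.0], [1.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
channels = [ones, zeros, zeros, ones]

shuffled = pixel_shuffle(channels, 2)
smooth = upsample(channels, 2)
```

The example is deliberately a worst case: pixel shuffle alone yields a pure checkerboard, and the blur flattens it to a uniform 0.5 in the interior.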

Slide 27

Slide 27 text

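Method
- An artifact-free upsampler
- Approximating volume rendering to reduce computational cost
- Revisiting Progressive Growing
- Distillation to reduce computational cost
- Dropping the view-condition input to the radiance prediction network
- Moving the noise input from 2D to 3D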

Slide 28

Slide 28 text

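Revisiting Progressive Growing
Progressive Growing is a training scheme that raises the generated resolution step by step as training proceeds.
StyleGAN2 [9] dropped Progressive Growing to prevent a failure mode in which details such as the orientation of teeth are generated by layers at one particular resolution and therefore do not follow the correct orientation.
Even so, Progressive Growing remains effective for stabilizing training.
StyleNeRF therefore adopts a partially modified Progressive Growing.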

Slide 29

Slide 29 text

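Revisiting Progressive Growing
(a): The original Progressive Growing. Intermediate toRGB layers convert back to an RGB image, which is added in like a residual block before moving on to the next resolution.
(b): StyleNeRF removes the residual block, which speeds up computation (with no change in quality).
(c): The residual blocks in the StyleNeRF discriminator are left as they are.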

Slide 30

Slide 30 text

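Method
- An artifact-free upsampler
- Approximating volume rendering to reduce computational cost
- Revisiting Progressive Growing
- Distillation to reduce computational cost
- Dropping the view-condition input to the radiance prediction network
- Moving the noise input from 2D to 3D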

Slide 31

Slide 31 text

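Distillation to reduce computational cost
In the volume rendering approximation, the two-layer MLP for radiance, h_c, and the NN that processes radiance and density, φ_w^{n_c, n_σ}(·), were moved outside the expectation (integral).
As the figure shows, this makes it possible to compute the radiance field in the low-resolution layers and then use φ_w^{n_c, n_σ}(·) to upscale the 2D representation obtained from that low-resolution radiance field.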

Slide 32

Slide 32 text

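Distillation to reduce computational cost
As the figure shows, this makes it possible to compute the radiance field in the low-resolution layers and then use the NN that processes radiance and density, φ_w^{n_c, n_σ}(·), to upscale the 2D representation obtained from that low-resolution radiance field.
Figure annotation:
- The separated two-layer MLP for computing radiance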

Slide 33

Slide 33 text

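Distillation to reduce computational cost
As the figure shows, this makes it possible to compute the radiance field in the low-resolution layers and then use the NN that processes radiance and density, φ_w^{n_c, n_σ}(·), to upscale the 2D representation obtained from that low-resolution radiance field.
Figure annotations:
- The separated two-layer MLP for computing radiance
- The NN that upscales the 2D representation obtained from the low-resolution radiance field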

Slide 34

Slide 34 text

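Distillation to reduce computational cost
In practice, however, the NN that goes from low to high resolution cannot preserve the 3D representation.
So an MSE is taken between the high-resolution StyleNeRF output image and the image obtained by computing the radiance field for that output and rendering it with NeRF's volume rendering.
This enforces 3D consistency (NeRF-path Regularization).
Equation annotations:
- The average is taken over S randomly chosen points.
- R_in is the low-resolution radiance field; R_in is rendered with StyleNeRF's volume rendering. The index [i, j] is a coordinate in the output image.
- R_out is the StyleNeRF output image; the radiance field is computed from R_out and rendered with NeRF's volume rendering. The index [i, j] is a coordinate in the output image.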
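The loss described here can be sketched as a mean squared error over randomly sampled pixels. This is a minimal sketch, assuming both renderings are already available as same-sized grayscale images stored as nested lists; `nerf_path_loss` and the toy images are hypothetical names and values, and the real loss is computed on tensors inside the training loop.

```python
import random


def nerf_path_loss(approx_image, nerf_image, num_samples, rng):
    """Mean squared error between two renderings over randomly sampled pixels.

    approx_image: image from the fast, approximate path
    nerf_image:   image from full NeRF volume rendering
    num_samples:  number of pixels to compare (the set S)
    rng:          random.Random instance, passed in for reproducibility
    """
    h, w = len(approx_image), len(approx_image[0])
    coords = [(i, j) for i in range(h) for j in range(w)]
    sampled = rng.sample(coords, num_samples)
    total = 0.0
    for i, j in sampled:
        diff = approx_image[i][j] - nerf_image[i][j]
        total += diff * diff
    return total / num_samples


rng = random.Random(0)
approx = [[0.5] * 4 for _ in range(4)]
nerf = [[0.7] * 4 for _ in range(4)]

loss = nerf_path_loss(approx, nerf, num_samples=8, rng=rng)
zero_loss = nerf_path_loss(nerf, nerf, num_samples=8, rng=rng)
```

Sampling only a subset of pixels keeps the cost of the expensive full volume rendering bounded, since it only has to be evaluated at the sampled coordinates.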

Slide 35

Slide 35 text

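Method
- An artifact-free upsampler
- Approximating volume rendering to reduce computational cost
- Revisiting Progressive Growing
- Distillation to reduce computational cost
- Dropping the view-condition input to the radiance prediction network
- Moving the noise input from 2D to 3D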

Slide 36

Slide 36 text

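Other settings
- Dropping the view-condition input to the radiance prediction network
  In the radiance field computation, the viewpoint input to the radiance prediction network is removed.
  Radiance does change with viewpoint, but conditioning on the viewpoint makes it easy for the model to pick up spurious correlations.
  The density computation should be invariant to viewpoint anyway, so it was separated from the viewpoint from the start.
- Moving the noise input from 2D to 3D
  If noise is injected in 2D as in StyleGAN2, the variation it introduces sticks to the 2D image plane in StyleNeRF, which is a problem.
  The noise is therefore tied to 3D positions and then converted to the 2D plane (re-rasterization).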

Slide 37

Slide 37 text

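Method summary and other settings
<Proposed method>
- Approximation of volume rendering (generating high-resolution images from a low-resolution radiance field)
- Upsampler: Pixel Shuffle + low-pass filter
- Progressive Growing without residual blocks
- Distillation to compensate for the loss of 3D consistency caused by the volume rendering approximation (NeRF-path Regularization)
- No viewpoint input to the radiance prediction network
- Noise input tied to 3D
<Other settings>
- The mapping network, discriminator, and objective function are the same as in StyleGAN2
- NeRF++ [10] is used for the NeRF representation
  - NeRF++ models background and foreground with separate networks
  - Separate foreground/background modeling is also used in BlockGAN [11] and GIRAFFE [12]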

Slide 38

Slide 38 text

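Method summary and other settings
<Self-supervised learning of camera pose prediction>
In-the-wild data generally comes without camera poses, so a predictor is prepared that estimates the camera pose used for training the generator, and it is trained in a self-supervised way.
This kind of self-supervised learning is also used in HoloGAN [13] and other methods.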

Slide 39

Slide 39 text

3D aware GANs
This part introduces the generative models that handle 3D explicitly and serve as baselines for this paper.
- HoloGAN
- GRAF
- GIRAFFE
- π-GAN

Slide 40

Slide 40 text

HoloGAN
One of the 3D aware GANs, arguably the original one, that does not use NeRF.
1. Input a constant with a 3D shape (as in StyleGAN, generation starts from learnable parameters)

Slide 41

Slide 41 text

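HoloGAN
One of the 3D aware GANs, arguably the original one, that does not use NeRF.
1. Input a constant with a 3D shape (as in StyleGAN, generation starts from learnable parameters)
2. Input the camera pose and apply rotation and similar transforms in the 3D feature space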

Slide 42

Slide 42 text

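HoloGAN
One of the 3D aware GANs, arguably the original one, that does not use NeRF.
1. Input a constant with a 3D shape (as in StyleGAN, generation starts from learnable parameters)
2. Input the camera pose and apply rotation and similar transforms in the 3D feature space
3. Render to 2D with bilinear resampling (details of bilinear resampling omitted)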

Slide 43

Slide 43 text

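HoloGAN
One of the 3D aware GANs, arguably the original one, that does not use NeRF.
1. Input a constant with a 3D shape (as in StyleGAN, generation starts from learnable parameters)
2. Input the camera pose and apply rotation and similar transforms in the 3D feature space
3. Render to 2D with bilinear resampling (details of bilinear resampling omitted)
4. Continue with the same process as 2D image generation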

Slide 44

Slide 44 text

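HoloGAN
One of the 3D aware GANs, arguably the original one, that does not use NeRF.
1. Input a constant with a 3D shape (as in StyleGAN, generation starts from learnable parameters)
2. Input the camera pose and apply rotation and similar transforms in the 3D feature space
3. Render to 2D with bilinear resampling (details of bilinear resampling omitted)
4. Continue with the same process as 2D image generation
5. Predict the input camera pose from the generated image (self-supervised learning)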

Slide 45

Slide 45 text

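HoloGAN
Generation with controlled 3D pose is possible.
However, I (Tsunashima) have trained this model myself, and because the camera pose is not given as supervision, training is very unstable (the 3D -> 2D bilinear resampling is also a cause).
In addition, many of the generated images look far more melted than the ones shown.
This is the cherry-picking that is so common in the image generation field.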

Slide 46

Slide 46 text

GRAF [14]
A method that simply brings NeRF into a GAN.
It uses NeRF as is, so the computational cost is naturally high.

Slide 47

Slide 47 text

π-GAN [15]
A GAN + NeRF built on SIREN, a network that is powerful for implicit function representations.
Conceptually, it is HoloGAN + SIREN + NeRF.

Slide 48

Slide 48 text

GIRAFFE [12]
A GAN + NeRF model that, like BlockGAN, models foreground and background separately.
π-GAN is concurrent work.

Slide 49

Slide 49 text

GIRAFFE [12]
Incidentally, GIRAFFE HD [16] has also appeared: a GIRAFFE + StyleGAN2 paper that is concurrent work along similar lines to StyleNeRF.

Slide 50

Slide 50 text

Contents
- TL;DR
- Background and objective
- Method
- Results
  - Ablation Study
  - Random image generation
  - Camera pose control
  - Various applications
    - Style Mixing
    - Style Interpolation
    - GAN inversion
- Outlook

Slide 51

Slide 51 text

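Ablation Study
- (a): w/o Progressive Growing
  The hair parting does not follow the viewpoint. This is the opposite of what was observed for StyleGAN2.
- (b): w/o NeRF-path Regularization
  3D consistency is lost.
- (c): w/ view condition
  3D consistency is lost because of spurious correlations.
- (table): Comparison of upsamplers
  The proposed pixel shuffle + low-pass filter gives the best quality.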

Slide 52

Slide 52 text

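Random image generation
On every dataset the generated images are consistently high quality while remaining 3D consistent.
That said, it does feel as if the examples chosen for the other methods were cherry-picked to look a little too bad...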

Slide 53

Slide 53 text

Random image generation
Quantitatively it is comparable to StyleGAN2 (a 2D GAN) and far ahead of the other 3D aware GANs.
Rendering is also considerably faster than GRAF and π-GAN.

Slide 54

Slide 54 text

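Camera pose control
The model generalizes even to extreme camera pose changes that do not appear in the training data.
This result goes beyond the basic limitation that GANs reflect dataset bias as is [17], and it may become a first step toward new analyses of GANs.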

Slide 55

Slide 55 text

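Camera pose control
As an application, the appendix also shows 3D reconstruction results: multiple views are generated and then passed to COLMAP, a method that reconstructs 3D from multiple views.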

Slide 56

Slide 56 text

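Various applications
- Style Mixing
  Results of Style Mixing, which was also done in StyleGAN [18].
  Features of the person in Source B are injected into the person in Source A.
  - Injecting them before the radiance field computation changes the person's identity features.
  - Injecting them after the radiance field computation changes fine features such as skin.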
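Style mixing of this kind can be sketched as swapping per-layer style vectors. The layer count, the split point between "before" and "after" the radiance field computation, and the string placeholders standing in for style vectors are all hypothetical values chosen for illustration, not the actual StyleNeRF configuration.

```python
def mix_styles(styles_a, styles_b, start, stop):
    """Return A's per-layer styles with layers [start, stop) replaced by B's."""
    if len(styles_a) != len(styles_b):
        raise ValueError("both style lists must cover the same layers")
    mixed = list(styles_a)
    mixed[start:stop] = styles_b[start:stop]
    return mixed


NUM_LAYERS = 6    # hypothetical total number of style-modulated layers
FIELD_LAYERS = 3  # hypothetical: layers 0..2 compute the radiance field

# Placeholders for the per-layer style vectors of Source A and Source B.
styles_a = ["A"] * NUM_LAYERS
styles_b = ["B"] * NUM_LAYERS

# B's styles before the radiance field computation: identity-level change.
coarse_mix = mix_styles(styles_a, styles_b, 0, FIELD_LAYERS)

# B's styles after the radiance field computation: fine-detail change.
fine_mix = mix_styles(styles_a, styles_b, FIELD_LAYERS, NUM_LAYERS)
```

The two calls correspond to the two cases on the slide: which block of layers receives Source B's styles determines whether coarse or fine attributes are transferred.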

Slide 57

Slide 57 text

Various applications
- Style Interpolation
  The style features of two generated images are interpolated while the camera pose is changed.
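One simple way to produce such a sequence is to interpolate the style vector and the camera angle together. The sketch below assumes plain linear interpolation and a single yaw angle as the camera pose; both are assumptions for illustration, and the paper may interpolate differently.

```python
def lerp(w_a, w_b, t):
    """Linear interpolation between two style vectors, t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(w_a, w_b)]


def interpolation_path(w_a, w_b, yaw_start, yaw_end, steps):
    """Return (style, yaw) pairs that move from A to B while the camera turns."""
    frames = []
    for k in range(steps):
        t = k / (steps - 1)
        style = lerp(w_a, w_b, t)
        yaw = (1.0 - t) * yaw_start + t * yaw_end
        frames.append((style, yaw))
    return frames


# Toy two-dimensional style vectors and a camera sweep from -30 to 30 degrees.
w_a = [0.0, 2.0]
w_b = [1.0, 0.0]
frames = interpolation_path(w_a, w_b, yaw_start=-30.0, yaw_end=30.0, steps=5)
```

Each frame would then be passed to the generator as a style and camera pose pair.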

Slide 58

Slide 58 text

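Various applications
- GAN inversion
  - The predictor used for self-supervised camera pose learning can be reused for pose estimation
  - A real image is embedded into the latent space with GAN inversion, after which the camera pose can be changed or the image edited with CLIP (for CLIP-based editing, see StyleCLIP [19])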

Slide 59

Slide 59 text

Contents
- TL;DR
- Background and objective
- Method
- Results
- Outlook

Slide 60

Slide 60 text

Outlook
<Limitation>
- Only a low-resolution 3D mesh is available, and in this respect the method is inferior to π-GAN and others
  - This is solved in the concurrent work EG3D [20]

Slide 61

Slide 61 text

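Outlook
<Limitation>
- Only a low-resolution 3D mesh is available, and in this respect the method is inferior to π-GAN and others
  - This is solved in the concurrent work EG3D [20]
- For complex shapes such as those in CompCars, artifacts are still quite noticeable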

Slide 62

Slide 62 text

Reference
[1] Mildenhall et al., "NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis", ECCV, 2020.
[2] Mildenhall et al., "NeRF project page", https://www.matthewtancik.com/nerf, accessed May 14, 2022.
[3] Gu et al., "StyleNeRF: A Style-based 3D Aware Generator for High-resolution Image Synthesis", ICLR, 2022.
[4] Tancik et al., "Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains", NeurIPS, 2020.
[5] Shi et al., "Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network", CVPR, 2016.
[6] Chen et al., "Learning Continuous Image Representation with Local Implicit Image Function", CVPR, 2021.
[7] Odena et al., "Deconvolution and Checkerboard Artifacts", https://distill.pub/2016/deconv-checkerboard/, accessed May 15, 2022.
[8] Karras et al., "Alias-Free Generative Adversarial Networks (StyleGAN3)", https://nvlabs.github.io/stylegan3/, accessed May 15, 2022.
[9] Karras et al., "Analyzing and Improving the Image Quality of StyleGAN", CVPR, 2020.
[10] Zhang et al., "NeRF++: Analyzing and Improving Neural Radiance Fields", arXiv preprint, 2020.
[11] Phuoc et al., "BlockGAN: Learning 3D Object-aware Scene Representations from Unlabelled Images", NeurIPS, 2020.
[12] Niemeyer and Geiger, "GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields", CVPR, 2021.
[13] Phuoc et al., "HoloGAN: Unsupervised Learning of 3D Representations From Natural Images", ICCV, 2019.
[14] Schwarz et al., "GRAF: Generative Radiance Fields for 3D-Aware Image Synthesis", NeurIPS, 2020.
[15] Chan et al., "pi-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis", CVPR, 2021.
[16] Xue et al., "GIRAFFE HD: A High-Resolution 3D-aware Generative Model", CVPR, 2022.
[17] Jahanian et al., "On the 'steerability' of generative adversarial networks", ICLR, 2020.
[18] Karras et al., "A Style-Based Generator Architecture for Generative Adversarial Networks", CVPR, 2019.
[19] Patashnik et al., "StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery", ICCV, 2021.
[20] Chan et al., "EG3D: Efficient Geometry-aware 3D Generative Adversarial Networks", arXiv preprint, 2021.
<The definitive Japanese-language explanation of NeRF>
[21] Yamauchi, "三次元空間のニューラルな表現とNeRF" (Neural representations of 3D space and NeRF), https://blog.albert2005.co.jp/2020/05/08/nerf/, accessed May 14, 2022.