Slide 1

Introduction to Neural 3D Reconstruction — Shunsuke Saito
188th CG / 32nd DCC / 231st CVIM Joint Research Presentation Meeting

Slide 2

Shunsuke Saito
• Visiting researcher, University of Pennsylvania (2014–2015)
• PhD, University of Southern California (2015–2020)
• Internships: FAIR, FRL, Adobe, Max Planck Institute, etc.
• Research Scientist, Reality Labs Research (2020–)
Selected work: Computational Body Building (SIGGRAPH 2015), PIFu/PIFuHD (ICCV 2019, CVPR 2020), SCANimate (CVPR 2021)

Slide 3

About this tutorial — goals:
• Understand the framework underlying neural 3D reconstruction
• Be able to place each piece of recent work within that framework
• Grasp the trends in each research area

Slide 4

Why data-driven 3D reconstruction?
• No hand-crafted priors required
• Complex priors can be learned from the data itself
Examples: PIFuHD [Saito2020]; Manhattan-world assumption [Furukawa2009]

Slide 5

The framework of neural 3D reconstruction
• Input data: monocular image / image with depth / multiple images / point cloud, scan
• Output data: voxels / depth map (2.5D) / point cloud / mesh / neural field
• Loss functions: supervised learning (reconstruction loss) / self-supervised learning (inverse rendering) / regularization
Pipeline: input data → encoder → decoder → output data → loss function; inference; parameter updates (SGD)

Slide 6

The framework of neural 3D reconstruction
• Input data: monocular image / image with depth / multiple images / point cloud, scan → encoder
• Decoder → output data: voxels / depth map (2.5D) / point cloud / mesh / neural field
• Loss functions: supervised learning (reconstruction loss) / self-supervised learning (inverse rendering) / regularization; inference; parameter updates (SGD)

Slide 7

About the input data
• Often dictated by the target application (e.g., casual 3D reconstruction → monocular image input)
• Inputs can be grouped into image data and (partial) 3D data
• An appropriate encoder must be chosen; picking a state-of-the-art architecture is the default

Slide 8

Global vs. local features
Global features (the whole shape expressed as one vector) — use when:
• shapes within the category are highly similar
• strong constraints are needed (e.g., large unobserved regions)
• semantic-level editing is desired
Local features (features that preserve spatial extent) — use when:
• training data is limited
• fine-grained shape must be recovered
• local edits are desired

Slide 9

Encoder: monocular images, depth images
• For image data, the default is a state-of-the-art image encoder, e.g., VGG [Simonyan2014], ResNet [He2016], Hourglass [Newell2016]
• Which architecture works best can depend on the task, e.g., Hourglass → pose estimation; VGG → style transfer

Slide 10

Encoder: monocular images, depth images
• Trend: non-local encoders (e.g., ViT [Dosovitskiy2021]); Lin [Lin2022]

Slide 11

Encoder: multi-view images
• When camera parameters are known, build the geometric relations into the network
• Example: homography [Yao2018]
https://medium.com/@NegativeMind//2d-3d෮ݩٕज़Ͱ࢖ΘΕΔ༻ޠ·ͱΊ-27403689da1b

Slide 12

Encoder: point clouds, scan data
• Inputs mainly come from Kinect, LiDAR, and similar sensors
• Unlike images or meshes, point clouds vary in point count and have no ordering
• Architectures suited to these properties are needed, e.g., PointNet [Qi2017a]

Slide 13

Encoder: point clouds, scan data — PointNet [Qi2017a]

Slide 14

Encoder: point clouds, scan data — PointNet [Qi2017a]

Slide 15

Encoder: point clouds, scan data — PointNet [Qi2017a]
https://github.com/ThibaultGROUEIX/AtlasNet/blob/master/model/model_blocks.py
• x: input features (vertex coordinates, normals, etc.)
• An MLP maps each point's x to a latent variable
• The per-point latents are aggregated by max pooling
• A further MLP applied to the aggregated latent yields the final feature
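The per-point MLP plus max-pooling aggregation described above can be sketched in a few lines of NumPy. Random weights stand in for a trained network, and the layer sizes (64, 128) are illustrative assumptions, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy PointNet-style encoder: a shared MLP lifts each point independently,
# then max pooling makes the result invariant to point order and count.
W1 = rng.normal(size=(3, 64))
W2 = rng.normal(size=(64, 128))

def encode(points):                 # points: (n, 3), any n, any order
    h = np.maximum(points @ W1, 0)  # shared per-point MLP layer (ReLU)
    h = np.maximum(h @ W2, 0)
    return h.max(axis=0)            # symmetric aggregation: max pooling

pts = rng.normal(size=(1024, 3))
feat = encode(pts)
perm = encode(pts[rng.permutation(1024)])
print(np.allclose(feat, perm))  # True: permuting the points changes nothing
```

Because `max` is a symmetric function, the encoding is identical for any permutation of the input points — exactly the property that makes PointNet suitable for unordered scans.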

Slide 16

Encoder: point clouds, scan data — limitations of PointNet [Qi2017a]
• The whole cloud is aggregated with a single max pooling → hierarchical structure is hard to capture
• PointNet++ [Qi2017b] introduces hierarchical max pooling

Slide 17

Encoder: point clouds, scan data — sparse convolution
• Dense 3D convolution [Wu2015]: O(kdmn) — cannot be applied to room-scale scans
• Sparse 3D convolution [Graham2017] — makes large-scale scene scans tractable

Slide 18

Encoder: point clouds, scan data — application of sparse convolution: completion of large-scale scans [Dai2020]

Slide 19

Encoder: point clouds, scan data — PointNet + 2D convolutions [Peng2020]
The point cloud is processed with PointNet and mapped into feature space, then projected onto a set of 2D planes (tri-plane) and processed with convolutional networks.

Slide 20

Encoder: point clouds, scan data — Trend 1: tri-plane representations for 3D generative models; EG3D [Chan2022]

Slide 21

Encoder: point clouds, scan data — Trend 1: tri-plane representations for 3D generative models; EG3D [Chan2022]

Slide 22

Encoder: point clouds, scan data — Trend 2: rotation-invariant / equivariant encoders; Vector Neurons [Deng2022]
Ordinary fully connected layers operate on scalars; Vector Neurons operate on 3D vectors.

Slide 23

Open challenges for encoders: scaling to high resolution and handling dynamic objects

Slide 24

Encoder-less 3D reconstruction — scene-specific 3D reconstruction
• Input data: monocular image / image with depth / multiple images / point cloud, scan
• Output data: voxels / depth map (2.5D) / point cloud / mesh / neural field
• Loss functions: self-supervised learning (inverse rendering), regularization; inference; parameter updates (SGD)

Slide 25

Encoder-less 3D reconstruction — Trend 1: fast reconstruction via optimization with improved data structures; Instant-NGP [Mueller2022]

Slide 26

Encoder-less 3D reconstruction — Trend 2: jointly learning deformation → handling dynamic objects; Nerfies [Park2021], BANMO [Yang2022]

Slide 27

Output data and decoders
• Input data: monocular image / image with depth / multiple images / point cloud, scan → encoder
• Decoder → output data: voxels / depth map (2.5D) / point cloud / mesh / neural field
• Loss functions: supervised learning (reconstruction loss) / self-supervised learning (inverse rendering) / regularization; inference; parameter updates (SGD)

Slide 28

Output data and decoders
• Decoder → output data: voxels / depth map (2.5D) / point cloud / mesh / neural field
• Input data: monocular image / image with depth / multiple images / point cloud, scan → encoder
• Loss functions: supervised learning (reconstruction loss) / self-supervised learning (inverse rendering) / regularization; inference; parameter updates (SGD)

Slide 29

Voxels
• Store the target shape in a 3D grid: occupancy, signed distance function (SDF), TSDF
• 3D convolutions can be applied directly
• Memory use is the bottleneck: O(d³)
[Choy2016; Maturana2015; Qi2016; Wu2015] Image credit: [Mescheder2019]
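To make the O(d³) memory bottleneck concrete, a quick back-of-the-envelope sketch for a dense float32 grid — one scalar per cell, ignoring network activations:

```python
# Memory of a dense float32 voxel grid grows cubically with resolution d.
for d in (32, 64, 128, 256, 512):
    mib = d ** 3 * 4 / 2 ** 20  # 4 bytes per cell, reported in MiB
    print(f"{d}^3 grid: {mib:g} MiB")
```

Doubling the resolution multiplies memory by 8: a 128³ grid needs 8 MiB, while 512³ already needs 512 MiB — before any feature channels or gradients are stored. This is what motivates octrees, sparse convolutions, and implicit representations later in the deck.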

Slide 30

Voxels — application: non-parametric 3D face reconstruction from a single image [Jackson2017]

Slide 31

Voxels — application: parameterizing hairstyles and estimating them from images [Saito2018]

Slide 32

[Saito2018]

Slide 33

[Saito2018]

Slide 34

Voxels — efficient 3D shape reconstruction using octree structures [Riegler2017, Tatarchenko2017]

Slide 35

Depth maps
• A form of image-to-image translation: RGB → depth
• The latest advances in image research (domain transfer, GANs, etc.) are easy to apply
• Generalize well, but ill-suited to fine category-specific reconstruction
High-resolution depth-map estimation from a single image [Miangoleh2021]

Slide 36

Depth maps — application: multi-view stereo using diffusion models [Shao2022]

Slide 37

Point clouds [Fan2017]
• Represent the target shape as a set of points
• The dominant approach outputs all points simultaneously
• Flexible with respect to topology changes; can handle large deformations
• Meshing a point cloud for rendering and similar uses is difficult, so ill-suited to high-quality shape output
Image credit: [Mescheder2019]

Slide 38

Point clouds [Fan2017] — learn an encoder that regresses a latent code from an image, and a decoder that outputs a point cloud from the latent code

Slide 39

Point cloud applications — point cloud modeling with continuous normalizing flows [Yang2020]: a point cloud sampled from a Gaussian is transformed by a continuous normalizing flow into the target 3D shape

Slide 40

Point cloud applications — point cloud modeling with continuous normalizing flows [Yang2020]: training (autoencoder) and inference (sampling)

Slide 41

Point cloud applications — multi-view stereo using point clouds [Chen2020]

Slide 42

Point cloud applications — multi-view stereo using point clouds [Chen2020]
A CNN produces multi-scale local features and a coarse depth map; features sampled on the point cloud refine the residual against the ground truth, and iterating this optimization yields a fine depth map.

Slide 43

Point cloud applications — point-cloud-based NeRF [Xu2022]: achieves both high fidelity and fast training

Slide 44

Meshes
• The most common shape representation in CG → also plays well with rendering engines
• Several decoding approaches exist: fully connected (MLP), graph convolution, AtlasNet
• Learning fine detail and handling topology changes are difficult
3D morphable model [Blanz1999]

Slide 45

Meshes — graph convolution [Ranjan2018]
Unlike fully connected layers, shapes can be learned hierarchically, yielding a more expressive model with fewer parameters.

Slide 46

Meshes — atlases [Groueix2018; Yang2018]
• Conventional representation [Fan2017]: f(z) = X, ℝ^Z → ℝ^{n×3} — an MLP maps the latent z to the set of all vertices
• AtlasNet: f(z, P) = p, ℝ^Z × ℝ² → ℝ³ — an MLP maps the latent z and an arbitrary point P in texture space to a deformed 3D coordinate
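A toy sketch of the AtlasNet-style mapping f(z, P): the latent z and a 2D patch coordinate P pass through a shared MLP to produce a 3D point, so the same latent can be sampled at any density. Random weights and the layer sizes are illustrative assumptions, not the actual AtlasNet architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy atlas decoder: f(z, P) maps a latent z in R^Z and a 2D patch
# coordinate P in [0, 1]^2 to a 3D point, via a tiny random MLP.
Z, H = 8, 32
W1 = rng.normal(size=(Z + 2, H)); b1 = np.zeros(H)
W2 = rng.normal(size=(H, 3));     b2 = np.zeros(3)

def decode(z, P):
    x = np.concatenate([np.broadcast_to(z, (len(P), Z)), P], axis=1)
    return np.tanh(x @ W1 + b1) @ W2 + b2

z = rng.normal(size=Z)
# Resolution is not fixed: the same patch can be sampled at any density.
coarse = decode(z, rng.random((100, 2)))
dense  = decode(z, rng.random((10000, 2)))
print(coarse.shape, dense.shape)  # (100, 3) (10000, 3)
```

The key contrast with f(z) = X is visible in the signature: the output size is chosen at query time, not baked into the network.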

Slide 47

Meshes — atlases [Groueix2018; Yang2018]
AtlasNet: f(z, P) = p, ℝ^Z × ℝ² → ℝ³ (P: an arbitrary point in texture space; output: the deformed 3D coordinate)
• Instead of learning the distribution of all vertex coordinates, learn the "deformation" of each plane — in the spirit of texture mapping
• Accounts for surface continuity
• The resolution is no longer fixed!
• Topology changes are handled by learning multiple atlases

Slide 48

Meshes — atlases [Groueix2018; Yang2018]
• Instead of learning the distribution of all vertex coordinates, learn the "deformation" of each plane — in the spirit of texture mapping
• Accounts for surface continuity
• The resolution is no longer fixed!
• Topology changes are handled by learning multiple atlases

Slide 49

Slide 49 text

ϝογϡ Ξτϥε [Groueix2018; Yang2018] • ܗঢ়શମͷ௖఺࠲ඪͷ෼෍Λֶश͢Δ ୅ΘΓʹɺ֤ฏ໘ͷ“มܗ”ͱֶͯ͠शʂ ˠςΫενϟϚοϐϯάͷཁྖ • ද໘ܗঢ়ͷ࿈ଓੑΛߟྀ • ղ૾౓͕ݻఆ͞Εͳ͘ͳͬͨʂ • ෳ਺ͷΞτϥεΛֶश͢Δ͜ͱͰ 
 τϙϩδʔมԽʹରԠ

Slide 50

Meshes / atlases — application: clothed avatars from rigging-aware collections of atlases [Ma2021]

Slide 51

Meshes / atlases — application: clothed avatars from rigging-aware collections of atlases [Ma2021]

Slide 52

Neural fields (implicit surfaces)
• Represent the 3D shape as a level set of a function: occupancy, SDF/TSDF
• Unlike voxels, no resolution constraint
• A major breakthrough for learning-based 3D reconstruction
• Extracting an explicit mesh still requires marching cubes
Example: f(x, y, z) := x² + y² + z² − r² (zero level set = sphere of radius r)
Image credit: [Mescheder2019]

Slide 53

Neural fields (implicit surfaces) — Neural Implicit [Chen/Park/Mescheder2019]
• AtlasNet: f(z, P) = p, ℝ^Z × ℝ² → ℝ³ (P: an arbitrary point in texture space; output: the deformed 3D coordinate)
• Neural implicit: f(z, P) = SDF, ℝ^Z × ℝ³ → ℝ (P: an arbitrary point in 3D; output: the signed distance at the query point)
Both take (z, P) through an MLP.

Slide 54

Neural fields (implicit surfaces) — Neural Implicit [Chen/Park/Mescheder2019]

Slide 55

Neural fields — pixel-aligned implicit function (PIFu) [Saito2019/2020]
Global encoding (a single feature vector ℝ^C fed to the MLP):
• Fine detail is lost, and diverse shape variation cannot be captured
• Fusing multiple views consistently is difficult

Slide 56

Neural fields — pixel-aligned implicit function (PIFu) [Saito2019/2020]
Global encoding (a single feature vector ℝ^C fed to the MLP):
• Fine detail is lost, and diverse shape variation cannot be captured
• Fusing multiple views consistently is difficult

Slide 57

Neural fields — pixel-aligned implicit function (PIFu) [Saito2019/2020]
Pixel-level encoding (a feature map ℝ^{W×H×C}; the MLP receives features sampled per pixel):
• Local image features enable high-fidelity reconstruction even from limited data
• Features can be fused in 3D space, supporting an arbitrary number of input views

Slide 58

[Saito2019]

Slide 59

PIFuHD [Saito2020] PIFu [Saito2019]

Slide 60

[Saito2020]

Slide 61

Decoders: shape representations at a glance

              Point cloud   Mesh    Voxels   Neural field
Resolution    ✅/❌          ✅      ❌       ✅
Topology      ✅            ✅/❌    ✅       ✅
Speed         ✅            ✅      ✅/❌     ❌
Rendering     ❌            ✅      ✅/❌     ✅

• Quality first → neural field
• Domains with little shape variation (e.g., faces) → mesh
• Upcoming trend: hybrid representations (e.g., point cloud × neural field)

Slide 62

Output data and decoders
• Input data: monocular image / image with depth / multiple images / point cloud, scan → encoder
• Decoder → output data: voxels / depth map (2.5D) / point cloud / mesh / neural field
• Loss functions: supervised learning (reconstruction loss) / self-supervised learning (inverse rendering) / regularization; inference; parameter updates (SGD)

Slide 63

Loss functions
• Decoder → output data: voxels / depth map (2.5D) / point cloud / mesh / neural field
• Input data: monocular image / image with depth / multiple images / point cloud, scan → encoder
• Loss functions: supervised learning (reconstruction loss) / self-supervised learning (inverse rendering) / regularization; inference; parameter updates (SGD)

Slide 64

Loss functions: supervised learning
• When the target shape and its correspondences are given, the error between the decoder output and the ground truth can serve directly as the loss
• When a shape is available but correspondences are not, use a correspondence-free loss — e.g., Chamfer distance
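A minimal sketch of the Chamfer distance between two point sets without known correspondences: the average nearest-neighbor distance taken in both directions. This brute-force version computes all O(nm) pairwise distances; practical implementations use spatial data structures:

```python
import numpy as np

# Symmetric Chamfer distance between point sets A (n, 3) and B (m, 3).
def chamfer(A, B):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)  # pairwise sq. dists
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

A = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
B = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
print(chamfer(A, A))  # 0.0: identical sets
print(chamfer(A, B))  # > 0: the extra point in B is penalized
```

Because each point is only matched to its nearest neighbor, the loss is differentiable with respect to the point coordinates and needs no ordering or one-to-one pairing between prediction and ground truth.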

Slide 65

Loss functions: inverse rendering
• When no ground-truth shape is available, solve an inverse rendering problem from a set of images
• Differentiable renderers exist for each shape representation:
• Point clouds → Pulsar [Lassner2021], etc.
• Voxels → PTN [Yan2016], etc.
• Meshes → OpenDR [Loper2014], NMR [Kato2018], SoftRas [Liu2019a], etc.
• Implicit functions → [Liu2019b], IDR [Yariv2020], NeRF [Mildenhall2020], etc.
Inverse rendering with meshes [Kato2018]

Slide 66

Loss functions: regularization
• Combining regularization terms constrains the shape
• Particularly effective in ill-posed problem settings
Examples: geodesic-preservation constraint (LIMP [Cosmo2020]); sum of Lp norms of the implicit surface normals as a regularizer [Liu2019b]

Slide 67

Loss functions: regularization — application: avatar learning from 4D scans with a cycle-consistency constraint [Saito2021]
x_s → LBS⁻¹ → x_c

Slide 68

Loss functions: regularization — application: avatar learning from 4D scans with a cycle-consistency constraint [Saito2021]
x_s → LBS⁻¹ → x_c → LBS → x_p

Slide 69

Loss functions: regularization — application: avatar learning from 4D scans with a cycle-consistency constraint [Saito2021]
x_s → LBS⁻¹ → x_c → LBS → x_p; the round trip should reproduce the same shape: x_s = LBS(LBS⁻¹(x_s))

Slide 70

[Saito2021]

Slide 71

Loss functions: regularization — Trend 1: regularizing intermediate layers; Lipschitz regularization of neural fields [Liu2022]

Slide 72

Loss functions: regularization — Trend 2: regularizing gradients; Laplacian regularization of gradients [Nicolet2021]

Slide 73

Loss functions: regularization — Trend 3: regularizing the optimizer; a rotation-equivariant optimizer (VectorAdam [Ling2022])

Slide 74

Monocular reconstruction through the lens of the framework

Slide 75

The dawn of neural monocular reconstruction [Wu2015] [Fan2017]
• Input data: monocular image • Encoder: 2D CNN (global features) • Decoder: 3D CNN • Output data: voxels, point clouds • Loss function: supervised learning (reconstruction loss)

Slide 76

The rise of mesh representations — Pixel2Mesh [Wang2018]
• Input data: monocular image • Encoder: 2D CNN (local features) • Decoder: graph convolution • Output data: mesh • Loss function: supervised learning (reconstruction loss) + regularization

Slide 77

Advances in differentiable rendering
• Input data: monocular image • Outputs: voxels [Yan2016], meshes [Kato2018], point clouds [Wang2019] • Loss function: self-supervised learning (inverse rendering) + regularization

Slide 78

The neural-field explosion — DeepSDF [Park2019], Occupancy Networks [Mescheder2019], IM-Net [Chen2019]
• Input data: monocular image • Encoder: 2D CNN (global features) • Decoder: MLP • Output data: neural field (implicit surface) • Loss function: supervised learning (reconstruction loss)

Slide 79

Better generalization via local neural fields — PIFu [Saito2019]
• Input data: monocular image • Encoder: 2D CNN (local features) • Decoder: MLP • Output data: neural field (implicit surface) • Loss function: supervised learning (reconstruction loss)

Slide 80

Neural fields meet differentiable rendering — PixelNeRF [Yu2021]
• Input data: monocular image • Encoder: 2D CNN (local features) • Decoder: MLP • Output data: neural field (NeRF) • Loss function: self-supervised learning (inverse rendering)

Slide 81

Beyond local features — ViT-NeRF [Lin2022]
• Input data: monocular image • Encoder: ViT (non-local features) • Decoder: MLP • Output data: neural field (NeRF) • Loss function: self-supervised learning (inverse rendering)

Slide 82

Summary
• Understand the characteristics of each data representation when designing the encoder and decoder
• Define the loss function appropriately, depending on whether ground-truth shapes are available
Framework recap — input data: monocular image / image with depth / multiple images / point cloud, scan; output data: voxels / depth map (2.5D) / point cloud / mesh / neural field; loss: supervised learning (reconstruction loss) / self-supervised learning (inverse rendering) / regularization; encoder, decoder, inference, parameter updates (SGD)

Slide 83

References (1)
• [Blanz1999] Blanz, Volker, and Thomas Vetter. "A morphable model for the synthesis of 3D faces." Proceedings of the 26th annual conference on Computer graphics and interactive techniques. 1999.
• [Chen2019] Chen, Zhiqin, and Hao Zhang. "Learning implicit fields for generative shape modeling." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.
• [Choy2016] Choy, Christopher B., et al. "3D-R2N2: A unified approach for single and multi-view 3d object reconstruction." European Conference on Computer Vision. Springer, Cham, 2016.
• [Cosmo2020] Cosmo, Luca, et al. "LIMP: Learning latent shape representations with metric preservation priors." Computer Vision–ECCV 2020. Springer International Publishing, 2020.
• [Dai2020] Dai, Angela, Christian Diller, and Matthias Nießner. "SG-NN: Sparse generative neural networks for self-supervised scene completion of RGB-D scans." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.
• [Dosovitskiy2021] Dosovitskiy, Alexey, et al. "An image is worth 16x16 words: Transformers for image recognition at scale." arXiv preprint arXiv:2010.11929 (2020).
• [Furukawa2009] Furukawa, Yasutaka, et al. "Manhattan-world stereo." 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2009.
• [Fan2017] Fan, Haoqiang, Hao Su, and Leonidas J. Guibas. "A point set generation network for 3d object reconstruction from a single image." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
• [Graham2017] Graham, Benjamin, and Laurens van der Maaten. "Submanifold sparse convolutional networks." arXiv preprint arXiv:1706.01307 (2017).
• [Groueix2018] Groueix, Thibault, et al. "A papier-mâché approach to learning 3d surface generation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
• [He2016] He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
• [Jackson2017] Jackson, Aaron S., et al. "Large pose 3D face reconstruction from a single image via direct volumetric CNN regression." Proceedings of the IEEE International Conference on Computer Vision. 2017.
• [Kato2018] Kato, Hiroharu, Yoshitaka Ushiku, and Tatsuya Harada. "Neural 3d mesh renderer." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
• [Lassner2021] Lassner, Christoph, and Michael Zollhofer. "Pulsar: Efficient sphere-based neural rendering." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.

Slide 84

References (2)
• [Lin2022] Lin, Kai-En, et al. "Vision transformer for NeRF-based view synthesis from a single input image." arXiv preprint arXiv:2207.05736 (2022).
• [Ling2022] Ling, Selena, Nicholas Sharp, and Alec Jacobson. "VectorAdam for rotation equivariant geometry optimization." arXiv preprint arXiv:2205.13599 (2022).
• [Liu2019a] Liu, Shichen, et al. "Soft rasterizer: A differentiable renderer for image-based 3d reasoning." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019.
• [Liu2019b] Liu, Shichen, et al. "Learning to infer implicit surfaces without 3d supervision." NeurIPS 2019.
• [Liu2022] Liu, Hsueh-Ti Derek, et al. "Learning smooth neural functions via Lipschitz regularization." SIGGRAPH 2022.
• [Loper2014] Loper, Matthew M., and Michael J. Black. "OpenDR: An approximate differentiable renderer." European Conference on Computer Vision. Springer, Cham, 2014.
• [Ma2021] Ma, Qianli, et al. "SCALE: Modeling clothed humans with a surface codec of articulated local elements." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.
• [Maturana2015] Maturana, Daniel, and Sebastian Scherer. "VoxNet: A 3d convolutional neural network for real-time object recognition." 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2015.
• [Mescheder2019] Mescheder, Lars, et al. "Occupancy networks: Learning 3d reconstruction in function space." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.
• [Mildenhall2020] Mildenhall, Ben, et al. "NeRF: Representing scenes as neural radiance fields for view synthesis." European Conference on Computer Vision. Springer, Cham, 2020.
• [Miangoleh2021] Miangoleh, S. Mahdi H., et al. "Boosting monocular depth estimation models to high-resolution via content-adaptive multi-resolution merging." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.
• [Mueller2022] Mueller, Thomas, et al. "Instant neural graphics primitives with a multiresolution hash encoding." arXiv preprint arXiv:2201.05989 (2022).
• [Newell2016] Newell, Alejandro, Kaiyu Yang, and Jia Deng. "Stacked hourglass networks for human pose estimation." European Conference on Computer Vision. Springer, Cham, 2016.
• [Nicolet2021] Nicolet, Baptiste, Alec Jacobson, and Wenzel Jakob. "Large steps in inverse rendering of geometry." ACM Transactions on Graphics (TOG) 40.6 (2021): 1-13.
• [Park2019] Park, Jeong Joon, et al. "DeepSDF: Learning continuous signed distance functions for shape representation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.
• [Peng2020] Peng, Songyou, et al. "Convolutional occupancy networks." Computer Vision–ECCV 2020. Springer International Publishing, 2020.
• [Qi2016] Qi, Charles R., et al. "Volumetric and multi-view CNNs for object classification on 3d data." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.

Slide 85

References (3)
• [Qi2017a] Qi, Charles R., et al. "PointNet: Deep learning on point sets for 3d classification and segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
• [Qi2017b] Qi, Charles R., et al. "PointNet++: Deep hierarchical feature learning on point sets in a metric space." arXiv preprint arXiv:1706.02413 (2017).
• [Ranjan2018] Ranjan, Anurag, et al. "Generating 3D faces using convolutional mesh autoencoders." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
• [Riegler2017] Riegler, Gernot, Ali Osman Ulusoy, and Andreas Geiger. "OctNet: Learning deep 3d representations at high resolutions." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
• [Saito2018] Saito, Shunsuke, et al. "3D hair synthesis using volumetric variational autoencoders." ACM Transactions on Graphics (TOG) 37.6 (2018): 1-12.
• [Saito2019] Saito, Shunsuke, et al. "PIFu: Pixel-aligned implicit function for high-resolution clothed human digitization." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019.
• [Saito2020] Saito, Shunsuke, et al. "PIFuHD: Multi-level pixel-aligned implicit function for high-resolution 3d human digitization." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.
• [Saito2021] Saito, Shunsuke, et al. "SCANimate: Weakly supervised learning of skinned clothed avatar networks." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.
• [Simonyan2014] Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).
• [Tancik2020] Tancik, Matthew, et al. "Fourier features let networks learn high frequency functions in low dimensional domains." arXiv preprint arXiv:2006.10739 (2020).
• [Yan2016] Yan, Xinchen, et al. "Perspective transformer nets: Learning single-view 3d object reconstruction without 3d supervision." Advances in Neural Information Processing Systems 29 (2016).
• [Yariv2020] Yariv, Lior, et al. "Multiview neural surface reconstruction by disentangling geometry and appearance." arXiv preprint arXiv:2003.09852 (2020).
• [Yao2018] Yao, Yao, et al. "MVSNet: Depth inference for unstructured multi-view stereo." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
• [Yang2018] Yang, Yaoqing, et al. "FoldingNet: Point cloud auto-encoder via deep grid deformation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
• [Yu2021] Yu, Alex, et al. "PixelNeRF: Neural radiance fields from one or few images." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2021.
• [Wang2018] Wang, Nanyang, et al. "Pixel2Mesh: Generating 3d mesh models from single RGB images." Proceedings of the European Conference on Computer Vision (ECCV). 2018, pp. 52-67.
• [Wang2019] Yifan, Wang, Felice Serena, Shihao Wu, Cengiz Öztireli, and Olga Sorkine-Hornung. "Differentiable surface splatting for point-based geometry processing." ACM Transactions on Graphics (TOG) 38.6 (2019): 1-14.
• [Wu2015] Wu, Zhirong, et al. "3D ShapeNets: A deep representation for volumetric shapes." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.