Introduction to Neural 3D Reconstruction

Introduction to Neural 3D Reconstruction
Shunsuke Saito
Joint research meeting of the 188th CG, 32nd DCC, and 231st CVIM

Shunsuke Saito
• Visiting Researcher, University of Pennsylvania (2014-2015)
• PhD, University of Southern California (2015-2020)
• Internships: FAIR, FRL, Adobe, Max Planck Institute, and others
• Researcher, Reality Labs Research (2020-)
• Publications: Computational Body Building (SIGGRAPH 2015), PIFu/PIFuHD (ICCV 2019, CVPR 2020), SCANimate (CVPR 2021)

About this tutorial
Goals
• Understand the framework of neural 3D reconstruction
• Be able to map each recent paper onto that framework
• Grasp the trends in each research area

Why data-driven 3D reconstruction?
• No hand-crafted priors are needed
• Complex priors can be obtained from the data itself
Figures: PIFuHD [Saito2020]; Manhattan-world assumption [Furukawa2009]

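Framework for neural 3D reconstruction
Input data → Encoder → Decoder → Output data → Loss function (forward: inference; backward: parameter update with SGD)
• Input data: monocular image, RGB-D image, multiple images, point cloud / scan
• Output data: voxels, depth map (2.5D), point cloud, mesh, neural field
• Loss function: supervised learning (reconstruction loss), self-supervised learning (inverse rendering), regularization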


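About input data
• Usually determined by the application
  • Example: casual 3D reconstruction → a monocular image as input
• Inputs can be classified into image data and (partial) 3D data
• An appropriate encoder has to be chosen
  • The baseline is to choose a SOTA architecture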

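Global features vs. local features
• Global features (the whole shape represented as a vector), suited to cases where:
  • Shapes within a category are highly similar
  • Strong constraints are desired (e.g., when large parts are unobserved)
  • Semantic editing is desired
• Local features (features that retain their spatial extent), suited to cases where:
  • Training data is limited
  • Fine shape detail should be reconstructed
  • Local edits are desired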

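Encoder: monocular images and RGB-D images
• For image data, the baseline is to use a state-of-the-art image encoder
  e.g., VGG [Simonyan2014], ResNet [He2016], Hourglass [Newell2016]
• The architecture that works best can differ depending on the task
  e.g., Hourglass → pose estimation, VGG → style transfer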

Encoder: monocular images and RGB-D images
• Trend: non-local encoders (e.g., ViT [Dosovitskiy2021])
Figure: [Lin2022]

Encoder: multi-view images
• When the camera parameters are known, build the geometric relationships into the network
• Example: homography [Yao2018]
https://medium.com/@NegativeMind//2d-3d復元技術で使われる用語まとめ-27403689da1b

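Encoder: point clouds and scan data
• Input mainly comes from sensors such as Kinect and LiDAR
• Unlike images and meshes, point clouds have a varying number of vertices and no ordering
• An architecture that matches the properties of point clouds and scans is required
PointNet [Qi2017a]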


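Encoder: point clouds and scan data
PointNet [Qi2017a]
https://github.com/ThibaultGROUEIX/AtlasNet/blob/master/model/model_blocks.py
• x: input features (vertex coordinates, normals, etc.)
• An MLP converts each point's x into a latent variable
• The per-point latent variables are aggregated by max pooling
• A further MLP is applied to the aggregated latent variable to obtain the final feature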
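The three PointNet steps listed on this slide (shared per-point MLP, max pooling, final MLP) can be sketched in plain Python. This is a minimal illustration, not the AtlasNet or PointNet code: the layer sizes, the random weights, and the helper names `linear_relu` and `pointnet_encode` are all assumptions made for the example.

```python
import random

def linear_relu(x, weights, bias):
    """One fully connected layer followed by ReLU, applied to one vector."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]

def pointnet_encode(points, w1, b1, w2, b2):
    # 1) Shared MLP: the same weights map every point to a latent vector.
    per_point = [linear_relu(p, w1, b1) for p in points]
    # 2) Max pooling over points: independent of point order and point count.
    pooled = [max(column) for column in zip(*per_point)]
    # 3) A further MLP on the pooled latent gives the final feature.
    return linear_relu(pooled, w2, b2)

rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

# Hypothetical sizes: 3D input, 8 latent channels, 4 output channels.
w1, b1 = rand_matrix(8, 3), [0.0] * 8
w2, b2 = rand_matrix(4, 8), [0.0] * 4

cloud = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(100)]
shuffled = cloud[:]
rng.shuffle(shuffled)

feature = pointnet_encode(cloud, w1, b1, w2, b2)
feature_shuffled = pointnet_encode(shuffled, w1, b1, w2, b2)
```

Because the only interaction between points is the max, `feature` and `feature_shuffled` are identical, and the same function accepts clouds of any size.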

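Encoder: point clouds and scan data
Problems with PointNet [Qi2017a]
• The global feature is aggregated by a single max pooling → hierarchical structure is hard to capture
• Remedy: hierarchical max pooling (PointNet++ [Qi2017b])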

Encoder: point clouds and scan data
Sparse Convolution
• 3D Convolution [Wu2015]: O(kdmn); not applicable to room-sized scans
• Sparse 3D Convolution [Graham2017]: makes scans of large-scale scenes tractable

Encoder: point clouds and scan data
Application: Sparse Convolution
• Completion of large-scale scans [Dai2020]

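Encoder: point clouds and scan data
PointNet + 2D Convolutions [Peng2020]
• The point cloud is processed with PointNet and mapped into feature space, then projected onto a set of 2D planes (tri-plane) and processed with a convolutional network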
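The projection step can be illustrated with a small sketch that scatters per-point features onto three axis-aligned grids. It assumes points in [0, 1)^3, one scalar feature per point, and averaging as the aggregation rule; the function name `project_to_triplane` is hypothetical and the 2D convolutional network that would follow is omitted.

```python
def project_to_triplane(points, features, resolution):
    """Average per-point scalar features into xy, xz and yz grids.

    Assumes every point lies in [0, 1)^3. Empty cells are filled with 0.0.
    """
    axes = {"xy": (0, 1), "xz": (0, 2), "yz": (1, 2)}
    planes = {}
    for name, (a, b) in axes.items():
        total = [[0.0] * resolution for _ in range(resolution)]
        count = [[0] * resolution for _ in range(resolution)]
        for p, f in zip(points, features):
            # Drop one coordinate and quantize the other two to a grid cell.
            i = min(int(p[a] * resolution), resolution - 1)
            j = min(int(p[b] * resolution), resolution - 1)
            total[i][j] += f
            count[i][j] += 1
        planes[name] = [[total[i][j] / count[i][j] if count[i][j] else 0.0
                         for j in range(resolution)]
                        for i in range(resolution)]
    return planes

points = [(0.1, 0.1, 0.9), (0.1, 0.1, 0.1), (0.9, 0.6, 0.1)]
features = [1.0, 3.0, 5.0]
planes = project_to_triplane(points, features, resolution=2)
```

The first two points differ only in z, so they share a cell in the xy plane but land in different cells of the xz and yz planes; this is why three planes retain more 3D information than one.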

Encoder: point clouds and scan data
Trend 1: tri-plane representations for 3D generative models
EG3D [Chan2022]

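Encoder: point clouds and scan data
Trend 2: rotation-invariant and rotation-equivariant encoders
Vector Neurons [Deng2022]
• Ordinary fully connected layer: each neuron is a scalar
• Vector Neurons: each neuron is a 3D vector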
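Why 3D-vector neurons help with rotation can be checked numerically. The sketch below assumes a bias-free linear layer that only mixes channels; the names `vn_linear` and `rotate_z` and the toy weights are illustrative and not taken from the Vector Neurons code.

```python
import math

def vn_linear(weights, vectors):
    """Mix channels of 3D-vector neurons: out[i] = sum_c W[i][c] * v[c]."""
    return [[sum(w * v[k] for w, v in zip(row, vectors)) for k in range(3)]
            for row in weights]

def rotate_z(v, angle):
    """Rotate a 3D vector about the z axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1], v[2]]

weights = [[0.5, -1.0], [2.0, 0.25], [1.0, 1.0]]  # 2 input -> 3 output channels
vectors = [[1.0, 0.0, 2.0], [0.0, 3.0, -1.0]]     # 2 channels of 3D vectors
angle = 0.7

rotate_then_layer = vn_linear(weights, [rotate_z(v, angle) for v in vectors])
layer_then_rotate = [rotate_z(v, angle) for v in vn_linear(weights, vectors)]

max_gap = max(abs(a - b)
              for u, w in zip(rotate_then_layer, layer_then_rotate)
              for a, b in zip(u, w))
```

`max_gap` is zero up to floating-point error: rotating the input and then applying the layer gives the same result as applying the layer and then rotating, which is the equivariance property. An ordinary layer acting on the flattened scalar coordinates does not have this property in general.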

Open challenges for encoders
• Handling high resolution and dynamic objects

Encoder-less 3D reconstruction
• Scene-specific 3D reconstruction
• Loss function: self-supervised learning (inverse rendering), regularization
Figure: framework diagram without the encoder

Encoder-less 3D reconstruction
Trend 1: fast reconstruction through optimization with improved data structures
Instant-NGP [Mueller2022]

Encoder-less 3D reconstruction
Trend 2: jointly learning the deformation → handling dynamic objects
Nerfies [Park2021], BANMO [Yang2022]

Output data and decoder
Figure: framework diagram


Voxels
• The target shape is stored on a 3D grid
  • Occupancy
  • Signed distance function (SDF)
  • TSDF
• 3D convolution can be used as is
• Memory usage is the bottleneck: $O(d^3)$
[Choy2016; Maturana2015; Qi2016; Wu2015]
Image credit [Mescheder2019]
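To make the cubic memory growth concrete, the following sketch computes the size of a dense grid. It assumes one float32 value per voxel; the function name `dense_grid_bytes` and the resolutions are chosen only for illustration.

```python
def dense_grid_bytes(d, channels=1, bytes_per_value=4):
    """Memory of a dense d x d x d grid (float32 by default)."""
    return d ** 3 * channels * bytes_per_value

for d in (32, 64, 128, 256, 512):
    mib = dense_grid_bytes(d) / 2 ** 20
    print(f"{d:>4}^3 grid: {mib:8.1f} MiB")
```

Doubling the resolution multiplies the memory by eight, and feature grids inside a 3D CNN carry many channels per voxel, which multiplies the figure further.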

Voxels
Application: non-parametric 3D face reconstruction from a single image [Jackson2017]

Voxels
Application: parameterizing hairstyles and estimating them from images [Saito2018]

[Saito2018]

Voxels
Efficient 3D shape reconstruction using octrees [Riegler2017, Tatarchenko2017]

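Depth maps
• A kind of image-to-image translation: RGB → Depth
• State-of-the-art techniques from image research are easy to apply (domain transfer, GANs, etc.)
• Generalizes well, but is ill-suited to fine, category-specific reconstruction
Figure: high-resolution depth map estimation from a single image [Miangoleh2021]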

Depth maps
Application: multi-view stereo using a diffusion model [Shao2022]

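Point clouds [Fan2017]
• The target shape is represented as a set of vertices
• The mainstream approach outputs all vertices at once
• Flexible with respect to topology changes and able to handle large deformations
• Hard to convert into a mesh for rendering and similar uses, so ill-suited to high-quality shape output
Image credit [Mescheder2019]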

Point clouds [Fan2017]
• Learn an encoder that regresses a latent variable from the image and a decoder that emits a point cloud from the latent variable

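Point cloud applications
Point cloud modeling with continuous normalizing flows [Yang2020]
Figure: point cloud sampled from a normal distribution → continuous normalizing flow → target 3D shape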

Point cloud applications
Point cloud modeling with continuous normalizing flows [Yang2020]
Figure: training (autoencoder); inference (sampling)

Point cloud applications
Multi-view stereo using point clouds [Chen2020]

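Point cloud applications
Multi-view stereo using point clouds [Chen2020]
Figure: multi-level local features from a CNN; a CNN predicts a coarse depth map; features are sampled on the point cloud; the depth is refined on the point cloud by predicting the residual to the ground truth; iterating this optimization yields a fine depth map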

Point cloud applications
NeRF using point clouds [Xu2022]
• Achieves accurate and fast training

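Meshes
• The most common shape representation in CG → also works well with rendering engines
• Several decoding methods exist
  • Fully Connected (MLP)
  • Graph Convolution
  • AtlasNet
• Learning fine detail and handling topology changes are difficult
Figure: 3D morphable model [Blanz1999]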

Meshes
Graph Convolution [Ranjan2018]
• Because it learns shape hierarchically rather than with full connections, it yields a more expressive model with fewer parameters

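Meshes
Atlas [Groueix2018; Yang2018]
• Conventional shape representation [Fan2017]: $f(z) = X$, $f: \mathbb{R}^{Z} \to \mathbb{R}^{n \times 3}$; an MLP maps the latent variable $z$ to the set of all vertices $X$
• AtlasNet: $f(z, P) = p$, $f: \mathbb{R}^{Z} \times \mathbb{R}^{2} \to \mathbb{R}^{3}$; an MLP maps $z$ and an arbitrary point $P$ in texture space to its 3D coordinate $p$ after deformation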

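Meshes
Atlas [Groueix2018; Yang2018]
• Instead of learning the distribution of vertex coordinates of the whole shape, learn it as a "deformation" of each plane → the same idea as texture mapping
• Takes the continuity of the surface into account
• The resolution is no longer fixed
• Learning multiple atlases handles topology changes
AtlasNet: $f(z, P) = p$, $f: \mathbb{R}^{Z} \times \mathbb{R}^{2} \to \mathbb{R}^{3}$, where $P$ is an arbitrary point in texture space and $p$ is its 3D coordinate after deformation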
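The point that the resolution is no longer fixed follows directly from the signature f(z, P) = p: the decoder can be queried at any 2D point. The sketch below replaces the learned MLP with a hand-written stand-in (a fixed sphere map where z is just a radius), so it shows only the interface, not AtlasNet itself; `atlas_decoder` and `sample_surface` are hypothetical names.

```python
import math

def atlas_decoder(z, uv):
    """Stand-in for the learned MLP f(z, P) = p: maps a 2D point to 3D.

    Here z is a single radius and the "deformation" is a fixed sphere map.
    """
    u, v = uv
    theta, phi = math.pi * u, 2.0 * math.pi * v
    return (z * math.sin(theta) * math.cos(phi),
            z * math.sin(theta) * math.sin(phi),
            z * math.cos(theta))

def sample_surface(z, n):
    """Query the same decoder on an n x n grid of the unit square."""
    grid = [(i / (n - 1), j / (n - 1)) for i in range(n) for j in range(n)]
    return [atlas_decoder(z, uv) for uv in grid]

coarse = sample_surface(z=2.0, n=4)   # 16 surface points
fine = sample_surface(z=2.0, n=32)    # 1024 points from the same function
```

Both samplings come from one function with one latent code; a decoder of the form f(z) = X would instead need a new output layer to change the vertex count.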


Meshes / atlases
Application: clothed avatars built from a set of atlases that account for rigging [Ma2021]


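Neural fields (implicit surfaces)
• The 3D shape is represented as a level set of function values
  • Occupancy
  • SDF/TSDF
• Unlike voxels, there is no resolution constraint
• A major breakthrough for learning-based 3D reconstruction
• Extracting an explicit shape such as a mesh requires marching cubes
Example: $f(x, y, z) := x^2 + y^2 + z^2 - r^2$
Image credit [Mescheder2019]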
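The sphere function on this slide can be used to show what "a level set of function values" means in practice. The sketch evaluates the sign of f to test inside versus outside, and locates the zero level set along one ray by bisection. Bisection along a ray is a deliberately simpler stand-in for marching cubes, and `find_surface_along_ray` is a hypothetical helper.

```python
def sphere_field(x, y, z, r=1.0):
    """f(x, y, z) = x^2 + y^2 + z^2 - r^2; the surface is where f == 0."""
    return x * x + y * y + z * z - r * r

def find_surface_along_ray(f, direction, t_max=4.0, steps=60):
    """Bisection on f along a ray from the origin.

    Assumes f < 0 at the origin and f > 0 at distance t_max.
    """
    lo, hi = 0.0, t_max
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if f(*(mid * d for d in direction)) < 0.0:
            lo = mid   # still inside: move the lower bound outward
        else:
            hi = mid   # outside: move the upper bound inward
    return 0.5 * (lo + hi)

inside = sphere_field(0.0, 0.0, 0.0) < 0.0
outside = sphere_field(2.0, 0.0, 0.0) > 0.0
t_hit = find_surface_along_ray(sphere_field, (0.0, 0.6, 0.8))
```

The function can be evaluated at any continuous coordinate, which is why the representation has no built-in resolution; the cost is that an explicit surface has to be extracted by a separate search such as this one or marching cubes.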

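Neural fields (implicit surfaces)
Neural Implicit [Chen/Park/Mescheder2019]
• AtlasNet: $f(z, P) = p$, $f: \mathbb{R}^{Z} \times \mathbb{R}^{2} \to \mathbb{R}^{3}$; $P$ is an arbitrary point in texture space and $p$ is its 3D coordinate after deformation
• Neural Implicit: $f(z, P) = \mathrm{SDF}$, $f: \mathbb{R}^{Z} \times \mathbb{R}^{3} \to \mathbb{R}$; $P$ is an arbitrary point in 3D and the output is the signed distance at that query point
• In both cases the decoder is an MLP that takes $z$ and $P$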

Neural fields (implicit surfaces)
Neural Implicit [Chen/Park/Mescheder2019]

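Neural fields (implicit surfaces)
Pixel-aligned implicit function (PIFu) [Saito2019/2020]
Global encoding: the image is encoded into a single vector in $\mathbb{R}^{C}$, which is passed to the MLP
• Fine details are lost, and diverse shape variations cannot be handled
• Images from multiple views are hard to fuse while keeping them consistent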


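Neural fields (implicit surfaces)
Pixel-aligned implicit function (PIFu) [Saito2019/2020]
Pixel-level encoding: the image is encoded into a feature map in $\mathbb{R}^{W \times H \times C}$, which is passed to the MLP
• Using local image features enables accurate reconstruction even from little data
• Features can be fused in 3D space, so arbitrary input views can be handled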
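A rough sketch of how a pixel-aligned query could be assembled: project the 3D point into the image, bilinearly sample the feature map at that pixel, and append the depth before passing the result to the MLP. Everything here is an assumption made for illustration: the feature map is a nested list indexed as [row][column][channel], `orthographic` is a toy camera, and the MLP itself is left out.

```python
def bilinear_sample(feature_map, u, v):
    """Sample an H x W x C feature map (nested lists) at continuous pixel
    coordinates, with u along the width and v along the height."""
    h, w = len(feature_map), len(feature_map[0])
    u = min(max(u, 0.0), w - 1.0)
    v = min(max(v, 0.0), h - 1.0)
    x0, y0 = int(u), int(v)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = u - x0, v - y0
    return [(1 - ax) * (1 - ay) * feature_map[y0][x0][c]
            + ax * (1 - ay) * feature_map[y0][x1][c]
            + (1 - ax) * ay * feature_map[y1][x0][c]
            + ax * ay * feature_map[y1][x1][c]
            for c in range(len(feature_map[0][0]))]

def pixel_aligned_input(feature_map, point, project):
    """Build the MLP input for one 3D query: local feature plus depth."""
    u, v, depth = project(point)
    return bilinear_sample(feature_map, u, v) + [depth]

def orthographic(point):
    # Hypothetical camera: x and y are already pixel coordinates, z is depth.
    return point[0], point[1], point[2]

feature_map = [[[0.0, 10.0], [1.0, 10.0]],
               [[2.0, 20.0], [3.0, 20.0]]]   # H = 2, W = 2, C = 2
query = pixel_aligned_input(feature_map, (0.5, 0.5, 4.0), orthographic)
```

Because each 3D query reads its own local feature, queries that project to different pixels see different evidence, in contrast to a global encoding where every query shares one vector.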

[Saito2019]

PIFuHD [Saito2020] PIFu [Saito2019]

[Saito2020]

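Decoders: summary of shape representations

              Point cloud   Mesh     Voxels   Neural field
Resolution    ✅/❌         ✅       ❌       ✅
Topology      ✅            ✅/❌    ✅       ✅
Speed         ✅            ✅       ✅/❌    ❌
Rendering     ❌            ✅       ✅/❌    ✅

• Quality → neural fields
• Domains with little shape variation (e.g., faces) → meshes
• Future trend: hybrid representations (e.g., point cloud × neural field)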


Loss functions
Figure: framework diagram

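Loss functions: supervised learning
• When the target shape and its correspondences are given, the error between the decoder output and the ground truth can serve as the loss
• When the shape is given but the correspondences are not
  • Example: Chamfer Distance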
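A minimal brute-force sketch of the Chamfer distance follows. Conventions vary between papers (squared or unsquared distances, sum or mean); this version assumes squared distances averaged per set and summed over both directions.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two point sets.

    Mean squared nearest-neighbour distance from a to b plus from b to a.
    Needs no point correspondences and allows different point counts.
    """
    a_to_b = sum(min(squared_distance(p, q) for q in b) for p in a) / len(a)
    b_to_a = sum(min(squared_distance(q, p) for p in a) for q in b) / len(b)
    return a_to_b + b_to_a

prediction = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
target = [(0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]

loss = chamfer_distance(prediction, target)
```

Every predicted point coincides with some target point, so the first term is zero; the target point (1, 1, 0) has no match and contributes 1/4, giving a total of 0.25. The double loop is quadratic in the number of points, so practical implementations rely on spatial data structures or GPU batching.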

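Loss functions: inverse rendering
• When no ground-truth shape is given, consider solving an inverse rendering problem from a set of images
• Various differentiable renderers exist for each shape representation
  • Point clouds → Pulsar [Lassner2021], etc.
  • Voxels → PTN [Yan2016], etc.
  • Meshes → OpenDR [Loper2014], NMR [Kato2018], SoftRas [Liu2019a], etc.
  • Implicit functions → [Liu2019b], IDR [Yariv2020], NeRF [Mildenhall2020], etc.
Figure: inverse rendering with meshes [Kato2018]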

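Loss functions: regularization terms
• Combining regularization terms makes it possible to constrain the shape
• Especially effective in ill-posed problem settings
• Geodesic constraint (LIMP [Cosmo2020])
• Sum of the $L_p$ norms of the implicit function's surface normals as a constraint term [Liu2019b]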


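Loss functions: regularization terms
Application: learning avatars from 4D scans with a cycle-consistency constraint [Saito2021]
• $x_s \xrightarrow{\mathrm{LBS}^{-1}} x_c \xrightarrow{\mathrm{LBS}} x_p$
• The result should coincide with the same shape: $x_s = \mathrm{LBS}(\mathrm{LBS}^{-1}(x_s))$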
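The cycle constraint x_s = LBS(LBS^-1(x_s)) can be turned into a loss as in the following 2D sketch. It assumes, for illustration only, that the inverse and forward skinning steps use separately specified skinning weights, so the cycle closes exactly only when the two sets agree. The helpers `rigid`, `blend`, `lbs`, `inverse_lbs`, and `cycle_loss` are hypothetical and are not the SCANimate implementation.

```python
import math

def rigid(angle, tx, ty):
    """2D rigid bone transform as a 2x3 affine matrix."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, tx], [s, c, ty]]

def blend(transforms, weights):
    """Linear blend of affine matrices with skinning weights summing to 1."""
    return [[sum(w * t[r][c] for w, t in zip(weights, transforms))
             for c in range(3)] for r in range(2)]

def lbs(x, transforms, weights):
    """Forward skinning: canonical point -> posed point."""
    m = blend(transforms, weights)
    return (m[0][0] * x[0] + m[0][1] * x[1] + m[0][2],
            m[1][0] * x[0] + m[1][1] * x[1] + m[1][2])

def inverse_lbs(x, transforms, weights):
    """Inverse skinning: posed (scan) point -> canonical point."""
    m = blend(transforms, weights)
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    dx, dy = x[0] - m[0][2], x[1] - m[1][2]
    return ((m[1][1] * dx - m[0][1] * dy) / det,
            (-m[1][0] * dx + m[0][0] * dy) / det)

def cycle_loss(x_s, transforms, w_scan, w_canonical):
    """Squared distance between x_s and LBS(LBS^-1(x_s))."""
    x_c = inverse_lbs(x_s, transforms, w_scan)
    x_p = lbs(x_c, transforms, w_canonical)
    return (x_p[0] - x_s[0]) ** 2 + (x_p[1] - x_s[1]) ** 2

bones = [rigid(0.0, 0.0, 0.0), rigid(0.8, 1.0, -0.5)]
x_s = (0.3, 1.2)
consistent = cycle_loss(x_s, bones, [0.7, 0.3], [0.7, 0.3])
inconsistent = cycle_loss(x_s, bones, [0.7, 0.3], [0.2, 0.8])
```

With matching weights the loss vanishes up to rounding; with mismatched weights the point does not return to where it started, and the loss gives a training signal without any ground-truth skinning weights.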

[Saito2021]

Loss functions: regularization terms
Trend 1: regularizing intermediate layers
• Lipschitz regularization of neural fields [Liu2022]
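One way to see how a Lipschitz penalty acts on intermediate layers: for an MLP with 1-Lipschitz activations, the product of the per-layer matrix norms is an upper bound on the network's Lipschitz constant, so penalizing that product discourages abrupt changes in the output. The sketch below is a simplified variant under that assumption, not the exact formulation of [Liu2022]; `alpha` and the toy weights are hypothetical.

```python
def inf_norm(matrix):
    """Matrix infinity norm: maximum absolute row sum."""
    return max(sum(abs(w) for w in row) for row in matrix)

def lipschitz_upper_bound(layers):
    """Product of per-layer infinity norms.

    Bounds the Lipschitz constant (in the infinity norm) of an MLP whose
    activations are 1-Lipschitz, such as ReLU or tanh.
    """
    bound = 1.0
    for weights in layers:
        bound *= inf_norm(weights)
    return bound

def regularized_loss(data_loss, layers, alpha=1e-6):
    # alpha is a hypothetical weighting, not a value from the source.
    return data_loss + alpha * lipschitz_upper_bound(layers)

layers = [[[1.0, -2.0], [0.5, 0.5]],   # 2 -> 2, infinity norm 3.0
          [[2.0, 1.0]]]                # 2 -> 1, infinity norm 3.0
bound = lipschitz_upper_bound(layers)
total = regularized_loss(0.5, layers, alpha=0.01)
```

The bound depends only on the weights, so it can be added to any data term and differentiated with respect to the parameters in an autodiff framework.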

Loss functions: regularization terms
Trend 2: regularizing gradients
• Laplacian regularization of the gradient [Nicolet2021]

Loss functions: regularization terms
Trend 3: regularizing the optimizer
• A rotation-equivariant optimizer (VectorAdam [Ling2022])

Monocular reconstruction seen through the framework

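The dawn of neural monocular reconstruction [Wu2015] [Fan2017]
• Input data: monocular image
• Encoder: 2D CNN (global features)
• Decoder: 3D CNN
• Output data: voxels, point cloud
• Loss function: supervised learning (reconstruction loss)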

The rise of mesh representations: Pixel2Mesh [Wang2018]
• Input data: monocular image
• Encoder: 2D CNN (local features)
• Decoder: Graph Conv.
• Output data: mesh
• Loss function: supervised learning (reconstruction loss), regularization

The leap of differentiable rendering: voxels [Yan2016], meshes [Kato2018], point clouds [Wang2019]
• Input data: monocular image
• Encoder: 2D CNN (global features)
• Decoder: MLP
• Output data: voxels, mesh, point cloud
• Loss function: self-supervised learning (inverse rendering), regularization

The neural field explosion: DeepSDF [Park2019], Occupancy Networks [Mescheder2019], IM-Net [Chen2019]
• Input data: monocular image
• Encoder: 2D CNN (global features)
• Decoder: MLP
• Output data: neural field (implicit surface)
• Loss function: supervised learning (reconstruction loss)

Better generalization with local neural fields: PIFu [Saito2019]
• Input data: monocular image
• Encoder: 2D CNN (local features)
• Decoder: MLP
• Output data: neural field (implicit surface)
• Loss function: supervised learning (reconstruction loss)

Neural fields meet differentiable rendering: PixelNeRF [Yu2021]
• Input data: monocular image
• Output data: neural field (NeRF)
• Loss function: self-supervised learning (inverse rendering)

Beyond local features: ViT-NeRF [Lin2022]
• Input data: monocular image
• Encoder: ViT (non-local features)
• Output data: neural field (NeRF)
• Loss function: self-supervised learning (inverse rendering)

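Summary
• Understand the properties of each data representation and design the encoder and decoder accordingly
• Consider whether ground-truth shapes are available and define the loss function appropriately
Framework: Input data → Encoder → Decoder → Output data → Loss function (forward: inference; backward: parameter update with SGD)
• Input data: monocular image, RGB-D image, multiple images, point cloud / scan
• Output data: voxels, depth map (2.5D), point cloud, mesh, neural field
• Loss function: supervised learning (reconstruction loss), self-supervised learning (inverse rendering), regularization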

References (1)
• [Blanz1999] Blanz, Volker, and Thomas Vetter. "A morphable model for the synthesis of 3D faces." Proceedings of the 26th annual conference on Computer graphics and interactive techniques. 1999.
• [Chen2019] Chen, Zhiqin, and Hao Zhang. "Learning implicit fields for generative shape modeling." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.
• [Choy2016] Choy, Christopher B., et al. "3d-r2n2: A unified approach for single and multi-view 3d object reconstruction." European conference on computer vision. Springer, Cham, 2016.
• [Cosmo2020] Cosmo, Luca, et al. "Limp: Learning latent shape representations with metric preservation priors." Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part III 16. Springer International Publishing, 2020.
• [Dai2020] Dai, Angela, Christian Diller, and Matthias Nießner. "Sg-nn: Sparse generative neural networks for self-supervised scene completion of rgb-d scans." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.
• [Dosovitskiy2021] Dosovitskiy, Alexey, et al. "An image is worth 16x16 words: Transformers for image recognition at scale." arXiv preprint arXiv:2010.11929 (2020).
• [Furukawa2009] Furukawa, Yasutaka, et al. "Manhattan-world stereo." 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2009.
• [Fan2017] Fan, Haoqiang, Hao Su, and Leonidas J. Guibas. "A point set generation network for 3d object reconstruction from a single image." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
• [Graham2017] Graham, Benjamin, and Laurens van der Maaten. "Submanifold sparse convolutional networks." arXiv preprint arXiv:1706.01307 (2017).
• [Groueix2018] Groueix, Thibault, et al. "A papier-mâché approach to learning 3d surface generation." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
• [He2016] He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
• [Jackson2017] Jackson, Aaron S., et al. "Large pose 3D face reconstruction from a single image via direct volumetric CNN regression." Proceedings of the IEEE International Conference on Computer Vision. 2017.
• [Kato2018] Kato, Hiroharu, Yoshitaka Ushiku, and Tatsuya Harada. "Neural 3d mesh renderer." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
• [Lassner2021] Lassner, Christoph, and Michael Zollhofer. "Pulsar: Efficient Sphere-based Neural Rendering." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.

References (2)
• [Lin2022] Lin, Kai-En, Lin Yen-Chen, Wei-Sheng Lai, Tsung-Yi Lin, Yi-Chang Shih, and Ravi Ramamoorthi. "Vision transformer for nerf-based view synthesis from a single input image." arXiv preprint arXiv:2207.05736 (2022).
• [Ling2022] Ling, Selena, Nicholas Sharp, and Alec Jacobson. "Vectoradam for rotation equivariant geometry optimization." arXiv preprint arXiv:2205.13599 (2022).
• [Liu2019a] Liu, Shichen, et al. "Soft rasterizer: A differentiable renderer for image-based 3d reasoning." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019.
• [Liu2019b] Liu, Shichen, et al. "Learning to infer implicit surfaces without 3d supervision." NeurIPS 2019.
• [Liu2022] Liu, Hsueh-Ti Derek, Francis Williams, Alec Jacobson, Sanja Fidler, and Or Litany. "Learning smooth neural functions via lipschitz regularization." SIGGRAPH. 2022.
• [Loper2014] Loper, Matthew M., and Michael J. Black. "OpenDR: An approximate differentiable renderer." European Conference on Computer Vision. Springer, Cham, 2014.
• [Ma2021] Ma, Qianli, et al. "SCALE: Modeling clothed humans with a surface codec of articulated local elements." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.
• [Maturana2015] Maturana, Daniel, and Sebastian Scherer. "Voxnet: A 3d convolutional neural network for real-time object recognition." 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2015.
• [Mescheder2019] Mescheder, Lars, et al. "Occupancy networks: Learning 3d reconstruction in function space." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.
• [Mildenhall2020] Mildenhall, Ben, et al. "Nerf: Representing scenes as neural radiance fields for view synthesis." European conference on computer vision. Springer, Cham, 2020.
• [Miangoleh2021] Miangoleh, S. Mahdi H., et al. "Boosting Monocular Depth Estimation Models to High-Resolution via Content-Adaptive Multi-Resolution Merging." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.
• [Mueller2022] Mueller, Thomas, Alex Evans, Christoph Schied, and Alexander Keller. "Instant neural graphics primitives with a multiresolution hash encoding." arXiv preprint arXiv:2201.05989 (2022).
• [Newell2016] Newell, Alejandro, Kaiyu Yang, and Jia Deng. "Stacked hourglass networks for human pose estimation." European conference on computer vision. Springer, Cham, 2016.
• [Nicolet2021] Nicolet, Baptiste, Alec Jacobson, and Wenzel Jakob. "Large steps in inverse rendering of geometry." ACM Transactions on Graphics (TOG) 40.6 (2021): 1-13.
• [Park2019] Park, Jeong Joon, et al. "Deepsdf: Learning continuous signed distance functions for shape representation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.
• [Peng2020] Peng, Songyou, et al. "Convolutional occupancy networks." Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part III 16. Springer International Publishing, 2020.
• [Qi2016] Qi, Charles R., et al. "Volumetric and multi-view cnns for object classification on 3d data." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.

References (3)
• [Qi2017a] Qi, Charles R., et al. "Pointnet: Deep learning on point sets for 3d classification and segmentation." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
• [Qi2017b] Qi, Charles R., et al. "Pointnet++: Deep hierarchical feature learning on point sets in a metric space." arXiv preprint arXiv:1706.02413 (2017).
• [Ranjan2018] Ranjan, Anurag, et al. "Generating 3D faces using convolutional mesh autoencoders." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
• [Riegler2017] Riegler, Gernot, Ali Osman Ulusoy, and Andreas Geiger. "Octnet: Learning deep 3d representations at high resolutions." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
• [Saito2018] Saito, Shunsuke, et al. "3D hair synthesis using volumetric variational autoencoders." ACM Transactions on Graphics (TOG) 37.6 (2018): 1-12.
• [Saito2019] Saito, Shunsuke, et al. "Pifu: Pixel-aligned implicit function for high-resolution clothed human digitization." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019.
• [Saito2020] Saito, Shunsuke, et al. "Pifuhd: Multi-level pixel-aligned implicit function for high-resolution 3d human digitization." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.
• [Saito2021] Saito, Shunsuke, et al. "SCANimate: Weakly supervised learning of skinned clothed avatar networks." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.
• [Simonyan2014] Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).
• [Tancik2020] Tancik, Matthew, et al. "Fourier features let networks learn high frequency functions in low dimensional domains." arXiv preprint arXiv:2006.10739 (2020).
• [Yan2016] Yan, Xinchen, Jimei Yang, Ersin Yumer, Yijie Guo, and Honglak Lee. "Perspective transformer nets: Learning single-view 3d object reconstruction without 3d supervision." Advances in neural information processing systems 29 (2016).
• [Yariv2020] Yariv, Lior, et al. "Multiview neural surface reconstruction by disentangling geometry and appearance." arXiv preprint arXiv:2003.09852 (2020).
• [Yao2018] Yao, Yao, et al. "Mvsnet: Depth inference for unstructured multi-view stereo." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
• [Yang2018] Yang, Yaoqing, et al. "Foldingnet: Point cloud auto-encoder via deep grid deformation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
• [Yu2021] Yu, Alex, Vickie Ye, Matthew Tancik, and Angjoo Kanazawa. "pixelnerf: Neural radiance fields from one or few images." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2021.
• [Wang2018] Wang, Nanyang, Yinda Zhang, Zhuwen Li, Yanwei Fu, Wei Liu, and Yu-Gang Jiang. "Pixel2mesh: Generating 3d mesh models from single rgb images." Proceedings of the European Conference on Computer Vision (ECCV). 2018, pp. 52-67.
• [Wang2019] Wang, Yifan, Felice Serena, Shihao Wu, Cengiz Öztireli, and Olga Sorkine-Hornung. "Differentiable surface splatting for point-based geometry processing." ACM Transactions on Graphics (TOG) 38.6 (2019): 1-14.
• [Wu2015] Wu, Zhirong, et al. "3d shapenets: A deep representation for volumetric shapes." Proceedings of the IEEE conference on computer vision and pattern recognition. 2015.
