OpenTalks.AI - Victor Lempitsky, 3D Scene Modeling: New Approaches in 2020

3D Scene Modeling with AI: What was new in 2020
Victor Lempitsky, Samsung AI Center Moscow, Skolkovo Institute of Science and Technology (Skoltech)

3D scene modeling
• Input: photographs/frames + camera parameters
• Image-based modeling: input → scene representation / model
• Rendering / new view synthesis: scene representation / model → new view of the scene

Classic pipeline
Input photographs/frames + camera parameters → 3D reconstruction pipeline → Mesh(es) + Texture(s) → Graphics rendering engine → New view of the scene
• Pros: highly optimized and widely supported rendering
• Cons: modeling is tough and brittle; some geometry and photometry are difficult to model

Neural rendering approach
Differentiable representation → Differentiable Rendering → Rendered view, compared with Ground truth by a Loss
Pros:
• Higher modeling power, better realism
• Use of sophisticated losses (perceptual, adversarial)
Cons:
• Differentiable rendering can be (very) slow
• Overfitting is often a problem, weaker inductive prior

Neural Radiance Fields (NeRF) [Mildenhall et al. ECCV 2020]

Neural Radiance Fields (NeRF) [Mildenhall et al. ECCV 2020]
• Positional encoding is used to facilitate high-frequency details
• The viewing angle parameters are input only to the last layer of the network
• "Coarse" network is learned alongside the main network to facilitate faster approximate integration
• Rendering still takes dozens of seconds for a VGA image
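The positional encoding mentioned above can be sketched as follows. This is a minimal illustration assuming the sin/cos form from the NeRF paper with frequencies 2^k·π; the function names and the sample point are hypothetical, and ten frequencies per coordinate is the setting the paper reports for positions.

```python
import math

def positional_encoding(p, num_freqs):
    """Map a scalar p to [sin(2^k*pi*p), cos(2^k*pi*p)] for k = 0..num_freqs-1."""
    features = []
    for k in range(num_freqs):
        angle = (2.0 ** k) * math.pi * p
        features.append(math.sin(angle))
        features.append(math.cos(angle))
    return features

def encode_point(xyz, num_freqs=10):
    """Encode each coordinate of a 3D point independently and concatenate."""
    out = []
    for coordinate in xyz:
        out.extend(positional_encoding(coordinate, num_freqs))
    return out

# Hypothetical sample point; 3 coordinates x 10 frequencies x (sin, cos) = 60 features.
encoded = encode_point((0.5, -0.25, 0.0), num_freqs=10)
```

The encoded vector, rather than the raw coordinates, is what would be fed to the perceptron, which is what lets a smooth network represent fine detail.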

Neural Radiance Fields (NeRF) [Mildenhall et al. ECCV 2020]
Figure panels: color rendering, depth
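Both the color rendering and the depth map come from the same integration along each camera ray. A minimal sketch of the standard quadrature, assuming per-sample densities and colors have already been predicted by the network; the function name is hypothetical, and treating the last interval as very large is an assumption of this sketch.

```python
import math

def composite_ray(ts, sigmas, colors, last_delta=1e10):
    """Numerically integrate color and expected depth along one ray.

    ts:     sorted sample distances along the ray
    sigmas: volume density at each sample
    colors: (r, g, b) at each sample
    """
    transmittance = 1.0  # probability that the ray has not been absorbed yet
    rgb = [0.0, 0.0, 0.0]
    depth = 0.0
    weights = []
    for i in range(len(ts)):
        delta = ts[i + 1] - ts[i] if i + 1 < len(ts) else last_delta
        alpha = 1.0 - math.exp(-sigmas[i] * delta)  # opacity of this segment
        weight = transmittance * alpha
        weights.append(weight)
        for c in range(3):
            rgb[c] += weight * colors[i][c]
        depth += weight * ts[i]
        transmittance *= 1.0 - alpha
    return rgb, depth, weights

# Hypothetical ray: empty space, then an opaque green surface at distance 2.
rgb, depth, weights = composite_ray(
    ts=[1.0, 2.0, 3.0],
    sigmas=[0.0, 1e9, 5.0],
    colors=[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)],
)
```

The per-sample weights are also what a coarse pass can hand to a fine pass to decide where to place more samples.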

Deformable NeRF [Park et al. arXiv 2020]

Deformable NeRF [Park et al. arXiv 2020]
• The deformation field is parameterized by rotation (center + quaternion) and translation
• Strong deformations are penalized
• Points recovered by SfM are constrained to stay in place
• Blurry frames are removed from the training set
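To make the rotation-plus-translation parameterization concrete, here is a small sketch that warps one point as x' = R(q)(x - c) + c + t. The formula is a reading of the slide's "center + quaternion" wording, not a quote from the paper; in the actual method these parameters would be predicted per point by a network, and all names below are hypothetical.

```python
import math

def quat_rotate(q, v):
    """Rotate vector v by unit quaternion q = (w, x, y, z)."""
    w, ux, uy, uz = q
    # c = u x v
    cx = uy * v[2] - uz * v[1]
    cy = uz * v[0] - ux * v[2]
    cz = ux * v[1] - uy * v[0]
    # cc = u x (u x v)
    ccx = uy * cz - uz * cy
    ccy = uz * cx - ux * cz
    ccz = ux * cy - uy * cx
    # v' = v + 2w (u x v) + 2 u x (u x v)
    return (v[0] + 2.0 * (w * cx + ccx),
            v[1] + 2.0 * (w * cy + ccy),
            v[2] + 2.0 * (w * cz + ccz))

def deform_point(x, quaternion, center, translation):
    """Rotate x about `center` by `quaternion`, then translate."""
    norm = math.sqrt(sum(component * component for component in quaternion))
    q = tuple(component / norm for component in quaternion)
    local = tuple(xi - ci for xi, ci in zip(x, center))
    rotated = quat_rotate(q, local)
    return tuple(r + c + t for r, c, t in zip(rotated, center, translation))

# Hypothetical values: 90 degrees about the z axis through (1, 0, 0), then a shift along z.
half_angle = math.pi / 4.0
q90z = (math.cos(half_angle), 0.0, 0.0, math.sin(half_angle))
warped = deform_point((2.0, 0.0, 0.0), q90z, (1.0, 0.0, 0.0), (0.0, 0.0, 0.5))
```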

Neural Sparse Voxel Fields [Liu et al. NeurIPS 2020]
• Geometry is approximated explicitly by an octree
• Perceptron is sampled at ray-octree intersections
• Training includes several refine-and-prune stages
• Order of magnitude speedup over NeRF (still not real-time)
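The speedup comes from evaluating the perceptron only where the ray passes through occupied voxels. The sketch below illustrates that idea with a standard slab test against axis-aligned boxes; it uses a flat list of voxels instead of an octree, which is a simplification of this sketch, and the function names and step size are hypothetical.

```python
def ray_box_intersection(origin, direction, box_min, box_max):
    """Slab test: return (t_in, t_out) for a ray and an axis-aligned box, or None."""
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if abs(d) < 1e-12:
            # Ray is parallel to this pair of slabs: it must start between them.
            if o < lo or o > hi:
                return None
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return None
    return t_near, t_far

def sample_in_voxels(origin, direction, voxels, step):
    """Place samples along the ray only inside occupied voxels."""
    samples = []
    for box_min, box_max in voxels:
        hit = ray_box_intersection(origin, direction, box_min, box_max)
        if hit is None:
            continue
        t_in, t_out = hit
        t = t_in + 0.5 * step
        while t < t_out:
            samples.append(t)
            t += step
    return sorted(samples)

# Hypothetical scene: two voxels on the ray and one that the ray misses.
occupied = [((1.0, 0.0, 0.0), (2.0, 1.0, 1.0)),
            ((5.0, 0.0, 0.0), (6.0, 1.0, 1.0)),
            ((1.0, 2.0, 2.0), (2.0, 3.0, 3.0))]
ts = sample_in_voxels((0.0, 0.5, 0.5), (1.0, 0.0, 0.0), occupied, step=0.5)
```

No samples are spent in the empty space between distance 2 and distance 5, which is where a dense sampler would waste most of its network evaluations.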

Deferred neural rendering
• Scene = Mesh geometry + neural texture
• Neural rendering network is used as the last stage of the rendering pipeline
• Realistic images are generated even for coarse geometry
[Thies et al. ACM ToG 2019]
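A neural texture is sampled like an ordinary texture, except that each texel stores a learned feature vector instead of a color. The following sketch shows a bilinear lookup at one UV coordinate; the tiny 2x2 texture and the function name are hypothetical, and the rendering network that would consume the sampled features is not shown.

```python
import math

def sample_neural_texture(texture, u, v):
    """Bilinearly interpolate a feature vector at (u, v) in [0, 1] x [0, 1].

    texture[row][col] is a list of feature channels.
    """
    height, width = len(texture), len(texture[0])
    x = min(max(u, 0.0), 1.0) * (width - 1)
    y = min(max(v, 0.0), 1.0) * (height - 1)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0
    channels = len(texture[0][0])
    return [
        (1 - fx) * (1 - fy) * texture[y0][x0][c]
        + fx * (1 - fy) * texture[y0][x1][c]
        + (1 - fx) * fy * texture[y1][x0][c]
        + fx * fy * texture[y1][x1][c]
        for c in range(channels)
    ]

# Hypothetical 2x2 texture with two feature channels per texel.
texture = [[[0.0, 0.0], [1.0, 0.0]],
           [[0.0, 1.0], [1.0, 1.0]]]
center_features = sample_neural_texture(texture, 0.5, 0.5)
```

Because the lookup is a weighted sum of texels, gradients from the image loss flow back into the texture values, so the texture can be optimized jointly with the rendering network.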

Neural dressing model
Neural texture, SMPL-X body model [Pavlakos et al. 2019]
[Iskakov et al. 2020]

Full-body avatars with neural textures

Stable view synthesis [Riegler & Koltun 2020]
Delaunay-based 3D surface reconstruction

Stable view synthesis vs NeRF [Riegler & Koltun 2020]

Neural Point-Based Graphics [Aliev et al. ECCV 2020]
RGB views and reconstructed point cloud (figure panels: RGB, Depth, Point Cloud)

Neural Point-Based Graphics [Aliev et al. ECCV 2020]
Points (positions p1, p2, …, pN; descriptors d1, d2, …, dN) → rasterizer + z-buffer → Raw images → Rendering network → Result
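The "rasterizer + z-buffer" stage of this pipeline can be illustrated with a short sketch: each point is projected into the image, and each pixel keeps the descriptor of the nearest point that lands on it. The pinhole camera at the origin looking along +z, the one-pixel point footprint, and all names are assumptions of this sketch; the rendering network that turns the raw image into the result is not shown.

```python
import math

def rasterize_points(points, descriptors, focal, width, height):
    """Project points and keep, per pixel, the descriptor of the closest point."""
    dim = len(descriptors[0])
    image = [[[0.0] * dim for _ in range(width)] for _ in range(height)]
    zbuffer = [[float("inf")] * width for _ in range(height)]
    for (x, y, z), descriptor in zip(points, descriptors):
        if z <= 0.0:
            continue  # behind the camera
        col = int(math.floor(focal * x / z + width / 2.0))
        row = int(math.floor(focal * y / z + height / 2.0))
        if not (0 <= col < width and 0 <= row < height):
            continue  # outside the image
        if z < zbuffer[row][col]:
            zbuffer[row][col] = z
            image[row][col] = list(descriptor)
    return image, zbuffer

# Hypothetical points: two compete for the same pixel, one falls outside the image.
points = [(0.0, 0.0, 2.0), (0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (10.0, 0.0, 1.0)]
descriptors = [[0.1, 0.2], [0.9, 0.8], [0.5, 0.5], [0.3, 0.3]]
raw_image, zbuffer = rasterize_points(points, descriptors, focal=2.0, width=4, height=4)
```

Pixels that no point reaches keep an all-zero descriptor, so the raw image has holes; filling them plausibly is part of what the rendering network has to learn.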

Mesh-based vs Point-Based
Figure panels: Deferred Neural Rendering (mesh-based), NPBG (point-based), Nearest Train

Relightable 3D portraits [Sevastopolsky et al. 2020]

Relightable 3D portraits [Sevastopolsky et al. 2020]
Point cloud + descriptors → z-buffer → Neural rendering → Albedo, Normals, Room light, Mask → Lighting model → Relit view
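Once albedo, normals, and a mask are available per pixel, a lighting model can shade the portrait under new illumination. The sketch below uses a single directional light with Lambertian shading, which is an assumption chosen for brevity and not necessarily the lighting model of the paper; the function name and values are hypothetical.

```python
import math

def relight_pixel(albedo, normal, light_dir, light_rgb=(1.0, 1.0, 1.0), mask=1.0):
    """Lambertian shading of one pixel: mask * albedo * light * max(0, n . l).

    normal and light_dir must be non-zero; they are normalized here.
    """
    n_len = math.sqrt(sum(n * n for n in normal))
    l_len = math.sqrt(sum(l * l for l in light_dir))
    cosine = sum(n * l for n, l in zip(normal, light_dir)) / (n_len * l_len)
    shading = max(0.0, cosine)  # surfaces facing away from the light receive none
    return tuple(mask * a * c * shading for a, c in zip(albedo, light_rgb))

# Hypothetical pixel facing the camera, lit from 60 degrees off the normal.
lit = relight_pixel(
    albedo=(0.8, 0.6, 0.4),
    normal=(0.0, 0.0, 1.0),
    light_dir=(math.sin(math.pi / 3.0), 0.0, math.cos(math.pi / 3.0)),
)
```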

Relightable 3D portraits
Figure panels: From fixed viewpoint; Simultaneous relighting & view interpolation

So far: training/fitting individual scenes
Differentiable representation → Differentiable Rendering → Rendered view, compared with Ground truth by a Loss over multiple training views

Few-shot neural reconstruction
Encoding/reconstructing neural net → Differentiable representation → Differentiable Rendering → Rendered hold-out view, compared with the hold-out view by a Loss
• Training is performed on a dataset of scenes (tuples of views)
• New scenes can be reconstructed from few views (or a single view)

SynSin system [Wiles et al. CVPR 2020]
• One of several recent systems for single-view 3D modeling
• Uses point-based geometric proxy

SynSin system [Wiles et al. CVPR 2020]
• Splatting is used to provide gradients over point locations in 2D
• Alpha-over compositing of K closest points to make z-buffer differentiable
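A hard z-buffer keeps only the nearest point, so nothing behind it receives a gradient. Alpha-over compositing of the K closest points softens that choice. The sketch below shows the compositing for a single pixel, taking each point's alpha as given (in a splatting renderer it would depend on the distance from the pixel to the point center); the function name and values are hypothetical.

```python
def composite_k_nearest(candidates, k):
    """Front-to-back alpha-over compositing of the k nearest points on one pixel.

    candidates: list of (depth, alpha, feature) for points splatted onto the pixel.
    """
    nearest = sorted(candidates, key=lambda candidate: candidate[0])[:k]
    if not nearest:
        return []
    result = [0.0] * len(nearest[0][2])
    transmittance = 1.0  # fraction of the pixel not yet covered by nearer points
    for _depth, alpha, feature in nearest:
        weight = transmittance * alpha
        for i, value in enumerate(feature):
            result[i] += weight * value
        transmittance *= 1.0 - alpha
    return result

# Hypothetical pixel covered by three points at different depths.
candidates = [(3.0, 1.0, [0.0, 0.0, 1.0]),
              (1.0, 0.5, [1.0, 0.0, 0.0]),
              (2.0, 0.5, [0.0, 1.0, 0.0])]
pixel = composite_k_nearest(candidates, k=2)
```

Every one of the K points contributes with a non-zero weight whenever the points in front of it are not fully opaque, so each receives a gradient from the image loss.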

PyTorch3D [Ravi et al. 2020]
• Differentiable rendering
• Differentiable structure-and-motion
• Supported representations:
  • Point clouds
  • Textured meshes
  • NeRFs

nvdiffrast [Laine et al. ToG 2020]

Stereo magnification [Zhou et al. SIGGRAPH 2018]

Immersive Lightfield Video [Broxton et al. SIGGRAPH 2020] https://augmentedperception.github.io/deepviewvideo/

Recap
• Various neural scene representations are developing:
  • Perceptron (NeRF)
  • Mesh + neural texture
  • Point cloud + neural descriptors
  • Layered semi-transparent meshes
• Differentiable renderers (PyTorch3D, nvdiffrast) make integration of neural networks and graphics easier
• Scene fitting and few-shot reconstruction are both actively developing

References
Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, Ren Ng: NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. ECCV (1) 2020: 405-421
Keunhong Park, Utkarsh Sinha, Jonathan T. Barron, Sofien Bouaziz, Dan B. Goldman, Steven M. Seitz, Ricardo Martin-Brualla: Deformable Neural Radiance Fields. CoRR abs/2011.12948 (2020)
Lingjie Liu, Jiatao Gu, Kyaw Zaw Lin, Tat-Seng Chua, Christian Theobalt: Neural Sparse Voxel Fields. NeurIPS 2020
Justus Thies, Michael Zollhöfer, Matthias Nießner: Deferred neural rendering: image synthesis using neural textures. ACM Trans. Graph. 38(4): 66:1-66:12 (2019)
Georgios Pavlakos, Vasileios Choutas, Nima Ghorbani, Timo Bolkart, Ahmed A. A. Osman, Dimitrios Tzionas, Michael J. Black: Expressive Body Capture: 3D Hands, Face, and Body From a Single Image. CVPR 2019: 10975-10985
Gernot Riegler, Vladlen Koltun: Stable View Synthesis. CoRR abs/2011.07233 (2020)
Kara-Ali Aliev, Artem Sevastopolsky, Maria Kolos, Dmitry Ulyanov, Victor S. Lempitsky: Neural Point-Based Graphics. ECCV (22) 2020: 696-712

Artem Sevastopolsky, Savva Ignatiev, Gonzalo Ferrer, Evgeny Burnaev, Victor Lempitsky: Relightable 3D Head Portraits from a Smartphone Video. CoRR abs/2012.09963 (2020)
Olivia Wiles, Georgia Gkioxari, Richard Szeliski, Justin Johnson: SynSin: End-to-End View Synthesis From a Single Image. CVPR 2020: 7465-7475
Nikhila Ravi, Jeremy Reizenstein, David Novotný, Taylor Gordon, Wan-Yen Lo, Justin Johnson, Georgia Gkioxari: Accelerating 3D Deep Learning with PyTorch3D. CoRR abs/2007.08501 (2020)
Samuli Laine, Janne Hellsten, Tero Karras, Yeongho Seol, Jaakko Lehtinen, Timo Aila: Modular primitives for high-performance differentiable rendering. ACM Trans. Graph. 39(6): 194:1-194:14 (2020)
Tinghui Zhou, Richard Tucker, John Flynn, Graham Fyffe, Noah Snavely: Stereo magnification: learning view synthesis using multiplane images. ACM Trans. Graph. 37(4): 65:1-65:12 (2018)
Michael Broxton, John Flynn, Ryan S. Overbeck, Daniel Erickson, Peter Hedman, Matthew DuVall, Jason Dourgarian, Jay Busch, Matt Whalen, Paul E. Debevec: Immersive light field video with a layered mesh representation. ACM Trans. Graph. 39(4): 86 (2020)
