Slide 1

Blowin’ in the Wild: Dynamic Looping Gaussians from Still Images
Kai Kohyama
Mentors: Toru Matsuoka, Sosuke Kobayashi
Preferred Networks Internship, September 2024

Slide 2

Motivation (1/2)
● 4D reconstruction is expected to be applied to virtual production
● However, existing 4D reconstruction systems require…
○ Multi-view settings
○ Large video data
▲ Installing multiple cameras is difficult
▲ PFN 3D Scan provides a virtual production service
(Photos from https://pfn3d.com/jlox-2023 and https://pfn3d.com/4d)

Slide 3

Motivation (2/2)
● It would be useful if 4D reconstruction could be done from casually taken monocular images (“in-the-wild” images)
○ A naive 3DGS method tends to output a blurry reconstruction
● A looping scene output is even better for virtual production
[Figure: an “in-the-wild” swaying scene (unknown t, monocular) → looping 4D scene ☺ vs. naive 3DGS 😢]


Slide 4

Goal | Blowin’ in the Wild
● Project goal:
○ Reconstruct a looping scene of grass & trees swaying in the wind from in-the-wild images
○ → “Blowin’ in the Wild: Dynamic Looping Gaussians from Still Images”
○ A play on Bob Dylan’s “Blowin’ in the Wind”

Slide 5

Comparison

Method        | Input: Monocular? | Input: Image? (no t) | Output: Dynamic scene? | Output: Looping?
3DGS          | ⭕ | ⭕ | ❌ | ❌
4DGS          | ❌ | ❌ | ⭕ | ❌
E-D3DGS       | ❌ | ❌ | ⭕ | ❌
GFlow         | ⭕ | ❌ | ⭕ | ❌
LoopGaussian  | ⭕ | ⭕ | △ | ⭕
WildGaussians | ⭕ | ⭕ | ❌ | ❌
Ours          | ⭕ | ⭕ | ⭕ | ⭕

Slide 6

Preliminary | 3D Gaussian Splatting
● Express the scene as a collection of 3D Gaussians
● Per-Gaussian parameters (see the sketch below):
○ Position, Rotation, Scale, Opacity, Color (SH coefficients)
[Figure: optimize the parameters of each Gaussian, i.e. its mean and variance]
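As a concrete picture of this parameterization, here is a minimal PyTorch sketch; the shapes and activation conventions (log-scale, pre-sigmoid opacity, quaternion rotation) are common 3DGS practice and an assumption here, not this project's exact code.

```python
import torch
import torch.nn as nn

class GaussianCloud(nn.Module):
    """Minimal container for the per-Gaussian parameters that 3DGS optimizes."""
    def __init__(self, num_gaussians: int, sh_degree: int = 3):
        super().__init__()
        n = num_gaussians
        num_sh = (sh_degree + 1) ** 2  # SH coefficients per color channel
        self.position = nn.Parameter(torch.zeros(n, 3))  # mean of each Gaussian
        self.rotation = nn.Parameter(  # unit quaternion (w, x, y, z)
            torch.tensor([1.0, 0.0, 0.0, 0.0]).repeat(n, 1))
        self.scale = nn.Parameter(torch.zeros(n, 3))  # log-scale per axis
        self.opacity = nn.Parameter(torch.zeros(n, 1))  # pre-sigmoid opacity
        self.sh = nn.Parameter(torch.zeros(n, num_sh, 3))  # color as SH coefficients
```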

Slide 7

Prior Work | In-the-Wild Gaussian Splatting
● WildGaussians (Kulhanek et al.)
○ Adds per-Gaussian embeddings & per-shot embeddings
○ Uses an MLP to extend the color representation
[Figure: Gaussian embeddings (learning spatial features) + shot embeddings (learning environmental features) → MLP → Gaussian parameters → 3DGS renderer, for scenes with lighting changes. Figure from WildGaussians, https://arxiv.org/pdf/2407.08447]

Slide 8

Method | In-the-Wild Dynamic Gaussian Splatting
● Add per-Gaussian embeddings & per-shot embeddings
● Use an MLP to extend the position / rotation representation (a sketch follows below)
[Figure: Gaussian embeddings (learning spatial features) + shot embeddings (learning temporal features) → MLP → Gaussian parameters → 3DGS renderer, for a scene with movements. Can this handle a dynamic scene?]
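To make the conditioning concrete, here is a minimal PyTorch sketch of such a deformation MLP; the embedding dimensions, hidden size, and output split (3 position offsets + 4 quaternion components, 7 values total, matching the 2×7 matrices on the Motion Bake slides) are illustrative assumptions, not the project's exact architecture.

```python
import torch
import torch.nn as nn

class DeformationMLP(nn.Module):
    """Maps (per-Gaussian embedding, per-shot embedding) to position/rotation offsets."""
    def __init__(self, gauss_dim: int = 32, shot_dim: int = 8, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(gauss_dim + shot_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3 + 4),  # Δposition (3) + Δrotation quaternion (4)
        )

    def forward(self, gauss_emb: torch.Tensor, shot_emb: torch.Tensor):
        # gauss_emb: (N, gauss_dim); shot_emb: (shot_dim,), shared by all Gaussians
        shot = shot_emb.expand(gauss_emb.shape[0], -1)
        out = self.net(torch.cat([gauss_emb, shot], dim=-1))
        return out[:, :3], out[:, 3:]  # per-Gaussian (Δpos, Δrot)
```

Per shot, offsets for every Gaussian come from a single forward pass, e.g. `d_pos, d_rot = mlp(gaussian_embeddings, shot_embeddings[t])`.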

Slide 9

Method | Looping Scene Production
● Create a natural looping scene by manipulating shot embeddings
○ Temporal information in shot embeddings is not 1D time, but higher-order features
● Reduce the feature dimension of the shot embeddings (M shots × N dims) to 2D using PCA (see the sketch below)
[Figure: shot-embedding matrix, with rows like [0.2, 0.1, -0.5, …, 0.3], reduced from N dims to 2 dims]
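A minimal sketch of this step, assuming scikit-learn's PCA; the circular path traced in the 2D plane is one plausible way to generate a closed loop and is our assumption, not necessarily the project's exact manipulation.

```python
import numpy as np
from sklearn.decomposition import PCA

# shot_embeddings: (M shots, N dims), learned during training (random placeholder here)
shot_embeddings = np.random.randn(34, 8)

pca = PCA(n_components=2)
uv = pca.fit_transform(shot_embeddings)  # (M, 2): each shot as a 2D point

# Trace a closed path (here an ellipse around the data mean) in the 2D plane
# and map it back to full N-dim embeddings, one per output frame.
theta = np.linspace(0.0, 2.0 * np.pi, 120, endpoint=False)
radius = uv.std(axis=0)
path_2d = np.stack([radius[0] * np.cos(theta), radius[1] * np.sin(theta)], axis=1)
loop_embeddings = pca.inverse_transform(path_2d)  # (120, N): feed to the MLP per frame
```

Because the path is closed, the rendered frames repeat seamlessly when played in a cycle.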

Slide 10

Method | Motion Bake (1/2)
● Reduce calculation time by skipping MLP estimation
● Try 2 ideas:
○ ① Record & interpolate MLP results (2×2 bilinear; sketch below)
[Figure: precalculate MLP outputs over the 2D (u, v) embedding plane, then apply bilinear interpolation]
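A minimal NumPy sketch of idea ①, assuming the 2D embedding coordinates (u, v) are normalized to [0, 1] and a hypothetical grid resolution; `mlp_fn` stands in for the trained MLP (composed with the PCA inverse) and is assumed to return per-Gaussian outputs of shape (N, 7).

```python
import numpy as np

def bake_grid(mlp_fn, grid_res: int = 16):
    """Precompute MLP outputs on a grid_res x grid_res grid over (u, v) in [0, 1]^2."""
    us = np.linspace(0.0, 1.0, grid_res)
    vs = np.linspace(0.0, 1.0, grid_res)
    return np.stack([[mlp_fn(u, v) for v in vs] for u in us])  # (R, R, N, 7)

def bilinear_lookup(grid, u: float, v: float):
    """2x2 bilinear interpolation of the baked grid at a query point (u, v)."""
    r = grid.shape[0] - 1
    x, y = u * r, v * r
    i, j = min(int(x), r - 1), min(int(y), r - 1)  # lower-left cell corner
    fx, fy = x - i, y - j                          # fractional offsets within the cell
    return ((1 - fx) * (1 - fy) * grid[i, j] + fx * (1 - fy) * grid[i + 1, j]
            + (1 - fx) * fy * grid[i, j + 1] + fx * fy * grid[i + 1, j + 1])
```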

Slide 11

Method | Motion Bake (2/2)
● Reduce computation time by skipping MLP estimation
● Try 2 ideas:
○ ② Approximate the MLP as per-Gaussian linear matrices (one 2×7 matrix per Gaussian; sketch below)
[Figure: the MLP (with PCA⁻¹) is evaluated on embeddings precomputed for 100 sampled (u, v) points, then approximated by the least-squares method as a 2×7 matrix × Gaussian num.]
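A minimal NumPy sketch of idea ②, reusing the `mlp_fn` convention from the previous sketch; fitting without a bias term matches the stated 2×7 size, though this is our reading (a bias could be folded in by augmenting (u, v) with a constant 1).

```python
import numpy as np

def fit_linear_motion(mlp_fn, num_samples: int = 100):
    """Fit one 2x7 matrix per Gaussian mapping (u, v) -> 7 outputs
    (Δposition + Δrotation), via least squares over sampled embeddings."""
    uv = np.random.rand(num_samples, 2)                # 100 sampled (u, v) points
    targets = np.stack([mlp_fn(u, v) for u, v in uv])  # (num_samples, N, 7)
    n = targets.shape[1]
    flat = targets.reshape(num_samples, n * 7)
    # Solve uv @ W ≈ flat for all Gaussians at once
    W, *_ = np.linalg.lstsq(uv, flat, rcond=None)      # (2, N*7)
    return W.reshape(2, n, 7).transpose(1, 0, 2)       # (N, 2, 7): one 2x7 per Gaussian

# At runtime the MLP is replaced by a tiny matrix product per Gaussian:
# outputs = np.einsum('i,nij->nj', np.array([u, v]), W_per_gaussian)  # (N, 7)
```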

Slide 12

Experiment 1 | 4D Reconstruction Ability (1/2)
● Test 4D reconstruction on the D-NeRF dataset
○ Monocular setting
○ Continuous shooting time
○ Lighting is static
● Easier than in-the-wild data: 50 continuous images from various angles

Slide 13

Experiment 1 | 4D Reconstruction Ability (2/2)
● Results: the embeddings succeeded in capturing the movement

Quantitative results on the “Lego” scene:

       | PSNR↑ | SSIM↑  | LPIPS↓
Ours   | 26.77 | 0.9000 | 0.0571
4DGS   | 25.03 | 0.9376 | 0.0437
3DGS   | 23.06 | 0.9290 | 0.0642

Slide 14

Experiment 2 | Test on In-the-Wild Scene (1/2)
● Test the idea on in-the-wild data
○ Monocular setting
○ Shooting time is unknown and irregular
○ Lighting is nearly static
● The “rose” scene contains 34 shots

Slide 15

Experiment 2 | Test on In-the-Wild Scene (2/2)
● Results (Naive 3DGS vs. Ours)

       | PSNR↑ | SSIM↑ | LPIPS↓
Ours   | 22.18 | 0.701 | 0.200
3DGS   | 21.99 | 0.697 | 0.206

* The quantitative evaluations were performed on the reconstruction of the training views due to the lack of multi-view, same-time validation data.

Slide 16

Experiment 3 | Ablation

● “Rose” scene (real data, 1M Gaussians):

                   | Rendering Time | MLP Time | Parameters | PSNR  | SSIM   | LPIPS
Naive MLP          | 0.0294 s       | 0.0242 s | 12M        | 21.80 | 0.6649 | 0.1617
① Bilinear Interp. | 0.0114 s       | 0.0039 s | 28M        | 21.89 | 0.6654 | 0.1600
② Linear Approx.   | 0.0114 s       | 0.0035 s | 14M        | 21.77 | 0.6629 | 0.1622

● “Lego” scene (simulation data, 240K Gaussians):

                   | Rendering Time | MLP Time  | Parameters | PSNR  | SSIM   | LPIPS
Naive MLP          | 0.00594 s      | 0.0065 s  | 7.2M       | 28.69 | 0.9557 | 0.0258
① Bilinear Interp. | 0.00385 s      | 0.0014 s  | 6.7M       | 25.56 | 0.9456 | 0.0399
② Linear Approx.   | 0.00337 s      | 0.00095 s | 3.4M       | 26.02 | 0.9453 | 0.0393

* The configuration is slightly different from that on the previous page.

Slide 17

Conclusion
● Contributions
○ Reconstructed a 4D scene of grass & trees swaying in the wind from in-the-wild still images
○ Produced looping scenes using embeddings
○ Reduced computational cost by efficient approximations
○ Developed an intuitive UI

Slide 18

Limitations and Future Work
● Complex or high-frequency movements (e.g., dancing people, flowing water) couldn’t be accurately recovered
○ Possibly due to SfM initialization, representation power, optimization, …

Slide 19

Supplementary
● Why 2D embeddings?
○ Easy to manipulate in the UI
○ Embeddings not in the training data can also be interpolated well
○ 1D is not sufficient to loop naturally; 3D doesn’t work very well
● What if color/opacity MLPs are also added?
○ Not much improvement in PSNR, but more parameters
● How to decide hyperparameters (dimension of embeddings, etc.)?
○ Utilize Optuna: https://github.com/optuna/optuna (a minimal sketch follows below)
○ The hyperparameter set is in our GitHub code
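As one hedged illustration of how such a search might look with Optuna: the search space, metric, and `train_and_evaluate` stub below are placeholders, not the project's actual configuration (which is in the GitHub code).

```python
import optuna

def train_and_evaluate(shot_dim: int, gauss_dim: int, lr: float) -> float:
    """Stand-in for the real training + evaluation run; returns a dummy PSNR."""
    return 25.0 - 0.05 * abs(shot_dim - 2) - 0.01 * abs(gauss_dim - 32) - 100.0 * abs(lr - 1e-3)

def objective(trial: optuna.Trial) -> float:
    # Hypothetical search space; parameter names are illustrative.
    shot_dim = trial.suggest_int("shot_embedding_dim", 2, 16)
    gauss_dim = trial.suggest_int("gaussian_embedding_dim", 8, 64)
    lr = trial.suggest_float("mlp_lr", 1e-4, 1e-2, log=True)
    return train_and_evaluate(shot_dim, gauss_dim, lr)

study = optuna.create_study(direction="maximize")  # maximize PSNR
study.optimize(objective, n_trials=50)
print(study.best_params)
```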