PFN Internship 2024 / Kai Kohyama: Blowin’ in the Wild: Dynamic Looping Gaussians from Still Images

Blowin’ in the Wild: Dynamic Looping Gaussians from Still Images
Kai Kohyama
Mentors: Toru Matsuoka, Sosuke Kobayashi
Preferred Networks Internship, September 2024

Motivation (1/2)
• 4D reconstruction is expected to be applied to virtual production
• However, existing 4D reconstruction systems require:
  ◦ Multi-view settings
  ◦ Large video data
▲ Installing multiple cameras is difficult
▲ PFN 3D Scan provides a virtual production service
Photos from https://pfn3d.com/jlox-2023 and https://pfn3d.com/4d

Motivation (2/2)
• It would be useful if 4D reconstruction could be done from casually taken monocular images (“in-the-wild” images)
  ◦ A naive 3DGS method tends to output blurry reconstructions
• A looping scene as output is even better for virtual production
[Figure: “in-the-wild” swaying scene (unknown t, monocular) as input; outputs: looping 4D scene ☺ vs. naive 3DGS 😢]

Goal | Blowin’ in the Wild
• Project goal:
  ◦ Reconstruct a looping scene of grass & trees swaying in the wind from in-the-wild images
  ◦ -> Blowin’ in the Wild: Dynamic Looping Gaussians from Still Images
  ◦ A play on Bob Dylan’s “Blowin’ in the Wind”

Comparison

                 Input                                  Output
Method           Monocular?   Image? (not use t?)       Dynamic scene?   Looping?
3DGS             ⭕           ⭕                        ❌               ❌
4DGS             ❌           ❌                        ⭕               ❌
E-D3DGS          ❌           ❌                        ⭕               ❌
GFlow            ⭕           ❌                        ⭕               ❌
LoopGaussian     ⭕           ⭕                        △                ⭕
WildGaussians    ⭕           ⭕                        ❌               ❌
Ours             ⭕           ⭕                        ⭕               ⭕

Preliminary | 3D Gaussian Splatting
• Express the scene as a collection of 3D Gaussians
• Per-Gaussian parameters:
  ◦ Position (mean)
  ◦ Rotation, Scale (variance)
  ◦ Opacity, Color (SH coefficients)
• These parameters are optimized
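
As a small illustration of how Rotation and Scale together describe the variance of one Gaussian, the sketch below builds the covariance as Σ = R S Sᵀ Rᵀ, the factorization used in the original 3DGS formulation. The function names and the (w, x, y, z) quaternion ordering are assumptions made for this example, not taken from the slides.

```python
import numpy as np

def quat_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q / np.linalg.norm(q)  # normalise so the result is a rotation
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])

def gaussian_covariance(rotation_quat, scale):
    """Covariance of one Gaussian: Sigma = R S S^T R^T with S = diag(scale)."""
    R = quat_to_rotmat(np.asarray(rotation_quat, dtype=float))
    S = np.diag(np.asarray(scale, dtype=float))
    M = R @ S
    return M @ M.T  # symmetric positive semi-definite by construction

# Identity rotation: the covariance is simply diag(scale ** 2).
sigma = gaussian_covariance([1.0, 0.0, 0.0, 0.0], [0.1, 0.2, 0.3])
```

Parameterizing the variance through a rotation and a scale keeps the covariance valid (symmetric, positive semi-definite) no matter how the optimizer updates the raw values.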

Prior Work | In-the-Wild Gaussian Splatting
• WildGaussians (Kulhanek et al.)
  ◦ Add per-Gaussian embeddings & per-shot embeddings
  ◦ Use an MLP to extend the color representation
[Figure: Gaussian Parameters, Gaussian Embeddings (learning spatial features), and Shot Embeddings (learning environmental features) -> MLP -> 3DGS Renderer; example: scene with lighting changes]
Figure from WildGaussians, https://arxiv.org/pdf/2407.08447

Method | In-the-Wild Dynamic Gaussian Splatting
[Slide tag: Dynamic Scene?]
• Add per-Gaussian embeddings & per-shot embeddings
• Use an MLP to extend the position / rotation representation
[Figure: Gaussian Parameters, Gaussian Embeddings (learning spatial features), and Shot Embeddings (learning temporal features) -> MLP -> 3DGS Renderer; example: scene with movements]
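
To make the data flow concrete, here is a minimal NumPy sketch of an MLP that takes a per-Gaussian embedding and a per-shot embedding and outputs a position / rotation update. Everything beyond that idea is an assumption for illustration: the layer sizes, the ReLU activation, the additive offsets, the quaternion rotation, and the random weights standing in for trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical sizes, chosen only for this example.
NUM_GAUSSIANS, GAUSS_DIM, SHOT_DIM, HIDDEN = 5, 8, 4, 16

# Random weights stand in for trained MLP parameters.
W1 = rng.normal(scale=0.1, size=(GAUSS_DIM + SHOT_DIM, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(scale=0.1, size=(HIDDEN, 7))  # 3 position + 4 rotation outputs
b2 = np.zeros(7)

def deform(positions, rotations, gauss_emb, shot_emb):
    """Offset every Gaussian's position and rotation for one shot."""
    # The same shot embedding is shared by all Gaussians of that shot.
    shot = np.broadcast_to(shot_emb, (gauss_emb.shape[0], shot_emb.shape[0]))
    x = np.concatenate([gauss_emb, shot], axis=1)
    hidden = np.maximum(x @ W1 + b1, 0.0)  # ReLU
    out = hidden @ W2 + b2
    new_positions = positions + out[:, :3]
    new_rotations = rotations + out[:, 3:]
    # Re-normalise so each rotation stays a unit quaternion.
    new_rotations = new_rotations / np.linalg.norm(new_rotations, axis=1, keepdims=True)
    return new_positions, new_rotations

positions = rng.normal(size=(NUM_GAUSSIANS, 3))
rotations = np.tile([1.0, 0.0, 0.0, 0.0], (NUM_GAUSSIANS, 1))
gauss_emb = rng.normal(size=(NUM_GAUSSIANS, GAUSS_DIM))
shot_emb = rng.normal(size=SHOT_DIM)
new_positions, new_rotations = deform(positions, rotations, gauss_emb, shot_emb)
```

The deformed parameters would then be passed to the 3DGS renderer in place of the static ones.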

Method | Looping Scene Production
[Slide tag: Looping Scene]
• Create a natural looping scene by manipulating shot embeddings
  ◦ The temporal information in the shot embeddings is not 1D time but a higher-dimensional feature
• Reduce the feature dimensions to 2D using PCA
[Figure: Shot Embeddings (M shots x N dim), e.g. [0.2, 0.1, -0.5, ..., 0.3], [0.3, 0.2, -0.3, ..., 0.4], [0.5, -0.3, 0.5, ..., 0.2], ..., reduced from N dim to 2 dim]
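
The PCA step can be sketched as follows: fit a 2D PCA on the M x N shot embeddings, pick a closed path in the 2D plane, and lift it back to N dimensions to obtain one embedding per output frame. The circular path, its radius, and the random stand-in embeddings are assumptions for illustration; the slides do not specify how the loop is traced in the 2D space.

```python
import numpy as np

def fit_pca_2d(shot_embeddings):
    """Return the mean and the top-2 principal axes of an (M, N) array."""
    mean = shot_embeddings.mean(axis=0)
    _, _, vt = np.linalg.svd(shot_embeddings - mean, full_matrices=False)
    return mean, vt[:2]  # shape (2, N), rows are orthonormal

def project_to_2d(embeddings, mean, axes):
    """PCA: map N-dim embeddings to 2D coordinates (u, v)."""
    return (embeddings - mean) @ axes.T

def lift_from_2d(uv, mean, axes):
    """Inverse PCA: map 2D coordinates (u, v) back to N-dim embeddings."""
    return uv @ axes + mean

def circular_loop(radius, num_frames):
    """Closed path in (u, v); the frame after the last one is frame 0 again."""
    theta = np.linspace(0.0, 2.0 * np.pi, num_frames, endpoint=False)
    return np.stack([radius * np.cos(theta), radius * np.sin(theta)], axis=1)

rng = np.random.default_rng(0)
shot_embeddings = rng.normal(size=(34, 16))  # stand-in for learned embeddings
mean, axes = fit_pca_2d(shot_embeddings)
uv_train = project_to_2d(shot_embeddings, mean, axes)
radius = np.linalg.norm(uv_train, axis=1).mean()  # hypothetical choice of radius
loop_uv = circular_loop(radius, num_frames=60)
loop_embeddings = lift_from_2d(loop_uv, mean, axes)  # one embedding per frame
```

Feeding `loop_embeddings` to the deformation MLP frame by frame yields a sequence whose last frame connects back to the first.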

Method | Motion Bake (1/2)
• Reduce computation time by skipping MLP evaluation
• Try 2 ideas:
  ① Record & interpolate MLP results (2x2 bilinear)
[Figure: precalculation on a (u, v) grid, then bilinear interpolation]
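
Idea ① can be sketched as a lookup table: evaluate the MLP once on a grid of (u, v) values, then answer later queries by bilinearly blending the four surrounding grid entries. The grid resolution, the value range, and the toy function standing in for the MLP are assumptions for this example.

```python
import numpy as np

def bake_grid(mlp, u_values, v_values):
    """Precompute MLP outputs on a (u, v) grid; result has shape (U, V, ...)."""
    return np.array([[mlp(u, v) for v in v_values] for u in u_values])

def bilinear_lookup(grid, u_values, v_values, u, v):
    """Interpolate the baked outputs at (u, v) from the 2x2 surrounding cells."""
    # Index of the lower-left corner of the cell that contains (u, v).
    i = int(np.clip(np.searchsorted(u_values, u) - 1, 0, len(u_values) - 2))
    j = int(np.clip(np.searchsorted(v_values, v) - 1, 0, len(v_values) - 2))
    tu = (u - u_values[i]) / (u_values[i + 1] - u_values[i])
    tv = (v - v_values[j]) / (v_values[j + 1] - v_values[j])
    low = (1 - tv) * grid[i, j] + tv * grid[i, j + 1]
    high = (1 - tv) * grid[i + 1, j] + tv * grid[i + 1, j + 1]
    return (1 - tu) * low + tu * high

def toy_mlp(u, v):
    """Toy stand-in for the deformation MLP (bilinear in u and v)."""
    return np.array([u + 2.0 * v, u * v, 1.0])

u_values = np.linspace(-1.0, 1.0, 5)
v_values = np.linspace(-1.0, 1.0, 5)
grid = bake_grid(toy_mlp, u_values, v_values)
approx = bilinear_lookup(grid, u_values, v_values, 0.3, -0.7)
```

The lookup costs a handful of multiply-adds per output regardless of the MLP size, at the price of storing one output set per grid point.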

Method | Motion Bake (2/2)
• Reduce computation time by skipping MLP evaluation
• Try 2 ideas:
  ② Approximate the MLP as per-Gaussian linear matrices
[Figure: the path (u, v) -> PCA⁻¹ -> MLP is approximated by a 2x7 matrix per Gaussian (2x7 matrix × number of Gaussians); embeddings are precomputed for 100 sampled (u, v), and the matrices are fitted by the least-squares method]
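
Idea ② amounts to a per-Gaussian least-squares fit. The sketch below fits one 2x7 matrix per Gaussian from sampled (u, v) points and the corresponding MLP outputs, then applies it with a single matrix product. Reading the 7 outputs as 3 position plus 4 rotation values, and omitting a bias term, are assumptions inferred from the "2x7" shape; the synthetic targets stand in for real MLP outputs.

```python
import numpy as np

def fit_linear_bake(uv_samples, mlp_outputs):
    """Fit per-Gaussian matrices A_g so that uv @ A_g approximates the MLP.

    uv_samples:  (S, 2) sampled (u, v) points
    mlp_outputs: (S, G, 7) MLP outputs for every sample and Gaussian
    returns:     (G, 2, 7) one 2x7 matrix per Gaussian
    """
    num_samples, num_gaussians, out_dim = mlp_outputs.shape
    # All Gaussians share the same inputs, so one lstsq call solves them jointly.
    targets = mlp_outputs.reshape(num_samples, num_gaussians * out_dim)
    solution, *_ = np.linalg.lstsq(uv_samples, targets, rcond=None)
    return solution.reshape(2, num_gaussians, out_dim).transpose(1, 0, 2)

def apply_linear_bake(matrices, uv):
    """Evaluate the baked approximation for all Gaussians at one (u, v)."""
    return np.einsum("u,gud->gd", np.asarray(uv, dtype=float), matrices)

rng = np.random.default_rng(0)
uv_samples = rng.uniform(-1.0, 1.0, size=(100, 2))  # 100 sampled (u, v)
true_matrices = rng.normal(size=(10, 2, 7))         # synthetic ground truth
mlp_outputs = np.einsum("su,gud->sgd", uv_samples, true_matrices)
matrices = fit_linear_bake(uv_samples, mlp_outputs)
offsets = apply_linear_bake(matrices, [0.3, -0.7])  # shape (10, 7)
```

Compared with the bilinear table, this stores only 14 numbers per Gaussian, which is consistent with the lower parameter counts reported for the linear approximation in the ablation.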

Experiment 1 | 4D Reconstruction Ability (1/2)
• Test 4D reconstruction on the D-NeRF dataset
  ◦ Monocular setting
  ◦ Continuous shooting time
  ◦ Lighting is static
• Easier than in-the-wild data
[Figure: 50 continuous images from various angles]

Experiment 1 | 4D Reconstruction Ability (2/2)
• Results
  ◦ The embeddings succeeded in capturing the movement

Quantitative results on the “Lego” scene

Method   PSNR↑    SSIM↑    LPIPS↓
Ours     26.77    0.9000   0.0571
4DGS     25.03    0.9376   0.0437
3DGS     23.06    0.9290   0.0642

Experiment 2 | Test on In-the-Wild Scene (1/2)
• Test the idea on in-the-wild data
  ◦ Monocular setting
  ◦ Shooting time is unknown and irregular
  ◦ Lighting is nearly static
[Figure: the “rose” scene, which contains 34 shots]

Experiment 2 | Test on In-the-Wild Scene (2/2)
• Results
[Figure: Naive 3DGS vs. Ours]

Method   PSNR↑    SSIM↑   LPIPS↓
Ours     22.18    0.701   0.200
3DGS     21.99    0.697   0.206

* The quantitative evaluations were performed on the reconstruction of the training views due to the lack of multi-view, same-time validation data.

Experiment 3 | Ablation

• “Rose” scene (real data, 1M Gaussians)

Method               Rendering Time   MLP Time    Parameters   PSNR    SSIM     LPIPS
Naive MLP            0.0294 s         0.0242 s    12M          21.80   0.6649   0.1617
① Bilinear Interp.   0.0114 s         0.0039 s    28M          21.89   0.6654   0.1600
② Linear Approx.     0.0114 s         0.0035 s    14M          21.77   0.6629   0.1622

• “Lego” scene (simulation data, 240K Gaussians)

Method               Rendering Time   MLP Time    Parameters   PSNR    SSIM     LPIPS
Naive MLP            0.00594 s        0.0065 s    7.2M         28.69   0.9557   0.0258
① Bilinear Interp.   0.00385 s        0.0014 s    6.7M         25.56   0.9456   0.0399
② Linear Approx.     0.00337 s        0.00095 s   3.4M         26.02   0.9453   0.0393

* The configuration is slightly different from that on the previous page.

Conclusion
• Contributions
  ◦ Reconstruct a 4D scene of grass & trees swaying in the wind from in-the-wild still images
  ◦ Produce a looping scene using embeddings
  ◦ Reduce computational cost with efficient approximations
  ◦ Develop an intuitive UI

Limitations and Future Work
• Complex or high-frequency movements (e.g., dancing people, flowing water) couldn’t be accurately recovered
  ◦ Possibly due to SfM initialization, representation power, optimization, …

Supplementary
• Why 2D embeddings?
  ◦ Easy to manipulate in the UI
  ◦ Embeddings not in the training data can also be interpolated well
  ◦ 1D is not sufficient to loop naturally; 3D doesn’t work very well
• What if color/opacity MLPs are also added?
  ◦ Not much improvement in PSNR, but more parameters
• How to decide hyperparameters (dimension of embeddings, etc.)?
  ◦ Use Optuna https://github.com/optuna/optuna
  ◦ The hyperparameter set is in our GitHub code
