
PFN Internship 2024 / Kai Kohyama: Blowin’ in the Wild: Dynamic Looping Gaussians from Still Images

Presentation by a 2024 PFN intern Kai Kohyama

Preferred Networks

October 24, 2024

Transcript

  1. Blowin’ in the Wild: Dynamic Looping Gaussians from Still Images

    Kai Kohyama
    Mentors: Toru Matsuoka, Sosuke Kobayashi
    Preferred Networks Internship, September 2024
  2. Motivation (1/2)

    • 4D reconstruction is expected to be applied to virtual production
    • However, existing 4D reconstruction systems require…
      ◦ Multi-view settings
      ◦ Large video data
    ▲ Installing multiple cameras is difficult
    ▲ PFN 3D Scan provides a virtual production service
    Photos quoted from https://pfn3d.com/jlox-2023 and https://pfn3d.com/4d
  3. Motivation (2/2)

    • It would be useful if 4D reconstruction could be done from casually taken monocular images (“in-the-wild” images)
      ◦ A naive 3DGS method tends to output a blurry reconstruction
    • A looping scene output is even better for virtual production
    [Figure: “in-the-wild” swaying scene (unknown t, monocular) → looping 4D scene ☺; naive 3DGS 😢]

  4. Goal | Blowin’ in the Wild

    • Project goal:
      ◦ Reconstruct a looping scene of grass & trees swaying in the wind from in-the-wild images
      ◦ -> Blowin’ in the Wild: Dynamic Looping Gaussians from Still Images
      ◦ A play on Bob Dylan’s “Blowin’ in the Wind”
  5. Comparison

    Method         | Input: Monocular? | Input: Still images (t not used)? | Output: Dynamic scene? | Output: Looping?
    3DGS           | ⭕                | ⭕                                | ❌                     | ❌
    4DGS           | ❌                | ❌                                | ⭕                     | ❌
    E-D3DGS        | ❌                | ❌                                | ⭕                     | ❌
    GFlow          | ⭕                | ❌                                | ⭕                     | ❌
    LoopGaussian   | ⭕                | ⭕                                | △                      | ⭕
    WildGaussians  | ⭕                | ⭕                                | ❌                     | ❌
    Ours           | ⭕                | ⭕                                | ⭕                     | ⭕
  6. Preliminary | 3D Gaussian Splatting

    • Express the scene as a collection of 3D Gaussians
    • Per-Gaussian parameters:
      ◦ Position (mean), Rotation and Scale (covariance), Opacity, Color (SH coefficients)
    [Figure: these parameters are optimized]
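For reference, standard 3DGS builds each Gaussian's covariance from its rotation and scale as Σ = R S Sᵀ Rᵀ (S = diag(scale)), which keeps Σ positive semi-definite while the parameters are optimized. The NumPy sketch below is illustrative only, not code from this project; the function names are hypothetical and the quaternion is assumed to be ordered (w, x, y, z).

```python
import numpy as np

def quat_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix; q is normalized first."""
    w, x, y, z = np.asarray(q, dtype=float) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])

def gaussian_covariance(q, scale):
    """Sigma = R S S^T R^T with S = diag(scale); symmetric positive semi-definite by construction."""
    M = quat_to_rotmat(q) @ np.diag(scale)
    return M @ M.T

def gaussian_density(x, mean, cov):
    """Unnormalized Gaussian exp(-0.5 (x - mu)^T Sigma^-1 (x - mu)), later weighted by opacity."""
    d = np.asarray(x, dtype=float) - mean
    return float(np.exp(-0.5 * d @ np.linalg.solve(cov, d)))

# Usage: a Gaussian stretched along x, rotated 90 degrees about z, ends up stretched along y.
cov = gaussian_covariance([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)], [3.0, 1.0, 1.0])
```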
  7. Prior Work | In-the-Wild Gaussian Splatting

    • WildGaussians (Kulhanek et al.)
      ◦ Add per-Gaussian embeddings & per-shot embeddings
      ◦ Use an MLP to extend the color representation
    [Figure: shot embeddings (learning environmental features), Gaussian embeddings (learning spatial features), MLP, Gaussian parameters, 3DGS renderer → scene with lighting changes. Figure quoted from WildGaussians, https://arxiv.org/pdf/2407.08447]
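A rough NumPy sketch of the idea on this slide: a per-Gaussian embedding and a per-shot embedding are concatenated and passed through a small MLP that adjusts each Gaussian's color for that shot. The details are assumptions for illustration, not the WildGaussians implementation: we assume the MLP outputs an affine transform (gamma, beta) of a base RGB color, and all sizes, weights, and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
NUM_GAUSSIANS, NUM_SHOTS, GAUSS_DIM, SHOT_DIM, HIDDEN = 1000, 40, 24, 32, 64

gaussian_emb = rng.normal(size=(NUM_GAUSSIANS, GAUSS_DIM))  # per-Gaussian: spatial features
shot_emb = rng.normal(size=(NUM_SHOTS, SHOT_DIM))           # per-shot: environment features
base_color = rng.uniform(size=(NUM_GAUSSIANS, 3))           # RGB from the ordinary 3DGS color

def init_mlp(sizes):
    """Random (W, b) pairs for a fully connected network with the given layer sizes."""
    return [(rng.normal(scale=1.0 / np.sqrt(n_in), size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    for k, (W, b) in enumerate(params):
        x = x @ W + b
        if k < len(params) - 1:
            x = np.maximum(x, 0.0)  # ReLU on hidden layers only
    return x

color_mlp = init_mlp([GAUSS_DIM + SHOT_DIM + 3, HIDDEN, 6])

def shot_colors(shot_idx):
    """Per-Gaussian RGB for one shot via an assumed affine color transform gamma * c + beta."""
    e = np.broadcast_to(shot_emb[shot_idx], (NUM_GAUSSIANS, SHOT_DIM))
    out = mlp(color_mlp, np.concatenate([gaussian_emb, e, base_color], axis=1))
    gamma, beta = 1.0 + out[:, :3], out[:, 3:]  # a zero MLP output leaves the color unchanged
    return gamma * base_color + beta
```

The Gaussian embedding lets the change vary across space, while the shot embedding lets the same Gaussian look different in each photo.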
  8. [Dynamic scene?] Method | In-the-Wild Dynamic Gaussian Splatting

    • Add per-Gaussian embeddings & per-shot embeddings
    • Use an MLP to extend the position / rotation representation
    [Figure: shot embeddings (learning temporal features), Gaussian embeddings (learning spatial features), MLP, Gaussian parameters, 3DGS renderer → scene with movements]
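Retargeted at motion, the same pattern might look like the sketch below: the MLP reads the two embeddings and outputs offsets for position and rotation instead of color. This is a hypothetical illustration, not the project's code. We assume 7 outputs (a 3D position offset plus a 4D quaternion offset added to canonical values), and the sizes and random weights are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
G, NUM_SHOTS, GAUSS_DIM, SHOT_DIM, HIDDEN = 1000, 40, 16, 8, 64

gaussian_emb = rng.normal(size=(G, GAUSS_DIM))     # per-Gaussian: spatial features
shot_emb = rng.normal(size=(NUM_SHOTS, SHOT_DIM))  # per-shot: temporal features
canon_xyz = rng.normal(size=(G, 3))                # canonical (static) positions
canon_rot = np.tile([1.0, 0.0, 0.0, 0.0], (G, 1))  # canonical quaternions (w, x, y, z)

W1 = rng.normal(scale=0.1, size=(GAUSS_DIM + SHOT_DIM, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(scale=0.1, size=(HIDDEN, 7))       # 7 = 3 position + 4 quaternion offsets (assumed)
b2 = np.zeros(7)

def deform(e_shot):
    """Positions and unit quaternions of all Gaussians for one shot embedding of shape (SHOT_DIM,)."""
    x = np.concatenate([gaussian_emb, np.broadcast_to(e_shot, (G, SHOT_DIM))], axis=1)
    out = np.maximum(x @ W1 + b1, 0.0) @ W2 + b2   # (G, 7) per-Gaussian offsets
    xyz = canon_xyz + out[:, :3]
    rot = canon_rot + out[:, 3:]
    return xyz, rot / np.linalg.norm(rot, axis=1, keepdims=True)  # re-normalize quaternions
```

Feeding a different shot embedding moves the Gaussians; scale, opacity, and color stay as in plain 3DGS in this sketch.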
  9. [Looping scene] Method | Looping Scene Production

    • Create a natural looping scene by manipulating the shot embeddings
      ◦ The temporal information in shot embeddings is not 1D time but higher-order features
    [Figure: shot embeddings (M shots × N dim), e.g. [0.2, 0.1, -0.5, ..., 0.3], [0.3, 0.2, -0.3, ..., 0.4], ..., reduced from N dim to 2 dim using PCA]
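The PCA step on this slide can be sketched in NumPy as follows. PCA itself follows the slide, but the looping path is our assumption, since the slides do not say how the loop is drawn: here it is an ellipse around the projected training shots, and each point on it is mapped back to an N-dimensional shot embedding with the inverse PCA. The random matrix stands in for learned embeddings, and all function names are hypothetical.

```python
import numpy as np

def fit_pca(X, k=2):
    """PCA via SVD. Returns (mean, components) with components of shape (k, N), orthonormal rows."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def to_2d(X, mean, comps):
    return (X - mean) @ comps.T   # (M, N) -> (M, 2): the (u, v) coordinates

def from_2d(uv, mean, comps):
    return mean + uv @ comps      # inverse PCA: (T, 2) -> (T, N) shot embeddings

def loop_path(uv_train, num_frames=60):
    """Hypothetical looping path: an ellipse around the training (u, v) cloud, sized by its spread."""
    center = uv_train.mean(axis=0)
    radius = uv_train.std(axis=0)
    # endpoint=False so the last frame flows back into frame 0 without a duplicate.
    t = np.linspace(0.0, 2.0 * np.pi, num_frames, endpoint=False)
    return center + radius * np.stack([np.cos(t), np.sin(t)], axis=1)

# Usage with stand-in embeddings (34 shots, hypothetical N = 16).
rng = np.random.default_rng(0)
shot_emb = rng.normal(size=(34, 16))
mean, comps = fit_pca(shot_emb)
uv = to_2d(shot_emb, mean, comps)
frames = from_2d(loop_path(uv), mean, comps)  # (60, 16) embeddings, one per looping frame
```

Because the path is closed, playing the frames in order and wrapping around gives a seamless loop; frames between training shots rely on the embeddings interpolating well, which the Supplementary slide reports for 2D embeddings.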
  10. Method | Motion Bake (1/2)

    • Reduce calculation time by skipping MLP estimation
    • Try 2 ideas:
      ① Record & interpolate MLP results (2x2 bilinear)
    [Figure: precalculation → bilinear interpolation over (u, v)]
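Idea ① can be sketched as follows, under our assumption that, for each Gaussian, the MLP outputs are recorded at the four corners of the (u, v) range and bilinearly interpolated at render time. That layout stores 2 × 2 × 7 = 28 values per Gaussian, which is consistent with the ablation's parameter counts (about 28M for 1M Gaussians and 6.7M for 240K Gaussians), but the slides do not confirm it. The toy MLP and function names are hypothetical.

```python
import numpy as np

def bake_corners(mlp_fn, u_range, v_range):
    """Precompute MLP outputs at the 4 corners of the (u, v) box.

    mlp_fn(u, v) must return a (G, 7) array of per-Gaussian outputs (hypothetical interface).
    Returns an array of shape (2, 2, G, 7) indexed as [u corner, v corner].
    """
    return np.stack([np.stack([mlp_fn(u, v) for v in v_range]) for u in u_range])

def bilinear(corners, u, v, u_range, v_range):
    """Bilinear interpolation of the baked corners at (u, v); no MLP call at render time."""
    a = (u - u_range[0]) / (u_range[1] - u_range[0])  # 0 at u_min, 1 at u_max
    b = (v - v_range[0]) / (v_range[1] - v_range[0])  # 0 at v_min, 1 at v_max
    return ((1 - a) * (1 - b) * corners[0, 0] + (1 - a) * b * corners[0, 1]
            + a * (1 - b) * corners[1, 0] + a * b * corners[1, 1])

# Usage with a toy stand-in for the MLP (5 Gaussians, 7 outputs each).
rng = np.random.default_rng(0)
A, B, C = rng.normal(size=(3, 5, 7))

def toy_mlp(u, v):
    return A + u * B + u * v * C

u_rng, v_rng = (-1.0, 1.0), (-0.5, 0.5)
corners = bake_corners(toy_mlp, u_rng, v_rng)       # 2 * 2 * 7 = 28 stored values per Gaussian
approx = bilinear(corners, 0.3, 0.1, u_rng, v_rng)
```

The toy MLP is itself bilinear in (u, v), so the interpolation is exact here; for a real MLP the result is only an approximation, and (u, v) values outside the baked range are extrapolated.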
  11. Method | Motion Bake (2/2)

    • Reduce computation time by skipping MLP estimation
    • Try 2 ideas:
      ② Approximate the MLP as per-Gaussian linear matrices
    [Figure: embeddings precomputed for 100 sampled (u, v) → PCA⁻¹ → MLP; approximated by the least-squares method with a 2x7 matrix × number of Gaussians]
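Idea ② fits, for every Gaussian, a 2x7 matrix that maps (u, v) directly to the MLP's 7 outputs, using least squares over MLP outputs precomputed at 100 sampled (u, v) points. The sketch below is a hedged NumPy illustration with hypothetical names; following the slide's 2x7 shape it has no bias term, and it uses synthetic, exactly linear data so the fit can be checked. Storing 14 values per Gaussian is consistent with the ablation's parameter counts (about 14M for 1M Gaussians and 3.4M for 240K Gaussians).

```python
import numpy as np

def fit_linear_bake(uv_samples, outputs):
    """Least-squares fit of one 2x7 matrix per Gaussian.

    uv_samples: (S, 2) sampled (u, v) points (S = 100 on the slide).
    outputs:    (S, G, 7) MLP outputs at those points.
    Returns M of shape (G, 2, 7) such that outputs[s, g] ~= uv_samples[s] @ M[g].
    """
    S, G, D = outputs.shape
    # Every Gaussian shares the same (S, 2) design matrix, so one lstsq call fits them all.
    sol, *_ = np.linalg.lstsq(uv_samples, outputs.reshape(S, G * D), rcond=None)  # (2, G*D)
    return sol.reshape(2, G, D).transpose(1, 0, 2)

def apply_linear_bake(M, u, v):
    """Render-time evaluation: (u, v) @ M[g] for every Gaussian g, giving a (G, 7) array."""
    return np.einsum("k,gkd->gd", np.array([u, v]), M)

# Usage on synthetic data that is exactly linear, so the fit should recover it.
rng = np.random.default_rng(0)
M_true = rng.normal(size=(5, 2, 7))
uv = rng.uniform(-1.0, 1.0, size=(100, 2))
outputs = np.einsum("sk,gkd->sgd", uv, M_true)
M_fit = fit_linear_bake(uv, outputs)
```

Because all Gaussians share the same sampled (u, v) points, the per-Gaussian fits reduce to one least-squares solve with many right-hand sides, and rendering needs only one small vector-matrix product per Gaussian.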
  12. Experiment 1 | 4D Reconstruction Ability (1/2)

    • Test 4D reconstruction on the D-NeRF dataset
      ◦ Monocular setting
      ◦ Continuous shooting time
      ◦ Lighting is static
    → Easier than in-the-wild data
    [Figure: 50 continuous images from various angles]
  13. Experiment 1 | 4D Reconstruction Ability (2/2)

    • Results: the embeddings succeeded in capturing the movement
    Quantitative results on the “Lego” scene:
      Method   PSNR↑   SSIM↑    LPIPS↓
      Ours     26.77   0.9000   0.0571
      4DGS     25.03   0.9376   0.0437
      3DGS     23.06   0.9290   0.0642
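PSNR, the first metric in these tables, is 10 · log10(MAX² / MSE) between a rendered image and the ground truth. Below is a minimal NumPy version, assuming images scaled to [0, 1] so MAX = 1; the slides do not say which implementation was used, and SSIM and LPIPS are more involved, so they are omitted here.

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val^2 / MSE)."""
    mse = np.mean((np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(max_val ** 2 / mse))
```

Since PSNR is logarithmic in the MSE, every 10 dB gained corresponds to a tenfold drop in mean squared error.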
  14. Experiment 2 | Test on In-the-Wild Scene (1/2)

    • Test the idea on in-the-wild data
      ◦ Monocular setting
      ◦ Shooting time is unknown and irregular
      ◦ Lighting is nearly static
    The “rose” scene contains 34 shots
  15. Experiment 2 | Test on In-the-Wild Scene (2/2)

    • Results
    [Figure: naive 3DGS vs. ours]
      Method   PSNR↑   SSIM↑   LPIPS↓
      Ours     22.18   0.701   0.200
      3DGS     21.99   0.697   0.206
    * The quantitative evaluations were performed on reconstructions of the training views because no multi-view, same-time validation data is available.
  16. Experiment 3 | Ablation

    • “Rose” scene (real data, 1M Gaussians)
      Method               Rendering time   MLP time    Parameters   PSNR↑   SSIM↑    LPIPS↓
      Naive MLP            0.0294 s         0.0242 s    12M          21.80   0.6649   0.1617
      ① Bilinear interp.   0.0114 s         0.0039 s    28M          21.89   0.6654   0.1600
      ② Linear approx.     0.0114 s         0.0035 s    14M          21.77   0.6629   0.1622
    • “Lego” scene (simulation data, 240K Gaussians)
      Method               Rendering time   MLP time    Parameters   PSNR↑   SSIM↑    LPIPS↓
      Naive MLP            0.00594 s        0.0065 s    7.2M         28.69   0.9557   0.0258
      ① Bilinear interp.   0.00385 s        0.0014 s    6.7M         25.56   0.9456   0.0399
      ② Linear approx.     0.00337 s        0.00095 s   3.4M         26.02   0.9453   0.0393
    * The configuration is slightly different from that on the previous page.
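To make the timing columns easier to compare, the short Python snippet below computes each approximation's speed-up over the naive MLP directly from the numbers in the two ablation tables for the “Rose” and “Lego” scenes (no new measurements).

```python
# Timings in seconds, copied from the ablation tables: (rendering time, MLP time).
TIMINGS = {
    "Rose": {"Naive MLP": (0.0294, 0.0242), "Bilinear": (0.0114, 0.0039), "Linear": (0.0114, 0.0035)},
    "Lego": {"Naive MLP": (0.00594, 0.0065), "Bilinear": (0.00385, 0.0014), "Linear": (0.00337, 0.00095)},
}

def speedups(scene):
    """Speed-up factor of each approximation over the naive MLP, per column, rounded to 2 decimals."""
    base_render, base_mlp = TIMINGS[scene]["Naive MLP"]
    return {name: (round(base_render / render_t, 2), round(base_mlp / mlp_t, 2))
            for name, (render_t, mlp_t) in TIMINGS[scene].items() if name != "Naive MLP"}

for scene in TIMINGS:
    print(scene, speedups(scene))
```

It gives about 2.6x faster rendering for Rose with either approximation and 1.5x to 1.8x for Lego, while the MLP stage becomes roughly 4.6x to 6.9x faster.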
  17. Conclusion

    • Contributions
      ◦ Reconstruct 4D scenes of grass & trees swaying in the wind from in-the-wild still images
      ◦ Produce looping scenes using embeddings
      ◦ Reduce computational cost through efficient approximations
      ◦ Develop an intuitive UI
  18. Limitations and Future Work

    • Complex or high-frequency movements (e.g., dancing people, flowing water) could not be recovered accurately
      ◦ Possibly due to SfM initialization, representation power, optimization, …
  19. Supplementary

    • Why 2D embeddings?
      ◦ Easy to manipulate in the UI
      ◦ Embeddings not seen in training can also be interpolated well
      ◦ 1D is not sufficient for a natural loop, and 3D does not work very well
    • What if color/opacity MLPs are also added?
      ◦ Little improvement in PSNR, but more parameters
    • How were the hyperparameters (embedding dimension, etc.) chosen?
      ◦ With Optuna https://github.com/optuna/optuna
      ◦ The hyperparameter set is in our GitHub code