Slide 1

1 Image-to-Point-Cloud Translation Using a Conditional Generative Adversarial Network for Airborne LiDAR Data
Takayuki Shinohara, Haoyi Xiu, and Masashi Matsuoka
Tokyo Institute of Technology
7 July 2021, Online, ISPRS DIGITAL EDITION

Slide 2

2 Tokyo Tech Outline 1. Background and Objective 2. Proposed Method 3. Experimental Result 4. Conclusion

Slide 3

3 Tokyo Tech 1. Background and Objectives

Slide 4

4 Tokyo Tech Image to Point Cloud Translation
(Figure: a single aerial photo is translated into 3D point clouds.)
Data source: https://ieeexplore.ieee.org/document/8328995

Slide 5

5 Tokyo Tech 3D Reconstruction from Images
- Multi-view images: photogrammetric methods; require many images; high computational cost
- Deep learning: statistical estimation; requires only a single image; low computational cost
3D reconstruction from a single image using deep learning is important for low-cost 3D restoration.

Slide 6

6 Tokyo Tech DL-based Reconstruction 1/2
- Height-image estimation [Li et al. 2020]: previous papers generate only a 2D image.
A 2D image cannot fully represent 3D structure, so point cloud reconstruction is necessary.
Image from: https://ieeexplore.ieee.org/abstract/document/9190011

Slide 7

7 Tokyo Tech DL-based Reconstruction 2/2
- Point cloud reconstruction [Fan et al. 2016, https://ieeexplore.ieee.org/document/8099747/; Lin et al. 2018, https://www.ci2cv.net/media/papers/AAAI2018_chenhuan.pdf]: only a single object in the image is targeted.
Aerial photos contain more complex objects than these previous targets.

Slide 8

8 Tokyo Tech Reconstruction for Complex Objects
- Image-to-image translation: Pix2Pix [Isola et al. 2017], a general cGAN pipeline for various translation tasks.
- Auto-encoder for point clouds: FoldingNet [Yang et al. 2018], a network that creates a point cloud from a 2D grid; easy to extend so that it creates a point cloud from an input image.
We propose a Pix2Pix-like translation from aerial photos to airborne point clouds with a FoldingNet-like generator.

Slide 9

9 Tokyo Tech 2. Proposed Method

Slide 10

10 Tokyo Tech Overview of Our Method
- Pix2Pix pipeline: input image x → Encoder E (ResNet) → latent feature z → Generator G (FoldingNet) → reconstructed fake point cloud G(z); the Discriminator D (PointNet++) receives G(z) and real data y ~ Real and outputs Real/Fake.
The Encoder maps the input image into a latent feature, the Generator reconstructs a point cloud, and the Discriminator judges whether data are real or reconstructed (fake).
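The pipeline above can be sketched as a shape-level walkthrough. This is a minimal NumPy illustration with random stand-ins for the real networks; the latent size (512), image size (3×256×256), and point count (2,045) are illustrative assumptions, not the paper's exact dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in modules: random outputs instead of the real
# ResNet encoder, FoldingNet generator, and PointNet++ discriminator.
def encoder(image):                      # ResNet-like E: image -> latent z
    return rng.standard_normal(512)      # latent size 512 is an assumption

def generator(z, n_points=2045):         # FoldingNet-like G: z -> point cloud
    return rng.standard_normal((n_points, 3))

def discriminator(points):               # PointNet++-like D: points -> score
    return float(points.mean())          # a real-valued "critic" score

image = rng.standard_normal((3, 256, 256))   # image size is an assumption
z = encoder(image)                           # E(x) = z
fake = generator(z)                          # G(z): reconstructed fake cloud
score = discriminator(fake)                  # D judges real vs. fake

print(z.shape, fake.shape)  # (512,) (2045, 3)
```

Training alternates updates of D on real patches y and fakes G(E(x)), and of E/G against D, exactly as in the Pix2Pix setup the slide names.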

Slide 11

11 Tokyo Tech Network: Encoder
- ResNet [He et al. 2015]-based. The Encoder, built from residual blocks, extracts the latent feature z = E(x) of the input image x.
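The residual block at the heart of a ResNet encoder can be sketched in NumPy, with dense weight matrices standing in for the convolutions (the sizes here are illustrative, not the network's actual dimensions):

```python
import numpy as np

def residual_block(x, w1, w2):
    """One residual block: y = ReLU(x + W2 ReLU(W1 x))."""
    h = np.maximum(w1 @ x, 0.0)      # first layer + ReLU
    h = w2 @ h                       # second layer
    return np.maximum(x + h, 0.0)    # skip connection, then ReLU

rng = np.random.default_rng(1)
x = rng.standard_normal(64)
w1 = rng.standard_normal((64, 64)) * 0.01
w2 = rng.standard_normal((64, 64)) * 0.01
y = residual_block(x, w1, w2)
print(y.shape)  # (64,)
```

The skip connection is the design point: with zero weights the block reduces to the identity (up to the final ReLU), which makes deep stacks of such blocks easy to optimize.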

Slide 12

12 Tokyo Tech Network: Generator
- FoldingNet [Yang et al. 2018]-based. The Generator reconstructs a point cloud G(z) ∈ ℝ^(N×3) from the latent feature z extracted by the Encoder: a fixed 2D grid is concatenated with the image feature and passed through shared MLPs.
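The folding operation can be sketched as follows: a fixed 2D grid is concatenated with the latent feature replicated onto every grid point, then pushed through a shared MLP to produce xyz coordinates. Grid size, latent size, and the random weights are illustrative assumptions:

```python
import numpy as np

def fold(z, grid_side=45, hidden=128, seed=2):
    """Map a fixed 2D grid, concatenated with latent z, through a shared MLP to 3D."""
    rng = np.random.default_rng(seed)
    # Fixed 2D grid of grid_side**2 points in [-1, 1]^2.
    u = np.linspace(-1.0, 1.0, grid_side)
    grid = np.stack(np.meshgrid(u, u), axis=-1).reshape(-1, 2)       # (M, 2)
    # Concatenate the same latent vector onto every grid point.
    feat = np.concatenate([grid, np.tile(z, (grid.shape[0], 1))], axis=1)
    # Shared two-layer MLP (random weights here) producing xyz per point.
    w1 = rng.standard_normal((feat.shape[1], hidden)) * 0.1
    w2 = rng.standard_normal((hidden, 3)) * 0.1
    return np.maximum(feat @ w1, 0.0) @ w2                           # (M, 3)

z = np.zeros(512)
pts = fold(z)
print(pts.shape)  # (2025, 3)
```

Because the MLP weights are shared across grid points, the network learns one smooth "folding" of the 2D sheet into 3D, conditioned entirely on z; FoldingNet applies two such folds in sequence.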

Slide 13

13 Tokyo Tech Network: Point Discriminator
- PointNet++ [Qi et al. 2017]-based. The Discriminator downsamples the input patch (8,192 → 4,096 → 2,048 points) via sampling and grouping layers followed by 1D CNNs, and outputs the probability that the patch is real rather than fake.
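The sampling step of a PointNet++ set-abstraction layer is typically farthest point sampling; here is a minimal NumPy sketch of the 8,192 → 4,096 downsampling (the grouping and 1D CNN stages are omitted):

```python
import numpy as np

def farthest_point_sampling(points, k, seed=0):
    """Greedy FPS: repeatedly pick the point farthest from those chosen so far."""
    n = points.shape[0]
    rng = np.random.default_rng(seed)
    chosen = [int(rng.integers(n))]
    dist = np.full(n, np.inf)
    for _ in range(k - 1):
        d = np.linalg.norm(points - points[chosen[-1]], axis=1)
        dist = np.minimum(dist, d)        # distance to nearest chosen point
        chosen.append(int(dist.argmax())) # farthest remaining point
    return points[chosen]

rng = np.random.default_rng(3)
cloud = rng.standard_normal((8192, 3))
sub = farthest_point_sampling(cloud, 4096)
print(sub.shape)  # (4096, 3)
```

FPS keeps the subsample spatially well spread, which is why PointNet++ prefers it over uniform random sampling for choosing group centroids.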

Slide 14

14 Tokyo Tech Optimization
- Reconstruction loss L_rec:
  Chamfer Distance: CD(S1, S2) = Σ_{x∈S1} min_{y∈S2} ||x − y||²₂ + Σ_{y∈S2} min_{x∈S1} ||x − y||²₂
  Earth Mover Distance: EMD(S1, S2) = min_{φ: S1→S2} Σ_{x∈S1} (1/2)||x − φ(x)||²₂
- GAN loss (Wasserstein loss):
  Generator: L_G = −E_z[D(G(z))]
  Discriminator: L_D = E_z[D(G(z))] − E_{x~Real}[D(x)]
- Total loss: in the actual training process, we use all of these objective functions.
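These objectives can be sketched directly in NumPy. This is a minimal illustration on toy arrays; the exact EMD requires solving an optimal matching, so only the Chamfer and Wasserstein terms are shown:

```python
import numpy as np

def chamfer_distance(s1, s2):
    """Symmetric Chamfer distance with squared L2 terms, as on the slide."""
    d2 = ((s1[:, None, :] - s2[None, :, :]) ** 2).sum(-1)  # (|S1|, |S2|) pairwise
    return d2.min(axis=1).sum() + d2.min(axis=0).sum()

def wasserstein_losses(d_fake, d_real):
    """WGAN objectives: L_G = -E[D(G(z))], L_D = E[D(G(z))] - E[D(x)]."""
    l_g = -np.mean(d_fake)
    l_d = np.mean(d_fake) - np.mean(d_real)
    return l_g, l_d

a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
print(chamfer_distance(a, a))                              # 0.0 for identical clouds
print(wasserstein_losses(np.array([0.2]), np.array([0.8])))
```

In training, the Chamfer/EMD terms tie G(z) to the ground-truth patch for the same image, while the Wasserstein terms push the generated distribution toward the real one.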

Slide 15

15 Tokyo Tech 3. Experimental Results

Slide 16

16 Tokyo Tech Experimental Data
- GRSS Data Fusion Contest 2018: airborne LiDAR observation
- Target area: urban area containing buildings, vegetation, and roads
- Training patches: 25 m × 25 m, 2,045 points per patch, 1,000 patches
(Figure: input aerial photo and ground-truth point cloud for one 25 m × 25 m patch of the target area.)
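The patch preparation described above (fixed-size ground patches with a fixed point count) can be sketched as follows; the tiling helper and the toy point cloud are hypothetical, not the contest's actual preprocessing:

```python
import numpy as np

def make_patches(points, patch_size=25.0, n_points=2045, seed=4):
    """Tile the XY extent into patch_size x patch_size cells and resample
    each occupied cell to a fixed number of points."""
    rng = np.random.default_rng(seed)
    cells = np.floor(points[:, :2] / patch_size).astype(int)   # cell index per point
    patches = []
    for key in np.unique(cells, axis=0):
        idx = np.flatnonzero((cells == key).all(axis=1))
        # Sample with replacement only when the cell has too few points.
        take = rng.choice(idx, n_points, replace=len(idx) < n_points)
        patches.append(points[take])
    return np.stack(patches)

# Toy 50 m x 50 m area -> four 25 m patches of 2,045 points each.
pts = np.random.default_rng(5).uniform(0, 50, size=(10000, 3))
print(make_patches(pts).shape)  # (4, 2045, 3)
```

A fixed point count per patch is what lets the discriminator's 8,192 → 4,096 → 2,048 downsampling schedule (and batched training generally) work on raw LiDAR.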

Slide 17

17 Tokyo Tech Generated Point Cloud
The proposed GAN and VAE method generated better results than the raw GAN model.

Slide 18

18 Tokyo Tech 4. Conclusion and Future Work

Slide 19

19 Tokyo Tech Conclusion and Future Work
- Conclusion
  - We proposed a conditional adversarial network that translates an aerial photo into a point cloud as observed by airborne LiDAR.
  - Our trained Generator was able to generate clear fake point clouds.
- Future work
  - Qualitative evaluation only => quantitative evaluation
  - Combination with instance/semantic segmentation => label-guided point cloud generation
  - Traditional method => change the generator to a more recent architecture