
Image-to-Image Translation and Applications

LINE DevDay 2020

November 25, 2020

Transcript

  1.  pix2pix. Isola et al. Image-to-Image Translation with Conditional Adversarial Networks. CVPR 2017.
      [Architecture figure: the discriminator D receives a depth-wise concatenation of the input x with either the ground truth y or the generator output G(x) and classifies the pair as real / fake.]
      Objective: \min_G \max_D \; \mathbb{E}_{(x,y) \sim p(x,y)}[\log D(x, y)] + \mathbb{E}_{x \sim p(x)}[\log(1 - D(x, G(x)))]
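      A minimal PyTorch-style sketch of the conditional adversarial objective above, assuming D and G are ordinary modules and a BCE-with-logits formulation (function names are illustrative, not the authors' code):

        import torch
        import torch.nn.functional as F

        def cgan_losses(D, G, x, y):
            # Conditional GAN losses in the pix2pix style (sketch): D scores a
            # depth-wise concatenation of the input x with either the ground
            # truth y or the generated image G(x); G tries to fool D.
            fake = G(x)

            # Discriminator: real pairs -> label 1, fake pairs -> label 0
            d_real = D(torch.cat([x, y], dim=1))
            d_fake = D(torch.cat([x, fake.detach()], dim=1))
            d_loss = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
                      + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))

            # Generator: make D score the fake pair as real
            g_logits = D(torch.cat([x, fake], dim=1))
            g_loss = F.binary_cross_entropy_with_logits(g_logits, torch.ones_like(g_logits))
            return d_loss, g_loss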
  2.  Network architecture of pix2pixHD. Wang et al. High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs. CVPR 2018.
      Objective (same conditional adversarial form as pix2pix): \min_G \max_D \; \mathbb{E}_{(x,y) \sim p(x,y)}[\log D(x, y)] + \mathbb{E}_{x \sim p(x)}[\log(1 - D(x, G(x)))]
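      pix2pixHD keeps this objective but scores images at several resolutions with multi-scale discriminators; a minimal sketch of that idea, assuming one discriminator per scale and 2x downsampling between scales (names are illustrative, not the paper's code):

        import torch
        import torch.nn.functional as F

        def multiscale_scores(discriminators, x, img):
            # Score the (input, image) pair with one discriminator per scale,
            # downsampling by 2x between scales (sketch of the multi-scale idea).
            pair = torch.cat([x, img], dim=1)
            scores = []
            for D in discriminators:
                scores.append(D(pair))
                pair = F.avg_pool2d(pair, kernel_size=3, stride=2, padding=1)
            return scores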
  3.  SPADE. Semantic image synthesis results using SPADE. Park et al. Semantic Image Synthesis with Spatially-Adaptive Normalization. CVPR 2019.
  4.  SPADE. Park et al. Semantic Image Synthesis with Spatially-Adaptive Normalization. CVPR 2019.
      In SPADE, the segmentation mask is first projected onto an embedding space and then convolved to produce the modulation parameters γ and β (a code sketch follows slide 5).
  5.  Each normalization layer uses the segmentation mask to modulate the layer activations. SPADE. Park et al. Semantic Image Synthesis with Spatially-Adaptive Normalization. CVPR 2019.
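      A minimal sketch of a spatially-adaptive normalization layer as described in slides 4-5: the mask is embedded, convolved into per-pixel γ and β, and used to modulate the normalized activations (layer sizes and names are illustrative assumptions, not the official implementation):

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class SPADELayer(nn.Module):
            # Spatially-adaptive normalization (sketch): the segmentation mask
            # is projected to an embedding space and convolved to produce
            # per-pixel gamma and beta, which scale and shift the normalized
            # activations.
            def __init__(self, channels, mask_channels, hidden=128):
                super().__init__()
                self.norm = nn.BatchNorm2d(channels, affine=False)  # parameter-free normalization
                self.embed = nn.Sequential(nn.Conv2d(mask_channels, hidden, 3, padding=1), nn.ReLU())
                self.to_gamma = nn.Conv2d(hidden, channels, 3, padding=1)
                self.to_beta = nn.Conv2d(hidden, channels, 3, padding=1)

            def forward(self, h, mask):
                # Resize the mask to the spatial size of the activations
                mask = F.interpolate(mask, size=h.shape[2:], mode='nearest')
                e = self.embed(mask)
                gamma, beta = self.to_gamma(e), self.to_beta(e)
                return self.norm(h) * (1 + gamma) + beta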
  6.  Unpaired Image-to-Image Translation (CycleGAN). Zhu et al. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. ICCV 2017.
      Cycle-consistency loss: \min_{G_{XY}, G_{YX}} \; \mathbb{E}_{x \sim p(x)}[\| G_{YX}(G_{XY}(x)) - x \|_1] + \mathbb{E}_{y \sim p(y)}[\| G_{XY}(G_{YX}(y)) - y \|_1]
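      A hedged PyTorch sketch of the cycle-consistency loss above (the generator names G_xy and G_yx are illustrative):

        import torch

        def cycle_consistency_loss(G_xy, G_yx, x, y):
            # Cycle-consistency (sketch): translating to the other domain and
            # back should reconstruct the original image (L1 penalty).
            loss_x = torch.mean(torch.abs(G_yx(G_xy(x)) - x))  # x -> y -> x
            loss_y = torch.mean(torch.abs(G_xy(G_yx(y)) - y))  # y -> x -> y
            return loss_x + loss_y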
  7.  Unpaired Image-to-Image Translation. Zhu et al. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. ICCV 2017.
  8.  Unpaired Image-to-Image Translation. Zhu et al. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. ICCV 2017.
  9.  Multi-Modal Image-to-Image Translation. Huang et al. Multimodal Unsupervised Image-to-Image Translation. ECCV 2018. Zhu et al. Toward Multimodal Image-to-Image Translation. NeurIPS 2017.
      Latent reconstruction loss: \min_{G, E} \; \mathbb{E}_{x \sim p(x),\, z \sim \mathcal{N}(0, I)}[\| E(G(x, z)) - z \|_1]
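      A hedged sketch of the latent reconstruction term above: the encoder E should recover the random code z that conditioned the generator, which encourages diverse, code-dependent outputs (the 8-dimensional code size and names are assumptions):

        import torch

        def latent_reconstruction_loss(G, E, x):
            # Sample a random latent/style code, generate, and penalize the
            # encoder's failure to recover the code (L1 penalty).
            z = torch.randn(x.size(0), 8, device=x.device)  # assumed code size
            x_fake = G(x, z)
            return torch.mean(torch.abs(E(x_fake) - z))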
  10.  Multi-Modal Image-to-Image Translation (example results: winter → summer, cats → big cats, dogs → cats, edges → bags; each input yields multiple outputs). Huang et al. Multimodal Unsupervised Image-to-Image Translation. ECCV 2018. Zhu et al. Toward Multimodal Image-to-Image Translation. NeurIPS 2017.
  11.  Multi-Domain Image-to-Image Translation. Choi et al. StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation. CVPR 2018.
      Adversarial objective (c is the target domain label): \min_G \max_D \; \mathbb{E}_{(x,c) \sim p(x,c)}[\log D(x, c)] + \mathbb{E}_{x \sim p(x),\, c \sim p(c)}[\log(1 - D(G(x, c)))]
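      A hedged sketch of how a single domain-conditioned generator covers multiple domains, following the objective above (the one-hot label broadcast and function names are illustrative assumptions):

        import torch

        def translate_to_domain(G, x, domain_idx, num_domains):
            # Translate image batch x to a target domain with one shared
            # generator conditioned on a one-hot domain label (sketch).
            c = torch.zeros(x.size(0), num_domains, device=x.device)
            c[:, domain_idx] = 1.0
            # Broadcast the label spatially and concatenate with the image channels
            c_map = c.view(x.size(0), num_domains, 1, 1).expand(-1, -1, x.size(2), x.size(3))
            return G(torch.cat([x, c_map], dim=1))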
  12.  Multi-Domain Image-to-Image Translation. Choi et al. StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation. CVPR 2018.
  13.  Limitation of prior work: paired image-to-image translation (pix2pix), unpaired image-to-image translation (CycleGAN), multi-modal image-to-image translation (MUNIT), multi-domain image-to-image translation (StarGAN). Existing image-to-image translation methods either require training multiple models for all domains (scalability ↓) or produce only a single output for each domain (diversity ↓).
  14.  StarGAN v2 (Choi et al. StarGAN v2: Diverse Image Synthesis for Multiple Domains. CVPR 2020). Generator: transforms an input image reflecting the style code. Style encoder: learns to reconstruct the style of the generated image (style reconstruction loss). Discriminator: each branch is responsible for a particular domain (adversarial loss). Mapping network: extracts style codes from random Gaussian noise.
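      A hedged sketch of the training step described above: the mapping network turns Gaussian noise into a style code, the generator applies it, and the style encoder is trained to recover it. The module interfaces, latent size, and the exact adversarial form are assumptions, not the paper's code:

        import torch
        import torch.nn.functional as F

        def starganv2_generator_losses(G, F_map, E_style, D, x, target_domain, latent_dim=16):
            # F_map: mapping network, (noise, domain) -> style code (name assumed)
            # E_style: style encoder, (image, domain) -> style code (name assumed)
            # D: discriminator; we read the branch of the target domain
            z = torch.randn(x.size(0), latent_dim, device=x.device)
            s = F_map(z, target_domain)          # style code from random Gaussian noise
            x_fake = G(x, s)                     # generator reflects the style code

            # Style reconstruction loss: the style encoder recovers s from the output
            style_rec = torch.mean(torch.abs(E_style(x_fake, target_domain) - s))

            # Adversarial loss on the target-domain branch (non-saturating form
            # shown as one common choice; the exact formulation is an assumption)
            adv = F.softplus(-D(x_fake, target_domain)).mean()
            return style_rec, adv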
  15.  COCO-FUNIT. Saito et al. COCO-FUNIT: Few-Shot Unsupervised Image Translation with a Content Conditioned Style Encoder. ECCV 2020.
  16.  [Comparison figure: given content and style images, outputs of FUNIT vs. COCO-FUNIT.] Saito et al. COCO-FUNIT: Few-Shot Unsupervised Image Translation with a Content Conditioned Style Encoder. ECCV 2020.