Image-to-Image Translation and Applications

LINE DevDay 2020

November 25, 2020

Transcript

  1.  Isola et al. Image-to-Image Translation with Conditional Adversarial Networks.

    CVPR 2017. pix2pix. Architecture figure: the discriminator D receives a depth-wise concatenation of the input x with either the ground truth y (real) or the output G(x) (fake). Objective: $\min_G \max_D \; \mathbb{E}_{(x,y)\sim p(x,y)}[\log D(x, y)] + \mathbb{E}_{x\sim p(x)}[\log(1 - D(x, G(x)))]$
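
A minimal PyTorch sketch of this conditional adversarial objective (module and tensor names are illustrative, not the authors' code); pix2pix additionally adds an L1 term between G(x) and y, which is omitted here:

```python
import torch
import torch.nn.functional as F

def pix2pix_gan_losses(D, G, x, y):
    """Conditional GAN losses: D scores the depth-wise concatenation of input and image."""
    fake = G(x)
    real_logits = D(torch.cat([x, y], dim=1))              # (input, ground truth) pair
    fake_logits = D(torch.cat([x, fake.detach()], dim=1))  # (input, generated) pair

    # Discriminator: maximize log D(x, y) + log(1 - D(x, G(x)))
    d_loss = F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits)) + \
             F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits))

    # Generator: fool the discriminator on the (x, G(x)) pair
    # (non-saturating form, commonly used instead of minimizing log(1 - D(x, G(x)))).
    gen_logits = D(torch.cat([x, fake], dim=1))
    g_loss = F.binary_cross_entropy_with_logits(gen_logits, torch.ones_like(gen_logits))
    return d_loss, g_loss
```
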
  2.  Network architecture of pix2pixHD. Wang et al. High-Resolution Image

    Synthesis and Semantic Manipulation with Conditional GANs. CVPR 2018. pix2pixHD: $\min_G \max_D \; \mathbb{E}_{(x,y)\sim p(x,y)}[\log D(x, y)] + \mathbb{E}_{x\sim p(x)}[\log(1 - D(x, G(x)))]$
  3.  Semantic image synthesis results using SPADE. Park et al.

    Semantic Image Synthesis with Spatially-Adaptive Normalization. CVPR 2019 SPADE
  4.  SPADE Park et al. Semantic Image Synthesis with Spatially-Adaptive

    Normalization. CVPR 2019. In SPADE, the segmentation mask is first projected onto an embedding space and then convolved to produce the modulation parameters γ and β (sketched below).
  5.  Each normalization layer uses the segmentation mask to modulate

    the layer activations. SPADE Park et al. Semantic Image Synthesis with Spatially-Adaptive Normalization. CVPR 2019
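
A minimal sketch of a SPADE layer as described on these two slides, assuming BatchNorm as the parameter-free normalization and illustrative layer sizes (not the paper's exact configuration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPADE(nn.Module):
    """The segmentation mask is embedded and convolved to produce per-pixel gamma and beta
    that modulate the normalized activations."""
    def __init__(self, num_features, label_channels, hidden=128):
        super().__init__()
        self.param_free_norm = nn.BatchNorm2d(num_features, affine=False)
        self.shared = nn.Sequential(nn.Conv2d(label_channels, hidden, 3, padding=1), nn.ReLU())
        self.gamma = nn.Conv2d(hidden, num_features, 3, padding=1)
        self.beta = nn.Conv2d(hidden, num_features, 3, padding=1)

    def forward(self, x, segmap):
        normalized = self.param_free_norm(x)                              # normalize without affine params
        segmap = F.interpolate(segmap, size=x.shape[2:], mode='nearest')  # match spatial resolution
        emb = self.shared(segmap)                                         # project mask to embedding space
        gamma, beta = self.gamma(emb), self.beta(emb)                     # per-pixel modulation parameters
        return normalized * gamma + beta                                  # modulate the activations
```
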
  6.  Zhu et al. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial

    Networks. ICCV 2017. Unpaired Image-to-Image Translation. Cycle-consistency loss: $\min_{G_{XY},\, G_{YX}} \; \mathbb{E}_{x\sim p(x)}\big[\lVert G_{YX}(G_{XY}(x)) - x \rVert_1\big] + \mathbb{E}_{y\sim p(y)}\big[\lVert G_{XY}(G_{YX}(y)) - y \rVert_1\big]$
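
A short sketch of the cycle-consistency term above, assuming two generators G_XY (X → Y) and G_YX (Y → X); the adversarial losses for the two directions are omitted:

```python
import torch

def cycle_consistency_loss(G_XY, G_YX, x, y):
    """L1 reconstruction error after a round trip through both generators."""
    forward_cycle = torch.mean(torch.abs(G_YX(G_XY(x)) - x))   # x -> Y -> back to X
    backward_cycle = torch.mean(torch.abs(G_XY(G_YX(y)) - y))  # y -> X -> back to Y
    return forward_cycle + backward_cycle
```
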
  7.  Zhu et al. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial

    Networks. ICCV 2017 Unpaired Image-to-Image Translation
  8.  Zhu et al. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial

    Networks. ICCV 2017 Unpaired Image-to-Image Translation
  9.  Huang et al. Multi-Modal Unsupervised Image-to-Image Translation. ECCV 2018

    Zhu et al. Toward Multimodal Image-to-Image Translation. NeurIPS 2017. Multi-Modal Image-to-Image Translation. Latent reconstruction loss: $\min_{G,\, E} \; \mathbb{E}_{x\sim p(x),\, z\sim \mathcal{N}(0, I)}\big[\lVert E(G(x, z)) - z \rVert_1\big]$
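
A sketch of the latent reconstruction term above, assuming a generator G(x, z) and an encoder E that recovers the latent code from the generated image (names and the code dimension are illustrative):

```python
import torch

def latent_reconstruction_loss(G, E, x, z_dim=8):
    """Encourage the generator to actually use z: E should recover z from G(x, z)."""
    z = torch.randn(x.size(0), z_dim, device=x.device)  # z ~ N(0, I)
    z_recon = E(G(x, z))
    return torch.mean(torch.abs(z_recon - z))            # L1 reconstruction of the latent code
```
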
  10.  Multi-Modal Image-to-Image Translation: example outputs for winter → summer, cats → big cats, dogs → cats, and edges → bags. Huang et al. Multi-Modal Unsupervised Image-to-Image Translation. ECCV 2018. Zhu et al. Toward Multimodal Image-to-Image Translation. NeurIPS 2017.
  11.  Choi et al. StarGAN: Unified Generative Adversarial Networks for

    Multi-Domain Image-to-Image Translation. CVPR 2018. Multi-Domain Image-to-Image Translation. Adversarial objective: $\min_G \max_D \; \mathbb{E}_{(x,c)\sim p(x,c)}[\log D(x, c)] + \mathbb{E}_{x\sim p(x),\, c\sim p(c)}[\log(1 - D(G(x, c)))]$
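
A sketch of the adversarial part of this multi-domain objective, following the formula on the slide in which the discriminator also sees the domain label c (the published StarGAN implementation instead uses an auxiliary domain classifier plus a reconstruction loss, omitted here; module signatures are assumptions):

```python
import torch
import torch.nn.functional as F

def stargan_adv_losses(D, G, x, c_real, c_target):
    """A single generator conditioned on a target-domain label c covers all domains."""
    fake = G(x, c_target)
    real_logits = D(x, c_real)
    fake_logits = D(fake.detach(), c_target)

    # Discriminator: maximize log D(x, c) + log(1 - D(G(x, c), c))
    d_loss = F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits)) + \
             F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits))

    # Generator: make the translated image look real for the target domain.
    gen_logits = D(fake, c_target)
    g_loss = F.binary_cross_entropy_with_logits(gen_logits, torch.ones_like(gen_logits))
    return d_loss, g_loss
```
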
  12.  Multi-Domain Image-to-Image Translation Choi et al. StarGAN: Unified Generative

    Adversarial Networks for Multi-Domain Image-to-Image Translation. CVPR 2018
  13. Limitation of prior work: paired image-to-image translation (pix2pix), unpaired image-to-image translation (CycleGAN), multi-modal image-to-image translation (MUNIT), and multi-domain image-to-image translation (StarGAN). Existing image-to-image translation methods require training multiple models for all domains (scalability ↓) or produce only a single output for each domain (diversity ↓).
  14. StarGAN v2. The generator transforms an input image reflecting a style code; the style encoder learns to reconstruct the style of the generated image; each branch of the discriminator is responsible for a particular domain; and the mapping network maps random Gaussian noise to style codes. Training uses a style reconstruction loss and an adversarial loss.
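
A sketch of the style reconstruction loss mentioned above, assuming a mapping network F_map that turns Gaussian noise into a style code for a target domain y_trg, a generator G(x, s), and a style encoder E (all module names and signatures are illustrative):

```python
import torch

def style_reconstruction_loss(G, E, F_map, x, y_trg, latent_dim=16):
    """The style encoder should recover the style code the generator was asked to render."""
    z = torch.randn(x.size(0), latent_dim, device=x.device)  # random Gaussian noise
    s = F_map(z, y_trg)                                       # mapping network -> style code
    fake = G(x, s)                                            # generator reflects the style code
    s_recon = E(fake, y_trg)                                  # style of the generated image
    return torch.mean(torch.abs(s_recon - s))                 # L1 style reconstruction
```
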
  15.  Saito et al. COCO-FUNIT: Few-Shot Unsupervised Image Translation with

    a Content Conditioned Style Encoder. ECCV 2020 COCO-FUNIT
  16.  Comparison of FUNIT and COCO-FUNIT outputs given a content image and a style image. Saito et al. COCO-FUNIT: Few-Shot Unsupervised Image Translation with a Content Conditioned Style Encoder. ECCV 2020