
Image-to-Image Translation and Applications

LINE DevDay 2020

November 25, 2020

Transcript

  1. None
  2. Animal translation, semantic image synthesis, face image synthesis, edge2image

  3.  Isola et al. Image-to-Image Translation with Conditional Adversarial Networks.

    CVPR 2017. pix2pix. [Architecture diagram: the discriminator D takes a depth-wise concatenation of the input x with either the ground truth y or the output G(x) and classifies the pair as real / fake.] Objective: min_G max_D E_{(x,y)~p(x,y)} [log D(x, y)] + E_{x~p(x)} [log(1 − D(x, G(x)))]
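The pix2pix objective min_G max_D E_{(x,y)} [log D(x, y)] + E_x [log(1 − D(x, G(x)))] can be evaluated directly once the discriminator's outputs are available. A minimal NumPy sketch; the function name and the use of probabilities rather than logits are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def pix2pix_gan_loss(d_real, d_fake):
    """Value of the conditional GAN objective
    E[log D(x, y)] + E[log(1 - D(x, G(x)))].
    d_real: discriminator outputs on (input, ground-truth) pairs, in (0, 1).
    d_fake: discriminator outputs on (input, generated) pairs, in (0, 1).
    D is trained to maximize this value; G is trained to minimize it.
    """
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))
```

In practice both networks are deep CNNs trained by alternating gradient steps, and pix2pix adds an L1 reconstruction term between G(x) and y on top of this adversarial term.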
  4.  Isola et al. Image-to-Image Translation with Conditional Adversarial Networks.

    CVPR 2017 pix2pix
  5.  Network architecture of pix2pixHD. Wang et al. High-Resolution Image

    Synthesis and Semantic Manipulation with Conditional GANs. CVPR 2018. pix2pixHD uses the same conditional GAN objective: min_G max_D E_{(x,y)~p(x,y)} [log D(x, y)] + E_{x~p(x)} [log(1 − D(x, G(x)))]
  6. None
  7.  Semantic image synthesis results using SPADE Park et al.

    Semantic Image Synthesis with Spatially-Adaptive Normalization. CVPR 2019 SPADE
  8.  SPADE Park et al. Semantic Image Synthesis with Spatially-Adaptive

    Normalization. CVPR 2019. In SPADE, the segmentation mask is first projected onto an embedding space and then convolved to produce the modulation parameters γ and β.
  9.  Each normalization layer uses the segmentation mask to modulate

    the layer activations. SPADE Park et al. Semantic Image Synthesis with Spatially-Adaptive Normalization. CVPR 2019
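The two SPADE slides above describe the mechanics: normalize the activations without learned affine parameters, then modulate them with spatially varying γ and β computed from the segmentation mask. A simplified single-sample NumPy sketch, with the convolutions reduced to 1×1 (per-pixel) projections as an assumption for brevity:

```python
import numpy as np

def spade_modulate(x, mask, w_embed, w_gamma, w_beta, eps=1e-5):
    """Simplified SPADE layer (1x1 convolutions only, single sample).
    x:       activations, shape (C, H, W)
    mask:    one-hot segmentation mask, shape (L, H, W)
    w_embed: (E, L), projects the mask onto an embedding space
    w_gamma, w_beta: (C, E), produce the spatial modulation maps
    """
    # Parameter-free normalization over the spatial dims, per channel.
    mu = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    x_norm = (x - mu) / np.sqrt(var + eps)
    # Project the mask to an embedding, then to per-pixel gamma and beta.
    emb = np.einsum('el,lhw->ehw', w_embed, mask)
    gamma = np.einsum('ce,ehw->chw', w_gamma, emb)
    beta = np.einsum('ce,ehw->chw', w_beta, emb)
    # Modulate the normalized activations with the mask-derived maps.
    return gamma * x_norm + beta
```

Because γ and β vary per pixel with the mask, the semantic layout survives the normalization instead of being washed out, which is the point of the design.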
  10. None
  11. Unpaired Image-to-Image Translation

  12.  Unpaired data Paired (edge & shoe) Unpaired (cat &

    dog)
  13.  Zhu et al. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial

    Networks. ICCV 2017. Unpaired Image-to-Image Translation. Cycle-consistency objective: min_{G_XY, G_YX} E_{x~p(x)} ||G_YX(G_XY(x)) − x||_1 + E_{y~p(y)} ||G_XY(G_YX(y)) − y||_1
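The CycleGAN cycle-consistency term ||G_YX(G_XY(x)) − x||_1 + ||G_XY(G_YX(y)) − y||_1 penalizes translations that cannot be undone, which is what substitutes for paired supervision. A minimal NumPy sketch with the two translators passed as plain callables (names are illustrative):

```python
import numpy as np

def cycle_consistency_loss(x, y, G_xy, G_yx):
    """Cycle-consistency loss from the slide:
    E||G_yx(G_xy(x)) - x||_1 + E||G_xy(G_yx(y)) - y||_1.
    G_xy translates domain X -> Y, G_yx translates Y -> X;
    here they are any callables on arrays.
    """
    fwd = np.mean(np.abs(G_yx(G_xy(x)) - x))  # x -> Y -> back to X
    bwd = np.mean(np.abs(G_xy(G_yx(y)) - y))  # y -> X -> back to Y
    return fwd + bwd
```

The loss is zero exactly when each translator inverts the other, so it is minimized jointly with the usual adversarial losses on both domains.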
  14.  Zhu et al. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial

    Networks. ICCV 2017 Unpaired Image-to-Image Translation
  15.  Unpaired Image-to-Image Translation. Liu et al. Unsupervised Image-to-Image Translation

     Networks. NeurIPS 2017. Examples: day to night, snowy to summery.
  16.  Zhu et al. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial

    Networks. ICCV 2017 Unpaired Image-to-Image Translation
  17. Multi-Modal Image-to-Image Translation

  18.  Huang et al. Multi-Modal Unsupervised Image-to-Image Translation. ECCV 2018

    Zhu et al. Toward Multimodal Image-to-Image Translation. NeurIPS 2017. Multi-Modal Image-to-Image Translation. Latent reconstruction objective: min_{G,E} E_{x~p(x), z~N(0,I)} ||E(G(x, z)) − z||_1
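The latent reconstruction term ||E(G(x, z)) − z||_1 forces the generator to actually use the sampled code z, so that different z yield different outputs rather than collapsing to one translation. A minimal NumPy sketch with illustrative names:

```python
import numpy as np

def latent_regression_loss(x, z, G, E):
    """Latent reconstruction loss from the slide:
    E_{x~p(x), z~N(0,I)} ||E(G(x, z)) - z||_1.
    G generates an output from content x and style/noise code z;
    the encoder E must recover z from that output. Here G and E
    are any callables on arrays.
    """
    return np.mean(np.abs(E(G(x, z)) - z))
```

If G ignored z, no encoder could recover it and this term would stay large, which is why it promotes multi-modal outputs.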
  19. None
  20.  [Result grids: input images with multiple outputs for winter → summer, cats

     → big cats, dogs → cats, and edges → bags.] Huang et al. Multi-Modal Unsupervised Image-to-Image Translation. ECCV 2018. Zhu et al. Toward Multimodal Image-to-Image Translation. NeurIPS 2017. Multi-Modal Image-to-Image Translation
  21. None
  22.  Choi et al. StarGAN: Unified Generative Adversarial Networks for

    Multi-Domain Image-to-Image Translation. CVPR 2018. Multi-Domain Image-to-Image Translation. Objective (y is a target-domain label): min_G max_D E_{(x,y)~p(x,y)} [log D(x, y)] + E_{x~p(x), y~p(y)} [log(1 − D(G(x, y)))]
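StarGAN's key idea is a single generator conditioned on the target domain label y. One common way to feed a discrete label into a convolutional generator, sketched below in NumPy (an illustration, not necessarily the paper's exact implementation), is to tile a one-hot label as extra input channels:

```python
import numpy as np

def condition_on_domain(x, domain_idx, n_domains):
    """Tile a one-hot domain label as extra channels and concatenate
    it to the input image, so one generator can serve all domains.
    x: image, shape (C, H, W); returns shape (C + n_domains, H, W).
    """
    c, h, w = x.shape
    label = np.zeros((n_domains, h, w))
    label[domain_idx] = 1.0  # broadcast the one-hot entry over all pixels
    return np.concatenate([x, label], axis=0)
```

With this conditioning, adding a new domain only adds input channels, instead of requiring a separate model per domain pair.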
  23.  Multi-Domain Image-to-Image Translation Choi et al. StarGAN: Unified Generative

    Adversarial Networks for Multi-Domain Image-to-Image Translation. CVPR 2018
  24. Limitation of prior work: paired image-to-image translation (pix2pix), unpaired image-to-image translation (CycleGAN),

     multi-modal image-to-image translation (MUNIT), and multi-domain image-to-image translation (StarGAN). Existing image-to-image translation methods require training multiple models for all domains (scalability ↓) or produce only a single output for each domain (diversity ↓).
  25. Multi-Domain & Multi-Modal Image-to-Image Translation

  26. Choi et al. StarGAN v2: Diverse Image Synthesis for Multiple

    Domains. CVPR 2020 StarGAN v2
  27. StarGAN v2: the generator transforms an input image according to a style code; the style encoder

    learns to reconstruct the style of the generated image; each branch of the discriminator is responsible for a particular domain; the mapping network produces style codes from random Gaussian noise. Training uses an adversarial loss and a style reconstruction loss.
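The mapping network described above can be sketched as a small MLP with one output head per domain; the shapes and names below are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

def mapping_network(z, w_shared, w_heads, domain):
    """Toy sketch of a StarGAN v2-style mapping network: a shared
    layer followed by one output head per domain turns Gaussian
    noise z into a domain-specific style code.
    z: (D_z,), w_shared: (H, D_z), w_heads: (n_domains, D_s, H).
    """
    h = np.maximum(0.0, w_shared @ z)  # shared ReLU layer
    return w_heads[domain] @ h         # domain-specific output head
```

Sampling many z for one target domain yields many style codes, which is how StarGAN v2 gets diversity (multi-modality) and scalability (multi-domain) from a single generator.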
  28. None
  29. None
  30.  Saito et al. COCO-FUNIT: Few-Shot Unsupervised Image Translation with

    a Content Conditioned Style Encoder. ECCV 2020 COCO-FUNIT
  31.  [Comparison grid: content and style inputs with FUNIT vs. COCO-FUNIT outputs.] Saito et al. COCO-FUNIT: Few-Shot

     Unsupervised Image Translation with a Content Conditioned Style Encoder. ECCV 2020
  32. None
  33. More Applications

  34. vid2vid Wang et al. Video-to-Video Synthesis. NeurIPS 2018

  35. Deep Face Drawing Chen et al. DeepFaceDrawing: Deep Generation of

    Face Images from Sketches. SIGGRAPH 2020
  36. pix2pix, pix2pixHD, CycleGAN, StarGAN v2, UNIT, DeepFaceDrawing, vid2vid,

     SPADE. Thank you