Image-to-Image Translation and Applications

Animal translation, semantic image synthesis, face image synthesis, edge2image

Isola et al. Image-to-Image Translation with Conditional Adversarial Networks. CVPR 2017
pix2pix. [Figure: the discriminator D takes the depth-wise concatenation of the input x with either the ground truth y or the output G(x) and classifies the pair as real or fake.]
$$\min_G \max_D \; \mathbb{E}_{(x,y)\sim p(x,y)}\left[\log D(x, y)\right] + \mathbb{E}_{x\sim p(x)}\left[\log\left(1 - D(x, G(x))\right)\right]$$
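The pix2pix objective and the depth-wise concatenation can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the helper names `concat_depthwise` and `cgan_value`, the channel-first layout, and the toy arrays are all assumptions made here, and the discriminator outputs are supplied directly as probabilities instead of coming from a trained network.

```python
import numpy as np

def concat_depthwise(x, img):
    """Stack the input x and an image along the channel axis (C, H, W layout)."""
    return np.concatenate([x, img], axis=0)

def cgan_value(d_real, d_fake, eps=1e-12):
    """Monte Carlo estimate of E[log D(x, y)] + E[log(1 - D(x, G(x)))].

    d_real, d_fake: arrays of discriminator probabilities in (0, 1).
    """
    d_real = np.clip(d_real, eps, 1.0 - eps)  # avoid log(0)
    d_fake = np.clip(d_fake, eps, 1.0 - eps)
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))

# Toy stand-ins (hypothetical): a 1-channel edge map and a 3-channel image.
x = np.zeros((1, 4, 4))
y = np.ones((3, 4, 4))
pair = concat_depthwise(x, y)  # what D would see for a "real" pair: 1 + 3 channels

# A discriminator that cannot tell real from fake outputs 0.5 everywhere.
v = cgan_value(np.array([0.5, 0.5]), np.array([0.5, 0.5]))
```

The discriminator tries to push this value up and the generator tries to push it down; at the undecided point above, the value equals 2 log 0.5.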

Isola et al. Image-to-Image Translation with Conditional Adversarial Networks.
CVPR 2017 pix2pix

Wang et al. High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs. CVPR 2018
pix2pixHD. [Figure: network architecture of pix2pixHD.]
$$\min_G \max_D \; \mathbb{E}_{(x,y)\sim p(x,y)}\left[\log D(x, y)\right] + \mathbb{E}_{x\sim p(x)}\left[\log\left(1 - D(x, G(x))\right)\right]$$

Semantic image synthesis results using SPADE Park et al.
Semantic Image Synthesis with Spatially-Adaptive Normalization. CVPR 2019 SPADE

Park et al. Semantic Image Synthesis with Spatially-Adaptive Normalization. CVPR 2019
SPADE. The segmentation mask is first projected onto an embedding space and then convolved to produce the modulation parameters γ and β.

Each normalization layer uses the segmentation mask to modulate
the layer activations. SPADE Park et al. Semantic Image Synthesis with Spatially-Adaptive Normalization. CVPR 2019
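The two SPADE slides above describe a normalization layer whose scale γ and shift β are computed from the segmentation mask at every pixel. A rough sketch follows, under simplifying assumptions that are not from the slides: 1x1 convolutions (per-pixel linear maps) stand in for the spatial convolutions, normalization is done per channel over a single sample, and all weights are random placeholders. The function name `spade` and its arguments are hypothetical.

```python
import numpy as np

def spade(h, mask_onehot, w_embed, w_gamma, w_beta, eps=1e-5):
    """h: (C, H, W) activations; mask_onehot: (K, H, W) one-hot segmentation mask.

    w_embed: (E, K), w_gamma: (C, E), w_beta: (C, E) act as 1x1-conv weights.
    """
    # Parameter-free normalization of each channel over its spatial positions.
    mu = h.mean(axis=(1, 2), keepdims=True)
    sigma = h.std(axis=(1, 2), keepdims=True)
    h_norm = (h - mu) / (sigma + eps)

    # Project the mask onto an embedding space (1x1 conv followed by ReLU).
    emb = np.maximum(np.einsum("ek,khw->ehw", w_embed, mask_onehot), 0.0)

    # Convolve the embedding to get per-pixel, per-channel gamma and beta.
    gamma = np.einsum("ce,ehw->chw", w_gamma, emb)
    beta = np.einsum("ce,ehw->chw", w_beta, emb)
    return gamma * h_norm + beta

rng = np.random.default_rng(0)
C, K, E, H, W = 3, 2, 4, 4, 4
h = rng.normal(size=(C, H, W))
labels = np.zeros((H, W), dtype=int)
labels[:, W // 2:] = 1                       # left half: class 0, right half: class 1
mask = np.eye(K)[labels].transpose(2, 0, 1)  # (K, H, W) one-hot mask
out = spade(h, mask, rng.normal(size=(E, K)),
            rng.normal(size=(C, E)), rng.normal(size=(C, E)))
```

Because γ and β vary with the mask at each pixel, the layout information survives normalization instead of being washed out by it.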

Unpaired Image-to-Image Translation

Unpaired data: paired (edge & shoe) vs. unpaired (cat & dog)

Zhu et al. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. ICCV 2017
Unpaired Image-to-Image Translation
$$\min_G \; \mathbb{E}_{x\sim p(x)}\left\lVert G_{YX}(G_{XY}(x)) - x \right\rVert_1 + \mathbb{E}_{y\sim p(y)}\left\lVert G_{XY}(G_{YX}(y)) - y \right\rVert_1$$
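The cycle-consistency term penalizes the L1 distance between an image and its round trip through both generators (X to Y and back, and Y to X and back). A minimal sketch, assuming toy generators given as plain Python callables; the names `cycle_consistency_loss`, `g_xy`, and `g_yx` are illustrative and not taken from the CycleGAN code.

```python
import numpy as np

def cycle_consistency_loss(g_xy, g_yx, xs, ys):
    """Mean L1 cycle loss over samples xs (domain X) and ys (domain Y)."""
    forward = np.mean([np.abs(g_yx(g_xy(x)) - x).sum() for x in xs])
    backward = np.mean([np.abs(g_xy(g_yx(y)) - y).sum() for y in ys])
    return float(forward + backward)

# Hypothetical toy "generators" that are exact inverses of each other.
g_xy = lambda x: 2.0 * x + 1.0
g_yx = lambda y: (y - 1.0) / 2.0

xs = [np.array([0.0, 1.0]), np.array([2.0, 3.0])]
ys = [np.array([1.0, 3.0])]
perfect = cycle_consistency_loss(g_xy, g_yx, xs, ys)  # round trips return to the start

# A reverse mapping that discards its input breaks the cycle and is penalized.
lossy = cycle_consistency_loss(g_xy, lambda y: np.zeros_like(y), xs, ys)
```

The first call evaluates to zero because every round trip is exact; the second is positive, which is the signal that keeps the generators from ignoring their input when no paired data is available.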

Unpaired Image-to-Image Translation: day to night, snowy to summery.
Liu et al. Unsupervised Image-to-Image Translation Networks. NeurIPS 2017

Multi-Modal Image-to-Image Translation

Huang et al. Multi-Modal Unsupervised Image-to-Image Translation. ECCV 2018
Zhu et al. Toward Multi-Modal Image-to-Image Translation. NeurIPS 2017
Multi-Modal Image-to-Image Translation
$$\min_{G,E} \; \mathbb{E}_{x\sim p(x),\, z\sim \mathcal{N}(0, I)}\left\lVert E(G(x, z)) - z \right\rVert_1$$
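The latent reconstruction term asks an encoder E to recover the latent code z from the generated image G(x, z), which discourages the generator from ignoring z. A small sketch under toy assumptions: G and E are hand-written functions made up for illustration, and fixed codes stand in for samples drawn from N(0, I).

```python
import numpy as np

def latent_reconstruction_loss(G, E, xs, zs):
    """Monte Carlo estimate of E_{x, z} || E(G(x, z)) - z ||_1."""
    return float(np.mean([np.abs(E(G(x, z)) - z).sum() for x, z in zip(xs, zs)]))

# Hypothetical toy models: G shifts a zero-mean input by the code,
# and E reads the code back as the mean of the output.
G = lambda x, z: x + z
E = lambda img: np.array([img.mean()])

xs = [np.array([-1.0, 1.0, -2.0, 2.0])] * 2
zs = [np.array([0.5]), np.array([-1.0])]  # fixed stand-ins for z ~ N(0, I)

recovered = latent_reconstruction_loss(G, E, xs, zs)

# A generator that ignores z yields one output per input, so z cannot be recovered.
ignored = latent_reconstruction_loss(lambda x, z: x, E, xs, zs)
```

With the toy pair above the code is recovered exactly, while the generator that ignores z incurs a positive loss. Minimizing this term therefore ties different codes to different outputs, which is what makes the translation multi-modal.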

Multi-Modal Image-to-Image Translation
[Figure: a single input and multiple outputs for winter → summer, cats → big cats, dogs → cats, and edges → bags.]
Huang et al. Multi-Modal Unsupervised Image-to-Image Translation. ECCV 2018
Zhu et al. Toward Multi-Modal Image-to-Image Translation. NeurIPS 2017

Choi et al. StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation. CVPR 2018
Multi-Domain Image-to-Image Translation
$$\min_G \max_D \; \mathbb{E}_{(x,y)\sim p(x,y)}\left[\log D(x, y)\right] + \mathbb{E}_{x\sim p(x),\, y\sim p(y)}\left[\log\left(1 - D(G(x, y))\right)\right]$$

Multi-Domain Image-to-Image Translation Choi et al. StarGAN: Unified Generative
Adversarial Networks for Multi-Domain Image-to-Image Translation. CVPR 2018

Limitation of prior work
Paired image-to-image translation (pix2pix)
Unpaired image-to-image translation (CycleGAN)
Multi-modal image-to-image translation (MUNIT)
Multi-domain image-to-image translation (StarGAN)
Existing image-to-image translation methods require training multiple models for all domains (scalability ↓) or produce only a single output for each domain (diversity ↓).

Multi-Domain & Multi-Modal Image-to-Image Translation

Choi et al. StarGAN v2: Diverse Image Synthesis for Multiple
Domains. CVPR 2020 StarGAN v2

StarGAN v2
Generator: transforms an input image so that it reflects the style code.
Mapping network: extracts style codes from random Gaussian noise.
Style encoder: learns to reconstruct the style of the generated image (style reconstruction loss).
Discriminator: each branch is responsible for a particular domain (adversarial loss).
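The multi-branch discriminator can be pictured as one shared feature extractor followed by one real/fake output per domain, of which only the branch for the sample's domain is used. The sketch below illustrates only that selection step; the shared feature vector, the linear branches, and the name `discriminator_logit` are assumptions made for this example and are not the StarGAN v2 architecture.

```python
import numpy as np

def discriminator_logit(features, w_branches, domain):
    """Return the real/fake logit from the branch assigned to `domain`.

    features: (F,) shared image features; w_branches: (K, F), one linear branch per domain.
    """
    logits = w_branches @ features  # (K,): one logit per domain branch
    return float(logits[domain])    # the other branches are ignored for this sample

features = np.array([1.0, 2.0])
w_branches = np.array([[1.0, 0.0],   # branch for domain 0
                       [0.0, 1.0],   # branch for domain 1
                       [1.0, 1.0]])  # branch for domain 2
logit_dom2 = discriminator_logit(features, w_branches, domain=2)
```

Sharing the features while splitting only the final outputs is what lets a single discriminator serve many domains.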

Saito et al. COCO-FUNIT: Few-Shot Unsupervised Image Translation with
a Content Conditioned Style Encoder. ECCV 2020 COCO-FUNIT

[Figure: FUNIT and COCO-FUNIT outputs for given content and style images.]
Saito et al. COCO-FUNIT: Few-Shot Unsupervised Image Translation with a Content Conditioned Style Encoder. ECCV 2020

More Applications

vid2vid Wang et al. Video-to-Video Synthesis. NeurIPS 2018

Deep Face Drawing Chen et al. DeepFaceDrawing: Deep Generation of
Face Images from Sketches. SIGGRAPH 2020

pix2pix, pix2pixHD, CycleGAN, StarGAN v2, UNIT, Deep face drawing, vid2vid, SPADE

Thank you

LINE DevDay 2020
