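pix2pix (CVPR 2017). The generator G maps an input x to an output G(x). The discriminator D receives the depth-wise concatenation of the input x with either the ground truth y (real) or the output G(x) (fake) and classifies the pair as real / fake. Objective: $\min_G \max_D \; \mathbb{E}_{(x,y) \sim p(x,y)}[\log D(x, y)] + \mathbb{E}_{x \sim p(x)}[\log(1 - D(x, G(x)))]$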
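The objective above can be estimated on a batch from the discriminator's outputs alone. The following is a minimal sketch, not the authors' implementation: `concat_channels` and `cgan_value` are hypothetical helper names, images are assumed to be plain lists of channels, and the discriminator outputs are assumed to be probabilities strictly between 0 and 1.

```python
import math


def concat_channels(a, b):
    """Depth-wise concatenation of two images stored as lists of channels."""
    return a + b


def cgan_value(d_real, d_fake):
    """Batch estimate of E[log D(x, y)] + E[log(1 - D(x, G(x)))].

    d_real: probabilities D assigns to (input, ground truth) pairs.
    d_fake: probabilities D assigns to (input, generated output) pairs.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# Toy usage: a 1-channel input x and a 2-channel ground truth y.
x = [[0.1, 0.2]]
y = [[0.3, 0.4], [0.5, 0.6]]
real_pair = concat_channels(x, y)  # 3 channels, fed to D as the "real" pair

# A confident discriminator yields a higher value than an undecided one.
confident = cgan_value([0.9, 0.8], [0.1, 0.2])
undecided = cgan_value([0.5, 0.5], [0.5, 0.5])
```

D is trained to increase this value, while G is trained to decrease the second term by making D(x, G(x)) large.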
Normalization. CVPR 2019. In SPADE, the mask is first projected onto an embedding space and then convolved to produce the modulation parameters γ and β.
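The modulation step can be sketched for a single channel as follows. This is a simplified illustration under stated assumptions, not the SPADE implementation: the projection is modeled as a per-class embedding lookup followed by a ReLU, the convolutions are reduced to hypothetical 1x1 weight vectors `w_gamma` and `w_beta`, and statistics are computed over one sample rather than over a batch.

```python
import math


def spade_modulate(h, mask, embed, w_gamma, w_beta, eps=1e-5):
    """Normalize one channel, then modulate it per pixel with gamma and beta.

    h: H x W activations of a single channel.
    mask: H x W integer class labels.
    embed: per-class embedding vectors (assumed 1x1 projection of the mask).
    w_gamma, w_beta: weights of assumed 1x1 convolutions on the embedding.
    """
    values = [v for row in h for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(var + eps)

    out = []
    for h_row, m_row in zip(h, mask):
        out_row = []
        for v, label in zip(h_row, m_row):
            e = [max(0.0, a) for a in embed[label]]  # ReLU on the embedding
            gamma = sum(w * a for w, a in zip(w_gamma, e))
            beta = sum(w * a for w, a in zip(w_beta, e))
            out_row.append(gamma * (v - mean) / std + beta)
        out.append(out_row)
    return out


# Toy usage: two classes, so gamma and beta differ by pixel location.
h = [[1.0, 3.0], [1.0, 3.0]]
mask = [[0, 1], [0, 1]]
embed = [[1.0, 0.0], [0.0, 1.0]]
result = spade_modulate(h, mask, embed, w_gamma=[2.0, 0.5], w_beta=[0.0, 1.0])
```

Because γ and β are computed from the mask at every pixel, the layout information survives normalization instead of being washed out by it.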
Prior work: multi-modal image-to-image translation (MUNIT) and multi-domain image-to-image translation (StarGAN). Existing image-to-image translation methods either require training multiple models for all domains (scalability ↓) or produce only a single output for each domain (diversity ↓).
StarGAN v2. Mapping network: extracts style codes from random Gaussian noise. Discriminator: each branch is responsible for a particular domain. Losses: adversarial loss and style reconstruction loss (learns to reconstruct the style of the generated image).
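Two of these pieces can be sketched in a few lines. This is a toy illustration under stated assumptions, not the StarGAN v2 code: each mapping-network branch is modeled as a single hypothetical linear layer (`branches[domain]`), the style reconstruction loss is assumed to be a mean absolute (L1) difference, and the generator and the module that recovers the style from the generated image are replaced by a placeholder.

```python
import random


def mapping_network(z, domain, branches):
    """Map a latent code z to a style code using the branch for `domain`.

    branches[domain] is a hypothetical weight matrix (style_dim x latent_dim).
    """
    weights = branches[domain]
    return [sum(w * zi for w, zi in zip(row, z)) for row in weights]


def style_reconstruction_loss(style, recovered_style):
    """Mean absolute difference between the target and the recovered style."""
    diffs = [abs(a - b) for a, b in zip(style, recovered_style)]
    return sum(diffs) / len(diffs)


# Toy usage: sample a Gaussian latent code and pick the branch for domain 1.
rng = random.Random(0)
z = [rng.gauss(0.0, 1.0) for _ in range(2)]
branches = [
    [[1.0, 0.0], [0.0, 1.0]],  # domain 0
    [[2.0, 0.0], [0.0, 3.0]],  # domain 1
]
style = mapping_network(z, domain=1, branches=branches)

# Placeholder for the style recovered from the generated image.
recovered = [s + 0.1 for s in style]
loss = style_reconstruction_loss(style, recovered)
```

Selecting a branch by domain index is what lets one network serve every domain, while sampling different latent codes gives different styles within a domain.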