
Survey of Image Editing with GANs in SIGGRAPH'21

Udon
September 12, 2021

Yakiniku tabetai YO! (I want to eat yakiniku!)

  1. Contents — Image Editing with GANs
     • What is "Image Editing with GANs"?
     • Introduction
     • StyleGAN Architecture
     • Projection into StyleGAN
     • Enjoy SIGGRAPH'21 accepted papers!
     • Conclusion
     • Appendix — more recent papers (I ran out of stamina, so these get only a title and a one-line comment)
  2. What is "Image Editing with GANs"?
     Example tasks (shown on the slide as input/output pairs):
     • Virtual Try-On (input + reference → output)
     • Cartoonization
     • Appearance / Pose Editing
     • Attribute Editing (e.g. + Illumination, ± Pose, + Expression)
  3. StyleGAN Architecture — Introduction: Style Mixing
     [Architecture diagram: a Mapping Net produces codes in the W / W+
     spaces, which drive AdaIN at each resolution from 4 × 4 up to
     1024 × 1024, starting from a learned constant; coarse layers take
     the structure from Source A, fine layers take the style from
     Source B to produce the output.]
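The mixing operation in the diagram can be sketched numerically: each generator layer consumes one 512-dimensional code from W+, so copying coarse-layer codes from source A and fine-layer codes from source B combines A's structure with B's style. A minimal sketch, where the layer count, crossover point, and function names are illustrative rather than StyleGAN's actual API:

```python
import numpy as np

# Assumption: a 1024x1024 generator takes 18 style inputs (two per
# resolution from 4x4 up to 1024x1024), each a 512-dim vector.
NUM_LAYERS, STYLE_DIM = 18, 512

def style_mix(w_plus_a, w_plus_b, crossover):
    """Structure (coarse layers) from source A, style (fine layers)
    from source B: swap the per-layer codes at `crossover`."""
    assert w_plus_a.shape == w_plus_b.shape == (NUM_LAYERS, STYLE_DIM)
    mixed = w_plus_a.copy()
    mixed[crossover:] = w_plus_b[crossover:]
    return mixed

w_a = np.zeros((NUM_LAYERS, STYLE_DIM))   # stands in for source A's codes
w_b = np.ones((NUM_LAYERS, STYLE_DIM))    # stands in for source B's codes
mixed = style_mix(w_a, w_b, crossover=8)  # layers 0-7 (4x4..32x32) from A
```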
  4. StyleGAN Architecture — Introduction: StyleGAN vs StyleGAN2
     • StyleGAN [Karras+ CVPR19]: feature modulation by AdaIN [Huang+ ICCV17]
     • StyleGAN2 [Karras+ CVPR20]: weight demodulation (simplified style block)
     Both expose the extended W+ space.
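The two modulation schemes compared on this slide can be sketched side by side. This is a minimal NumPy illustration under simplified shapes, not the papers' implementations: AdaIN normalizes the feature statistics of each channel and re-styles them, while weight demodulation bakes the style into the convolution weights and rescales each output filter to unit expected norm.

```python
import numpy as np

def adain(x, y_scale, y_bias, eps=1e-8):
    """AdaIN (StyleGAN): per-channel normalize x, then apply a
    style-derived scale and bias. x: (C, H, W)."""
    mu = x.mean(axis=(1, 2), keepdims=True)
    sigma = x.std(axis=(1, 2), keepdims=True)
    return y_scale[:, None, None] * (x - mu) / (sigma + eps) + y_bias[:, None, None]

def demodulate(weight, style, eps=1e-8):
    """Weight demodulation (StyleGAN2): modulate conv weights by the
    style, then normalize each output filter to unit norm.
    weight: (out_c, in_c, k, k), style: (in_c,)."""
    w = weight * style[None, :, None, None]                # modulate
    sigma = np.sqrt((w ** 2).sum(axis=(1, 2, 3), keepdims=True) + eps)
    return w / sigma                                       # demodulate

rng = np.random.default_rng(0)
styled = adain(rng.normal(size=(3, 8, 8)),
               y_scale=np.array([2.0, 2.0, 2.0]), y_bias=np.zeros(3))
w_demod = demodulate(rng.normal(size=(4, 3, 3, 3)), rng.normal(size=3))
```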
  5. Projection into StyleGAN — Introduction: Motivation
     How can we embed real-world face images into the StyleGAN prior
     (the W+ space)? If we could, we could edit real-world images!
     (*Successfully reconstructing an image via embedding ≠ editable semantics)
  6. Projection into StyleGAN — Introduction: Image2StyleGAN
     [Abdal+ ICCV19] (iterative optimization)
     1. Initialize a latent in W+ space  2. Generate with G
     3. Compute the loss  4. Update w*, and repeat until the final result
     + Accurate reconstruction  − Slow  − No generalization to the space
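The four-step loop on this slide can be sketched with a toy stand-in for the generator. Assuming a linear G so the gradient has a closed form (the paper optimizes through the real StyleGAN with pixel and perceptual losses), the loop looks like:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(64, 16))          # toy "generator": x = A @ w
w_true = rng.normal(size=16)
x_target = A @ w_true                  # the "real image" to embed

w = np.zeros(16)                       # 1. init (paper: a mean latent)
lr = 1e-3
for _ in range(2000):
    x_hat = A @ w                      # 2. generate
    grad = 2 * A.T @ (x_hat - x_target)  # 3. loss gradient of ||x_hat - x||^2
    w -= lr * grad                     # 4. update w*
loss = float(np.sum((A @ w - x_target) ** 2))
```

This shows both listed drawbacks: the result is per-image (nothing generalizes to the space) and accuracy costs many iterations.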
  7. Projection into StyleGAN — Introduction: pSp [Alaluf+ CVPR21]
     (learned encoder)
     + Fast inference  − Can't reconstruct details  − Doesn't work for
     OoD samples  − The strong prior can't be ignored
     (Try searching "Face Depixelizer" on Twitter:
     https://twitter.com/notlewistbh/status/1432936600745431041?s=20)
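The encoder's behavior on out-of-distribution inputs can be illustrated with the same toy linear "generator": a one-pass inverse is fast, but it cannot leave the generator's range, so an OoD input gets pulled onto the prior. The pseudo-inverse here is a stand-in for a trained encoder, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(64, 16))            # toy "generator": x = A @ w
E = np.linalg.pinv(A)                    # stand-in for a learned encoder

x_in = A @ rng.normal(size=16)           # in-distribution sample
err_in = float(np.linalg.norm(A @ (E @ x_in) - x_in))    # near-perfect

x_ood = rng.normal(size=64)              # out-of-distribution sample
err_ood = float(np.linalg.norm(A @ (E @ x_ood) - x_ood)) # large residual
```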
  8. TryOnGAN [Lewis+]
     Contribution
     • Previous: paired image-to-image-translation-based try-on
     • Proposed: StyleGAN-based try-on
     • Designs a pose-conditioned StyleGAN2 with segmentation and
       image-generation branches
  9. TryOnGAN [Lewis+] — Overview
     [Diagram: an input image Ip and a reference garment image Ig are
     fed through G to produce the output.]
     • Generated images lose high-frequency details
     • Real images: high quality!
  10. TryOnGAN [Lewis+] — Results
      [Real-image inputs: input/reference pairs, previous methods vs
      ours; generated-image inputs: input/reference vs ours; the
      original StyleGAN2 baseline (not bad); a failure case on a real
      image (can't handle a Harajuku-style outfit).]
  11. StyleCariGAN [Jang+]
      Contribution
      • Shape exaggeration blocks: modulate coarse features to produce
        caricature shape exaggerations
      • A novel architecture for caricature generation
  12. StyleCariGAN [Jang+] — Method
      • CycleGAN-style approach: keep the content but transfer
        caricature style, using the 50 labels of WebCariA
      [Results: layer swap and style mixing, with blending weights α1–α4]
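The layer swap / style mixing with weights α1–α4 can be read as a per-layer blend between a photo model and a caricature-finetuned copy. The sketch below is a generic convex blend of per-layer features, not StyleCariGAN's exact shape-exaggeration blocks; all names and shapes are illustrative.

```python
import numpy as np

def blend_layers(photo_feats, cari_feats, alphas):
    """Per-layer convex blend: alpha=1 uses the caricature model's
    features, alpha=0 keeps the photo model's features."""
    return [(1 - a) * p + a * c
            for p, c, a in zip(photo_feats, cari_feats, alphas)]

photo = [np.zeros((4, 4)) for _ in range(4)]   # stand-in feature maps
cari = [np.ones((4, 4)) for _ in range(4)]
alphas = [1.0, 1.0, 0.5, 0.0]   # swap coarse layers, keep fine layers
blended = blend_layers(photo, cari, alphas)
```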
  13. AgileGAN [Song+]
      Contribution
      • Achieves high-quality stylistic portrait generation
      • Introduces a hierarchical VAE that embeds into Z+, enforcing
        that the inverse-mapped distribution follows the original prior
        (cf. W+)
  14. AgileGAN [Song+] — Method
      [Novel VAE architecture embedding into Z+, trained with the
      reparameterization trick; t-SNE visualization; content difference.]
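The reparameterization trick named on the slide is the standard VAE sampling device, independent of AgileGAN's specifics: randomness is isolated in a noise variable so the sample stays differentiable with respect to the predicted mean and variance. A minimal sketch with illustrative shapes:

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps, with eps ~ N(0, I); gradients can
    flow through mu and log_var because eps carries the randomness."""
    eps = rng.normal(size=np.shape(mu))
    return mu + np.exp(0.5 * log_var) * eps

rng = np.random.default_rng(0)
# 100k samples with mean 1.0 and std 0.5 (log_var = 2 * log 0.5)
z = reparameterize(np.full(100_000, 1.0),
                   np.full(100_000, 2 * np.log(0.5)), rng)
```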
  15. e4e [Tov+]
      Contribution
      • Studies the latent space of StyleGAN
      • Proposes to consider both the distortion and the perceptual
        quality of the reconstructed image
      • Proposes two principles for designing encoders: control
        proximity to W based on the distortion-editability and
        distortion-perception tradeoffs within the StyleGAN latent space
  16. e4e [Tov+] — Method
      W: the prior, W^k: OoD latents. OoD latents achieve better
      distortion at the cost of editability (a tradeoff), so the
      objectives restrict the variance of the latents (blue arrow) and
      guide them towards the prior W (red arrow), trained end-to-end.
  17. SWAGAN [Gal+]
      Contribution
      • Previous GANs suffer from quality degradation for
        high-frequency content…
      • Proposes the Style and WAvelet based GAN (SWAGAN), which
        implements progressive generation in the Haar-wavelet frequency
        domain (not RGB)
      • Achieves faster training (0.25× the time)
      • (Weakness) Encoder-based inversion methods suffer from acute
        high-frequency shortcomings because of their L2-based losses
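Generating in the Haar frequency domain means predicting wavelet bands instead of RGB pixels. One level of the 2D Haar split can be sketched as below; the 1/4 normalization is one common convention, and the paper's may differ.

```python
import numpy as np

def haar2d(img):
    """One level of the 2D Haar wavelet transform: split an image into a
    low-frequency band (LL) and three high-frequency detail bands."""
    a = img[0::2, 0::2]; b = img[0::2, 1::2]
    c = img[1::2, 0::2]; d = img[1::2, 1::2]
    ll = (a + b + c + d) / 4   # low-frequency average
    lh = (a + b - c - d) / 4   # horizontal-edge detail
    hl = (a - b + c - d) / 4   # vertical-edge detail
    hh = (a - b - c + d) / 4   # diagonal detail
    return ll, lh, hl, hh

img = np.full((4, 4), 5.0)     # a flat, purely low-frequency "image"
ll, lh, hl, hh = haar2d(img)   # all detail bands are zero
```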
  18. SWAGAN [Gal+] — Results (1/2)
      [Generated samples; comparison of time in seconds to process
      1,000 images. Variants: SWAGAN-Bi (proposed), NWD (non-wavelet
      discriminator), NU (neural upsample).]
  19. StyleFlow [Abdal+]
      Contributions
      • Proposes StyleFlow, which controls the generation process with
        attribute conditions and is formulated as conditional continuous
        normalizing flows (an invertible map between an unknown data
        distribution and a known base distribution)
      • Reference: the "Introduction to Normalizing Flows" series at
        https://tatsy.github.io/blog/ (the best overview articles on
        normalizing flows!)
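The flow mechanics behind StyleFlow can be sketched in one dimension: an invertible map carries the data to a known Gaussian, and the change-of-variables formula gives exact log-likelihoods. A 1-D affine flow stands in here for the conditional continuous normalizing flow; everything is illustrative.

```python
import numpy as np

mu, sigma = 2.0, 0.5           # parameters of the toy affine flow

def forward(w):                # data space -> known base space
    return (w - mu) / sigma

def inverse(z):                # base space -> data space, exactly
    return z * sigma + mu

def log_prob(w):
    """log p(w) = log N(f(w); 0, 1) + log |df/dw| (change of variables)."""
    z = forward(w)
    log_base = -0.5 * (z ** 2 + np.log(2 * np.pi))
    log_det = -np.log(sigma)   # df/dw = 1/sigma, constant for affine f
    return log_base + log_det
```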
  20. Conclusion
      • Introduced the StyleGAN architecture and projection methods
      • Explained six papers from the "Image Editing with GANs" session
      • Surprisingly, all the papers employ StyleGAN!!
  21. Appendix — GAN Inversion
      • Image2StyleGAN: How to Embed Images Into the StyleGAN Latent
        Space? [Abdal+ ICCV19]: the first paper to embed images into
        StyleGAN
      • Image2StyleGAN++: How to Edit the Embedded Images?
        [Abdal+ CVPR20]: about semantic editing
      • ReStyle: A Residual-Based StyleGAN Encoder via Iterative
        Refinement [Alaluf+ ICCV21]: iteratively adds residuals to the
        predictions of pSp [Alaluf+ CVPR21]
  22. Appendix — Spaces of StyleGAN
      • StyleSpace Analysis: Disentangled Controls for StyleGAN Image
        Generation [Wu+ CVPR21]: defines the Conv feature space as the
        S space and finds it more disentangled than W+