
Survey of Image Editing with GANs in SIGGRAPH'21

Udon
September 12, 2021


Transcript

  1. 2021.9.12 Daichi Horita — Image Editing with GANs, SIGGRAPH 2021 Study Group https://siggraph.xyz/s2021/
  2. Self-introduction • I'm "Udon" https://twitter.com/udoooom • Recommended: the udon shop "Udon Baka Ichidai" in Kagawa https://www.shikoku-np.co.jp/udon/shop/890

  3. Contents — Image Editing with GANs • What is "Image Editing with GANs"? • Introduction • StyleGAN Architecture • Projection into StyleGAN • Enjoy SIGGRAPH'21 accepted papers! • Conclusion • Appendix — more recent papers (I ran out of stamina, so titles and one-line notes only)
  4. What is "Image Editing with GANs"? • Virtual Try-On (input + reference → output) • Cartoonization • Appearance / Pose Editing • Attribute Editing (e.g. + illumination, ± pose, + expression)
  5. StyleGAN Architecture — Introduction: Original StyleGAN [Karras+ CVPR19] • PGGAN [Karras+ CVPR18] → StyleGAN [Karras+ CVPR19] • Latent spaces W and W+ • [Figure: images generated by StyleGAN]
  6. StyleGAN Architecture — Introduction: Style Mixing • A mapping network produces w (space W); supplying a different w to each resolution layer (4×4, 8×8, 16×16, …, 512×512, 1024×1024) gives the extended space W+ • Styles are injected via AdaIN, starting from a learned constant input: structure from source A, style from source B
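To make the AdaIN step concrete, here is a minimal NumPy sketch (not StyleGAN's actual code): the content features are instance-normalized per channel, then scaled and shifted by per-channel style parameters, which in StyleGAN come from a learned affine transform of w.

```python
import numpy as np

def adain(content, style_scale, style_bias, eps=1e-5):
    """Adaptive Instance Normalization (AdaIN), simplified sketch.

    content: feature map of shape (C, H, W).
    style_scale, style_bias: per-channel style parameters of shape (C,)
    (in StyleGAN these come from an affine transform of w).
    """
    mu = content.mean(axis=(1, 2), keepdims=True)      # per-channel mean
    sigma = content.std(axis=(1, 2), keepdims=True)    # per-channel std
    normalized = (content - mu) / (sigma + eps)        # instance-normalize
    return style_scale[:, None, None] * normalized + style_bias[:, None, None]

x = np.random.randn(3, 8, 8)
# after AdaIN, each channel has (approximately) the requested statistics
y = adain(x, style_scale=np.array([2.0, 2.0, 2.0]),
          style_bias=np.array([1.0, 1.0, 1.0]))
```

The point of style mixing is simply that different layers can receive style parameters computed from different latents.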
  7. StyleGAN Architecture — Introduction: StyleGAN vs. StyleGAN2 • StyleGAN [Karras+ CVPR19]: feature modulation by AdaIN [Huang+ ICCV17] • StyleGAN2 [Karras+ CVPR20]: weight demodulation (simplified style block) • Both use W+
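The weight-demodulation idea can be sketched in NumPy (a simplified version of the operation described in the StyleGAN2 paper, not its official code): the style scales the convolution weights per input channel, and each output filter is then rescaled to roughly unit norm, replacing the explicit AdaIN normalization.

```python
import numpy as np

def demodulate(weight, style, eps=1e-8):
    """StyleGAN2-style weight modulation + demodulation (sketch).

    weight: conv weights of shape (out_ch, in_ch, k, k).
    style:  per-input-channel scales of shape (in_ch,), assumed to come
            from an affine transform of w as in the paper.
    """
    w = weight * style[None, :, None, None]                  # modulate
    demod = 1.0 / np.sqrt((w ** 2).sum(axis=(1, 2, 3)) + eps)
    return w * demod[:, None, None, None]                    # demodulate

weight = np.random.randn(4, 3, 3, 3)
style = np.random.rand(3) + 0.5
w2 = demodulate(weight, style)
# each output filter now has (approximately) unit L2 norm
```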
  8. Projection into StyleGAN — Introduction: Motivation • How can we embed a real-world face image into the StyleGAN prior (W+ space)? • If we succeed, we can edit real-world images! • (Note: succeeding at reconstruction via embedding ≠ editable semantics)
  9. Projection into StyleGAN — Introduction: Image2StyleGAN [Abdal+ ICCV19] (iterative optimization) • Loop in W+ space: 1. initialize w → 2. generate G(w) → 3. compute loss → 4. update w, yielding w* • Accurate reconstruction, but slow, with no generalization across the space
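The init/generate/loss/update loop above can be sketched as follows. This is a toy stand-in, not Image2StyleGAN itself: the "generator" is a fixed random linear map and the loss is plain L2, whereas the real method optimizes w through StyleGAN with perceptual + pixel losses.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((16, 4))      # toy linear "generator": G(w) = A @ w
w_true = rng.standard_normal(4)
target = A @ w_true                   # the "real image" to reconstruct

w = np.zeros(4)                       # 1. init
lr = 0.01
for _ in range(500):
    recon = A @ w                     # 2. generate
    grad = 2 * A.T @ (recon - target) # 3. gradient of the L2 loss
    w -= lr * grad                    # 4. update

loss = np.sum((A @ w - target) ** 2)  # near-perfect reconstruction
```

Even in this toy setting the per-image optimization takes hundreds of steps, which illustrates why the iterative approach is slow compared to a learned encoder.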
  10. Projection into StyleGAN — Introduction: pSp [Alaluf+ CVPR21] (learned encoder) • Fast inference, but can't reconstruct fine details and doesn't work for out-of-distribution samples • The strong face prior can't be ignored (search "Face Depixelizer" on Twitter): https://twitter.com/notlewistbh/status/1432936600745431041?s=20
  11. Enjoy SIGGRAPH'21 accepted papers! • TryOnGAN [Lewis+] • StyleCariGAN [Jang+] • AgileGAN [Song+] • e4e [Tov+] • SWAGAN [Gal+] • StyleFlow [Abdal+]
  12. TryOnGAN [Lewis+]

  13. TryOnGAN [Lewis+] — Contribution • Previous: paired image-to-image-translation-based try-on • Proposed: StyleGAN-based try-on • Designs a pose-conditioned StyleGAN2 with segmentation and image-generation branches
  14. TryOnGAN [Lewis+] — Overview • Input person image Ip and reference garment image Ig → G → output • Compared to the real image, the generated image loses some high-frequency details, but is still high quality!
  15. TryOnGAN [Lewis+] — Method • 1st stage: train StyleGAN2 • 2nd stage: optimize σp, σq — style mixing per layer!
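Per-layer style mixing can be sketched as a per-layer convex combination of two W+ latents, one per source image; the layer names and coefficient layout below are illustrative assumptions, not TryOnGAN's actual variables.

```python
import numpy as np

def mix_styles(w_person, w_garment, sigmas):
    """Blend two W+ latents with one coefficient per generator layer.

    w_person, w_garment: latents of shape (num_layers, dim).
    sigmas: (num_layers,) mixing coefficients in [0, 1].
    """
    s = sigmas[:, None]
    return s * w_garment + (1.0 - s) * w_person

num_layers, dim = 14, 8
wp = np.zeros((num_layers, dim))      # stand-in "person" latent
wg = np.ones((num_layers, dim))       # stand-in "garment" latent
# e.g. take garment styles only in some middle layers (hypothetical choice)
sigmas = np.array([0.0] * 4 + [1.0] * 6 + [0.0] * 4)
mixed = mix_styles(wp, wg, sigmas)
```

Optimizing the sigmas (instead of hand-picking them) is what lets the method find which layers should carry garment information.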
  16. TryOnGAN [Lewis+] — Results • Real-image and generated-image settings: input / reference / previous methods / ours • The original StyleGAN2 baseline is not bad • Failure case on a real image ("can't do Harajuku")
  17. StyleCariGAN [Jang+]

  18. StyleCariGAN [Jang+] — Contribution • Shape exaggeration blocks: modulate coarse features to produce caricature shape exaggerations • A novel architecture for caricature generation
  19. StyleCariGAN [Jang+] — Method • CycleGAN-style approach: keep content but transfer caricature style, using the 50 labels of WebCariA • Layer swap and style mixing (exaggeration degrees α1 … α4)
  20. StyleCariGAN [Jang+] — Results • vs. StyleGAN inversion • vs. I2I translation (including caricature-to-real) • FID comparison vs. I2I translation
  21. AgileGAN [Song+]

  22. AgileGAN [Song+] — Contribution • Generates high-quality stylistic portraits • Introduces a hierarchical VAE that embeds images into Z+ (rather than W+), enforcing that the inverse-mapped distribution follows the original prior
  23. AgileGAN [Song+] — Method • Novel VAE architecture (embedding in Z+) using the reparameterization trick • t-SNE visualization • Content difference
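As a reminder, the reparameterization trick samples z = μ + σ·ε with ε ~ N(0, I), so gradients can flow through μ and σ. The sketch below shows a single latent; AgileGAN's hierarchical VAE applies this per layer in Z+ (the dimensions here are illustrative assumptions).

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    """Sample z = mu + sigma * eps, eps ~ N(0, I) (reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

mu = np.full(512, 2.0)           # encoder-predicted mean (toy values)
log_var = np.zeros(512)          # log variance 0 => sigma = 1
samples = np.stack([reparameterize(mu, log_var) for _ in range(2000)])
# empirical statistics match N(mu, sigma^2)
```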
  24. AgileGAN [Song+] — Results • vs. I2I translation • vs. inversion (using a fine-tuned stylization model) • Ablation • Semantic editing
  25. e4e [Tov+]

  26. e4e [Tov+] — Contribution • Studies the latent space of StyleGAN • Proposes to consider both the distortion and the perceptual quality of the reconstructed image • Proposes two principles for designing encoders that control proximity to W, based on a distortion-editability tradeoff and a distortion-perception tradeoff within the StyleGAN latent space
  27. e4e [Tov+] — Method • W: prior, Wk: OoD latents • OoD latents achieve better distortion at the cost of editability (tradeoff) • Objectives on the latents: restrict variance (blue arrow) and guide towards the prior W (red arrow) • End-to-end architecture
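The "proximity to W" control can be sketched as follows, in a deliberately simplified form (an assumption for illustration, not e4e's actual code): the encoder predicts one base code plus per-layer offsets, and an L2 penalty on the offsets pulls the W+ code toward a single shared w, i.e. toward W.

```python
import numpy as np

def expand_code(w_base, deltas):
    """Build a W+ code from a base code plus per-layer offsets.

    w_base: (dim,) shared code; deltas: (num_layers, dim) offsets.
    """
    return w_base[None, :] + deltas

def offset_penalty(deltas):
    """L2 penalty: driving deltas to 0 collapses W+ back onto W."""
    return (deltas ** 2).sum()

w_base = np.ones(8)
deltas = np.zeros((14, 8))           # zero offsets: the code lies exactly in W
code = expand_code(w_base, deltas)
```

Weighting this penalty in the training loss is one knob on the distortion-editability tradeoff: larger offsets reconstruct better but edit worse.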
  28. e4e [Tov+] — Results • Editing: vs. pSp [Alaluf+ CVPR21] • vs. optimization (optimization is unsuitable for editing)
  29. SWAGAN [Gal+]

  30. SWAGAN [Gal+] — Contribution • Previous GANs suffer from quality degradation in high-frequency content • Proposes a Style and WAvelet based GAN (SWAGAN) that implements progressive generation in the Haar-wavelet frequency domain (not RGB) • Achieves faster training (~0.25× the time) • (Weakness) Encoder-based inversion methods suffer from acute high-frequency shortcomings, due to their L2-based losses
  31. SWAGAN [Gal+] — Method • Overall architecture: generator and discriminator operate on wavelet coefficients • To upsample, features are first converted back to the RGB domain
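To show what "the Haar-wavelet frequency domain" means concretely, here is a minimal single-level 2D Haar transform and its inverse in NumPy (a sketch of the representation SWAGAN generates in, not its implementation): each 2×2 block is split into a low-frequency band and three detail bands.

```python
import numpy as np

def haar2d(img):
    """Single-level 2D Haar transform of a (2h, 2w) image."""
    a = img[0::2, 0::2]; b = img[0::2, 1::2]
    c = img[1::2, 0::2]; d = img[1::2, 1::2]
    ll = (a + b + c + d) / 2     # low-frequency (coarse) band
    lh = (a - b + c - d) / 2     # horizontal detail
    hl = (a + b - c - d) / 2     # vertical detail
    hh = (a - b - c + d) / 2     # diagonal detail
    return ll, lh, hl, hh

def ihaar2d(ll, lh, hl, hh):
    """Inverse transform: perfect reconstruction of the original image."""
    h, w = ll.shape
    img = np.empty((2 * h, 2 * w))
    img[0::2, 0::2] = (ll + lh + hl + hh) / 2
    img[0::2, 1::2] = (ll - lh + hl - hh) / 2
    img[1::2, 0::2] = (ll + lh - hl - hh) / 2
    img[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return img

x = np.random.rand(8, 8)
bands = haar2d(x)
x_rec = ihaar2d(*bands)
```

Because the detail bands carry the high frequencies explicitly, a generator trained on these coefficients is pushed to model high-frequency content directly rather than as a by-product of RGB synthesis.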
  32. SWAGAN [Gal+] — Results (1/2) • Generated samples • Comparison of time [s] to process 1,000 images • Bi: proposed, NWD: non-wavelet discriminator, NU: neural upsample (SWAGAN-Bi vs. SWAGAN-NU)
  33. SWAGAN [Gal+] — Results (2/2) • Optimized latent code interpolation

  34. StyleFlow [Abdal+]

  35. StyleFlow [Abdal+] — Contributions • Proposes StyleFlow, which controls the generation process with attribute conditions and is formulated with conditional continuous normalizing flows • An invertible normalizing flow maps an unknown distribution to a known distribution • Reference: the "Introduction to Normalizing Flows" series at https://tatsy.github.io/blog/ — the best overview articles on normalizing flows!
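The invertibility idea can be shown with the simplest possible flow, a 1-D affine map (my toy example, far simpler than the conditional CNFs StyleFlow uses): y = a·x + b is invertible, and the change-of-variables formula turns a known base density into the density of y via the log-determinant of the Jacobian.

```python
import numpy as np

def forward(x, a, b):
    """Affine flow y = a*x + b with its per-sample log|det Jacobian|."""
    y = a * x + b
    log_det = np.log(np.abs(a)) * np.ones_like(x)   # dy/dx = a
    return y, log_det

def inverse(y, a, b):
    return (y - b) / a

def log_prob(y, a, b):
    """Density of y under the flow, base distribution N(0, 1)."""
    x = inverse(y, a, b)                            # map back to base space
    base = -0.5 * (x ** 2 + np.log(2 * np.pi))      # standard normal log-density
    return base - np.log(np.abs(a))                 # change of variables

a, b = 2.0, 1.0
x = np.linspace(-3, 3, 7)
y, _ = forward(x, a, b)
```

A continuous normalizing flow replaces the fixed affine map with an ODE-defined transformation, which is why StyleFlow trains with a neural ODE solver.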
  36. StyleFlow [Abdal+] — Method • Optimized with a neural ODE solver [Chen+ NeurIPS18 Best Paper]
  37. StyleFlow [Abdal+] — Results

  38. Conclusion • Introduced the StyleGAN architecture and projection methods • Explained the six papers of the "Image Editing with GANs" session • Surprisingly, all of the papers employ StyleGAN!!
  39. Appendix — GAN Inversion • Image2StyleGAN: How to Embed Images Into the StyleGAN Latent Space? [Abdal+ ICCV19]: the first paper to embed images into StyleGAN • Image2StyleGAN++: How to Edit the Embedded Images? [Abdal+ CVPR20]: on semantic editing • ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement [Alaluf+ ICCV21]: iteratively adds residuals to the predictions of pSp [Alaluf+ CVPR21]
  40. Appendix — Space of StyleGAN • StyleSpace Analysis: Disentangled Controls for StyleGAN Image Generation [Wu+ CVPR21]: defines the Conv feature space as the S space and finds it more disentangled than W+