Survey of Image Editing with GANs in SIGGRAPH'21

Slide 1

Slide 1 text

Image Editing with GANs
SIGGRAPH 2021 Study Group: https://siggraph.xyz/s2021/
2021.9.12 Daichi Horita

Slide 2

Slide 2 text

Self-introduction
• I am “Udon”: https://twitter.com/udoooom
• I recommend “Udon Baka Ichidai” in Kagawa: https://www.shikoku-np.co.jp/udon/shop/890

Slide 3

Slide 3 text

Contents
Image Editing with GANs
• What is “Image Editing with GANs”?
• Introduction
• StyleGAN Architecture
• Projection into StyleGAN
• Enjoy SIGGRAPH’21 accepted papers!
• Conclusion
• Appendix: More recent papers (I ran out of stamina, so titles and one-line comments only)

Slide 4

Slide 4 text

What is “Image Editing with GANs”?
• Virtual Try On: Input, Reference, Output
• Cartoonization: Input, Output
• Appearance / Pose Editing: Input, Appearance Editing, Pose Editing
• Attribute Editing: Input, + Illumination, + Pose, - Pose, + Expression

Slide 5

Slide 5 text

StyleGAN Architecture
Introduction: Original StyleGAN[Karras+ CVPR19]
[Figure: generator architectures of PGGAN[Karras+ CVPR18] and StyleGAN[Karras+ CVPR19], with the W and W+ latent spaces marked; generated images by StyleGAN]

Slide 6

Slide 6 text

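StyleGAN Architecture
Introduction: Style Mixing
[Figure: the Mapping Net produces a code in Space W; per-layer codes (Space W+) are injected via AdaIN into synthesis layers that grow from a const. input at 4 × 4, 8 × 8, 16 × 16, ..., 512 × 512, 1024 × 1024. Structure comes from Source A and style from Source B to form the Output.]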
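Style mixing in W+ can be sketched as follows. This is a minimal illustration, not the StyleGAN code: `sample_w` is a hypothetical stand-in for the mapping network, the latent dimension is shrunk to 8, and the crossover layer is an arbitrary choice.

```python
import random

NUM_LAYERS = 18   # a 1024x1024 StyleGAN generator takes 18 per-layer style inputs
LATENT_DIM = 8    # kept tiny for illustration; StyleGAN uses 512


def sample_w(rng, dim=LATENT_DIM):
    """Hypothetical stand-in for the mapping network output: one vector in W."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def to_w_plus(w, num_layers=NUM_LAYERS):
    """Broadcast a single W vector to W+ (one copy per generator layer)."""
    return [list(w) for _ in range(num_layers)]


def style_mix(w_plus_a, w_plus_b, crossover):
    """Use A's codes for layers [0, crossover) and B's codes for the remaining layers."""
    coarse = [list(v) for v in w_plus_a[:crossover]]
    fine = [list(v) for v in w_plus_b[crossover:]]
    return coarse + fine


rng = random.Random(0)
w_a, w_b = sample_w(rng), sample_w(rng)
# Early (low-resolution) layers follow Source A, later layers follow Source B.
mixed = style_mix(to_w_plus(w_a), to_w_plus(w_b), crossover=8)
```

Because the mixed code has a different vector per layer, it lives in W+ rather than W.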

Slide 7

Slide 7 text

StyleGAN Architecture
Introduction: StyleGAN vs StyleGAN2
• StyleGAN[Karras+ CVPR19]: Feature Modulation by AdaIN[Huang+ ICCV17] (W+)
• StyleGAN2[Karras+ CVPR20]: Weight Demodulation, which simplifies the Style Block (W+)
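The difference between the two mechanisms can be sketched in a few lines. This is a simplified illustration under stated assumptions: features are a single flat channel, the convolution is 1x1, and the function names are hypothetical rather than taken from either codebase.

```python
import math


def adain(channel, style_scale, style_bias, eps=1e-8):
    """AdaIN on one feature channel: normalize the activations, then apply the style."""
    mean = sum(channel) / len(channel)
    var = sum((x - mean) ** 2 for x in channel) / len(channel)
    return [style_scale * (x - mean) / math.sqrt(var + eps) + style_bias
            for x in channel]


def modulate_demodulate(weights, styles, eps=1e-8):
    """Weight modulation and demodulation for a 1x1 convolution.

    weights[j][i] is the weight from input channel i to output channel j;
    styles[i] is the per-input-channel scale derived from the latent code.
    """
    result = []
    for row in weights:
        modulated = [s * w for s, w in zip(styles, row)]        # scale the weights, not the features
        norm = math.sqrt(sum(m * m for m in modulated) + eps)   # expected output std per output channel
        result.append([m / norm for m in modulated])            # demodulate to unit norm
    return result
```

AdaIN normalizes the actual activations, whereas weight demodulation rescales the convolution weights so that the output statistics are normalized in expectation.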

Slide 8

Slide 8 text

Projection into StyleGAN
Introduction: Motivation
[Figure: a Real-world Face Image is embedded into W+ Space]
• How can we embed real images in the StyleGAN prior?
• If this succeeds, we can edit real-world images!
• (*Successfully reconstructing an image via embedding ≠ semantic editability)

Slide 9

Slide 9 text

Projection into StyleGAN
Introduction: Image2StyleGAN[Abdal+ ICCV19] (Iterative Optimization)
1. Init (a latent code in W+ Space)
2. Generate (with G)
3. Calc loss
4. Update
Final: w*
• Accurate reconstructed result
• Slow
• No generalization over the latent space
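The four-step loop on this slide can be sketched with a toy problem. This is an assumption-laden illustration: the frozen generator is replaced by a fixed linear map, the loss is plain MSE, and the update is plain gradient descent, whereas the actual method optimizes through StyleGAN with a combined perceptual and pixel loss.

```python
import random


def generate(weights, w):
    """Toy stand-in for a frozen generator G: a fixed linear map from latent to 'image'."""
    return [sum(a * x for a, x in zip(row, w)) for row in weights]


def mse(a, b):
    """Mean squared error between two flat 'images'."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def invert(weights, target, steps=2000, lr=0.1):
    """Iteratively optimize a latent code w so that G(w) matches the target."""
    n, dim = len(target), len(weights[0])
    w = [0.0] * dim                                            # 1. Init
    for _ in range(steps):
        image = generate(weights, w)                           # 2. Generate
        residual = [g - t for g, t in zip(image, target)]      # 3. Calc loss (MSE residual)
        grad = [2.0 / n * sum(weights[i][j] * residual[i] for i in range(n))
                for j in range(dim)]
        w = [x - lr * g for x, g in zip(w, grad)]              # 4. Update
    return w                                                   # Final: w*


rng = random.Random(0)
G = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(16)]
target = generate(G, [rng.gauss(0.0, 1.0) for _ in range(4)])
w_star = invert(G, target)
```

The loop has to be rerun from scratch for every new image, which is why this family of methods is accurate but slow.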

Slide 10

Slide 10 text

Projection into StyleGAN
Introduction: pSp[Alaluf+ CVPR21] (Learning Encoder)
• Fast inference
• Can’t reconstruct details
• Does not work for out-of-distribution (OoD) samples
• Can’t ignore the strong prior
[Figure: Input and Output example, https://twitter.com/notlewistbh/status/1432936600745431041?s=20]
Please search for “Face Depixelizer” on Twitter

Slide 11

Slide 11 text

Enjoy SIGGRAPH’21 accepted papers!
• TryOnGAN[Lewis+]
• StyleCariGAN[Jang+]
• AgileGAN[Song+]
• e4e[Tov+]
• SWAGAN[Gal+]
• StyleFlow[Abdal+]

Slide 12

Slide 12 text

TryOnGAN[Lewis+]

Slide 13

Slide 13 text

TryOnGAN[Lewis+]
Contribution
• Previous: try-on based on paired image-to-image translation
• Proposed: StyleGAN-based try-on
• Design a pose-conditioned StyleGAN2 with segmentation and image generation branches

Slide 14

Slide 14 text

TryOnGAN[Lewis+]
Overview
[Figure: Input Ip and Reference g pass through G to give Output Ig; shown for a Generated Image and a Real Image]
• Lost high-frequency details
• High-quality!

Slide 15

Slide 15 text

TryOnGAN[Lewis+]
Method
• 1st: Train StyleGAN2
• 2nd: Optimize σp, σq
• Style mixing per layer!

Slide 16

Slide 16 text

TryOnGAN[Lewis+]
Results
• Real-Image: In., Ref., Previous methods, Ours
• Generated-Image: In., Ref., Ours; Original StyleGAN2 (NOT BAD)
• Failure on a real image (Can’t Harajuku)

Slide 17

Slide 17 text

StyleCariGAN[Jang+]

Slide 18

Slide 18 text

StyleCariGAN[Jang+]
Contribution
• Shape Exaggeration Blocks
• Modulate coarse features to produce caricature shape exaggerations
• Novel Architecture for Caricature Generation

Slide 19

Slide 19 text

StyleCariGAN[Jang+]
Method
[Figure panels: Layer Swap; Style Mixing Results (α 1, α 4)]
• CycleGAN-approach
• Keep contents but transfer caricature style w/ WebCariA 50 Label

Slide 20

Slide 20 text

StyleCariGAN[Jang+]
Result
[Figure panels: vs. StyleGAN Inversion; vs. I2I Translation; Caricature to Real; FID vs. I2I Translation]

Slide 21

Slide 21 text

AgileGAN[Song+]

Slide 22

Slide 22 text

AgileGAN[Song+]
Contribution
• Generates high-quality stylistic portraits.
• Introduces a hierarchical VAE that embeds images in Z+, so that the inversely mapped distribution conforms to the original prior.
[Figure labels: Z+, W+]

Slide 23

Slide 23 text

AgileGAN[Song+]
Method
[Figure panels: Novel VAE Architecture (Embed in Z+); Reparameterization Trick; t-SNE Visualization; Content Difference]
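The reparameterization trick named on this slide can be sketched as below. This is a generic VAE illustration with hypothetical function names, not the AgileGAN implementation; the per-layer variant at the end is an assumption about how a Z+ code (one Gaussian sample per generator layer) could be drawn.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """z = mu + sigma * eps with eps ~ N(0, I), so sampling stays differentiable in mu and log_var."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, I)): the term that pulls encoded codes toward the prior."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, log_var))


def encode_z_plus(per_layer_params, rng):
    """Assumed hierarchical variant: one (mu, log_var) pair per layer yields a Z+ code."""
    return [reparameterize(mu, lv, rng) for mu, lv in per_layer_params]
```

The KL term is what keeps the distribution of inverted codes close to the Gaussian prior that the generator was originally trained with.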

Slide 24

Slide 24 text

AgileGAN[Song+]
Results
[Figure panels: vs. I2I Translation; vs. Inversion (Use fine-tuned stylization model); Ablation; Semantic Editing]

Slide 25

Slide 25 text

e4e[Tov+]

Slide 26

Slide 26 text

e4e[Tov+]
Contribution
• Study the latent space of StyleGAN
• Propose to consider the distortion and perceptual quality of the reconstructed image.
• Propose two principles for designing encoders that control proximity to W, based on a distortion-editability tradeoff and a distortion-perception tradeoff within the StyleGAN latent space.

Slide 27

Slide 27 text

e4e[Tov+]
Method
• W: prior, Wk: OoD latent
• OoD latents achieve lower distortion but worse editability (tradeoff)
[Figure: Overview of objectives for latents between W and Wk: Restrict variance (Blue Arrow), Guide towards prior (Red Arrow)]
[Figure: End-to-End Architecture]
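The "restrict variance" objective can be sketched as follows, assuming the encoder predicts one base code plus per-layer offsets, which is how I understand the e4e design. The function names are hypothetical, and the "guide towards prior" objective (a latent discriminator in the paper) is not shown.

```python
import math


def build_w_plus(w, deltas):
    """First layer uses the base code w; each later layer uses w plus its own offset."""
    return [list(w)] + [[a + b for a, b in zip(w, d)] for d in deltas]


def delta_regularizer(deltas):
    """Sum of L2 norms of the offsets; small values keep all layer codes close to one W code."""
    return sum(math.sqrt(sum(x * x for x in d)) for d in deltas)
```

Driving the regularizer to zero collapses the W+ code onto a single W vector, which trades reconstruction accuracy for editability.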

Slide 28

Slide 28 text

e4e[Tov+]
Results
[Figure panels: vs. pSp (pSp[Alaluf+ CVPR21]); Editing vs. Optimization]
• Optimization is unsuitable for editing.

Slide 29

Slide 29 text

SWAGAN[Gal+]

Slide 30

Slide 30 text

SWAGAN[Gal+]
Contribution
• Previous GANs suffer from degraded quality for high-frequency content.
• Propose the Style and WAvelet based GAN (SWAGAN), which implements progressive generation in the frequency domain of Haar wavelets (not RGB).
• Achieves faster training (x0.25 time)
• (Weakness) Inversion methods using encoders suffer from acute high-frequency shortcomings because they use L2-based losses.
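A one-level 2D Haar transform, the representation SWAGAN generates in, can be sketched as below. This is a minimal pure-Python illustration for a single-channel image with even height and width; the band names are my own labels and the code is not taken from the SWAGAN implementation.

```python
def haar_2d(image):
    """One-level 2D Haar transform: returns (low, d_col, d_row, d_diag) at half resolution."""
    h, w = len(image), len(image[0])
    low, d_col, d_row, d_diag = ([[0.0] * (w // 2) for _ in range(h // 2)]
                                 for _ in range(4))
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = image[i][j], image[i][j + 1]
            c, d = image[i + 1][j], image[i + 1][j + 1]
            low[i // 2][j // 2] = (a + b + c + d) / 2.0     # local average (low frequency)
            d_col[i // 2][j // 2] = (a - b + c - d) / 2.0   # difference across columns
            d_row[i // 2][j // 2] = (a + b - c - d) / 2.0   # difference across rows
            d_diag[i // 2][j // 2] = (a - b - c + d) / 2.0  # diagonal difference
    return low, d_col, d_row, d_diag


def inverse_haar_2d(low, d_col, d_row, d_diag):
    """Exact inverse of haar_2d: rebuilds the full-resolution image from the four bands."""
    h, w = len(low) * 2, len(low[0]) * 2
    image = [[0.0] * w for _ in range(h)]
    for i in range(h // 2):
        for j in range(w // 2):
            l, x, y, z = low[i][j], d_col[i][j], d_row[i][j], d_diag[i][j]
            image[2 * i][2 * j] = (l + x + y + z) / 2.0
            image[2 * i][2 * j + 1] = (l - x + y - z) / 2.0
            image[2 * i + 1][2 * j] = (l + x - y - z) / 2.0
            image[2 * i + 1][2 * j + 1] = (l - x - y + z) / 2.0
    return image
```

The three detail bands carry the high-frequency content explicitly, so a generator that outputs them directly does not have to recover fine detail through upsampling alone.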

Slide 31

Slide 31 text

SWAGAN[Gal+]
Method
[Figure: Overall architecture of the Generator and Discriminator]
• To upsample, the wavelet coefficients are first converted to the RGB domain

Slide 32

Slide 32 text

SWAGAN[Gal+]
Results (1/2)
[Figure: Generated samples from SWAGAN-Bi and SWAGAN-NU]
[Table: Comparison of time [s] to process 1,000 imgs]
• Bi: Proposed, NWD: Non-Wavelet-Discriminator, NU: Neural Upsample

Slide 33

Slide 33 text

SWAGAN[Gal+]
Results (2/2)
[Figure: Optimized latent code interpolation]

Slide 34

Slide 34 text

StyleFlow[Abdal+]

Slide 35

Slide 35 text

StyleFlow[Abdal+]
Contributions
• Propose StyleFlow, which controls the generation process through attribute conditions and is formulated as Conditional Continuous Normalizing Flows.
• Reference
  • “Introduction to Normalizing Flow” series, https://tatsy.github.io/blog/
  • The best overview article on normalizing flows!
[Figure: an Invertible Normalizing Flow maps between an Unknown Distribution and a Known Distribution]
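The idea of an invertible map between an unknown and a known distribution can be sketched with the simplest possible flow. Note the simplification: this is a discrete, unconditional, elementwise affine flow with hypothetical names, not the conditional continuous flow that StyleFlow uses.

```python
import math


class AffineFlow:
    """Elementwise invertible map z = (x - shift) * exp(-log_scale)."""

    def __init__(self, shift, log_scale):
        self.shift = shift
        self.log_scale = log_scale

    def forward(self, x):
        """Map data x to latent z and return log|det dz/dx|."""
        z = [(v - s) * math.exp(-ls)
             for v, s, ls in zip(x, self.shift, self.log_scale)]
        return z, -sum(self.log_scale)

    def inverse(self, z):
        """Map latent z back to data x."""
        return [v * math.exp(ls) + s
                for v, s, ls in zip(z, self.shift, self.log_scale)]


def standard_normal_log_prob(z):
    """Log density of the known distribution N(0, I)."""
    return -0.5 * sum(v * v + math.log(2.0 * math.pi) for v in z)


def log_prob(flow, x):
    """Change of variables: log p_x(x) = log p_z(f(x)) + log|det df/dx|."""
    z, log_det = flow.forward(x)
    return standard_normal_log_prob(z) + log_det
```

Sampling runs the flow in the other direction: draw z from the known distribution and apply `inverse` to obtain a sample from the modeled one.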

Slide 36

Slide 36 text

StyleFlow[Abdal+]
Method
• Optimized by Neural ODE Solver[Chen+ NeurIPS18 Best Paper]
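For reference, a conditional continuous normalizing flow can be written as the ODE below, which is what the Neural ODE solver integrates. The symbols are assumptions chosen for this note: $\phi_\theta$ is the learned dynamics network, $a$ the attribute condition, $z(t_0)$ a Gaussian sample, and $z(t_1)$ the resulting latent code $w$.

```latex
\begin{aligned}
\frac{\mathrm{d} z(t)}{\mathrm{d} t} &= \phi_\theta\bigl(z(t), a, t\bigr),
  \qquad z(t_0) \sim \mathcal{N}(0, I), \\
w = z(t_1) &= z(t_0) + \int_{t_0}^{t_1} \phi_\theta\bigl(z(t), a, t\bigr)\,\mathrm{d} t, \\
\frac{\mathrm{d} \log p\bigl(z(t)\bigr)}{\mathrm{d} t}
  &= -\operatorname{Tr}\!\left(\frac{\partial \phi_\theta}{\partial z(t)}\right).
\end{aligned}
```

The last line is the instantaneous change of variables formula from the Neural ODE paper; it replaces the log-determinant of a discrete flow with a trace that is integrated along the trajectory. Integrating the same ODE backward in time maps $w$ back to $z(t_0)$, which is what makes the flow invertible.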

Slide 37

Slide 37 text

StyleFlow[Abdal+]
Results

Slide 38

Slide 38 text

Conclusion
• Introduced the StyleGAN architecture and projection
• Explained the six papers of the “Image Editing with GANs” session.
• Surprisingly, all papers employ StyleGAN!!

Slide 39

Slide 39 text

Appendix
GAN Inversion
• Image2StyleGAN: How to Embed Images Into the StyleGAN Latent Space?[Abdal+ ICCV19]
  • The first paper to embed images into StyleGAN
• Image2StyleGAN++: How to Edit the Embedded Images?[Abdal+ CVPR20]
  • About semantic editing
• ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement[Alaluf+ ICCV21]
  • Iteratively adds residuals to the prediction of pSp[Alaluf+ CVPR21]

Slide 40

Slide 40 text

Appendix
Space of StyleGAN
• StyleSpace Analysis: Disentangled Controls for StyleGAN Image Generation[Wu+ CVPR21]
  • Defines the conv feature space as the S space and finds that it is more disentangled than W+