Slide 1

2020/12/15 @udoooom
Network-to-Network Translation with Conditional Invertible Neural Networks
Robin Rombach∗, Patrick Esser∗, Björn Ommer
IWR, HCI, Heidelberg University
Paper: https://papers.nips.cc/paper/2020/file/1cfa81af29c6f2d8cacb44921722e753-Paper.pdf
Supplemental: https://papers.nips.cc/paper/2020/file/1cfa81af29c6f2d8cacb44921722e753-Supplemental.pdf

Slide 2

Problems
• Supervised expert models have achieved great success on tasks such as:
• Image classification and segmentation (ResNet, the DeepLab series)
• Question answering (BERT, GPT-3)
• Image generation and translation (BigGAN, StyleGAN)
• We need new ways to reuse such expert models!

Slide 3

Problems
• Pre-trained models have arbitrary, fixed representations
• StyleGAN: image generation
• BERT: sentence embeddings
• We need domain (modality) translation while keeping their full capabilities!

Slide 4

Contribution
• Propose the conditional invertible neural network (cINN), a model that translates between different existing representations without altering them.
• The cINN needs no gradients of the expert models.

Slide 5

Related Works
Invertible Neural Networks (INNs) as generative models: an INN maps a base distribution to a target distribution (e.g. Image2StyleGAN), with generation optionally steered by conditions.
Figure: https://openai.com/blog/generative-models/

Slide 6

Related Works
Invertible Neural Networks (INNs) as generative models (cont.): this work extends conditional INNs to network-to-network translation between fixed expert models.
Figure: https://openai.com/blog/generative-models/

Slide 7

Proposed Method Motivation • Learn relationships and transfer between representations of different domains

Slide 8

Proposed Method
Motivation / Notation
• D_x, D_y: two target domains
• f(x): desired output for x ∈ D_x
• z_Φ = Φ(x): latent representation
• The experts factorize as f(x) = Ψ(Φ(x)) and g(y) = Λ(Θ(y))
• To realize domain translation, it must be described probabilistically as sampling from p(z_Θ | z_Φ)
• Denote z_Θ = τ(v | z_Φ), where τ is the translation function and v the residual
(Figure: an input x ∈ D_x is translated into samples y_1, y_2 ∈ D_y, e.g. the captions "The dog is cute", "The dog is lovely")

Slide 9

Proposed Method
Learning a Domain Translation τ
• v must capture all information of z_Θ that is not represented in z_Φ, but no information that is already represented in z_Φ
• v = τ⁻¹(z_Θ | z_Φ), realized with a cINN
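The inverse map v = τ⁻¹(z_Θ | z_Φ) is computed by an invertible network conditioned on z_Φ. As a minimal sketch (not the paper's architecture), one conditional affine coupling block can look like the following; all names and sizes here are hypothetical, and a real cINN stacks many such blocks:

```python
import numpy as np

rng = np.random.default_rng(0)

class ConditionalCoupling:
    """Toy conditional affine coupling block: the first half of z passes
    through unchanged; the second half is scaled/shifted by functions of
    the first half concatenated with the condition z_phi. Invertible by
    construction, so tau and tau^{-1} share the same parameters."""

    def __init__(self, dim, cond_dim):
        half = dim // 2
        self.W_s = 0.1 * rng.standard_normal((half + cond_dim, half))
        self.W_t = 0.1 * rng.standard_normal((half + cond_dim, half))

    def forward(self, z, cond):
        z1, z2 = np.split(z, 2)
        h = np.concatenate([z1, cond])
        s, t = np.tanh(h @ self.W_s), h @ self.W_t
        return np.concatenate([z1, z2 * np.exp(s) + t])

    def inverse(self, out, cond):
        z1, y2 = np.split(out, 2)
        h = np.concatenate([z1, cond])
        s, t = np.tanh(h @ self.W_s), h @ self.W_t
        return np.concatenate([z1, (y2 - t) * np.exp(-s)])
```

Because `inverse(forward(z, cond), cond)` recovers z exactly, the same block serves as τ during generation and as τ⁻¹ during training.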

Slide 10

Proposed Method
Learning a Domain Translation τ
• If v and z_Φ are independent, v discards all information of z_Φ
• Minimize KL(p(v | z_Φ) ‖ q(v)), where q(v) is a standard normal distribution
• This achieves the goal of sampling from p(z_Θ | z_Φ): z_Θ = τ(v | z_Φ) with v sampled from q(v)
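Since τ is invertible, minimizing this KL amounts to maximum likelihood: map z_Θ back to v = τ⁻¹(z_Θ | z_Φ), score v under the standard normal, and add the log-determinant of the Jacobian from the change of variables. A minimal numpy sketch of the per-sample loss, where `log_det_jacobian` stands in for whatever the coupling layers accumulate:

```python
import numpy as np

def cinn_nll(v, log_det_jacobian):
    """Negative log-likelihood of v under N(0, I), plus the
    change-of-variables correction for v = tau^{-1}(z_theta | z_phi)."""
    d = v.shape[-1]
    log_q = -0.5 * (np.sum(v ** 2, axis=-1) + d * np.log(2 * np.pi))
    return -(log_q + log_det_jacobian)
```

Driving this loss down pushes the residuals v toward q(v) = N(0, I), which is exactly what lets step 3 of the transfer algorithm sample v from a standard normal at test time.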

Slide 11

Proposed Method
Domain Transfer Between Fixed Models
• Algorithm
1. Sample x from p(x)
2. Encode x into z_Φ = Φ(x)
3. Sample v from q(v)
4. Transform z_Θ = τ(v | z_Φ)
5. Decode z_Θ into y = Λ(z_Θ)
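The five steps above can be sketched end to end. `Phi`, `tau`, and `Lam` below are toy stand-ins for the fixed source encoder, the trained cINN, and the fixed target decoder (in the paper's BERT-to-BigGAN setting these would be the BERT encoder, the cINN, and the BigGAN generator):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the fixed experts and the learned translation.
Phi = lambda x: x.mean(axis=-1, keepdims=True)        # fixed source encoder
tau = lambda v, z_phi: v + z_phi                      # learned cINN (toy)
Lam = lambda z_theta: np.repeat(z_theta, 4, axis=-1)  # fixed target decoder

def translate(x):
    z_phi = Phi(x)                          # 2. encode x into z_phi
    v = rng.standard_normal(z_phi.shape)    # 3. sample residual v ~ q(v)
    z_theta = tau(v, z_phi)                 # 4. transform into target code
    return Lam(z_theta)                     # 5. decode into y

x = rng.standard_normal((1, 4))             # 1. sample x ~ p(x)
y = translate(x)
```

Note that only τ is trained; Φ and Λ are queried as black boxes, which is why no gradients of the expert models are needed, and resampling v yields diverse outputs y for the same input x.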

Slide 12

Experiments
1. BERT-to-BigGAN Translation
• Compare IS and FID with baselines (CVPR19, CVPR18, ICCV17, CVPR19) on the COCO-Stuff dataset

Slide 13

Experiments
2. Reusing a Single Target Generator
• Encoders: (a, b) DeepLab, (c, d) ResNet-50; super-resolution with an autoencoder

Slide 14

Experiments
2. Reusing a Single Target Generator
• Visualize how invariances of the representation increase with layer depth

Slide 15

Experiments
3. Image Editing: Conditional I2I, compared with StarGAN [8] [Choi+, CVPR18]

Slide 16

Experiments
3. Image Editing: Exemplar-Guided Translation and Unsupervised Disentangling

Slide 17

Experiments 3. Image Editing: Unpaired I2I

Slide 18

Conclusion
• Propose the cINN technique for reusing pre-trained models
• NLP-to-Image
• Image-to-Image
• Label-to-Image
• An eco-friendly method: it reuses the compute already invested in expert models