Toward StyleGAN2

Slide 1

Slide 1 text

Toward StyleGAN2
内田奏, Intern, R&D Group
"Image Processing × Deep Learning Study Group"

Slide 2

Slide 2 text

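Note: The content reflects information available at the time of the presentation.
Note: Some parts of the material may have been modified or removed for publication.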

Slide 3

Slide 3 text

Background of this talk

Slide 4

Slide 4 text

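Data Strategy and Operation Center
"GEES" business card digitization system
Our proprietary business card digitization system, built on microtasks × multi-sourcing in a secure environment
1. Business card capture, background separation, image correction
2. Field segmentation
3. Security: fine-grained field splitting, field entry
4. Merge
5. Check & correction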

Slide 5

Slide 5 text

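Personal information protection
• Sansan's data is data entrusted to us by our customers
• All employees are required to obtain the 個人情報保護士 (Personal Information Protection Specialist) certification, and data is handled with the utmost care
• Data deletion
  • We comply with Article 19 of the Act on the Protection of Personal Information
  • "A business operator handling personal information shall (...) endeavor to delete personal data without delay once it is no longer needed."
• For reproducibility and demos, it would be nice to have data that can be used permanently
• We do provide dummy data to NII (National Institute of Informatics), but the number of samples is limited
We want to generate dummy business card images automatically!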

Slide 6

Slide 6 text

Quiz time

Slide 7

Slide 7 text

Which face is real? Images from http://www.whichfaceisreal.com/

Slide 8

Slide 8 text

Which face is real? Images from http://www.whichfaceisreal.com/

Slide 9

Slide 9 text

StyleGAN2 is impressive
Video from https://twitter.com/SkyLi0n/status/1212857350928945152
Text is generated quite cleanly → a faint hope for dummy business card generation

Slide 10

Slide 10 text

The road to StyleGAN2
Note: Figures and tables are taken from the papers.

Slide 11

Slide 11 text

History of GANs (old → new)
• Vanilla GAN [I. Goodfellow et al. NeurIPS 2014]
• DCGAN [A. Radford et al. ICLR 2016]
• WGAN-GP [I. Gulrajani et al. NeurIPS 2017]
• PGGAN [T. Karras et al. ICLR 2018]
• SNGAN [T. Miyato et al. ICLR 2018]
• SAGAN [H. Zhang et al. ICML 2018]
• BigGAN [A. Brock et al. ICLR 2019]
• StyleGAN [T. Karras et al. CVPR 2019]
• StyleGAN2 [T. Karras et al. 2019]
Figure: cumulative number of "xxGAN" papers (https://github.com/hindupuravinash/the-gan-zoo)

Slide 12

Slide 12 text

Paper information
• StyleGAN
  • Title: A Style-Based Generator Architecture for Generative Adversarial Networks
  • Authors: T. Karras (1), S. Laine (1), T. Aila (1)
  • Affiliations: (1) NVIDIA
  • Highlights: more stable high-resolution image generation than prior work, and a smooth latent space
• StyleGAN2
  • Title: Analyzing and Improving the Image Quality of StyleGAN
  • Authors: T. Karras (1), S. Laine (1), M. Aittala (1), J. Hellsten (1), J. Lehtinen (1, 2), T. Aila (1)
  • Affiliations: (1) NVIDIA, (2) Aalto University
  • Highlights: removes the artifacts characteristic of StyleGAN

Slide 13

Slide 13 text

Key points of StyleGAN
• Style-based generator
  1. Synthesis network
     • The latent variable is fed in as a style
     • The style is applied via AdaIN
  2. Mapping network
     • The latent variable is mapped into another space before being fed in
• Progressive growing
• Proposal of PPL (perceptual path length)
  • A metric for evaluating the smoothness of the latent space

Slide 14

Slide 14 text

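AdaIN (adaptive instance normalization)
• A normalization method used in style transfer [X. Huang et al. ICCV 2017]
  1. Normalize the input feature $x_i$ using its mean $\mu(x_i)$ and standard deviation $\sigma(x_i)$
  2. Apply a linear transform to the feature using the style $y_i = [y_{s,i}, y_{b,i}]$
$$\mathrm{AdaIN}(x_i, y_i) = y_{s,i} \frac{x_i - \mu(x_i)}{\sigma(x_i)} + y_{b,i}$$
Figure: style transformation with AdaIN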
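The two AdaIN steps can be sketched in NumPy as follows. This is a minimal illustration rather than the code from either paper: the function name `adain`, the `(channels, height, width)` layout, and the small `eps` added for numerical safety are assumptions made here.

```python
import numpy as np

def adain(x, y_s, y_b, eps=1e-8):
    """Apply AdaIN to one sample.

    x:   feature maps, shape (channels, height, width)
    y_s: per-channel style scale, shape (channels,)
    y_b: per-channel style bias, shape (channels,)
    eps: small constant to avoid division by zero (an assumption of this sketch)
    """
    # Step 1: normalize each feature map with its own mean and standard deviation.
    mu = x.mean(axis=(1, 2), keepdims=True)
    sigma = x.std(axis=(1, 2), keepdims=True)
    normalized = (x - mu) / (sigma + eps)
    # Step 2: linear transform with the style (scale and bias).
    return y_s[:, None, None] * normalized + y_b[:, None, None]

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(4, 8, 8))
y_s = np.array([0.5, 1.0, 2.0, 4.0])
y_b = np.array([-1.0, 0.0, 1.0, 2.0])
out = adain(x, y_s, y_b)
```

After the transform, each feature map has mean close to `y_b` and standard deviation close to `y_s`, regardless of the statistics of `x`.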

Slide 15

Slide 15 text

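What does the mapping network do?
• The training data is missing some combinations, e.g. long hair + male
• Forcing an assumption such as a normal distribution warps the space
  • A small move in feature space can change the output drastically
• The intent is to pass the latent variable through an MLP so that it lands in a space where linear interpolation works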
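As a rough sketch of this idea, the mapping network can be written as a plain MLP that turns a latent `z` into an intermediate latent `w`, after which interpolation is done linearly on `w`. The 8 fully connected layers with leaky ReLU and the normalization of `z` follow the StyleGAN paper; the layer width, the random untrained weights, and all names below are illustrative assumptions.

```python
import numpy as np

def mapping_network(z, weights, biases, negative_slope=0.2, eps=1e-8):
    """Map latents z of shape (batch, dim) to intermediate latents w."""
    # Normalize each latent vector before the MLP.
    w = z / np.sqrt(np.mean(z ** 2, axis=1, keepdims=True) + eps)
    for weight, bias in zip(weights, biases):
        h = w @ weight + bias
        w = np.where(h > 0, h, negative_slope * h)  # leaky ReLU
    return w

rng = np.random.default_rng(0)
dim, num_layers = 16, 8  # toy width; the layer count follows the StyleGAN paper
weights = [rng.normal(scale=np.sqrt(2.0 / dim), size=(dim, dim))
           for _ in range(num_layers)]
biases = [np.zeros(dim) for _ in range(num_layers)]

z1, z2 = rng.normal(size=(2, 1, dim))
w1 = mapping_network(z1, weights, biases)
w2 = mapping_network(z2, weights, biases)
w_mid = 0.5 * (w1 + w2)  # linear interpolation in the mapped space
```

With trained weights, `w_mid` would be fed to the synthesis network instead of an interpolated `z`.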

Slide 16

Slide 16 text

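Progressive growing
• A training stabilization technique used in PGGAN [T. Karras et al. ICLR 2018]
• Train at low resolution → add layers and train again, repeatedly
• Low-resolution images carry little class information and have few modes, so training is stable
• Converges 2-6 times faster than training at full resolution from scratch
Figure: training process with progressive growing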
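One detail the slide does not spell out is how a newly added layer is introduced. In the PGGAN paper the new, higher-resolution output is blended in gradually with a weight `alpha` that grows from 0 to 1. The sketch below illustrates only that blending step; the function names and the 4 to 1024 resolution schedule are assumptions of this example.

```python
import numpy as np

def upsample2x(img):
    """Nearest-neighbor 2x upsampling for arrays shaped (..., height, width)."""
    return img.repeat(2, axis=-2).repeat(2, axis=-1)

def fade_in(low_res_rgb, high_res_rgb, alpha):
    """Blend the upsampled old output with the output of the newly added layer."""
    return (1.0 - alpha) * upsample2x(low_res_rgb) + alpha * high_res_rgb

# Hypothetical resolution schedule: each stage doubles the resolution.
resolutions = [4 * 2 ** i for i in range(9)]  # 4, 8, ..., 1024

low = np.ones((3, 4, 4))    # stand-in for the 4x4 RGB output
high = np.zeros((3, 8, 8))  # stand-in for the new 8x8 RGB output
blended = fade_in(low, high, alpha=0.25)
```

At `alpha=0` the network behaves exactly like the previous stage, and at `alpha=1` the new layer has taken over completely.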

Slide 17

Slide 17 text

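PPL (Perceptual Path Length)
• A good latent space = the image changes smoothly as the latent variable changes
• Interpolate between latent variables $z_1, z_2$ and measure the perceptual distance between points along the segment
• The smaller the total distance, the more controllable the feature space
• LPIPS [R. Zhang et al. CVPR 2018] is used as the distance measure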
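The measurement can be sketched as follows: pick a point `t` on the segment, generate images at `t` and `t + eps`, and scale their distance by `1 / eps**2`. This is only a toy version. A linear map stands in for the trained generator, squared L2 distance stands in for LPIPS, and plain linear interpolation is used, so the resulting number is not comparable to published PPL values.

```python
import numpy as np

def lerp(a, b, t):
    """Linear interpolation between a and b."""
    return a + (b - a) * t

def perceptual_path_length(generator, distance, w1, w2, t, eps=1e-4):
    """Average scaled distance between images at t and t + eps on the segment."""
    img_a = generator(lerp(w1, w2, t))
    img_b = generator(lerp(w1, w2, t + eps))
    return np.mean(distance(img_a, img_b) / eps ** 2)

rng = np.random.default_rng(0)
A = rng.normal(size=(8, 32))

def toy_generator(w):
    # Stand-in for a trained generator (assumption of this sketch).
    return w @ A

def toy_distance(a, b):
    # Stand-in for LPIPS (assumption of this sketch): squared L2 distance.
    return np.sum((a - b) ** 2, axis=1)

w1 = rng.normal(size=(1000, 8))
w2 = rng.normal(size=(1000, 8))
t = rng.uniform(size=(1000, 1))
ppl = perceptual_path_length(toy_generator, toy_distance, w1, w2, t)
```

For this linear toy generator the result does not depend on `t` or `eps`; with a real generator, sharp changes along the path inflate the value.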

Slide 18

Slide 18 text

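Characteristics of StyleGAN output images (1)
• Droplet artifact
  • Droplet-like artifacts appear in parts of the image
• Hypothesis: AdaIN is too strong
  • Aim for normalization based on statistical assumptions rather than on the actual feature statistics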

Slide 19

Slide 19 text

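Rethinking the network architecture
• Decompose AdaIN into normalization / modulation and reconsider each part
• Swap the order of bias/noise addition and normalization → better control over the features
• Drop the mean shift → the block can be built from scaling alone
$$\mathrm{AdaIN}(x_i, y_i) = y_{s,i} \frac{x_i - \mu(x_i)}{\sigma(x_i)} + y_{b,i}$$
Normalization: the $(x_i - \mu(x_i)) / \sigma(x_i)$ part. Modulation: the scale $y_{s,i}$ and the bias $y_{b,i}$.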

Slide 20

Slide 20 text

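Weight demodulation
• Bake the scaling into the convolution kernel
• Modulation is equivalent to a convolution with the scaled weights $w'_{ijk}$
• Assuming the input features are i.i.d. with unit standard deviation, the output features have standard deviation $\sigma_j$
• Normalization is then equivalent to $w'_{ijk} / \sigma_j$ → weight demodulation
$$w'_{ijk} = s_i \cdot w_{ijk} \quad (1)$$
$$\sigma_j = \sqrt{\sum_{i,k} {w'_{ijk}}^2} \quad (2)$$
$$w''_{ijk} = \frac{w'_{ijk}}{\sigma_j} \quad (3)$$
Note: $i$, $j$, $k$ index the input feature map, the output feature map, and the spatial position within the kernel, respectively.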
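Equations (1) to (3) translate almost directly into array operations. The sketch below assumes a weight tensor laid out as `(input maps, output maps, kernel positions)` and adds a small `eps` under the square root for numerical stability, as the StyleGAN2 paper does; the function name and the Monte Carlo check are illustrative.

```python
import numpy as np

def modulate_demodulate(w, s, eps=1e-8):
    """w: weights, shape (in_maps, out_maps, kernel_positions); s: scales, shape (in_maps,)."""
    w_mod = s[:, None, None] * w                          # (1) modulation
    sigma = np.sqrt((w_mod ** 2).sum(axis=(0, 2)) + eps)  # (2) expected output std
    return w_mod / sigma[None, :, None]                   # (3) demodulation

rng = np.random.default_rng(0)
in_maps, out_maps, positions = 6, 5, 9  # e.g. a 3x3 kernel flattened to 9 positions
w = rng.normal(size=(in_maps, out_maps, positions))
s = rng.uniform(0.5, 3.0, size=in_maps)
w_dd = modulate_demodulate(w, s)

# Each output map's weights now have unit L2 norm.
norms = np.sqrt((w_dd ** 2).sum(axis=(0, 2)))

# Empirical check: i.i.d. unit-variance inputs give outputs with std close to 1.
x = rng.normal(size=(20000, in_maps, positions))
outputs = np.einsum("nik,ijk->nj", x, w_dd)
output_std = outputs.std(axis=0)
```

No feature statistics are computed anywhere: the normalization lives entirely in the weights.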

Slide 21

Slide 21 text

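Effects of weight demodulation
• Droplet artifacts no longer appear
• Weight demodulation relies on statistical assumptions, so it avoids forcing an overly strict normalization
• Removing unnecessary operations makes parallelization easier, and training becomes 40% faster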

Slide 22

Slide 22 text

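Characteristics of StyleGAN output images (2)
• Phase artifact
  • Eyes and teeth do not follow changes in face pose
• Hypothesis: progressive growing is the cause
  • Training at each resolution can become an entirely different task
  • Low-resolution maps acquire many high-frequency components → the low-resolution training is forgotten (?)
Figure: the teeth alignment barely changes
Figure: high-frequency components appearing in low-resolution maps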

Slide 23

Slide 23 text

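Alternatives to progressive growing
• Use skip connections
  • They have produced good results in many settings, e.g. ResNet, U-Net, MSG-GAN
• Consider combinations of generators and discriminators with skip connections
• Using (b) for the generator improves PPL; using (c) for the discriminator improves FID

Table: PPL for each generator (G) / discriminator (D) combination
G \ D      original   (b)    (c)
original   237        207    238
(b)        149        116    117
(c)        187        201    203

Table: FID for each generator (G) / discriminator (D) combination
G \ D      original   (b)    (c)
original   4.32       4.18   3.58
(b)        4.33       3.77   3.31
(c)        4.35       3.96   3.79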

Slide 24

Slide 24 text

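Results of introducing skip connections
• Resolves the phase artifact
• Training implicitly behaves like progressive growing
  • The generator outputs a weighted sum over resolutions
  • Low resolutions dominate early in training; high resolutions dominate as training progresses
Figure: contribution of each resolution as a function of training iterations
Video from https://youtu.be/c-NJtV9Jvp0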
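The way a skip-connection generator combines resolutions can be sketched as below: every resolution produces its own RGB image, and the final image is obtained by repeatedly upsampling the running result and adding the next one. Nearest-neighbor upsampling is used here for brevity (the paper uses bilinear filtering), and the function names are assumptions of this sketch.

```python
import numpy as np

def upsample2x(img):
    """Nearest-neighbor 2x upsampling for arrays shaped (channels, height, width)."""
    return img.repeat(2, axis=-2).repeat(2, axis=-1)

def skip_generator_output(rgb_per_resolution):
    """Sum per-resolution RGB outputs, ordered from lowest to highest resolution."""
    out = rgb_per_resolution[0]
    for rgb in rgb_per_resolution[1:]:
        out = upsample2x(out) + rgb
    return out

# Stand-ins for the RGB outputs at 4x4, 8x8, and 16x16.
rgbs = [np.ones((3, 4 * 2 ** i, 4 * 2 ** i)) for i in range(3)]
image = skip_generator_output(rgbs)
```

Because all resolutions feed the same output, the network can shift emphasis from coarse to fine outputs during training without any explicit schedule.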

Slide 25

Slide 25 text

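Other improvements
• Path length regularization
  • PPL can be used to evaluate image quality as well as the latent space
  • Regularize so that the generated image $g(w)$ changes only slightly when the latent variable changes
  $$\mathbb{E}_{w,\, y \sim \mathcal{N}(0, I)} \left( \lVert J_w^{T} y \rVert_2 - a \right)^2 \quad \text{where } J_w = \partial g(w) / \partial w$$
  • It is computationally expensive, so it is evaluated only once every 16 iterations (lazy regularization)
• Detecting generated images via projection
  • Search for the latent variable of a target image with a gradient method
  • Generated images can be reproduced almost identically, whereas real images lose fine detail
Figure: distribution of LPIPS between projection results and target images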
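The path length penalty and the lazy schedule can be illustrated with a toy linear generator $g(w) = Aw$, for which the Jacobian is simply $A$. Everything else here is an assumption of the sketch: a real implementation obtains $J_w^T y$ by backpropagation, and the choice to track $a$ as an exponential moving average with `decay = 0.99` is illustrative (the paper sets $a$ as a running average of the path lengths).

```python
import numpy as np

def path_length_penalty(jacobian, y, a):
    """jacobian: (image_dim, latent_dim); y: (batch, image_dim) random images."""
    lengths = np.linalg.norm(y @ jacobian, axis=1)  # ||J_w^T y||_2 per sample
    return np.mean((lengths - a) ** 2), lengths

rng = np.random.default_rng(0)
image_dim, latent_dim = 12, 4
A = rng.normal(size=(image_dim, latent_dim))  # toy generator g(w) = A w, so J_w = A

a = 0.0            # running target for the path length
decay = 0.99       # illustrative moving-average decay
reg_interval = 16  # lazy regularization: evaluate once every 16 iterations
penalties = []
for step in range(64):
    # The main GAN loss would be computed on every step here.
    if step % reg_interval == 0:
        y = rng.normal(size=(8, image_dim))
        penalty, lengths = path_length_penalty(A, y, a)
        a = decay * a + (1.0 - decay) * lengths.mean()
        penalties.append(penalty)
```

Over 64 steps the penalty is evaluated only 4 times, which is where the cost saving of lazy regularization comes from.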

Slide 26

Slide 26 text

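Summary
• StyleGAN
  • High-quality image generation with a style-based generator
• StyleGAN2
  • Removes the artifacts characteristic of StyleGAN
  • Replaces AdaIN with weight demodulation & drops progressive growing
• Personal impressions
  • It is important to strip out unnecessary processing and keep things simple
  • Sometimes it is necessary to question the baseline
    • Accuracy improves so quickly that controlled experiments are often not rigorous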

Slide 27

Slide 27 text

References
• [I. Goodfellow et al. NeurIPS 2014] Goodfellow, Ian, et al. "Generative adversarial nets." Advances in Neural Information Processing Systems. 2014.
• [A. Radford et al. ICLR 2016] Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv preprint arXiv:1511.06434 (2015).
• [I. Gulrajani et al. NeurIPS 2017] Gulrajani, Ishaan, et al. "Improved training of Wasserstein GANs." Advances in Neural Information Processing Systems. 2017.
• [T. Miyato et al. ICLR 2018] Miyato, Takeru, et al. "Spectral normalization for generative adversarial networks." arXiv preprint arXiv:1802.05957 (2018).
• [T. Karras et al. ICLR 2018] Karras, Tero, et al. "Progressive growing of GANs for improved quality, stability, and variation." arXiv preprint arXiv:1710.10196 (2017).
• [H. Zhang et al. ICML 2018] Zhang, Han, et al. "Self-attention generative adversarial networks." arXiv preprint arXiv:1805.08318 (2018).

Slide 28

Slide 28 text

References
• [A. Brock et al. ICLR 2019] Brock, Andrew, Jeff Donahue, and Karen Simonyan. "Large scale GAN training for high fidelity natural image synthesis." arXiv preprint arXiv:1809.11096 (2018).
• [T. Karras et al. CVPR 2019] Karras, Tero, Samuli Laine, and Timo Aila. "A style-based generator architecture for generative adversarial networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
• [T. Karras et al. 2019] Karras, Tero, et al. "Analyzing and improving the image quality of StyleGAN." arXiv preprint arXiv:1912.04958 (2019).
• [X. Huang et al. ICCV 2017] Huang, Xun, and Serge Belongie. "Arbitrary style transfer in real-time with adaptive instance normalization." Proceedings of the IEEE International Conference on Computer Vision. 2017.
• [R. Zhang et al. CVPR 2018] Zhang, Richard, et al. "The unreasonable effectiveness of deep features as a perceptual metric." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
