A First Step to Flow-Based Generative Models

2019/10/05, NLP/CV SoTA Survey Challenge #4
古川遼

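Self-introduction
■ 古川遼
– Background
• Graduate School of Mathematical Sciences (geometry and topology)
– Current position
• Works at a data analytics company, mainly on contract analysis projects. Over the past year I have often worked on image and video recognition, such as object detection.
– Interests
• Deep generative models
• Areas where machine learning and geometry seem to be related
– Other
• Studied deep learning through work and through public courses such as those offered by the Matsuo Lab.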

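Notes on these slides
■ These slides were prepared for a study group.
■ They are based on my personal understanding and may contain errors. For details, please refer to the original papers and public implementations.
■ I have not reproduced the experimental results; they are taken from the papers.
■ Citations are sometimes omitted when the source is obvious.
■ (Last updated: 2019/10/9) The overall structure will not change, but I will correct errors and inappropriate explanations as I notice them, through around October 2019.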

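Contents
1. Overview of flow-based generative models
– What is a normalizing flow?
– What is a flow-based generative model?
– Difficulties of flow-based generative models
– Applications of normalizing flows
– Benchmark datasets
– Scope of the models covered in Chapter 2
– Comparison of image generation models
2. Introduction to each model
– Image generation
• Coupling Flows: NICE, RealNVP, Glow, Flow++, CAGlow (image synthesis)
• Residual Flows: i-ResNet, Residual Flows
• Infinitesimal (Continuous) Flows: Continuous Normalizing Flow (Neural ODEs), FFJORD
– Video generation
• VideoFlow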

1. Overview of Flow-Based Generative Models

What is a normalizing flow?
■ Normalizing flows
– were used for variational inference in [Rezende and Mohamed, 2015] (Planar and Radial Flows), and
– were used for density estimation on image datasets in [Dinh et al., 2015] (NICE).
These works made normalizing flows widely known and led to their use in generative models. The concept itself existed earlier [Tabak and Vanden-Eijnden, 2010] [Tabak and Turner, 2013].
[Lil'Log: L. Weng, Flow-based Deep Generative Models, https://lilianweng.github.io/lil-log/2018/10/13/flow-based-deep-generative-models.html]

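What is a normalizing flow? (continued)
■ Flows and normalizing flows (following [Rezende and Mohamed, 2015])
– Let $z_0 \in \mathbb{R}^D$ be a random variable with probability density function $p_0: \mathbb{R}^D \to \mathbb{R}$. This is usually a 'simple' distribution such as the standard normal.
– Let $f_k: \mathbb{R}^D \to \mathbb{R}^D$, $k = 1, 2, \dots, K$, be diffeomorphisms*, and define random variables $z_k = f_k(z_{k-1})$. The density $p_k$ of $z_k$ is then
$p_k(z_k) = p_{k-1}(f_k^{-1}(z_k)) \left|\det \frac{\partial f_k^{-1}}{\partial z_k}(z_k)\right| = p_{k-1}(z_{k-1}) \left|\det \frac{\partial f_k}{\partial z_{k-1}}(z_{k-1})\right|^{-1}$,
where $\frac{\partial f_k}{\partial z_{k-1}}(z_{k-1})$ is the Jacobian of $f_k$ at $z_{k-1}$.
– The sequence of random variables $z_0, z_1, \dots, z_K$ is called a (finite) flow. The sequence of probability distributions (densities) $p_0, p_1, \dots, p_K$ is called a (finite) normalizing flow.
– Suppose $z(0) = z_0$ and the diffeomorphisms $f_t$ depend continuously on time $t$. The (infinitesimal) flow $z(t) = f_t(z_0)$ and the (infinitesimal) normalizing flow are then defined 'in the same way'.
– * The literature usually just says 'invertible'. A condition weaker than being a diffeomorphism should suffice.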

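What is a flow-based generative model?
■ Flow-based generative models
– Generative models that use normalizing flows.
– In the narrow sense, these are models that use a normalizing flow to estimate the density of the data distribution by maximum likelihood.
– Some models use normalizing flows for variational inference (maximizing the ELBO of the likelihood). It seems reasonable to include such models as well.
– Some references call the normalizing flow itself a kind of generative model.
[Lil'Log: L. Weng, Flow-based Deep Generative Models, https://lilianweng.github.io/lil-log/2018/10/13/flow-based-deep-generative-models.html]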

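Difficulties of flow-based generative models
■ In density estimation by maximum likelihood, one difficulty of (finite) normalizing flows is computing the (log-)determinant of the Jacobian matrix:
$z_k = f_k(z_{k-1})$, $k = 1, 2, \dots, K$, $x = z_K = f_K \circ f_{K-1} \circ \cdots \circ f_1(z_0)$,
$\log p_K(x) = \log p_0(z_0) - \sum_{k=1}^{K} \log\left|\det \frac{\partial f_k}{\partial z_{k-1}}(z_{k-1})\right|$.
For $x \in \mathbb{R}^D$, computing a general determinant costs $O(D^3)$. For high-dimensional data such as images, this cost must be reduced.
■ Various techniques
– Restrict the transformations so that the Jacobian is triangular
– Use infinitesimal flows, so that a trace is computed instead of a determinant
– Compute an estimator instead of the exact value
– …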
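The following is a minimal sketch of the log-likelihood formula above. It assumes each $f_k$ is an invertible affine map $f_k(z) = A_k z + c_k$ and that $p_0$ is the standard normal. This choice is only illustrative, and `flow_log_prob` is a hypothetical helper, not part of any library. The function walks back from $x$ to $z_0$ and subtracts the summed log-determinants. Each log-determinant is computed by `np.linalg.slogdet` at the general $O(D^3)$ cost mentioned above.

```python
import numpy as np

def log_std_normal(z):
    """Log-density of the standard normal N(0, I), summed over the last axis."""
    return -0.5 * np.sum(z ** 2 + np.log(2 * np.pi), axis=-1)

def flow_log_prob(x, mats, shifts):
    """log p_K(x) for x = f_K o ... o f_1(z_0) with affine f_k(z) = A_k z + c_k.

    Walks back from x = z_K to z_0 and subtracts sum_k log|det A_k|.
    """
    z = x
    log_det_sum = 0.0
    for A, c in zip(reversed(mats), reversed(shifts)):
        z = np.linalg.solve(A, z - c)               # z_{k-1} = f_k^{-1}(z_k)
        log_det_sum += np.linalg.slogdet(A)[1]      # log|det df_k/dz_{k-1}|, O(D^3)
    return log_std_normal(z) - log_det_sum

rng = np.random.default_rng(0)
D, K = 3, 4
mats = [np.eye(D) + 0.3 * rng.standard_normal((D, D)) for _ in range(K)]
shifts = [rng.standard_normal(D) for _ in range(K)]
x = rng.standard_normal(D)
print(flow_log_prob(x, mats, shifts))
```

Affine maps of a Gaussian stay Gaussian. The result can therefore be checked against the closed-form density of $\mathcal{N}(m, A A^\top)$, where $A$ and $m$ come from the composed map.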

Applications of normalizing flows
■ Examples of normalizing flows applied to generation (cf. [Kobyzev, Prince and Brubaker, 2019], Introduction)
• Image generation
– Glow [Kingma and Dhariwal, 2018]
– Flow++ [Ho et al., 2019]
– …
• Video generation
– VideoFlow [Kumar et al., 2019]
– …
• Audio generation
– FloWaveNet [Kim et al., 2018]
– WaveGlow [Prenger et al., 2019]
– Flow synthesizer [Esling et al., 2019]
– …
• Graph generation
– GraphNVP [Madhawa et al., 2019]
– …
• Reinforcement learning
– FloRL(?) [Nadeem Ward et al., 2019]
– …

Classification of normalizing flows
■ Classification of normalizing flows by architecture (cf. [Kobyzev, Prince and Brubaker, 2019], Section 3). Some methods fall into several categories.
– Finite Flows
• Elementwise Bijections
• Linear Flows
• Planar and Radial Flows
• Coupling Flows
• Autoregressive Flows
• Residual Flows
– Infinitesimal Flows
• ODE-based methods
• SDE-based methods (Langevin Flows)
■ Coupling Flows and Autoregressive Flows both use coupling layers as a building block.
– Types of coupling layers
• affine coupling, coupling with continuous mixture CDFs, splines, …

Benchmark datasets (images)
■ Image generation
– Commonly used
• MNIST: http://yann.lecun.com/exdb/mnist/
• CIFAR-10: https://www.cs.toronto.edu/~kriz/cifar.html
• ImageNet 32x32: http://image-net.org/small/download.php
• ImageNet 64x64: http://image-net.org/small/download.php
– Others
• CelebA HQ
• Omniglot: https://github.com/brendenlake/omniglot
• LSUN: https://www.yf.io/p/lsun
• CIFAR-100: https://www.cs.toronto.edu/~kriz/cifar.html
• SVHN: http://ufldl.stanford.edu/housenumbers/
• etc.

Benchmark datasets (tabular)
■ Tabular data
– UCI datasets
• Individual household electric power consumption: http://archive.ics.uci.edu/ml/datasets/Individual+household+electric+power+consumption
• Gas sensor array under dynamic gas mixtures: http://archive.ics.uci.edu/ml/datasets/Gas+sensor+array+under+dynamic+gas+mixtures
• HEPMASS: http://archive.ics.uci.edu/ml/datasets/HEPMASS
• MiniBooNE particle identification: http://archive.ics.uci.edu/ml/datasets/MiniBooNE+particle+identification
– BSDS300: https://www2.eecs.berkeley.edu/Research/Projects/CS/vision/bsds/

Benchmark datasets (other)
■ Density estimation results are sometimes checked qualitatively on 2D toy datasets [Behrmann et al., 2019] [Chen et al., 2018] [Grathwohl et al., 2019].

Scope of the models covered in Chapter 2
■ Generative modeling approach
– Mainly models trained by maximum likelihood.
■ Applications
– Mainly image generation, plus one video generation model.
■ Architectures: the slides mainly cover the categories for which examples are listed below.
– Finite Flows
• Elementwise bijections
• Linear Flows
• Planar and Radial Flows
• Coupling Flows
– Examples covered: NICE, RealNVP, Glow, Flow++, CAGlow (image synthesis)
• Autoregressive Flows
• Residual Flows
– Examples covered: i-ResNet, Residual Flows
– Infinitesimal Flows
• ODE-based methods
– Examples covered: Continuous Normalizing Flow (Neural ODEs), FFJORD
• SDE-based methods (Langevin Flows)

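Comparison of image generation models
■ Evaluation metric
– The average of the bits per dimension, $-\log_2 p(x) / D$, is commonly used. Here $p(x)$ is the likelihood that the model assigns to a data point $x \in \mathbb{R}^D$, typically from the test set.
■ Performance of representative flow-based image generation models
– Flow++ is SOTA on three datasets [Kobyzev et al., 2019].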
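The following is a small sketch of the bits-per-dimension conversion. It assumes the model's negative log-likelihood is measured in nats on pixel values rescaled from $\{0, \dots, 255\}$ to $[0, 1)$. This is a common convention, but it is an assumption here. The rescaling changes the density by a factor of $256^D$, so $D \log 256$ is added back before dividing by $D \log 2$. `bits_per_dim` is a hypothetical helper name.

```python
import math

def bits_per_dim(nll_nats, num_dims, n_bins=256):
    """Convert a per-example negative log-likelihood in nats to bits per dimension.

    Assumes the model density is defined on pixel values rescaled from
    [0, n_bins) to [0, 1). Adding num_dims * log(n_bins) converts the NLL back
    to the original 8-bit scale before dividing by D * log(2).
    """
    nll_original_scale = nll_nats + num_dims * math.log(n_bins)
    return nll_original_scale / (num_dims * math.log(2))

# CIFAR-10: D = 32 * 32 * 3 = 3072 dimensions.
print(bits_per_dim(0.0, 3072))   # a uniform density on [0, 1)^D gives about 8 bits/dim
```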

Comparison of image generation models (continued)
■ Note
– From the viewpoint of (unconditional) image synthesis, however, autoregressive models are SOTA [Ho et al., 2019].

2. Introduction to Each Model

Models covered in this chapter
■ Image generation
– Coupling Flows
• NICE, RealNVP, Glow, Flow++, CAGlow (image synthesis)
– Residual Flows
• i-ResNet, Residual Flows
– Infinitesimal Flows
• Continuous Normalizing Flow (Neural ODEs), FFJORD
■ Video generation
• VideoFlow


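NICE
■ Non-linear Independent Components Estimation.
– ICLR 2015. L. Dinh, D. Krueger, Y. Bengio.
■ Overview
– Proposes a nonlinear transformation that maps the data distribution to a factorized distribution in a latent space.
– Proposes the coupling layer and builds the transformation as a composition of coupling layers.
– Applies the model to inpainting.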

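NICE: general coupling layer
■ General coupling layer
– Let $x \in \mathbb{R}^D$, let $(I_1, I_2)$ be a partition of $\{1, 2, \dots, D\}$ with $|I_1| = d$, and let $m$ be a function defined on $\mathbb{R}^d$.
– Define the transformation from $x = (x_{I_1}, x_{I_2})$ to $y = (y_{I_1}, y_{I_2})$ by
$y_{I_1} = x_{I_1}, \quad y_{I_2} = g(x_{I_2}; m(x_{I_1}))$.
Here $g: \mathbb{R}^{D-d} \times m(\mathbb{R}^d) \to \mathbb{R}^{D-d}$ is invertible when restricted to $\mathbb{R}^{D-d} \times \{b\}$, for every $b \in m(\mathbb{R}^d)$.
– Such an invertible transformation (layer) is called a coupling layer. $g$ is called the coupling law and $m$ the coupling function.
– The inverse is
$x_{I_1} = y_{I_1}, \quad x_{I_2} = g^{-1}(y_{I_2}; m(y_{I_1}))$,
where $g^{-1}$ is the inverse of $g$ restricted to $\mathbb{R}^{D-d} \times \{m(y_{I_1})\}$.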

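NICE: additive coupling layer
■ Additive coupling layer
– Uses the additive coupling law $g(a; b) = a + b$:
$y_{I_1} = x_{I_1}, \quad y_{I_2} = g(x_{I_2}; m(x_{I_1})) = x_{I_2} + m(x_{I_1})$.
– The Jacobian is lower triangular with ones on the diagonal, so its determinant is 1:
$\frac{\partial y}{\partial x} = \begin{pmatrix} I_d & 0 \\ \frac{\partial y_{I_2}}{\partial x_{I_1}} & I_{D-d} \end{pmatrix}, \quad \det \frac{\partial y}{\partial x} = 1$.
– This paper adopts the additive coupling layer.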
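Below is a minimal NumPy sketch of an additive coupling layer. It assumes $I_1$ is the first $d$ coordinates and uses a hypothetical coupling function $m(x_{I_1}) = \tanh(x_{I_1} W + b)$. NICE itself uses deeper networks and alternates the partition between layers. The inverse only subtracts $m(y_{I_1})$, so $m$ never has to be inverted.

```python
import numpy as np

rng = np.random.default_rng(1)
D, d = 6, 3                          # I_1 = first d dims, I_2 = remaining D - d dims
W = rng.standard_normal((d, D - d))  # parameters of a hypothetical coupling function m
b = rng.standard_normal(D - d)

def m(x1):
    """Coupling function m: R^d -> R^(D-d); it does not need to be invertible."""
    return np.tanh(x1 @ W + b)

def additive_forward(x):
    x1, x2 = x[:d], x[d:]
    return np.concatenate([x1, x2 + m(x1)])      # y_{I_2} = x_{I_2} + m(x_{I_1})

def additive_inverse(y):
    y1, y2 = y[:d], y[d:]
    return np.concatenate([y1, y2 - m(y1)])      # x_{I_2} = y_{I_2} - m(y_{I_1})

x = rng.standard_normal(D)
y = additive_forward(x)
```

A finite-difference Jacobian of `additive_forward` is lower triangular with determinant 1, as the formula above states.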

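NICE: rescaling and prior
■ Rescaling
– Additive coupling layers are volume-preserving: the determinant of the Jacobian is 1.
– A rescaling layer $x_i \mapsto S_{ii} x_i$ is therefore placed after the stacked additive coupling layers.
■ Prior
– The prior on the latent variable is chosen to be factorized, i.e. a product of independent random variables:
$p_H(h) = \prod_{d=1}^{D} p_{H_d}(h_d)$,
where each $p_{H_d}$ is a standard Gaussian or standard logistic distribution.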
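The next block sketches the resulting NICE log-likelihood. It assumes a standard logistic prior and a log-parametrized diagonal scaling; both are assumptions of this sketch, and the names are hypothetical. The additive coupling layers contribute $\log|\det| = 0$, so only the scaling enters the Jacobian term: $\log p(x) = \sum_i \log p_{H_i}(h_i) + \sum_i \log|S_{ii}|$.

```python
import numpy as np

def log_std_logistic(h):
    """Elementwise log-density of the standard logistic distribution."""
    return -(np.logaddexp(0.0, h) + np.logaddexp(0.0, -h))

def nice_log_likelihood(h_coupled, log_s):
    """log p(x) for NICE, given h_coupled (the output of the stacked additive
    coupling layers, which contribute log|det| = 0).

    log_s: log of the diagonal rescaling S_ii. The log parametrization is an
    assumption that keeps S_ii > 0. Only this layer contributes to the Jacobian term.
    """
    h = h_coupled * np.exp(log_s)                    # rescaling layer x_i -> S_ii x_i
    return np.sum(log_std_logistic(h), axis=-1) + np.sum(log_s)
```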

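NICE results
■ Generative modeling of images
– Experiments on TFD (Toronto Face Dataset), SVHN and CIFAR-10. NICE achieves a higher log-likelihood than GRBM and other models.
■ Inpainting
– Inpainting experiments with a generative model trained on MNIST.
– The model was not trained for inpainting, yet it produces reasonable inpainting results.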

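RealNVP
■ Density estimation using Real NVP.
– ICLR 2017. L. Dinh, J. Sohl-Dickstein, S. Bengio.
■ Overview
– Improves NICE's architecture and proposes RealNVP (real-valued non-volume preserving), a transformation that is not volume-preserving.
– Allows exact log-likelihood computation, exact and efficient sampling, and exact and efficient inference of latent variables.
– Shows its usefulness for sampling (synthesis), log-likelihood evaluation and data manipulation on image datasets.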

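RealNVP: affine coupling layer
■ Affine coupling layer
– Introduces the affine coupling layer (NICE used the additive coupling layer):
$y_{1:d} = x_{1:d}, \quad y_{d+1:D} = x_{d+1:D} \odot \exp(s(x_{1:d})) + t(x_{1:d})$,
where $s, t: \mathbb{R}^d \to \mathbb{R}^{D-d}$ and $\odot$ is the elementwise product. The inverse is
$x_{1:d} = y_{1:d}, \quad x_{d+1:D} = (y_{d+1:D} - t(y_{1:d})) \odot \exp(-s(y_{1:d}))$.
– The Jacobian is lower triangular:
$\frac{\partial y}{\partial x^\top} = \begin{pmatrix} I_d & 0 \\ \frac{\partial y_{d+1:D}}{\partial x_{1:d}^\top} & \mathrm{diag}(\exp(s(x_{1:d}))) \end{pmatrix}$.
Here $\mathrm{diag}(\exp(s(x_{1:d})))$ is the diagonal matrix whose diagonal entries are the components of $\exp(s(x_{1:d}))$. Hence $\log\left|\det \frac{\partial y}{\partial x^\top}\right| = \sum_j s(x_{1:d})_j$.
– The implementation uses rectified convolutional networks for $s$ and $t$.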
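Below is a minimal sketch of the affine coupling layer and its log-determinant. Hypothetical one-layer maps stand in for the networks $s$ and $t$; RealNVP uses convolutional ResNets. Wrapping $s$ in `tanh` keeps the scales bounded, which is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(2)
D, d = 6, 3
Ws = 0.5 * rng.standard_normal((d, D - d))   # parameters of a hypothetical scale net s
Wt = rng.standard_normal((d, D - d))         # parameters of a hypothetical translation net t

def s_net(x1):
    return np.tanh(x1 @ Ws)

def t_net(x1):
    return x1 @ Wt

def affine_forward(x):
    """Returns y and log|det dy/dx| = sum_j s(x_{1:d})_j."""
    x1, x2 = x[:d], x[d:]
    s = s_net(x1)
    y = np.concatenate([x1, x2 * np.exp(s) + t_net(x1)])
    return y, np.sum(s)

def affine_inverse(y):
    y1, y2 = y[:d], y[d:]
    return np.concatenate([y1, (y2 - t_net(y1)) * np.exp(-s_net(y1))])

x = rng.standard_normal(D)
y, log_det = affine_forward(x)
```

The log-determinant costs $O(D)$ because only the diagonal of the triangular Jacobian matters.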

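RealNVP: masked convolution
■ Masked convolution
– The partition of dimensions, e.g. $x = (x_{1:d}, x_{d+1:D})$, is implemented by masked convolution with a binary mask $b$. The transformation $y_{d+1:D} = x_{d+1:D} \odot \exp(s(x_{1:d})) + t(x_{1:d})$ is computed in practice as
$y = b \odot x + (1 - b) \odot \left(x \odot \exp(s(b \odot x)) + t(b \odot x)\right)$.
– Both a spatial partition (checkerboard mask) and a channel-wise partition are used. In practice the order is: spatial partition → squeeze → channel-wise partition.
(Figure: the checkerboard and channel-wise binary masks; legend: value 0 / value 1.)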
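The masked form can be sketched on an $h \times w \times c$ array as follows. The sketch assumes hypothetical `s_fn` and `t_fn` that read only $b \odot x$; here they are simple rolls, not convolutions. Entries with $b = 1$ pass through unchanged, so $b \odot y = b \odot x$, and this is what makes the inverse computable.

```python
import numpy as np

def checkerboard_mask(h, w, c):
    """b[i, j, :] = (i + j) mod 2: the spatial (checkerboard) partition."""
    ii, jj = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    return np.repeat(((ii + jj) % 2)[:, :, None], c, axis=2).astype(float)

def channel_mask(h, w, c):
    """b = 1 on the first half of the channels: the channel-wise partition."""
    b = np.zeros((h, w, c))
    b[:, :, : c // 2] = 1.0
    return b

# Hypothetical s and t: any functions work, as long as they only read b * x.
def s_fn(u):
    return 0.5 * np.tanh(np.roll(u, 1, axis=1))

def t_fn(u):
    return np.roll(u, -1, axis=0)

def masked_affine_forward(x, b):
    bx = b * x
    y = bx + (1 - b) * (x * np.exp(s_fn(bx)) + t_fn(bx))
    return y, np.sum((1 - b) * s_fn(bx))      # only unmasked entries are scaled

def masked_affine_inverse(y, b):
    by = b * y                                 # equals b * x: masked entries pass through
    return by + (1 - b) * ((y - t_fn(by)) * np.exp(-s_fn(by)))

x = np.random.default_rng(3).standard_normal((4, 4, 2))
```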

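RealNVP: multi-scale architecture
■ Multi-scale architecture
– A multi-scale architecture is built using squeezing.
• Squeezing: an operation that turns each 2×2×c subsquare into a 1×1×4c tensor.
– At each step, which corresponds to one scale, the layers are combined as follows (the final step differs):
• (Coupling layer with checkerboard mask) ×3
• Squeeze
• (Coupling layer with channel-wise mask) ×3
– At each step the dimensions are split in half. One half is passed to the next step, and the other half becomes the output of that step (scale).
– Finally, the outputs of all scales are concatenated.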
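The next block sketches the squeezing operation and one multi-scale split. It assumes channels-last arrays and even $h$ and $w$. The particular ordering of the $4c$ output channels is an implementation choice of this sketch, not one taken from the paper.

```python
import numpy as np

def squeeze(x):
    """(h, w, c) -> (h/2, w/2, 4c): each 2x2xc subsquare becomes a 1x1x4c vector."""
    h, w, c = x.shape
    x = x.reshape(h // 2, 2, w // 2, 2, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(h // 2, w // 2, 4 * c)

def unsqueeze(y):
    """Inverse of squeeze: (h/2, w/2, 4c) -> (h, w, c)."""
    h2, w2, c4 = y.shape
    y = y.reshape(h2, w2, 2, 2, c4 // 4).transpose(0, 2, 1, 3, 4)
    return y.reshape(2 * h2, 2 * w2, c4 // 4)

x = np.arange(4 * 4 * 3, dtype=float).reshape(4, 4, 3)
s = squeeze(x)
# Multi-scale step: keep half of the channels as this scale's output,
# and pass the other half on to the next scale.
z_out, h_next = np.split(s, 2, axis=-1)
```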

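RealNVP: alternating pattern and batch normalization
■ Alternating pattern
– The roles of the two parts of the partition alternate from layer to layer, so that every dimension gets transformed.
■ Variant of batch normalization
– Also incorporates the statistics of past batches. Let $\hat\mu_t$ and $\hat\sigma_t^2$ be the mean and variance of the batch at step $t$. With momentum $\rho$:
$\tilde\mu_{t+1} = \rho\,\tilde\mu_t + (1-\rho)\,\hat\mu_t, \quad \tilde\sigma_{t+1}^2 = \rho\,\tilde\sigma_t^2 + (1-\rho)\,\hat\sigma_t^2, \quad x \mapsto \frac{x - \tilde\mu}{\sqrt{\tilde\sigma^2 + \epsilon}}$.
– This is expected to help with small batch sizes.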
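Below is a sketch of this batch-normalization variant as an invertible layer. The values of the momentum $\rho$ and of $\epsilon$ are illustrative assumptions. Because the layer is an elementwise affine map, its log-determinant is $-\frac{1}{2}\sum_i \log(\tilde\sigma_i^2 + \epsilon)$.

```python
import numpy as np

class RunningBatchNorm:
    """Sketch of RealNVP's batch-norm variant that normalizes with running averages.

    rho (momentum) and eps are illustrative defaults, not values from the paper.
    """
    def __init__(self, dim, rho=0.95, eps=1e-4):
        self.mu, self.var = np.zeros(dim), np.ones(dim)
        self.rho, self.eps = rho, eps

    def update(self, batch):
        """mu <- rho * mu + (1 - rho) * batch mean; same for the variance."""
        self.mu = self.rho * self.mu + (1 - self.rho) * batch.mean(axis=0)
        self.var = self.rho * self.var + (1 - self.rho) * batch.var(axis=0)

    def forward(self, x):
        """x -> (x - mu) / sqrt(var + eps); also returns log|det J| per example."""
        y = (x - self.mu) / np.sqrt(self.var + self.eps)
        log_det = -0.5 * np.sum(np.log(self.var + self.eps))
        return y, log_det
```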

RealNVP results
■ Image synthesis
– Not as good as PixelCNN, but competitive with other generative models.
(Figure: dataset examples and model samples for CIFAR-10, ImageNet (32x32), ImageNet (64x64), CelebA and LSUN (bedroom).)

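RealNVP results (continued)
■ Image manipulation
– Visualizes how images change when the latent variables are manipulated. Four samples from each dataset are mapped to the latent space and transformed there.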

Glow
■ Glow: Generative Flow with Invertible 1×1 Convolutions.
– NIPS 2018. D. P. Kingma, P. Dhariwal.
■ Overview
– Starting from RealNVP, mainly introduces activation normalization and invertible 1×1 convolutions, which improve image synthesis quality.
■ Architecture
– Uses the same multi-scale architecture as RealNVP, with a new design for the step-of-flow part.

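Glow architecture
■ Architecture (continued)
– Let the shape of a feature map be $h \times w \times c$, and let $(i, j)$ denote the spatial indices in $h \times w$.
– Activation normalization (actnorm)
• Batch normalization can increase the noise in the activations when the minibatch size per PU (processing unit) is small.
• Glow proposes activation normalization (actnorm) in place of batch normalization: $y_{i,j} = s \odot x_{i,j} + b$.
• Initialization: the parameters are set so that the post-actnorm activations have zero mean and unit variance per channel on the first minibatch (data-dependent initialization).
• After initialization, the scale and bias parameters $(s, b)$ are trainable parameters that do not depend on the data.
– Invertible 1×1 convolution
• NICE and RealNVP include a permutation that reverses the channel order.
• Glow introduces a 1×1 convolution with equal numbers of input and output channels, which aims to generalize this channel permutation: $y_{i,j} = W x_{i,j}$.
• The weights are initialized to a random $c \times c$ rotation matrix.
• Computing $\det W$ via an LU decomposition reduces the cost from $O(c^3)$ to $O(c)$:
$W = P L (U + \mathrm{diag}(s)), \quad \log|\det W| = \mathrm{sum}(\log|s|)$.
Here $P$ is a fixed permutation matrix, $L$ is lower triangular with ones on the diagonal, and $U$ is upper triangular with zeros on the diagonal.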
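The following NumPy sketch shows actnorm's data-dependent initialization and the LU-parametrized 1×1 convolution, assuming channels-last arrays. Glow obtains $P$, $L$, $U$ and $s$ by LU-decomposing a random rotation matrix. This sketch samples them directly instead, an assumption that avoids SciPy, and the variable names are hypothetical. A 1×1 convolution applies the same $W$ at all $h \cdot w$ positions, so both log-determinants are multiplied by $h \cdot w$.

```python
import numpy as np

rng = np.random.default_rng(4)
h, w, c = 8, 8, 4

# Actnorm: data-dependent initialization on a first minibatch of shape (N, h, w, c).
x0 = 3.0 + 2.0 * rng.standard_normal((16, h, w, c))
scale = 1.0 / x0.std(axis=(0, 1, 2))
bias = -x0.mean(axis=(0, 1, 2)) * scale
y0 = x0 * scale + bias                                   # per-channel mean 0, variance 1
actnorm_logdet = h * w * np.sum(np.log(np.abs(scale)))   # per example

# Invertible 1x1 convolution parametrized as W = P L (U + diag(s)).
P = np.eye(c)[rng.permutation(c)]                            # fixed permutation
L = np.tril(rng.standard_normal((c, c)), k=-1) + np.eye(c)   # unit lower triangular
U = np.triu(rng.standard_normal((c, c)), k=1)                # strictly upper triangular
s = rng.standard_normal(c)
W = P @ L @ (U + np.diag(s))
conv_logdet = h * w * np.sum(np.log(np.abs(s)))              # O(c) instead of O(c^3)

x = rng.standard_normal((h, w, c))
y = x @ W.T                                  # y_{ij} = W x_{ij} at every pixel (i, j)
x_rec = y @ np.linalg.inv(W).T               # inverse 1x1 convolution
```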

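Glow results
■ Improves the performance of flow-based models in image synthesis.
■ Results (quantitative)
– The 1×1 convolution contributes to the improvement.
– Achieves a lower negative log-likelihood than RealNVP.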

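Glow results (continued)
■ Results (qualitative), examples on CelebA
– Produces clean samples.
– Attributes can be manipulated to some extent.
– Linear interpolation in the latent space transforms images smoothly.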

Flow++
■ Flow++: Improving Flow-Based Generative Models with Variational Dequantization and Architecture Design.
– ICML 2019. J. Ho, X. Chen, A. Srinivas, Y. Duan, P. Abbeel.
■ Overview
– Identifies three inefficiencies in flow models:
1. Dequantization with uniform noise hurts the training loss and generalization.
2. Affine coupling flows are not expressive enough.
3. The convolutional layers in the conditioning networks of coupling layers are not powerful enough.
– Proposes a fix for each:
1. Variational flow-based dequantization.
2. Coupling-layer improvement 1: logistic mixture cumulative distribution function (CDF) coupling flows.
3. Coupling-layer improvement 2: self-attention in the conditioning networks.

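Flow++: variational dequantization
1. Variational flow-based dequantization
• Image data follow a discrete distribution $P_{data}(x)$, $x \in \{0, 1, \dots, 255\}^D$, but modeling assumes a continuous distribution. Noise is therefore added to the image data (dequantization) before modeling.
• Uniform dequantization
Let $u \in [0, 1)^D$ be uniform noise, let $y = x + u$, and let $p_{data}$ denote the distribution of $y$. Maximizing the log-likelihood of a continuous model $p_{model}(y)$ maximizes a lower bound on the log-likelihood of the discrete model $P_{model}(x) = \int_{[0,1)^D} p_{model}(x + u)\, du$:
$\mathbb{E}_{y \sim p_{data}}[\log p_{model}(y)] \le \mathbb{E}_{x \sim P_{data}}[\log P_{model}(x)]$.
• Variational flow-based dequantization
This incorporates the noise distribution into the model to obtain more accurate density estimates:
$\mathbb{E}_{x \sim P_{data}}\mathbb{E}_{u \sim q(\cdot|x)}\left[\log \frac{p_{model}(x + u)}{q(u|x)}\right] \le \mathbb{E}_{x \sim P_{data}}[\log P_{model}(x)]$.
Here $u = q_x(\epsilon)$ with $\epsilon \sim \mathcal{N}(0, I)$ is modeled by a flow-based model, and $q$ has support on $[0, 1)^D$. When $q(u|x)$ is the uniform distribution on $[0,1)^D$, independent of $x$, this reduces to uniform dequantization.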
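The lower bound for uniform dequantization can be checked in one dimension. The check assumes a hypothetical Gaussian $p_{model}$, chosen so that both sides have closed forms. `log_discrete_prob` integrates the density over $[x, x+1)$ with `math.erf`, and `uniform_dequant_objective` computes $\mathbb{E}_{u}[\log p_{model}(x+u)]$. By Jensen's inequality, the second never exceeds the first.

```python
import math

def log_discrete_prob(x, mu, sigma):
    """log P_model(x) = log of the integral of N(mu, sigma^2) over [x, x + 1)."""
    def cdf(v):
        return 0.5 * (1.0 + math.erf((v - mu) / (sigma * math.sqrt(2.0))))
    return math.log(cdf(x + 1.0) - cdf(x))

def uniform_dequant_objective(x, mu, sigma):
    """E_{u ~ U[0,1)}[log p_model(x + u)] for a Gaussian p_model, in closed form.

    Uses E[(x + u - mu)^2] = (x - mu)^2 + (x - mu) + 1/3 for u ~ U[0, 1).
    """
    second_moment = (x - mu) ** 2 + (x - mu) + 1.0 / 3.0
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - second_moment / (2 * sigma ** 2)

for x in (0, 100, 128, 255):
    print(x, uniform_dequant_objective(x, 128.0, 40.0), log_discrete_prob(x, 128.0, 40.0))
```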

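Flow++: coupling layers
– The RealNVP coupling transformation:
$y_1 = x_1, \quad y_2 = x_2 \cdot \exp(a_\theta(x_1)) + b_\theta(x_1)$.
2. Coupling-layer improvement 1: logistic mixture CDF coupling flow
• Replaces the transformation of $x_2$ with a more general nonlinearity by applying the cumulative distribution function (CDF) of a mixture of logistic distributions:
$y_1 = x_1, \quad y_2 = \sigma^{-1}\left(\mathrm{MixLogCDF}(x_2; \pi_\theta(x_1), \mu_\theta(x_1), s_\theta(x_1))\right) \cdot \exp(a_\theta(x_1)) + b_\theta(x_1)$.
Here $\sigma(x) = 1/(1 + \exp(-x))$ is the sigmoid function, and MixLogCDF is the CDF of a mixture of $K$ logistic distributions:
$\mathrm{MixLogCDF}(x; \pi, \mu, s) = \sum_{i=1}^{K} \pi_i\, \sigma\left((x - \mu_i) \cdot \exp(-s_i)\right)$.
It takes values in $(0, 1)$ and is monotonically increasing, so $\sigma^{-1}$ can be applied to it and the whole map is invertible.
3. Coupling-layer improvement 2: self-attention in the conditioning network
• Introduces self-attention to make the conditioning on $x_1$ more expressive.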
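Below is an elementwise sketch of the logistic mixture CDF coupling transform for fixed parameters, assuming $\pi$ sums to one. In Flow++ these parameters, along with $a$ and $b$, are produced from $x_1$ by the conditioning network. The map has no closed-form inverse, so this sketch inverts it by bisection. The bisection relies on monotonicity and on the solution lying in a hypothetical bracket $[-20, 20]$.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def logit(p):
    """Inverse of the sigmoid: sigma^{-1}(p) = log p - log(1 - p)."""
    return np.log(p) - np.log1p(-p)

def mix_log_cdf(x, pi, mu, s):
    """MixLogCDF(x; pi, mu, s) = sum_i pi_i * sigmoid((x - mu_i) * exp(-s_i))."""
    return np.sum(pi * sigmoid((x[..., None] - mu) * np.exp(-s)), axis=-1)

def flowpp_forward(x2, pi, mu, s, a, b):
    """y2 = sigma^{-1}(MixLogCDF(x2)) * exp(a) + b (parameters would depend on x1)."""
    return logit(mix_log_cdf(x2, pi, mu, s)) * np.exp(a) + b

def flowpp_inverse(y2, pi, mu, s, a, b, lo=-20.0, hi=20.0, n_iter=60):
    """Elementwise bisection, valid because the map is strictly increasing.
    Assumes each solution lies in [lo, hi]."""
    target = (y2 - b) * np.exp(-a)
    lo, hi = np.full_like(y2, lo), np.full_like(y2, hi)
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        below = logit(mix_log_cdf(mid, pi, mu, s)) < target
        lo, hi = np.where(below, mid, lo), np.where(below, hi, mid)
    return 0.5 * (lo + hi)

pi, mu, s = np.array([0.3, 0.7]), np.array([-1.0, 2.0]), np.array([0.0, -0.5])
x2 = np.linspace(-3.0, 4.0, 15)
y2 = flowpp_forward(x2, pi, mu, s, a=0.2, b=0.1)
```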

Flow++ results
■ Density modeling
– SOTA among non-autoregressive models.
– Competitive with autoregressive models.

Flow++ results (continued)
■ Ablation study
– Replacing variational dequantization in Flow++ with uniform dequantization causes the largest increase in negative log-likelihood.
■ Sampling

CAGlow
■ Conditional Adversarial Generative Flow for Controllable Image Synthesis.
– CVPR 2019. R. Liu, Y. Liu, X. Gong, X. Wang, H. Li.
■ Overview
– Conditional image synthesis is challenging for flow-based generative models.
– Proposes a flow-based generative model, the conditional adversarial generative flow (CAGlow), based on modeling the joint distribution of images and conditions.
– Instead of disentangling the latent space, CAGlow learns a mapping from the condition space to the latent space adversarially.

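CAGlow: architecture and objective
■ Architecture
– Consists of three parts: a flow from the data space to the latent space, an encoder from the condition space to the latent space, and a supervision part (discriminator, classifier and decoder).
■ Objective function
– A weighted sum of six losses, $\mathcal{L} = \sum_i \lambda_i \mathcal{L}_i$:
– Flow loss: the negative log-likelihood of the reversible flow with respect to the standard normal distribution.
– Adversarial loss: reduces the JS divergence between the latent distribution generated from the conditions and the real latent distribution, using a discriminator.
– Discriminator loss: distinguishes generated latent variables from latent variables inferred by the reversible flow.
– Classification loss (softmax cross-entropy or sigmoid cross-entropy).
– Decoder loss: decodes the unsupervised conditions from the generated latent variables.
– Feature matching loss (squared error), used to stabilize training.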

CAGlow results
■ Controllable image synthesis
– Conditional image synthesis
• Synthesizes images that appear better than those of conditional Glow (CGlow).

CAGlow results (continued)
■ Controllable image synthesis (continued)
– Generates good images when conditions are added cumulatively.
– Interpolates between images well.

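CAGlow results (continued)
■ Quantitative results
– Category preserving test
• Computes accuracy and FID.
• Compared with CGlow, CAGlow has higher accuracy and lower FID (closer to Glow).
– Attribute preserving test
• Uses a high-precision classifier pretrained on CelebA.
• Introduces a metric called Attribute Mean Probability (AMP) to evaluate the variability of the predicted labels.
• CAGlow has higher accuracy than CGlow and a smaller variance of AMP.
– Cumulative conditions interfering test
• Uses a high-precision classifier pretrained on CelebA.
• When attributes are added cumulatively, evaluates the absolute differences in AMP for the labels other than the added ones.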


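i-ResNet
■ Invertible Residual Networks.
– ICML 2019. J. Behrmann, W. Grathwohl, R. T. Q. Chen, D. Duvenaud, J-H. Jacobsen.
■ Overview
– Previous methods achieved invertibility by partitioning dimensions or by restricting the network architecture.
– Proposes invertible ResNets (i-ResNets), which can be used for classification, density estimation and generation.
– Proposes an approximation of the log-determinant of the Jacobian for the generative model defined by an i-ResNet.
– i-ResNets achieve performance comparable to SOTA image classifiers and to SOTA flow-based generative models.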

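i-ResNet: invertibility
■ A sufficient condition for a ResNet to be invertible
– Let $F: \mathbb{R}^D \to \mathbb{R}^D$, $F = F^T \circ F^{T-1} \circ \cdots \circ F^1$, be a ResNet composed of residual blocks $F^t = \mathrm{Id} + g_{\theta_t}$. Suppose the Lipschitz constant of each $g_{\theta_t}$ satisfies
$\mathrm{Lip}(g_{\theta_t}) < 1$ for all $t \in \{1, 2, \dots, T\}$.
Then the ResNet $F$ is invertible.
– Invertibility can be shown, but the inverse has no closed form. By the Banach fixed-point theorem, for each point one can construct a sequence that converges to its preimage. A fixed-point iteration (Algorithm 1) is therefore used in place of the inverse.
– Spectral normalization [Miyato et al., 2018] is used to enforce the Lipschitz constraint.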
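Below is a sketch of the fixed-point inversion. It assumes a single hypothetical residual block $g(x) = \tanh(Wx + b)$ whose weight is rescaled to spectral norm 0.7. Here the spectral norm is computed exactly with `np.linalg.norm(W, 2)`; spectral normalization in practice estimates it by power iteration. The map $x \mapsto y - g(x)$ is a contraction with constant 0.7, so the error shrinks by at least that factor at each iteration.

```python
import numpy as np

rng = np.random.default_rng(5)
D = 5
W = rng.standard_normal((D, D))
W *= 0.7 / np.linalg.norm(W, 2)     # exact spectral normalization: ||W||_2 = 0.7
bias = rng.standard_normal(D)

def g(x):
    """Residual branch; tanh is 1-Lipschitz, so Lip(g) <= ||W||_2 = 0.7 < 1."""
    return np.tanh(W @ x + bias)

def F(x):
    """Invertible residual block F = Id + g."""
    return x + g(x)

def invert(y, n_iter=100):
    """Fixed-point iteration x <- y - g(x), started at x = y (cf. Algorithm 1)."""
    x = y.copy()
    for _ in range(n_iter):
        x = y - g(x)
    return x
```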

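i-ResNet: log-determinant
■ Computing the log-determinant
– Consider the change of variables $z = F(x) = (\mathrm{Id} + g)(x)$.
– Using the Lipschitz constraint and the relation between det and trace, the log-likelihood can be computed as
$\log p_x(x) = \log p_z(z) + \mathrm{tr}\left(\log(I + J_g(x))\right)$.
The Lipschitz constraint gives $\|J_g\|_2 \le \mathrm{Lip}(g) < 1$, so the trace can be expanded into a convergent infinite series:
$\mathrm{tr}\left(\log(I + J_g(x))\right) = \sum_{k=1}^{\infty} (-1)^{k+1} \frac{\mathrm{tr}(J_g^k)}{k}$.
– Computing the trace with this formula has three drawbacks:
1. Computing $\mathrm{tr}(J_g)$ costs $O(D^2)$, or requires $D$ evaluations of $g$.
2. The matrix powers $J_g^k$ are required.
3. The sum is infinite.
– These are addressed as follows:
1. Reverse-mode automatic differentiation computes the vector-Jacobian product $v^\top J_g$, at a cost comparable to one evaluation of $g$.
2. Hutchinson's trace estimator is used.
3. The infinite sum is truncated to a finite sum.
• As a result, the trace estimator is not unbiased.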
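The truncation bias can be seen in a small sketch that evaluates the series with exact traces. It uses a random matrix $J$ with $\|J\|_2 = 0.6$ as a stand-in for $J_g$; the names are hypothetical. The error shrinks roughly like $0.6^n$, but any finite $n$ leaves a bias.

```python
import numpy as np

def logdet_series(J, n_terms):
    """Truncated series sum_{k=1}^{n} (-1)^(k+1) tr(J^k) / k for log det(I + J).

    Exact traces are used to isolate the truncation bias. i-ResNet instead
    estimates each tr(J^k) with Hutchinson's estimator and vector-Jacobian products.
    """
    total, Jk = 0.0, np.eye(J.shape[0])
    for k in range(1, n_terms + 1):
        Jk = Jk @ J
        total += (-1) ** (k + 1) * np.trace(Jk) / k
    return total

rng = np.random.default_rng(6)
J = rng.standard_normal((4, 4))
J *= 0.6 / np.linalg.norm(J, 2)      # ||J||_2 = 0.6 < 1, as Lip(g) < 1 guarantees
exact = np.linalg.slogdet(np.eye(4) + J)[1]
for n in (1, 2, 5, 20, 50):
    print(n, abs(logdet_series(J, n) - exact))
```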

i-ResNet
■ Comparison with other methods

i-ResNet results
■ Density estimation on 2D toy datasets
– Compared with Glow, i-ResNet is qualitatively confirmed to estimate the density better.
• The coupling layers of Glow are replaced with invertible residual blocks.
• Glow's actnorm is used in place of the invertible linear transformation.
■ Image synthesis
– Compared with other models on MNIST and CIFAR-10.
– Worse than Glow and FFJORD, but comparable to RealNVP.

i-ResNet results (continued)
■ Note: i-ResNets are also accurate classifiers.
– Compares ResNets under different Lipschitz constraints.
– Compares with Glow.

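Residual Flow
■ Residual Flows for Invertible Generative Modeling.
– ICML INNF 2019. R. T. Q. Chen, J. Behrmann, D. Duvenaud, J-H. Jacobsen.
■ Overview
– Improves i-ResNet and proposes Residual Flows.
• In i-ResNet, computing the likelihood requires the trace of an infinite series. i-ResNet truncates the series to a finite sum, so its trace estimator is biased. Residual Flows introduce the Russian roulette estimator, which gives an unbiased estimator of the trace.
• Proposes an unbiased gradient estimator and also performs part of the backward computation during the forward pass, which reduces the memory needed for training.
• Introduces an activation function designed with the Lipschitz constraint in mind, to counter vanishing gradients.
– Achieves high accuracy among flow-based models in image density estimation.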

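Residual Flow: unbiased log-density estimator
■ Unbiased estimator of the trace in the likelihood
– The log-likelihood is obtained by computing
$\log p(x) = \log p(f(x)) + \mathrm{tr}\left(\sum_{k=1}^{\infty} \frac{(-1)^{k+1}}{k} J_g^k\right)$.
– i-ResNet truncates this series to a finite sum, so its trace estimate is biased.
• Residual Flows introduce the Russian roulette estimator to obtain an unbiased estimator of the trace. Let $f(x) = x + g(x)$ with $\mathrm{Lip}(g) < 1$, and let $N$ be a random variable supported on the positive integers. Then
$\log p(x) = \log p(f(x)) + \mathbb{E}_{n, v}\left[\sum_{k=1}^{n} \frac{(-1)^{k+1}}{k} \frac{v^\top J_g(x)^k v}{\mathbb{P}(N \ge k)}\right]$,
where $n \sim p(N)$ and $v \sim \mathcal{N}(0, I)$.
– Introducing the unbiased estimator improves performance on test data.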
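The next block sketches the Russian roulette estimator for $\log\det(I + J)$. It assumes $N$ is geometric on $\{1, 2, \dots\}$, which is one possible choice of $p(N)$, and again uses an explicit matrix $J$ in place of $J_g(x)$. Each sample is random, but its expectation equals the full infinite series, so the sample mean approaches the exact value.

```python
import numpy as np

def russian_roulette_logdet(J, rng, p_stop=0.5):
    """One unbiased sample of log det(I + J) = sum_k (-1)^(k+1) tr(J^k) / k.

    N ~ Geometric(p_stop) on {1, 2, ...}, so P(N >= k) = (1 - p_stop)^(k - 1).
    Dividing term k by P(N >= k) removes the truncation bias, and v ~ N(0, I)
    gives the Hutchinson estimate v^T J^k v of tr(J^k).
    """
    n = rng.geometric(p_stop)
    v = rng.standard_normal(J.shape[0])
    w, total = v.copy(), 0.0
    for k in range(1, n + 1):
        w = J @ w                                   # w = J^k v
        total += (-1) ** (k + 1) / k * (v @ w) / (1 - p_stop) ** (k - 1)
    return total

rng = np.random.default_rng(7)
J = rng.standard_normal((4, 4))
J *= 0.5 / np.linalg.norm(J, 2)
samples = [russian_roulette_logdet(J, rng) for _ in range(20000)]
print(np.mean(samples), np.linalg.slogdet(np.eye(4) + J)[1])
```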

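Residual Flow: reducing memory during training
■ Memory reduction during training
– Unbiased estimator of the gradient
• Instead of computing the gradient directly, an unbiased estimator of it is computed:
$\frac{\partial}{\partial\theta}\log\det\left(I + J_g(x, \theta)\right) = \mathbb{E}_{n, v}\left[\left(\sum_{k=0}^{n} \frac{(-1)^k}{\mathbb{P}(N \ge k)} v^\top J_g(x, \theta)^k\right)\frac{\partial J_g(x, \theta)}{\partial\theta} v\right]$.
Here $N$ is a random variable supported on the positive integers, $n \sim p(N)$ and $v \sim \mathcal{N}(0, I)$. The derivative $\partial J_g / \partial\theta$ sits outside the sum, so there is no need to backpropagate through the power series, which saves memory.
– Backward-in-forward
• Part of the backward computation is performed during the forward evaluation.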

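Residual Flow: LipSwish
■ An activation function designed for the Lipschitz constraint
– Computing the log-density requires the Jacobian, i.e. first derivatives, so gradient-based training involves second derivatives.
– In i-ResNet, the activation attains its Lipschitz constant where the first derivative of ELU equals 1. There, the second derivative (in a loose sense) vanishes, which makes training difficult.
– We want an activation $\phi$ such that
1. $|\phi'(z)| \le 1$ for all $z$, and
2. the second derivative does not approach 0 when $|\phi'(z)|$ approaches 1.
– As such an activation, Residual Flows propose LipSwish, a modification of Swish:
$\mathrm{LipSwish}(z) = \mathrm{Swish}(z)/1.1 = z \cdot \sigma(\beta z)/1.1$,
where $\sigma(\cdot)$ is the sigmoid function and $\beta$ is a real parameter.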
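This block is a quick numerical check of condition 1 for LipSwish. The derivative of $z\,\sigma(\beta z)$ depends on $z$ only through $\beta z$, and its maximum is slightly below 1.1. Dividing by 1.1 therefore keeps $|\phi'| \le 1$ for any $\beta > 0$. The $\beta$ values below are arbitrary choices for this sketch.

```python
import numpy as np

def lipswish(z, beta=1.0):
    """LipSwish(z) = Swish(z) / 1.1 = z * sigmoid(beta * z) / 1.1."""
    return z / (1.0 + np.exp(-beta * z)) / 1.1

def lipswish_grad(z, beta=1.0):
    """d/dz LipSwish = (sig + beta * z * sig * (1 - sig)) / 1.1, with sig = sigmoid(beta * z)."""
    sig = 1.0 / (1.0 + np.exp(-beta * z))
    return (sig + beta * z * sig * (1.0 - sig)) / 1.1

z = np.linspace(-20.0, 20.0, 100001)
for beta in (0.5, 1.0, 4.0):
    grad = lipswish_grad(z, beta)
    print(beta, grad.min(), grad.max())    # the max stays just below 1
```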

Residual Flow results
■ Density estimation and sampling on image datasets
– Achieves accuracy close to the SOTA among flow-based models.

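Residual Flow results (continued)
■ Quantitative and qualitative evaluation of image synthesis
– Evaluated with FID. Residual Flows achieve a lower FID than i-ResNet and PixelCNN.
– A Residual Flow trained on CIFAR-10 produces samples that are more globally coherent than those of PixelCNN.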

Residual Flow results (continued)
■ Ablation study
– Ablation studies on the unbiased estimator and on LipSwish.
■ The paper also performs hybrid training that mixes continuous and discrete values (images and class labels).


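Continuous Normalizing Flow
■ Neural Ordinary Differential Equations.
– NIPS 2018 (Best Paper). R. T. Q. Chen, Y. Rubanova, J. Bettencourt, D. Duvenaud.
■ Overview
– Instead of a discrete sequence of hidden layers, a neural network parametrizes the continuous dynamics of the hidden state, which is computed with an ODE solver. The depth of the network goes from discrete to continuous.
– One application is a scalable, invertible normalizing flow.
• ResNets: $h_{t+1} = h_t + f(h_t, \theta_t)$, $t \in \{0, 1, \dots, T\}$
• Neural ODEs: $\frac{dh(t)}{dt} = f(h(t), t, \theta)$, $t \in [0, T]$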

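Continuous Normalizing Flow: loss and gradients of Neural ODEs
■ The loss of Neural ODEs and its gradient
– We want to estimate the hidden state $z(t)$. Consider the initial value problem with initial value $z(t_0)$, and model $f$ by a neural network:
$\frac{dz(t)}{dt} = f(z(t), t, \theta)$.
$z(t)$ is obtained by integrating $f$.
– The loss to be minimized can be written as
$L(z(t_1)) = L\left(z(t_0) + \int_{t_0}^{t_1} f(z(t), t, \theta)\, dt\right) = L(\mathrm{ODESolve}(z(t_0), f, t_0, t_1, \theta))$.
– The gradient of the loss is computed with the adjoint sensitivity method.
• Introduce the adjoint state $a(t) = \frac{\partial L}{\partial z(t)}$.
• The adjoint state is the solution of a second ODE:
$\frac{da(t)}{dt} = -a(t)^\top \frac{\partial f(z(t), t, \theta)}{\partial z}$.
• The gradient of the loss is computed backward from $t_1$ to $t_0$:
$\frac{dL}{d\theta} = -\int_{t_1}^{t_0} a(t)^\top \frac{\partial f(z(t), t, \theta)}{\partial\theta}\, dt$.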
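Below is a minimal sketch of the adjoint sensitivity method for a hypothetical one-dimensional ODE $\frac{dz}{dt} = \tanh(\theta z)$ with loss $L = z(t_1)^2$. A fixed-step RK4 integrator stands in for an adaptive ODESolve, which is an assumption of this sketch. The backward pass integrates the augmented state $[z, a, g]$ from $t_1$ to $t_0$ and re-solves $z$ backward instead of storing the forward trajectory. At the end, $g(t_0) = \frac{dL}{d\theta}$.

```python
import numpy as np

def dynamics(z, theta):
    """Hypothetical scalar dynamics dz/dt = tanh(theta * z)."""
    return np.tanh(theta * z)

def ddyn_dz(z, theta):
    return theta * (1.0 - np.tanh(theta * z) ** 2)

def ddyn_dtheta(z, theta):
    return z * (1.0 - np.tanh(theta * z) ** 2)

def rk4_solve(fn, y, t_start, t_end, n_steps=1000):
    """Fixed-step RK4 for dy/dt = fn(y); t_end < t_start integrates backward."""
    dt = (t_end - t_start) / n_steps
    for _ in range(n_steps):
        k1 = fn(y)
        k2 = fn(y + 0.5 * dt * k1)
        k3 = fn(y + 0.5 * dt * k2)
        k4 = fn(y + dt * k3)
        y = y + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return y

def ode_solve(z0, theta, t0=0.0, t1=1.0):
    return rk4_solve(lambda y: dynamics(y, theta), z0, t0, t1)

def adjoint_grad(z0, theta, t0=0.0, t1=1.0):
    """dL/dtheta for L = z(t1)^2, integrating [z, a, g] backward from t1 to t0."""
    z1 = ode_solve(z0, theta, t0, t1)
    def backward(y):
        z, a, _ = y
        return np.array([dynamics(z, theta),               # dz/dt, re-solved backward
                         -a * ddyn_dz(z, theta),           # da/dt = -a df/dz
                         -a * ddyn_dtheta(z, theta)])      # dg/dt = -a df/dtheta
    y1 = np.array([z1, 2.0 * z1, 0.0])                     # a(t1) = dL/dz(t1) = 2 z(t1)
    return rk4_solve(backward, y1, t1, t0)[2]

print(adjoint_grad(0.5, 0.8))
```

The adjoint gradient can be compared with a central finite difference of `ode_solve`.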

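Continuous Normalizing Flow: normalizing flows with Neural ODEs
■ Normalizing flows using Neural ODEs
– Change of variables
• Discrete case: $z_1 = f(z_0) \Rightarrow \log p(z_1) = \log p(z_0) - \log\left|\det \frac{\partial f}{\partial z_0}\right|$
• Continuous case (with Neural ODEs): $\frac{dz(t)}{dt} = f(z(t), t) \Rightarrow \frac{\partial \log p(z(t))}{\partial t} = -\mathrm{tr}\left(\frac{df}{dz(t)}\right)$
Here $f$ is assumed to be uniformly Lipschitz continuous in $z$ and continuous in $t$.
• In exchange for the cost of the ODE solver, the cost of the likelihood computation goes from $O(D^3)$ to $O(D^2)$ ($D$: data dimension).
– The continuous version of the planar normalizing flow (Planar NF), modeled with Neural ODEs
• Discrete case [Rezende and Mohamed, 2015]: $z(t+1) = z(t) + u\, h(w^\top z(t) + b) \Rightarrow \log p(z(t+1)) = \log p(z(t)) - \log\left|1 + u^\top \frac{\partial h}{\partial z}\right|$
• Continuous case (Neural ODEs): $\frac{dz(t)}{dt} = u\, h(w^\top z(t) + b) \Rightarrow \frac{\partial \log p(z(t))}{\partial t} = -u^\top \frac{\partial h}{\partial z(t)}$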

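Continuous Normalizing Flow: multiple hidden units and time dependence
■ Normalizing flows using Neural ODEs (continued)
– Multiple hidden units
• The trace is linear, so making the dynamics more complex by summing functions only increases the cost linearly:
$\frac{dz(t)}{dt} = \sum_{n=1}^{M} f_n(z(t)) \Rightarrow \frac{d\log p(z(t))}{dt} = -\sum_{n=1}^{M} \mathrm{tr}\left(\frac{\partial f_n}{\partial z}\right)$.
• In a standard NF, increasing the number of hidden units $M$ increases the cost as $O(M^3)$. In an NF using Neural ODEs, the cost increases linearly.
• With Neural ODEs, the model is made more expressive by increasing the number of hidden units (called the width) rather than the depth.
– Time-dependent dynamics
• The flow parameters can depend on time.
• A gating mechanism is introduced: $\frac{dz(t)}{dt} = \sum_n \sigma_n(t) f_n(z)$, where $\sigma_n(t) \in (0, 1)$ is a neural network.
• These models are called continuous normalizing flows (CNF).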
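The next block sketches a width-$M$ planar CNF. It assumes $h = \tanh$ and randomly drawn $u_m$, $w_m$, $b_m$ (hypothetical values), with a fixed-step RK4 integrator standing in for an adaptive ODE solver. The augmented state $[z, \Delta\log p]$ is integrated with $\frac{d\log p}{dt} = -\sum_m h'(w_m^\top z + b_m)\, u_m^\top w_m$. This cost is linear in $M$ and needs no determinant.

```python
import numpy as np

rng = np.random.default_rng(8)
D, M = 2, 8                                  # data dimension, width (hidden units)
U = 0.5 * rng.standard_normal((M, D))        # rows are u_m
Wm = rng.standard_normal((M, D))             # rows are w_m
bm = rng.standard_normal(M)

def f(z):
    """dz/dt = sum_m u_m tanh(w_m^T z + b_m): a width-M planar CNF."""
    return np.tanh(Wm @ z + bm) @ U

def jac(z):
    """df/dz = sum_m u_m h'(w_m^T z + b_m) w_m^T, a D x D matrix."""
    hp = 1.0 - np.tanh(Wm @ z + bm) ** 2
    return (U.T * hp) @ Wm

def trace_jac(z):
    """tr(df/dz) = sum_m h'(w_m^T z + b_m) u_m^T w_m, with cost linear in M."""
    hp = 1.0 - np.tanh(Wm @ z + bm) ** 2
    return np.sum(hp * np.sum(U * Wm, axis=1))

def rk4(fn, y, t1, n_steps=200):
    """Fixed-step Runge-Kutta solver from t = 0 to t1 for autonomous dy/dt = fn(y)."""
    dt = t1 / n_steps
    for _ in range(n_steps):
        k1 = fn(y)
        k2 = fn(y + 0.5 * dt * k1)
        k3 = fn(y + 0.5 * dt * k2)
        k4 = fn(y + dt * k3)
        y = y + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return y

def augmented(y):
    """State [z, delta_logp] with d(log p)/dt = -tr(df/dz) (instantaneous change of variables)."""
    z = y[:D]
    return np.concatenate([f(z), [-trace_jac(z)]])

z0 = np.array([0.3, -0.8])
yT = rk4(augmented, np.concatenate([z0, [0.0]]), t1=1.0)
zT, delta_logp = yT[:D], yT[D]      # log p(z(1)) = log p(z(0)) + delta_logp
```

As a check, integrate the full Jacobian $\frac{dM}{dt} = \frac{\partial f}{\partial z} M$ alongside $z$. The result satisfies $\log\det M(1) = -\Delta\log p$ (Jacobi's formula).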

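Continuous Normalizing Flow results
■ Density matching
– With $p$ the target distribution and $q$ the model, minimize $\mathrm{KL}(q(x)\,\|\,p(x))$.
– Compares Planar NF and Planar CNF.
– Planar CNF performs better.
※ The capacity of an NF is determined by its depth $K$; that of a CNF is determined by its width $M$.
■ Maximum likelihood
– With $p$ the data distribution and $q$ the model, minimize $\mathbb{E}_{p(x)}[-\log q(x)]$.
– Visualizes how the normalizing flows of Planar NF and Planar CNF evolve over time.
– Planar CNF performs better.
(Figure: results of the CNF.)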

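FFJORD
■ FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models.
– ICLR 2019. W. Grathwohl, R. T. Q. Chen, J. Bettencourt, I. Sutskever, D. Duvenaud.
■ Overview
– Likelihood-based training of reversible generative models requires computing the determinant of the Jacobian.
– Continuous normalizing flows (CNF) define the transformation as the solution of an ODE and compute the trace of the Jacobian instead of its determinant.
• This reduces the cost from $O(D^3)$ for computing the determinant directly to $O(D^2)$ ($D$: data dimension).
– FFJORD uses Hutchinson's trace estimator to compute the trace of the Jacobian. The model therefore yields an unbiased estimate of the log-density instead of its exact value.
• This reduces the cost from $O(D^2)$ for CNF to $O(D)$ and allows modeling without restricting the form of the Jacobian (Free-form Jacobian of Reversible Dynamics).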

FFJORD
■ Comparison with other methods

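FFJORD: Hutchinson's trace estimator
■ Hutchinson's trace estimator
– Let $A$ be a $D \times D$ matrix, and let $p(\epsilon)$ be a suitable* distribution on $D$-dimensional space with $\mathbb{E}[\epsilon] = 0$ and $\mathrm{Cov}(\epsilon) = I$. Then
$\mathrm{tr}(A) = \mathbb{E}_{p(\epsilon)}[\epsilon^\top A \epsilon]$,
so $\epsilon^\top A \epsilon$ is an unbiased estimator of $\mathrm{tr}(A)$. The Monte Carlo estimator based on it is called Hutchinson's trace estimator (Hutchinson, 1989).
* e.g. the Gaussian or Rademacher distribution
■ Log-density estimation
– The log-likelihood maximized during training,
$\log p(z(t_1)) = \log p(z(t_0)) - \int_{t_0}^{t_1} \mathrm{tr}\left(\frac{\partial f}{\partial z(t)}\right) dt$,
is rewritten with Hutchinson's trace estimator as
$\log p(z(t_1)) = \log p(z(t_0)) - \mathbb{E}_{p(\epsilon)}\left[\int_{t_0}^{t_1} \epsilon^\top \frac{\partial f}{\partial z(t)} \epsilon\, dt\right]$.
The cost becomes $O(D)$.
– FFJORD also uses a technique to reduce the variance of Hutchinson's trace estimator (the bottleneck trick).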
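Below is a sketch of Hutchinson's trace estimator with Rademacher or Gaussian noise. For illustration it is applied to an explicit random matrix; in FFJORD, the product $\epsilon^\top \frac{\partial f}{\partial z}$ comes from automatic differentiation instead. `hutchinson_trace` and its `matvec` argument are hypothetical names.

```python
import numpy as np

def hutchinson_trace(matvec, dim, n_samples, rng, dist="rademacher"):
    """Monte Carlo estimate of tr(A) = E[eps^T A eps], with E[eps] = 0, Cov(eps) = I.

    matvec(eps) must return A @ eps. In FFJORD, A is the Jacobian df/dz and
    eps^T A comes from one vector-Jacobian product, so A is never formed.
    """
    if dist == "rademacher":
        eps = rng.choice([-1.0, 1.0], size=(n_samples, dim))
    else:                                   # Gaussian noise
        eps = rng.standard_normal((n_samples, dim))
    return np.mean([e @ matvec(e) for e in eps])

rng = np.random.default_rng(9)
A = rng.standard_normal((5, 5))
for dist in ("rademacher", "gaussian"):
    print(dist, hutchinson_trace(lambda e: A @ e, 5, 20000, rng, dist), np.trace(A))
```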

FFJORD results
■ Density estimation on 2D toy datasets
■ Density estimation on tabular and image datasets
– On image datasets, FFJORD with a multi-scale structure outperforms Glow on MNIST.

FFJORD results (continued)
■ Variational autoencoder
– Performs variational inference.
– Outperforms other flow-based models used for variational inference.
■ Others
– Ablation study of the bottleneck trick
– Comparison of the noise distributions used for Hutchinson's trace estimator
– etc.


VideoFlow
■ VideoFlow: A Flow-Based Generative Model for Video.
– ICML 2019. M. Kumar, M. Babaeizadeh, D. Erhan, C. Finn, S. Levine, L. Dinh, D. Kingma.
■ Overview
– Extends the architectures of Glow and RealNVP and proposes video prediction with normalizing flows.
– Competitive with the SOTA on stochastic video prediction on the BAIR action-free dataset.
– Achieves qualitatively good results without adversarial training and without the components common to models trained with a pixel-level mean-squared error.
– Synthesizes frames faster than pixel-level autoregressive models.

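VideoFlow: problem setting
■ The problem is conditional video prediction
– Synthesizing RGB video frames conditioned on past observations.
■ Approach
– Model the video frame $x_t$ at time $t$ as obtained from a latent variable $z_t$ at time $t$ by an invertible transformation.
– Predict video frames by predicting the sequence of latent variables.
■ Model variables
– The latent space is split by time step: $z = \{z_t\}_{t=1}^{T}$.
– $z_t$ is a stack of multiple levels, each carrying information at a particular scale: $z_t = \{z_t^{(l)}\}_{l=1}^{L}$.
– $z_t$ is obtained from the video frame at the corresponding time by an invertible transformation: $z_t = f_\theta(x_t)$.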

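VideoFlow: latent dynamics
■ Dynamics of the latent variables
– The prior on the latent variables uses the following autoregressive factorization:
$p_\theta(z) = \prod_{t=1}^{T} p_\theta(z_t \mid z_{<t})$.
– The conditional prior is assumed to factorize as follows:
$p_\theta(z_t \mid z_{<t}) = \prod_{l=1}^{L} p_\theta(z_t^{(l)} \mid z_{<t}^{(l)}, z_t^{(>l)})$.
– Each $p_\theta(z_t^{(l)} \mid z_{<t}^{(l)}, z_t^{(>l)})$ is assumed to be a conditionally factorized Gaussian density:
$p_\theta(z_t^{(l)} \mid z_{<t}^{(l)}, z_t^{(>l)}) = \mathcal{N}(z_t^{(l)}; \mu, \sigma), \quad \mu, \log\sigma = NN_\theta(z_{<t}^{(l)}, z_t^{(>l)})$,
where $NN_\theta$ is a ResNet.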
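The following sketch shows how this factorized prior becomes a log-likelihood. It assumes latents stored as nested lists `z[t][l]`, with levels indexed from 0 in code. A hypothetical `toy_predictor` takes the place of the ResNet $NN_\theta$.

```python
import numpy as np

def gaussian_logpdf(z, mu, log_sigma):
    """Sum of elementwise log N(z; mu, sigma^2)."""
    return np.sum(-0.5 * np.log(2 * np.pi) - log_sigma
                  - 0.5 * ((z - mu) / np.exp(log_sigma)) ** 2)

def videoflow_log_prior(z, predictor):
    """log p(z) = sum_t sum_l log N(z_t^(l); mu, sigma), with (mu, log sigma) from predictor.

    z[t][l] is the level-l latent of frame t. predictor is a hypothetical
    stand-in for NN_theta. It receives z_{<t}^{(l)} and z_t^{(>l)}, i.e. the
    levels l+1, ..., L-1 of the same frame.
    """
    total = 0.0
    T, L = len(z), len(z[0])
    for t in range(T):
        for l in range(L):
            past = [z[s][l] for s in range(t)]             # z_{<t}^{(l)}
            upper = [z[t][k] for k in range(l + 1, L)]     # z_t^{(>l)}
            mu, log_sigma = predictor(past, upper, z[t][l].shape)
            total += gaussian_logpdf(z[t][l], mu, log_sigma)
    return total

def toy_predictor(past, upper, shape):
    """Hypothetical predictor: mean = previous latent at this level, unit scale."""
    mu = past[-1] if past else np.zeros(shape)
    return mu, np.zeros(shape)

z = [[np.ones(2), np.zeros(3)], [np.ones(2), np.zeros(3)]]   # T = 2 frames, L = 2 levels
print(videoflow_log_prior(z, toy_predictor))
```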

VideoFlow: architecture
■ The invertible network ($z_t = f_\theta(x_t)$) and the ResNet ($NN_\theta$).
(Figure: architecture diagram.)

VideoFlow results
■ Comparison with stochastic video-generation baselines
– Trained and evaluated on the BAIR action-free dataset
• Training: 3 input frames, 10 predicted frames
• Evaluation: 3 input frames, 27 predicted frames
– Compared with SAVP-VAE and SV2P.
– Metrics: Max Cosine Similarity, Max SSIM, Max PSNR.
■ Result videos and more: https://sites.google.com/view/videoflow/home

Useful surveys and resources
■ Paper (preprint)
– I. Kobyzev, S. Prince and M. A. Brubaker. Normalizing Flows: Introduction and Ideas. arXiv 2019.
• A survey of normalizing flows. Section 3 conveniently organizes the methods by architecture.
• https://arxiv.org/abs/1908.09257
■ Slides
– M. Suzuki. Flow-based Generative Models.
• Slides from the Matsuo Lab deep learning paper-reading group, covering flow-based generative models broadly.
• https://www.slideshare.net/DeepLearningJP2016/dlflowbased-deep-generative-models
■ Blog
– E. Jang. Normalizing Flows Tutorial, Part 1: Distributions and Determinants; Part 2: Modern Normalizing Flows.
• https://blog.evjang.com/2018/01/nf1.html, https://blog.evjang.com/2018/01/nf2.html
– Lil'Log: L. Weng, Flow-based Deep Generative Models
• https://lilianweng.github.io/lil-log/2018/10/13/flow-based-deep-generative-models.html
■ Conference page
– ICML 2019 Invertible Neural Networks and Normalizing Flows
• An ICML 2019 workshop; talk videos are available.
• https://invertibleworkshop.github.io/index.html

References
1. J. Behrmann, W. Grathwohl, R. T. Q. Chen, D. Duvenaud and J-H. Jacobsen. Invertible Residual Networks. ICML 2019.
2. R. T. Q. Chen, J. Behrmann, D. Duvenaud and J-H. Jacobsen. Residual Flows for Invertible Generative Modeling. ICML INNF 2019.
3. R. T. Q. Chen, Y. Rubanova, J. Bettencourt and D. Duvenaud. Neural Ordinary Differential Equations. NIPS 2018.
4. L. Dinh, D. Krueger and Y. Bengio. Non-linear Independent Components Estimation. ICLR 2015.
5. L. Dinh, J. Sohl-Dickstein and S. Bengio. Density estimation using Real NVP. ICLR 2017.
6. W. Grathwohl, R. T. Q. Chen, J. Bettencourt, I. Sutskever and D. Duvenaud. FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models. ICLR 2019.
7. J. Ho, X. Chen, A. Srinivas, Y. Duan and P. Abbeel. Flow++: Improving Flow-Based Generative Models with Variational Dequantization and Architecture Design. ICML 2019.
8. D. P. Kingma and P. Dhariwal. Glow: Generative Flow with Invertible 1×1 Convolutions. NIPS 2018.
9. I. Kobyzev, S. Prince and M. A. Brubaker. Normalizing Flows: Introduction and Ideas. arXiv 2019.
10. M. Kumar, M. Babaeizadeh, D. Erhan, C. Finn, S. Levine, L. Dinh and D. Kingma. VideoFlow: A Flow-Based Generative Model for Video. ICML 2019.
11. R. Liu, Y. Liu, X. Gong, X. Wang and H. Li. Conditional Adversarial Generative Flow for Controllable Image Synthesis. CVPR 2019.
12. D. J. Rezende and S. Mohamed. Variational Inference with Normalizing Flows. ICML 2015.
