
Autoencoding variational Bayes - reading group

Mehdi
May 09, 2015

Transcript

  1. Auto-encoding variational Bayes. Diederik P. Kingma, Max Welling.
    Presented by: Mehdi Cherti (LAL/CNRS), 9th May 2015.
  2. What is a generative model? A model of how the data X was generated.
    Typically, the purpose is to find a model for p(x) or p(x, y), where y
    can be a set of latent (hidden) variables or a set of output variables
    (for discriminative problems).
  3. Training generative models. Typically, we assume a parametric form of
    the probability density: p(x|Θ). Given an i.i.d. dataset
    X = (x_1, x_2, ..., x_N), we typically do one of:
    Maximum likelihood (ML): argmax_Θ p(X|Θ)
    Maximum a posteriori (MAP): argmax_Θ p(X|Θ)p(Θ)
    Bayesian inference: p(Θ|X) = p(X|Θ)p(Θ) / ∫ p(X|Θ)p(Θ) dΘ
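
    As a toy illustration of the maximum-likelihood option above (an added
    note, not part of the slides), here is a minimal numpy sketch that fits
    a univariate Gaussian p(x|Θ), with Θ = (μ, σ), by maximum likelihood;
    the dataset and its true parameters are made up:

      import numpy as np

      # Toy i.i.d. dataset drawn from a Gaussian with made-up parameters.
      rng = np.random.default_rng(0)
      X = rng.normal(loc=2.0, scale=0.5, size=1000)

      # For a Gaussian, argmax_Theta p(X|Theta) has a closed form:
      # the sample mean and the (biased) sample standard deviation.
      mu_ml = X.mean()
      sigma_ml = X.std()

      # Log-likelihood of the data under the fitted model.
      log_lik = np.sum(-0.5 * np.log(2 * np.pi * sigma_ml ** 2)
                       - (X - mu_ml) ** 2 / (2 * sigma_ml ** 2))
      print(mu_ml, sigma_ml, log_lik)
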
  4. The problem. Let x be the observed variables; we assume a latent
    representation z, and we define p_Θ(z) and p_Θ(x|z). We want to design
    a generative model where: p_Θ(x) = ∫ p_Θ(x|z) p_Θ(z) dz is intractable;
    p_Θ(z|x) = p_Θ(x|z) p_Θ(z) / p_Θ(x) is intractable; and we have large
    datasets, so we want to avoid sampling-based training procedures
    (e.g. MCMC).
  5. The proposed solution. They propose: a fast training procedure that
    estimates the parameters Θ, for data generation; an approximation of
    the posterior p_Θ(z|x), for data representation; and an approximation
    of the marginal p_Θ(x), for model evaluation and as a prior for other
    tasks.
  6. Formulation of the problem. The generative process consists of
    sampling z from p_Θ(z), then x from p_Θ(x|z). Let's define: a prior
    over the latent representation, p_Θ(z), and a decoder, p_Θ(x|z). We
    want to maximize the log-likelihood of the data (x(1), x(2), ..., x(N)):
    log p_Θ(x(1), x(2), ..., x(N)) = Σ_i log p_Θ(x(i)),
    and be able to do inference: p_Θ(z|x).
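
    As a concrete toy instance of this generative process (an added note;
    the linear-Gaussian decoder and all parameters below are made up), a
    minimal numpy sketch of ancestral sampling, z ~ p_Θ(z) then x ~ p_Θ(x|z):

      import numpy as np

      rng = np.random.default_rng(0)
      latent_dim, data_dim = 2, 5

      # Made-up decoder parameters Theta = (W, b, sigma).
      W = rng.normal(size=(data_dim, latent_dim))
      b = np.zeros(data_dim)
      sigma = 0.1

      # Ancestral sampling: z from the prior, then x from the decoder.
      z = rng.normal(size=latent_dim)              # z ~ N(0, I)
      x = rng.normal(loc=W @ z + b, scale=sigma)   # x ~ N(Wz + b, sigma^2 I)
      print(x)
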
  7. The variational lower bound. We will learn an approximation of
    p_Θ(z|x), namely q_Φ(z|x), by maximizing a lower bound of the
    log-likelihood of the data. We can write:
    log p_Θ(x) = D_KL(q_Φ(z|x) || p_Θ(z|x)) + L(Θ, Φ, x), where
    L(Θ, Φ, x) = E_{q_Φ(z|x)}[log p_Θ(x, z) − log q_Φ(z|x)].
    L(Θ, Φ, x) is called the variational lower bound, and the goal is to
    maximize it w.r.t. all the parameters (Θ, Φ).
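
    The decomposition above is a standard identity; a short derivation
    (added here for completeness, in LaTeX):

      \log p_\Theta(x)
        = \mathbb{E}_{q_\Phi(z|x)}\left[\log \frac{p_\Theta(x,z)}{p_\Theta(z|x)}\right]
        = \mathbb{E}_{q_\Phi(z|x)}\left[\log \frac{p_\Theta(x,z)}{q_\Phi(z|x)}\right]
          + \mathbb{E}_{q_\Phi(z|x)}\left[\log \frac{q_\Phi(z|x)}{p_\Theta(z|x)}\right]
        = \mathcal{L}(\Theta, \Phi, x) + D_{KL}\!\left(q_\Phi(z|x) \,\|\, p_\Theta(z|x)\right)

    Since the KL term is non-negative, L(Θ, Φ, x) ≤ log p_Θ(x), which is
    why it is a lower bound.
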
  8. Estimating the lower bound gradients. We need to compute
    ∂L(Θ,Φ,x)/∂Θ and ∂L(Θ,Φ,x)/∂Φ to apply gradient descent. For that, we
    use the reparametrisation trick: we sample from a noise variable
    ε ~ p(ε) and apply a deterministic function to it so that we obtain
    correct samples from q_Φ(z|x); that is, if ε ~ p(ε), we find g so that
    z = g(x, Φ, ε) implies z ~ q_Φ(z|x). For instance, g can be the inverse
    CDF of q_Φ(z|x) if ε is uniform. With the reparametrisation trick we
    can rewrite L:
    L(Θ, Φ, x) = E_{ε~p(ε)}[log p_Θ(x, g(x, Φ, ε)) − log q_Φ(g(x, Φ, ε)|x)].
    We then estimate the gradients with Monte Carlo.
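
    A minimal numpy sketch of the reparametrisation trick for a Gaussian
    q_Φ(z|x) = N(μ, σ²) (an added note; the μ and σ values are placeholders
    standing in for encoder outputs):

      import numpy as np

      rng = np.random.default_rng(0)
      mu, sigma = 1.5, 0.3   # placeholders for what q_Phi(z|x) would give

      # Sample the noise variable eps ~ p(eps) = N(0, 1), then apply the
      # deterministic map g: z = mu + sigma * eps, so that z ~ N(mu, sigma^2).
      eps = rng.normal(size=100_000)
      z = mu + sigma * eps

      print(z.mean(), z.std())   # close to mu and sigma

      # Monte Carlo estimate of an expectation under q_Phi(z|x); f is a
      # stand-in for the integrand log p_Theta(x, z) - log q_Phi(z|x).
      f = lambda z: -0.5 * z ** 2
      print(f(z).mean())

    Because z is a differentiable function of μ and σ, gradients of such
    Monte Carlo estimates with respect to the parameters can be pushed
    through the samples, which is what makes gradient-based optimization of
    L possible.
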
  9. A connection with auto-encoders. Note that L can also be written in
    this form:
    L(Θ, Φ, x) = −D_KL(q_Φ(z|x) || p_Θ(z)) + E_{q_Φ(z|x)}[log p_Θ(x|z)].
    We can interpret the first term as a regularizer: it forces q_Φ(z|x)
    not to diverge too much from the prior p_Θ(z). We can interpret the
    negative of the second term as the reconstruction error.
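
    For the Gaussian choices used on the next slide (q_Φ(z|x) = N(μ, σ²I)
    and p_Θ(z) = N(0, I)), this regularizer has a closed form,
    D_KL = −0.5 Σ_j (1 + log σ_j² − μ_j² − σ_j²); a small numpy sketch with
    placeholder encoder outputs:

      import numpy as np

      # Placeholder encoder outputs mu(x), sigma(x) for one data point.
      mu = np.array([0.5, -1.0])
      sigma = np.array([0.8, 1.2])

      # Analytic KL( N(mu, sigma^2 I) || N(0, I) ).
      kl = -0.5 * np.sum(1 + np.log(sigma ** 2) - mu ** 2 - sigma ** 2)
      print(kl)
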
  10. Variational auto-encoders. An example model that uses the procedure
    described above to maximize the lower bound. In VAEs, we choose:
    p_Θ(z) = N(0, I);
    p_Θ(x|z) is a normal distribution for real-valued data (a neural
    network decoder computes the μ and σ of this distribution from z), or
    a multivariate Bernoulli for binary data (a neural network decoder
    computes the probability of 1 from z);
    q_Φ(z|x) = N(μ(x), σ(x)I): a neural network encoder computes the μ and
    σ of q_Φ(z|x) from x;
    ε ~ N(0, I) and z = g(x, Φ, ε) = μ(x) + σ(x) ∗ ε.
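
    A minimal PyTorch sketch of this architecture for binary data
    (Bernoulli decoder); layer sizes, activations and the training snippet
    are illustrative choices, not the authors' exact setup:

      import torch
      import torch.nn as nn
      import torch.nn.functional as F

      class VAE(nn.Module):
          def __init__(self, x_dim=784, h_dim=400, z_dim=20):
              super().__init__()
              # Encoder q_Phi(z|x): outputs mu(x) and log sigma^2(x).
              self.enc = nn.Linear(x_dim, h_dim)
              self.enc_mu = nn.Linear(h_dim, z_dim)
              self.enc_logvar = nn.Linear(h_dim, z_dim)
              # Decoder p_Theta(x|z): outputs Bernoulli probabilities.
              self.dec = nn.Linear(z_dim, h_dim)
              self.dec_out = nn.Linear(h_dim, x_dim)

          def forward(self, x):
              h = torch.tanh(self.enc(x))
              mu, logvar = self.enc_mu(h), self.enc_logvar(h)
              # Reparametrisation: z = mu + sigma * eps, eps ~ N(0, I).
              eps = torch.randn_like(mu)
              z = mu + torch.exp(0.5 * logvar) * eps
              x_prob = torch.sigmoid(self.dec_out(torch.tanh(self.dec(z))))
              return x_prob, mu, logvar

      def neg_elbo(x, x_prob, mu, logvar):
          # Reconstruction term: -E_q[log p_Theta(x|z)], one-sample estimate.
          rec = F.binary_cross_entropy(x_prob, x, reduction='sum')
          # Analytic KL(q_Phi(z|x) || N(0, I)), as on the previous slide.
          kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
          return rec + kl

      # One gradient step on a fake binary batch, just to show the loop.
      model = VAE()
      opt = torch.optim.Adam(model.parameters(), lr=1e-3)
      x = torch.rand(32, 784).round()
      x_prob, mu, logvar = model(x)
      loss = neg_elbo(x, x_prob, mu, logvar)
      loss.backward()
      opt.step()
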
  11. Experiments (2). 2D latent space manifolds learned from the MNIST
    and Frey Face datasets.
  12. Experiments (3). Comparison of the lower bound with the wake-sleep
    algorithm.
  13. Experiments (4). Comparison of the marginal log-likelihood with
    wake-sleep and Monte Carlo EM (MCEM).