
Auto-encoding Variational Bayes - Reading Group

Mehdi
May 09, 2015

Transcript

  1. Auto-encoding Variational Bayes. Diederik P. Kingma, Max Welling.

    Presented by: Mehdi Cherti (LAL/CNRS), 9th May 2015
  2. What is a generative model? A model of how the data X was generated.

    Typically, the purpose is to find a model for p(x) or p(x, y), where y can be a set of latent (hidden) variables or a set of output variables, for discriminative problems.
  3. Training generative models. Typically, we assume a parametric form of the probability density: p(x|Θ).

    Given an i.i.d. dataset X = (x^(1), x^(2), ..., x^(N)), we typically do:
    Maximum likelihood (ML): argmax_Θ p(X|Θ)
    Maximum a posteriori (MAP): argmax_Θ p(X|Θ) p(Θ)
    Bayesian inference: p(Θ|X) = p(X|Θ) p(Θ) / ∫ p(X|Θ) p(Θ) dΘ
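
    As a minimal sketch of the maximum-likelihood case (an illustrative 1-D Gaussian model, not taken from the slides), the argmax over Θ has a closed form:

        # Assumed toy model: p(x|Θ) = N(x; mu, sigma^2) with Θ = (mu, sigma).
        # For this choice, argmax_Θ p(X|Θ) is the sample mean and the
        # (biased) sample standard deviation.
        import numpy as np

        rng = np.random.default_rng(0)
        X = rng.normal(loc=2.0, scale=1.5, size=1000)   # i.i.d. dataset

        mu_ml = X.mean()        # argmax over mu of p(X | mu, sigma)
        sigma_ml = X.std()      # argmax over sigma of p(X | mu, sigma)
        print(mu_ml, sigma_ml)  # close to the true values 2.0 and 1.5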
  4. The problem. Let x be the observed variables; we assume a latent representation z, and we define p_Θ(z) and p_Θ(x|z).

    We want to design a generative model where:
    p_Θ(x) = ∫ p_Θ(x|z) p_Θ(z) dz is intractable
    p_Θ(z|x) = p_Θ(x|z) p_Θ(z) / p_Θ(x) is intractable
    we have large datasets: we want to avoid sampling-based training procedures (e.g. MCMC)
  5. The proposed solution. They propose:

    a fast training procedure that estimates the parameters Θ: for data generation
    an approximation of the posterior p_Θ(z|x): for data representation
    an approximation of the marginal p_Θ(x): for model evaluation and as a prior for other tasks
  6. Formulation of the problem. The process of generation consists of sampling z from p_Θ(z), then x from p_Θ(x|z).

    Let's define: a prior over the latent representation, p_Θ(z), and a decoder, p_Θ(x|z).
    We want to maximize the log-likelihood of the data (x^(1), x^(2), ..., x^(N)):
    log p_Θ(x^(1), x^(2), ..., x^(N)) = Σ_i log p_Θ(x^(i))
    and be able to do inference: p_Θ(z|x)
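
    As a sketch of this generative process on a toy model (an illustrative linear-Gaussian choice, not the neural networks used later):

        # Ancestral sampling: z ~ p_Θ(z), then x ~ p_Θ(x|z).
        import numpy as np

        rng = np.random.default_rng(0)
        W = np.array([[2.0], [-1.0]])   # toy decoder parameters Θ
        b = np.array([0.5, 0.0])
        noise = 0.1

        z = rng.standard_normal((1000, 1))                        # z ~ p_Θ(z) = N(0, 1)
        x = z @ W.T + b + noise * rng.standard_normal((1000, 2))  # x ~ p_Θ(x|z) = N(Wz + b, noise^2 I)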
  7. The variational lower bound. We will learn an approximation q_Φ(z|x) of p_Θ(z|x) by maximizing a lower bound of the log-likelihood of the data.

    We can write: log p_Θ(x) = D_KL(q_Φ(z|x) || p_Θ(z|x)) + L(Θ, Φ, x)
    where: L(Θ, Φ, x) = E_{q_Φ(z|x)}[log p_Θ(x, z) − log q_Φ(z|x)]
    Since the KL term is non-negative, L(Θ, Φ, x) ≤ log p_Θ(x): L is called the variational lower bound, and the goal is to maximize it w.r.t. all the parameters (Θ, Φ).
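
    The decomposition can be checked numerically on a toy discrete model (illustrative numbers, not from the paper): log p_Θ(x) equals D_KL(q_Φ(z|x) || p_Θ(z|x)) + L(Θ, Φ, x) for any choice of q_Φ.

        # Toy check of log p(x) = KL + L, with a binary latent z and one fixed x.
        import numpy as np

        p_z = np.array([0.3, 0.7])          # p_Θ(z)
        p_x_given_z = np.array([0.9, 0.2])  # p_Θ(x|z) evaluated at the fixed x
        q = np.array([0.6, 0.4])            # an arbitrary q_Φ(z|x)

        p_xz = p_z * p_x_given_z            # joint p_Θ(x, z)
        p_x = p_xz.sum()                    # marginal p_Θ(x)
        post = p_xz / p_x                   # true posterior p_Θ(z|x)

        kl = np.sum(q * np.log(q / post))           # D_KL(q_Φ(z|x) || p_Θ(z|x))
        L = np.sum(q * (np.log(p_xz) - np.log(q)))  # lower bound L(Θ, Φ, x)
        print(np.log(p_x), kl + L)                  # the two values coincide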
  8. Estimating the lower bound gradients. We need to compute ∂L(Θ,Φ,x)/∂Θ and ∂L(Θ,Φ,x)/∂Φ to apply gradient descent.

    For that, we use the reparametrisation trick: we sample from a noise variable ε ∼ p(ε) and apply a deterministic function to it so that we obtain correct samples from q_Φ(z|x), meaning: if ε ∼ p(ε), we find g so that if z = g(x, Φ, ε) then z ∼ q_Φ(z|x).
    g can be the inverse CDF of q_Φ(z|x) if ε is uniform.
    With the reparametrisation trick we can rewrite L:
    L(Θ, Φ, x) = E_{ε∼p(ε)}[log p_Θ(x, g(x, Φ, ε)) − log q_Φ(g(x, Φ, ε)|x)]
    We then estimate the gradients with Monte Carlo.
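
    A short numpy sketch of the trick for a Gaussian q_Φ(z|x) (the location-scale map used on slide 10; the numbers are stand-ins for encoder outputs):

        # Instead of sampling z ~ N(mu, sigma^2) directly, sample eps ~ N(0, 1)
        # and apply the deterministic map g(x, Φ, eps) = mu + sigma * eps,
        # so z becomes a differentiable function of the variational parameters.
        import numpy as np

        rng = np.random.default_rng(0)

        def g(mu, sigma, eps):
            return mu + sigma * eps          # z = g(x, Φ, eps) ~ q_Φ(z|x)

        mu, sigma = 0.7, 0.3                 # stand-ins for the encoder outputs on one x
        eps = rng.standard_normal(10000)     # eps ~ p(eps) = N(0, 1)
        z = g(mu, sigma, eps)
        print(z.mean(), z.std())             # ≈ 0.7, ≈ 0.3: the samples follow q_Φ(z|x)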
  9. A connection with auto-encoders. Note that L can also be written in this form:

    L(Θ, Φ, x) = −D_KL(q_Φ(z|x) || p_Θ(z)) + E_{q_Φ(z|x)}[log p_Θ(x|z)]
    We can interpret the first term as a regularizer: it forces q_Φ(z|x) to not be too divergent from the prior p_Θ(z).
    We can interpret the negative of the second term as the reconstruction error.
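
    For the Gaussian case used in the paper, q_Φ(z|x) = N(µ, diag(σ²)) and p_Θ(z) = N(0, I), the KL regularizer has the closed form −0.5 Σ_j (1 + log σ_j² − µ_j² − σ_j²). A small sketch with illustrative encoder outputs:

        import numpy as np

        def kl_term(mu, log_var):
            # D_KL( N(mu, diag(exp(log_var))) || N(0, I) )
            return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var))

        mu = np.array([0.2, -0.5])        # illustrative encoder outputs for one x
        log_var = np.array([-0.1, 0.3])
        print(kl_term(mu, log_var))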
  10. Variational auto-encoders. An example model which uses the procedure described above to maximize the lower bound.

    In VAEs, we choose:
    p_Θ(z) = N(0, I)
    p_Θ(x|z): a normal distribution for real-valued data, where a neural network decoder computes µ and σ of this distribution from z; a multivariate Bernoulli for binary data, where a neural network decoder computes the probability of 1 from z
    q_Φ(z|x) = N(µ(x), σ(x)I): a neural network encoder computes µ and σ of q_Φ(z|x) from x
    ε ∼ N(0, I) and z = g(x, Φ, ε) = µ(x) + σ(x) * ε
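
    A minimal PyTorch sketch of such a model for binary data (an assumed small architecture, not the exact one from the paper): Gaussian encoder, standard normal prior, Bernoulli decoder, trained by minimizing −L.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class VAE(nn.Module):
            def __init__(self, x_dim=784, h_dim=400, z_dim=20):
                super().__init__()
                self.enc = nn.Linear(x_dim, h_dim)
                self.enc_mu = nn.Linear(h_dim, z_dim)      # µ(x)
                self.enc_logvar = nn.Linear(h_dim, z_dim)  # log σ(x)^2
                self.dec = nn.Linear(z_dim, h_dim)
                self.dec_out = nn.Linear(h_dim, x_dim)     # Bernoulli probabilities

            def forward(self, x):
                h = torch.tanh(self.enc(x))
                mu, logvar = self.enc_mu(h), self.enc_logvar(h)
                eps = torch.randn_like(mu)                 # ε ~ N(0, I)
                z = mu + torch.exp(0.5 * logvar) * eps     # reparametrisation
                x_prob = torch.sigmoid(self.dec_out(torch.tanh(self.dec(z))))
                rec = F.binary_cross_entropy(x_prob, x, reduction="sum")      # -E_q[log p_Θ(x|z)] estimate
                kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())  # D_KL(q_Φ(z|x) || p_Θ(z))
                return rec + kl                            # = -L(Θ, Φ, x), summed over the batch

        model = VAE()
        x = torch.rand(64, 784).round()   # fake binary batch standing in for MNIST
        loss = model(x)
        loss.backward()                   # gradients w.r.t. both Θ and Φ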
  11. Experiments (2). 2D latent space manifolds from the MNIST and Frey datasets.
  12. Experiments (3). Comparison of the lower bound with the Wake-Sleep algorithm.
  13. Experiments (4). Comparison of the marginal log-likelihood with Wake-Sleep and Monte Carlo EM (MCEM).