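$\partial\Theta$, $\frac{\partial L(\Theta,\phi,x)}{\partial\phi}$ to apply gradient descent.

For that, we use the reparametrisation trick: we sample a noise variable $\epsilon \sim p(\epsilon)$ and apply a deterministic function $g$ to it so that we obtain correct samples from $q_\phi(z|x)$. That is, we find $g$ such that if $\epsilon \sim p(\epsilon)$ and $z = g(x,\phi,\epsilon)$, then $z \sim q_\phi(z|x)$.

For example, $g$ can be the inverse CDF of $q_\phi(z|x)$ if $\epsilon$ is uniform on $[0,1]$.

With the reparametrisation trick we can rewrite $L$ as an expectation over $\epsilon$:

$$L(\Theta,\phi,x) = \mathbb{E}_{\epsilon\sim p(\epsilon)}\left[\log p_\Theta\big(x, g(x,\phi,\epsilon)\big) - \log q_\phi\big(g(x,\phi,\epsilon)\mid x\big)\right]$$

Because $p(\epsilon)$ does not depend on $\Theta$ or $\phi$, the gradient can be moved inside the expectation. We then estimate the gradients with Monte Carlo.

Diederik P. Kingma, Max Welling. Auto-Encoding Variational Bayes.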
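The reparametrised estimator described above can be illustrated with a minimal NumPy sketch. Everything specific in it is an assumption made for illustration and is not taken from the slides: a toy model with $p(z) = \mathcal{N}(0,1)$ and $p(x|z) = \mathcal{N}(z,1)$, a Gaussian $q_\phi(z|x) = \mathcal{N}(\mu,\sigma^2)$ with $\phi = (\mu,\sigma)$, and the hypothetical helper names `reparam_gaussian`, `elbo_grad_estimate` and `inverse_cdf_exponential`. For this toy model the exact gradients are $x - 2\mu$ and $-2\sigma + 1/\sigma$, which gives a way to check the Monte Carlo estimate.

```python
import numpy as np


def reparam_gaussian(mu, sigma, eps):
    """g(x, phi, eps) for an assumed Gaussian q: z = mu + sigma * eps, eps ~ N(0, 1)."""
    return mu + sigma * eps


def inverse_cdf_exponential(rate, u):
    """Inverse-CDF choice of g: maps u ~ Uniform[0, 1) to an Exponential(rate) sample."""
    return -np.log1p(-u) / rate


def elbo_grad_estimate(x, mu, sigma, n_samples=100_000, seed=0):
    """Monte Carlo estimate of dL/dmu and dL/dsigma for the assumed toy model.

    Integrand: log p(x, z) - log q(z | x), with z = mu + sigma * eps.
    With eps held fixed, -log q(z | x) = 0.5 * eps**2 + log(sigma) + const,
    so it contributes 0 to dL/dmu and 1 / sigma to dL/dsigma.
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n_samples)
    z = reparam_gaussian(mu, sigma, eps)

    # d/dz log p(x, z) for p(z) = N(0, 1) and p(x | z) = N(z, 1)
    dlogp_dz = -z + (x - z)

    # Chain rule through g: dz/dmu = 1 and dz/dsigma = eps
    grad_mu = np.mean(dlogp_dz)
    grad_sigma = np.mean(dlogp_dz * eps) + 1.0 / sigma
    return grad_mu, grad_sigma


if __name__ == "__main__":
    x, mu, sigma = 1.0, 0.3, 0.8
    g_mu, g_sigma = elbo_grad_estimate(x, mu, sigma)
    print("estimated:", g_mu, g_sigma)
    print("exact:    ", x - 2 * mu, -2 * sigma + 1 / sigma)
```

The key design point is that the randomness comes only from `eps`, so each sample of `z` is a differentiable function of `mu` and `sigma` and the gradient can be averaged sample by sample. In practice an automatic differentiation library would replace the hand-written chain rule used here.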