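\[
L(D) = -\sum_{(x_t,u_t,x_{t+1})\in D} \log p(x_t,u_t,x_{t+1}) \;\le\; \sum_{(x_t,u_t,x_{t+1})\in D} L^{\mathrm{bound}}(x_t,u_t,x_{t+1}),
\]

where

\[
L^{\mathrm{bound}}(x_t,u_t,x_{t+1}) = \mathbb{E}_{z_t\sim q_\phi,\; z_{t+1}\sim q_\psi}\big[-\log p_\theta(x_t\mid z_t) - \log p_\theta(x_{t+1}\mid z_{t+1})\big] + \mathrm{KL}\big(q_\phi \,\|\, \mathcal{N}(0,I)\big).
\]

Since L^bound is built from negative log-likelihood terms plus a KL penalty, it bounds the negative log-likelihood L(D) from above, and the objective is minimized.

In practice we optimize the regularized bound

\[
\sum_{(x_t,u_t,x_{t+1})\in D} \Big[ L^{\mathrm{bound}}(x_t,u_t,x_{t+1}) + \lambda\,\mathrm{KL}\big(q_\psi(z\mid \mu_t,u_t) \,\|\, q_\phi(z\mid x_{t+1})\big) \Big] \;\to\; \min_{\theta,\phi,\psi}.
\]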
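As a rough illustration, the per-transition regularized objective can be evaluated as in the sketch below. It is written as a cost (negative log-likelihood terms plus KL penalties), so lower is better. It assumes, beyond what is stated here, that q_φ and q_ψ are Gaussians with diagonal covariance, and that the two reconstruction terms are supplied as single-sample Monte Carlo estimates. The function names and argument names are hypothetical.

```python
import numpy as np

def kl_diag_gaussians(mu_q, var_q, mu_p, var_p):
    """KL( N(mu_q, diag(var_q)) || N(mu_p, diag(var_p)) ) in closed form."""
    mu_q, var_q = np.asarray(mu_q, float), np.asarray(var_q, float)
    mu_p, var_p = np.asarray(mu_p, float), np.asarray(var_p, float)
    return 0.5 * np.sum(
        np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    )

def regularized_bound(nll_xt, nll_xt1,
                      mu_t, var_t,            # q_phi(z | x_t)
                      mu_pred, var_pred,      # q_psi(z | mu_t, u_t)
                      mu_next, var_next,      # q_phi(z | x_{t+1})
                      lam):
    """Cost for one transition (x_t, u_t, x_{t+1}).

    nll_xt  : estimate of -log p_theta(x_t | z_t),       z_t ~ q_phi
    nll_xt1 : estimate of -log p_theta(x_{t+1} | z_{t+1}), z_{t+1} ~ q_psi
    """
    # KL(q_phi || N(0, I)): prior term of L^bound
    kl_prior = kl_diag_gaussians(mu_t, var_t,
                                 np.zeros_like(mu_t, dtype=float),
                                 np.ones_like(var_t, dtype=float))
    # lambda-weighted consistency term between predicted and encoded next latent
    kl_consistency = kl_diag_gaussians(mu_pred, var_pred, mu_next, var_next)
    return nll_xt + nll_xt1 + kl_prior + lam * kl_consistency

# Toy one-dimensional example with made-up numbers
cost = regularized_bound(
    nll_xt=1.0, nll_xt1=2.0,
    mu_t=[1.0], var_t=[1.0],
    mu_pred=[0.0], var_pred=[1.0],
    mu_next=[1.0], var_next=[1.0],
    lam=0.25,
)
print(cost)
```

Summing this cost over all transitions in D gives the quantity to minimize. When the predicted and encoded next-step latents coincide, the λ-term vanishes and the cost reduces to L^bound.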