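curvature of the model’s prediction function
• It shows how robust the model is in its own prediction
• It is positive semi-definite
• It correlates well with the covariance matrix of stochastic gradients and with the Hessian of the training loss
• We can efficiently estimate its trace in a stochastic manner

$$
F(\theta) = -\,\mathbb{E}_x \mathbb{E}_y \frac{\partial^2 \log p(y \mid x, \theta)}{\partial \theta^2} = \mathbb{E}_x \mathbb{E}_y \left[ \frac{\partial \log p(y \mid x, \theta)}{\partial \theta} \left( \frac{\partial \log p(y \mid x, \theta)}{\partial \theta} \right)^{T} \right] \succeq 0
$$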
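A minimal sketch of the stochastic trace estimate, under assumptions not taken from the slide: a toy linear softmax classifier with weights `W`, random inputs `xs`, and the helper names below are all hypothetical. Because tr(F) is the expected squared norm of the gradient of log p(y|x, θ), it can be estimated by sampling y from the model's own predictive distribution and averaging squared gradient norms. For this toy model the gradient with respect to `W` is (e_y − p) xᵀ, so the exact trace is available for comparison.

```python
import math
import random


def softmax(z):
    # Numerically stable softmax over a list of logits.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def predict(W, x):
    # Class probabilities of a linear softmax model: p = softmax(W x).
    return softmax([sum(w * xj for w, xj in zip(row, x)) for row in W])


def sq_grad_norm(p, y, x):
    # grad_W log p(y|x) = (e_y - p) x^T, so its squared Frobenius norm
    # factorizes into ||e_y - p||^2 * ||x||^2.
    err = sum(((1.0 if k == y else 0.0) - pk) ** 2 for k, pk in enumerate(p))
    return err * sum(xj * xj for xj in x)


def fisher_trace_mc(W, xs, n_samples, rng):
    # Monte Carlo estimate: sample y from the model, average squared norms.
    total = 0.0
    for x in xs:
        p = predict(W, x)
        for _ in range(n_samples):
            y = rng.choices(range(len(p)), weights=p)[0]
            total += sq_grad_norm(p, y, x)
    return total / (len(xs) * n_samples)


def fisher_trace_exact(W, xs):
    # Exact trace on the same inputs: sum over all labels weighted by p(y|x).
    total = 0.0
    for x in xs:
        p = predict(W, x)
        total += sum(pk * sq_grad_norm(p, k, x) for k, pk in enumerate(p))
    return total / len(xs)


rng = random.Random(0)
W = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(3)]  # toy weights
xs = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(200)]  # toy inputs
est = fisher_trace_mc(W, xs, 50, rng)
exact = fisher_trace_exact(W, xs)
print(est, exact)
```

The estimator never forms the full matrix F, which is what makes the trace cheap to track. For a deep network the closed-form gradient would presumably be replaced by a backward pass per sampled label.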