Slide 19
Dynamical Isometry
(Dynamical Isometry) If the eigenvalue distribution of $JJ^\top$, where $J$ is the input-output Jacobian of the network, is concentrated around 1, then we can prevent exploding/vanishing gradients.
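As a hedged numerical illustration (not taken from the slide), the sketch below assumes a deep linear network, so that $J = W_L \cdots W_1$, with i.i.d. Gaussian weights of variance 1/width. The function name `jacobian_eigs_gaussian` and the sizes are hypothetical. The average eigenvalue of $JJ^\top$ stays near 1, but the spectrum spreads out as depth grows, with some directions amplified and others almost annihilated. That spread is the failure of dynamical isometry.

```python
import numpy as np

def jacobian_eigs_gaussian(depth, width, seed=0):
    """Eigenvalues of J J^T for a deep *linear* network J = W_L ... W_1.

    Assumption (hypothetical setup): i.i.d. Gaussian weights with
    variance 1/width and no activation function.
    """
    rng = np.random.default_rng(seed)
    J = np.eye(width)
    for _ in range(depth):
        W = rng.normal(0.0, 1.0 / np.sqrt(width), size=(width, width))
        J = W @ J  # chain rule for a linear layer
    # J J^T is symmetric positive semidefinite, so eigvalsh applies.
    return np.linalg.eigvalsh(J @ J.T)

eigs = jacobian_eigs_gaussian(depth=20, width=200)
print(eigs.min(), eigs.mean(), eigs.max())  # tiny min, mean near 1, large max
```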
[Pennington, Schoenholz, Ganguli, AISTATS 2018; Benoit Collins & TH, CIMP 2022] If the parameters are initialized as Haar orthogonal matrices and an appropriate activation function is chosen, then the DNN can achieve dynamical isometry.
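The following is a minimal sketch of the initialization side of this claim, under stated assumptions. Haar orthogonal matrices are sampled by the standard trick of taking the QR decomposition of a Gaussian matrix and correcting column signs. The weight scale σ_w = 1, zero biases, the random input, and the use of tanh are all hypothetical choices made for illustration; the slide does not specify which activation is "appropriate". For a linear network the Jacobian is a product of orthogonal matrices, so every eigenvalue of $JJ^\top$ equals 1.

```python
import numpy as np

def haar_orthogonal(n, rng):
    """Sample an n x n Haar-distributed orthogonal matrix."""
    Z = rng.normal(size=(n, n))
    Q, R = np.linalg.qr(Z)
    # Rescale column j by sign(R[j, j]); without this, QR output is not Haar.
    return Q * np.sign(np.diag(R))

def jacobian_eigs_orthogonal(depth, width, activation="linear", seed=0):
    """Eigenvalues of J J^T for x_l = phi(W_l x_{l-1}) with Haar orthogonal W_l.

    Assumptions (hypothetical): sigma_w = 1, no biases, Gaussian input.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=width)
    J = np.eye(width)
    for _ in range(depth):
        W = haar_orthogonal(width, rng)
        h = W @ x  # pre-activation
        if activation == "tanh":
            x = np.tanh(h)
            d = 1.0 - x**2  # tanh'(h)
        else:
            x = h
            d = np.ones(width)  # identity activation
        J = (d[:, None] * W) @ J  # chain rule: diag(phi'(h)) W
    return np.linalg.eigvalsh(J @ J.T)

lin = jacobian_eigs_orthogonal(depth=50, width=100)
print(lin.min(), lin.max())  # both equal to 1 up to rounding
```

With `activation="tanh"` every eigenvalue is at most 1, because |tanh'| ≤ 1 and orthogonal matrices have unit operator norm. How tightly the spectrum stays near 1 then depends on the input scale and the weight/bias variances, which this sketch does not tune.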