Slide 21
Training Procedure
Training = Optimization
w_opt = argmin_w L(w)
w: Neural Network parameters, millions of them
L: loss function, problem-dependent (params -> number)
Optimization method - gradient descent, aka steepest descent
(intuition - reckless runner with short attention span gets lost in the fog in the mountains)
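A minimal sketch of gradient descent, to make the slide concrete. It repeatedly applies the update w <- w - learning_rate * grad L(w). The toy two-parameter loss, the function names, the learning rate and the step count are all assumptions chosen for illustration. A real network has millions of parameters and usually gets its gradient from automatic differentiation rather than a hand-written formula.

```python
def loss(w):
    # Hypothetical toy loss (params -> number): a bowl with its minimum at (3, -1).
    return (w[0] - 3.0) ** 2 + (w[1] + 1.0) ** 2


def grad(w):
    # Gradient of the toy loss, derived by hand for this example.
    return [2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)]


def gradient_descent(w, learning_rate=0.1, steps=200):
    # At each step, look only at the local slope and move a short
    # distance in the steepest downhill direction.
    for _ in range(steps):
        g = grad(w)
        w = [wi - learning_rate * gi for wi, gi in zip(w, g)]
    return w


w_opt = gradient_descent([0.0, 0.0])
print(w_opt, loss(w_opt))
```

The runner in the fog corresponds to the loop body: each step uses only the gradient at the current point, with no view of the overall landscape. On this bowl-shaped loss that is enough to reach the minimum. On a loss with many valleys the same procedure can settle in a poor one.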