Slide 21
Understandings of DL: generalization
21/23
ref: https://arxiv.org/abs/1705.08741 https://arxiv.org/abs/1710.06451 https://arxiv.org/abs/1706.02677
SGD optimization is controlled by the “noise scale” g:
g = ε(N/B − 1) ≈ εN/B (for B ≪ N)
ε: learning rate, N: training set size, B: batch size
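A minimal sketch of how the noise scale behaves when the batch size changes, assuming the definition g = ε(N/B − 1) from arXiv:1710.06451 and the linear learning-rate scaling rule from arXiv:1706.02677. The function names and the numeric values (N = 50,000, ε = 0.1, B = 128) are illustrative assumptions, not taken from the slide.

```python
def noise_scale(lr: float, n_train: int, batch_size: int) -> float:
    """SGD noise scale g = lr * (N / B - 1), as assumed from arXiv:1710.06451."""
    return lr * (n_train / batch_size - 1.0)


def scaled_lr(lr: float, batch_size: int, new_batch_size: int) -> float:
    """Linear scaling rule: keep lr / B constant when the batch size changes."""
    return lr * new_batch_size / batch_size


if __name__ == "__main__":
    # Hypothetical settings chosen only for illustration.
    n_train = 50_000
    lr, batch_size = 0.1, 128
    new_batch_size = 1024

    # Scale the learning rate in proportion to the batch size.
    new_lr = scaled_lr(lr, batch_size, new_batch_size)

    g_small = noise_scale(lr, n_train, batch_size)
    g_large = noise_scale(new_lr, n_train, new_batch_size)

    # With B << N the two noise scales are approximately equal.
    print(f"g (B={batch_size}, lr={lr}): {g_small:.4f}")
    print(f"g (B={new_batch_size}, lr={new_lr}): {g_large:.4f}")
```

Scaling ε in proportion to B keeps εN/B fixed, so the noise scale stays roughly constant as long as B ≪ N. The match is only approximate because of the −1 term.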
[Figure: training error]