Slide 26
Large Batch Challenge
Keskar, N. S., Mudigere, D., Nocedal, J., Smelyanskiy, M., and Tang, P. T. P., "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima," arXiv:1609.04836, 2016. https://arxiv.org/abs/1609.04836.
Global Batch Size: B_global = B_local × N, where N is the number of workers.
Larger clusters → Larger global batch size.
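A minimal sketch of this relationship, assuming synchronous data-parallel training where each of N workers processes one local batch per step (all names and numbers here are illustrative):

```python
# Global batch size in data-parallel training:
# B_global = B_local * N, where N is the number of workers.

def global_batch_size(local_batch: int, num_workers: int) -> int:
    """Each worker processes `local_batch` samples per step, so one
    synchronized step consumes local_batch * num_workers samples."""
    return local_batch * num_workers

# Example: a fixed per-worker batch of 32 on clusters of growing size.
for n in (8, 64, 256, 1024):
    print(f"{n:>5} workers -> global batch {global_batch_size(32, n):>6}")
# At 256 workers the global batch already exceeds 8,000, the regime where
# Keskar et al. (2016) observe convergence to sharp minima.
```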
Two Key Challenges:
▪ Lower generalization at very large batch sizes (e.g., >8,000): models tend to converge to sharp minima, which hurts test performance.
▪ Scalability drops as more workers are added, especially when communication overhead is high (see the toy model below).
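As a rough illustration of the second challenge, the toy model below treats step time as per-worker compute plus a communication term that grows with worker count, and reports the resulting weak-scaling efficiency. The cost model and every number in it are assumptions for illustration, not measurements:

```python
# Toy scaling model: ideally, N workers give N-fold throughput, but each
# step also pays a communication cost (e.g., a gradient all-reduce) that
# grows with the number of workers.

def scaling_efficiency(num_workers: int,
                       compute_ms: float = 100.0,
                       comm_per_hop_ms: float = 2.0) -> float:
    """Weak-scaling efficiency: useful compute time divided by total step
    time, under a hypothetical latency-bound collective whose cost grows
    linearly with (num_workers - 1)."""
    comm_ms = comm_per_hop_ms * (num_workers - 1)  # assumed cost model
    return compute_ms / (compute_ms + comm_ms)

for n in (1, 8, 64, 256):
    print(f"{n:>4} workers -> efficiency {scaling_efficiency(n):.0%}")
# Efficiency falls from 100% at 1 worker to roughly 16% at 256 workers in
# this toy setup: more workers enlarge the global batch, but communication
# overhead erodes the speedup.
```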