Theory of Training [Learning Rate / Momentum / DA]
Control Batch Size and Learning Rate to Generalize Well:
Theoretical and Empirical Evidence
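[Experiment overview]
Investigated how the ratio of batch size to learning rate affects the generalization ability of DNNs trained with SGD.
[Experimental method]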
4 settings = {training ResNet-110 and VGG-19 on CIFAR-10 and CIFAR-100}
20 batch sizes = {16, 32, 48, 64, 80, 96, 112, 128, 144, 160, 176, 192, 208, 224, 240, 256, 272, 288, 304, 320}
20 learning rates = {0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.10, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.20}
Every combination of these was trained, giving experiments on 1600 = (20 x 20 x 4) models.
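As a rough sketch (not the authors' code), the grid above can be enumerated as follows. Only the values come from the slide; the name build_grid and the per-run dictionary layout are hypothetical. Each run also records batch size / learning rate, the ratio under study.

```python
from itertools import product

# 4 settings: every (dataset, model) pair listed on the slide
SETTINGS = [
    (dataset, model)
    for dataset in ("CIFAR-10", "CIFAR-100")
    for model in ("ResNet-110", "VGG-19")
]
BATCH_SIZES = list(range(16, 321, 16))            # 16, 32, ..., 320 (20 values)
LEARNING_RATES = [k / 100 for k in range(1, 21)]  # 0.01, 0.02, ..., 0.20 (20 values)


def build_grid():
    """Return one dict per training run (hypothetical layout), including batch_size / lr."""
    runs = []
    for (dataset, model), bs, lr in product(SETTINGS, BATCH_SIZES, LEARNING_RATES):
        runs.append({
            "dataset": dataset,
            "model": model,
            "batch_size": bs,
            "lr": lr,
            "ratio": bs / lr,  # batch size to learning rate ratio
        })
    return runs


if __name__ == "__main__":
    runs = build_grid()
    print(len(runs))  # 20 * 20 * 4 = 1600
    ratios = [r["ratio"] for r in runs]
    print(min(ratios), max(ratios))  # smallest is 16 / 0.20, largest is 320 / 0.01
```

The ratio therefore spans roughly 80 to 32000 across the 1600 runs, and each of the four settings contributes 400 runs.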
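※ The batch size and learning rate are held constant throughout training, each run lasts 200 epochs, and no training techniques beyond plain SGD (such as momentum) are used.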
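To make "plain SGD with a constant learning rate and batch size" concrete, here is a minimal sketch. It assumes a toy 1-D least-squares model y ≈ w * x in place of ResNet/VGG on CIFAR, and the function name sgd_constant_lr is hypothetical. Each step applies w ← w − η · (1/|B|) Σ_{i∈B} ∇ℓ_i(w) with a fixed η. There is no momentum, weight decay, or schedule.

```python
import random


def sgd_constant_lr(data, lr, batch_size, epochs, seed=0):
    """Vanilla minibatch SGD on the toy loss 0.5 * (w*x - y)^2 (hypothetical example).

    The learning rate and batch size stay fixed for the whole run, and there is
    no momentum/velocity term, weight decay, or learning-rate schedule.
    """
    rng = random.Random(seed)
    data = list(data)  # copy so the caller's list is not shuffled in place
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # d/dw of 0.5 * (w*x - y)^2 is (w*x - y) * x, averaged over the minibatch
            grad = sum((w * x - y) * x for x, y in batch) / len(batch)
            w -= lr * grad  # plain SGD step with a constant learning rate
    return w


if __name__ == "__main__":
    xs = [k / 10 for k in range(1, 21)]
    data = [(x, 3 * x) for x in xs]  # noise-free data generated with w = 3
    print(sgd_constant_lr(data, lr=0.1, batch_size=4, epochs=200))
```

Under these restrictions, lr and batch_size are the only training knobs. These are the two quantities varied in the grid.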