normalized to zero mean and unit variance
‣ # hidden layers : 1 – 8
‣ # hidden units : 512, 1024, 2048
‣ Activation func : ReLU
‣ Initialization : supervised layer-wise pre-training
‣ Minibatch size : 200
‣ Learning rate : 0.0001 (with Newbob decay scheduler)
‣ Weight decay : 0.001
‣ Momentum : 0.99
‣ Max-norm : 1
‣ Dropout rate : input = 0.5; hidden = 0.02
‣ Min # epochs : 24 (fine-tuning)
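For concreteness, below is a minimal sketch of how these hyperparameters could be wired together in PyTorch; the original system presumably used a different toolkit. The input/output dimensions, the `build_dnn` helper, the max-norm routine, and the use of ReduceLROnPlateau as a rough stand-in for the Newbob scheduler are all assumptions, and the supervised layer-wise pre-training step is omitted.

```python
# Hedged sketch: wiring the slide's DNN hyperparameters in PyTorch (assumed toolkit).
import torch
import torch.nn as nn

def build_dnn(input_dim, num_classes, num_hidden_layers=4, hidden_units=1024):
    """Fully connected DNN: ReLU hidden layers with dropout, per the slide."""
    layers = [nn.Dropout(p=0.5)]              # input dropout rate = 0.5
    prev = input_dim
    for _ in range(num_hidden_layers):         # 1-8 hidden layers were explored
        layers += [nn.Linear(prev, hidden_units),
                   nn.ReLU(),                  # ReLU activation
                   nn.Dropout(p=0.02)]         # hidden dropout rate = 0.02
        prev = hidden_units
    layers.append(nn.Linear(prev, num_classes))
    return nn.Sequential(*layers)

# Placeholder dimensions; the real feature and target sizes are not on the slide.
model = build_dnn(input_dim=440, num_classes=1000)

# SGD with the slide's learning rate, momentum, and weight decay;
# minibatches of 200 would be supplied by the DataLoader.
optimizer = torch.optim.SGD(model.parameters(),
                            lr=1e-4, momentum=0.99, weight_decay=1e-3)

# Newbob halves the learning rate once validation improvement stalls;
# ReduceLROnPlateau is used here only as an approximation of that behaviour.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.5)

def apply_max_norm(model, max_norm=1.0):
    """Rescale each unit's incoming weights so their norm does not exceed max_norm."""
    with torch.no_grad():
        for module in model.modules():
            if isinstance(module, nn.Linear):
                norms = module.weight.norm(dim=1, keepdim=True).clamp(min=1e-12)
                module.weight.mul_(norms.clamp(max=max_norm) / norms)
```

After each optimizer step, `apply_max_norm(model)` would enforce the max-norm constraint of 1, and fine-tuning would run for at least 24 epochs as listed above.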