SER 594 Software Engineering for Machine Learning
Lecture 09: Image Recognition with DL4J II
Dr. Javier Gonzalez-Sanchez
[email protected]
javiergs.engineering.asu.edu | javiergs.com
PERALTA 230U
Office Hours: By appointment
Career Fair
§ Master's and PhD Online Career Fair: Tuesday, Feb 15, 2022, 9 a.m.–4 p.m. No lecture that day.
§ Faculty picnic with students: Thursday, Feb 24, 2022, 11:30 am (SER Faculty, SCAI Director, Dean of Students). There will be food. I will start the lecture at 12:15 pm that day.
MNIST dataset
§ Each digit is stored as an anti-aliased black-and-white image, size-normalized to fit into a 28x28 pixel bounding box.
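A minimal sketch of loading the dataset with DL4J's MnistDataSetIterator; the batch size and seed below are placeholder values chosen only for illustration. Each 28x28 image arrives flattened into a 784-value feature row, with a one-hot label over the 10 digits.

import java.util.Arrays;
import org.deeplearning4j.datasets.iterator.impl.MnistDataSetIterator;
import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

public class MnistPeek {
    public static void main(String[] args) throws Exception {
        int batchSize = 64;    // images per mini-batch (placeholder value)
        int rngSeed = 123;     // seed so shuffling is reproducible

        // train = true gives the 60,000 training images; false gives the 10,000 test images
        DataSetIterator mnistTrain = new MnistDataSetIterator(batchSize, true, rngSeed);

        DataSet batch = mnistTrain.next();
        // Each 28x28 image is flattened to 784 features; labels are one-hot over 10 digits
        System.out.println("features shape: " + Arrays.toString(batch.getFeatures().shape()));
        System.out.println("labels shape:   " + Arrays.toString(batch.getLabels().shape()));
    }
}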
Weight Initialization | Xavier
§ A too-large initialization leads to exploding gradients (partial derivatives).
§ A too-small initialization leads to vanishing gradients (partial derivatives).
Advice:
§ The mean of the activations should be zero.
§ The variance of the activations should stay the same across every layer.
// variance: a statistical measurement of the spread between numbers in a data set
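A minimal sketch of requesting Xavier initialization for a hidden layer in DL4J; the layer sizes (784 inputs for a flattened MNIST image, 256 outputs) are assumptions used only for illustration.

import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.weights.WeightInit;
import org.nd4j.linalg.activations.Activation;

class XavierLayerSketch {
    // Weights are drawn so that activation variance is preserved from layer to layer,
    // keeping the partial derivatives from exploding or vanishing.
    static final DenseLayer HIDDEN = new DenseLayer.Builder()
            .nIn(784)                        // 28x28 input pixels, flattened
            .nOut(256)                       // hidden width assumed for illustration
            .weightInit(WeightInit.XAVIER)   // Xavier/Glorot initialization
            .activation(Activation.RELU)
            .build();
}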
Activation Functions | ReLU
§ ReLU: rectified linear activation function, f(x) = max(0, x).
§ Popular activation function for hidden layers.
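A minimal plain-Java sketch of the function itself, just to make the definition concrete (not DL4J code; in DL4J the same choice is expressed as Activation.RELU).

class ReluDemo {
    // ReLU: pass positive inputs through unchanged, clamp negative inputs to zero
    static double relu(double x) {
        return Math.max(0.0, x);
    }

    public static void main(String[] args) {
        System.out.println(relu(2.5));    // prints 2.5
        System.out.println(relu(-1.0));   // prints 0.0
    }
}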
Activation Functions | SoftMax
§ Unlike Sigmoid, which treats each output independently, SoftMax normalizes across all outputs.
§ Most popular activation function for output layers handling multiple classes.
§ Outputs can be interpreted as probabilities (they sum to 1).
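A minimal plain-Java sketch of SoftMax to make the "probabilities" point concrete; the max-shift is a standard numerical-stability trick, not something specific to DL4J.

class SoftmaxDemo {
    // SoftMax: exponentiate each score and normalize so the outputs sum to 1,
    // which lets them be read as class probabilities.
    static double[] softmax(double[] scores) {
        double max = Double.NEGATIVE_INFINITY;
        for (double s : scores) max = Math.max(max, s);   // shift by the max for numerical stability
        double sum = 0.0;
        double[] out = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            out[i] = Math.exp(scores[i] - max);
            sum += out[i];
        }
        for (int i = 0; i < out.length; i++) out[i] /= sum;
        return out;
    }

    public static void main(String[] args) {
        // Three raw scores become probabilities that sum to 1
        System.out.println(java.util.Arrays.toString(softmax(new double[]{2.0, 1.0, 0.1})));
    }
}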
Error Function | Negative Log-Likelihood
§ The SoftMax function is used in tandem with the negative log-likelihood loss.
§ L(y, w) is the likelihood that the observed data y would be produced by parameter values w. Likelihood values are in the range 0 to 1.
§ Taking the log facilitates computing derivatives; log-likelihood values are then in the range -Infinity to 0.
§ Negating flips that range to 0 to Infinity, giving a quantity to minimize.
https://hea-www.harvard.edu/AstroStat/aas227_2016/lecture1_Robinson.pdf
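In DL4J this pairing is typically declared on the output layer; a minimal sketch, with 10 outputs for the 10 digit classes and an assumed input width of 256 from the previous hidden layer.

import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.lossfunctions.LossFunctions;

class OutputLayerSketch {
    // SoftMax turns the 10 outputs into probabilities; the negative log-likelihood
    // loss then penalizes assigning low probability to the true digit.
    static final OutputLayer OUTPUT =
            new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                    .nIn(256)                        // assumed width of the previous hidden layer
                    .nOut(10)                        // one output per digit class
                    .activation(Activation.SOFTMAX)
                    .build();
}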
Updater
§ Training mechanisms.
§ Several methods can result in much faster network training compared to 'vanilla' gradient descent. You can set the updater using the .updater(...) configuration option.
§ E.g., momentum, RMSProp, AdaGrad, ADAM, NADAM, and others.
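A minimal sketch of setting the updater in a DL4J network configuration; recent versions take an updater object such as Adam (older versions used an Updater enum), and the learning rate here is a placeholder.

import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.weights.WeightInit;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.learning.config.Adam;
import org.nd4j.linalg.lossfunctions.LossFunctions;

class UpdaterSketch {
    static final MultiLayerConfiguration CONF = new NeuralNetConfiguration.Builder()
            .seed(123)
            .updater(new Adam(0.001))          // training mechanism; placeholder learning rate
            .weightInit(WeightInit.XAVIER)
            .list()
            .layer(0, new DenseLayer.Builder().nIn(784).nOut(256)
                    .activation(Activation.RELU).build())
            .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                    .nIn(256).nOut(10)
                    .activation(Activation.SOFTMAX).build())
            .build();
}

The other updaters mentioned above (Nesterovs momentum, RmsProp, AdaGrad, Nadam) can be swapped in by changing only the .updater(...) line.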
Updater
§ A limitation of gradient descent is that progress can slow down when the gradient becomes flat or has large curvature.
§ Momentum can be added to gradient descent to incorporate some inertia into the updates.
// momentum: the quantity of motion of a moving body (product of its mass and velocity)
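A minimal plain-Java sketch of the classical momentum update for a single weight; the learning rate and momentum coefficient are placeholder values for illustration.

class MomentumSketch {
    // The velocity term is an exponentially decaying sum of past gradients,
    // so each update keeps some of its previous direction (inertia).
    static double velocity = 0.0;

    static double momentumStep(double weight, double gradient) {
        double learningRate = 0.01;   // placeholder value
        double momentum = 0.9;        // how much of the previous velocity is kept
        velocity = momentum * velocity - learningRate * gradient;
        return weight + velocity;
    }
}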
Updater
§ Adaptive Movement Estimation (ADAM): calculates a learning rate for each parameter of the objective function and further smooths the search by using an exponentially decaying moving average of the gradient.
§ NADAM: Nesterov momentum + ADAM (Nesterov-accelerated Adaptive Moment Estimation).
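A minimal plain-Java sketch of one ADAM step for a single weight, using the commonly cited default constants; NADAM differs in applying a Nesterov look-ahead to the momentum term.

class AdamSketch {
    static double m = 0.0;   // moving average of the gradient (first moment)
    static double v = 0.0;   // moving average of the squared gradient (second moment)
    static int t = 0;        // step counter, used for bias correction

    static double adamStep(double weight, double gradient) {
        double lr = 0.001, beta1 = 0.9, beta2 = 0.999, eps = 1e-8;
        t++;
        m = beta1 * m + (1 - beta1) * gradient;                  // smooth the gradient
        v = beta2 * v + (1 - beta2) * gradient * gradient;       // smooth its magnitude
        double mHat = m / (1 - Math.pow(beta1, t));              // correct the bias toward zero
        double vHat = v / (1 - Math.pow(beta2, t));
        return weight - lr * mHat / (Math.sqrt(vHat) + eps);     // per-weight effective step size
    }
}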
Reference
§ Deeplearning4j Suite Overview: https://deeplearning4j.konduit.ai
§ Papers on Canvas
§ Source Code
SER 594 Software Engineering for Machine Learning
Javier Gonzalez-Sanchez, Ph.D.
[email protected]
Spring 2022
Copyright. These slides can only be used as study material for the class SER 594 at Arizona State University. They cannot be distributed or used for another purpose.