Image Recognition with DL4J II Dr. Javier Gonzalez-Sanchez [email protected] javiergs.engineering.asu.edu | javiergs.com PERALTA 230U Office Hours: By appointment
jgs Career Fair § Master’s and PhD Online Career Fair: Tuesday, Feb 15, 2022, 9 a.m.–4 p.m. § No lecture that day. § Faculty picnic with students: Thursday, Feb 24, 2022, 11:30 am (SER Faculty, SCAI Director, Dean of Students). There will be food. § I will start the lecture at 12:15 pm that day.
jgs Weight Initialization | Xavier § A too-large initialization leads to exploding gradients (partial derivatives). § A too-small initialization leads to vanishing gradients (partial derivatives). Advice: § The mean of the activations should be zero. § The variance of the activations should stay the same across every layer. // variance: a statistical measurement of the spread between numbers in a data set
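A minimal DL4J sketch of where Xavier initialization is set in a network configuration. The layer sizes (784 inputs, 100 hidden units, 10 classes) are illustrative assumptions, not values from the slides.

import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.weights.WeightInit;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class XavierInitExample {
  public static void main(String[] args) {
    MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        // Xavier scales the initial weights so activation variance stays roughly constant across layers
        .weightInit(WeightInit.XAVIER)
        .list()
        .layer(0, new DenseLayer.Builder().nIn(784).nOut(100)
            .activation(Activation.RELU).build())
        // SoftMax output paired with negative log-likelihood, as discussed in the following slides
        .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
            .nIn(100).nOut(10).activation(Activation.SOFTMAX).build())
        .build();
    System.out.println(conf.toJson());
  }
}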
jgs Activation Functions | SoftMax § Sigmoid treats each output neuron independently; SoftMax normalizes across all output neurons. § Most popular activation function for output layers handling multiple classes. § Outputs can be read as probabilities (non-negative and summing to 1).
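For reference, the standard SoftMax definition for output i over K classes (not written out on the slide):

\mathrm{softmax}(z_i) = \frac{e^{z_i}}{\sum_{k=1}^{K} e^{z_k}}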
jgs Error Function | Negative Log-Likelihood § The SoftMax function is used in tandem with the negative log-likelihood. § Likelihood L(y, w): how likely it is that the observed data y would be produced by parameter values w. § Likelihood values lie in the range 0 to 1. § Taking the log simplifies the derivatives. § Log-likelihood values then lie in the range -Infinity to 0. § Negating gives values in the range 0 to Infinity, which we minimize. https://hea-www.harvard.edu/AstroStat/aas227_2016/lecture1_Robinson.pdf
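A standard form of this loss for a one-hot target y and SoftMax output \hat{y} over K classes (not spelled out on the slide):

\mathrm{NLL}(y, \hat{y}) = -\sum_{i=1}^{K} y_i \log \hat{y}_i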
jgs Updater § Training mechanisms. § There are update methods that can result in much faster network training than 'vanilla' gradient descent. You can set the updater using the .updater(Updater) configuration option. § E.g., Momentum, RMSProp, AdaGrad, Adam, Nadam, and others.
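A minimal sketch of setting the updater in a DL4J configuration; in recent DL4J versions the updater is passed as a config object (Nesterovs, RmsProp, AdaGrad, Adam, Nadam). The learning-rate and momentum values here are illustrative assumptions.

import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.nd4j.linalg.learning.config.Nesterovs;

public class UpdaterExample {
  public static void main(String[] args) {
    // Momentum via Nesterovs: learning rate 0.01, momentum coefficient 0.9
    NeuralNetConfiguration.Builder builder = new NeuralNetConfiguration.Builder()
        .updater(new Nesterovs(0.01, 0.9));
    // Alternatives: new RmsProp(0.01), new AdaGrad(0.01), new Adam(0.001), new Nadam(0.001)
    System.out.println(builder.build());
  }
}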
jgs Updater § A limitation of gradient descent is that the progress of the search can slow down if the gradient becomes flat or has large curvature. § Momentum can be added to gradient descent to incorporate some inertia into the updates. // momentum: quantity of motion of a moving body (product of its mass and velocity)
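The standard momentum update, not written out on the slide, with velocity v, momentum coefficient \beta, learning rate \eta, and loss L:

v_t = \beta v_{t-1} - \eta \nabla_w L(w_{t-1})
w_t = w_{t-1} + v_t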
jgs Updater § Adaptive Moment Estimation (Adam): calculates a per-parameter learning rate and further smooths the search by using an exponentially decaying moving average of the gradient. § Nesterov Momentum + Adam = Nadam (Nesterov-accelerated Adaptive Moment Estimation).
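For reference, the standard Adam update for gradient g_t (not shown on the slide), with decay rates \beta_1, \beta_2, learning rate \eta, and a small constant \epsilon:

m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t
v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2
\hat{m}_t = m_t / (1 - \beta_1^t),   \hat{v}_t = v_t / (1 - \beta_2^t)
w_t = w_{t-1} - \eta \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)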
Javier Gonzalez-Sanchez, Ph.D. [email protected] Spring 2022. Copyright: These slides can only be used as study material for the class CSE205 at Arizona State University. They cannot be distributed or used for another purpose.