JGS594 Lecture 09

Software Engineering for Machine Learning
Image Recognition II
(202202)

Javier Gonzalez-Sanchez

February 08, 2022

Transcript

  1. jgs SER 594 Software Engineering for Machine Learning Lecture 09:

    Image Recognition with DL4J II Dr. Javier Gonzalez-Sanchez [email protected] javiergs.engineering.asu.edu | javiergs.com PERALTA 230U Office Hours: By appointment
  2. Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 2

    jgs Career Fair § Master's and PhD Online Career Fair: Tuesday, Feb 15, 2022, 9 a.m.–4 p.m. No lecture that day. § Faculty picnic with students: Thursday, Feb 24, 2022 (SER Faculty, SCAI Director, Dean of Students) at 11:30 a.m. (there will be food). I will start the lecture at 12:15 p.m. that day.
  3. Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 4

    jgs MNIST dataset § Each digit is stored as an anti-aliased image (black-and-white originals rendered in grayscale) and is normalized to fit into a 28×28-pixel bounding box.
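Before a dense network can consume an MNIST digit, the 28×28 pixel grid is flattened into a 784-element vector and the pixel intensities are scaled from [0, 255] to [0, 1]. A minimal sketch of that preprocessing step (class and method names here are illustrative, not DL4J API):

```java
// Sketch: flatten a 28x28 grayscale image (pixel values 0-255) into the
// 784-element, [0,1]-normalized input vector a dense network expects.
public class MnistInput {
    public static double[] flatten(int[][] image) {
        double[] input = new double[28 * 28];
        for (int row = 0; row < 28; row++) {
            for (int col = 0; col < 28; col++) {
                // Scale each pixel from [0, 255] down to [0, 1]
                input[row * 28 + col] = image[row][col] / 255.0;
            }
        }
        return input;
    }
}
```

In DL4J this normalization is done for you when you iterate the dataset; the sketch just shows what the input layer's 784 units correspond to.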
  4. Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 9

    jgs Weight Initialization | Xavier § A too-large initialization leads to exploding gradients (partial derivatives). § A too-small initialization leads to vanishing gradients (partial derivatives). Advice: § The mean of the activations should be zero. § The variance of the activations should stay the same across every layer. (Variance: a statistical measure of the spread between the numbers in a data set.)
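One way to satisfy both pieces of advice is the uniform Xavier (Glorot) scheme, which draws weights from U(-limit, limit) with limit = sqrt(6 / (fanIn + fanOut)). This is a sketch of that variant, not DL4J's internal implementation (DL4J applies its own Xavier scheme when you select WeightInit.XAVIER):

```java
import java.util.Random;

// Sketch of uniform Xavier/Glorot initialization: weights drawn from
// U(-limit, limit) with limit = sqrt(6 / (fanIn + fanOut)), chosen so the
// variance of activations stays roughly constant across layers.
public class XavierInit {
    public static double[] init(int fanIn, int fanOut, long seed) {
        double limit = Math.sqrt(6.0 / (fanIn + fanOut));
        Random rng = new Random(seed);
        double[] weights = new double[fanIn * fanOut];
        for (int i = 0; i < weights.length; i++) {
            // nextDouble() is in [0, 1); shift and scale to (-limit, limit)
            weights[i] = (2 * rng.nextDouble() - 1) * limit;
        }
        return weights;
    }
}
```

Note how the limit shrinks as the layer gets wider: a 784-input layer gets much smaller initial weights than a 10-input one.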
  5. Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 10

    jgs Activation Functions | ReLU § ReLU: the rectified linear activation function, max(0, x). § A popular activation function for hidden layers.
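ReLU is simple enough to state in one line; a sketch with its derivative, which is what backpropagation actually uses (names here are illustrative):

```java
// ReLU: max(0, x). Zero for negative inputs, identity for positive ones;
// cheap to compute, and its gradient (0 or 1) does not saturate for x > 0.
public class Relu {
    public static double relu(double x) {
        return Math.max(0.0, x);
    }

    // Derivative used during backpropagation: 0 for x <= 0, 1 for x > 0.
    public static double reluGrad(double x) {
        return x > 0 ? 1.0 : 0.0;
    }
}
```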
  6. Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 11

    jgs Activation Functions | SoftMax § Sigmoid treats each output independently; SoftMax normalizes across all outputs. § The most popular activation function for output layers handling multiple classes. § Its outputs can be read as class probabilities.
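A sketch of SoftMax over a vector of logits; subtracting the maximum logit before exponentiating is a standard numerical-stability trick and does not change the result:

```java
// SoftMax: exponentiate each logit and normalize by the sum, so the outputs
// are positive and sum to 1 -- a probability distribution over the classes.
public class Softmax {
    public static double[] softmax(double[] logits) {
        // Subtract the max logit to avoid overflow in Math.exp.
        double max = Double.NEGATIVE_INFINITY;
        for (double z : logits) max = Math.max(max, z);

        double sum = 0.0;
        double[] out = new double[logits.length];
        for (int i = 0; i < logits.length; i++) {
            out[i] = Math.exp(logits[i] - max);
            sum += out[i];
        }
        for (int i = 0; i < out.length; i++) out[i] /= sum;
        return out;
    }
}
```

Unlike applying a sigmoid to each output separately, raising one logit here necessarily lowers the probability assigned to every other class.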
  7. Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 12

    jgs Error Function | Negative Log-Likelihood § The SoftMax function is used in tandem with the negative log-likelihood. § L(y, w): the likelihood that the observed data y would be produced by parameter values w. Likelihood is in the range 0 to 1. § Taking the log simplifies the derivatives. § The log-likelihood values are then in the range -Infinity to 0. § Negating makes the range 0 to Infinity, giving a loss to minimize. https://hea-www.harvard.edu/AstroStat/aas227_2016/lecture1_Robinson.pdf
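For a single classified example the loss reduces to the negative log of the probability the SoftMax assigned to the true class. A minimal sketch (names are illustrative):

```java
// Negative log-likelihood for one example: given the SoftMax output
// (class probabilities) and the index of the true class, the loss is
// -log(p[trueClass]). It is 0 when the true class gets probability 1
// and grows toward infinity as that probability approaches 0.
public class NegLogLikelihood {
    public static double nll(double[] probs, int trueClass) {
        return -Math.log(probs[trueClass]);
    }
}
```

For example, predicting the true class with probability 0.5 costs ln 2 ≈ 0.693, while predicting it with probability 1 costs 0.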
  8. Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 15

    jgs Model 3 (learningRate, momentum)
  9. Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 16

    jgs Updater § Training mechanisms.
  10. Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 17

    jgs Updater § Training mechanisms. § Some updaters can result in much faster network training than 'vanilla' gradient descent. You can set the updater using the .updater(Updater) configuration option. § E.g., momentum (Nesterovs), RMSProp, AdaGrad, Adam, Nadam, and others.
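As a configuration fragment (not a complete program), selecting the updater in DL4J's builder might look like the following; Nesterovs here takes (learningRate, momentum), and swapping in a different updater class changes only that one line:

```java
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.nd4j.linalg.learning.config.Nesterovs;

// Configuration sketch: set the updater via the builder. Replacing
// new Nesterovs(0.006, 0.9) with, e.g., new Adam(0.001) or
// new Nadam(0.001) switches the training mechanism.
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .seed(123)
        .updater(new Nesterovs(0.006, 0.9))   // learning rate, momentum
        .list()
        // ... layers as in the earlier model ...
        .build();
```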
  11. Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 18

    jgs Updater § A limitation of gradient descent is that the progress of the search can slow down where the gradient is flat or the curvature is large. § Momentum can be added to gradient descent to incorporate some inertia into the updates. (Momentum: the quantity of motion of a moving body, the product of its mass and velocity.)
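The inertia idea can be sketched as a velocity term that accumulates an exponentially decaying sum of past gradients (class and parameter names here are illustrative):

```java
// Sketch of one gradient-descent step with classical momentum for a single
// parameter w: the velocity v carries over a fraction mu of the previous
// update, so progress continues even where the current gradient is small.
public class MomentumUpdate {
    public static double[] step(double w, double v,
                                double grad, double lr, double mu) {
        double vNew = mu * v - lr * grad;   // velocity: decayed inertia minus scaled gradient
        double wNew = w + vNew;             // parameter update follows the velocity
        return new double[]{wNew, vNew};
    }
}
```

With mu = 0 this reduces to plain gradient descent; typical momentum coefficients are around 0.9.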
  12. Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 19

    jgs Updater | Nesterov Momentum § (Slide diagram: the original gradient-descent update compared with the Nesterov update, annotated with the velocity term and the momentum coefficient; DL4J's Nesterovs updater.)
  13. Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 21

    jgs Results § Is this good enough?
  14. Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 24

    jgs Updater § A limitation of gradient descent is that the progress of the search can slow down where the gradient is flat or the curvature is large. § Momentum can be added to gradient descent to incorporate some inertia into the updates. § Adaptive Movement Estimation (Adam): calculates a learning rate for each parameter of the objective function and further smooths the search process by using an exponentially decaying moving average of the gradient. § Nesterov Momentum + Adam = Nadam, Nesterov-accelerated Adaptive Moment Estimation.
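A sketch of one Adam step for a single parameter, showing the two decaying averages and the bias correction that makes early steps well-scaled (names and default coefficients follow the common convention; this is not DL4J's internal code):

```java
// Sketch of one Adam update for a single parameter w.
// m tracks a decaying average of gradients (first moment); s tracks a
// decaying average of squared gradients (second moment). Both are
// bias-corrected for step t, then combined into a per-parameter step size.
public class AdamUpdate {
    public static double[] step(double w, double m, double s, int t,
                                double grad, double lr) {
        double b1 = 0.9, b2 = 0.999, eps = 1e-8;   // common default coefficients
        m = b1 * m + (1 - b1) * grad;              // first-moment average
        s = b2 * s + (1 - b2) * grad * grad;       // second-moment average
        double mHat = m / (1 - Math.pow(b1, t));   // bias correction (t = 1, 2, ...)
        double sHat = s / (1 - Math.pow(b2, t));
        w -= lr * mHat / (Math.sqrt(sHat) + eps);  // adaptive parameter update
        return new double[]{w, m, s};
    }
}
```

Nadam modifies this scheme by applying the Nesterov look-ahead to the first-moment term.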
  15. Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 25

    jgs Updater | Nadam § Nesterov-accelerated Adaptive Moment Estimation: Adam + Nesterov momentum. (Slide diagram: the update rule annotated with the velocity term and the momentum coefficient.)
  16. Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 33

    jgs Reference § Deeplearning4j Suite Overview https://deeplearning4j.konduit.ai § Papers on Canvas § Source Code
  17. jgs SER 594 Software Engineering for Machine Learning Javier Gonzalez-Sanchez,

    Ph.D. [email protected] Spring 2022 Copyright. These slides can only be used as study material for the class CSE205 at Arizona State University. They cannot be distributed or used for another purpose.