JGS594 Lecture 09

Software Engineering for Machine Learning
Image Recognition II
(202202)

Javier Gonzalez-Sanchez

February 08, 2022

Transcript

  1. SER 594 Software Engineering for Machine Learning
     Lecture 09: Image Recognition with DL4J II
     Dr. Javier Gonzalez-Sanchez
     [email protected]
     javiergs.engineering.asu.edu | javiergs.com
     PERALTA 230U
     Office Hours: By appointment
  2. Career Fair
     § Master's and PhD Online Career Fair: Tuesday, Feb 15, 2022, 9 a.m.–4 p.m. No lecture that day.
     § Faculty picnic with students: Thursday, Feb 24, 2022 (SER Faculty, SCAI Director, Dean of Students). There will be food. The picnic starts at 11:30 a.m., and I will start the lecture at 12:15 p.m. that day.
  3. MNIST Dataset
     § Each number is stored as an anti-aliased, black-and-white image, normalized to fit into a 28x28-pixel bounding box.
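     As a sketch of how this dataset is commonly loaded with DL4J's built-in iterator (the batch size and seed are illustrative values, not from the slides):

         import java.io.IOException;
         import org.deeplearning4j.datasets.iterator.impl.MnistDataSetIterator;
         import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

         public class MnistLoad {
             public static void main(String[] args) throws IOException {
                 int batchSize = 64;   // illustrative value
                 int seed = 123;       // illustrative value
                 // Each example is a 28x28 image flattened to 784 features, with a 10-class label.
                 DataSetIterator mnistTrain = new MnistDataSetIterator(batchSize, true, seed);
                 DataSetIterator mnistTest = new MnistDataSetIterator(batchSize, false, seed);
             }
         }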
  4. Weight Initialization | Xavier
     § A too-large initialization leads to exploding gradients (partial derivatives).
     § A too-small initialization leads to vanishing gradients (partial derivatives).
     Advice:
     § The mean of the activations should be zero.
     § The variance of the activations should stay the same across every layer. (Variance: a statistical measurement of the spread between numbers in a data set.)
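     A minimal sketch of selecting Xavier initialization for a DL4J layer (the layer sizes are illustrative; the slide does not show this code):

         import org.deeplearning4j.nn.conf.layers.DenseLayer;
         import org.deeplearning4j.nn.weights.WeightInit;

         class XavierExample {
             // Xavier scales the initial weights by the layer's fan-in and fan-out,
             // keeping activation variance roughly constant across layers.
             static DenseLayer hiddenLayer() {
                 return new DenseLayer.Builder()
                         .nIn(784).nOut(1000)            // illustrative sizes
                         .weightInit(WeightInit.XAVIER)  // per-layer weight initialization
                         .build();
             }
         }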
  5. Activation Functions | ReLU
     § ReLU –– rectified linear activation function.
     § A popular activation function for hidden layers.
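     ReLU passes positive inputs through unchanged and clamps negatives to zero; a minimal sketch (not from the slides):

         class ReluSketch {
             // ReLU: f(x) = max(0, x). Cheap to compute, and its gradient is 1 for x > 0,
             // which helps gradients keep flowing through deep hidden layers.
             static double relu(double x) {
                 return Math.max(0.0, x);
             }
         }

     In a DL4J layer configuration it is selected with .activation(Activation.RELU).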
  6. Activation Functions | SoftMax
     § Sigmoid treats each output independently; SoftMax couples the outputs so they sum to 1.
     § The most popular activation function for output layers handling multiple classes.
     § Its outputs can be read as class probabilities.
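     A minimal sketch of SoftMax (the max-subtraction is a standard numerical-stability trick; this code is not from the slides):

         class SoftmaxSketch {
             // SoftMax: converts raw scores (logits) into probabilities that sum to 1.
             static double[] softmax(double[] logits) {
                 double max = Double.NEGATIVE_INFINITY;
                 for (double z : logits) max = Math.max(max, z);   // stability: shift by the max
                 double[] out = new double[logits.length];
                 double sum = 0.0;
                 for (int i = 0; i < logits.length; i++) {
                     out[i] = Math.exp(logits[i] - max);
                     sum += out[i];
                 }
                 for (int i = 0; i < out.length; i++) out[i] /= sum;
                 return out;
             }
         }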
  7. Error Function | Negative Log-Likelihood
     § The SoftMax function is used in tandem with the negative log-likelihood.
     § L(y, w) is the likelihood that the observed data y would be produced by parameter values w. Likelihood is in the range 0 to 1.
     § Taking the log facilitates the derivatives.
     § The log-likelihood values are then in the range -Infinity to 0.
     § Negating them yields values in the range 0 to +Infinity, a loss that training can minimize.
     https://hea-www.harvard.edu/AstroStat/aas227_2016/lecture1_Robinson.pdf
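     Putting the two together for one example, a minimal sketch of the loss value (assumes softmax output as above; not from the slides):

         class NllSketch {
             // Negative log-likelihood of one example: -log of the probability assigned
             // to the true class. It is 0 for a perfect prediction and grows toward
             // +Infinity as that probability approaches 0.
             static double negativeLogLikelihood(double[] probabilities, int trueClass) {
                 return -Math.log(probabilities[trueClass]);
             }
         }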
  8. Model 3 (learningRate, momentum)
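     The slide presumably shows this model's configuration code; as a hedged sketch, a DL4J configuration parameterized by learningRate and momentum might look like the following (layer sizes and hyperparameter values are illustrative, not necessarily the slide's):

         import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
         import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
         import org.deeplearning4j.nn.conf.layers.DenseLayer;
         import org.deeplearning4j.nn.conf.layers.OutputLayer;
         import org.deeplearning4j.nn.weights.WeightInit;
         import org.nd4j.linalg.activations.Activation;
         import org.nd4j.linalg.learning.config.Nesterovs;
         import org.nd4j.linalg.lossfunctions.LossFunctions.LossFunction;

         class Model3Sketch {
             static MultiLayerConfiguration build(double learningRate, double momentum) {
                 return new NeuralNetConfiguration.Builder()
                         .seed(123)
                         .updater(new Nesterovs(learningRate, momentum)) // SGD + momentum
                         .weightInit(WeightInit.XAVIER)
                         .list()
                         .layer(0, new DenseLayer.Builder().nIn(28 * 28).nOut(1000)
                                 .activation(Activation.RELU).build())
                         .layer(1, new OutputLayer.Builder(LossFunction.NEGATIVELOGLIKELIHOOD)
                                 .nIn(1000).nOut(10)
                                 .activation(Activation.SOFTMAX).build())
                         .build();
             }
         }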
  9. Updater
     § Training mechanisms.
  10. Updater
      § Training mechanisms.
      § There are methods that can result in much faster network training compared to 'vanilla' gradient descent. You can set the updater using the .updater(Updater) configuration option.
      § E.g., momentum, RMSProp, AdaGrad, Adam, Nadam, and others. (A sketch of the corresponding DL4J classes follows.)
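      These names map onto ND4J updater configuration classes; a minimal sketch (the learning rates are illustrative values):

          import org.nd4j.linalg.learning.config.*;

          class UpdaterChoices {
              // Any IUpdater below can be passed to .updater(...) in the configuration builder.
              static IUpdater[] options() {
                  return new IUpdater[] {
                      new Nesterovs(0.006, 0.9),  // SGD with (Nesterov) momentum
                      new RmsProp(0.001),
                      new AdaGrad(0.01),
                      new Adam(0.001),
                      new Nadam(0.001)
                  };
              }
          }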
  11. Updater
      § A limitation of gradient descent is that the progress of the search can slow down if the error surface becomes flat (small gradients) or has large curvature.
      § Momentum can be added to gradient descent to incorporate some inertia into the updates. (Momentum: the quantity of motion of a moving body, the product of its mass and velocity.)
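      A toy sketch of the inertia idea on a one-dimensional loss (illustrative only, not DL4J's internal code):

          class MomentumToy {
              public static void main(String[] args) {
                  double w = 0.0;              // parameter
                  double v = 0.0;              // velocity (accumulated inertia)
                  double lr = 0.01, mu = 0.9;  // learning rate, momentum coefficient
                  for (int step = 0; step < 100; step++) {
                      double grad = 2 * (w - 3.0);  // gradient of the toy loss (w - 3)^2
                      v = mu * v - lr * grad;       // velocity blends past updates with the new gradient
                      w += v;                       // the update keeps moving even where the gradient is small
                  }
                  System.out.println(w);            // approaches the minimum at w = 3
              }
          }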
  12. Updater | Nesterov Momentum
      § Original choice (plain gradient descent): w := w - a * dL/dw(w)
      § Velocity, with momentum coefficient mu: v := mu * v - a * dL/dw(w), then w := w + v
      § Nesterov's: the gradient is evaluated at the look-ahead point, v := mu * v - a * dL/dw(w + mu * v), then w := w + v
  13. Results
      § Is this good enough?
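      One way to answer quantitatively is DL4J's built-in evaluation; a sketch, assuming a trained model and the mnistTest iterator from the earlier snippets:

          import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
          import org.nd4j.evaluation.classification.Evaluation;  // org.deeplearning4j.eval.Evaluation in older releases
          import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

          class EvalSketch {
              static void report(MultiLayerNetwork model, DataSetIterator mnistTest) {
                  Evaluation eval = model.evaluate(mnistTest);   // runs the test set through the network
                  System.out.println(eval.stats());              // accuracy, precision, recall, F1, confusion matrix
              }
          }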
  14. Updater
      § A limitation of gradient descent is that the progress of the search can slow down if the error surface becomes flat (small gradients) or has large curvature.
      § Momentum can be added to gradient descent to incorporate some inertia into the updates.
      § Adaptive Movement Estimation (Adam): calculates a learning rate for each input variable of the objective function and further smooths the search by using an exponentially decreasing moving average of the gradient.
      § Nesterov momentum + Adam = Nadam, Nesterov-accelerated Adaptive Moment Estimation.
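      For reference, one Adam step for a single parameter in its standard form (Kingma & Ba, 2015); this is an illustration, not DL4J's internal code:

          class AdamToy {
              // state[0] = m (first moment), state[1] = v (second moment); t is the 1-based step count.
              static double adamStep(double w, double grad, double[] state, int t) {
                  double beta1 = 0.9, beta2 = 0.999, eps = 1e-8, alpha = 0.001;
                  state[0] = beta1 * state[0] + (1 - beta1) * grad;        // moving average of the gradient
                  state[1] = beta2 * state[1] + (1 - beta2) * grad * grad; // moving average of the squared gradient
                  double mHat = state[0] / (1 - Math.pow(beta1, t));       // bias correction
                  double vHat = state[1] / (1 - Math.pow(beta2, t));
                  return w - alpha * mHat / (Math.sqrt(vHat) + eps);       // per-parameter adapted step
              }
          }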
  15. Updater | Nadam
      § Nesterov-accelerated Adaptive Moment Estimation: Adam + Nesterov momentum.
      § The slide revisits the annotated update rules from slide 12 (original choice, velocity with momentum coefficient, Nesterov's), now combined with Adam's adaptive step.
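      Switching the earlier configuration to Nadam is a one-line change in the builder; a sketch (the learning rate is illustrative):

          import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
          import org.nd4j.linalg.learning.config.Nadam;

          class NadamSketch {
              static NeuralNetConfiguration.Builder builder() {
                  return new NeuralNetConfiguration.Builder()
                          .updater(new Nadam(0.001));  // replaces new Nesterovs(learningRate, momentum)
              }
          }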
  16. Reference
      § Deeplearning4j Suite Overview: https://deeplearning4j.konduit.ai
      § Papers on Canvas
      § Source Code
  17. SER 594 Software Engineering for Machine Learning
      Javier Gonzalez-Sanchez, Ph.D.
      [email protected]
      Spring 2022
      Copyright. These slides can only be used as study material for the class SER 594 at Arizona State University. They cannot be distributed or used for another purpose.