
JGS594 Lecture 09


Software Engineering for Machine Learning
Image Recognition II
(202202)

Javier Gonzalez-Sanchez

February 08, 2022


Transcript

  1. jgs
    SER 594
    Software Engineering for
    Machine Learning
    Lecture 09: Image Recognition with DL4J II
    Dr. Javier Gonzalez-Sanchez
    [email protected]
    javiergs.engineering.asu.edu | javiergs.com
    PERALTA 230U
    Office Hours: By appointment


  2. Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 2
    jgs
    Career Fair
    § Master’s and PhD Online Career Fair
    Tuesday, Feb 15, 2022
    9 a.m.–4 p.m.
    § No lecture that day.
    § Faculty picnic with students
    Thursday, Feb 24, 2022 (SER Faculty, SCAI Director, Dean of Students).
    (there will be food)
    11:30 am
    I will start the lecture at 12:15 pm that day.


  3. jgs
    Previously …


  4. Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 4
    jgs
    MNIST dataset
    § Each digit is stored as an anti-aliased grayscale image, size-normalized
    and centered in a 28x28 pixel field


  5. Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 5
    jgs
    Model 1


  6. Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 6
    jgs
    Results
    🙁


  7. jgs
    Model
    Case 2


  8. Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 8
    jgs
    Model 2


  9. Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 9
    jgs
    Weight Initialization | Xavier
    § A too-large initialization leads to exploding gradients (partial derivatives)
    § A too-small initialization leads to vanishing gradients (partial derivatives)
    Advice:
    § The mean of the activations should be zero.
    § The variance of the activations should stay the same across every layer.
    (Variance: a statistical measurement of the spread between numbers in a data set.)
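
The advice above can be sketched in plain Java. This is a minimal illustration of the Glorot/Xavier uniform variant, assuming weights are drawn from U(-limit, +limit) with limit = sqrt(6 / (fanIn + fanOut)); the class and method names are hypothetical, and DL4J's own Xavier scheme may use a different distribution.

```java
// Hypothetical sketch of Xavier (Glorot) uniform initialization; not DL4J code.
final class XavierInitSketch {
    // Bound chosen so the weight variance is 2 / (fanIn + fanOut).
    static double limit(int fanIn, int fanOut) {
        return Math.sqrt(6.0 / (fanIn + fanOut));
    }

    // Fills a fanIn x fanOut matrix with samples from U(-limit, +limit).
    static double[][] init(int fanIn, int fanOut, long seed) {
        java.util.Random rng = new java.util.Random(seed);
        double bound = limit(fanIn, fanOut);
        double[][] weights = new double[fanIn][fanOut];
        for (int i = 0; i < fanIn; i++) {
            for (int j = 0; j < fanOut; j++) {
                weights[i][j] = (2.0 * rng.nextDouble() - 1.0) * bound;
            }
        }
        return weights;
    }

    // Sample mean of all weights; expected to be close to zero.
    static double mean(double[][] weights) {
        double sum = 0.0;
        long count = 0;
        for (double[] row : weights) {
            for (double w : row) {
                sum += w;
                count++;
            }
        }
        return sum / count;
    }
}
```

For a 784-input, 100-output layer, `XavierInitSketch.init(784, 100, 42L)` gives zero-centered weights whose spread shrinks as the layer gets wider, which is what keeps the activation variance roughly stable from layer to layer.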


  10. Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 10
    jgs
    Activation Functions | ReLU
    § ReLU: rectified linear activation function
    § Popular activation function for hidden layers
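
As a minimal sketch, ReLU passes positive inputs through unchanged and maps everything else to zero. The class name below is hypothetical and the code is plain Java, not the DL4J implementation.

```java
// Hypothetical sketch of the ReLU activation: f(x) = max(0, x).
final class ReluSketch {
    static double relu(double x) {
        return Math.max(0.0, x);
    }

    // Applies ReLU element-wise to a layer's pre-activations.
    static double[] relu(double[] preActivations) {
        double[] out = new double[preActivations.length];
        for (int i = 0; i < preActivations.length; i++) {
            out[i] = relu(preActivations[i]);
        }
        return out;
    }
}
```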


  11. Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 11
    jgs
    Activation Functions | SoftMax
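    § Sigmoid scores each output independently; SoftMax normalizes the outputs so they sum to 1.
    § Most popular activation function for output layers handling multiple classes.
    § Its outputs can be read as class probabilities.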
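
A small sketch of the contrast: SoftMax turns a vector of scores into probabilities that sum to 1, while sigmoid squashes one score at a time. The class name is hypothetical; subtracting the maximum score before exponentiating is a common numerical-stability step added here, not something stated on the slide.

```java
// Hypothetical sketch comparing SoftMax (joint) with sigmoid (per-output).
final class SoftmaxSketch {
    // Converts raw scores (logits) into probabilities that sum to 1.
    static double[] softmax(double[] logits) {
        double max = Double.NEGATIVE_INFINITY;
        for (double z : logits) {
            max = Math.max(max, z);
        }
        double[] out = new double[logits.length];
        double sum = 0.0;
        for (int i = 0; i < logits.length; i++) {
            out[i] = Math.exp(logits[i] - max); // shift by max to avoid overflow
            sum += out[i];
        }
        for (int i = 0; i < out.length; i++) {
            out[i] /= sum;
        }
        return out;
    }

    // Sigmoid looks at one score only; several sigmoid outputs need not sum to 1.
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }
}
```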


  12. Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 12
    jgs
    Error Function | Negative Log-Likelihood
    § The SoftMax function is used in tandem with the negative log-likelihood.
    § Likelihood: how probable it is that the observed data y would be produced by parameter values w
    L(y, w)
    Likelihood is in the range 0 to 1.
    § Taking the log simplifies the derivatives.
    § Log-likelihood values are then in the range -infinity to 0.
    § Negating makes the range 0 to +infinity.
    https://hea-www.harvard.edu/AstroStat/aas227_2016/lecture1_Robinson.pdf
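
The ranges above can be checked with a short sketch. It assumes the network output is already a SoftMax probability vector and that the label is a class index; the class and method names are hypothetical.

```java
// Hypothetical sketch of the negative log-likelihood for one or more examples.
final class NegativeLogLikelihoodSketch {
    // Loss for one example: -log of the probability assigned to the true class.
    static double nll(double[] probabilities, int trueClass) {
        return -Math.log(probabilities[trueClass]);
    }

    // Average loss over a batch of probability vectors and integer labels.
    static double meanNll(double[][] batchProbabilities, int[] labels) {
        double total = 0.0;
        for (int i = 0; i < labels.length; i++) {
            total += nll(batchProbabilities[i], labels[i]);
        }
        return total / labels.length;
    }
}
```

A confident correct prediction (probability near 1) gives a loss near 0, and the loss grows without bound as the probability of the true class approaches 0.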


  13. Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 13
    jgs
    Results


  14. jgs
    Model
    Case 3


  15. Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 15
    jgs
    Model 3
    (learningRate, momentum)


  16. Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 16
    jgs
    Updater
    § Training mechanisms.


  17. Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 17
    jgs
    Updater
    § Training mechanisms.
    § There are methods that can result in much faster network training compared
    to 'vanilla' gradient descent. You can set the updater using the
    .updater(Updater) configuration option.
    § E.g., momentum, RMSProp, adagrad, ADAM, NADAM, and others


  18. Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 18
    jgs
    Updater
    § A limitation of gradient descent is that the search can slow down
    where the gradient becomes flat or the surface has large curvature.
    § Momentum can be added to gradient descent to give the updates some
    inertia.
    (Momentum: the quantity of motion of a moving body, i.e. the product of its mass and velocity.)
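
To make the inertia idea concrete, here is a minimal sketch of classical momentum on a one-dimensional toy objective f(x) = 0.5 * curvature * x^2. The objective, class name, and parameter names are assumptions for illustration; setting momentum to 0 gives plain gradient descent.

```java
// Hypothetical sketch: gradient descent with classical momentum on a toy quadratic.
final class MomentumSketch {
    // Minimizes f(x) = 0.5 * curvature * x^2, whose gradient is curvature * x.
    static double minimize(double x0, double curvature, double learningRate,
                           double momentum, int steps) {
        double x = x0;
        double velocity = 0.0;
        for (int t = 0; t < steps; t++) {
            double gradient = curvature * x;
            // The velocity keeps a decaying memory of previous gradient steps.
            velocity = momentum * velocity - learningRate * gradient;
            x += velocity;
        }
        return x;
    }
}
```

On a nearly flat objective (for example a curvature of 0.02 with a learning rate of 0.1), 100 steps with a momentum of 0.9 end much closer to the minimum at 0 than 100 steps with a momentum of 0.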


  19. Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 19
    jgs
    Updater | Nesterov Momentum
    [Figure; annotations: "Original Choice", "Velocity", "Momentum coefficient", "Nesterovs"]
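
The velocity and momentum coefficient named on this slide can be illustrated with a textbook form of Nesterov momentum, which evaluates the gradient at a look-ahead point. This is a sketch on the toy objective f(x) = 0.5 * curvature * x^2; the names are hypothetical, and DL4J's Nesterovs updater may use an equivalent but differently arranged formula.

```java
// Hypothetical sketch of Nesterov momentum on a toy quadratic.
final class NesterovSketch {
    // Minimizes f(x) = 0.5 * curvature * x^2, whose gradient is curvature * x.
    static double minimize(double x0, double curvature, double learningRate,
                           double momentum, int steps) {
        double x = x0;
        double velocity = 0.0;
        for (int t = 0; t < steps; t++) {
            // Peek ahead to where the current velocity would carry x.
            double lookAhead = x + momentum * velocity;
            double gradient = curvature * lookAhead;
            velocity = momentum * velocity - learningRate * gradient;
            x += velocity;
        }
        return x;
    }
}
```

The only difference from classical momentum is the point at which the gradient is evaluated: the look-ahead position instead of the current one.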


  20. Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 20
    jgs
    Results


  21. Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 21
    jgs
    Results
    Is this
    Good enough?


  22. jgs
    Model
    Case 4


  23. Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 23
    jgs
    Model 4


  24. Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 24
    jgs
    Updater
    § A limitation of gradient descent is that the search can slow down
    where the gradient becomes flat or the surface has large curvature.
    § Momentum can be added to gradient descent to give the updates some
    inertia.
    § Adaptive Moment Estimation (ADAM): computes a learning rate for
    each parameter and further smooths the search by using exponentially
    decaying moving averages of the gradient and of its square
    § Nesterov Momentum + ADAM = Nadam
    (Nesterov-accelerated Adaptive Moment Estimation)
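
A minimal sketch of the ADAM update for a single parameter, again on the toy objective f(x) = 0.5 * curvature * x^2. The class name is hypothetical, and the constants (beta1 = 0.9, beta2 = 0.999, epsilon = 1e-8) are commonly used defaults assumed here rather than values taken from the lecture.

```java
// Hypothetical sketch of the ADAM update rule on a toy quadratic.
final class AdamSketch {
    static double minimize(double x0, double curvature, double learningRate, int steps) {
        final double beta1 = 0.9;     // decay rate for the gradient average
        final double beta2 = 0.999;   // decay rate for the squared-gradient average
        final double epsilon = 1e-8;  // avoids division by zero
        double x = x0;
        double m = 0.0; // moving average of the gradient (first moment)
        double v = 0.0; // moving average of the squared gradient (second moment)
        for (int t = 1; t <= steps; t++) {
            double g = curvature * x;
            m = beta1 * m + (1 - beta1) * g;
            v = beta2 * v + (1 - beta2) * g * g;
            // Bias correction compensates for m and v starting at zero.
            double mHat = m / (1 - Math.pow(beta1, t));
            double vHat = v / (1 - Math.pow(beta2, t));
            // The step is scaled per parameter by the gradient's recent magnitude.
            x -= learningRate * mHat / (Math.sqrt(vHat) + epsilon);
        }
        return x;
    }
}
```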


  25. Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 25
    jgs
    Updater | Nadam
    [Figure; annotations: "Original Choice", "Velocity", "Momentum coefficient", "Nesterovs"]
    Nadam: Nesterov-accelerated Adaptive Moment Estimation
    (ADAM + Nesterov Momentum)
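
One way to picture the combination is a simplified Nadam step: the ADAM moving averages are kept, and the bias-corrected first moment is blended with the current gradient as a Nesterov-style look-ahead. This is a sketch with a constant beta1 on the toy objective f(x) = 0.5 * curvature * x^2; the names and constants are assumptions, and library implementations may differ in detail (for example by scheduling the momentum).

```java
// Hypothetical sketch of a simplified Nadam update on a toy quadratic.
final class NadamSketch {
    static double minimize(double x0, double curvature, double learningRate, int steps) {
        final double beta1 = 0.9;
        final double beta2 = 0.999;
        final double epsilon = 1e-8;
        double x = x0;
        double m = 0.0; // first moment, as in ADAM
        double v = 0.0; // second moment, as in ADAM
        for (int t = 1; t <= steps; t++) {
            double g = curvature * x;
            m = beta1 * m + (1 - beta1) * g;
            v = beta2 * v + (1 - beta2) * g * g;
            double correction1 = 1 - Math.pow(beta1, t);
            double mHat = m / correction1;
            double vHat = v / (1 - Math.pow(beta2, t));
            // Nesterov-style blend of the momentum term and the current gradient.
            double lookAhead = beta1 * mHat + (1 - beta1) * g / correction1;
            x -= learningRate * lookAhead / (Math.sqrt(vHat) + epsilon);
        }
        return x;
    }
}
```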


  26. Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 26
    jgs
    Results


  27. Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 27
    jgs
    Results
    👍


  28. Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 28
    jgs
    Model 4
    Adam() ?


  29. Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 32
    jgs
    Questions


  30. Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 33
    jgs
    Reference
    § Deeplearning4j Suite Overview
    https://deeplearning4j.konduit.ai
    § Papers on Canvas
    § Source Code


  31. jgs
    SER 594 Software Engineering for Machine Learning
    Javier Gonzalez-Sanchez, Ph.D.
    [email protected]
    Spring 2022
    Copyright. These slides can only be used as study material for the class CSE205 at Arizona State University.
    They cannot be distributed or used for another purpose.
