Connecting the Dots Dr. Javier Gonzalez-Sanchez [email protected] javiergs.engineering.asu.edu | javiergs.com PERALTA 230U Office Hours: By appointment
jgs Assignment 03 § Create 3 models for the Iris Dataset (a CSV table). § Experiment with the activation function, loss/error function, weight initialization, updaters, and number of epochs to create a good model, a bad model, and some in between. § Explain in detail your choices and the results for the 3 models. § Submit a paper (PDF) with your models (DL4J code) and their descriptions/explanations. § You have a week to work, as usual.
Weight Initialization a) UNIFORM § A too-large initialization leads to exploding gradients (partial derivatives). § A too-small initialization leads to vanishing gradients (partial derivatives). b) XAVIER § The mean of the activations should be zero. § The variance of the activations (a statistical measure of the spread between numbers in a data set) should stay the same across every layer.
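The effect described on this slide can be sketched in plain Java, without DL4J. The sketch assumes square linear layers with no bias and no activation function, and it tracks the variance of the forward activations as a stand-in for the partial derivatives (in a linear network the backward pass multiplies gradients by the same weight matrices, so they shrink or grow in the same way). The bounds 0.01 and 1.0 for the "too small" and "too large" uniform cases are arbitrary illustrative choices, and all class and method names are hypothetical.

```java
import java.util.Random;

public class InitScaleDemo {

    // One linear layer (no bias, no activation): y = x * W, with W of shape fanIn x fanOut.
    static double[] forward(double[] x, double[][] w) {
        int fanOut = w[0].length;
        double[] y = new double[fanOut];
        for (int i = 0; i < x.length; i++) {
            for (int j = 0; j < fanOut; j++) {
                y[j] += x[i] * w[i][j];
            }
        }
        return y;
    }

    // Weights drawn uniformly from [-limit, limit); each weight has mean 0 and variance limit^2 / 3.
    static double[][] uniformWeights(int fanIn, int fanOut, double limit, Random rng) {
        double[][] w = new double[fanIn][fanOut];
        for (int i = 0; i < fanIn; i++) {
            for (int j = 0; j < fanOut; j++) {
                w[i][j] = (2.0 * rng.nextDouble() - 1.0) * limit;
            }
        }
        return w;
    }

    // Xavier (Glorot) uniform bound, chosen so that Var(w) = limit^2 / 3 = 2 / (fanIn + fanOut).
    static double xavierLimit(int fanIn, int fanOut) {
        return Math.sqrt(6.0 / (fanIn + fanOut));
    }

    // Population variance of the entries of v.
    static double variance(double[] v) {
        double mean = 0.0;
        for (double a : v) mean += a;
        mean /= v.length;
        double sum = 0.0;
        for (double a : v) sum += (a - mean) * (a - mean);
        return sum / v.length;
    }

    // Pushes one random input through `layers` square layers of size `width`
    // and returns variance(output) / variance(input).
    static double varianceRatio(int width, int layers, double limit, long seed) {
        Random rng = new Random(seed);
        double[] x = new double[width];
        for (int i = 0; i < width; i++) x[i] = 2.0 * rng.nextDouble() - 1.0;
        double[] h = x;
        for (int l = 0; l < layers; l++) {
            h = forward(h, uniformWeights(width, width, limit, rng));
        }
        return variance(h) / variance(x);
    }

    public static void main(String[] args) {
        int width = 256, layers = 10;
        double xavier = xavierLimit(width, width);
        System.out.printf("too small (limit 0.01): ratio %.3e%n", varianceRatio(width, layers, 0.01, 1L));
        System.out.printf("too large (limit 1.0):  ratio %.3e%n", varianceRatio(width, layers, 1.0, 1L));
        System.out.printf("Xavier (limit %.4f): ratio %.3e%n", xavier, varianceRatio(width, layers, xavier, 1L));
    }
}
```

With the small bound each layer multiplies the variance by roughly 256 × 0.01² / 3 ≈ 0.0085, so after 10 layers it has all but vanished; with the large bound the factor is roughly 85 per layer and it explodes. The Xavier bound makes the per-layer factor about 1, so the variance stays in the same range as the input.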
Error Function § Mean squared error (MSE) § Negative Log Likelihood (NLL): the likelihood L(y, w) measures how likely it is that the observed data y would be produced by the parameter values w. Likelihood lies in the range 0 to 1. Taking the log makes the derivatives easier to compute; log-likelihood values then lie in the range -∞ to 0. Negating them maps the range to +∞ down to 0, so a loss of 0 means a perfect prediction. NLL is used in tandem with SoftMax.
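As a rough illustration of both error functions, the plain-Java sketch below computes MSE against a one-hot target and NLL on SoftMax outputs for a single example with 3 classes (for instance, the three Iris species). It is not DL4J's implementation; the logits are made-up values and the class and method names are hypothetical.

```java
public class LossFunctionsDemo {

    // Mean squared error: average of the squared differences between target and prediction.
    static double mse(double[] target, double[] predicted) {
        double sum = 0.0;
        for (int i = 0; i < target.length; i++) {
            double d = target[i] - predicted[i];
            sum += d * d;
        }
        return sum / target.length;
    }

    // SoftMax turns raw scores (logits) into probabilities that sum to 1.
    // Subtracting the largest score first keeps Math.exp from overflowing.
    static double[] softmax(double[] logits) {
        double max = Double.NEGATIVE_INFINITY;
        for (double z : logits) max = Math.max(max, z);
        double[] p = new double[logits.length];
        double sum = 0.0;
        for (int i = 0; i < logits.length; i++) {
            p[i] = Math.exp(logits[i] - max);
            sum += p[i];
        }
        for (int i = 0; i < p.length; i++) p[i] /= sum;
        return p;
    }

    // Negative log likelihood of the true class:
    // likelihood in (0, 1] -> log in (-inf, 0] -> negated in [0, +inf).
    static double nll(double[] probabilities, int trueClass) {
        return -Math.log(probabilities[trueClass]);
    }

    public static void main(String[] args) {
        double[] oneHot = {1, 0, 0}; // class 0 is the correct class
        double[] confident = softmax(new double[] {4.0, 1.0, 0.0});
        double[] unsure = softmax(new double[] {0.5, 0.4, 0.3});
        System.out.printf("confident: MSE=%.4f NLL=%.4f%n", mse(oneHot, confident), nll(confident, 0));
        System.out.printf("unsure:    MSE=%.4f NLL=%.4f%n", mse(oneHot, unsure), nll(unsure, 0));
    }
}
```

Both losses are smaller for the confident, correct prediction, but NLL grows without bound as the probability of the true class approaches 0, which is why it pairs naturally with SoftMax outputs for classification.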
Epoch § An epoch is one complete pass of the learning algorithm over the entire training dataset; the number of epochs is how many such passes the algorithm has completed. § How many epochs did our XOR made from scratch use?
Epoch § How many epochs did our MNIST example use?
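One way to relate epochs to the iterations that listeners report is simple arithmetic: with mini-batches, one iteration is usually one parameter update on one batch, so an epoch takes ceil(examples / batchSize) iterations. The sketch below assumes that convention; the batch size of 10 and the 100 epochs are illustrative choices, and only the 150 rows of the Iris dataset come from the data itself (a train/test split would reduce the training count).

```java
public class EpochMath {

    // Iterations (one update per mini-batch) needed for one full pass over the data.
    // Ceiling division: the last batch may be smaller than batchSize.
    static int iterationsPerEpoch(int numExamples, int batchSize) {
        return (numExamples + batchSize - 1) / batchSize;
    }

    // Total parameter updates performed over all epochs.
    static long totalIterations(int numExamples, int batchSize, int epochs) {
        return (long) iterationsPerEpoch(numExamples, batchSize) * epochs;
    }

    public static void main(String[] args) {
        int numExamples = 150; // the Iris dataset has 150 rows
        int batchSize = 10;    // illustrative choice
        int epochs = 100;      // illustrative choice
        System.out.println("iterations per epoch: " + iterationsPerEpoch(numExamples, batchSize));
        System.out.println("total iterations:     " + totalIterations(numExamples, batchSize, epochs));
    }
}
```

If the whole training set is used as one batch, each epoch is exactly one iteration, so the epoch count and the iteration count coincide.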
Epoch value § Training can run for hundreds or even thousands of epochs, and the process is usually set to continue until the model error is sufficiently minimized. § Tutorials and examples often use values like 10.
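The "train until the error is small enough" idea can be sketched as a loop with a maximum epoch count and a tolerance. The example below is a toy, not the Iris model: it fits a single weight to hypothetical data generated from y = 2x with full-batch gradient descent (one update per epoch), and the learning rates and tolerance are arbitrary illustrative values.

```java
public class TrainUntilConverged {

    // Hypothetical toy data generated from y = 2x (not the Iris data).
    static final double[] X = {1, 2, 3, 4};
    static final double[] Y = {2, 4, 6, 8};

    // Mean squared error of the one-weight model y = w * x on the toy data.
    static double mse(double w) {
        double sum = 0.0;
        for (int i = 0; i < X.length; i++) {
            double d = w * X[i] - Y[i];
            sum += d * d;
        }
        return sum / X.length;
    }

    // Full-batch gradient descent: each epoch is one pass over all examples
    // followed by one weight update. Returns the epoch at which the error fell
    // below `tolerance`, or -1 if `maxEpochs` ran out first.
    static int train(double learningRate, double tolerance, int maxEpochs) {
        double w = 0.0;
        for (int epoch = 1; epoch <= maxEpochs; epoch++) {
            double grad = 0.0;
            for (int i = 0; i < X.length; i++) {
                grad += 2.0 * (w * X[i] - Y[i]) * X[i]; // derivative of (w*x - y)^2 with respect to w
            }
            w -= learningRate * grad / X.length;
            if (mse(w) < tolerance) {
                return epoch;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("epochs needed, lr = 0.01:  " + train(0.01, 1e-6, 10_000));
        System.out.println("epochs needed, lr = 0.001: " + train(0.001, 1e-6, 10_000));
        System.out.println("capped at 10 epochs:       " + train(0.01, 1e-6, 10));
    }
}
```

Even on this tiny problem, 10 epochs are not enough to reach the tolerance, and a smaller learning rate needs many more epochs, which is why the epoch count is worth tuning alongside the other parameters.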
Homework 03 Write a paper: § Report 3 attempts to create a model (good, bad, regular). § Compare the differences and explain how you selected the configuration parameters. § Explain your network architecture and how you decided on it. § Add screenshots of your code and the eval.stats() output for each model. Do not forget Academic Integrity!
class ScoreIterationListener § Reports the score (the value of the loss function) every N iterations during training.
import org.deeplearning4j.optimize.listeners.ScoreIterationListener;
// print the score at every iteration (N = 1)
model.setListeners(new ScoreIterationListener(1));
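The "report every N iterations" behavior can be sketched with a plain-Java callback. This is not DL4J's code: the `IterationListener` interface, the `EveryNScoreListener` class, the message format, and the decaying stand-in score are all hypothetical, chosen only to show how a listener is called once per iteration and decides when to print.

```java
import java.util.ArrayList;
import java.util.List;

public class ScoreListenerSketch {

    // Hypothetical callback invoked by a training loop after every iteration.
    interface IterationListener {
        void iterationDone(int iteration, double score);
    }

    // Prints (and records) the score only every `printEvery` iterations.
    static class EveryNScoreListener implements IterationListener {
        private final int printEvery;
        final List<String> log = new ArrayList<>();

        EveryNScoreListener(int printEvery) {
            this.printEvery = printEvery;
        }

        @Override
        public void iterationDone(int iteration, double score) {
            if (iteration % printEvery == 0) {
                String line = "iteration " + iteration + ": score " + score;
                log.add(line);
                System.out.println(line);
            }
        }
    }

    public static void main(String[] args) {
        EveryNScoreListener listener = new EveryNScoreListener(5);
        double score = 1.0;
        for (int iteration = 0; iteration < 20; iteration++) {
            score *= 0.9; // stand-in for a loss that decreases during training
            listener.iterationDone(iteration, score);
        }
    }
}
```

A larger N keeps the log short on long runs; N = 1 prints every update, which is convenient for small datasets such as Iris.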
class TimeIterationListener § Writes to the INFO log the estimated remaining training time (in minutes) and the expected date and time at which training will end. § The remaining time is estimated from the time spent training so far and the total number of iterations specified by the user.
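The estimate described above amounts to a linear extrapolation: average time per finished iteration, multiplied by the iterations still to go. The sketch below assumes exactly that simple rule; it is not DL4J's implementation, and the elapsed time and iteration counts in `main` are illustrative values.

```java
import java.time.Duration;
import java.time.LocalDateTime;

public class RemainingTimeEstimate {

    // Assumes every remaining iteration takes as long as the average iteration so far.
    static double remainingMinutes(double elapsedMinutes, long iterationsDone, long totalIterations) {
        if (iterationsDone <= 0) {
            throw new IllegalArgumentException("need at least one finished iteration");
        }
        double minutesPerIteration = elapsedMinutes / iterationsDone;
        return minutesPerIteration * (totalIterations - iterationsDone);
    }

    public static void main(String[] args) {
        double elapsed = 3.0; // minutes trained so far (illustrative)
        long done = 150;      // iterations finished so far (illustrative)
        long total = 1500;    // total iterations specified by the user (illustrative)
        double remaining = remainingMinutes(elapsed, done, total);
        LocalDateTime end = LocalDateTime.now().plus(Duration.ofSeconds(Math.round(remaining * 60)));
        System.out.printf("Remaining time: %.1f minutes, expected end: %s%n", remaining, end);
    }
}
```

Because the estimate relies on the user-supplied total, it is only as accurate as that number; if training stops early or iterations slow down, the predicted end time will be off.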
Ph.D. [email protected] Spring 2022 Copyright. These slides can only be used as study material for the class CSE205 at Arizona State University. They cannot be distributed or used for another purpose.