JGS594 Lecture 13

SER 594 Software Engineering for Machine Learning
Lecture 13: Midterm Review
Dr. Javier Gonzalez-Sanchez
[email protected]
javiergs.engineering.asu.edu | javiergs.com
PERALTA 230U
Office Hours: By appointment

Part 0

Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 3
2. Machine Learning

Part 1

W, X, Y (weights, inputs, outputs) https://medium.com/analytics-vidhya/neural-networks-in-a-nutshell-with-java-b4a635a2c4af

Activation Functions

Neurons and Layers https://medium.com/analytics-vidhya/neural-networks-in-a-nutshell-with-java-b4a635a2c4af
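A forward pass through one layer ties W, X, Y, neurons and activation functions together: each neuron computes a weighted sum of its inputs plus a bias and passes it through an activation. The sketch below assumes a sigmoid activation and one bias per neuron; the class `DenseForward` and its methods are hypothetical names, not taken from the linked BasicNeuralNetwork.java.

```java
/** Hypothetical helper: forward pass of one fully connected layer, y = sigmoid(W * x + b). */
class DenseForward {

    // Sigmoid activation (assumed here): squashes any real number into (0, 1).
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /**
     * @param w weights, w[j][i] connects input i to neuron j (neurons x inputs)
     * @param x input vector
     * @param b one bias per neuron (an assumption of this sketch)
     * @return the layer's outputs (activations)
     */
    static double[] forward(double[][] w, double[] x, double[] b) {
        double[] y = new double[w.length];
        for (int j = 0; j < w.length; j++) {
            double z = b[j];
            for (int i = 0; i < x.length; i++) {
                z += w[j][i] * x[i];  // weighted sum of the inputs
            }
            y[j] = sigmoid(z);        // activation of neuron j
        }
        return y;
    }

    public static void main(String[] args) {
        double[][] w = {{0.5, -0.5}, {1.0, 1.0}};
        double[] x = {1.0, 1.0};
        double[] b = {0.0, -2.0};
        double[] y = forward(w, x, b);
        // Both weighted sums are 0, so both neurons output sigmoid(0) = 0.5.
        System.out.println(y[0] + " " + y[1]);
    }
}
```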

Error/Cost Function
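The slide does not name a specific cost function here; as an assumption, the sketch below uses mean squared error, one common choice, to show how a single number summarizes how far the outputs are from the targets. `Cost.mse` is a hypothetical helper.

```java
/** Hypothetical helper: mean squared error (one common cost function). */
class Cost {
    static double mse(double[] predicted, double[] target) {
        if (predicted.length != target.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < predicted.length; i++) {
            double diff = predicted[i] - target[i];
            sum += diff * diff;            // squared error of one output
        }
        return sum / predicted.length;     // average over all outputs
    }
}
```

Squaring makes every error positive and penalizes large errors more than small ones.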

Backpropagation
Did I mention something called the learning rate?
Review the details (math) here: https://medium.com/analytics-vidhya/neural-networks-in-a-nutshell-with-java-b4a635a2c4af
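For a single sigmoid neuron, backpropagation reduces to the chain rule plus a gradient-descent step scaled by the learning rate. This is a minimal sketch, assuming a squared-error cost C = (y - t)^2 / 2 and a toy OR dataset; `SingleNeuron` is a hypothetical class, not the linked repository's code.

```java
/** Hypothetical sketch: one sigmoid neuron trained by gradient descent. */
class SingleNeuron {
    double[] w = new double[2]; // weights start at 0
    double b = 0.0;             // bias

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    double predict(double[] x) {
        return sigmoid(w[0] * x[0] + w[1] * x[1] + b);
    }

    /** One epoch over the data; learningRate scales every update. */
    void trainEpoch(double[][] xs, double[] ys, double learningRate) {
        for (int n = 0; n < xs.length; n++) {
            double y = predict(xs[n]);
            // dC/dy = (y - t) and sigmoid'(z) = y * (1 - y), so dC/dz = delta:
            double delta = (y - ys[n]) * y * (1.0 - y);
            // Chain rule: dC/dw_i = delta * x_i and dC/db = delta; step against the gradient.
            w[0] -= learningRate * delta * xs[n][0];
            w[1] -= learningRate * delta * xs[n][1];
            b -= learningRate * delta;
        }
    }
}
```

A larger learning rate takes bigger steps (faster, but it can overshoot); a smaller one is more stable but needs more epochs.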

Code https://github.com/javiergs/Medium/blob/main/NeuralNetwork/BasicNeuralNetwork.java

Part 2

ML Frameworks and Libraries

ND4J Input (coordinates, value)

DL4J | Our Model
* Dense layer: a fully connected layer, i.e., every neuron of the layer is connected to every neuron of the preceding layer.
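Full connectivity determines the size of a dense layer: nIn inputs times nOut neurons gives nIn * nOut weights. The sketch below also assumes one bias per neuron; `DenseParams` is a hypothetical helper, and the layer sizes in the note are illustrative.

```java
/** Hypothetical helper: trainable parameters of a dense (fully connected) layer. */
class DenseParams {
    // nIn * nOut weights (every neuron sees every input)
    // plus nOut biases (assumption: the layer uses one bias per neuron).
    static long count(long nIn, long nOut) {
        return nIn * nOut + nOut;
    }
}
```

For example, a dense layer with 4 inputs and 3 neurons has 4 * 3 + 3 = 15 parameters; one with 784 inputs and 10 neurons has 7,850.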

DL4J | Our Model

DL4J | Training

DL4J | Testing

DL4J | Our Model

DL4J | Training

A sneak peek at the Code

… Go for a value < 0.05

Part 3

Evaluation

Definition
Confusion matrix (positive / negative): TP (true positives), FP (false positives), FN (false negatives), TN (true negatives).
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

Definition
§ Precision: how much we can trust the model when it predicts a positive.
$$\text{Precision} = \frac{TP}{TP + FP}$$
§ Recall: measures the ability of the model to find all positive units.
$$\text{Recall} = \frac{TP}{TP + FN}$$

Definition
§ Precision is relative to what was predicted positive; recall is relative to what is really positive.
$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
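The four formulas translate directly into code. In the sketch below, the class `Metrics` and the counts used in the note (TP = 40, FP = 10, FN = 20, TN = 30) are hypothetical, chosen only to make the arithmetic easy to follow.

```java
/** Hypothetical helper: binary classification metrics from confusion-matrix counts. */
class Metrics {
    static double accuracy(int tp, int tn, int fp, int fn) {
        return (double) (tp + tn) / (tp + tn + fp + fn); // correct / all
    }

    // Of everything predicted positive, the fraction that really is positive.
    static double precision(int tp, int fp) {
        return (double) tp / (tp + fp); // NaN if the model never predicts positive
    }

    // Of everything really positive, the fraction the model found.
    static double recall(int tp, int fn) {
        return (double) tp / (tp + fn);
    }

    // Harmonic mean of precision and recall.
    static double f1(double precision, double recall) {
        return 2 * precision * recall / (precision + recall);
    }
}
```

With those counts: accuracy = 70 / 100 = 0.7, precision = 40 / 50 = 0.8, recall = 40 / 60 ≈ 0.667, and F1 = 8/11 ≈ 0.727.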

Part 4

Load Data: 70,000 images of 28 x 28 pixels
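A dense network consumes each 28 x 28 image as a vector of 784 inputs, so the whole dataset is 70,000 x 784 = 54,880,000 values. The sketch below assumes grayscale pixels stored as integers 0 to 255 and scales them to [0, 1], a common (but here assumed) preprocessing step; `Flatten.toInput` is a hypothetical helper.

```java
/** Hypothetical helper: flatten a 28 x 28 grayscale image into a 784-element input vector. */
class Flatten {
    static final int ROWS = 28, COLS = 28;

    // Row-major flattening: pixel (r, c) goes to index r * 28 + c.
    // Assumes pixel values 0..255 and scales them to [0, 1].
    static double[] toInput(int[][] image) {
        double[] input = new double[ROWS * COLS];
        for (int r = 0; r < ROWS; r++) {
            for (int c = 0; c < COLS; c++) {
                input[r * COLS + c] = image[r][c] / 255.0;
            }
        }
        return input;
    }
}
```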

Code

Model 1

Evaluation (Number of classes)

Weight Initialization | Xavier
§ Too-large initial weights lead to exploding gradients (partial derivatives).
§ Too-small initial weights lead to vanishing gradients (partial derivatives).
Advice:
§ The mean of the activations should be zero.
§ The variance of the activations (a statistical measure of the spread of the numbers in a data set) should stay the same across every layer.
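Xavier (Glorot) initialization follows that advice by drawing zero-mean weights whose variance depends on the layer's fan-in and fan-out. The sketch below uses the Gaussian variant with variance 2 / (fanIn + fanOut); a uniform variant also exists, and this is not a claim about DL4J's internal implementation. `Xavier.init` is a hypothetical helper.

```java
import java.util.Random;

/** Hypothetical helper: Xavier (Glorot) normal initialization, w ~ N(0, 2 / (fanIn + fanOut)). */
class Xavier {
    static double[][] init(int fanIn, int fanOut, Random rng) {
        // Standard deviation chosen so activation variance stays roughly constant across layers.
        double std = Math.sqrt(2.0 / (fanIn + fanOut));
        double[][] w = new double[fanOut][fanIn];
        for (int j = 0; j < fanOut; j++) {
            for (int i = 0; i < fanIn; i++) {
                w[j][i] = rng.nextGaussian() * std; // zero mean, scaled spread
            }
        }
        return w;
    }
}
```

Passing a seeded `Random` makes the initialization reproducible from run to run.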

Activation Functions | ReLU
§ ReLU: rectified linear activation function, f(x) = max(0, x).
§ A popular activation function for hidden layers.
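ReLU and the derivative that backpropagation needs are each one line. In the sketch below, the derivative at exactly 0 is taken as 0, a convention (the function is not differentiable there); `Relu` is a hypothetical helper.

```java
/** Hypothetical helper: ReLU activation and its derivative. */
class Relu {
    static double relu(double x) {
        return Math.max(0.0, x);   // negatives become 0, positives pass through unchanged
    }

    static double derivative(double x) {
        return x > 0 ? 1.0 : 0.0;  // convention: derivative at 0 taken as 0
    }
}
```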

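Activation Functions | SoftMax
§ Unlike sigmoid, which squashes each output independently, softmax normalizes all outputs together: $\text{softmax}(z)_i = \frac{e^{z_i}}{\sum_j e^{z_j}}$.
§ The most popular activation function for output layers handling multiple classes.
§ Its outputs are probabilities: each lies between 0 and 1, and they sum to 1.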
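The softmax formula is short in code. The sketch below subtracts the maximum score before exponentiating, a standard trick that avoids overflow without changing the result; `Softmax` is a hypothetical helper.

```java
/** Hypothetical helper: softmax turns raw scores into probabilities that sum to 1. */
class Softmax {
    static double[] softmax(double[] z) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : z) max = Math.max(max, v); // subtract max for numerical stability
        double[] p = new double[z.length];
        double sum = 0.0;
        for (int i = 0; i < z.length; i++) {
            p[i] = Math.exp(z[i] - max);
            sum += p[i];
        }
        for (int i = 0; i < z.length; i++) {
            p[i] /= sum;                           // normalize so the outputs sum to 1
        }
        return p;
    }
}
```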

Error Function | Negative Log-Likelihood
§ The softmax function is used in tandem with the negative log-likelihood.
§ Likelihood L(y, w): how likely it is that the observed data y would be produced by the parameter values w. The likelihood lies in the range 0 to 1.
§ Taking the log makes the derivatives easier to compute.
§ Log-likelihood values then lie in the range -∞ to 0.
§ Negating them gives the range 0 to +∞, so the value can be minimized as an error.
https://hea-www.harvard.edu/AstroStat/aas227_2016/lecture1_Robinson.pdf
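Paired with softmax, the negative log-likelihood of one example is minus the log of the probability the model assigned to the true class. The sketch below shows that and a batch average; `NegativeLogLikelihood` is a hypothetical helper, and a probability of exactly 0 would give +∞.

```java
/** Hypothetical helper: negative log-likelihood of the true class given softmax probabilities. */
class NegativeLogLikelihood {
    // probs: softmax output for one example; trueClass: index of the correct label.
    static double nll(double[] probs, int trueClass) {
        return -Math.log(probs[trueClass]); // log p is in (-inf, 0], so -log p is in [0, +inf)
    }

    // Average over a batch of examples.
    static double mean(double[][] probs, int[] labels) {
        double sum = 0.0;
        for (int n = 0; n < labels.length; n++) {
            sum += nll(probs[n], labels[n]);
        }
        return sum / labels.length;
    }
}
```

A confident correct prediction (probability 1) costs 0; the cost grows without bound as the true class's probability approaches 0.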

Updater | Nadam
§ Original choice: Nesterovs (velocity, momentum coefficient)
§ Nadam: Nesterov-accelerated Adaptive Moment Estimation = Adam + Nesterov momentum
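To make "Adam + Nesterov momentum" concrete, the sketch below updates a single scalar parameter: Adam's bias-corrected first and second moments, plus a Nesterov-style look-ahead that blends the momentum with the current gradient. It is a simplified form (constant beta1, no momentum schedule), not DL4J's Nadam updater code; the class `NadamSketch`, the hyperparameters, and the toy objective (w - 3)^2 are assumptions for illustration.

```java
/** Hypothetical, simplified Nadam for one parameter (constant beta1, no momentum schedule). */
class NadamSketch {
    final double lr, beta1, beta2, eps;
    double m = 0.0; // first moment: moving average of gradients (momentum)
    double v = 0.0; // second moment: moving average of squared gradients
    int t = 0;      // step counter

    NadamSketch(double lr, double beta1, double beta2, double eps) {
        this.lr = lr;
        this.beta1 = beta1;
        this.beta2 = beta2;
        this.eps = eps;
    }

    /** Returns the updated parameter given its current value w and gradient g. */
    double step(double w, double g) {
        t++;
        m = beta1 * m + (1 - beta1) * g;
        v = beta2 * v + (1 - beta2) * g * g;
        double mHat = m / (1 - Math.pow(beta1, t)); // bias corrections
        double vHat = v / (1 - Math.pow(beta2, t));
        // Nesterov look-ahead: mix the corrected momentum with the current gradient.
        double mBar = beta1 * mHat + (1 - beta1) * g / (1 - Math.pow(beta1, t));
        return w - lr * mBar / (Math.sqrt(vHat) + eps);
    }
}
```

Usage: minimizing (w - 3)^2, whose gradient is 2(w - 3), by calling `w = opt.step(w, 2 * (w - 3))` repeatedly moves w from 0 toward 3.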

Part 5

Epoch
§ An epoch is one complete pass of the learning algorithm over the entire training dataset; the number of epochs is how many such passes the algorithm has completed.
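Epochs relate to the number of parameter updates (iterations) through the mini-batch size: each epoch needs ceil(N / batchSize) iterations. The sketch below computes that; `EpochMath` is a hypothetical helper, and the batch size of 128 in the note is an illustrative assumption.

```java
/** Hypothetical helper: relates epochs, mini-batch size, and iterations. */
class EpochMath {
    // One epoch = one pass over all examples; a partial last batch still counts as an iteration.
    static int iterationsPerEpoch(int numExamples, int batchSize) {
        return (numExamples + batchSize - 1) / batchSize; // ceiling division
    }

    static long totalIterations(int numExamples, int batchSize, int epochs) {
        return (long) iterationsPerEpoch(numExamples, batchSize) * epochs;
    }
}
```

For example, 70,000 images with a batch size of 128 take 547 iterations per epoch, while 150 iris rows loaded as a single batch of 150 take 1 iteration per epoch (so 1,000 epochs are 1,000 iterations).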

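A Second Option
```java
import org.datavec.api.records.reader.RecordReader;
import org.datavec.api.records.reader.impl.csv.CSVRecordReader;
import org.datavec.api.split.FileSplit;
import org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator;
import org.nd4j.common.io.ClassPathResource; // package differs in older ND4J versions
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

RecordReader recordReader = new CSVRecordReader(0, ','); // skipNumLines, delimiter
recordReader.initialize(new FileSplit(new ClassPathResource("iris.txt").getFile()));
DataSetIterator iterator = new RecordReaderDataSetIterator(
    recordReader, // source
    150,          // batch size: all 150 rows in one DataSet
    4,            // label index: column 4 holds the class (columns 0-3 are the inputs)
    3             // number of classes
);
```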

A Second Option
```java
import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.dataset.SplitTestAndTrain;

DataSet allData = iterator.next();
allData.shuffle(42); // fixed seed; use System.currentTimeMillis() for a different order each run
// 65% training and 35% testing
SplitTestAndTrain testAndTrain = allData.splitTestAndTrain(0.65);
DataSet trainingData = testAndTrain.getTrain();
DataSet testData = testAndTrain.getTest();
```
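Conceptually, `shuffle(42)` followed by `splitTestAndTrain(0.65)` is a seeded shuffle and a cut. The plain-Java sketch below illustrates the idea on a list of rows; it does not reproduce DL4J's internals (for instance, how it rounds 150 * 0.65), and `Split.shuffleAndSplit` is a hypothetical helper that truncates.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper: seeded shuffle followed by a train/test split. */
class Split {
    static <T> List<List<T>> shuffleAndSplit(List<T> rows, double fractionTrain, long seed) {
        List<T> copy = new ArrayList<>(rows);
        Collections.shuffle(copy, new Random(seed));      // same seed -> same order
        int nTrain = (int) (copy.size() * fractionTrain); // truncation is this sketch's choice
        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(copy.subList(0, nTrain)));           // training part
        parts.add(new ArrayList<>(copy.subList(nTrain, copy.size()))); // test part
        return parts;
    }
}
```

With 150 rows and 0.65, this sketch puts 97 rows in training and 53 in testing; the fixed seed makes the split repeatable.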

Listeners
```java
// add this before calling fit()
model.setListeners(new MyListener());
```

MyListener
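A listener is an instance of the observer pattern: the model calls it back after each iteration so it can print or record the score. The plain-Java sketch below only illustrates that pattern; `ScoreListener`, `ToyModel`, and `RecordingListener` are hypothetical types, not DL4J's `TrainingListener` API, and the "score" is a stand-in that simply shrinks by 10% per iteration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical listener interface (observer pattern). */
interface ScoreListener {
    void iterationDone(int iteration, double score);
}

/** Hypothetical toy model whose fit() notifies listeners after every iteration. */
class ToyModel {
    private final List<ScoreListener> listeners = new ArrayList<>();

    void setListeners(ScoreListener... ls) {
        listeners.clear();
        for (ScoreListener l : ls) listeners.add(l);
    }

    void fit(int iterations) {
        double score = 1.0;
        for (int i = 1; i <= iterations; i++) {
            score *= 0.9; // stand-in for a decreasing loss
            for (ScoreListener l : listeners) l.iterationDone(i, score);
        }
    }
}

/** Hypothetical listener that records every score, e.g. to plot later. */
class RecordingListener implements ScoreListener {
    final List<Double> scores = new ArrayList<>();

    @Override
    public void iterationDone(int iteration, double score) {
        scores.add(score);
    }
}
```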

Output …

Output
[Figure: line plot of the listener output; y-axis from 0 to 1.6, x-axis ticks from 1 to 981]

Save the Model

Load the Model
Load the updater

Part 6

Application

Application
```java
int[] y = model.predict(input);
// y[0]: predicted class index for the first input row
```
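Conceptually, turning the network's softmax output into a predicted class is an argmax over the probabilities, which is the kind of index `y[0]` holds. The sketch below does that and maps the index to an iris type; the label order in `NAMES` is an assumption that must match how the training file encodes classes 0, 1, 2. `IrisPrediction` is a hypothetical helper.

```java
/** Hypothetical helper: maps per-class probabilities to a class index and an iris type. */
class IrisPrediction {
    // Assumed label order for classes 0, 1, 2; verify against the training data's encoding.
    static final String[] NAMES = {"Iris-setosa", "Iris-versicolor", "Iris-virginica"};

    // Index of the largest output (first one wins on ties).
    static int argMax(double[] output) {
        int best = 0;
        for (int i = 1; i < output.length; i++) {
            if (output[i] > output[best]) best = i;
        }
        return best;
    }

    static String label(double[] output) {
        return NAMES[argMax(output)];
    }
}
```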

Can we create a GUI?
Inputs: SEPAL LENGTH, SEPAL WIDTH, PETAL LENGTH, PETAL WIDTH
Output: IRIS TYPE
Button: Calculate

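Part 7

Convolution
§ Convolutions have long been used in image processing to blur and sharpen images, enhance edges, and emboss.
$$\underbrace{\begin{bmatrix}1&0&0&0&0\\1&1&0&0&1\\1&1&1&1&1\\0&1&1&1&0\\0&0&1&0&0\end{bmatrix}}_{\text{Image matrix } (R_I \times C_I)} * \underbrace{\begin{bmatrix}1&0&1\\0&1&0\\1&0&1\end{bmatrix}}_{\text{Filter } (R_F \times C_F)} = \underbrace{\begin{bmatrix}4&2&2\\3&4&3\\4&3&4\end{bmatrix}}_{\text{Feature } ((R_I-R_F+1) \times (C_I-C_F+1))}$$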
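The convolution above can be reproduced by sliding the filter over every valid position, multiplying element-wise, and summing. As is usual in CNNs, the sketch does not flip the filter (strictly, a cross-correlation). `Convolution.convolve` is a hypothetical helper; applied to the slide's 5 x 5 image and 3 x 3 filter it yields the 3 x 3 feature matrix shown.

```java
/** Hypothetical helper: "valid" 2D convolution (no padding, stride 1, filter not flipped). */
class Convolution {
    static int[][] convolve(int[][] image, int[][] filter) {
        int rI = image.length, cI = image[0].length;
        int rF = filter.length, cF = filter[0].length;
        int[][] feature = new int[rI - rF + 1][cI - cF + 1]; // (RI-RF+1) x (CI-CF+1)
        for (int r = 0; r < feature.length; r++) {
            for (int c = 0; c < feature[0].length; c++) {
                int sum = 0;
                for (int i = 0; i < rF; i++) {
                    for (int j = 0; j < cF; j++) {
                        sum += image[r + i][c + j] * filter[i][j]; // multiply and accumulate
                    }
                }
                feature[r][c] = sum;
            }
        }
        return feature;
    }
}
```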

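Filters
§ A small set of filters is commonly used to perform a few tasks.
Blur: $\begin{bmatrix}0.1&0.1&0.1\\0.1&0.1&0.1\\0.1&0.1&0.1\end{bmatrix}$ Sharpen: $\begin{bmatrix}0&-1&0\\-1&5&-1\\0&-1&0\end{bmatrix}$
Borders (horizontal): $\begin{bmatrix}-1&0&1\\-2&0&2\\-1&0&1\end{bmatrix}$ Borders (vertical): $\begin{bmatrix}-1&-2&-1\\0&0&0\\1&2&1\end{bmatrix}$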

Stride
§ The stride is the number of pixels by which the filter shifts over the input matrix.
§ With a stride of 1 the filter moves 1 pixel at a time; with a stride of 2 it moves 2 pixels at a time, and so on.
Example: a 7 x 7 input whose entry in row r, column c is 10r + c, a 3 x 3 filter of ones, and a stride of 2:
$$\begin{bmatrix}0&1&2&3&4&5&6\\10&11&12&13&14&15&16\\20&21&22&23&24&25&26\\30&31&32&33&34&35&36\\40&41&42&43&44&45&46\\50&51&52&53&54&55&56\\60&61&62&63&64&65&66\end{bmatrix} * \begin{bmatrix}1&1&1\\1&1&1\\1&1&1\end{bmatrix} = \begin{bmatrix}99&117&135\\279&\cdots&\cdots\\\vdots&&\end{bmatrix}$$
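With a stride s, the filter's top-left corner jumps s pixels per step, so an n x n input and an f x f filter give floor((n - f) / s) + 1 outputs per side. The sketch below applies that to the 7 x 7 example with a 3 x 3 filter of ones and stride 2; `StridedConvolution` is a hypothetical helper.

```java
/** Hypothetical helper: convolution (no padding) where the filter moves `stride` pixels at a time. */
class StridedConvolution {
    static int[][] convolve(int[][] image, int[][] filter, int stride) {
        int rOut = (image.length - filter.length) / stride + 1;
        int cOut = (image[0].length - filter[0].length) / stride + 1;
        int[][] out = new int[rOut][cOut];
        for (int r = 0; r < rOut; r++) {
            for (int c = 0; c < cOut; c++) {
                int sum = 0;
                for (int i = 0; i < filter.length; i++) {
                    for (int j = 0; j < filter[0].length; j++) {
                        sum += image[r * stride + i][c * stride + j] * filter[i][j];
                    }
                }
                out[r][c] = sum;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] image = new int[7][7];
        for (int r = 0; r < 7; r++)
            for (int c = 0; c < 7; c++)
                image[r][c] = 10 * r + c; // the 7 x 7 example input
        int[][] ones = {{1, 1, 1}, {1, 1, 1}, {1, 1, 1}};
        // prints [[99, 117, 135], [279, 297, 315], [459, 477, 495]]
        System.out.println(java.util.Arrays.deepToString(convolve(image, ones, 2)));
    }
}
```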

Padding
§ A pixel in the corner is covered by the filter only once, while pixels in the middle are covered several times.
§ Padding is the number of pixels added around an image when it is being processed; below, one ring of zeros is added around the 5 x 5 image.
$$\begin{bmatrix}0&0&0&0&0&0&0\\0&1&0&0&0&0&0\\0&1&1&0&0&1&0\\0&1&1&1&1&1&0\\0&0&1&1&1&0&0\\0&0&0&1&0&0&0\\0&0&0&0&0&0&0\end{bmatrix}$$
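Zero padding is a copy into a larger, zero-filled array, and it changes the output size to (n + 2p - f) / s + 1. With p = 1, a 3 x 3 filter, and stride 1, a 5 x 5 image keeps its 5 x 5 size instead of shrinking to 3 x 3. `Padding` is a hypothetical helper.

```java
/** Hypothetical helper: zero padding and the resulting convolution output size. */
class Padding {
    // Surrounds the image with p rows/columns of zeros on every side.
    static int[][] pad(int[][] image, int p) {
        int rows = image.length, cols = image[0].length;
        int[][] out = new int[rows + 2 * p][cols + 2 * p]; // Java fills new int arrays with 0
        for (int r = 0; r < rows; r++) {
            System.arraycopy(image[r], 0, out[r + p], p, cols);
        }
        return out;
    }

    // Output size for input n, filter f, padding p, stride s (integer division = floor).
    static int outputSize(int n, int f, int p, int s) {
        return (n + 2 * p - f) / s + 1;
    }
}
```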

Pooling
§ Pooling downscales the image (feature map) obtained from the previous layers.
§ It can be compared to shrinking an image to reduce its pixel density.
§ Options: max pooling, average pooling, sum pooling.
Example, 2 x 2 max pooling:
$$\begin{bmatrix}0&1&2&3\\10&11&12&13\\20&21&22&23\\30&31&32&33\end{bmatrix} \xrightarrow{\text{2x2 max pooling}} \begin{bmatrix}11&13\\31&33\end{bmatrix}$$
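2 x 2 max pooling with stride 2 splits the input into non-overlapping 2 x 2 windows and keeps the largest value of each; on the 4 x 4 example it returns the 2 x 2 result shown. `MaxPooling` is a hypothetical helper; swapping `Math.max` for a sum or an average gives the other pooling options.

```java
/** Hypothetical helper: 2 x 2 max pooling with stride 2. */
class MaxPooling {
    static int[][] maxPool2x2(int[][] in) {
        int rows = in.length / 2, cols = in[0].length / 2; // an odd trailing row/column is dropped
        int[][] out = new int[rows][cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                int top = Math.max(in[2 * r][2 * c], in[2 * r][2 * c + 1]);
                int bottom = Math.max(in[2 * r + 1][2 * c], in[2 * r + 1][2 * c + 1]);
                out[r][c] = Math.max(top, bottom); // strongest value in the window
            }
        }
        return out;
    }
}
```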

Code

Test Yourselves

Let’s Work

Questions

SER 594 Software Engineering for Machine Learning
Javier Gonzalez-Sanchez, Ph.D.
[email protected]
Spring 2022
Copyright. These slides can only be used as study material for the class CSE205 at Arizona State University. They cannot be distributed or used for another purpose.
