Image Recognition with DL4J Dr. Javier Gonzalez-Sanchez [email protected] javiergs.engineering.asu.edu | javiergs.com PERALTA 230U Office Hours: By appointment
jgs Guidelines 1. Get data 2. Create a Model 3. Train the Model (i.e., calculate the W, Bias, and so on) 4. Test the Model (accuracy, recall, precision, F1-score, confusion matrix) 5. Deploy the model
jgs Definition Using neural networks for image classification is a very straightforward process. 1. We need a dataset with multiple images that we can use to train our neural network. We will use as inputs every single pixel in every image. 2. We expect to get as output a category that we are looking forward to recognizing.
jgs Problem Recognize handwriting numbers 0 to 9. Steps: 1. Load the dataset 2. Create, train, and evaluate a model 3. Save the model 4. Put our model in an application
jgs Publicly Available Datasets § MNIST dataset. 70, 000 samples of handwritten digits written by high school students and employees of the United States Census Bureau § Iris dataset. which contains three classes of 50 instances each, where each class refers to a type of iris plant; § TinyImageNet (a subset of ImageNet) dataset. An image dataset organized according to the WordNet hierarchy (~1000 per noun); § CIFAR-10 dataset. A a dataset consists of 60,000 32x32 color images in 10 classes, with 6,000 images per class; § Labeled Faces in the Wild, a database of face photographs; and, § Curve Fragment Ground-Truth Dataset, which is used for evaluating edge detection or boundary detection methods.
jgs Note According to the MNIST website, 1. a one-layer neural network (trained with this dataset) could achieve an error rate of 12% (pretty bad) –– 0.12 2. a deep convolutional neural network could achieve an error rate below 0.25% –– 0.0025 Remember: Go for a value < 0.05
jgs Load Data § DeepLearning4j comes with out-of-the-box dataset iterators for standard datasets, including one for the MNIST dataset. § The class MnistDataSetIterator allows us to load these standard datasets. § The constructor for MnistDataSetIterator receives three parameters: 1. The batch size, i.e., the number of training samples to work through before comparing to the expected output and calculate the error. 2. The total number of samples in the dataset. 3. A flag to indicate whether the dataset should be binarized (images considered in black & white without shades of gray) or not.
jgs First Attempt § Single hidden layer neural network. § Each pixel in the images became an input 28 x 28 = 784 inputs. § Each of the digits that we want to predict became an output; therefore, we have ten neurons in the output layer. § What about the hidden layer? Use the sum of input and output neurons, not a rule, but a starting point. 794 neurons in the hidden layer.
Ph.D. [email protected] Spring 2022 Copyright. These slides can only be used as study material for the class CSE205 at Arizona State University. They cannot be distributed or used for another purpose.