
Building a Season Image Classifier CNN with PyTorch

This project is a continuation of my previous "Building a Season Image Classifier with Feature Extraction" project that only performed color conversions and attempted to extract features based on average image color values in order to classify the images into one of four seasons (spring, summer, fall, winter). In this iteration of the project, I built a Convolutional Neural Network that was similar to LeNet5, and trained it on 400 images (100 per season) for 300 epochs.

This presentation was submitted as the Final Project portion of my PhD coursework in Computer Vision and details what I learned throughout my research during the course.

Aaron Snowberger

June 17, 2021

Transcript

  1. Building a Season Image Classifier CNN with PyTorch NAME: Aaron

    Snowberger (Dept. of Information & Communication Engineering, 30211060) DATE: 2021.06.17 SUBJECT: Computer Vision (Dong-Geol Choi)
  2. “ This final project is an update to the midterm

    project for this course. Midterm Project: Building a Season Image Classifier with Color Space Conversion and Feature Extraction. Final Project: Building a Season Image Classifier CNN with PyTorch. Updated sections and contents will be indicated with a dark highlight and bright blue text.
  3. CONTENTS INSPIRATION This Season Image Classifier project was inspired by

    a Day and Night Image Classifier project as part of Udacity's Computer Vision training.
    01 Problem Overview
    02 Data Preprocessing
    03 CNN Design in PyTorch
      a. LeNet5 (color) - 200 training
      b. LeNet5 (grayscale) - 200 training
      c. AlexNet (attempt) - 400 training
      d. LeNet5 bigger (attempt) - 400 training
      e. LeNet5 (color) - 400 training
      f. LeNet5 (grayscale) - 400 training
      g. Comparison of Results
    04 Future Research Plan
    05 Conclusion
  4. Season Image Classifier The Season Image Classifier project is intended

    to be able to accurately predict and label what season an image was captured in. This relies on extracting certain distinguishing features from the images in order to make its prediction. Dataset (updated): The dataset consists of 480 RGB color images split into two directories and four categories each (winter, spring, summer, fall): ⬥ img_train: 400 images for training (100 of each season) ⬥ img_test: 80 images for testing (20 of each season) The image dataset has been gathered from Pexels.com, a CC0 website.
  5. Load dataset & visualize input The image dataset folder structure

    is pre-labeled so images can be automatically labeled appropriately:
    ⬥ /img
      ⬦ /train
        ⬩ /fall ⬩ /spring ⬩ /summer ⬩ /winter
      ⬦ /test
        ⬩ /fall ⬩ /spring ⬩ /summer ⬩ /winter
    Data is loaded as it will be labeled: • 0: /winter • 1: /spring • 2: /summer • 3: /fall
    Each folder contains 100 images, so offsetting the data index by 100 changes which season we visualize (a minimal loading sketch follows below).
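
The exact loading code is in the linked Kaggle notebooks; below is a minimal sketch of this step. The paths, the load_dataset helper, and the use of matplotlib's image reader are assumptions for illustration, not the deck's actual names.

```python
# Minimal sketch of the loading step described above (not the deck's exact code,
# which is in the linked Kaggle notebooks). Paths, the load_dataset helper, and
# the use of matplotlib's image reader are assumptions for illustration.
import os
import matplotlib.image as mpimg

# Integer encoding used throughout the deck: 0=winter, 1=spring, 2=summer, 3=fall
LABELS = {"winter": 0, "spring": 1, "summer": 2, "fall": 3}

def load_dataset(root):
    """Walk root/<season>/ and return a list of (image, label) pairs."""
    img_list = []
    for season, label in LABELS.items():
        season_dir = os.path.join(root, season)
        for fname in sorted(os.listdir(season_dir)):
            if fname.startswith("."):           # skip hidden files
                continue
            image = mpimg.imread(os.path.join(season_dir, fname))
            img_list.append((image, label))
    return img_list

IMG_LIST = load_dataset("img/train")   # 400 images, 100 per season
TEST_LIST = load_dataset("img/test")   # 80 images, 20 per season
```
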
  6. Pre-process (standardize) data 01 Standardize input a. Proportionally resize images

    to 96px (later 224px) along the narrowest side; b. Center crop the images to 96x96 (later 224x224). 02 Standardize output: use integer encoding for seasons (0 = winter, 1 = spring, 2 = summer, 3 = fall). 03 Standardize IMG_LIST: return a new array STD_LIST where STD_LIST[index] is the (image, label) pair, STD_LIST[index][0] is the image, and STD_LIST[index][1] is its label. 04 Visualize the standardized data. A sketch of steps 01-03 is shown below.
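
A sketch of the standardization steps described above, assuming OpenCV is used for resizing; the IMG_SIZE constant and the helper names are illustrative (the deck first uses 96px and later 224px).

```python
# Sketch of the standardization step, assuming OpenCV for resizing. IMG_SIZE and
# the helper names are illustrative; the deck first uses 96px and later 224px.
import cv2

IMG_SIZE = 96  # later increased to 224

def standardize_image(image, size=IMG_SIZE):
    """Proportionally resize so the narrowest side equals `size`, then center-crop."""
    h, w = image.shape[:2]
    scale = size / min(h, w)
    resized = cv2.resize(image, (int(round(w * scale)), int(round(h * scale))))
    h, w = resized.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return resized[top:top + size, left:left + size]

def standardize(img_list):
    """Return STD_LIST where STD_LIST[i][0] is the image and STD_LIST[i][1] is its label."""
    return [(standardize_image(image), label) for image, label in img_list]

STD_LIST = standardize(IMG_LIST)
```
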
  7. Dataset: 200 training imgs, 96x96px standardized Visualize multiple standardized samples

    (example grid of /winter samples) Dataset: 400 training imgs, 224x224px standardized
  8. Network Architectures Attempted In this project, I attempted to utilize

    multiple different network architectures, dataset sizes, color channels, learning rates, optimizers, and so on in an attempt to find the best model for this problem (or at least, a good enough model). However, my attempts were met with mixed results. Therefore, I’ll present all my findings here, as well as my reflection on the project: what worked well, what didn’t work, and possible reasons why. List of Network Architectures: a. LeNet5 (color) - 200 training b. LeNet5 (grayscale) - 200 training c. AlexNet (attempt) - 400 training d. LeNet5 bigger (attempt) - 400 training e. LeNet5 (color) - 400 training f. LeNet5 (grayscale) - 400 training I chose to attempt these architectures after researching them on this website. The network illustrations were helpful (LeNet5 illustration shown).
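
For reference, a LeNet5-style network of the kind described here might look like the sketch below in PyTorch. The exact channel counts and layer sizes used in the notebooks are not given in the deck, so these are assumptions.

```python
# A LeNet5-style network adapted to 96x96 RGB inputs and 4 season classes.
# The exact channel counts and layer sizes used in the notebooks are not stated
# in the deck, so this is an illustrative sketch only.
import torch.nn as nn
import torch.nn.functional as F

class LeNet5Seasons(nn.Module):
    def __init__(self, in_channels=3, num_classes=4):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, 6, kernel_size=5)  # 96 -> 92
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)           # 46 -> 42
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(16 * 21 * 21, 120)                # 42 / 2 = 21
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, num_classes)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.flatten(start_dim=1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)
```

For 224x224 inputs the flattened feature size changes, so the first fully-connected layer would need to be resized accordingly.
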
  9. LeNet5 (color) - short training Being the simplest architecture on

    the Illustrated CNNs webpage, I chose to do this one first. I started small, with basic settings and simple hyperparameters. I noticed some decrease in training loss after 3 epochs, so I decided to increase the epochs for the next training to see how much better it would do. Test accuracy was only about as good as guessing (< 50%). Result: 200 training images, 96x96px standardized, batch_size 10, 3 epochs: 45%. SUCCESS?
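
A sketch of the training setup implied by the caption (batch_size 10, a few epochs). The choice of CrossEntropyLoss and SGD here is an assumption; the deck only says that learning rates and optimizers were varied.

```python
# Sketch of the training setup implied by the caption (batch_size 10, a few epochs).
# CrossEntropyLoss and SGD are assumptions; the deck only notes that learning
# rates and optimizers were varied between experiments.
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

# Convert STD_LIST (image, label) pairs into tensors shaped (N, C, H, W),
# assuming 8-bit RGB images read from JPEG files.
images = torch.stack([
    torch.as_tensor(img, dtype=torch.float32).permute(2, 0, 1) / 255.0
    for img, _ in STD_LIST
])
labels = torch.tensor([label for _, label in STD_LIST])
loader = DataLoader(TensorDataset(images, labels), batch_size=10, shuffle=True)

model = LeNet5Seasons()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

for epoch in range(3):  # later increased to 300
    running_loss = 0.0
    for batch_images, batch_labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(batch_images), batch_labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print(f"epoch {epoch + 1}: loss {running_loss / len(loader):.3f}")
```
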
  10. LeNet5 (color) - long training After the initial training

    loss for only 3 epochs, I increased the epochs to 300 with everything else remaining the same. This time, the Test accuracy was slightly better than just guessing (> 50%). Only fall’s accuracy fell (from 90% to 75%), but every other season’s accuracy increased by 10-20%. Result: 200 training images, 96x96px standardized, batch_size 10, 300 epochs: 55%. SUCCESS?
  11. LeNet5 (grayscale) - long training After working with color

    images, I decided to perform the same training again (300 epochs), but this time using grayscale images rather than color. I thought that with grayscale images, the CNN might have an easier time recognizing patterns and extracting features. My Test Accuracy results were nearly the same for grayscale as they were for color images, although accuracy for winter images fell significantly (from 70% with color to 45% here). Comparing this Training Curve (graph) to the Training Curve for color, the color curve is closer to a straight line, while this one is more of a curve. Result: 200 training images, 96x96px standardized, batch_size 10, 300 epochs: 52%. SUCCESS?
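
Switching to grayscale mainly means converting the images and giving the first convolutional layer a single input channel; a sketch, with conversion details assumed and reusing the earlier helpers:

```python
# Sketch of the grayscale variant: convert the standardized images and give the
# first conv layer a single input channel (conversion details assumed).
import cv2
import torch

GRAY_LIST = [(cv2.cvtColor(image, cv2.COLOR_RGB2GRAY), label) for image, label in STD_LIST]

gray_images = torch.stack([
    torch.as_tensor(img, dtype=torch.float32).unsqueeze(0) / 255.0  # (1, H, W)
    for img, _ in GRAY_LIST
])
gray_labels = torch.tensor([label for _, label in GRAY_LIST])

gray_model = LeNet5Seasons(in_channels=1)  # same architecture, 1-channel input
# ...then train exactly as in the earlier loop, but with gray_images / gray_labels.
```
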
  12. AlexNet (attempt) Next, I wanted to try to implement AlexNet,

    so I found a few helpful implementations (here, here, and here). However, after creating the network, the training loss did not decrease at all, and Test Accuracy fell to almost 33% (with 0% accuracy for winter & summer). Therefore, I thought I’d coded it wrong. I tried multiple fixes, different kernels, strides, and learning rates, but nothing changed. Result: 400 training images, 96x96px standardized, batch_size 20, 30 epochs: 37%. Epic fail!
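
The deck's AlexNet was hand-built from the linked tutorials and is not reproduced here. As a point of comparison only, torchvision ships a reference AlexNet that can be instantiated for a 4-class problem; note it was designed around 224x224 ImageNet-sized inputs.

```python
# Not the hand-built implementation used in the deck. Shown only as a point of
# comparison: torchvision's reference AlexNet instantiated for 4 classes.
# Note that AlexNet was designed around 224x224 ImageNet-sized inputs.
import torchvision.models as models

alexnet = models.alexnet(num_classes=4)  # randomly initialized, no pretrained weights
```
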
  13. LeNet5 Big (attempt) Because my AlexNet implementation had failed, I

    decided to try my original LeNet5 architecture again, but with a few additional layers. Specifically, I added: 1. One more pooling layer 2. One more convolutional layer 3. One more dropout layer 4. One more fully-connected layer. Additionally, I adjusted most of the in_channels and out_channels in the layers to be closer to AlexNet. However, about halfway through the training, the training loss stopped decreasing entirely and jumped back to its initial level - which was also where my AlexNet had gotten stuck. Result: 400 training images, 224x224px standardized, batch_size 10, 100 epochs. Epic fail! I realized that CNNs are more complicated than they look. At one point, I also got an “out of memory” error, so I moved everything to a GPU to continue, but then I was unable to visualize the results.
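
A sketch of the "moved everything to a GPU" step: the model and every batch must live on the same device. This reuses the model, loader, and criterion names from the earlier training sketch and rebuilds the optimizer after the move.

```python
# Sketch of the "moved everything to a GPU" step: the model and every batch must
# live on the same device. Reuses model, loader, and criterion from the earlier
# training sketch; the optimizer is rebuilt after the move.
import torch
import torch.optim as optim

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

for batch_images, batch_labels in loader:
    batch_images = batch_images.to(device)
    batch_labels = batch_labels.to(device)
    optimizer.zero_grad()
    loss = criterion(model(batch_images), batch_labels)
    loss.backward()
    optimizer.step()
```
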
  14. LeNet5* (color) - long training Therefore, I returned to my

    original (working) network architecture (* return to the original model and attempt to optimize it). This time, I wanted to attempt to get better results by increasing: 1. the number of training images (200 → 400) 2. the input image size (96x96 → 224x224). Using the same network as before, with larger image sizes and a larger dataset, the accuracy of the predictions increased by over 10% (55% → 66%). Therefore, larger data makes a difference. Result: 400 training images, 224x224px standardized, batch_size 10, 200 epochs: 66%. SUCCESS? I was unable to visualize test loss and predictions for the 300-epoch training run due to the CPU shutting down, but I wonder how different it would have been.
  15. LeNet5* (grayscale) - long training After successfully training on

    fewer grayscale images (200) at a lower resolution (96x96px), I decided to increase the dataset to 400 training images and the image size to 224x224px (* return to the original model and attempt to optimize it). I let the model (the same one that had successfully trained before) run through about 82 epochs of training before I stopped it. After 82 epochs of training, the loss remained constant around 1.38 (roughly ln 4 ≈ 1.386, the cross-entropy loss of a model that assigns equal probability to all four classes), and no learning had taken place. This was the same situation I’d encountered with my AlexNet implementation. (Was it the same problem for both?) Result: 400 training images, 224x224px standardized, batch_size 10, 200 epochs. Epic fail!
  16. Comparison of Results

    ⬥ LeNet5 (color) - 3 epochs : 45% accuracy. Success?
    ⬥ LeNet5 (color) - 300 epochs : 55% accuracy. Success?
    ⬥ LeNet5 (grayscale) - 300 epochs : 52% accuracy. Success?
    ⬥ AlexNet (attempt) : Epic fail!
    ⬥ LeNet5 bigger (attempt) : Epic fail!
    ⬥ LeNet5 (color) - larger dataset - 200 epochs : 66% accuracy. Success?
    Out of all the models and training I attempted, the best one was my original LeNet5, using a larger training dataset (400 images) and larger input images (224x224px).
  17. Comparison to Hue Feature Extractor (01 Visualize Test data, 02 Determine Accuracy - first try and subsequent tries, 03 Visualize Errors)

    Midterm Project: “It turns out Hue is the best overall predictor of season.” In the midterm project, I attempted to use only color as a feature extractor to classify the images. It was only 45% accurate. Final Project: My implementation of a CNN similar to LeNet5 produced better results than using color alone as a feature to classify images (66%). A sketch of the accuracy calculation follows below.
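
A sketch of how the overall and per-season test accuracy could be computed, reusing names from the earlier sketches; SEASON_NAMES and the tensor-building code are illustrative.

```python
# Sketch of the "determine accuracy" step: overall and per-season accuracy on the
# test set. SEASON_NAMES and the tensor-building code reuse names from the
# earlier sketches and are illustrative.
import torch

SEASON_NAMES = ["winter", "spring", "summer", "fall"]

STD_TEST = standardize(TEST_LIST)
test_images = torch.stack([
    torch.as_tensor(img, dtype=torch.float32).permute(2, 0, 1) / 255.0
    for img, _ in STD_TEST
])
test_labels = torch.tensor([label for _, label in STD_TEST])

device = next(model.parameters()).device   # wherever the trained model lives
model.eval()
with torch.no_grad():
    predictions = model(test_images.to(device)).argmax(dim=1).cpu()

correct = predictions == test_labels
print(f"overall accuracy: {correct.float().mean().item():.0%}")
for idx, name in enumerate(SEASON_NAMES):
    mask = test_labels == idx
    print(f"{name}: {correct[mask].float().mean().item():.0%}")
```
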
  18. Seasons are difficult to classify ANALYSIS Images classified by

    season may contain a large variety of objects that may make them more difficult to classify. Therefore, color, hue, and other factors may also need to be considered. For future research, other solutions to this problem should be explored. This may include different Neural Network architectures or other solutions. CNNs seem to be quite good at classifying images based on the objects contained within them, but this season classifier problem deals more with the overall mood / tone of an image than with the specific objects in it, which makes the classification problem more difficult. That’s partly why my initial research dealt with feature extraction based on color (Hue in particular). Perhaps it would even be possible to combine the two approaches.
  19. Summary This project was an extension and continuation of my

    previous research into classifying images by season by extracting features primarily based on the colors, Hues, and lightness present in those images. This project further illustrated that classifying images by season is not a simple and straightforward task. CNNs can be trained to classify objects contained in images with up to 90% accuracy (based on models such as EfficientNet and teacher-student training). However, seasonal images may contain a variety of different objects (trees, for example) in a variety of different states (blooming, covered in snow, and so on). Therefore, it seems that additional features, such as the average Hue within the image, or the colors that are present or dominant within the image, must also be taken into account in order to produce a better model. This makes me curious whether anyone else has attempted to solve this problem, and if so, how. It is a question for further consideration and research: how can we classify images not only based on the objects within them, but also based on the overall mood or tone of the image? We could then apply this knowledge to other tasks, such as classifying whether an image is taken from a horror movie or a romantic comedy.
  20. THANKS! The Jupyter Notebooks used for this project, including code

    and output, can be found at: https://www.kaggle.com/aaronsnowberger/code