Building a Season Image Classifier CNN with PyTorch

This project is a continuation of my previous "Building a Season Image Classifier with Feature Extraction" project that only performed color conversions and attempted to extract features based on average image color values in order to classify the images into one of four seasons (spring, summer, fall, winter). In this iteration of the project, I built a Convolutional Neural Network that was similar to LeNet5, and trained it on 400 images (100 per season) for 300 epochs.

This PPT was submitted as the Final Project portion of my PhD coursework in Computer Vision to detail what I learned during my period of research in the course.

Aaron Snowberger

June 17, 2021

Transcript

  1. Building a Season Image Classifier
    CNN with PyTorch
    NAME: Aaron Snowberger (Department of Information and Communication Engineering, 30211060)
    DATE: 2021.06.17
    SUBJECT: Computer Vision (Dong-Geol Choi)

  2. This final project is an update to the midterm project for this course.
    Midterm Project:
    Building a Season Image Classifier with Color Space
    Conversion and Feature Extraction
    Final Project:
    Building a Season Image Classifier CNN with PyTorch
    Updated sections and contents will be indicated with
    a dark highlight and bright blue text.

  3. CONTENTS
    INSPIRATION
    This Season Image Classifier project was inspired
    by a Day and Night Image Classifier project as
    part of Udacity's Computer Vision training.
    01 Problem Overview
    02 Data Preprocessing
    03 CNN Design in PyTorch
    a. LeNet5 (color) - 200 training
    b. LeNet5 (grayscale) - 200 training
    c. AlexNet (attempt) - 400 training
    d. LeNet5 bigger (attempt) - 400
    e. LeNet5 (color) - 400 training
    f. LeNet5 (grayscale) - 400 training
    g. Comparison of Results
    04 Future Research Plan
    05 Conclusion

  4. Problem Overview
    Image classification by Season
    1

  5. Season Image Classifier
    The Season Image Classifier project is intended to accurately predict and label the
    season in which an image was captured. It relies on extracting certain
    distinguishing features from the images in order to make its prediction.
    Dataset (updated):
    The dataset consists of 480 RGB color images split into two directories and four
    categories each (winter, spring, summer, fall):
    ⬥ img_train: 400 images for training (100 of each season)
    ⬥ img_test: 80 images for testing (20 of each season)
    The image dataset has been gathered from Pexels.com, a CC0 website.

  6. Data Preprocessing
    Preprocess data for the CNN (same as the midterm project)
    2

  7. Load dataset & visualize input
    The image dataset folder
    structure is pre-labeled so
    images can be automatically
    labeled appropriately:
    ⬥ /img
    ⬦ /train
    ⬩ /fall
    ⬩ /spring
    ⬩ /summer
    ⬩ /winter
    ⬦ /test
    ⬩ /fall
    ⬩ /spring
    ⬩ /summer
    ⬩ /winter
    Data is loaded as it will be labeled:
    ● 0: /winter
    ● 1: /spring
    ● 2: /summer
    ● 3: /fall
    Each training folder contains 100 images,
    so offsetting the data index by 100
    changes the season we can visualize.
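
The folder-to-label loading described on this slide can be sketched as follows. This is a minimal illustration using only the Python standard library, not the notebook's actual loader. The names `load_image_paths`, `SEASON_LABELS`, and `IMAGE_SUFFIXES`, as well as the accepted file extensions, are assumptions made for the example.

```python
from pathlib import Path

# Integer labels as listed on this slide.
SEASON_LABELS = {"winter": 0, "spring": 1, "summer": 2, "fall": 3}

# Hypothetical set of accepted image extensions (assumption).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def load_image_paths(split_dir):
    """Return a list of (image_path, label) pairs for one split directory.

    The label comes from the name of the season folder, and items are
    grouped in label order (all winter, then spring, and so on).
    """
    split_dir = Path(split_dir)
    items = []
    for season, label in sorted(SEASON_LABELS.items(), key=lambda kv: kv[1]):
        season_dir = split_dir / season
        if not season_dir.is_dir():
            continue  # tolerate a missing season folder
        for path in sorted(season_dir.iterdir()):
            if path.suffix.lower() in IMAGE_SUFFIXES:
                items.append((path, label))
    return items
```

With 100 images per folder, `items[0]` would be a winter image and `items[100]` a spring image, which matches the offset-by-100 behavior noted on the slide. If torchvision's `ImageFolder` were used instead, its class indices follow the sorted folder names (fall = 0, spring = 1, summer = 2, winter = 3), so they would need remapping to match the encoding used here.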

  8. Pre-process (standardize) data
    01 Standardize input
    a. Proportionally resize images to 96px (later 224px) along the narrowest side
    b. Center crop the images to 96x96 (later 224x224)
    02 Standardize output
    a. Use integer encoding for seasons
       (0 = winter, 1 = spring, 2 = summer, 3 = fall)
    03 Standardize IMG_LIST
    a. Return a new array STD_LIST where:
    b. STD_LIST[index] is the image and label
    c. STD_LIST[index][0] is the image
    d. STD_LIST[index][1] is its label
    04 Visualize the standardized data
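
The geometry of the standardization steps above can be sketched in plain Python. This is an illustration under stated assumptions rather than the notebook's code: the helper names (`resized_dimensions`, `center_crop_box`, `standardize_plan`) are hypothetical, and the rounding rule is a choice made for this example. On actual images, torchvision's `transforms.Resize(96)` followed by `transforms.CenterCrop(96)` performs the equivalent resize and crop, though its exact rounding may differ.

```python
SEASON_TO_INT = {"winter": 0, "spring": 1, "summer": 2, "fall": 3}


def resized_dimensions(width, height, target=96):
    """Scale (width, height) so the shortest side equals `target`."""
    scale = target / min(width, height)
    # max() guards against float rounding pushing a side below target.
    return max(target, round(width * scale)), max(target, round(height * scale))


def center_crop_box(width, height, target=96):
    """Return the (left, top, right, bottom) box of a centered square crop."""
    left = (width - target) // 2
    top = (height - target) // 2
    return left, top, left + target, top + target


def standardize_plan(width, height, season, target=96):
    """Describe how one image would be standardized, plus its integer label."""
    new_w, new_h = resized_dimensions(width, height, target)
    return {
        "resize_to": (new_w, new_h),
        "crop_box": center_crop_box(new_w, new_h, target),
        "label": SEASON_TO_INT[season],
    }
```

For example, `standardize_plan(1920, 1080, "fall")` gives `resize_to` of `(171, 96)`, a `crop_box` of `(37, 0, 133, 96)`, and the label `3`.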

  9. Dataset: 200 training imgs, 96x96px standardized
    Visualize multiple standardized samples
    [Figure: standardized sample images from /winter]
    Dataset: 400 training imgs, 224x224px standardized

  10. CNN Design in PyTorch
    5 Attempts, mixed results
    3

  11. Network Architectures Attempted
    In this project, I tried several
    network architectures,
    dataset sizes, color channels, learning
    rates, optimizers, and so on
    to find the best model for this problem
    (or at least a good enough model).
    However, my attempts were met with
    mixed results. Therefore, I’ll present all
    my findings here, as well as my reflection
    on the project: what worked well, what
    didn’t work, and possible reasons why.
    List of Network Architectures:
    a. LeNet5 (color) - 200 training
    b. LeNet5 (grayscale) - 200 training
    c. AlexNet (attempt) - 400 training
    d. LeNet5 bigger (attempt) - 400
    e. LeNet5 (color) - 400 training
    f. LeNet5 (grayscale) - 400 training
    I chose to attempt these architectures
    after researching them on this website.
    The network illustrations were helpful.
    LeNet5

  12. LeNet5 (color) - short training
    Because LeNet5 was the simplest
    architecture on the Illustrated CNNs
    webpage, I chose to implement it first.
    I started small, with basic settings
    and simple hyperparameters.
    Training loss decreased somewhat after 3
    epochs, so I decided to increase the
    epochs for the next training to see
    how much better it would do.
    Test accuracy was below 50% (45%),
    although random guessing among four
    balanced classes would give about 25%.
    200 training images, 96x96px standardized, batch_size 10, 3 epochs : 45%
    SUCCESS?
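
For readers who want a concrete picture, a LeNet-5-style network for this setup might look like the sketch below. It is not the author's model: the class name `LeNet5Like` is hypothetical, and the use of ReLU activations and max pooling (the original LeNet-5 used tanh-style activations and average pooling) is an assumption made for the example. The layer widths (6 and 16 feature maps, then 120 and 84 units) follow the classic LeNet-5 layout, with four output scores for the four seasons.

```python
import torch
from torch import nn


class LeNet5Like(nn.Module):
    """LeNet-5-style CNN for square inputs (a sketch, not the author's model)."""

    def __init__(self, in_channels=3, image_size=96, num_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 6, kernel_size=5),  # size -> size - 4
            nn.ReLU(),
            nn.MaxPool2d(2),                           # size -> size // 2
            nn.Conv2d(6, 16, kernel_size=5),           # size -> size - 4
            nn.ReLU(),
            nn.MaxPool2d(2),                           # size -> size // 2
        )
        # Spatial size after the two conv + pool stages (96 -> 21, 224 -> 53).
        side = ((image_size - 4) // 2 - 4) // 2
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * side * side, 120),
            nn.ReLU(),
            nn.Linear(120, 84),
            nn.ReLU(),
            nn.Linear(84, num_classes),  # raw scores (logits), one per season
        )

    def forward(self, x):
        return self.classifier(self.features(x))


model = LeNet5Like(in_channels=3, image_size=96)
batch = torch.randn(10, 3, 96, 96)  # batch_size 10, as in the slide caption
logits = model(batch)               # shape: (10, 4)
```

Passing `in_channels=1` would give the grayscale variant, and `image_size=224` the larger-input variant; in the latter case the first fully-connected layer grows from 16 × 21 × 21 to 16 × 53 × 53 inputs.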

  13. LeNet5 (color) - long training
    After the training loss decreased over
    only 3 epochs, I increased the
    epochs to 300 with everything else
    remaining the same.
    This time, the Test accuracy rose
    above 50% (to 55%). Only fall’s
    accuracy fell (from 90% to 75%); every
    other season’s accuracy increased by
    10 to 20 percentage points.
    200 training images, 96x96px standardized, batch_size 10, 300 epochs : 55%
    SUCCESS?
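
Overall and per-season accuracy figures like the ones quoted on this slide can be computed as in the sketch below. The function name `accuracy_report` and the `SEASONS` list are hypothetical; the integer encoding (0 = winter through 3 = fall) is the one given in the preprocessing slide.

```python
from collections import Counter

SEASONS = ["winter", "spring", "summer", "fall"]  # index = integer label


def accuracy_report(true_labels, predicted_labels):
    """Return (overall_accuracy, {season: accuracy}) for integer labels."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    total = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    per_season = {
        SEASONS[label]: correct[label] / count
        for label, count in sorted(total.items())
    }
    overall = sum(correct.values()) / len(true_labels)
    return overall, per_season
```

With four balanced classes, an overall accuracy of 0.25 corresponds to random guessing, so that is the baseline against which 45% or 55% should be read.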

  14. LeNet5 (grayscale) - long training
    After working with color images, I
    decided to perform the same training
    again (300 epochs), but this time using
    grayscale images rather than color. I
    thought with grayscale images, the
    CNN might have an easier time
    recognizing patterns and extracting
    features.
    My Test Accuracy results are nearly the
    same for grayscale as they were for
    color images, although accuracy for
    winter images has fallen significantly
    (from 70% with color to 45% now).
    Comparing this Training Curve (graph)
    to the Training Curve for color, the color
    curve is closer to a line, and this one is
    more of a curve.
    200 training images, 96x96px standardized, batch_size 10, 300 epochs : 52%
    SUCCESS?

  15. AlexNet (attempt)
    Next, I wanted to try to implement
    AlexNet, so I found a few helpful
    implementations (here, here, and here).
    However, after creating the network,
    the training loss did not decrease at all,
    and Test Accuracy fell to almost 33%
    (with 0% accuracy for winter & summer).
    Therefore, I thought I’d coded it wrong. I
    tried multiple fixes (different kernels,
    strides, and learning rates), but nothing changed.
    400 training images, 96x96px standardized, batch_size 20, 30 epochs : 37%
    Epic fail!

  16. LeNet5 Big (attempt)
    Because my AlexNet implementation
    had failed, I decided to try my original
    LeNet5 architecture again, but with a
    few additional layers.
    Specifically, I added:
    1. One more pooling layer
    2. One more convolutional layer
    3. One more dropout layer
    4. One more fully-connected layer
    Additionally, I adjusted most of the
    in_channels and out_channels in
    the layers to be closer to AlexNet.
    However, about halfway through the
    training, the training loss stopped decreasing
    and reset to its initial level, which was
    also where my AlexNet had gotten stuck.
    400 training images, 224x224px standardized, batch_size 10, 100 epochs
    Epic fail!
    I realized that CNNs are more
    complicated than they look.
    At one point, I also got an “out of
    memory” error, so I moved
    everything to a GPU to continue,
    but then I was unable to visualize
    the results.
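
The slides do not say why visualization failed after the move to a GPU. One common cause in PyTorch, offered here only as a possibility, is that plotting libraries such as Matplotlib need the data as NumPy arrays in host memory, and calling `.numpy()` directly on a CUDA tensor raises an error. The sketch below shows the usual pattern of copying results back to the CPU first; the tensor `logits` is a random stand-in for model outputs, not data from the project.

```python
import torch

# Use the GPU when one is present; otherwise this runs on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in for a batch of model outputs living on the training device.
logits = torch.randn(10, 4, device=device, requires_grad=True)

# Predicted class indices: copy to host memory, then convert to NumPy.
predictions = logits.argmax(dim=1).cpu().numpy()

# Probabilities still track gradients, so detach before converting.
probabilities = torch.softmax(logits, dim=1).detach().cpu().numpy()
```

Both arrays can then be passed to plotting code regardless of where training took place.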

  17. LeNet5* (color) - long training
    Therefore, I returned to my original
    (working) network architecture.
    This time, I wanted to attempt to get
    better results by increasing:
    1. # of training images (200 → 400)
    2. Input image size (96x96 → 224x224)
    Using the same Network as before, with
    larger image sizes and a larger dataset,
    the accuracy of the predictions increased
    by 11 percentage points (55% → 66%).
    Therefore, larger data makes a difference.
    400 training images, 224x224px standardized, batch_size 10, 200 epochs : 66%
    SUCCESS?
    * Return to original model; attempt to optimize
    [Figure panels: 300 epochs, 200 epochs]
    I was unable to visualize test
    loss and predictions for 300
    epochs of training due to the
    CPU shutting down. But I
    wonder how different
    it would have been.

  18. LeNet5* (grayscale) - long training
    After successfully training on fewer
    grayscale images (200) at a lower
    resolution (96x96px), I decided to
    increase the dataset to 400
    training images, and the size of the
    images to train on to 224x224px.
    I let the model (the same one that
    had successfully trained before)
    run through about 82 epochs of
    training before I stopped it. After
    82 epochs of training, loss
    remained constant around 1.38,
    and no learning had taken place.
    This was the same situation I’d
    encountered with my AlexNet
    implementation.
    (Was it the same problem for both?)
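
A constant loss near 1.38 is itself a clue. Assuming the network was trained with cross-entropy loss (the slides do not name the loss function), a model that assigns equal probability to all four seasons has a loss of ln(4) ≈ 1.386. A loss stuck at that value would therefore suggest that the network was outputting a near-uniform prediction regardless of the input. The short check below works through the arithmetic; the helper `cross_entropy` is written for this example only.

```python
import math


def cross_entropy(probabilities, true_index):
    """Cross-entropy loss for one sample, given predicted class probabilities."""
    return -math.log(probabilities[true_index])


num_classes = 4
uniform = [1.0 / num_classes] * num_classes

# The loss is the same whichever season is the true label.
uniform_loss = cross_entropy(uniform, true_index=0)
print(f"{uniform_loss:.3f}")  # 1.386
```

If that reading is right, the usual suspects to examine would be the learning rate, the weight initialization, and the scaling of the input pixels, although the slides give no evidence about which (if any) applied here.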
    400 training images, 224x224px standardized, batch_size 10, 200 epochs
    * Return to original model; attempt to optimize
    Epic fail!

  19. Comparison of Results
    Results (Success?):
    LeNet5 (color) - 3 epochs : 45% accuracy
    LeNet5 (color) - 300 epochs : 55% accuracy
    LeNet5 (grayscale) - 300 epochs : 52% accuracy
    LeNet5 (color) - larger dataset - 200 epochs : 66% accuracy
    Failed attempts (Epic fail!):
    AlexNet (attempt)
    LeNet5 bigger (attempt)
    Out of all the models and training I
    attempted, the best one was my
    original LeNet5, using a larger
    training dataset (400 images) and
    larger input images (224x224px).

  20. Comparison to Hue Feature Extractor
    01 Visualize Test data
    02 Determine Accuracy
    First try
    Subsequent tries
    03 Visualize Errors
    LeNet5 CNN (Final Project)
    Midterm Project
    “It turns out Hue is the best
    overall predictor of season.”
    In the midterm project, I
    attempted to use only color as a
    feature extractor to classify the
    images. It was only 45% accurate.
    Final Project
    My implementation of a CNN
    that is similar to the LeNet5 CNN
    produced better results than
    using color alone as a feature to
    try to classify images (66%).

  21. Conclusion & Future Research
    Seasons are difficult to classify
    4

  22. Seasons are difficult to classify
    ANALYSIS
    Images classified by season may contain a large
    variety of objects that may make them more
    difficult to classify. Therefore, color, hue, and
    other factors may also need to be considered.
    For future research, other solutions to
    this problem should be explored. This
    may include different Neural Network
    architectures or other solutions.
    CNNs seem to be quite
    good at classifying images based on
    objects contained within them, but this
    season classifier problem seems to deal
    more with the overall mood / tone of an
    image than with specific objects
    contained within it, which makes
    the classification problem more difficult.
    That’s partly why my initial research dealt
    with feature extraction based on color
    (Hue in particular). Perhaps it would even
    be possible to combine the two.

  23. Summary
    This project was an extension and continuation of my previous research into classifying
    images by season by extracting features of the images primarily based on colors, Hues,
    and lightness present in those images. This project further illustrated that classifying
    images based on season is not a simple and straightforward task.
    CNNs can be trained to classify objects contained in images with up to 90% accuracy
    (based on models such as EfficientNet and teacher-student training). However, seasonal
    images may contain a variety of different objects (trees, for example) in a variety of
    different states (blooming, covered in snow, and so on). Therefore, it seems like additional
    features, such as average Hue within the image, or colors that are present or dominant
    within the image, must also be taken into account in order to produce a better model.
    This makes me curious whether anyone else has attempted to solve this problem, and if
    so, how. This is a question for further consideration and research. How can we classify
    images not only based on objects within them, but also based on the overall mood or
    tone of the image? We could then apply this knowledge to other tasks, such as
    classifying whether an image is taken from a horror movie or a romantic comedy.

  24. THANKS!
    The Jupyter Notebooks used for this project
    including code and output can be found at:
    https://www.kaggle.com/aaronsnowberger/code