Computer Vision Project 1: Facial Keypoints Detection

As a part of Udacity's Computer Vision Nanodegree, I defined a Convolutional Neural Network in PyTorch and trained it to identify 68 facial keypoints in a given image of a face.

This PPT was submitted as a part of my PhD coursework to detail my project process and reflection.

Aaron Snowberger

May 10, 2021

Transcript

  1. Computer Vision Project 1:
    Facial Keypoints Detection
    NAME: Aaron Snowberger (Department of Information and Communication Engineering, 30211060)
    DATE: 2021.05.10
    SUBJECT: Computer Vision (Dong-Geol Choi)
    CURRICULUM: Udacity Computer Vision | Udacity GitHub
    Project Repo | Certificate of Completion

  2. CONTENTS
    01 Problem Overview
    02 Load & Visualize the Data
    a. First example images
    b. Data Preprocessing
    03 Network Architecture & Training
    a. Visualize keypoints before Training
    b. Define the CNN
    c. Train the CNN!
    d. Visualize keypoints after Training
    e. Feature visualization
    04 Fun with Keypoints
    05 Project Reflection
    06 Conclusion & Future Research

  3. Problem Overview
    To identify 68 keypoints on any given face image

  4. Problem Overview
    In this project, I defined a Convolutional Neural Network (CNN)
    architecture and trained a model to perform facial keypoint
    detection on numerous images. Each training and test image
    contained 68 keypoints with coordinates (x,y) for that face. The
    keypoints (pictured below) mark important areas of the face such as
    the jaw line, eyes, eyebrows, nose, and mouth.
    Total data set: 5770 color images
    (from the YouTube Faces Dataset)
    Training: 3462 images
    Testing: 2308 images

  5. Load & Visualize the Data
    Understanding the data & preprocessing it

  6. First Example Images
    The following are a few of the sample images that were loaded in
    order to better visualize and understand the problem.
    Image information before preprocessing:
    index, image shape (y, x, colors), keypoints shape (keypoints, coordinates).
    First 4 keypoints of the third image.
    Keypoints data was loaded from a CSV file.
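
A minimal sketch of how one row of keypoint data might be read from a CSV file and reshaped into 68 (x, y) pairs. The column layout (an image name followed by 136 numbers), the file name, and the values are assumptions made up for illustration; the project's actual CSV is not shown in these slides.

```python
import csv
import io

import numpy as np

# Synthetic stand-in for the CSV: a header row, then one data row holding an
# image name followed by 136 numbers (x0, y0, x1, y1, ...). All hypothetical.
values = [float(v) for v in range(136)]
csv_text = "image_name," + ",".join(f"v{i}" for i in range(136)) + "\n"
csv_text += "face_0.jpg," + ",".join(str(v) for v in values) + "\n"

reader = csv.reader(io.StringIO(csv_text))
header = next(reader)  # skip the header row
row = next(reader)     # first data row

image_name = row[0]
# 136 flat values -> 68 rows of (x, y)
key_pts = np.array([float(v) for v in row[1:]], dtype=np.float32).reshape(-1, 2)

print(image_name, key_pts.shape)
print(key_pts[:4])  # first 4 keypoints as (x, y) pairs
```

Reshaping with `-1` lets NumPy infer the number of keypoints from the row length, so the same line works for any even number of coordinate values.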

  7. Data Preprocessing
    Transforms were performed on the images to standardize them.
    1. Normalize(object)
    a. Converts a color image to grayscale with values between [0,1]
    b. Normalizes keypoints to be in a range of [-1,1]
    2. Rescale(object)
    a. Rescales image to desired size (250px at the smallest side)
    3. RandomCrop(object)
    a. Crops image in a square, randomly (224 x 224 px)
    4. ToTensor(object)
    a. Converts numpy images to torch images
    Example use:
    rescale = Rescale(100)
    crop = RandomCrop(50)
    composed = transforms.Compose([Rescale(250), RandomCrop(224)])
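
The four transforms can be sketched as plain NumPy functions to show what happens to both the image and its keypoints. This is an illustration, not the project's classes: the keypoint mean and scale (100 and 50), the channel-averaging grayscale conversion, the nearest-neighbor resize, and the order in which the steps are applied are all assumptions.

```python
import numpy as np

def rescale(image, key_pts, size):
    """Resize so the smaller side equals `size` (nearest-neighbor sampling)."""
    h, w = image.shape[:2]
    if h > w:
        new_h, new_w = int(size * h / w), size
    else:
        new_h, new_w = size, int(size * w / h)
    rows = np.arange(new_h) * h // new_h   # source row for each output row
    cols = np.arange(new_w) * w // new_w   # source column for each output column
    resized = image[rows][:, cols]
    return resized, key_pts * [new_w / w, new_h / h]

def random_crop(image, key_pts, size, rng):
    """Cut a random size x size square and shift keypoints accordingly."""
    h, w = image.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    cropped = image[top:top + size, left:left + size]
    return cropped, key_pts - [left, top]

def normalize(image, key_pts, kp_mean=100.0, kp_scale=50.0):
    """Grayscale in [0, 1]; keypoints centered and scaled (assumed constants)."""
    gray = image.mean(axis=2) / 255.0      # simple channel average
    return gray, (key_pts - kp_mean) / kp_scale

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(300, 400, 3)).astype(np.uint8)  # fake photo
key_pts = rng.uniform(50, 250, size=(68, 2))                       # fake keypoints

image, key_pts = rescale(image, key_pts, 250)
image, key_pts = random_crop(image, key_pts, 224, rng)
gray, key_pts = normalize(image, key_pts)
tensor_like = gray[np.newaxis, :, :]   # H x W -> C x H x W, as ToTensor would

print(tensor_like.shape, key_pts.shape)
```

The point of the sketch is that every geometric change to the image (resize, crop) must be mirrored on the keypoint coordinates, otherwise the labels no longer line up with the face.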

  8. Network Architecture
    Convolutional Neural Network in PyTorch

  9. Visualize keypoints before Training
    green: True keypoints given by CSV file
    pink: Predicted keypoints (before Training)

  10. Define CNN Architecture

  11. Train the Network!
    Around epochs 16-17, the training loss started to level off near or below 0.03.
    About 20 epochs would probably have been enough training.
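
A minimal sketch of a training loop using `optim.Adam()` and `SmoothL1Loss()`, the pairing named for the final attempt. The tiny linear model, the random data, the learning rate, the batch size, and the epoch count are placeholders chosen only so the loop runs quickly; they are not the project's settings.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Stand-in model and random data (hypothetical, for illustration only).
model = nn.Sequential(nn.Flatten(), nn.Linear(16 * 16, 136))
images = torch.rand(64, 1, 16, 16)        # 64 fake grayscale images
key_pts = torch.rand(64, 136) * 2 - 1     # 68 (x, y) targets in [-1, 1]

criterion = nn.SmoothL1Loss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

batch_size = 16
losses = []
for epoch in range(5):
    running = 0.0
    n_batches = 0
    for start in range(0, len(images), batch_size):
        x = images[start:start + batch_size]
        y = key_pts[start:start + batch_size]

        optimizer.zero_grad()             # clear gradients from the last step
        loss = criterion(model(x), y)     # regression loss on the coordinates
        loss.backward()                   # backpropagate
        optimizer.step()                  # update the weights

        running += loss.item()
        n_batches += 1
    losses.append(running / n_batches)
    print(f"epoch {epoch + 1}: avg loss {losses[-1]:.4f}")
```

Recording the average loss per epoch, as `losses` does here, is what makes it possible to spot where the curve flattens and training can stop.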

  12. Visualize keypoints after Training
    green: True keypoints given by CSV file
    pink: Predicted keypoints (after Training)

  13. Feature visualization
    Extract a single filter (by index) from the first convolutional layer in order to
    visualize the weights that make up each convolutional kernel (size 5x5).
    Filter an image to see the effect of a convolutional kernel and get an idea about
    what features it detects.
    This filter emphasizes the right side of the face most clearly.
    Horizontal lines in the eyes, mouth, and top of the head are drawn out in white.
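
To make the idea of filtering an image with one kernel concrete, here is a small NumPy sketch. The 5x5 kernel below is hand-made to respond to horizontal edges and the image is synthetic; in the project the weights would instead come from the trained network's first convolutional layer, and a library routine such as OpenCV's `cv2.filter2D` could replace the explicit loops.

```python
import numpy as np

def filter2d(image, kernel):
    """Cross-correlate a 2D image with a 2D kernel (valid region only)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the patch under the kernel
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Hypothetical kernel: positive top rows, negative bottom rows.
kernel = np.zeros((5, 5))
kernel[:2] = 1.0
kernel[3:] = -1.0

# Synthetic image: bright top half, dark bottom half (one horizontal edge).
image = np.zeros((12, 12))
image[:6] = 1.0

response = filter2d(image, kernel)
print(response.shape)
print(response[:, 0])  # response down one column; it peaks around the edge
```

Large output values mark places where the image patch matches the kernel's pattern, which is why a filtered face image shows some structures "drawn out in white".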

  14. Fun with Keypoints
    Detect faces, find keypoints, add stickers

  15. Detect all faces & find keypoints
    1. Detect faces with Haar Cascades.
    2. Load in our Trained model.
    3. Add padding to the faces.
    4. Detect and display keypoints.
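
The padding step can be sketched as follows, assuming the face detector returns each face as an `(x, y, w, h)` rectangle, which is the form OpenCV's Haar cascade detection produces. The helper name `pad_box`, the padding amount, and the example box are hypothetical.

```python
import numpy as np

def pad_box(x, y, w, h, pad, img_w, img_h):
    """Grow a face box by `pad` pixels per side, clamped to the image bounds."""
    x0 = max(x - pad, 0)
    y0 = max(y - pad, 0)
    x1 = min(x + w + pad, img_w)
    y1 = min(y + h + pad, img_h)
    return x0, y0, x1, y1

# Fake image (height 130, width 200) and one fake detection near the corner.
image = np.zeros((130, 200, 3), dtype=np.uint8)
face = (10, 20, 100, 100)  # x, y, w, h

x0, y0, x1, y1 = pad_box(*face, pad=30, img_w=image.shape[1], img_h=image.shape[0])
face_crop = image[y0:y1, x0:x1]  # rows are y, columns are x

print((x0, y0, x1, y1), face_crop.shape)
```

Padding matters because a detector's box is often tight around the face, while the keypoints include the jaw line and eyebrows near the box edges; clamping avoids negative or out-of-range slice indices for faces close to the image border.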

  16. Add stickers
    1. Load .png sticker file
    2. Detect alpha channel (transparency)
    3. Display facial keypoints
    4. Overlay sticker where pixels are non-transparent (alpha > 0)
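
The overlay step can be sketched in NumPy as below, assuming the sticker is an RGBA array (four channels, the last being alpha) that fits entirely inside the image at the chosen position. The function name, the tiny arrays, and the placement are made up for illustration; in practice the position would be derived from the detected keypoints.

```python
import numpy as np

def overlay_sticker(image, sticker_rgba, top, left):
    """Copy sticker color pixels onto `image` wherever the sticker's alpha > 0."""
    h, w = sticker_rgba.shape[:2]
    roi = image[top:top + h, left:left + w]   # view into the image, not a copy
    mask = sticker_rgba[:, :, 3] > 0          # non-transparent pixels
    roi[mask] = sticker_rgba[:, :, :3][mask]  # writes through to `image`
    return image

# Fake 10x10 black image and a 4x4 sticker whose center 2x2 is opaque.
image = np.zeros((10, 10, 3), dtype=np.uint8)
sticker = np.zeros((4, 4, 4), dtype=np.uint8)
sticker[:, :, :3] = 200        # sticker color
sticker[1:3, 1:3, 3] = 255     # only the center is non-transparent

result = overlay_sticker(image, sticker, top=3, left=3)
print(result[:, :, 0])
```

Using a boolean mask means transparent sticker pixels leave the underlying photo untouched, so only the visible part of the sticker replaces image pixels.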

  17. Project Reflection
    What I learned through 4 iterations of the CNN

  18. Details on 4 iterations of the CNN
    Attempt #1: (Warm up, trial run)
    Optimizer: optim.SGD() - I learned it first
    Loss function: BCEWithLogitsLoss() - mistake, nan
    Architecture: 2 convolutional layers + max pooling, 3 fully connected layers, 2 dropout layers
    Epochs: 1
    Batch_size: 10
    Training loss: nan

    Attempt #2: (Fix the loss function)
    Optimizer: optim.SGD() - I learned it first
    Loss function: MSELoss() - not for classifications
    Architecture: 2 convolutional layers + max pooling, 3 fully connected layers, 2 dropout layers
    Epochs: 5
    Batch_size: 12
    Training loss: 0.26

    Attempt #3: (Good, could it be better?)
    Optimizer: optim.Adam()
    Loss function: SmoothL1Loss()
    Architecture: 5 convolutional layers + max pooling, batch normalization + 2 dropout layers, 5 fully connected layers + normalization
    Epochs: 15
    Batch_size: 32
    Training loss: ~ 0.09 (before training canceled)

    Attempt #4: (Final, successful)
    Optimizer: optim.Adam()
    Loss function: SmoothL1Loss()
    Architecture: 3 convolutional layers + max pooling, 4 fully connected layers, 3 dropout layers
    Epochs: 30
    Batch_size: 64 & 128 (too big, froze)
    Training loss: < 0.03
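
As an illustration of the final attempt's layer counts (3 convolutional layers with max pooling, 4 fully connected layers, 3 dropout layers), here is one possible PyTorch module. Only the layer counts, the 224 x 224 grayscale input, the 5x5 first kernel, and the 136 outputs (68 keypoints times 2 coordinates) come from the slides; the channel widths, later kernel sizes, hidden sizes, and dropout rates are assumptions, and the class name is hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KeypointNet(nn.Module):
    """Sketch: 3 conv + pool stages, 4 fully connected layers, 3 dropouts."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, 5)    # 224 -> 220, pooled to 110
        self.conv2 = nn.Conv2d(16, 32, 3)   # 110 -> 108, pooled to 54
        self.conv3 = nn.Conv2d(32, 64, 3)   # 54 -> 52, pooled to 26
        self.pool = nn.MaxPool2d(2, 2)

        self.fc1 = nn.Linear(64 * 26 * 26, 256)
        self.fc2 = nn.Linear(256, 256)
        self.fc3 = nn.Linear(256, 256)
        self.fc4 = nn.Linear(256, 136)      # 68 keypoints * (x, y)

        self.drop1 = nn.Dropout(0.2)        # assumed rates
        self.drop2 = nn.Dropout(0.3)
        self.drop3 = nn.Dropout(0.4)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        x = torch.flatten(x, 1)             # keep the batch dimension
        x = self.drop1(F.relu(self.fc1(x)))
        x = self.drop2(F.relu(self.fc2(x)))
        x = self.drop3(F.relu(self.fc3(x)))
        return self.fc4(x)                  # raw coordinates, no activation

net = KeypointNet()
net.eval()                                  # disable dropout for the check
with torch.no_grad():
    out = net(torch.zeros(2, 1, 224, 224))  # batch of 2 blank images
print(out.shape)
print(out.view(2, 68, 2).shape)             # reshape to (x, y) pairs
```

The final layer has no activation because keypoint detection is a regression problem: the network outputs coordinate values directly, which is also why a regression loss is paired with it.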

  19. Different CNN Architectures
    In truth, CNN architectures are still quite new to me.
    I understand the basic concepts, but not how best to optimize them, nor why some layers
    are used more often and others less often. When comparing my chosen
    architecture, training, hyperparameters, and results to other examples, I found myself
    wondering why certain models work better than others, and what I could do to better
    optimize my own model. More research and experience are needed to understand this better.
    Figure: VGG-16 Architecture

  20. Conclusion & Future Research
    Deep(er) Learning

  21. Optimizing CNNs
    The goal of this project was to identify faces in
    a given image and use a trained CNN model to
    predict 68 keypoints for each face. Through
    working on the project, I was able to learn the
    basics of building a CNN architecture and
    training it. However, this is a deep subject and
    this project only skimmed the surface.
    For future research, optimizing CNN
    architectures should be a primary focus,
    particularly determining optimal
    hyperparameters and layers.
    Additionally, I hope to explore transfer learning
    in more depth, where a pre-trained CNN model
    is used on other problems by adding a final fully
    connected layer that is specific for the problem
    being investigated.
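
The transfer learning idea described above can be sketched in PyTorch as freezing an existing feature extractor and training only a new final fully connected layer. The small `backbone` below is a randomly initialized stand-in so the example runs without downloading anything; in practice it would be a genuinely pre-trained network, for example one from torchvision. All names and sizes are assumptions.

```python
import torch
import torch.nn as nn

# Stand-in for a pre-trained feature extractor (hypothetical, random weights).
backbone = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),   # collapse each feature map to one value
    nn.Flatten(),              # -> 8 features per image
)

# Freeze the backbone so its weights are not updated during training.
for param in backbone.parameters():
    param.requires_grad = False

# New problem-specific final layer: 68 keypoints * (x, y) = 136 outputs.
head = nn.Linear(8, 136)
model = nn.Sequential(backbone, head)

# Only the new layer's parameters are handed to the optimizer.
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
out = model(torch.zeros(2, 3, 64, 64))
print(trainable, out.shape)
```

Freezing the earlier layers keeps the general-purpose features intact and limits training to the small number of weights in the new layer, which is usually much faster than training a full network from scratch.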

  22. THANKS!
    The code, output, and Jupyter Notebooks
    used in this project can be found here:
    https://github.com/jekkilekki/computer-vision/tree/master/Facial%20Keypoints%20Detector