Aaron Snowberger
May 10, 2021

# Computer Vision Project 1: Facial Keypoints Detection

As a part of Udacity's Computer Vision Nanodegree, I defined a Convolutional Neural Network in PyTorch and trained a model to be able to identify 68 facial keypoints on a given image of a face.

This PPT was submitted as a part of my PhD coursework to detail my project process and reflection.


## Transcript

1. ### Computer Vision Project 1: Facial Keypoints Detection

   NAME: Aaron Snowberger (Dept. of Information & Communication Engineering, 30211060)
   DATE: 2021.05.10
   SUBJECT: Computer Vision (Dong-Geol Choi)
   CURRICULUM: Udacity Computer Vision | Udacity GitHub Project Repo | Certificate of Completion
2. ### CONTENTS

   01 Problem Overview
   02 Load & Visualize the Data
      a. First example images
      b. Data Preprocessing
   03 Network Architecture & Training
      a. Visualize keypoints before Training
      b. Define the CNN
      c. Train the CNN!
      d. Visualize keypoints after Training
      e. Feature visualization
   04 Fun with Keypoints
   05 Project Reflection
   06 Conclusion & Future Research

4. ### Problem Overview

   In this project, I defined a Convolutional Neural Network (CNN) architecture and trained a model to perform facial keypoint detection on numerous images. Each training and test image contained 68 keypoints with (x, y) coordinates for that face. The keypoints (pictured below) mark important areas of the face such as the jaw line, eyes, eyebrows, nose, and mouth. Total data set: 5,770 color images (from the YouTube Faces Dataset). Training: 3,462 images. Testing: 2,308 images.

6. ### First Example Images

   The following are a few of the sample images that were loaded in order to better visualize and understand the problem. Image information before preprocessing: index, image shape (y, x, colors), and keypoints shape (keypoints, coordinates); the first 4 keypoints of the third image are shown. Keypoints data was loaded from a CSV file.
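Loading the keypoints from the CSV file can be sketched as below. This is a hypothetical reconstruction of the loading step, not the notebook's exact code: the CSV is assumed to hold an image filename followed by 136 values (x, y pairs for 68 keypoints) per row.

```python
import csv
import numpy as np

def load_keypoints(csv_path):
    """Load facial keypoints from a CSV file.

    Assumes each row holds an image filename followed by flattened
    (x, y) keypoint values, matching the project's data format.
    """
    names, keypoints = [], []
    with open(csv_path) as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        for row in reader:
            names.append(row[0])
            # Reshape the flat value list into (num_keypoints, 2)
            pts = np.array(row[1:], dtype=np.float32).reshape(-1, 2)
            keypoints.append(pts)
    return names, keypoints
```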
7. ### Data Preprocessing

   Transforms were performed on the images to standardize them:

   1. Normalize(object)
      a. Converts a color image to grayscale with values between [0, 1]
      b. Normalizes keypoints to be in a range of [-1, 1]
   2. Rescale(object)
      a. Rescales the image to a desired size (250 px on the smallest side)
   3. RandomCrop(object)
      a. Crops the image to a square at a random location (224 x 224 px)
   4. ToTensor(object)
      a. Converts numpy images to torch images

   Example use: `rescale = Rescale(100)`, `crop = RandomCrop(50)`, `composed = transforms.Compose([Rescale(250), RandomCrop(224)])`
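The first of these transforms can be sketched as follows. This is a minimal illustration, assuming samples are dictionaries with `image` and `keypoints` keys as in the Udacity starter code; the keypoint centering constants (subtract 100, divide by 50) follow that starter code's convention.

```python
import numpy as np

class Normalize:
    """Convert a color image to grayscale in [0, 1] and scale keypoints
    to roughly [-1, 1]. A minimal sketch of the Normalize transform."""

    def __call__(self, sample):
        image, key_pts = sample['image'], sample['keypoints']
        # Grayscale via a simple channel average, then scale to [0, 1]
        gray = image.mean(axis=2) / 255.0
        # Center keypoints around 100 px and divide by 50
        key_pts = (key_pts.astype(np.float32) - 100.0) / 50.0
        return {'image': gray, 'keypoints': key_pts}
```

The remaining transforms (`Rescale`, `RandomCrop`, `ToTensor`) follow the same callable-class pattern, which is what lets them be chained with `transforms.Compose` as in the example use above.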

9. ### Visualize keypoints before Training

   Green: true keypoints given by the CSV file. Pink: predicted keypoints (before training).
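A comparison plot like the one on this slide can be produced with a small matplotlib helper; this is an illustrative sketch, not the notebook's exact plotting code, and the argument names are my own.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt

def show_keypoints(image, true_pts=None, pred_pts=None, ax=None):
    """Plot an image with true keypoints in green and predicted
    keypoints in magenta/pink, mirroring the slide's color scheme."""
    if ax is None:
        _, ax = plt.subplots()
    ax.imshow(image, cmap='gray')
    if true_pts is not None:
        ax.scatter(true_pts[:, 0], true_pts[:, 1], s=20, marker='.', c='g')
    if pred_pts is not None:
        ax.scatter(pred_pts[:, 0], pred_pts[:, 1], s=20, marker='.', c='m')
    ax.axis('off')
    return ax
```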

11. ### Train the Network!

    Right around epoch 16-17, the training loss started to level off near or below 0.03. Perhaps 20 epochs would have been enough training.
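The training setup that produced this curve (Adam + SmoothL1Loss, per the final CNN iteration) can be sketched as a standard PyTorch loop. The loader, model, and hyperparameter names here are illustrative, not copied from the notebook.

```python
import torch
import torch.nn as nn
import torch.optim as optim

def train(model, loader, epochs=20, lr=0.001, device='cpu'):
    """Minimal training-loop sketch: Adam optimizer + SmoothL1Loss.

    `loader` is assumed to yield (images, keypoints) batches.
    """
    criterion = nn.SmoothL1Loss()
    optimizer = optim.Adam(model.parameters(), lr=lr)
    model.train()
    for epoch in range(epochs):
        running = 0.0
        for images, keypoints in loader:
            images, keypoints = images.to(device), keypoints.to(device)
            optimizer.zero_grad()
            output = model(images)
            # Flatten (batch, 68, 2) targets to match the (batch, 136) output
            loss = criterion(output, keypoints.view(keypoints.size(0), -1))
            loss.backward()
            optimizer.step()
            running += loss.item()
        print(f"Epoch {epoch + 1}: avg loss {running / max(len(loader), 1):.4f}")
    return model
```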
12. ### Visualize keypoints after Training

    Green: true keypoints given by the CSV file. Pink: predicted keypoints (after training).
13. ### Feature visualization

    Extract a single filter (by index) from the first convolutional layer in order to visualize the weights that make up each convolutional kernel (size 5x5). Filter an image to see the effect of a convolutional kernel and get an idea of what features it detects. This filter emphasizes the right side of the face most clearly; horizontal lines in the eyes, mouth, and top of the head are drawn out in white.
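The two steps described here can be sketched in a few lines of PyTorch. This is an illustrative reconstruction: `conv_layer` is assumed to be an `nn.Conv2d` with a single input channel (grayscale), and `image` a 2-D float tensor.

```python
import torch
import torch.nn.functional as F

def visualize_filter_response(conv_layer, image, filter_index=0):
    """Extract one learned 5x5 kernel from the first conv layer
    and apply it to a grayscale image to see what it detects."""
    # Weights have shape (out_ch, in_ch, 5, 5); slice one kernel -> (1, 1, 5, 5)
    kernel = conv_layer.weight.data[filter_index:filter_index + 1, :1]
    # Add batch and channel dims, filter, then drop them again for plotting
    x = image.unsqueeze(0).unsqueeze(0)
    response = F.conv2d(x, kernel, padding=2)
    return kernel.squeeze(), response.squeeze()
```

The returned kernel can be shown with `imshow` to inspect the weights, and the response map shows which image regions the filter emphasizes.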

16. ### Add stickers

    1. Load the .png sticker file
    2. Detect the alpha channel (transparency)
    3. Display facial keypoints
    4. Overlay the sticker where pixels are non-transparent (alpha > 0)
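The overlay step (step 4) can be sketched with plain numpy indexing. This is a minimal illustration assuming an RGBA sticker array, an RGB face array, and a sticker that fits entirely inside the face image; the function name and arguments are my own.

```python
import numpy as np

def overlay_sticker(face, sticker, x, y):
    """Paste an RGBA sticker onto an RGB face image at top-left (x, y),
    copying pixels only where the sticker's alpha channel is > 0."""
    h, w = sticker.shape[:2]
    region = face[y:y + h, x:x + w]
    mask = sticker[:, :, 3] > 0               # non-transparent pixels
    region[mask] = sticker[:, :, :3][mask]    # copy RGB where visible
    return face
```

In the project, `(x, y)` would be chosen from the detected facial keypoints (e.g. placing sunglasses relative to the eye keypoints).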

18. ### Details on 4 iterations of the CNN

    **Attempt #1 (warm-up, trial run)**
    - Optimizer: optim.SGD() - I learned it first
    - Loss function: BCEWithLogitsLoss() - mistake, produced nan
    - Architecture: 2 convolutional layers + max pooling, 3 fully connected layers, 2 dropout layers
    - Epochs: 1, Batch size: 10
    - Training loss: nan

    **Attempt #2 (fix the loss function)**
    - Optimizer: optim.SGD() - I learned it first
    - Loss function: MSELoss() - not for classifications
    - Architecture: 2 convolutional layers + max pooling, 3 fully connected layers, 2 dropout layers
    - Epochs: 5, Batch size: 12
    - Training loss: 0.26

    **Attempt #3 (good, could it be better?)**
    - Optimizer: optim.Adam()
    - Loss function: SmoothL1Loss()
    - Architecture: 5 convolutional layers + max pooling, batch normalization + 2 dropout layers, 5 fully connected layers + normalization
    - Epochs: 15, Batch size: 32
    - Training loss: ~0.09 (before training was canceled)

    **Attempt #4 (final, successful)**
    - Optimizer: optim.Adam()
    - Loss function: SmoothL1Loss()
    - Architecture: 3 convolutional layers + max pooling, 4 fully connected layers, 3 dropout layers
    - Epochs: 30, Batch size: 64 (128 was too big and froze)
    - Training loss: < 0.03
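The final architecture (attempt #4: 3 convolutional layers with max pooling, 4 fully connected layers, 3 dropout layers) can be sketched as below. The channel widths, kernel sizes after the first layer, and dropout rates are my own illustrative choices, not taken from the actual notebook; only the layer counts and the 224x224-in / 136-out shape follow the slides.

```python
import torch
import torch.nn as nn

class KeypointNet(nn.Module):
    """Sketch of the final CNN: input is a 1 x 224 x 224 grayscale image,
    output is 136 values (68 keypoints as x, y pairs)."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 5), nn.ReLU(), nn.MaxPool2d(2),    # 224 -> 110
            nn.Conv2d(32, 64, 3), nn.ReLU(), nn.MaxPool2d(2),   # 110 -> 54
            nn.Conv2d(64, 128, 3), nn.ReLU(), nn.MaxPool2d(2),  # 54 -> 26
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 26 * 26, 512), nn.ReLU(), nn.Dropout(0.4),
            nn.Linear(512, 256), nn.ReLU(), nn.Dropout(0.4),
            nn.Linear(256, 128), nn.ReLU(), nn.Dropout(0.4),
            nn.Linear(128, 136),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```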
19. ### Different CNN Architectures

    In truth, CNN architectures are still quite new to me. I understand the basic concepts, but not how best to optimize them, nor why some layers are used more often and others less often. When comparing my chosen architecture, training, hyperparameters, and results to other examples, I found myself wondering why certain models work better than others, and what I could do to better optimize my own model. More research and experience are needed to better understand. (VGG-16 architecture pictured.)

21. ### Optimizing CNNs

    The goal of this project was to identify faces in a given image and use a trained CNN model to predict 68 keypoints for each face. Through working on the project, I learned the basics of building and training a CNN architecture. However, this is a deep subject, and this project only skimmed the surface. For future research, optimizing CNN architectures should be a primary focus, particularly determining optimal hyperparameters and layers. Additionally, I hope to explore transfer learning in more depth, where a pre-trained CNN model is applied to other problems by adding a final fully connected layer specific to the problem being investigated.
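The transfer-learning idea mentioned here (reuse a pre-trained backbone, swap in a task-specific head) can be sketched generically. This is an illustrative pattern, not project code: `model.fc` is assumed to be the final layer, as it is in torchvision's ResNets, where one would load e.g. a pretrained `resnet18` and pass it to this helper.

```python
import torch.nn as nn

def freeze_and_replace_head(model, num_outputs=136):
    """Freeze every layer of a (notionally pre-trained) model, then
    attach a fresh final fully connected layer for the new task."""
    for param in model.parameters():
        param.requires_grad = False          # freeze the backbone
    # Replace the head; only these new weights will receive gradients
    model.fc = nn.Linear(model.fc.in_features, num_outputs)
    return model
```

With `num_outputs=136`, the new head predicts the 68 (x, y) keypoint pairs while the frozen backbone supplies generic visual features.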
22. ### THANKS!

    The code, output, and Jupyter Notebooks used in this project can be found here: https://github.com/jekkilekki/computer-vision/tree/master/Facial%20Keypoints%20Detector