Aaron Snowberger
May 10, 2021

# Computer Vision Project 1: Facial Keypoints Detection

As a part of Udacity's Computer Vision Nanodegree, I defined a Convolutional Neural Network in PyTorch and trained a model to be able to identify 68 facial keypoints on a given image of a face.

This PPT was submitted as a part of my PhD coursework to detail my project process and reflection.


## Transcript

1. ### Computer Vision Project 1: Facial Keypoints Detection

   NAME: Aaron Snowberger (Dept. of Information & Communication Engineering, 30211060)
   DATE: 2021.05.10
   SUBJECT: Computer Vision (Dong-Geol Choi)
   CURRICULUM: Udacity Computer Vision | Udacity GitHub Project Repo | Certificate of Completion
2. ### CONTENTS

   01 Problem Overview
   02 Load & Visualize the Data
      a. First example images
      b. Data Preprocessing
   03 Network Architecture & Training
      a. Visualize keypoints before Training
      b. Define the CNN
      c. Train the CNN!
      d. Visualize keypoints after Training
      e. Feature visualization
   04 Fun with Keypoints
   05 Project Reflection
   06 Conclusion & Future Research

4. ### Problem Overview

   In this project, I defined a Convolutional Neural Network (CNN) architecture and trained a model to perform facial keypoint detection on numerous images. Each training and test image contained 68 keypoints with (x, y) coordinates for that face. The keypoints (pictured below) mark important areas of the face such as the jaw line, eyes, eyebrows, nose, and mouth. Total data set: 5,770 color images (from the YouTube Faces Dataset). Training: 3,462 images. Testing: 2,308 images.

6. ### First Example Images

   The following are a few of the sample images that were loaded in order to better visualize and understand the problem. Image information before preprocessing: index, image shape (y, x, colors), and keypoints shape (keypoints, coordinates); the first 4 keypoints of the third image are shown. Keypoints data was loaded from a CSV file.
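Loading the keypoints from the CSV file can be sketched as below. This is a hypothetical reconstruction of the loading step, not the notebook's exact code: the CSV is assumed to hold an image filename followed by 136 values (x, y pairs for 68 keypoints) per row.

```python
import csv
import numpy as np

def load_keypoints(csv_path):
    """Load facial keypoints from a CSV file.

    Assumes each row holds an image filename followed by flattened
    (x, y) keypoint values, matching the project's data format.
    """
    names, keypoints = [], []
    with open(csv_path) as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        for row in reader:
            names.append(row[0])
            # Reshape the flat value list into (num_keypoints, 2)
            pts = np.array(row[1:], dtype=np.float32).reshape(-1, 2)
            keypoints.append(pts)
    return names, keypoints
```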
7. ### Data Preprocessing

   Transforms were performed on the images to standardize them:

   1. Normalize(object)
      a. Converts a color image to grayscale with values between [0, 1]
      b. Normalizes keypoints to be in a range of [-1, 1]
   2. Rescale(object)
      a. Rescales the image to a desired size (250 px on the smallest side)
   3. RandomCrop(object)
      a. Crops the image to a square at a random location (224 x 224 px)
   4. ToTensor(object)
      a. Converts numpy images to torch images

   Example use: `rescale = Rescale(100)`, `crop = RandomCrop(50)`, `composed = transforms.Compose([Rescale(250), RandomCrop(224)])`
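The first of these transforms can be sketched as follows. This is a minimal illustration, assuming samples are dictionaries with `image` and `keypoints` keys as in the Udacity starter code; the keypoint centering constants (subtract 100, divide by 50) follow that starter code's convention.

```python
import numpy as np

class Normalize:
    """Convert a color image to grayscale in [0, 1] and scale keypoints
    to roughly [-1, 1]. A minimal sketch of the Normalize transform."""

    def __call__(self, sample):
        image, key_pts = sample['image'], sample['keypoints']
        # Grayscale via a simple channel average, then scale to [0, 1]
        gray = image.mean(axis=2) / 255.0
        # Center keypoints around 100 px and divide by 50
        key_pts = (key_pts.astype(np.float32) - 100.0) / 50.0
        return {'image': gray, 'keypoints': key_pts}
```

The remaining transforms (`Rescale`, `RandomCrop`, `ToTensor`) follow the same callable-class pattern, which is what lets them be chained with `transforms.Compose` as in the example use above.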

9. ### Visualize keypoints before Training

   Green: true keypoints given by the CSV file. Pink: predicted keypoints (before training).
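A comparison plot like the one on this slide can be produced with a small matplotlib helper; this is an illustrative sketch, not the notebook's exact plotting code, and the argument names are my own.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt

def show_keypoints(image, true_pts=None, pred_pts=None, ax=None):
    """Plot an image with true keypoints in green and predicted
    keypoints in magenta/pink, mirroring the slide's color scheme."""
    if ax is None:
        _, ax = plt.subplots()
    ax.imshow(image, cmap='gray')
    if true_pts is not None:
        ax.scatter(true_pts[:, 0], true_pts[:, 1], s=20, marker='.', c='g')
    if pred_pts is not None:
        ax.scatter(pred_pts[:, 0], pred_pts[:, 1], s=20, marker='.', c='m')
    ax.axis('off')
    return ax
```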

11. ### Train the Network!

    Right around epoch 16-17, the training loss started to level off near or below 0.03. Perhaps 20 epochs would have been enough training.
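The training setup that produced this curve (Adam + SmoothL1Loss, per the final CNN iteration) can be sketched as a standard PyTorch loop. The loader, model, and hyperparameter names here are illustrative, not copied from the notebook.

```python
import torch
import torch.nn as nn
import torch.optim as optim

def train(model, loader, epochs=20, lr=0.001, device='cpu'):
    """Minimal training-loop sketch: Adam optimizer + SmoothL1Loss.

    `loader` is assumed to yield (images, keypoints) batches.
    """
    criterion = nn.SmoothL1Loss()
    optimizer = optim.Adam(model.parameters(), lr=lr)
    model.train()
    for epoch in range(epochs):
        running = 0.0
        for images, keypoints in loader:
            images, keypoints = images.to(device), keypoints.to(device)
            optimizer.zero_grad()
            output = model(images)
            # Flatten (batch, 68, 2) targets to match the (batch, 136) output
            loss = criterion(output, keypoints.view(keypoints.size(0), -1))
            loss.backward()
            optimizer.step()
            running += loss.item()
        print(f"Epoch {epoch + 1}: avg loss {running / max(len(loader), 1):.4f}")
    return model
```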
12. ### Visualize keypoints after Training

    Green: true keypoints given by the CSV file. Pink: predicted keypoints (after training).
13. ### Feature visualization

    Extract a single filter (by index) from the first convolutional layer in order to visualize the weights that make up each convolutional kernel (size 5x5). Filter an image to see the effect of a convolutional kernel and get an idea of what features it detects. This filter emphasizes the right side of the face most clearly; horizontal lines in the eyes, mouth, and top of the head are drawn out in white.
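The two steps described here can be sketched in a few lines of PyTorch. This is an illustrative reconstruction: `conv_layer` is assumed to be an `nn.Conv2d` with a single input channel (grayscale), and `image` a 2-D float tensor.

```python
import torch
import torch.nn.functional as F

def visualize_filter_response(conv_layer, image, filter_index=0):
    """Extract one learned 5x5 kernel from the first conv layer
    and apply it to a grayscale image to see what it detects."""
    # Weights have shape (out_ch, in_ch, 5, 5); slice one kernel -> (1, 1, 5, 5)
    kernel = conv_layer.weight.data[filter_index:filter_index + 1, :1]
    # Add batch and channel dims, filter, then drop them again for plotting
    x = image.unsqueeze(0).unsqueeze(0)
    response = F.conv2d(x, kernel, padding=2)
    return kernel.squeeze(), response.squeeze()
```

The returned kernel can be shown with `imshow` to inspect the weights, and the response map shows which image regions the filter emphasizes.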

16. ### Add stickers

    1. Load the .png sticker file
    2. Detect the alpha channel (transparency)
    3. Display facial keypoints
    4. Overlay the sticker where pixels are non-transparent (alpha > 0)
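The overlay step (step 4) can be sketched with plain numpy indexing. This is a minimal illustration assuming an RGBA sticker array, an RGB face array, and a sticker that fits entirely inside the face image; the function name and arguments are my own.

```python
import numpy as np

def overlay_sticker(face, sticker, x, y):
    """Paste an RGBA sticker onto an RGB face image at top-left (x, y),
    copying pixels only where the sticker's alpha channel is > 0."""
    h, w = sticker.shape[:2]
    region = face[y:y + h, x:x + w]
    mask = sticker[:, :, 3] > 0               # non-transparent pixels
    region[mask] = sticker[:, :, :3][mask]    # copy RGB where visible
    return face
```

In the project, `(x, y)` would be chosen from the detected facial keypoints (e.g. placing sunglasses relative to the eye keypoints).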

18. ### Details on 4 iterations of the CNN

    **Attempt #1 (warm-up, trial run)**
    - Optimizer: optim.SGD() - I learned it first
    - Loss function: BCEWithLogitsLoss() - mistake, produced nan
    - Architecture: 2 convolutional layers + max pooling, 3 fully connected layers, 2 dropout layers
    - Epochs: 1, Batch size: 10
    - Training loss: nan

    **Attempt #2 (fix the loss function)**
    - Optimizer: optim.SGD() - I learned it first
    - Loss function: MSELoss() - not for classifications
    - Architecture: 2 convolutional layers + max pooling, 3 fully connected layers, 2 dropout layers
    - Epochs: 5, Batch size: 12
    - Training loss: 0.26

    **Attempt #3 (good, could it be better?)**
    - Optimizer: optim.Adam()
    - Loss function: SmoothL1Loss()
    - Architecture: 5 convolutional layers + max pooling, batch normalization + 2 dropout layers, 5 fully connected layers + normalization
    - Epochs: 15, Batch size: 32
    - Training loss: ~0.09 (before training was canceled)

    **Attempt #4 (final, successful)**
    - Optimizer: optim.Adam()
    - Loss function: SmoothL1Loss()
    - Architecture: 3 convolutional layers + max pooling, 4 fully connected layers, 3 dropout layers
    - Epochs: 30, Batch size: 64 (128 was too big and froze)
    - Training loss: < 0.03
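The final architecture (attempt #4: 3 convolutional layers with max pooling, 4 fully connected layers, 3 dropout layers) can be sketched as below. The channel widths, kernel sizes after the first layer, and dropout rates are my own illustrative choices, not taken from the actual notebook; only the layer counts and the 224x224-in / 136-out shape follow the slides.

```python
import torch
import torch.nn as nn

class KeypointNet(nn.Module):
    """Sketch of the final CNN: input is a 1 x 224 x 224 grayscale image,
    output is 136 values (68 keypoints as x, y pairs)."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 5), nn.ReLU(), nn.MaxPool2d(2),    # 224 -> 110
            nn.Conv2d(32, 64, 3), nn.ReLU(), nn.MaxPool2d(2),   # 110 -> 54
            nn.Conv2d(64, 128, 3), nn.ReLU(), nn.MaxPool2d(2),  # 54 -> 26
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 26 * 26, 512), nn.ReLU(), nn.Dropout(0.4),
            nn.Linear(512, 256), nn.ReLU(), nn.Dropout(0.4),
            nn.Linear(256, 128), nn.ReLU(), nn.Dropout(0.4),
            nn.Linear(128, 136),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```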
19. ### Different CNN Architectures

    In truth, CNN architectures are still quite new to me. I understand the basic concepts, but not how best to optimize them, nor why some layers are used more often and others less often. When comparing my chosen architecture, training, hyperparameters, and results to other examples, I found myself wondering why certain models work better than others, and what I could do to better optimize my own model. More research and experience are needed to better understand. (VGG-16 architecture pictured.)

21. ### Optimizing CNNs

    The goal of this project was to identify faces in a given image and use a trained CNN model to predict 68 keypoints for each face. Through working on the project, I learned the basics of building and training a CNN architecture. However, this is a deep subject, and this project only skimmed the surface. For future research, optimizing CNN architectures should be a primary focus, particularly determining optimal hyperparameters and layers. Additionally, I hope to explore transfer learning in more depth, where a pre-trained CNN model is applied to other problems by adding a final fully connected layer specific to the problem being investigated.
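The transfer-learning idea mentioned here (reuse a pre-trained backbone, swap in a task-specific head) can be sketched generically. This is an illustrative pattern, not project code: `model.fc` is assumed to be the final layer, as it is in torchvision's ResNets, where one would load e.g. a pretrained `resnet18` and pass it to this helper.

```python
import torch.nn as nn

def freeze_and_replace_head(model, num_outputs=136):
    """Freeze every layer of a (notionally pre-trained) model, then
    attach a fresh final fully connected layer for the new task."""
    for param in model.parameters():
        param.requires_grad = False          # freeze the backbone
    # Replace the head; only these new weights will receive gradients
    model.fc = nn.Linear(model.fc.in_features, num_outputs)
    return model
```

With `num_outputs=136`, the new head predicts the 68 (x, y) keypoint pairs while the frozen backbone supplies generic visual features.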
22. ### THANKS!

    The code, output, and Jupyter Notebooks used in this project can be found here: https://github.com/jekkilekki/computer-vision/tree/master/Facial%20Keypoints%20Detector