Aaron Snowberger
May 10, 2021

Computer Vision Project 1: Facial Keypoints Detection

As a part of Udacity's Computer Vision Nanodegree, I defined a Convolutional Neural Network in PyTorch and trained a model to be able to identify 68 facial keypoints on a given image of a face.

This PPT was submitted as a part of my PhD coursework to detail my project process and reflection.

Transcript

1. Computer Vision Project 1:
Facial Keypoints Detection
NAME: Aaron Snowberger (정보통신공학과, 30211060)
DATE: 2021.05.10
SUBJECT: Computer Vision (Dong-Geol Choi)
CURRICULUM: Udacity Computer Vision | Udacity GitHub
Project Repo | Certificate of Completion

2. CONTENTS
01 Problem Overview
02 Load & Visualize the Data
a. First example images
b. Data Preprocessing
03 Network Architecture & Training
a. Visualize keypoints before Training
b. Define the CNN
c. Train the CNN!
d. Visualize keypoints after Training
e. Feature visualization
04 Fun with Keypoints
05 Project Reflection
06 Conclusion & Future Research

3. Problem Overview
To identify 68 keypoints on any given face image

4. Problem Overview
In this project, I defined a Convolutional Neural Network (CNN)
architecture and trained a model to perform facial keypoint
detection on numerous images. Each training and test image
contained 68 keypoints with coordinates (x,y) for that face. The
keypoints (pictured below) mark important areas of the face such as
the jaw line, eyes, eyebrows, nose, and mouth.
Total data set: 5770 color images
Training: 3462 images
Testing: 2308 images

5. Load & Visualize the Data
Understanding the data & preprocessing it

6. First Example Images
The following are a few of the sample images that were loaded in
order to better visualize and understand the problem.
Image information
before preprocessing.
Index (y, x, colors) (keypoints, coordinates)
First 4 keypoints of
the third image.
Keypoints data was

7. Data Preprocessing
Transforms were performed on the images to standardize them.
1. Normalize(object)
a. Converts a color image to grayscale with values between [0,1]
b. Normalizes keypoints to be in a range of [-1,1]
2. Rescale(object)
a. Rescales image to desired size (250px at the smallest side)
3. RandomCrop(object)
a. Crops image in a square, randomly (224 x 224 px)
4. ToTensor(object)
a. Converts numpy images to torch images
Example use:
rescale = Rescale(100)
crop = RandomCrop(50)
composed = transforms.Compose([Rescale(250), RandomCrop(224)])
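The transforms listed above can be sketched in plain NumPy to show how the keypoints must move together with the image. This is only an illustration, not the project's transform classes: the function names are hypothetical, the keypoint mean of 100 and scale of 50 are assumed values chosen to land roughly in [-1, 1], and the actual pixel resizing step is left out (only the target shape is computed).

```python
import numpy as np

def rescale_shape(h, w, smallest_side):
    """New (h, w) so the smaller side equals smallest_side, keeping aspect ratio."""
    if h > w:
        return int(smallest_side * h / w), smallest_side
    return smallest_side, int(smallest_side * w / h)

def random_crop(image, key_pts, size, rng):
    """Crop a random size x size square and shift the (x, y) keypoints to match."""
    h, w = image.shape[:2]
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    cropped = image[top:top + size, left:left + size]
    return cropped, key_pts - np.array([left, top])

def normalize(image, key_pts, mean=100.0, scale=50.0):
    """Grayscale image in [0, 1]; keypoints centered and scaled (assumed constants)."""
    gray = image.astype(np.float64).mean(axis=2) / 255.0  # simple channel average
    return gray, (key_pts - mean) / scale

# Synthetic stand-ins for one image and its 68 keypoints (not project data).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(250, 300, 3), dtype=np.uint8)
key_pts = rng.uniform(60, 190, size=(68, 2))

cropped, pts = random_crop(image, key_pts, 224, rng)
gray, pts = normalize(cropped, pts)
print(gray.shape, pts.shape)
```

The point of the sketch is that every geometric change to the image (crop offset, scale factor) has to be applied to the keypoint coordinates as well, otherwise the labels no longer line up with the face.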

8. Network Architecture
Convolutional Neural Network in PyTorch

9. Visualize keypoints before Training
green: True keypoints given by CSV file
pink: Predicted keypoints (before Training)

10. Define CNN Architecture
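When defining a CNN like this, the input size of the first fully connected layer depends on how much the convolution and pooling stages shrink the image. A minimal sketch of that arithmetic follows, assuming 224 x 224 inputs (the crop size used in preprocessing), three convolution + max-pooling stages as in the final attempt described later, 5x5 kernels with no padding in every stage, 2x2 pooling, and a hypothetical 128 output channels. Only the 5x5 size of the first layer's kernels and the 68 keypoints come from the slides; the rest are assumptions for illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution layer."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    """Spatial size after a max-pooling layer."""
    return (size - kernel) // stride + 1

def flattened_features(input_size, conv_kernels, out_channels):
    """Feature count fed to the first fully connected layer, and the final side length."""
    size = input_size
    for k in conv_kernels:
        size = pool_out(conv_out(size, k))
    return out_channels * size * size, size

# Assumed configuration: three conv + pool stages with 5x5 kernels, 128 final channels.
n_features, side = flattened_features(224, [5, 5, 5], out_channels=128)
n_outputs = 68 * 2  # one (x, y) pair per keypoint
print(side, n_features, n_outputs)
```

Under these assumptions the spatial size goes 224 → 220 → 110 → 106 → 53 → 49 → 24, so the last fully connected layer must end in 136 outputs regardless of how the earlier layers are sized.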

11. Train the Network!
Around epochs 16-17, the training loss started to level off near or below 0.03.
About 20 epochs would probably have been enough training.
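One way to turn the observation about the loss leveling off into a rule is a simple plateau check on the per-epoch loss. The sketch below is a generic illustration, not code from the project: the function name, the `min_delta` and `patience` defaults, and the sample loss values are all made up.

```python
def first_plateau_epoch(losses, min_delta=1e-3, patience=3):
    """Return the 1-based epoch at which the loss has failed to improve on the
    best value so far by more than min_delta for `patience` epochs in a row,
    or None if that never happens."""
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(losses, start=1):
        if best - loss > min_delta:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return None

# Made-up loss values for illustration only (not the project's training log).
example_losses = [0.5, 0.3, 0.2, 0.2, 0.2, 0.2]
print(first_plateau_epoch(example_losses))
```

A check like this could be used to stop training early instead of always running a fixed number of epochs.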

12. Visualize keypoints after Training
green: True keypoints given by CSV file
pink: Predicted keypoints (after Training)

13. Feature visualization
Extract a single filter (by index) from the first
convolutional layer in order to visualize the weights
that make up each convolutional kernel (size 5x5)
Filter an image to see the effect of a convolutional
kernel and get an idea about what features it detects.
This filter emphasizes the right side of the face most clearly.
Horizontal lines in the eyes, mouth, and top of the head are drawn out in white.
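To make the filtering step concrete, here is a small NumPy sketch that slides a 5x5 kernel over a grayscale image. It assumes a hand-written kernel that responds to horizontal edges rather than a filter learned by the network, and it uses a tiny synthetic image; both are stand-ins for illustration.

```python
import numpy as np

def filter2d(image, kernel):
    """Cross-correlate a 2D image with a kernel (valid region only, no padding)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Hypothetical 5x5 kernel: positive weights on top, negative on the bottom,
# so it responds where bright pixels sit above dark pixels (a horizontal edge).
kernel = np.array([[ 1,  1,  1,  1,  1],
                   [ 1,  1,  1,  1,  1],
                   [ 0,  0,  0,  0,  0],
                   [-1, -1, -1, -1, -1],
                   [-1, -1, -1, -1, -1]], dtype=float)

# Synthetic test image: top half bright, bottom half dark.
image = np.zeros((12, 12))
image[:6, :] = 1.0

response = filter2d(image, kernel)
print(response.shape)
```

The response is zero in the flat regions and largest in the rows that straddle the edge, which is the same effect the slide describes for horizontal lines in the eyes and mouth.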

14. Fun with Keypoints
Detect faces, find keypoints, add stickers

15. Detect all faces & find keypoints
1. Detect faces with Haar Cascades.
2. Load in our Trained model.
4. Detect and display keypoints.

16
2. Detect alpha channel
(transparency)
3. Display facial keypoints
4. Overlay sticker where
pixels are non-transparent
(alpha > 0)
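The overlay step can be sketched with a boolean mask on the sticker's alpha channel. This is a minimal illustration that assumes an RGBA sticker that fits entirely inside the image and a placement position that has already been chosen (for example from the predicted keypoints); the function name and the sample arrays are hypothetical.

```python
import numpy as np

def overlay_sticker(image, sticker_rgba, top, left):
    """Copy sticker pixels onto a copy of `image` wherever sticker alpha > 0."""
    out = image.copy()
    h, w = sticker_rgba.shape[:2]
    roi = out[top:top + h, left:left + w]   # view into `out`, edits write through
    mask = sticker_rgba[:, :, 3] > 0        # non-transparent pixels
    roi[mask] = sticker_rgba[:, :, :3][mask]
    return out

# Tiny synthetic example: a black image and a 2x2 sticker with two opaque pixels.
image = np.zeros((6, 6, 3), dtype=np.uint8)
sticker = np.zeros((2, 2, 4), dtype=np.uint8)
sticker[0, 0] = [200, 200, 200, 255]  # opaque
sticker[1, 1] = [200, 200, 200, 255]  # opaque
result = overlay_sticker(image, sticker, top=1, left=1)
print(result[1, 1], result[1, 2])
```

Transparent sticker pixels leave the underlying image untouched, so the face stays visible around the sticker's outline.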

17. Project Reflection
What I learned through 4 iterations of the CNN

18. Details on 4 iterations of the CNN
Attempt #1: (Warm up, trial run)
Optimizer: optim.SGD() - I learned it first
Loss function: BCEWithLogitsLoss() - mistake, nan
Architecture: 2 convolutional layers + max pooling
3 fully connected layers
2 dropout layers
Epochs: 1
Batch_size: 10
Training loss: nan
Attempt #2: (Fix the loss function)
Optimizer: optim.SGD() - I learned it first
Loss function: MSELoss() - not for classifications
Architecture: 2 convolutional layers + max pooling
3 fully connected layers
2 dropout layers
Epochs: 5
Batch_size: 12
Training loss: 0.26
Attempt #3: (Good, could it be better?)
Loss function: SmoothL1Loss()
Architecture: 5 convolutional layers + max pooling
Batch normalization + 2 dropout layers
5 fully connected layers + normalization
Epochs: 15
Batch_size: 32
Training loss: ~ 0.09 (before training canceled)
Attempt #4: (Final, successful)
Loss: SmoothL1Loss()
Architecture: 3 convolutional layers + max pooling
4 fully connected layers
3 dropout layers
Epochs: 30
Batch_size: 64 & 128 (too big, froze)
Training loss: < 0.03
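Attempts #3 and #4 use SmoothL1Loss() where attempt #2 used MSELoss(). As a rough illustration of how these two regression losses differ, the plain-Python sketch below mirrors PyTorch's documented definitions with mean reduction and the default beta of 1.0. The helper names and sample values are made up, and this is not the project's training code.

```python
def mse_loss(pred, target):
    """Mean squared error over paired values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def smooth_l1_loss(pred, target, beta=1.0):
    """Quadratic for small errors (|diff| < beta), linear for large ones."""
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        if diff < beta:
            total += 0.5 * diff ** 2 / beta
        else:
            total += diff - 0.5 * beta
    return total / len(pred)

# Made-up predictions and targets; the last target is a deliberate outlier.
pred = [0.0, 0.0, 0.0, 0.0]
target = [0.1, -0.1, 0.2, 3.0]

mse = mse_loss(pred, target)
sl1 = smooth_l1_loss(pred, target)
print(mse, sl1)
```

Because large errors grow only linearly under the smooth L1 loss, a single badly predicted keypoint contributes less to the total than it would under MSE.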

19. Different CNN Architectures
In truth, CNN architectures are still quite new to me.
I understand the basic concepts, but not how best to optimize them, nor why some layers
are used more often than others. When comparing my chosen
architecture, training, hyperparameters, and results to other examples, I found myself
wondering why certain models work better than others, and what I could do to better
optimize my own model. More research and experience are needed to understand this better.
VGG-16 Architecture

20. Conclusion & Future Research
Deep(er) Learning

21. Optimizing CNNs
The goal of this project was to identify faces in
a given image and use a trained CNN model to
predict 68 keypoints for each face. Through
working on the project, I was able to learn the
basics of building a CNN architecture and
training it. However, this is a deep subject and
this project only skimmed the surface.
For future research, optimizing CNN
architectures should be a primary focus,
particularly determining optimal
hyperparameters and layers.
Additionally, I hope to explore transfer learning
in more depth, where a pre-trained CNN model
is used on other problems by adding a final fully
connected layer that is specific for the problem
being investigated.

22. THANKS!
The code, output, and Jupyter Notebooks
used in this project can be found here:
https://github.com/jekkilekki/computer-vision/tree/master/Facial%20Keypoints%20Detector