Aaron Snowberger
May 10, 2021

# Computer Vision Project 1: Facial Keypoints Detection

As a part of Udacity's Computer Vision Nanodegree, I defined a Convolutional Neural Network in PyTorch and trained a model to be able to identify 68 facial keypoints on a given image of a face.

This PPT was submitted as a part of my PhD coursework to detail my project process and reflection.


## Transcript

1. ### Computer Vision Project 1: Facial Keypoints Detection

   NAME: Aaron Snowberger (Dept. of Information & Communication Engineering, 30211060)
   DATE: 2021.05.10
   SUBJECT: Computer Vision (Dong-Geol Choi)
   CURRICULUM: Udacity Computer Vision | Udacity GitHub Project Repo | Certificate of Completion
2. ### CONTENTS

   01 Problem Overview
   02 Load & Visualize the Data
      a. First example images
      b. Data Preprocessing
   03 Network Architecture & Training
      a. Visualize keypoints before Training
      b. Define the CNN
      c. Train the CNN!
      d. Visualize keypoints after Training
      e. Feature visualization
   04 Fun with Keypoints
   05 Project Reflection
   06 Conclusion & Future Research

4. ### Problem Overview

   In this project, I defined a Convolutional Neural Network (CNN) architecture and trained a model to perform facial keypoint detection on numerous images. Each training and test image contained 68 keypoints with (x, y) coordinates for that face. The keypoints (pictured below) mark important areas of the face such as the jaw line, eyes, eyebrows, nose, and mouth. Total data set: 5,770 color images (from the YouTube Faces Dataset). Training: 3,462 images. Testing: 2,308 images.

6. ### First Example Images

   The following are a few of the sample images that were loaded in order to better visualize and understand the problem. Image information before preprocessing: index, image shape (y, x, colors), and keypoints shape (keypoints, coordinates); the first 4 keypoints of the third image are shown. Keypoints data was loaded from a CSV file.
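Loading the keypoints from the CSV file can be sketched as below. This is a hypothetical reconstruction of the loading step, not the notebook's exact code: the CSV is assumed to hold an image filename followed by 136 values (x, y pairs for 68 keypoints) per row.

```python
import csv
import numpy as np

def load_keypoints(csv_path):
    """Load facial keypoints from a CSV file.

    Assumes each row holds an image filename followed by flattened
    (x, y) keypoint values, matching the project's data format.
    """
    names, keypoints = [], []
    with open(csv_path) as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        for row in reader:
            names.append(row[0])
            # Reshape the flat value list into (num_keypoints, 2)
            pts = np.array(row[1:], dtype=np.float32).reshape(-1, 2)
            keypoints.append(pts)
    return names, keypoints
```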
7. ### Data Preprocessing

   Transforms were performed on the images to standardize them:

   1. Normalize(object)
      a. Converts a color image to grayscale with values between [0, 1]
      b. Normalizes keypoints to be in a range of [-1, 1]
   2. Rescale(object)
      a. Rescales the image to a desired size (250 px on the smallest side)
   3. RandomCrop(object)
      a. Crops the image to a square at a random location (224 x 224 px)
   4. ToTensor(object)
      a. Converts numpy images to torch images

   Example use: `rescale = Rescale(100)`, `crop = RandomCrop(50)`, `composed = transforms.Compose([Rescale(250), RandomCrop(224)])`
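The first of these transforms can be sketched as follows. This is a minimal illustration, assuming samples are dictionaries with `image` and `keypoints` keys as in the Udacity starter code; the keypoint centering constants (subtract 100, divide by 50) follow that starter code's convention.

```python
import numpy as np

class Normalize:
    """Convert a color image to grayscale in [0, 1] and scale keypoints
    to roughly [-1, 1]. A minimal sketch of the Normalize transform."""

    def __call__(self, sample):
        image, key_pts = sample['image'], sample['keypoints']
        # Grayscale via a simple channel average, then scale to [0, 1]
        gray = image.mean(axis=2) / 255.0
        # Center keypoints around 100 px and divide by 50
        key_pts = (key_pts.astype(np.float32) - 100.0) / 50.0
        return {'image': gray, 'keypoints': key_pts}
```

The remaining transforms (`Rescale`, `RandomCrop`, `ToTensor`) follow the same callable-class pattern, which is what lets them be chained with `transforms.Compose` as in the example use above.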

9. ### Visualize keypoints before Training

   Green: true keypoints given by the CSV file. Pink: predicted keypoints (before training).
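A comparison plot like the one on this slide can be produced with a small matplotlib helper; this is an illustrative sketch, not the notebook's exact plotting code, and the argument names are my own.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt

def show_keypoints(image, true_pts=None, pred_pts=None, ax=None):
    """Plot an image with true keypoints in green and predicted
    keypoints in magenta/pink, mirroring the slide's color scheme."""
    if ax is None:
        _, ax = plt.subplots()
    ax.imshow(image, cmap='gray')
    if true_pts is not None:
        ax.scatter(true_pts[:, 0], true_pts[:, 1], s=20, marker='.', c='g')
    if pred_pts is not None:
        ax.scatter(pred_pts[:, 0], pred_pts[:, 1], s=20, marker='.', c='m')
    ax.axis('off')
    return ax
```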

11. ### Train the Network!

    Right around epoch 16-17, the training loss started to level off near or below 0.03. Perhaps 20 epochs would have been enough training.
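The training setup that produced this curve (Adam + SmoothL1Loss, per the final CNN iteration) can be sketched as a standard PyTorch loop. The loader, model, and hyperparameter names here are illustrative, not copied from the notebook.

```python
import torch
import torch.nn as nn
import torch.optim as optim

def train(model, loader, epochs=20, lr=0.001, device='cpu'):
    """Minimal training-loop sketch: Adam optimizer + SmoothL1Loss.

    `loader` is assumed to yield (images, keypoints) batches.
    """
    criterion = nn.SmoothL1Loss()
    optimizer = optim.Adam(model.parameters(), lr=lr)
    model.train()
    for epoch in range(epochs):
        running = 0.0
        for images, keypoints in loader:
            images, keypoints = images.to(device), keypoints.to(device)
            optimizer.zero_grad()
            output = model(images)
            # Flatten (batch, 68, 2) targets to match the (batch, 136) output
            loss = criterion(output, keypoints.view(keypoints.size(0), -1))
            loss.backward()
            optimizer.step()
            running += loss.item()
        print(f"Epoch {epoch + 1}: avg loss {running / max(len(loader), 1):.4f}")
    return model
```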
12. ### Visualize keypoints after Training

    Green: true keypoints given by the CSV file. Pink: predicted keypoints (after training).
13. ### Feature visualization

    Extract a single filter (by index) from the first convolutional layer in order to visualize the weights that make up each convolutional kernel (size 5x5). Filter an image to see the effect of a convolutional kernel and get an idea of what features it detects. This filter emphasizes the right side of the face most clearly; horizontal lines in the eyes, mouth, and top of the head are drawn out in white.
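The two steps described here can be sketched in a few lines of PyTorch. This is an illustrative reconstruction: `conv_layer` is assumed to be an `nn.Conv2d` with a single input channel (grayscale), and `image` a 2-D float tensor.

```python
import torch
import torch.nn.functional as F

def visualize_filter_response(conv_layer, image, filter_index=0):
    """Extract one learned 5x5 kernel from the first conv layer
    and apply it to a grayscale image to see what it detects."""
    # Weights have shape (out_ch, in_ch, 5, 5); slice one kernel -> (1, 1, 5, 5)
    kernel = conv_layer.weight.data[filter_index:filter_index + 1, :1]
    # Add batch and channel dims, filter, then drop them again for plotting
    x = image.unsqueeze(0).unsqueeze(0)
    response = F.conv2d(x, kernel, padding=2)
    return kernel.squeeze(), response.squeeze()
```

The returned kernel can be shown with `imshow` to inspect the weights, and the response map shows which image regions the filter emphasizes.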

16. ### Add stickers

    1. Load the .png sticker file
    2. Detect the alpha channel (transparency)
    3. Display facial keypoints
    4. Overlay the sticker where pixels are non-transparent (alpha > 0)
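The overlay step (step 4) can be sketched with plain numpy indexing. This is a minimal illustration assuming an RGBA sticker array, an RGB face array, and a sticker that fits entirely inside the face image; the function name and arguments are my own.

```python
import numpy as np

def overlay_sticker(face, sticker, x, y):
    """Paste an RGBA sticker onto an RGB face image at top-left (x, y),
    copying pixels only where the sticker's alpha channel is > 0."""
    h, w = sticker.shape[:2]
    region = face[y:y + h, x:x + w]
    mask = sticker[:, :, 3] > 0               # non-transparent pixels
    region[mask] = sticker[:, :, :3][mask]    # copy RGB where visible
    return face
```

In the project, `(x, y)` would be chosen from the detected facial keypoints (e.g. placing sunglasses relative to the eye keypoints).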

18. ### Details on 4 iterations of the CNN

    **Attempt #1 (warm-up, trial run)**
    - Optimizer: optim.SGD() - I learned it first
    - Loss function: BCEWithLogitsLoss() - mistake, produced nan
    - Architecture: 2 convolutional layers + max pooling, 3 fully connected layers, 2 dropout layers
    - Epochs: 1, Batch size: 10
    - Training loss: nan

    **Attempt #2 (fix the loss function)**
    - Optimizer: optim.SGD() - I learned it first
    - Loss function: MSELoss() - not for classifications
    - Architecture: 2 convolutional layers + max pooling, 3 fully connected layers, 2 dropout layers
    - Epochs: 5, Batch size: 12
    - Training loss: 0.26

    **Attempt #3 (good, could it be better?)**
    - Optimizer: optim.Adam()
    - Loss function: SmoothL1Loss()
    - Architecture: 5 convolutional layers + max pooling, batch normalization + 2 dropout layers, 5 fully connected layers + normalization
    - Epochs: 15, Batch size: 32
    - Training loss: ~0.09 (before training was canceled)

    **Attempt #4 (final, successful)**
    - Optimizer: optim.Adam()
    - Loss function: SmoothL1Loss()
    - Architecture: 3 convolutional layers + max pooling, 4 fully connected layers, 3 dropout layers
    - Epochs: 30, Batch size: 64 (128 was too big and froze)
    - Training loss: < 0.03
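The final architecture (attempt #4: 3 convolutional layers with max pooling, 4 fully connected layers, 3 dropout layers) can be sketched as below. The channel widths, kernel sizes after the first layer, and dropout rates are my own illustrative choices, not taken from the actual notebook; only the layer counts and the 224x224-in / 136-out shape follow the slides.

```python
import torch
import torch.nn as nn

class KeypointNet(nn.Module):
    """Sketch of the final CNN: input is a 1 x 224 x 224 grayscale image,
    output is 136 values (68 keypoints as x, y pairs)."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 5), nn.ReLU(), nn.MaxPool2d(2),    # 224 -> 110
            nn.Conv2d(32, 64, 3), nn.ReLU(), nn.MaxPool2d(2),   # 110 -> 54
            nn.Conv2d(64, 128, 3), nn.ReLU(), nn.MaxPool2d(2),  # 54 -> 26
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 26 * 26, 512), nn.ReLU(), nn.Dropout(0.4),
            nn.Linear(512, 256), nn.ReLU(), nn.Dropout(0.4),
            nn.Linear(256, 128), nn.ReLU(), nn.Dropout(0.4),
            nn.Linear(128, 136),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```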
19. ### Different CNN Architectures

    In truth, CNN architectures are still quite new to me. I understand the basic concepts, but not how best to optimize them, nor why some layers are used more often and others less often. When comparing my chosen architecture, training, hyperparameters, and results to other examples, I found myself wondering why certain models work better than others, and what I could do to better optimize my own model. More research and experience are needed to better understand. (VGG-16 architecture pictured.)

21. ### Optimizing CNNs

    The goal of this project was to identify faces in a given image and use a trained CNN model to predict 68 keypoints for each face. Through working on the project, I learned the basics of building and training a CNN architecture. However, this is a deep subject, and this project only skimmed the surface. For future research, optimizing CNN architectures should be a primary focus, particularly determining optimal hyperparameters and layers. Additionally, I hope to explore transfer learning in more depth, where a pre-trained CNN model is applied to other problems by adding a final fully connected layer specific to the problem being investigated.
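The transfer-learning idea mentioned here (reuse a pre-trained backbone, swap in a task-specific head) can be sketched generically. This is an illustrative pattern, not project code: `model.fc` is assumed to be the final layer, as it is in torchvision's ResNets, where one would load e.g. a pretrained `resnet18` and pass it to this helper.

```python
import torch.nn as nn

def freeze_and_replace_head(model, num_outputs=136):
    """Freeze every layer of a (notionally pre-trained) model, then
    attach a fresh final fully connected layer for the new task."""
    for param in model.parameters():
        param.requires_grad = False          # freeze the backbone
    # Replace the head; only these new weights will receive gradients
    model.fc = nn.Linear(model.fc.in_features, num_outputs)
    return model
```

With `num_outputs=136`, the new head predicts the 68 (x, y) keypoint pairs while the frozen backbone supplies generic visual features.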
22. ### THANKS!

    The code, output, and Jupyter Notebooks used in this project can be found here: https://github.com/jekkilekki/computer-vision/tree/master/Facial%20Keypoints%20Detector