Computer Vision Project 1: Facial Keypoints Detection

Slide 1

Slide 1 text

Computer Vision Project 1: Facial Keypoints Detection
NAME: Aaron Snowberger (Department of Information and Communication Engineering, 30211060)
DATE: 2021.05.10
SUBJECT: Computer Vision (Dong-Geol Choi)
CURRICULUM: Udacity Computer Vision | Udacity GitHub Project Repo | Certificate of Completion

Slide 2

Slide 2 text

CONTENTS
01 Problem Overview
02 Load & Visualize the Data
   a. First example images
   b. Data Preprocessing
03 Network Architecture & Training
   a. Visualize keypoints before Training
   b. Define the CNN
   c. Train the CNN!
   d. Visualize keypoints after Training
   e. Feature visualization
04 Fun with Keypoints
05 Project Reflection
06 Conclusion & Future Research

Slide 3

Slide 3 text

01 Problem Overview
To identify 68 keypoints on any given face image

Slide 4

Slide 4 text

Problem Overview
In this project, I defined a Convolutional Neural Network (CNN) architecture and trained a model to perform facial keypoint detection on numerous images. Each training and test image contained 68 keypoints with coordinates (x, y) for that face. The keypoints (pictured on the slide) mark important areas of the face such as the jaw line, eyes, eyebrows, nose, and mouth.
Total data set: 5770 color images (from the YouTube Faces Dataset)
Training: 3462 images
Testing: 2308 images

Slide 5

Slide 5 text

02 Load & Visualize the Data
Understanding the data & preprocessing it

Slide 6

Slide 6 text

First Example Images
The following are a few of the sample images that were loaded in order to better visualize and understand the problem.
Figure: Image information before preprocessing, printed as index, image shape (y, x, colors), and keypoints shape (keypoints, coordinates).
Figure: First 4 keypoints of the third image. Keypoints data was loaded from a CSV file.
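A small sketch of how one CSV row can be turned into the (keypoints, coordinates) shape mentioned above. It assumes each row stores the 68 keypoints as 136 flat values in x, y order. The helper name `row_to_keypoints` and the sample values are made up for illustration and are not taken from the project code.

```python
import numpy as np

def row_to_keypoints(row):
    """Reshape a flat row of 136 values into a (68, 2) array of (x, y) pairs."""
    pts = np.asarray(row, dtype=np.float32)
    if pts.size != 136:
        raise ValueError(f"expected 136 values, got {pts.size}")
    return pts.reshape(-1, 2)

# Made-up row laid out as x0, y0, x1, y1, ... (not real dataset values)
fake_row = list(range(136))
key_pts = row_to_keypoints(fake_row)

print(key_pts.shape)  # (68, 2)
print(key_pts[:4])    # the first 4 keypoints, one (x, y) pair per line
```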

Slide 7

Slide 7 text

Data Preprocessing
Transforms were performed on the images to standardize them.
1. Normalize(object)
   a. Converts a color image to grayscale with values in the range [0, 1]
   b. Normalizes keypoints to be in a range of [-1, 1]
2. Rescale(object)
   a. Rescales image to desired size (250 px at the smallest side)
3. RandomCrop(object)
   a. Crops image in a square, randomly (224 x 224 px)
4. ToTensor(object)
   a. Converts numpy images to torch images
Example use:
    rescale = Rescale(100)
    crop = RandomCrop(50)
    composed = transforms.Compose([Rescale(250), RandomCrop(224)])
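The transforms above can be sketched in plain NumPy to show what happens to the image and to the keypoints at each step. This is not the project's implementation: the nearest-neighbor resize stands in for a proper image-resize routine, the grayscale conversion is a simple channel average, the keypoint normalization by image size is only one way to reach [-1, 1] (the project's own constants may differ), and the order of the steps is a choice of this sketch. All function names are illustrative.

```python
import numpy as np

def rescaled_size(h, w, output_size):
    """Return (new_h, new_w) so the smaller side equals output_size."""
    if h > w:
        return int(output_size * h / w), output_size
    return output_size, int(output_size * w / h)

def rescale(image, key_pts, output_size):
    """Nearest-neighbor resize (stand-in for a real resize) plus keypoint scaling."""
    h, w = image.shape[:2]
    new_h, new_w = rescaled_size(h, w, output_size)
    rows = (np.arange(new_h) * h / new_h).astype(int)
    cols = (np.arange(new_w) * w / new_w).astype(int)
    resized = image[rows][:, cols]
    return resized, key_pts * np.array([new_w / w, new_h / h])

def random_crop(image, key_pts, size, rng):
    """Cut a random size x size square and shift keypoints into the crop's frame."""
    h, w = image.shape[:2]
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    cropped = image[top:top + size, left:left + size]
    return cropped, key_pts - np.array([left, top])

def normalize(image_rgb, key_pts):
    """Grayscale in [0, 1]; keypoints mapped so the image spans [-1, 1] (assumed scheme)."""
    gray = image_rgb.astype(np.float32).mean(axis=2) / 255.0
    h, w = gray.shape
    return gray, key_pts / np.array([w / 2.0, h / 2.0]) - 1.0

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(300, 400, 3), dtype=np.uint8)  # fake color image
key_pts = rng.uniform(100, 200, size=(68, 2))                     # fake keypoints

image, key_pts = rescale(image, key_pts, 250)           # smaller side becomes 250 px
image, key_pts = random_crop(image, key_pts, 224, rng)  # 224 x 224 square
gray, key_pts = normalize(image, key_pts)
print(gray.shape, key_pts.shape)  # (224, 224) (68, 2)
```

The point of the sketch is that every geometric change to the image (scaling, cropping) has to be mirrored on the keypoint coordinates, otherwise the labels no longer line up with the face.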

Slide 8

Slide 8 text

03 Network Architecture
Convolutional Neural Network in PyTorch

Slide 9

Slide 9 text

Visualize keypoints before Training
green: True keypoints given by CSV file
pink: Predicted keypoints (before Training)

Slide 10

Slide 10 text

Define CNN Architecture
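The layer diagram from this slide is not reproduced in the text. As a rough aid, the sketch below traces how the spatial size of a 224 x 224 input shrinks through three convolution + max-pooling stages, which matches the layer count of the final attempt listed in the Project Reflection. Only the 5x5 kernel of the first layer comes from the slides; the remaining kernel sizes, the channel counts, and the absence of padding are assumptions made for illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution: floor((W - F + 2P) / S) + 1."""
    return (size - kernel + 2 * padding) // stride + 1

def pool_out(size, kernel=2, stride=2):
    """Output width/height of max pooling without padding."""
    return (size - kernel) // stride + 1

size = 224                # input is a 224 x 224 grayscale crop
kernels = [5, 3, 3]       # first value from the slides; the rest are assumed
channels = [32, 64, 128]  # assumed feature-map counts

for k, c in zip(kernels, channels):
    size = pool_out(conv_out(size, k))
    print(f"conv {k}x{k} + pool -> {c} x {size} x {size}")

flat_features = channels[-1] * size * size  # input width of the first fully connected layer
outputs = 68 * 2                            # one (x, y) pair per keypoint
print(flat_features, outputs)
```

Whatever the exact layer sizes are, the last fully connected layer needs 136 outputs, since the network predicts an x and a y value for each of the 68 keypoints.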

Slide 11

Slide 11 text

Train the Network!
Right around Epoch 16-17, the training loss started to level off near or below 0.03. Perhaps 20 Epochs, rather than the 30 that were run, would have been enough training.

Slide 12

Slide 12 text

Visualize keypoints after Training
green: True keypoints given by CSV file
pink: Predicted keypoints (after Training)

Slide 13

Slide 13 text

Feature visualization
Extract a single filter (by index) from the first convolutional layer in order to visualize the weights that make up each convolutional kernel (size 5x5).
Filter an image to see the effect of a convolutional kernel and get an idea about what features it detects.
This filter emphasizes the right side of the face most clearly. Horizontal lines in the eyes, mouth, and top of the head are drawn out in white.
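To make the filtering step concrete, here is a minimal sketch that slides a 5x5 kernel over a grayscale image. In the project the kernel weights come from the trained network; the hand-made kernel and toy image below are stand-ins, and the plain NumPy loop replaces whatever library filtering routine the notebook actually uses.

```python
import numpy as np

def filter2d(image, kernel):
    """Slide the kernel over the image (cross-correlation, 'valid' region only)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.float32)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Hand-made 5x5 kernel that responds to dark-above / bright-below transitions.
kernel = np.zeros((5, 5), dtype=np.float32)
kernel[:2, :] = -1.0
kernel[3:, :] = 1.0

# Toy image: dark top half, bright bottom half (a single horizontal edge).
image = np.zeros((12, 12), dtype=np.float32)
image[6:, :] = 1.0

response = filter2d(image, kernel)
print(response.shape)  # (8, 8)
print(response[:, 0])  # largest where the 5x5 window straddles the edge
```

Flat regions give a response of zero and the rows around the edge give the largest values, which is the same effect described above where horizontal lines in the face are drawn out in white.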

Slide 14

Slide 14 text

04 Fun with Keypoints
Detect faces, find keypoints, add stickers

Slide 15

Slide 15 text

Detect all faces & find keypoints
1. Detect faces with Haar Cascades.
2. Load in our Trained model.
3. Add padding to the faces.
4. Detect and display keypoints.
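The padding step can be sketched as simple box arithmetic. Haar cascade detectors in OpenCV return face boxes as (x, y, w, h); the sketch grows each box by a fixed margin and clamps it to the image, presumably so the crop passed to the model shows the whole head rather than a tight face region. The function name `pad_box`, the margin of 50 px, and the sample boxes are hypothetical.

```python
def pad_box(x, y, w, h, pad, img_w, img_h):
    """Grow a face box by `pad` pixels per side, clamped to the image bounds."""
    x0 = max(x - pad, 0)
    y0 = max(y - pad, 0)
    x1 = min(x + w + pad, img_w)
    y1 = min(y + h + pad, img_h)
    return x0, y0, x1, y1

# Hypothetical detections in (x, y, w, h) form on a 480 x 360 image.
faces = [(30, 40, 100, 100), (380, 10, 120, 120)]
padded = [pad_box(x, y, w, h, pad=50, img_w=480, img_h=360) for (x, y, w, h) in faces]

for box in padded:
    print(box)  # (x0, y0, x1, y1), usable as image[y0:y1, x0:x1]
```

Clamping matters for faces near the border: without it, a negative start index would wrap around in NumPy slicing and return the wrong region.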

Slide 16

Slide 16 text

Add stickers
1. Load .png sticker file
2. Detect alpha channel (transparency)
3. Display facial keypoints
4. Overlay sticker where pixels are non-transparent (alpha > 0)
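The overlay rule in step 4 can be sketched with a boolean mask over the alpha channel. The sketch assumes the sticker is an RGBA array that fits entirely inside the image at the chosen position; in practice the position and size would be derived from the detected keypoints. The function name and the tiny arrays are illustrative only.

```python
import numpy as np

def overlay_sticker(image, sticker_rgba, top, left):
    """Return a copy of image with sticker pixels copied in wherever alpha > 0."""
    out = image.copy()
    h, w = sticker_rgba.shape[:2]
    region = out[top:top + h, left:left + w]  # view into `out`, edited in place
    mask = sticker_rgba[:, :, 3] > 0          # non-transparent sticker pixels
    region[mask] = sticker_rgba[:, :, :3][mask]
    return out

# Toy data: a black 6 x 6 image and a 2 x 2 sticker with two visible pixels.
image = np.zeros((6, 6, 3), dtype=np.uint8)
sticker = np.zeros((2, 2, 4), dtype=np.uint8)
sticker[0, 0] = (255, 0, 0, 255)  # opaque red
sticker[1, 1] = (0, 255, 0, 128)  # semi-transparent green still counts (alpha > 0)

result = overlay_sticker(image, sticker, top=2, left=3)
print(result[2, 3], result[3, 4], result[2, 4])
```

Because the rule is a hard threshold on alpha, semi-transparent pixels are copied at full strength; blending by alpha would be a different technique from the one listed on the slide.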

Slide 17

Slide 17 text

05 Project Reflection
What I learned through 4 iterations of the CNN

Slide 18

Slide 18 text

Details on 4 iterations of the CNN

Attempt #1: (Warm up, trial run)
Optimizer: optim.SGD() - I learned it first
Loss function: BCEWithLogitsLoss() - mistake, nan
Architecture:
   2 convolutional layers + max pooling
   3 fully connected layers
   2 dropout layers
Epochs: 1
Batch_size: 10
Training loss: nan

Attempt #2: (Fix the loss function)
Optimizer: optim.SGD() - I learned it first
Loss function: MSELoss() - not for classifications
Architecture:
   2 convolutional layers + max pooling
   3 fully connected layers
   2 dropout layers
Epochs: 5
Batch_size: 12
Training loss: 0.26

Attempt #3: (Good, could it be better?)
Optimizer: optim.Adam()
Loss function: SmoothL1Loss()
Architecture:
   5 convolutional layers + max pooling
   Batch normalization + 2 dropout layers
   5 fully connected layers + normalization
Epochs: 15
Batch_size: 32
Training loss: ~ 0.09 (before training canceled)

Attempt #4: (Final, successful)
Optimizer: optim.Adam()
Loss function: SmoothL1Loss()
Architecture:
   3 convolutional layers + max pooling
   4 fully connected layers
   3 dropout layers
Epochs: 30
Batch_size: 64 & 128 (too big, froze)
Training loss: < 0.03
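Since the two later attempts switched to SmoothL1Loss(), a short sketch of what that loss computes may help. It is written in plain Python rather than calling PyTorch, and follows the piecewise definition documented for PyTorch's SmoothL1Loss with its default beta of 1: quadratic for small errors, linear for large ones. The prediction values are made up and are not results from the project.

```python
def smooth_l1(pred, target, beta=1.0):
    """Mean Smooth L1 loss: 0.5 * d^2 / beta if d < beta, else d - 0.5 * beta."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(pred)

def mse(pred, target):
    """Mean squared error, for comparison."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

target = [0.0, 0.0, 0.0, 0.0]
close = [0.1, -0.1, 0.2, 0.0]    # made-up predictions, all near the target
outlier = [0.1, -0.1, 0.2, 3.0]  # same, but one coordinate is far off

print(f"close:   smooth_l1={smooth_l1(close, target):.4f}  mse={mse(close, target):.4f}")
print(f"outlier: smooth_l1={smooth_l1(outlier, target):.4f}  mse={mse(outlier, target):.4f}")
# close:   smooth_l1=0.0075  mse=0.0150
# outlier: smooth_l1=0.6325  mse=2.2650
```

A single badly placed keypoint inflates the squared error much more than the Smooth L1 value, which is one commonly cited reason for preferring Smooth L1 in coordinate regression.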

Slide 19

Slide 19 text

Different CNN Architectures
In truth, CNN architectures are still quite new to me. I understand the basic concepts, but not how best to optimize them, nor why some layers are used more often and others less often. When comparing my chosen architecture, training, hyperparameters, and results to other examples, I found myself wondering why certain models work better than others, and what I could do to better optimize my own model. More research and experience are needed to understand this better.
Figure: VGG-16 Architecture

Slide 20

Slide 20 text

06 Conclusion & Future Research
Deep(er) Learning

Slide 21

Slide 21 text

Optimizing CNNs
The goal of this project was to identify faces in a given image and use a trained CNN model to predict 68 keypoints for each face. Through working on the project, I was able to learn the basics of building a CNN architecture and training it. However, this is a deep subject and this project only skimmed the surface.
For future research, optimizing CNN architectures should be a primary focus, particularly determining optimal hyperparameters and layers. Additionally, I hope to explore transfer learning in more depth, where a pre-trained CNN model is used on other problems by adding a final fully connected layer that is specific for the problem being investigated.

Slide 22

Slide 22 text

THANKS!
The code, output, and Jupyter Notebooks used in this project can be found here:
https://github.com/jekkilekki/computer-vision/tree/master/Facial%20Keypoints%20Detector