
Introduction to Face Processing with Computer Vision

Ever wonder how Facebook’s facial recognition or Snapchat’s filters work? Faces are a fundamental piece of photography, and building applications around them has never been easier with open-source libraries and pre-trained models.

In this talk, we’ll help you understand some of the computer vision and machine learning techniques behind these applications. Then, we’ll use this knowledge to develop our own prototypes to tackle tasks such as face detection (e.g. digital cameras), recognition (e.g. Facebook Photos), classification (e.g. identifying emotions), manipulation (e.g. Snapchat filters), and more.

finid

June 28, 2019

Transcript

  1. Big Data & AI Conference, Dallas, Texas, June 27–29, 2019. www.BigDataAIconference.com
  2. Haar-Like Features • Summarize image based on simple color patterns • Manually determined feature extractors (kernels) • Leveraged for first real-time face detector (2001). Ref: Viola & Jones (2001). Image: Wikimedia
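To make the Viola-Jones idea concrete, here is a minimal sketch (not from the deck) using the pre-trained Haar cascade that ships with OpenCV; the image paths are placeholders.

```python
import cv2

# Pre-trained frontal-face Haar cascade bundled with OpenCV (Viola-Jones style).
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_cascade = cv2.CascadeClassifier(cascade_path)

# Haar cascades operate on grayscale images.
image = cv2.imread("group_photo.jpg")  # placeholder path
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# scaleFactor and minNeighbors trade off speed, recall, and false positives.
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

# Draw one rectangle per detected face and save the result.
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detections.jpg", image)
```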
  3. (Image slide)

  4. (Image slide)

  5. Histogram of Oriented Gradients (HOG) • Summarize image by distribution of color gradients • Gradient intensities and orientations represent edges, etc. • Captures more information than simple Haar-like features. Ref: Shu et al. (2011).
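As an illustration (not part of the deck), a HOG descriptor can be computed with scikit-image; the parameters below are typical values and the file name is a placeholder.

```python
from skimage import color, io
from skimage.feature import hog

# Load a face crop and convert to grayscale.
image = color.rgb2gray(io.imread("face_crop.jpg"))  # placeholder path

# Each cell accumulates a 9-bin histogram of gradient orientations;
# blocks of cells are contrast-normalized and concatenated into one vector.
features, hog_image = hog(
    image,
    orientations=9,
    pixels_per_cell=(8, 8),
    cells_per_block=(2, 2),
    visualize=True,
)
print(features.shape)  # one long descriptor summarizing local edge structure
```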
  6. R-CNN • Introduces CNNs for object detection • CNNs learn how to extract features from data • Breakthrough in performance • Beats previous SOTA methods by huge margin • However, detection is extremely slow. Ref: Girshick et al. (2014).
  7. Fast R-CNN • Improvement to R-CNN that leverages CNN for classification and regression • Other than proposing regions, system is now end-to-end vs. three components trained greedily. • Predictions are 200x+ faster with better performance • Region proposals are still a bottleneck; total inference time is ~2s. Ref: Girshick (2015).
  8. Faster R-CNN • Leverages CNN for region proposals as well • “Region Proposal Network” • Finally an end-to-end system with deep learning • About 10x faster than Fast R-CNN, with better performance • Total inference time is ~0.2s. Ref: Ren et al. (2016).
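For reference, torchvision exposes a pre-trained Faster R-CNN. The sketch below (an illustration, not from the deck) uses the generic COCO checkpoint; a face detector would use the same architecture fine-tuned on face data. The image path is a placeholder.

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Faster R-CNN with a ResNet-50 FPN backbone, pre-trained on COCO.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

image = Image.open("street_scene.jpg").convert("RGB")  # placeholder path
with torch.no_grad():
    predictions = model([to_tensor(image)])  # one dict per input image

# Each prediction dict contains 'boxes', 'labels', and 'scores'.
boxes = predictions[0]["boxes"]
scores = predictions[0]["scores"]
print(boxes[scores > 0.8])  # keep reasonably confident detections
```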
  9. MTCNN • Many models for face detection draw heavily from generalized object detection methods. • MTCNN, for example, trains a multi-task system for detection and alignment. Ref: Zhang et al. (2015).
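As a hands-on illustration (not from the deck), the community `mtcnn` package wraps a pre-trained MTCNN; the image path is a placeholder.

```python
import cv2
from mtcnn import MTCNN  # pip install mtcnn

detector = MTCNN()

# MTCNN expects an RGB array; OpenCV loads BGR, so convert.
image = cv2.cvtColor(cv2.imread("group_photo.jpg"), cv2.COLOR_BGR2RGB)

# Each result holds a bounding box, a confidence score, and five facial
# landmarks (eyes, nose, mouth corners) that can be used for alignment.
for face in detector.detect_faces(image):
    print(face["box"], face["confidence"], face["keypoints"])
```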
  10. DSFD • The current SOTA method draws heavily from modern single-shot detection architectures. • DSFD extends these to a dual-shot detector with enhanced features and loss functions. Ref: Li et al. (2018).
  11. Are we there yet? • WIDER Face (Easy): ~96% AP • WIDER Face (Medium): ~95% AP • WIDER Face (Hard): ~90% AP. Ref: Yang (2016).
  12. Facial Recognition • Facial recognition actually corresponds to a group of different tasks. • Verification vs. Identification vs. Grouping vs. … • Closed-Set vs. Open-Set
  13. Closed-Set Recognition • Every identity appears in training set • Example: recognizing celebrities • Effectively a classification problem • Model aims to learn separable features
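To sketch what "effectively a classification problem" looks like in code (an illustration, not from the deck; the backbone choice and dimensions are assumptions), a closed-set recognizer is simply a CNN with one output per known identity, trained with cross-entropy:

```python
import torch
import torch.nn as nn
import torchvision

num_identities = 100  # every identity is known at training time

# Any CNN backbone works; the final layer scores each known identity.
model = torchvision.models.resnet18()
model.fc = nn.Linear(model.fc.in_features, num_identities)
criterion = nn.CrossEntropyLoss()

# One training step on a dummy batch of face crops and identity labels.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, num_identities, (8,))
loss = criterion(model(images), labels)
loss.backward()
print(loss.item())
```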
  14. Closed-Set Verification (diagram: Test Samples A and B are each passed through the model to produce label confidences, which are then compared). Images: Wikimedia
  15. Open-Set Recognition • Not every identity appears in training set • Example: Facebook Photos • Effectively a metric learning problem • Model aims to learn large-margin features (embeddings)
  16. Embeddings • Map each sample to a vector (coordinate system) • Used for words, graphs, faces, etc. • Embeddings preserve similarity • Similar samples close to each other • Dissimilar samples far from each other
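As a concrete illustration (not from the deck), the `face_recognition` library maps each face to a 128-dimensional embedding; the file names below are placeholders.

```python
import numpy as np
import face_recognition  # pip install face_recognition

image_a = face_recognition.load_image_file("person_a.jpg")  # placeholder paths
image_b = face_recognition.load_image_file("person_b.jpg")

# Each encoding is a 128-dimensional vector; similar faces map to nearby
# points, dissimilar faces to distant ones.
embedding_a = face_recognition.face_encodings(image_a)[0]
embedding_b = face_recognition.face_encodings(image_b)[0]

distance = np.linalg.norm(embedding_a - embedding_b)
print(f"Euclidean distance between embeddings: {distance:.2f}")
```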
  17. Embeddings • “Similar” depends on the training data • Same person, physical characteristic, etc. • Embeddings represent latent information • High-dimensional embeddings trained on large datasets learn to represent latent information about the person (e.g. physical characteristics)
  18. Open-Set Identification (diagram: a test sample is embedded by the model, and its embedding is compared by distance against a gallery of known embeddings Emb. 0, Emb. 1, Emb. 2, …). Images: Wikimedia
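A minimal sketch of this pipeline (an illustration; the gallery structure, names, and threshold are assumptions): embed the query, find the nearest gallery embedding, and reject the match if it is not close enough.

```python
import numpy as np

def identify(query_embedding, gallery, threshold=0.6):
    """gallery: dict mapping identity name -> embedding (np.ndarray)."""
    best_name, best_distance = None, np.inf
    for name, embedding in gallery.items():
        distance = np.linalg.norm(query_embedding - embedding)
        if distance < best_distance:
            best_name, best_distance = name, distance
    # Open-set: reject matches that are not close enough.
    return (best_name if best_distance < threshold else "unknown"), best_distance

# Toy example with random 128-d "embeddings".
rng = np.random.default_rng(0)
gallery = {"alice": rng.normal(size=128), "bob": rng.normal(size=128)}
query = gallery["alice"] + 0.01 * rng.normal(size=128)
print(identify(query, gallery, threshold=1.0))
```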
  19. Open-Set Verification (diagram: Test Samples A and B are embedded by the model, and the distance between Embedding A and Embedding B is compared against a threshold). Images: Wikimedia
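Verification is the same comparison applied to two samples. A minimal sketch with `face_recognition` (an illustration, with placeholder file names; `compare_faces` uses a distance tolerance, 0.6 by default):

```python
import face_recognition  # pip install face_recognition

encoding_a = face_recognition.face_encodings(
    face_recognition.load_image_file("sample_a.jpg"))[0]  # placeholder paths
encoding_b = face_recognition.face_encodings(
    face_recognition.load_image_file("sample_b.jpg"))[0]

# True if the embedding distance is below the tolerance threshold.
same_person = face_recognition.compare_faces([encoding_a], encoding_b, tolerance=0.6)[0]
print("Same person?", same_person)
```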
  20. Example: pairwise embedding distances between four face images (Images: Wikimedia)

              Img 1   Img 2   Img 3   Img 4
      Img 1     -      0.31    0.59    0.69
      Img 2    0.31     -      0.52    0.63
      Img 3    0.59    0.52     -      0.50
      Img 4    0.69    0.63    0.50     -
  21. Are we there yet? • LFW (Labeled Faces in the Wild): 99.8%+ accuracy. Ref: Deng et al. (2018); Learned-Miller et al. (2016)