
Introduction to Face Processing with Computer Vision

Ever wonder how Facebook’s facial recognition or Snapchat’s filters work? Faces are a fundamental piece of photography, and building applications around them has never been easier with open-source libraries and pre-trained models.

In this talk, we’ll help you understand some of the computer vision and machine learning techniques behind these applications. Then, we’ll use this knowledge to develop our own prototypes to tackle tasks such as face detection (e.g. digital cameras), recognition (e.g. Facebook Photos), classification (e.g. identifying emotions), manipulation (e.g. Snapchat filters), and more.

finid

June 28, 2019

Transcript

  1. Big Data & AI Conference, Dallas, Texas, June 27–29, 2019. www.BigDataAIconference.com
  2. Haar-Like Features • Summarize image based on simple color patterns • Manually determined feature extractors (kernels) • Leveraged for first real-time face detector (2001). Ref: Viola & Jones (2001). Image: Wikimedia
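To make the Viola-Jones idea concrete, here is a minimal sketch (not from the deck) using the pre-trained Haar cascade that ships with OpenCV; the image paths are placeholders.

```python
import cv2

# Pre-trained frontal-face Haar cascade bundled with OpenCV (Viola-Jones style).
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_cascade = cv2.CascadeClassifier(cascade_path)

# Haar cascades operate on grayscale images.
image = cv2.imread("group_photo.jpg")  # placeholder path
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# scaleFactor and minNeighbors trade off speed, recall, and false positives.
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

# Draw one rectangle per detected face and save the result.
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detections.jpg", image)
```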
  3. (Image slide)

  4. (Image slide)

  5. Histogram of Oriented Gradients (HOG) • Summarize image by distribution of color gradients • Gradient intensities and orientations represent edges, etc. • Captures more information than simple Haar-like features. Ref: Shu et al. (2011).
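As an illustration (not part of the deck), a HOG descriptor can be computed with scikit-image; the parameters below are typical values and the file name is a placeholder.

```python
from skimage import color, io
from skimage.feature import hog

# Load a face crop and convert to grayscale.
image = color.rgb2gray(io.imread("face_crop.jpg"))  # placeholder path

# Each cell accumulates a 9-bin histogram of gradient orientations;
# blocks of cells are contrast-normalized and concatenated into one vector.
features, hog_image = hog(
    image,
    orientations=9,
    pixels_per_cell=(8, 8),
    cells_per_block=(2, 2),
    visualize=True,
)
print(features.shape)  # one long descriptor summarizing local edge structure
```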
  6. R-CNN • Introduces CNNs for object detection • CNNs learn how to extract features from data • Breakthrough in performance • Beats previous SOTA methods by huge margin • However, detection is extremely slow. Ref: Girshick et al. (2014).
  7. Fast R-CNN • Improvement to R-CNN that leverages CNN for classification and regression • Other than proposing regions, system is now end-to-end vs. three components trained greedily. • Predictions are 200x+ faster with better performance • Region proposals are still a bottleneck; total inference time is ~2s. Ref: Girshick (2015).
  8. Faster R-CNN • Leverages CNN for region proposals as well • “Region Proposal Network” • Finally an end-to-end system with deep learning • About 10x faster than Fast R-CNN, with better performance • Total inference time is ~0.2s. Ref: Ren et al. (2016).
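For reference, torchvision exposes a pre-trained Faster R-CNN. The sketch below (an illustration, not from the deck) uses the generic COCO checkpoint; a face detector would use the same architecture fine-tuned on face data. The image path is a placeholder.

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Faster R-CNN with a ResNet-50 FPN backbone, pre-trained on COCO.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

image = Image.open("street_scene.jpg").convert("RGB")  # placeholder path
with torch.no_grad():
    predictions = model([to_tensor(image)])  # one dict per input image

# Each prediction dict contains 'boxes', 'labels', and 'scores'.
boxes = predictions[0]["boxes"]
scores = predictions[0]["scores"]
print(boxes[scores > 0.8])  # keep reasonably confident detections
```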
  9. MTCNN • Many models for face detection draw heavily from generalized object detection methods. • MTCNN, for example, trains a multi-task system for detection and alignment. Ref: Zhang et al. (2015).
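As a hands-on illustration (not from the deck), the community `mtcnn` package wraps a pre-trained MTCNN; the image path is a placeholder.

```python
import cv2
from mtcnn import MTCNN  # pip install mtcnn

detector = MTCNN()

# MTCNN expects an RGB array; OpenCV loads BGR, so convert.
image = cv2.cvtColor(cv2.imread("group_photo.jpg"), cv2.COLOR_BGR2RGB)

# Each result holds a bounding box, a confidence score, and five facial
# landmarks (eyes, nose, mouth corners) that can be used for alignment.
for face in detector.detect_faces(image):
    print(face["box"], face["confidence"], face["keypoints"])
```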
  10. DSFD • The current SOTA method draws heavily from modern single-shot detection architectures. • DSFD extends these to a dual-shot detector with enhanced features and loss functions. Ref: Li et al. (2018).
  11. Are we there yet? • WIDER Face (Easy): ~96% AP • WIDER Face (Medium): ~95% AP • WIDER Face (Hard): ~90% AP. Ref: Yang (2016).
  12. Facial Recognition • Facial recognition actually corresponds to a group of different tasks. • Verification vs. Identification vs. Grouping vs. … • Closed-Set vs. Open-Set
  13. Closed-Set Recognition • Every identity appears in training set • Example: recognizing celebrities • Effectively a classification problem • Model aims to learn separable features
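To sketch what "effectively a classification problem" looks like in code (an illustration, not from the deck; the backbone choice and dimensions are assumptions), a closed-set recognizer is simply a CNN with one output per known identity, trained with cross-entropy:

```python
import torch
import torch.nn as nn
import torchvision

num_identities = 100  # every identity is known at training time

# Any CNN backbone works; the final layer scores each known identity.
model = torchvision.models.resnet18()
model.fc = nn.Linear(model.fc.in_features, num_identities)
criterion = nn.CrossEntropyLoss()

# One training step on a dummy batch of face crops and identity labels.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, num_identities, (8,))
loss = criterion(model(images), labels)
loss.backward()
print(loss.item())
```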
  14. Closed-Set Verification (diagram: Test Samples A and B are each passed through the model to produce label confidences, which are then compared). Images: Wikimedia
  15. Open-Set Recognition • Not every identity appears in training set • Example: Facebook Photos • Effectively a metric learning problem • Model aims to learn large-margin features (embeddings)
  16. Embeddings • Map each sample to a vector (coordinate system) • Used for words, graphs, faces, etc. • Embeddings preserve similarity • Similar samples close to each other • Dissimilar samples far from each other
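As a concrete illustration (not from the deck), the `face_recognition` library maps each face to a 128-dimensional embedding; the file names below are placeholders.

```python
import numpy as np
import face_recognition  # pip install face_recognition

image_a = face_recognition.load_image_file("person_a.jpg")  # placeholder paths
image_b = face_recognition.load_image_file("person_b.jpg")

# Each encoding is a 128-dimensional vector; similar faces map to nearby
# points, dissimilar faces to distant ones.
embedding_a = face_recognition.face_encodings(image_a)[0]
embedding_b = face_recognition.face_encodings(image_b)[0]

distance = np.linalg.norm(embedding_a - embedding_b)
print(f"Euclidean distance between embeddings: {distance:.2f}")
```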
  17. Embeddings • “Similar” depends on the training data • Same person, physical characteristic, etc. • Embeddings represent latent information • High-dimensional embeddings trained on large datasets learn to represent latent information about the person (e.g. physical characteristics)
  18. Open-Set Identification (diagram: a test sample is embedded by the model, and its embedding is compared by distance against a gallery of known embeddings Emb. 0, Emb. 1, Emb. 2, …). Images: Wikimedia
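A minimal sketch of this pipeline (an illustration; the gallery structure, names, and threshold are assumptions): embed the query, find the nearest gallery embedding, and reject the match if it is not close enough.

```python
import numpy as np

def identify(query_embedding, gallery, threshold=0.6):
    """gallery: dict mapping identity name -> embedding (np.ndarray)."""
    best_name, best_distance = None, np.inf
    for name, embedding in gallery.items():
        distance = np.linalg.norm(query_embedding - embedding)
        if distance < best_distance:
            best_name, best_distance = name, distance
    # Open-set: reject matches that are not close enough.
    return (best_name if best_distance < threshold else "unknown"), best_distance

# Toy example with random 128-d "embeddings".
rng = np.random.default_rng(0)
gallery = {"alice": rng.normal(size=128), "bob": rng.normal(size=128)}
query = gallery["alice"] + 0.01 * rng.normal(size=128)
print(identify(query, gallery, threshold=1.0))
```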
  19. Open-Set Verification (diagram: Test Samples A and B are embedded by the model, and the distance between Embedding A and Embedding B is compared against a threshold). Images: Wikimedia
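Verification is the same comparison applied to two samples. A minimal sketch with `face_recognition` (an illustration, with placeholder file names; `compare_faces` uses a distance tolerance, 0.6 by default):

```python
import face_recognition  # pip install face_recognition

encoding_a = face_recognition.face_encodings(
    face_recognition.load_image_file("sample_a.jpg"))[0]  # placeholder paths
encoding_b = face_recognition.face_encodings(
    face_recognition.load_image_file("sample_b.jpg"))[0]

# True if the embedding distance is below the tolerance threshold.
same_person = face_recognition.compare_faces([encoding_a], encoding_b, tolerance=0.6)[0]
print("Same person?", same_person)
```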
  20. Example: pairwise embedding distances between four face images (Images: Wikimedia)

              Img 1   Img 2   Img 3   Img 4
      Img 1     -      0.31    0.59    0.69
      Img 2    0.31     -      0.52    0.63
      Img 3    0.59    0.52     -      0.50
      Img 4    0.69    0.63    0.50     -
  21. Are we there yet? • LFW (Labeled Faces in the Wild): 99.8%+ accuracy. Ref: Deng et al. (2018); Learned-Miller et al. (2016)