Introduction to Face Processing with Computer Vision

finid
June 28, 2019

Ever wonder how Facebook’s facial recognition or Snapchat’s filters work? Faces are a fundamental piece of photography, and building applications around them has never been easier with open-source libraries and pre-trained models.

In this talk, we’ll help you understand some of the computer vision and machine learning techniques behind these applications. Then, we’ll use this knowledge to develop our own prototypes to tackle tasks such as face detection (e.g. digital cameras), recognition (e.g. Facebook Photos), classification (e.g. identifying emotions), manipulation (e.g. Snapchat filters), and more.

Transcript

  1. None
  2. Big Data & AI Conference, Dallas, Texas, June 27–29, 2019. www.BigDataAIconference.com
  3. Introduction to Face Processing with Computer Vision. Gabriel Bianconi, Founder, Scalar Research
  4. Gabriel Bianconi, Founder, Scalar Research (AI & Data Science Consulting Firm). Previously at the Stanford AI Lab
  5. Face Detection

  6. Haar-Like Features
     • Summarize image based on simple color patterns
     • Manually determined feature extractors (kernels)
     • Leveraged for the first real-time face detector (2001)
     Ref: Viola & Jones (2001). Image: Wikimedia
  7. 7

  8. 8
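The Haar-like features from slide 6 can be illustrated with a minimal sketch. It is not the Viola & Jones detector itself, only the building block: an integral image (summed-area table) makes any rectangle sum a four-lookup operation, and a two-rectangle feature is the difference between two adjacent rectangle sums. The function names and the toy image are assumptions made for illustration.

```python
def integral_image(img):
    """Summed-area table with an extra zero row and column for easy lookups."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of pixels in a rectangle, using four lookups in the integral image."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def haar_two_rect_horizontal(ii, top, left, height, width):
    """Left half minus right half of a window (width is assumed to be even)."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return left_sum - right_sum


# Toy 4x4 grayscale image: bright on the left, dark on the right.
image = [
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
]
ii = integral_image(image)
edge_response = haar_two_rect_horizontal(ii, 0, 0, 4, 4)  # window spans the edge
flat_response = haar_two_rect_horizontal(ii, 0, 0, 4, 2)  # window inside the bright area
print(edge_response, flat_response)
```

The window that straddles the bright/dark boundary gives a large response, while the window inside the uniform region gives zero, which is the kind of simple pattern these hand-designed kernels pick up.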

  9. Histogram of Oriented Gradients (HOG)
     • Summarize image by distribution of color gradients
     • Gradient intensities and orientations represent edges, etc.
     • Captures more information than simple Haar-like features
     Ref: Shu et al. (2011).
  10. Ref: Shu et al. (2011)
  11. Ref: Shu et al. (2011)
  12. Ref: Rojas et al. (2011)
  13. Ref: Rojas et al. (2011)
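As a rough illustration of the HOG idea on slide 9, the sketch below builds an orientation histogram for a single cell: each interior pixel votes for the bin of its gradient orientation, weighted by its gradient magnitude. This is a simplified version under stated assumptions (central differences, 9 unsigned bins over 0–180 degrees, no vote interpolation, no block normalization); the function name and the toy image are hypothetical.

```python
import math


def cell_histogram(img, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations for one cell."""
    h, w = len(img), len(img[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    # Border pixels are skipped so central differences stay inside the image.
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            hist[int(angle // bin_width) % bins] += magnitude
    return hist


# Toy cell containing a vertical edge: the intensity changes only along x.
vertical_edge = [[0, 0, 10, 10] for _ in range(4)]
hist = cell_histogram(vertical_edge)
print(hist)
```

For this toy cell all gradient energy falls into the first bin (orientation near 0 degrees), so the histogram summarizes both that an edge exists and which way it runs.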

  14. R-CNN
      • Introduces CNNs for object detection
      • CNNs learn how to extract features from data
      • Breakthrough in performance
      • Beats previous SOTA methods by a huge margin
      • However, detection is extremely slow
      Ref: Girshick et al. (2014).
  15. CNN Features. Ref: Lee et al. (2009).
  16. CNN Features. Ref: Lee et al. (2009).
  17. CNN Features. Ref: Lee et al. (2009).
  18. CNN Features. Ref: Lee et al. (2009).
  19. R-CNN. Ref: Girshick et al. (2014).

  20. Fast R-CNN
      • Improvement to R-CNN that leverages a CNN for classification and regression
      • Other than proposing regions, the system is now end-to-end vs. three components trained greedily
      • Predictions are 200x+ faster with better performance
      • Region proposals are still a bottleneck; total inference time is ~2s
      Ref: Girshick (2015).
  21. Fast R-CNN. Ref: Girshick (2015).
  22. Faster R-CNN
      • Leverages a CNN for region proposals as well
      • “Region Proposal Network”
      • Finally an end-to-end system with deep learning
      • About 10x faster than Fast R-CNN, with better performance
      • Total inference time is ~0.2s
      Ref: Ren et al. (2016).
  23. Faster R-CNN. Ref: Ren et al. (2016).

  24. MTCNN
      • Many models for face detection draw heavily from the generalized object detection methods.
      • MTCNN, for example, trains a multi-task system for detection and alignment.
      Ref: Zhang et al. (2015).
  25. MTCNN. Ref: Zhang et al. (2015).
  26. DSFD
      • The current SOTA method draws heavily from modern single-shot detection architectures.
      • DSFD extends to a dual-shot detector with enhanced features and loss functions.
      Ref: Li et al. (2018).
  27. DSFD. Ref: Li et al. (2018).
  28. Are we there yet?
      • WIDER Face (Easy): ~96% AP
      • WIDER Face (Medium): ~95% AP
      • WIDER Face (Hard): ~90% AP
      Ref: Yang (2016).
  29. Facial Recognition

  30. Facial Recognition
      • Facial recognition actually corresponds to a group of different tasks.
      • Verification vs. Identification vs. Grouping vs. …
      • Closed-Set vs. Open-Set
  31. Closed-Set Recognition
      • Every identity appears in training set
      • Example: recognizing celebrities
      • Effectively a classification problem
      • Model aims to learn separable features
  32. Closed-Set Identification. Diagram: Test Sample → Model → Label Confidences (Label 0, Label 1, …). Images: Wikimedia
  33. Closed-Set Verification. Diagram: Test Sample A and Test Sample B → Model → Label Confidences for each sample. Images: Wikimedia
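The closed-set diagrams on slides 32 and 33 can be read as a small classification routine. The sketch below is one possible reading, not the method from the talk: it assumes the model outputs one raw score per known identity, converts the scores to confidences with a softmax, and treats two samples as the same person when their top labels agree. The label names, scores, and the agreement rule are all assumptions for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into confidences that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def identify(scores, labels):
    """Closed-set identification: return the most confident known label."""
    confidences = softmax(scores)
    best = max(range(len(labels)), key=lambda i: confidences[i])
    return labels[best], confidences[best]


def verify(scores_a, scores_b, labels):
    """Closed-set verification (assumed rule): both samples get the same top label."""
    return identify(scores_a, labels)[0] == identify(scores_b, labels)[0]


labels = ["label_0", "label_1", "label_2"]  # hypothetical identities
name, confidence = identify([2.0, 0.5, -1.0], labels)
same = verify([2.0, 0.5, -1.0], [1.5, 0.2, 0.1], labels)
different = verify([2.0, 0.5, -1.0], [0.1, 3.0, 0.2], labels)
print(name, round(confidence, 3), same, different)
```

Because every output slot is tied to an identity seen during training, this approach cannot name a person who was not in the training set, which is the limitation that open-set recognition addresses.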
  34. Open-Set Recognition
      • Not every identity appears in training set
      • Example: Facebook Photos
      • Effectively a metric learning problem
      • Model aims to learn large-margin features (embeddings)
  35. Embeddings
      • Map each sample to a vector (coordinate system)
      • Used for words, graphs, faces, etc.
      • Embeddings preserve similarity
      • Similar samples close to each other
      • Dissimilar samples far from each other
  36. Images: Wikimedia
  37. Embeddings
      • “Similar” depends on the training data
      • Same person, physical characteristic, etc.
      • Embeddings represent latent information
      • High-dimensional embeddings trained on large datasets learn to represent latent information about the person (e.g. physical characteristics)
  38. Open-Set Identification. Diagram: Test Sample → Model → Embedding + Distance to known embeddings (Emb. 0, Emb. 1, Emb. 2, …). Images: Wikimedia
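Open-set identification as drawn on slide 38 amounts to a nearest-neighbor search over stored embeddings. The following is a minimal sketch under assumptions: the 2-D vectors stand in for real model embeddings, Euclidean distance is used as the metric, and a rejection threshold (not shown on the slide) is added so that a face far from every stored embedding is reported as unknown. All names and values are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(query, gallery, threshold):
    """Return (name, distance) of the closest stored embedding, or (None, distance)
    when even the closest one is farther away than the threshold."""
    name, dist = min(
        ((n, euclidean(query, emb)) for n, emb in gallery.items()),
        key=lambda pair: pair[1],
    )
    return (name, dist) if dist <= threshold else (None, dist)


# Hypothetical gallery: one stored embedding per enrolled person.
gallery = {
    "person_0": [0.0, 1.0],
    "person_1": [1.0, 0.0],
    "person_2": [-1.0, 0.0],
}
match, match_dist = identify([0.9, 0.1], gallery, threshold=0.5)
unknown, unknown_dist = identify([5.0, 5.0], gallery, threshold=0.5)
print(match, unknown)
```

New people can be enrolled by adding an entry to the gallery, with no retraining of the embedding model, which is what makes the approach open-set.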
  39. Open-Set Verification. Diagram: Test Sample A and Test Sample B → Model → Embedding A and Embedding B → Distance vs. Threshold. Images: Wikimedia
  40. Example (pairwise distance matrix). Images: Wikimedia
        -     0.31  0.59  0.69
        0.31  -     0.52  0.63
        0.59  0.52  -     0.50
        0.69  0.63  0.50  -
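The "distance vs. threshold" step from slide 39 can be applied to the pairwise distances listed on slide 40. The sketch below is illustrative only: the threshold of 0.4 is a hypothetical value picked for the example (in practice it would be tuned on validation pairs), and the resulting verdicts are not claims about which of the slide's images actually show the same person.

```python
# Pairwise embedding distances from slide 40 (diagonal entries are undefined).
distances = [
    [None, 0.31, 0.59, 0.69],
    [0.31, None, 0.52, 0.63],
    [0.59, 0.52, None, 0.50],
    [0.69, 0.63, 0.50, None],
]

THRESHOLD = 0.4  # hypothetical value, not taken from the talk


def same_person(i, j, threshold=THRESHOLD):
    """Open-set verification: accept the pair if its distance is below the threshold."""
    return distances[i][j] < threshold


n = len(distances)
matches = [(i, j) for i in range(n) for j in range(i + 1, n) if same_person(i, j)]
print(matches)
```

Raising the threshold accepts more pairs (fewer missed matches, more false accepts), and lowering it does the opposite, so the chosen value encodes the trade-off the application needs.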
  41. Metric Learning. Ref: Liu et al. (2018)
  42. Are we there yet?
      • LFW (Labeled Faces in the Wild): 99.8%+ accuracy
      Ref: Deng et al. (2018); Learned-Miller et al. (2016)
  43. Thank you. gabriel@scalarresearch.com