3D-CNN for Facial Emotion Recognition in Videos

International Symposium on Visual Computing (ISVC)

Olivier Lézoray

October 06, 2020

Transcript

  1. 3D-CNN for Facial Emotion Recognition in Videos

    Presented by: Jad Haddad
    Authors: Jad Haddad, Olivier Lézoray, Philippe Hamel
    ISVC'20: 15th International Symposium on Visual Computing
  2. Introduction

    Objectives:
    • Develop a neural network architecture that can be applied to videos, where the temporal aspect of the emotion is important
    • The model should be light in terms of computational power
    Our approach:
    • A 3D Convolutional Neural Network
  3. 3D Convolutional Networks

    • A 2D convolution produces a 2D output
    • A 3D convolution produces a 3D output
    • The temporal dimension is taken into account in feature extraction
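The difference between the two cases can be illustrated with a naive, pure-Python "valid" 3D convolution (cross-correlation, as in most deep learning libraries). This is only a sketch under stated assumptions: the toy clip, the kernel values, and the function names are illustrative and are not taken from the presented network.

```python
def conv3d_valid(volume, kernel):
    """Naive 'valid' 3D cross-correlation over nested lists indexed [time][row][col]."""
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                # The kernel spans several frames, so motion between frames
                # contributes to each output value.
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += volume[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def shape(a):
    """Return (time, height, width) of a nested-list volume."""
    return (len(a), len(a[0]), len(a[0][0]))


# Toy clip (assumption): 5 frames of 6x6 pixels, brightness equal to the frame index.
clip = [[[float(t)] * 6 for _ in range(6)] for t in range(5)]

# Toy kernel (assumption): temporal size 3, spatial size 3x3.
# It subtracts the mean of the first frame from the mean of the last frame.
kernel = [
    [[-1.0 / 9] * 3 for _ in range(3)],
    [[0.0] * 3 for _ in range(3)],
    [[1.0 / 9] * 3 for _ in range(3)],
]

features = conv3d_valid(clip, kernel)
print(shape(features))  # (3, 4, 4): the output keeps a temporal axis
```

Because the clip brightens by one unit per frame, every output value is approximately 2, the brightness change across the three frames covered by the kernel. A kernel with temporal size 1 would treat each frame independently, which is the 2D case.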
  4. Full Architecture to Optimize

    Parameter           Options
    Optimization        Lookahead+Adam / Adam
    CLAHE               True / False
    Normalization       True / False
    Activation          ReLU / ELU / pReLU / Leaky ReLU / Mish
    Loss weights        True / False
    Temporal size       1 / 3 / 5 / 7 / 9
    Initializer         Xavier uniform / Xavier normal
    Pooling layer       AvgPooling / MaxPooling
    Second ConvLayer    True / False
    Dropout             [0, 1]

    • Red blocks are to be optimized
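To get a feel for the size of this search space, the discrete options listed on the slide can be multiplied out. The sketch below uses illustrative dictionary keys (an assumption, not names from the authors' code) and leaves out dropout, which is continuous.

```python
from math import prod

# Discrete options as listed on the slide; the key names are illustrative.
search_space = {
    "optimization": ["Lookahead+Adam", "Adam"],
    "clahe": [True, False],
    "normalization": [True, False],
    "activation": ["ReLU", "ELU", "pReLU", "Leaky ReLU", "Mish"],
    "loss_weights": [True, False],
    "temporal_size": [1, 3, 5, 7, 9],
    "initializer": ["Xavier uniform", "Xavier normal"],
    "pooling_layer": ["AvgPooling", "MaxPooling"],
    "second_conv_layer": [True, False],
}
dropout_range = (0.0, 1.0)  # continuous, so not part of the count below

n_discrete = prod(len(options) for options in search_space.values())
print(n_discrete)  # 3200
```

With 3200 discrete combinations on top of a continuous dropout rate, an exhaustive grid search with cross-validation would be costly, which is one reason to use a guided search method.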
  5. Hyper-parameters Optimization

    Tree-structured Parzen Estimator (TPE):
    • For each parameter, it fits one Gaussian Mixture Model (GMM) l(x) to the set of parameter values associated with the best objective values
    • It fits another GMM g(x) to the remaining parameter values
    • It then chooses the parameter value x that maximizes the ratio l(x)/g(x)
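The selection rule above can be sketched for a single continuous parameter such as dropout. This is a simplified illustration, not the authors' setup or any library's implementation: it assumes equal-weight Gaussian components with a fixed bandwidth, a `gamma` fraction of trials counted as "best", and a made-up `toy_loss` objective standing in for the real cross-validation loss.

```python
import math
import random


def parzen_density(x, centers, bandwidth):
    """Equal-weight Gaussian mixture with one component per observed value."""
    norm = len(centers) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - c) / bandwidth) ** 2) for c in centers) / norm


def tpe_suggest(history, rng, gamma=0.25, n_candidates=50, bandwidth=0.1):
    """Suggest the next value in [0, 1] from a history of (value, loss) pairs."""
    if len(history) < 2:
        raise ValueError("need at least two observations")
    ranked = sorted(history, key=lambda pair: pair[1])  # lower loss is better
    n_good = min(len(ranked) - 1, max(1, math.ceil(gamma * len(ranked))))
    good = [value for value, _ in ranked[:n_good]]   # values used to fit l(x)
    rest = [value for value, _ in ranked[n_good:]]   # values used to fit g(x)

    def ratio(x):
        # Small constant guards against division by zero far from all 'rest' values.
        return parzen_density(x, good, bandwidth) / (parzen_density(x, rest, bandwidth) + 1e-12)

    # Draw candidates around the good values (clipped to [0, 1]),
    # then keep the one with the highest l(x)/g(x).
    candidates = [
        min(1.0, max(0.0, rng.gauss(rng.choice(good), bandwidth)))
        for _ in range(n_candidates)
    ]
    return max(candidates, key=ratio)


def toy_loss(dropout):
    """Hypothetical objective with its minimum at dropout = 0.3."""
    return (dropout - 0.3) ** 2


rng = random.Random(0)
history = [(d, toy_loss(d)) for d in (rng.random() for _ in range(20))]
suggestion = tpe_suggest(history, rng)
print(f"next dropout to try: {suggestion:.3f}")
```

In a full optimization loop, the suggested value would be evaluated, appended to `history`, and the two densities refitted before the next suggestion.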
  6. Datasets

    CK+:
    • 593 video sequences from 123 subjects
    • Emotions: anger, contempt, disgust, fear, happiness, sadness, and surprise
    • Evaluation: Leave-One-Subject-Out (LOSO) and 10-fold cross-validation
    Oulu-CASIA:
    • 480 video sequences from 80 people between 23 and 58 years old
    • Emotions: anger, disgust, fear, happiness, sadness, and surprise
    • Evaluation: 10-fold cross-validation
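Leave-One-Subject-Out evaluation holds out every sequence of one subject per fold, so the model is never tested on a face it saw during training. A minimal sketch of the split logic follows; the function name and the toy subject labels are assumptions for illustration.

```python
from collections import defaultdict


def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject."""
    groups = defaultdict(list)
    for index, subject in enumerate(subject_ids):
        groups[subject].append(index)
    for subject in sorted(groups):
        test_indices = groups[subject]
        held_out = set(test_indices)
        train_indices = [i for i in range(len(subject_ids)) if i not in held_out]
        yield subject, train_indices, test_indices


# Toy example: six sequences recorded from three subjects.
subject_ids = ["s1", "s1", "s2", "s3", "s3", "s3"]
folds = list(leave_one_subject_out(subject_ids))
for subject, train_indices, test_indices in folds:
    print(subject, train_indices, test_indices)
```

By contrast, a plain 10-fold split over sequences can place the same subject in both the training and the test partition, which is worth keeping in mind when comparing LOSO and 10-fold accuracies.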
  7. Optimized Architectures

    [Figure] Optimized architecture for CK+
    [Figure] Optimized architecture for Oulu-CASIA
  8. Optimized Hyper-parameters

    Parameter           CK+               Oulu-CASIA
    Optimization        Adam              Lookahead + Adam
    CLAHE               True              False
    Normalization       False             True
    Activation          ReLU              Leaky ReLU
    Loss weights        False             False
    Temporal size       3                 3
    Initializer         Xavier uniform    None
    Pooling layer       AvgPooling        MaxPooling
    Second ConvLayer    False             False
    Dropout             0.2511            0.4233

    Cross-validation results:
    • CK+:
      • Leave-One-Subject-Out: 97.56%
      • 10-fold: 100%
    • Oulu-CASIA:
      • 10-fold: 84.17%
  9. Results Comparison

    10-fold results for Oulu-CASIA
    Approach              Accuracy (%)
    FLT                   74.17
    C3D                   74.38
    FLT+C3D               81.49
    Our approach          84.17
    STC                   84.72
    LSTM (STC-NLSTM)      93.45

    10-fold results for CK+
    Approach              Accuracy (%)
    LBP/Gabor + SRC       98.09
    DBN + MLP             98.57
    CNN                   98.62
    FAN                   99.69
    Our approach          100

    LOSO results for CK+
    Approach              Accuracy (%)
    CNN (AlexNet)         94.4
    DAE (DSAE)            95.79
    Our approach          97.56
  10. Future Objectives

    Multi-modal emotion recognition:
    [Figure] Audio + Video → Fusion → Emotion predicted
  11. Thank you for your attention