3D-CNN for Facial Emotion Recognition in Videos

International Symposium on Visual Computing (ISVC)

Olivier Lézoray

October 06, 2020

Transcript

  1. 3D-CNN for Facial Emotion Recognition in Videos

    Presented by: Jad Haddad
    Authors: Jad Haddad, Olivier Lézoray, Philippe Hamel
    ISVC'20: 15th International Symposium on Visual Computing
  2. Introduction

    Objectives:
    • Develop a neural network architecture that can be applied to videos, where the temporal aspect of the emotion is important
    • Keep the model light in terms of computational cost
    Our approach:
    • A 3D Convolutional Neural Network
  3. 3D Convolutional Networks

    • A 2D convolution produces a 2D output
    • A 3D convolution produces a 3D output
    • The temporal dimension is taken into account during feature extraction
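The difference between the two kinds of convolution can be illustrated with a small pure-Python sketch. It is not the network from the talk: the clip size, the kernels, and the helper names `conv3d_valid` and `shape` are assumptions chosen only to make the output shapes easy to read. A kernel with temporal size 1 processes each frame independently, while a kernel with temporal size 3 mixes neighbouring frames.

```python
def conv3d_valid(volume, kernel):
    """Stride-1, no-padding 3D cross-correlation over (time, height, width)."""
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += volume[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def shape(volume):
    """Return (time, height, width) of a nested-list volume."""
    return (len(volume), len(volume[0]), len(volume[0][0]))


# Hypothetical clip: 5 frames of 4x4 pixels, every pixel of frame t equals t.
clip = [[[float(t)] * 4 for _ in range(4)] for t in range(5)]

# Temporal size 1: a 3x3 box filter applied to each frame separately.
spatial_kernel = [[[1.0] * 3 for _ in range(3)]]
per_frame = conv3d_valid(clip, spatial_kernel)

# Temporal size 3: difference between frame t+2 and frame t (a crude motion cue).
temporal_kernel = [[[-1.0]], [[0.0]], [[1.0]]]
motion = conv3d_valid(clip, temporal_kernel)

print(shape(per_frame))  # time axis untouched, spatial axes shrink
print(shape(motion))     # time axis shrinks because frames are combined
```

With the temporal kernel, each output value depends on several frames at once, which is what lets the feature extractor respond to how a face changes over time rather than to a single still image.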
  4. Full Architecture to Optimize

    Parameter          Options
    Optimization       Lookahead+Adam / Adam
    CLAHE              True / False
    Normalization      True / False
    Activation         ReLU / ELU / pReLU / Leaky ReLU / Mish
    Loss weights       True / False
    Temporal size      1 / 3 / 5 / 7 / 9
    Initializer        Xavier uniform / Xavier normal
    Pooling layer      AvgPooling / MaxPooling
    Second ConvLayer   True / False
    Dropout            [0, 1]

    • Red blocks are to be optimized
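To get a feel for the size of this search space, the options listed on the slide can be written down as a plain Python dictionary. The key names and the variable names are illustrative assumptions, not identifiers from the authors' code; only the option values come from the slide.

```python
from math import prod

# Discrete choices taken from the slide; key names are hypothetical.
search_space = {
    "optimization": ["Lookahead+Adam", "Adam"],
    "clahe": [True, False],
    "normalization": [True, False],
    "activation": ["ReLU", "ELU", "pReLU", "Leaky ReLU", "Mish"],
    "loss_weights": [True, False],
    "temporal_size": [1, 3, 5, 7, 9],
    "initializer": ["Xavier uniform", "Xavier normal"],
    "pooling_layer": ["AvgPooling", "MaxPooling"],
    "second_conv_layer": [True, False],
}

# Dropout is continuous, so it is kept apart from the discrete grid.
dropout_range = (0.0, 1.0)

# Number of distinct discrete configurations, before dropout is considered.
n_discrete = prod(len(options) for options in search_space.values())
print(n_discrete)
```

The discrete part alone yields 3200 combinations, and each one still has a continuous dropout rate to tune, so an exhaustive grid search with cross-validation would be costly.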
  5. Hyper-parameters Optimization

    Tree-structured Parzen Estimator (TPE):
    • For each parameter, fits one Gaussian Mixture Model (GMM) l(x) to the set of parameter values associated with the best objective values
    • and another GMM g(x) to the remaining parameter values
    • Then chooses the parameter value x that maximizes the ratio l(x)/g(x)
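The three steps above can be sketched for a single continuous parameter. This is a simplified illustration, not the optimizer used in the talk: it models l(x) and g(x) as equal-weight Gaussian mixtures with one component per observed value and a fixed bandwidth, and the names `parzen_density` and `tpe_suggest` as well as the values of `gamma`, `bandwidth` and `n_candidates` are assumptions. Production TPE implementations use adaptive bandwidths and priors.

```python
import math
import random


def parzen_density(x, centers, bandwidth):
    """Equal-weight Gaussian mixture with one component per observed value."""
    norm = len(centers) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - c) / bandwidth) ** 2) for c in centers) / norm


def tpe_suggest(trials, gamma=0.25, n_candidates=50, bandwidth=0.1, rng=random):
    """Suggest the next value to try from (x, loss) pairs; lower loss is better."""
    if len(trials) < 2:
        raise ValueError("need at least two trials to split into good and bad")
    ranked = sorted(trials, key=lambda pair: pair[1])
    n_good = min(len(ranked) - 1, max(1, math.ceil(gamma * len(ranked))))
    good = [x for x, _ in ranked[:n_good]]   # values behind the best losses
    bad = [x for x, _ in ranked[n_good:]]    # all remaining values

    # Draw candidates from l(x): pick a good value, then add Gaussian noise.
    candidates = [rng.gauss(rng.choice(good), bandwidth) for _ in range(n_candidates)]

    def ratio(x):
        l = parzen_density(x, good, bandwidth)
        g = parzen_density(x, bad, bandwidth)
        return l / (g + 1e-12)  # small constant avoids division by zero

    return max(candidates, key=ratio)


# Toy objective with its minimum at x = 0.3 (think of x as a dropout rate).
history = [(i / 10, (i / 10 - 0.3) ** 2) for i in range(11)]
suggestion = tpe_suggest(history, rng=random.Random(0))
print(suggestion)
```

On this toy history the suggested value lands near 0.3, because l(x) is concentrated around the best observations while g(x) is spread over the rest of the interval.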
  6. Datasets

    CK+:
    • 593 video sequences from 123 subjects
    • anger, contempt, disgust, fear, happiness, sadness, and surprise
    • Leave-One-Subject-Out (LOSO) and 10-fold cross-validation
    Oulu-CASIA:
    • 480 video sequences from 80 people between 23 and 58 years old
    • anger, disgust, fear, happiness, sadness, and surprise
    • 10-fold cross-validation
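Leave-One-Subject-Out differs from plain k-fold in that all sequences of one person are held out together, so the model is never tested on a face it saw during training. A minimal sketch of such a split follows; the function name `leave_one_subject_out` and the subject labels are made up for illustration and do not come from the datasets.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject."""
    for subject in sorted(set(subject_ids)):
        test = [i for i, s in enumerate(subject_ids) if s == subject]
        train = [i for i, s in enumerate(subject_ids) if s != subject]
        yield subject, train, test


# Hypothetical toy data: six sequences recorded from three subjects.
subjects = ["S01", "S01", "S02", "S03", "S03", "S03"]
folds = list(leave_one_subject_out(subjects))

for held_out, train_idx, test_idx in folds:
    print(held_out, train_idx, test_idx)
```

Applied to CK+, this scheme produces one fold per subject, that is 123 folds, which is more expensive than 10-fold cross-validation but gives a subject-independent estimate.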
  7. Optimized Architectures

    Optimized architecture for CK+
    Optimized architecture for Oulu-CASIA
  8. Optimized Hyper-parameters

    Parameter          CK+              Oulu-CASIA
    Optimization       Adam             Lookahead + Adam
    CLAHE              True             False
    Normalization      False            True
    Activation         ReLU             Leaky ReLU
    Loss weights       False            False
    Temporal size      3                3
    Initializer        Xavier uniform   None
    Pooling layer      AvgPooling       MaxPooling
    Second ConvLayer   False            False
    Dropout            0.2511           0.4233

    Cross-validation results:
    • CK+:
      • Leave-One-Subject-Out: 97.56%
      • 10-fold: 100%
    • Oulu-CASIA:
      • 10-fold: 84.17%
  9. Results Comparison

    10-fold results for Oulu-CASIA
    Approach             Accuracy (%)
    FLT                  74.17
    C3D                  74.38
    FLT+C3D              81.49
    Our approach         84.17
    STC                  84.72
    LSTM (STC-NLSTM)     93.45

    10-fold results for CK+
    Approach             Accuracy (%)
    LBP/Gabor + SRC      98.09
    DBN + MLP            98.57
    CNN                  98.62
    FAN                  99.69
    Our approach         100

    LOSO results for CK+
    Approach             Accuracy (%)
    CNN (AlexNet)        94.4
    DAE (DSAE)           95.79
    Our approach         97.56
  10. Future Objectives

    Multi-modal emotion recognition
    [Diagram: Audio + Video → Fusion → Emotion predicted]