[kaggle-cv] Gaussian Vector: An Efficient Solution for Facial Landmark Detection

phalanx
October 08, 2020

Transcript

  1. 2020/10/10 @phalanx
    journal part
    Gaussian Vector: An Efficient Solution for Facial Landmark Detection

  2. Journal Information
    Title: Gaussian Vector: An Efficient Solution for Facial Landmark Detection
    Authors: Yilin Xiong, Zijian Zhou, Yuhao Dou, Zhizhong Su (Horizon Robotics)
    arXiv: https://arxiv.org/abs/2010.01318
    Submitted: 2020/10/03

  3. Facial Landmark Detection
    - Facial landmark detection is a fundamental step in many face applications
    - face recognition, face tracking, face editing
    - Challenges
    - blur, overlap, occlusion, illumination
    - large head pose variation
    “Aggregation via Separation: Boosting Facial Landmark Detector with Semi-Supervised Style Translation”, in Proc. of ICCV, 2019

  4. Coordinate regression method
    - Fully connected layers predict facial landmark coordinates
    - Performance degrades because spatial information is lost
    “Facial Landmark Detection by Deep Multi-task Learning”, in Proc. of ECCV, Springer, 2014, pp. 90-108

  5. Heatmap based method
    - Predict a heatmap in which each pixel gives the probability that it is a landmark
    - By utilizing spatial information, it boosts performance compared with the coordinate regression method
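
For comparison with the vector-based approach later in the deck, here is a minimal sketch of a heatmap target and its decoding. It assumes a 2D Gaussian label (σ = 1.5 is an arbitrary illustrative value) and plain argmax decoding; the function names are hypothetical, not taken from any of the cited papers.

```python
import numpy as np


def gaussian_heatmap(h, w, x, y, sigma=1.5):
    """2D Gaussian target of shape (h, w) whose peak (value 1.0) is at landmark (x, y)."""
    ys, xs = np.mgrid[0:h, 0:w]  # row (y) and column (x) index grids
    return np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))


def decode_heatmap(heatmap):
    """Return the (x, y) of the highest-scoring pixel; this scans all h * w pixels."""
    row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return int(col), int(row)


hm = gaussian_heatmap(64, 64, x=20, y=35)
print(hm.shape, decode_heatmap(hm))  # (64, 64) (20, 35)
```

Note that both label generation and decoding touch every one of the h * w pixels per landmark, which is the cost the later slides target.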

  6. Heatmap based method
    - Stacked Hourglass
    - supervised transformation to remove translation
    - stacks multiple hourglass modules to extract multi-scale discriminative features
    Jing Yang, Qingshan Liu, Kaihua Zhang, “Stacked Hourglass Network for Robust Facial Landmark Localisation”, in Proc. of CVPR, 2017

  7. Heatmap based method
    - Wing Loss
    - pays more attention to small and medium range errors
    - switches from L1 loss to a modified logarithm function for those errors
    Zhen-Hua Feng, et al., “Wing Loss for Robust Facial Landmark Localization with Convolutional Neural Networks”, in Proc. of CVPR, 2018
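
The piecewise form of the Wing loss can be sketched as follows. The values w = 10 and ε = 2 are the defaults commonly cited from Feng et al.; this NumPy version is an illustration, not the authors' implementation.

```python
import numpy as np


def wing_loss(pred, target, w=10.0, eps=2.0):
    """Mean Wing loss over all coordinate errors.

    For |error| < w the loss is w * ln(1 + |error| / eps), which amplifies
    small and medium errors; beyond w it is L1 shifted by the constant c,
    chosen so that the two pieces meet continuously at |error| = w.
    """
    x = np.abs(np.asarray(pred, dtype=float) - np.asarray(target, dtype=float))
    c = w - w * np.log1p(w / eps)
    loss = np.where(x < w, w * np.log1p(x / eps), x - c)
    return float(loss.mean())


# Predicted vs. ground-truth (x, y) coordinates for two landmarks
print(wing_loss([[10.5, 20.0], [31.0, 40.0]], [[12.0, 20.0], [30.0, 55.0]]))
```

The log branch has a steeper slope than L1 near zero, which is how the loss "pays more attention" to small errors.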

  8. Heatmap based method
    - Adaptive Wing Loss
    - adapts the loss curvature to the ground-truth pixel values
    - focuses on foreground pixels and hard background pixels
    Xinyao Wang, Liefeng Bo, Fuxin Li, “Adaptive Wing Loss for Robust Face Alignment via Heatmap Regression”, in Proc. of ICCV, 2019
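
A NumPy sketch of the Adaptive Wing loss as described by Wang et al. It uses the hyperparameters commonly reported for it (ω = 14, θ = 0.5, ε = 1, α = 2.1). Treat it as an illustration rather than the reference implementation. Here θ is the AWing threshold, unrelated to the θ of the Gaussian vector label.

```python
import numpy as np


def adaptive_wing_loss(pred, target, omega=14.0, theta=0.5, eps=1.0, alpha=2.1):
    """Mean Adaptive Wing loss over heatmap pixels (target values in [0, 1]).

    The exponent p = alpha - target varies per pixel (about 1.1 on foreground,
    2.1 on background), which adapts the loss curvature to the ground truth.
    A and C make the log branch and the linear branch meet smoothly at theta.
    """
    y_hat = np.asarray(pred, dtype=float)
    y = np.asarray(target, dtype=float)
    diff = np.abs(y - y_hat)
    p = alpha - y
    r = theta / eps
    a = omega * (1.0 / (1.0 + r ** p)) * p * r ** (p - 1.0) / eps
    c = theta * a - omega * np.log1p(r ** p)
    loss = np.where(diff < theta, omega * np.log1p((diff / eps) ** p), a * diff - c)
    return float(loss.mean())


target = np.zeros((4, 4))
target[1, 2] = 1.0  # one foreground pixel
pred = np.full((4, 4), 0.1)
print(adaptive_wing_loss(pred, target))
```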

  9. Heatmap based method
    - Large output heatmaps and complicated post-processing place a heavy burden of data transmission and computation on embedded systems
    - Foreground-background imbalance problem
    - Spatial information is not fully used, causing detection errors
    - Imperfect face detection results leave some facial landmarks outside the bbox

  10. Proposed method
    - Gaussian Vector
    - encodes each facial landmark into vectors used as supervision
    - accelerates label preparation and alleviates the extreme foreground-background imbalance problem
    - reduces system complexity, including post-processing
    - Band Pooling Module
    - converts the h x w output heatmap into h x 1 and w x 1 vectors as the prediction
    - takes more spatial information into account, yet outputs a smaller tensor
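
To make the size argument concrete, here is a back-of-the-envelope comparison. It assumes a hypothetical 64 x 64 output resolution and WFLW's 98 landmarks; the slides do not state the output resolution.

```python
def output_elements(num_landmarks, h, w):
    """Element counts: C heatmaps of h x w vs. C pairs of vectors (h + w elements)."""
    heatmap = num_landmarks * h * w
    vectors = num_landmarks * (h + w)
    return heatmap, vectors


# Hypothetical 64 x 64 output, 98 landmarks (WFLW)
print(output_elements(98, 64, 64))  # (401408, 12544)
```

At this resolution, each landmark needs 64 * 64 / (64 + 64) = 32 times fewer output elements.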

  11. Gaussian Vector Label
    - calculate the Euclidean distance between each pixel and the landmark
    - transform the distance vector into the vector label
    - σ: standard deviation
    - θ: positive constant to reinforce the distribution peak
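
The slide's exact label formula is not reproduced in this transcript. The sketch below is one plausible reading under stated assumptions, not the paper's equation. Along each axis, the distance reduces to |i - x|. Each element holds θ * exp(-d² / (2σ²)), so θ scales up the peak. The names, σ, and θ values are hypothetical.

```python
import numpy as np


def gaussian_vector_labels(x, y, h, w, sigma=2.0, theta=1.0):
    """Encode one landmark (x, y) as two 1D labels of length w and h.

    ASSUMPTION: element i holds theta * exp(-d_i^2 / (2 sigma^2)), where d_i is
    the distance from index i to the landmark coordinate along that axis.
    The paper's exact formula may differ.
    """
    d_x = np.arange(w) - x  # distance along the x axis
    d_y = np.arange(h) - y  # distance along the y axis
    v_x = theta * np.exp(-d_x ** 2 / (2 * sigma ** 2))
    v_y = theta * np.exp(-d_y ** 2 / (2 * sigma ** 2))
    return v_x, v_y


v_x, v_y = gaussian_vector_labels(x=20, y=35, h=64, w=64, sigma=2.0, theta=5.0)
print(int(np.argmax(v_x)), int(np.argmax(v_y)))  # 20 35
```

Whatever the exact form, the label has only h + w entries per landmark instead of h * w. This is where the faster label preparation comes from.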

  12. Band Pooling Module (BPM)
    - consists of a vertical band [L, w] and a horizontal band [L, h]
    - L: adjustable bandwidth, which controls the receptive field size of the vector elements
    - L is a small odd number, chosen from 1 to 7
    - each band slides over the heatmap and averages the values it covers
    - converting the heatmap into vectors reduces post-processing complexity: O(N^2) -> O(N)
    - output: [N, C, w/h, 2]
    - C: number of facial landmarks
    - Aggregated BPM
    - fuses the vectors generated with different bandwidths
    - this variant is used in the experiments
    - bandwidths: 3 and 5
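
A minimal NumPy sketch of band pooling on a single heatmap, followed by argmax decoding on the resulting vectors. The slides do not specify two details, so the sketch makes assumptions for both. First, out-of-range rows and columns are zero-padded. Second, the aggregated variant fuses bandwidths 3 and 5 by simple averaging. The function names are hypothetical.

```python
import numpy as np


def band_pool(heatmap, L):
    """Band pooling of one (h, w) heatmap with an odd bandwidth L.

    v_x[j] is the mean of the band covering columns j - L//2 .. j + L//2 (all rows);
    v_y[i] is the mean of the band covering rows i - L//2 .. i + L//2 (all columns).
    ASSUMPTION: positions outside the heatmap count as zeros (zero padding).
    """
    if L % 2 != 1:
        raise ValueError("bandwidth L must be odd")
    h, w = heatmap.shape
    pad = L // 2
    box = np.ones(L)
    col_sums = np.pad(heatmap.sum(axis=0), pad)  # length w + 2 * pad
    row_sums = np.pad(heatmap.sum(axis=1), pad)  # length h + 2 * pad
    v_x = np.convolve(col_sums, box, mode="valid") / (L * h)  # length w
    v_y = np.convolve(row_sums, box, mode="valid") / (L * w)  # length h
    return v_x, v_y


def aggregated_band_pool(heatmap, bandwidths=(3, 5)):
    """Fuse vectors from several bandwidths. ASSUMPTION: fusion by averaging."""
    pooled = [band_pool(heatmap, L) for L in bandwidths]
    v_x = np.mean([p[0] for p in pooled], axis=0)
    v_y = np.mean([p[1] for p in pooled], axis=0)
    return v_x, v_y


def decode_vectors(v_x, v_y):
    """Landmark (x, y) as the argmax of each vector: O(h + w) instead of O(h * w)."""
    return int(np.argmax(v_x)), int(np.argmax(v_y))


# Synthetic 64 x 64 heatmap with one peak at (x=20, y=35)
ys, xs = np.mgrid[0:64, 0:64]
heatmap = np.exp(-((xs - 20) ** 2 + (ys - 35) ** 2) / 8.0)
v_x, v_y = aggregated_band_pool(heatmap)
print(decode_vectors(v_x, v_y))  # (20, 35)
```

With L = 1, this reduces to plain row and column means. Larger L widens the receptive field of each vector element. When h == w, stacking v_x and v_y for every landmark gives the [N, C, w/h, 2] output listed above.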

  13. Beyond Box Strategy
    - strategy to predict landmarks located outside the bbox
    - landmark location
    - inside: the maximum of the predicted vector lies in the middle of the vector
    - outside: the maximum is close to one of the endpoints
    - when a landmark is outside, the predicted vectors are assumed to obey a distribution with the following parameters
    - Γ: scale parameter
    - d: distance between s and the peak

  14. Beyond Box Strategy
    - From this assumption, the revised landmark location can be obtained

  15. Experiments: Datasets
    300W (ICCV’13)
    - benchmark challenge dataset for ICCV 2013
    - training data includes the LFPW, AFW, HELEN, and IBUG datasets
    - re-annotated with semi-supervised learning
    - 3837 images; many previous works use 3148 images for training and 689 for validation
    - test data is newly collected (300 indoor and 300 outdoor)
    - 68 landmark points

  16. Experiments: Datasets
    COFW (Caltech Occluded Faces in the Wild) (ICCV’13)
    - large variations in shape and occlusion
    - train data: 1354 images
    - test data: 507 images
    - 29 landmark points consistent with LFPW

  17. Experiments: Datasets
    WFLW (Wider Facial Landmarks in-the-wild) (CVPR’18)
    - includes extreme disturbances
    - train data: 7500 images
    - test data: 2500 images
    - 98 landmark points

  18. Experiments: Datasets
    JD-landmark (ICME’19)
    - benchmark challenge dataset for ICME 2019
    - train data: 11393 images
    - val/test data: 2000 images
    - 106 landmark points

  19. Experiments: Evaluation Metrics
    - Normalized Mean Error (NME)
    - averages the Euclidean distance between predicted and ground-truth landmarks
    - normalized to eliminate the impact of inconsistent image sizes
    - d: normalization factor
    - Failure Rate (FR)
    - percentage of failure samples, i.e. samples whose NME is larger than a threshold
    - Area Under Curve (AUC)
    - area under the cumulative error distribution (CED) curve
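
A minimal NumPy sketch of the three metrics, with NME = (1/K) Σ_k ||p_k - g_k||₂ / d for a face with K landmarks. It assumes landmark arrays of shape (K, 2) and a caller-supplied d; which distance d represents (e.g. inter-ocular) depends on the benchmark. The threshold of 0.1 is a common choice that the slide does not state. The function names are hypothetical.

```python
import numpy as np


def nme(pred, gt, d):
    """Normalized mean error of one face.

    pred, gt: (K, 2) landmark arrays; d: normalization factor for this face.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    per_point = np.linalg.norm(pred - gt, axis=1)  # Euclidean error per landmark
    return float(per_point.mean() / d)


def failure_rate(nmes, threshold=0.1):
    """Fraction of samples whose NME is larger than the threshold."""
    return float((np.asarray(nmes, dtype=float) > threshold).mean())


def auc(nmes, threshold=0.1, steps=1000):
    """Area under the CED curve on [0, threshold], normalized to [0, 1]."""
    nmes = np.asarray(nmes, dtype=float)
    xs = np.linspace(0.0, threshold, steps + 1)
    ced = np.array([(nmes <= t).mean() for t in xs])  # fraction of samples with NME <= t
    # Trapezoidal rule written out (np.trapz is deprecated in favor of np.trapezoid in NumPy 2.0)
    area = np.sum((ced[1:] + ced[:-1]) / 2.0 * np.diff(xs))
    return float(area / threshold)


errors = [0.02, 0.05, 0.08, 0.15]
print(failure_rate(errors), auc(errors))
```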

  20. Experiments: Implementation Setup
    - For the face bbox, extend the short side to match the long side
    - enlarge the bbox by 25% for WFLW and 10% for COFW
    - crop the image and resize it to 256x256
    - augmentation
    - random rotation/scaling/occlusion, horizontal flip
    - Adam optimizer
    - backbone: ResNet50, HRNet
    - 4x TITAN X GPUs
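
The bbox normalization can be sketched as follows. The sketch assumes boxes are (x0, y0, x1, y1) in pixels and that the enlargement is applied to the squared box around its center. `scale` is a ratio: 0.25 for WFLW, 0.10 for COFW. The function names are hypothetical. The actual crop and resize would be done with an image library; here only the landmark coordinates are mapped into the 256x256 crop.

```python
def square_and_enlarge(x0, y0, x1, y1, scale):
    """Extend the short side to the long side around the box center, then enlarge.

    scale: enlargement ratio, e.g. 0.25 for +25%. Returns (x0, y0, x1, y1).
    """
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    side = max(x1 - x0, y1 - y0) * (1.0 + scale)
    half = side / 2.0
    return cx - half, cy - half, cx + half, cy + half


def to_crop_coords(points, box, out_size=256):
    """Map (x, y) landmarks from image pixels into the resized out_size x out_size crop."""
    x0, y0, x1, _ = box
    s = out_size / (x1 - x0)  # the box is square, so one scale factor suffices
    return [((x - x0) * s, (y - y0) * s) for x, y in points]


box = square_and_enlarge(100, 50, 200, 250, scale=0.25)
print(box)  # (25.0, 25.0, 275.0, 275.0)
print(to_crop_coords([(150.0, 150.0)], box))
```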

  21. Main Result

  22. JD-landmark challenge
    - A single model achieves results comparable to the champion
    - Baidu VIS
    - AutoML for architecture search
    - ensemble model
    - well-designed augmentation

  23. Effectiveness of BPM/BBS
    - dataset: 300W
    - Table 5
    - left: heatmap-based method vs. the proposed method
    - right: GT bbox shrunk by 10% in each dimension, compared with and without BBS

  24. Time cost analysis
    - MobileNetV3 is used because of the constraints of the practical system
    - the proposed method saves 26% of the time of the whole process

  25. Thank you!
    Question?
