[kaggle-cv] Gaussian Vector: An Efficient Solution for Facial Landmark Detection

phalanx
October 08, 2020

  1. 2020/10/10 @phalanx, journal part: Gaussian Vector: An Efficient Solution for Facial Landmark Detection
  2. Journal Information
     - Title: Gaussian Vector: An Efficient Solution for Facial Landmark Detection
     - Authors: Yilin Xiong, Zijian Zhou, Yuhao Dou, Zhizhong Su (Horizon Robotics)
     - arXiv: https://arxiv.org/abs/2010.01318
     - Submitted: 2020/10/03
  3. Facial Landmark Detection
     - a fundamental step in many face applications: face recognition, face tracking, face editing
     - challenges: blur, overlap, occlusion, illumination, large head pose variation
     - Reference: "Aggregation via Separation: Boosting Facial Landmark Detector with Semi-Supervised Style Translation", in Proc. of ICCV, 2019
  4. Coordinate regression method
     - fully connected layers predict the facial landmark coordinates directly
     - performance degrades because spatial information is lost
     - Reference: "Facial Landmark Detection by Deep Multi-task Learning", in European Conference on Computer Vision, Springer (2014) 90-108
  5. Heatmap based method
     - predict a heatmap in which each pixel gives the probability that it is the landmark
     - by exploiting spatial information, it boosts performance compared with the coordinate regression method
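As a minimal sketch of the heatmap idea (not code from the slides), the snippet below assumes a 2D Gaussian target centered on the landmark and plain argmax decoding with no sub-pixel refinement. The names `gaussian_heatmap` and `decode_argmax` and the value `sigma=1.5` are illustrative choices. Note that decoding has to scan all h×w pixels.

```python
import math


def gaussian_heatmap(h, w, cx, cy, sigma=1.5):
    """Return an h x w heatmap (list of rows) with a 2D Gaussian peak of 1.0 at (cx, cy)."""
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2)) for x in range(w)]
        for y in range(h)
    ]


def decode_argmax(heatmap):
    """Return the (x, y) pixel with the highest score by scanning every pixel."""
    best_val, best_xy = -math.inf, (0, 0)
    for y, row in enumerate(heatmap):
        for x, v in enumerate(row):
            if v > best_val:
                best_val, best_xy = v, (x, y)
    return best_xy


hm = gaussian_heatmap(64, 64, cx=20, cy=41)
print(decode_argmax(hm))  # (20, 41)
```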
  6. Heatmap based method
     - Stacked Hourglass
       - supervised face transformation to remove translation
       - stacks multiple hourglass modules to extract multi-scale discriminative features
     - Jing Yang, Qingshan Liu, Kaihua Zhang, "Stacked Hourglass Network for Robust Facial Landmark Localisation", in Proc. of CVPR, 2017
  7. Heatmap based method
     - Wing Loss
       - pays more attention to small and medium range errors
       - replaces the L1 loss with a modified logarithm function for small errors (it stays linear for large errors)
     - Zhen-Hua Feng, et al., "Wing Loss for Robust Facial Landmark Localization with Convolutional Neural Networks", in Proc. of CVPR, 2018
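The piecewise form of Wing loss from Feng et al. (CVPR 2018) can be sketched per residual as follows: wing(x) = w·ln(1 + |x|/ε) for |x| < w, and |x| − C otherwise, with C = w − w·ln(1 + w/ε). Here w sets the width of the logarithmic region, ε its curvature, and C makes the two pieces meet at |x| = w. The values w=10 and ε=2 are illustrative, and `mean_wing_loss` is a hypothetical helper.

```python
import math


def wing_loss(x, w=10.0, eps=2.0):
    """Wing loss of a single residual x (prediction minus target)."""
    ax = abs(x)
    if ax < w:
        # logarithmic region: larger gradients for small and medium errors
        return w * math.log(1.0 + ax / eps)
    # linear (L1-like) region; C keeps the loss continuous at |x| = w
    c = w - w * math.log(1.0 + w / eps)
    return ax - c


def mean_wing_loss(pred, target, w=10.0, eps=2.0):
    """Average Wing loss over flattened coordinate residuals."""
    return sum(wing_loss(p - t, w, eps) for p, t in zip(pred, target)) / len(pred)


print(wing_loss(0.0))  # 0.0
```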
  8. Heatmap based method
     - Adaptive Wing Loss
       - adapts the loss curvature to the ground-truth pixel values
       - can therefore focus on foreground and hard background pixels
     - Xinyao Wang, Liefeng Bo, Fuxin Li, "Adaptive Wing Loss for Robust Face Alignment via Heatmap Regression", in Proc. of ICCV, 2019
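Below is a per-pixel sketch of Adaptive Wing loss as defined by Wang et al. (ICCV 2019). Here y is the ground-truth heatmap value in [0, 1] and y_hat is the prediction. The exponent α − y depends on y, so the curvature differs between foreground pixels (y near 1) and background pixels (y near 0). The defaults α=2.1, ω=14, ε=1, θ=0.5 are the commonly used settings and should be treated as tunable. This θ is AWing's switching threshold, not the θ of the Gaussian Vector label. `mean_adaptive_wing` is a hypothetical helper.

```python
import math


def adaptive_wing(y, y_hat, alpha=2.1, omega=14.0, eps=1.0, theta=0.5):
    """Adaptive Wing loss for one pixel: y is the ground-truth value, y_hat the prediction."""
    diff = abs(y - y_hat)
    p = alpha - y  # exponent adapts to the ground-truth pixel value
    if diff < theta:
        return omega * math.log(1.0 + (diff / eps) ** p)
    # linear region; A and C make value and slope continuous at diff == theta
    r = theta / eps
    a = omega * (1.0 / (1.0 + r ** p)) * p * (r ** (p - 1.0)) * (1.0 / eps)
    c = theta * a - omega * math.log(1.0 + r ** p)
    return a * diff - c


def mean_adaptive_wing(gt, pred, **kw):
    """Average the per-pixel loss over flattened heatmaps."""
    return sum(adaptive_wing(y, yh, **kw) for y, yh in zip(gt, pred)) / len(gt)
```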
  9. Heatmap based method: problems
     - large output heatmaps and complicated post-processing put a heavy data-transmission and computation burden on embedded systems
     - foreground-background imbalance
     - spatial information is not fully used, causing detection errors
     - imperfect face detection leaves some facial landmarks outside the bbox
  10. Proposed method
     - Gaussian Vector
       - encodes each facial landmark into vectors used as supervision
       - speeds up label preparation and mitigates the extreme foreground-background imbalance
       - reduces system complexity, including post-processing
     - Band Pooling Module
       - converts the h×w output heatmap into h×1 and w×1 vectors as the prediction
       - takes more spatial information into account, yet outputs a smaller tensor
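A quick size comparison shows why the vector output is lighter. For illustration, assume 98 landmarks (as in WFLW) and a 64×64 output map, i.e. a 256×256 input downsampled 4×. The slides do not state the map size. C heatmaps hold C·h·w values, whereas one h-vector and one w-vector per landmark hold C·(h + w).

```python
def output_elements(c, h, w):
    """Return (#values in C heatmaps of h x w, #values in C pairs of h- and w-vectors)."""
    heatmap_values = c * h * w
    vector_values = c * (h + w)
    return heatmap_values, vector_values


hm_vals, vec_vals = output_elements(98, 64, 64)
print(hm_vals, vec_vals, hm_vals / vec_vals)  # 401408 12544 32.0
```

For a square h×h map the ratio is h/2, so the saving grows with output resolution.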
  11. Gaussian Vector Label
     - calculate the Euclidean distance between each pixel and the landmark
     - transform the distance vector into the vector label
     - σ: standard deviation
     - θ: positive constant to reinforce the distribution peak
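Here is a minimal sketch of building Gaussian Vector labels under several assumptions. Each axis is encoded separately, element i's distance to the landmark coordinate is |i − x|, and that distance is mapped through exp(−d²/(2σ²)). The slide does not give θ's exact role. Here it is read, hypothetically, as a constant added to the element nearest the landmark to sharpen the peak; theta=0 gives a plain Gaussian. The function names are illustrative. Only h + w values are computed per landmark instead of h·w, which is consistent with the faster label preparation claimed for the method.

```python
import math


def gaussian_vector(length, center, sigma=2.0, theta=0.0):
    """1D Gaussian label of `length` elements peaked at (sub-pixel) `center`.

    theta is a HYPOTHETICAL reading of the slide's θ: a positive constant added
    to the element nearest the landmark to reinforce the peak.
    """
    peak = round(center)
    vec = []
    for i in range(length):
        d = abs(i - center)  # distance between element i and the landmark
        v = math.exp(-d * d / (2 * sigma * sigma))
        if i == peak:
            v += theta
        vec.append(v)
    return vec


def gaussian_vector_label(x, y, w, h, sigma=2.0, theta=0.0):
    """Return (x_vector of length w, y_vector of length h) for a landmark at (x, y)."""
    return gaussian_vector(w, x, sigma, theta), gaussian_vector(h, y, sigma, theta)


vx, vy = gaussian_vector_label(10.0, 30.0, w=64, h=64)
```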
  12. Band Pooling Module (BPM)
     - consists of two bands: a vertical band [L, w] and a horizontal band [L, h]
       - L: adjustable bandwidth that controls the receptive-field size of each vector element; a small odd number chosen from 1 to 7
       - each band slides over the heatmap and averages the values it covers
     - converting the heatmap into vectors reduces post-processing complexity from O(N^2) to O(N) for an N×N heatmap
     - output: [N, C, w/h, 2], where C is the number of facial landmarks
     - Aggregated BPM: fuses the vectors generated with different bandwidths; this variant is used in the experiments, with bandwidths 3 and 5
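One way to realize BPM in plain Python is sketched below; it is not the authors' implementation. A band L rows tall and spanning the full width slides down the heatmap and averages what it covers, giving an h-vector. An L-column band spanning the full height gives a w-vector. The sketch assumes stride 1 and clips the band at the borders, so each vector keeps length h or w. It also assumes aggregated BPM fuses bandwidths 3 and 5 by element-wise averaging, since the slide does not specify the fusion operation. Decoding is then one argmax per vector, O(h + w) instead of O(h·w).

```python
import math


def _sliding_mean(values, bandwidth):
    """Mean over a centred window of `bandwidth` elements, clipped at the ends."""
    r = bandwidth // 2
    out = []
    for i in range(len(values)):
        window = values[max(0, i - r): i + r + 1]
        out.append(sum(window) / len(window))
    return out


def band_pool(heatmap, bandwidth):
    """Band pooling of an h x w heatmap into (x_vector of length w, y_vector of length h)."""
    if bandwidth % 2 == 0:
        raise ValueError("bandwidth L is assumed to be odd")
    h, w = len(heatmap), len(heatmap[0])
    row_means = [sum(row) / w for row in heatmap]
    col_means = [sum(heatmap[y][x] for y in range(h)) / h for x in range(w)]
    # Averaging an L x w band equals averaging L consecutive row means (same for columns).
    return _sliding_mean(col_means, bandwidth), _sliding_mean(row_means, bandwidth)


def aggregated_bpm(heatmap, bandwidths=(3, 5)):
    """Fuse band-pooled vectors from several bandwidths by element-wise averaging (assumed)."""
    pooled = [band_pool(heatmap, L) for L in bandwidths]
    x_vec = [sum(v) / len(v) for v in zip(*(p[0] for p in pooled))]
    y_vec = [sum(v) / len(v) for v in zip(*(p[1] for p in pooled))]
    return x_vec, y_vec


def decode_vectors(x_vec, y_vec):
    """O(h + w) decoding: one argmax per vector."""
    x = max(range(len(x_vec)), key=x_vec.__getitem__)
    y = max(range(len(y_vec)), key=y_vec.__getitem__)
    return x, y


# Usage: a 64 x 64 heatmap with a Gaussian peak at (x=20, y=41).
hm = [[math.exp(-((x - 20) ** 2 + (y - 41) ** 2) / 4.5) for x in range(64)] for y in range(64)]
print(decode_vectors(*aggregated_bpm(hm)))  # (20, 41)
```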
  13. Beyond Box Strategy
     - a strategy to predict landmarks located outside the bbox
     - landmark location
       - inside: the maximum of the predicted vector lies in the middle of the vector
       - outside: the maximum is close to one of the endpoints
     - when the landmark is outside, assume the predicted vector obeys a known distribution
       - Γ: scale parameter
       - d: distance between s and the peak
  14. Beyond Box Strategy
     - from this assumption, the revised landmark location can be derived
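The slides do not give the closed form of the revised location, so the code below is only a hypothetical illustration of the idea, not the paper's formula. It assumes the predicted vector is the tail of an unnormalized Gaussian exp(−d²/(2Γ²)) with peak value 1, whose peak lies outside the box. If the vector's maximum sits at an endpoint with value v_end, solving v_end = exp(−d²/(2Γ²)) gives d = Γ·sqrt(−2·ln v_end). The landmark is then placed d elements beyond that endpoint. `beyond_box`, `gamma`, `margin`, and `tail` are illustrative names.

```python
import math


def beyond_box(vec, gamma=2.0, margin=1):
    """Hypothetical Beyond-Box decoding of one predicted 1D vector.

    Returns a (possibly out-of-range) float coordinate along the vector.
    """
    n = len(vec)
    i = max(range(n), key=vec.__getitem__)
    if margin <= i < n - margin:
        return float(i)  # maximum in the interior: landmark is inside the box
    # Maximum at an endpoint: assume vec is the tail of exp(-d^2 / (2 * gamma^2)).
    v_end = min(max(vec[i], 1e-12), 1.0)  # clamp so the log is defined
    d = gamma * math.sqrt(-2.0 * math.log(v_end))
    return i - d if i < margin else i + d


def tail(center, n=20, gamma=2.0):
    """Helper: an ideal vector whose Gaussian peak is at `center` (may lie outside 0..n-1)."""
    return [math.exp(-((k - center) ** 2) / (2 * gamma ** 2)) for k in range(n)]


print(beyond_box(tail(10.0)))  # 10.0
```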
  15. Experiments: Datasets
     - 300W (ICCV'13): benchmark challenge dataset for ICCV 2013
       - training data includes the LFPW, AFW, HELEN, and IBUG datasets, re-annotated with semi-supervised learning
       - 3837 images in total; many previous works use 3148 images for training and 689 for validation
       - test data is newly collected (300 indoor and 300 outdoor images)
       - 68 landmark points
  16. Experiments: Datasets
     - COFW (Caltech Occluded Faces in the Wild) (ICCV'13)
       - large variations in shape and occlusion
       - training data: 1345 images
       - test data: 507 images
       - 29 landmark points, consistent with LFPW
  17. Experiments: Datasets
     - WFLW (Wider Facial Landmarks in-the-wild) (CVPR'18)
       - includes extreme disturbances
       - training data: 7500 images
       - test data: 2500 images
       - 98 landmark points
  18. Experiments: Datasets
     - JD-landmark (ICME'19): benchmark challenge dataset for ICME 2019
       - training data: 11393 images
       - val/test data: 2000 images
       - 106 landmark points
  19. Experiments: Evaluation Metrics
     - Normalized Mean Error (NME): the Euclidean distance between predicted and ground-truth landmarks, averaged over landmarks and normalized to remove the effect of inconsistent image sizes
       - d: normalization factor
     - Failure Rate (FR): percentage of failure samples, i.e. samples whose NME is larger than a threshold
     - Area Under Curve (AUC): area under the cumulative error distribution (CED) curve
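The three metrics for 2D landmarks are sketched below under some assumptions. The normalization factor d is passed in per sample; inter-ocular distance is a common choice for 300W and WFLW. The failure threshold is 0.10, a common setting that the slide does not state. The AUC integrates the CED curve from 0 up to that threshold with a simple rectangle rule and is normalized to [0, 1]. `math.dist` requires Python 3.8 or later.

```python
import math


def nme(pred, gt, norm):
    """Mean Euclidean error over landmarks, divided by the normalization factor d."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / (len(gt) * norm)


def failure_rate(nmes, threshold=0.10):
    """Fraction of samples whose NME is larger than `threshold`."""
    return sum(e > threshold for e in nmes) / len(nmes)


def auc(nmes, threshold=0.10, steps=1000):
    """Area under the CED curve on [0, threshold], normalized to [0, 1] (rectangle rule)."""
    n = len(nmes)
    total = 0.0
    for k in range(1, steps + 1):
        t = threshold * k / steps
        total += sum(e <= t for e in nmes) / n  # CED value: fraction of samples with NME <= t
    return total / steps
```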
  20. Experiments: Implementation Setup
     - face bbox: extend the short side to match the long side
     - enlarge the bbox by 25% for WFLW and 10% for COFW
     - crop the image and resize it to 256x256
     - augmentation: random rotation/scaling/occlusion, horizontal flip
     - Adam optimizer
     - backbone: ResNet50, HRNet
     - 4x TITAN X GPUs
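The bbox preprocessing can be sketched as follows. The sketch assumes that "enlarge by 25%" means multiplying the square side by 1.25 about the box center. The crop itself (image I/O and resizing) is left out; only the mapping of landmark coordinates into the 256x256 crop is shown. The function names are illustrative.

```python
def square_and_enlarge(x0, y0, x1, y1, scale=0.25):
    """Extend the short side of a bbox to the long side, then enlarge it by `scale` about its center."""
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    side = max(x1 - x0, y1 - y0) * (1 + scale)
    half = side / 2
    return cx - half, cy - half, cx + half, cy + half


def to_crop_coords(pt, box, out_size=256):
    """Map an (x, y) landmark in image coordinates into the resized out_size x out_size crop."""
    bx0, by0, bx1, _ = box
    side = bx1 - bx0
    return (pt[0] - bx0) * out_size / side, (pt[1] - by0) * out_size / side


box = square_and_enlarge(0, 0, 100, 80, scale=0.25)  # WFLW setting; use 0.10 for COFW
print(box)                            # (-12.5, -22.5, 112.5, 102.5)
print(to_crop_coords((50, 40), box))  # (128.0, 128.0)
```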
  21. Main Result

  22. JD-landmark challenge
     - a single model achieves results comparable to those of the champion, Baidu VIS, whose solution used:
       - AutoML for architecture search
       - an ensemble model
       - well-designed augmentation
  23. Effectiveness of BPM/BBS
     - dataset: 300W (Table 5)
     - left: heatmap-based method vs. proposed method
     - right: shrink the GT bbox by 10% in each dimension and compare with/without BBS
  24. Time cost analysis
     - MobileNetV3 is used because of the constraints of the practical system
     - the proposed method saves 26% of the time of the whole process
  25. Thank you! Questions?