[kaggle-cv] Gaussian Vector: An Efficient Solution for Facial Landmark Detection

phalanx
October 08, 2020

Transcript

  1. Paper Information Title: Gaussian Vector: An Efficient Solution for Facial

    Landmark Detection Author: Yilin Xiong, Zijian Zhou, Yuhao Dou, Zhizhong Su (Horizon Robotics) arXiv: https://arxiv.org/abs/2010.01318 Submission date: 2020/10/03
  2. - Facial landmark detection is a fundamental step in many

    face applications - face recognition, face tracking, face editing - Challenges - blur, overlap, occlusion, illumination - large head pose variation Aggregation via Separation: Boosting Facial Landmark Detector with Semi-Supervised Style Translation, in Proc. of ICCV, 2019 Facial Landmark Detection
  3. Coordinate regression method - Fully connected layers predict facial landmark

    coordinates - Performance degradation due to spatial information loss Facial landmark detection by deep multi-task learning, in European Conference on Computer Vision, Springer (2014) 90-108
  4. - Predict a heatmap where each pixel predicts the probability that

    it is a landmark - By utilizing spatial information, it boosts the performance compared with the coordinate regression method (see the sketch below) Heatmap-based method
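
    For reference, a minimal NumPy sketch of the standard 2D Gaussian heatmap label these methods regress (generic common practice, not code from the paper; sigma is an illustrative value):

        import numpy as np

        def gaussian_heatmap(h, w, cx, cy, sigma=2.0):
            # 2D Gaussian centred on the landmark (cx, cy); each pixel value
            # acts as the probability-like target that this pixel is the landmark
            ys, xs = np.mgrid[0:h, 0:w]
            d2 = (xs - cx) ** 2 + (ys - cy) ** 2
            return np.exp(-d2 / (2.0 * sigma ** 2))

        # e.g. a 64x64 heatmap label for a landmark at (x=20, y=30)
        hm = gaussian_heatmap(64, 64, cx=20, cy=30)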
  5. Heatmap-based method - Stacked Hourglass - supervised transformation to

    remove translation - stacks multiple hourglass modules to extract multi-scale discriminative features Jing Yang, Qingshan Liu, Kaihua Zhang, “Stacked Hourglass Network for Robust Facial Landmark Localisation”, in Proc. of CVPR, 2017
  6. - Wing Loss - pays more attention to small and

    medium-range errors - switches from L1 loss to a modified logarithm function for small errors (see the sketch below) Zhen-Hua Feng, et al., “Wing Loss for Robust Facial Landmark Localization with Convolutional Neural Networks”, in Proc. of CVPR, 2018 Heatmap-based method
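
    A minimal NumPy sketch of the Wing loss switching behaviour (the w and eps values are illustrative hyper-parameters, not necessarily the paper's defaults):

        import numpy as np

        def wing_loss(pred, target, w=10.0, eps=2.0):
            # log curve for small/medium errors, shifted L1 for large ones;
            # c makes the two pieces meet continuously at |error| = w
            x = np.abs(pred - target)
            c = w - w * np.log(1.0 + w / eps)
            return np.where(x < w, w * np.log(1.0 + x / eps), x - c).mean()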
  7. - Adaptive Wing Loss - adapts the loss curvature to ground

    truth pixel values - it can focus on foreground and hard background pixels Xinyao Wang, Liefeng Bo, Fuxin Li, “Adaptive Wing Loss for Robust Face Alignment via Heatmap Regression”, in Proc. of ICCV, 2019 Heatmap-based method
  8. Heatmap-based method - Large output heatmaps and complicated post-processing

    cause a heavy burden of data transmission and computation in embedded systems - Foreground-background imbalance problem - Spatial information is not fully used, causing detection errors - Imperfect face detection results in some of the facial landmarks falling outside the bbox
  9. Proposed method - Gaussian Vector - encodes facial landmarks into

    vectors as supervision - speeds up label preparation and alleviates the extreme foreground-background imbalance - reduces system complexity, including post-processing - Band Pooling Module - converts the h x w output heatmap into h x 1 and w x 1 vectors as prediction - takes more spatial information into account yet outputs smaller tensors
  10. Gaussian Vector Label - calculate the Euclidean distance between each pixel

    and the landmark - transform the distance vector into the vector label - σ: standard deviation - θ: positive constant to reinforce the distribution peak (see the sketch below)
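
    A minimal sketch of how such a 1D vector label could be built (the exact distance transform and peak-reinforcement rule are in the paper; sigma, theta and the peak handling below are assumptions):

        import numpy as np

        def gaussian_vector_label(length, peak, sigma=2.0, theta=0.5):
            # 1D Gaussian of the pixel-to-landmark distance along one axis,
            # with a positive constant theta added to reinforce the peak
            idx = np.arange(length, dtype=np.float32)
            v = np.exp(-(idx - peak) ** 2 / (2.0 * sigma ** 2))
            v[int(round(peak))] += theta
            return v

        # x- and y-vector labels for a landmark at (x=20, y=30) on a 64x64 heatmap
        vx = gaussian_vector_label(64, 20)
        vy = gaussian_vector_label(64, 30)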
  11. Band Pooling Module (BPM) - consists of a vertical band

    [L, w] and a horizontal band [L, h] - L: adjustable bandwidth, which controls the receptive field size of the vector elements - a small odd number (much smaller than h and w), chosen from 1 to 7 - each band slides over the heatmap and averages the values - converting the heatmap into vectors reduces post-processing complexity from O(N^2) to O(N) - output: [N, C, w/h, 2] - C: number of facial landmarks - Aggregated - fuses the vectors generated with different bandwidths - this variant is used for the experiments - bandwidths: 3 and 5 (see the sketch below)
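
    A minimal NumPy sketch of band pooling for a single bandwidth L (padding behaviour and the multi-bandwidth fusion are assumptions of this sketch):

        import numpy as np

        def band_pool(heatmap, L=3):
            # average an h x w heatmap over bands of width L along each axis,
            # producing a length-h vector (rows) and a length-w vector (columns)
            h, w = heatmap.shape
            r = L // 2
            padded = np.pad(heatmap, r, mode="edge")
            vy = np.array([padded[i:i + L, r:r + w].mean() for i in range(h)])
            vx = np.array([padded[r:r + h, j:j + L].mean() for j in range(w)])
            return vy, vx

        # aggregated variant (as in the slide): fuse bandwidths 3 and 5,
        # here by simple averaging (the actual fusion may differ)
        # vy, vx = [(a + b) / 2 for a, b in zip(band_pool(hm, 3), band_pool(hm, 5))]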
  12. Beyond Box Strategy - strategy to predict landmarks located

    outside the bbox - landmark location - inside: the maximum of the predicted vector is in the middle of the vector - outside: the maximum is close to one of the endpoints - assume the predicted vector obeys the given distribution when the landmark is outside - Γ: scale parameter - d: distance between s and the peak (see the sketch below)
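
    A minimal sketch of decoding a coordinate from a predicted vector and flagging out-of-box cases (the actual extrapolation with the assumed distribution and scale Γ is in the paper and omitted here; the border threshold is an assumption):

        import numpy as np

        def decode_vector(v, border=1):
            # inside: the peak lies in the interior of the vector -> use argmax
            # outside: the peak sits at/near an endpoint -> flag for the
            # Beyond Box extrapolation described in the paper (not shown)
            p = int(np.argmax(v))
            outside = p <= border or p >= len(v) - 1 - border
            return p, outside

        # x = decode_vector(vx); y = decode_vector(vy)  # each returns (position, outside_flag)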
  13. Experiments: Datasets 300W (ICCV’13) - benchmark challenge dataset for ICCV

    2013 - training data includes the LFPW, AFW, HELEN, and IBUG datasets - re-annotated with semi-supervised learning - 3837 images; many previous works use 3148 images for training and 689 images for validation - test data is newly collected (300 indoor and 300 outdoor) - 68 landmark points
  14. Experiments: Datasets COFW (Caltech Occluded Faces in the Wild) (ICCV’13)

    - large variations in shape and occlusion - train data: 1354 images - test data: 507 images - 29 landmark points consistent with LFPW
  15. Experiments: Datasets WFLW (Wider Facial Landmarks in-the-wild) (CVPR’18) - includes

    extreme disturbances - train data: 7500 images - test data: 2500 images - 98 landmark points
  16. JD-landmark (ICME’19) - benchmark challenge dataset in ICME 2019 -

    train data: 11393 images - val/test data: 2000 images - 106 landmark points Experiments: Datasets
  17. - Normalized Mean Error (NME) - averages the Euclidean distance

    between predicted and ground-truth landmarks - normalized to eliminate the impact of image size inconsistency - d: normalization factor - Failure Rate (FR) - percentage of samples whose NME is larger than a threshold - Area Under Curve (AUC) - area under the cumulative error distribution (CED) curve (see the NME sketch below) Experiments: Evaluation Metrics
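
    A minimal NumPy sketch of NME (pred and gt are (num_points, 2) arrays; d is the normalization factor, e.g. inter-ocular distance, depending on the benchmark's convention):

        import numpy as np

        def nme(pred, gt, d):
            # mean point-to-point Euclidean distance, divided by the
            # normalization factor d to remove the effect of image size
            return np.linalg.norm(pred - gt, axis=-1).mean() / d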
  18. - For the face bbox, extend the short side to match

    the long side - enlarge the bbox by 25% in WFLW and 10% in COFW - crop the image and resize to 256x256 (see the sketch below) - augmentation - random rotation/scaling/occlusion, horizontal flip - Adam optimizer - backbone: ResNet50, HRNet - 4x TITAN X GPUs Experiments: Implementation setup
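
    A minimal sketch of this bbox preprocessing (assumes the enlarged square box stays inside the image; OpenCV is used only for resizing):

        import cv2

        def crop_face(img, bbox, enlarge=0.25, size=256):
            # make the bbox square by extending the short side, enlarge it
            # (25% for WFLW, 10% for COFW per the slide), then crop and resize
            x1, y1, x2, y2 = bbox
            cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
            half = max(x2 - x1, y2 - y1) * (1.0 + enlarge) / 2.0
            crop = img[int(cy - half):int(cy + half), int(cx - half):int(cx + half)]
            return cv2.resize(crop, (size, size))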
  19. JD-landmark challenge - A single model achieves results comparable to

    the champion - Baidu VIS (champion) - AutoML for architecture search - ensemble model - well-designed augmentation
  20. - dataset: 300W - Table 5 - left: compares the heatmap-based

    method and the proposed method - right: shrinks the GT bbox by 10% in each dimension and compares with/without BBS Effectiveness of BPM/BBS
  21. - uses MobileNetV3 due to the limitations of the practical system

    - the proposed method saves 26% of the whole-process time Time cost analysis