[kaggle-cv] Gaussian Vector: An Efficient Solution for Facial Landmark Detection

2020/10/10 @phalanx journal part Gaussian Vector: An Efficient Solution for
Facial Landmark Detection

Journal Information
- Title: Gaussian Vector: An Efficient Solution for Facial Landmark Detection
- Authors: Yilin Xiong, Zijian Zhou, Yuhao Dou, Zhizhong Su (Horizon Robotics)
- arXiv: https://arxiv.org/abs/2010.01318
- Submit date: 2020/10/03

Facial Landmark Detection
- Facial landmark detection is a fundamental step in many face applications
  - face recognition, face tracking, face editing
- Challenges
  - blur, overlap, occlusion, illumination
  - large head pose variation
Aggregation via Separation: Boosting Facial Landmark Detector with Semi-Supervised Style Translation, in Proc. of ICCV, 2019

Coordinate regression method
- Fully connected layers predict the facial landmark coordinates
- Performance degrades because spatial information is lost
Facial landmark detection by deep multi-task learning, in European Conference on Computer Vision, Springer (2014) 90-108

Heatmap based method
- Predict a heatmap in which each pixel gives the probability that it is a landmark
- Using spatial information boosts performance compared with the coordinate regression method

Heatmap based method
- Stacked Hourglass
  - supervised transformation to remove translation
  - stack multiple hourglass modules to extract multi-scale discriminative features
Jing Yang, Qingshan Liu, Kaihua Zhang, “Stacked Hourglass Network for Robust Facial Landmark Localisation”, in Proc. of CVPR, 2017

Heatmap based method
- Wing Loss
  - pay more attention to small and medium range errors
  - switch from L1 loss (large errors) to a modified logarithm function (small errors)
Zhen-Hua Feng, et al., “Wing Loss for Robust Facial Landmark Localization with Convolutional Neural Networks”, in Proc. of CVPR, 2018
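The piecewise behaviour described on the Wing Loss slide can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `wing_loss` and the default values `w=10.0` and `eps=2.0` are assumptions chosen for the example, and the loss is shown for a single scalar error.

```python
import math


def wing_loss(error, w=10.0, eps=2.0):
    """Wing loss for one scalar error (prediction minus target).

    w   : assumed width of the logarithmic region
    eps : assumed curvature parameter of the logarithmic region
    """
    abs_err = abs(error)
    # Offset that makes the logarithmic and linear pieces meet at |error| = w.
    c = w - w * math.log(1.0 + w / eps)
    if abs_err < w:
        # Small and medium errors: modified logarithm, steeper than L1 near zero.
        return w * math.log(1.0 + abs_err / eps)
    # Large errors: plain L1, shifted by the offset c.
    return abs_err - c
```

With these assumed defaults, a small error such as 0.5 is penalized more heavily than under plain L1, which is the "more attention to small and medium range errors" effect the slide refers to.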

Heatmap based method
- Adaptive Wing Loss
  - adapt the loss curvature to the ground truth pixel values
  - can focus on foreground and hard background pixels
Xinyao Wang, Liefeng Bo, Fuxin Li, “Adaptive Wing Loss for Robust Face Alignment via Heatmap Regression”, CoRR, 2019

Heatmap based method
- Large output heatmaps and complicated post-processing put a heavy data transmission and computation burden on embedded systems
- Foreground-background imbalance problem
- Spatial information is not fully used, causing detection errors
- Imperfect face detection leaves some of the facial landmarks outside the bbox

Proposed method
- Gaussian Vector
  - encode each facial landmark into vectors as supervision
  - accelerates label preparation and alleviates the extreme foreground-background imbalance problem
  - reduces system complexity, including post-processing
- Band Pooling Module
  - convert the h x w output heatmap into h x 1 and w x 1 vectors as the prediction
  - takes more spatial information into account yet outputs a smaller tensor

Gaussian Vector Label
- calculate the Euclidean distance between each pixel and the landmark
- transform the distance vector into the vector label
  - σ: standard deviation
  - θ: positive constant to reinforce the distribution peak
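The transcript does not carry the slide's formula, so the label construction can only be sketched under assumptions. The example below assumes a 1-D Gaussian of the distance with standard deviation σ, a peak raised by θ at the landmark position, and values cut to zero beyond 3σ. The truncation radius, the normalization constant, the default values, and the names `gaussian_vector` and `encode_landmark` are all illustrative choices, not taken from the slides.

```python
import math


def gaussian_vector(length, center, sigma=2.0, theta=1.0):
    """Encode one landmark coordinate as a 1-D vector label of the given length."""
    peak = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    vec = []
    for i in range(length):
        d = abs(i - center)  # Euclidean distance along one axis
        if d == 0:
            vec.append(peak + theta)  # theta reinforces the distribution peak
        elif d < 3.0 * sigma:  # assumed truncation radius
            vec.append(peak * math.exp(-(d * d) / (2.0 * sigma * sigma)))
        else:
            vec.append(0.0)
    return vec


def encode_landmark(x, y, width, height, sigma=2.0, theta=1.0):
    """Return the (x-vector, y-vector) pair that supervises one landmark."""
    return (gaussian_vector(width, x, sigma, theta),
            gaussian_vector(height, y, sigma, theta))
```

For a 64x64 crop this produces 64 + 64 label values per landmark instead of 64 * 64 heatmap pixels, which is where the faster label preparation comes from.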

Band Pooling Module (BPM)
- consists of a vertical band [L, w] and a horizontal band [L, h]
  - L: adjustable bandwidth, which controls the receptive field size of the vector elements
  - L is an odd number much smaller than the heatmap size, chosen from 1 to 7
- each band slides over the heatmap and averages the values
- converting to vectors reduces post-processing complexity: O(N^2) -> O(N)
- output: [N, C, w/h, 2]
  - C: number of facial landmarks
- Aggregated
  - fuse the vectors generated by different bandwidths
  - this method is used in the experiments
  - bandwidths: 3 and 5
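A minimal sketch of band pooling for a single heatmap is shown below, written in plain Python so that the sliding and averaging are explicit. Several details are assumptions rather than facts from the slides: the band is zero-padded at the borders, the aggregated variant fuses bandwidths by a plain mean, and the names `band_pool`, `aggregated_band_pool`, and `decode` are hypothetical.

```python
def band_pool(heatmap, bandwidth):
    """Pool an h x w heatmap (list of rows) into an x-vector (length w) and a y-vector (length h)."""
    h, w = len(heatmap), len(heatmap[0])
    half = bandwidth // 2
    col_sums = [sum(heatmap[r][c] for r in range(h)) for c in range(w)]
    row_sums = [sum(row) for row in heatmap]

    def slide(sums, band_length):
        out = []
        for i in range(len(sums)):
            lo, hi = max(0, i - half), min(len(sums), i + half + 1)
            # Zero padding (assumed): always divide by the full band area.
            out.append(sum(sums[lo:hi]) / (bandwidth * band_length))
        return out

    return slide(col_sums, h), slide(row_sums, w)


def aggregated_band_pool(heatmap, bandwidths=(3, 5)):
    """Fuse the vectors from several bandwidths; fusion by mean is an assumption."""
    xs, ys = zip(*(band_pool(heatmap, b) for b in bandwidths))
    x_vec = [sum(vals) / len(vals) for vals in zip(*xs)]
    y_vec = [sum(vals) / len(vals) for vals in zip(*ys)]
    return x_vec, y_vec


def decode(x_vec, y_vec):
    """Post-processing is one argmax per vector, linear in the vector length."""
    return x_vec.index(max(x_vec)), y_vec.index(max(y_vec))
```

Decoding scans w + h values instead of h * w pixels, which matches the O(N^2) -> O(N) reduction noted on the slide.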

Beyond Box Strategy
- strategy to predict landmarks located outside the bbox
- landmark location
  - inside: the maximum of the predicted vector is in the middle of the vector
  - outside: the maximum is close to one of the endpoints
- assume the predicted vectors obey the distribution when the landmark is outside
  - Γ: scale parameter
  - d: distance between s and peak

Beyond Box Strategy
- From this assumption, we can derive the revised landmark location

Experiments: Datasets
300W (ICCV’13)
- benchmark challenge dataset for ICCV 2013
- training data includes the LFPW, AFW, HELEN, and IBUG datasets
  - re-annotated with semi-supervised learning
  - 3837 images; many previous works use 3148 images for training and 689 images for validation
- test data is newly collected (300 indoor and 300 outdoor images)
- 68 landmark points

Experiments: Datasets
COFW (Caltech Occluded Faces in the Wild) (ICCV’13)
- large variations in shape and occlusion
- train data: 1345 images
- test data: 507 images
- 29 landmark points, consistent with LFPW

Experiments: Datasets
WFLW (Wider Facial Landmarks in-the-wild) (CVPR’18)
- includes extreme disturbances
- train data: 7500 images
- test data: 2500 images
- 98 landmark points

Experiments: Datasets
JD-landmark (ICME’19)
- benchmark challenge dataset in ICME 2019
- train data: 11393 images
- val/test data: 2000 images
- 106 landmark points

Experiments: Evaluation Metrics
- Normalized Mean Error (NME)
  - average the Euclidean distance between predicted and ground truth landmarks
  - normalized to eliminate the impact of inconsistent image sizes
  - d: normalization factor
- Failure Rate (FR)
  - percentage of failure samples, i.e. samples whose NME is larger than a threshold
- Area Under Curve (AUC)
  - the area under the cumulative error distribution (CED) curve
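The three metrics can be sketched in a few lines. The threshold of 0.1, the trapezoid integration with 1000 steps, the rescaling of AUC to [0, 1], and the function names are assumptions made for this example; the slides do not state which threshold or normalization factor d is used per dataset.

```python
import math


def nme(pred, gt, norm):
    """Mean Euclidean landmark error of one sample, divided by the normalization factor d."""
    dists = [math.hypot(px - gx, py - gy) for (px, py), (gx, gy) in zip(pred, gt)]
    return sum(dists) / (len(dists) * norm)


def failure_rate(nmes, threshold=0.1):
    """Fraction of samples whose NME exceeds the threshold."""
    return sum(1 for e in nmes if e > threshold) / len(nmes)


def auc(nmes, threshold=0.1, steps=1000):
    """Area under the CED curve on [0, threshold], rescaled to [0, 1]."""
    xs = [threshold * i / steps for i in range(steps + 1)]
    # CED(x): fraction of samples with NME <= x.
    ced = [sum(1 for e in nmes if e <= x) / len(nmes) for x in xs]
    area = sum((ced[i] + ced[i + 1]) / 2.0 * (xs[i + 1] - xs[i]) for i in range(steps))
    return area / threshold
```

Here `norm` stands for d, for example an inter-ocular distance, and `nmes` is the list of per-image NME values over a test set.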

Experiments: implementation setup
- For the face bbox, extend the short side to the same length as the long side
  - enlarge the bbox in WFLW and COFW by 25% and 10%, respectively
- crop the image and resize to 256x256
- augmentation
  - random rotation/scale/occlusion, hflip
- Adam optimizer
- backbone: ResNet50, HRNet
- 4x TITAN X GPUs
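The bbox preprocessing can be sketched as below. It assumes the box is squared and enlarged around its center and that the enlargement ratio applies to the side length; neither detail is spelled out on the slide, and the helper names `square_and_enlarge` and `to_crop_coords` are hypothetical.

```python
def square_and_enlarge(x0, y0, x1, y1, enlarge=0.0):
    """Extend the short side to the long side, then enlarge the square about its center."""
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    side = max(x1 - x0, y1 - y0) * (1.0 + enlarge)
    half = side / 2.0
    return cx - half, cy - half, cx + half, cy + half


def to_crop_coords(point, box, out_size=256):
    """Map an image-space landmark into the resized square crop."""
    bx0, by0, bx1, _ = box
    scale = out_size / (bx1 - bx0)
    return (point[0] - bx0) * scale, (point[1] - by0) * scale
```

For example, `square_and_enlarge(10, 20, 110, 70, enlarge=0.25)` would correspond to the WFLW setting, and `enlarge=0.10` to COFW. The resulting box can extend past the image border, so a real pipeline would also need padding when cropping.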

Main Result

JD-landmark challenge
- A single model achieves results comparable to the champion
- Champion: Baidu VIS
  - AutoML for architecture search
  - ensemble model
  - well-designed augmentation

Effectiveness of BPM/BBS
- dataset: 300W
- Table 5
  - left: compare the heatmap based method and the proposed method
  - right: shrink the GT bbox by 10% in each dimension and compare with/without BBS

Time cost analysis
- use MobileNetV3 to match the limitations of a practical system
- the proposed method saves 26% of the time of the whole process

Thank you! Questions?
