- face applications: face recognition, face tracking, face editing
- Challenges: blur, overlap, occlusion, illumination, large head pose variation
Aggregation via Separation: Boosting Facial Landmark Detector with Semi-Supervised Style Translation, in Proc. of ICCV, 2019
Facial Landmark Detection
- coordinates
- Performance degradation due to spatial information loss
Facial landmark detection by deep multi-task learning, in European Conference on Computer Vision, Springer (2014) 90-108
- medium range errors
- switching from the L1 loss to a modified logarithm function
Zhen-Hua Feng, et al., "Wing Loss for Robust Facial Landmark Localization with Convolutional Neural Networks", in Proc. of CVPR, 2018
Heatmap based method
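The switch from L1 to a modified logarithm mentioned above can be sketched with the piecewise form given in the Wing loss paper: logarithmic for small residuals, L1 shifted by a constant for large ones. This is a minimal per-residual sketch, not the authors' training code; the function name and the default values `w=10` and `eps=2` are illustrative assumptions.

```python
import math


def wing_loss(x, w=10.0, eps=2.0):
    """Wing loss for one residual x = prediction - target.

    w   : width of the logarithmic region (illustrative default)
    eps : curvature of the logarithmic region (illustrative default)
    """
    # The constant c makes the two pieces meet at |x| == w.
    c = w - w * math.log(1.0 + w / eps)
    ax = abs(x)
    if ax < w:
        # Logarithmic part: amplifies small and medium range errors.
        return w * math.log(1.0 + ax / eps)
    # Linear part: behaves like L1 for large errors.
    return ax - c
```

With these values a residual of 1 costs about 4.05 instead of 1 under plain L1, which is how the loss puts more weight on small and medium range errors.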
- truth pixel values
- it can focus on foreground and hard background pixels
Xinyao Wang, Liefeng Bo, Fuxin Li, "Adaptive Wing Loss for Robust Face Alignment via Heatmap Regression", in CoRR, 2019
Heatmap based method
- cause a heavy burden of data transmission and computation in embedded systems
- Foreground-background imbalance problem
- Spatial information is not fully used, causing detection errors
- Imperfect face detection results in some facial landmarks falling outside the bbox
- vector as supervision
  - accelerates label preparation and alleviates the extreme foreground-background imbalance problem
  - reduces system complexity, including post-processing
- Band Pooling Module
  - converts the h x w output heatmap into h x 1 and w x 1 vectors as the prediction
  - takes more spatial information into account yet outputs a smaller tensor
- vertical band [L, w] and horizontal band [L, h]
  - L: adjustable bandwidth, which controls the receptive field size of the vector elements; a small odd number, chosen from 1 to 7
  - each band slides over the heatmap and averages the values it covers
  - converting the heatmap into vectors reduces post-processing complexity: O(N^2) -> O(N)
  - output: [N, C, w/h, 2]
  - C: number of facial landmarks
- Aggregated
  - fuses the vectors generated with different bandwidths
  - this variant is used for the experiments
  - bandwidths: 3 and 5
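The band pooling idea above can be sketched for a single heatmap in plain Python. This is an illustration under stated assumptions, not the authors' module: the vertical band is taken to span the full height and slide along x, the horizontal band to span the full width and slide along y, borders are zero padded, and the aggregated variant fuses bandwidths 3 and 5 with a plain mean. The names `band_pool`, `aggregated_band_pool` and `decode` are hypothetical.

```python
import math


def band_pool(heatmap, bandwidth):
    """Pool one h x w heatmap (list of rows) into an x vector and a y vector."""
    h, w = len(heatmap), len(heatmap[0])
    half = bandwidth // 2
    col_sums = [sum(heatmap[r][c] for r in range(h)) for c in range(w)]
    row_sums = [sum(row) for row in heatmap]

    def slide(sums, other_len):
        out = []
        for i in range(len(sums)):
            lo, hi = max(0, i - half), min(len(sums), i + half + 1)
            # Assumed zero padding: always divide by the full band area.
            out.append(sum(sums[lo:hi]) / (bandwidth * other_len))
        return out

    x_vec = slide(col_sums, h)  # vertical band sliding along x, length w
    y_vec = slide(row_sums, w)  # horizontal band sliding along y, length h
    return x_vec, y_vec


def aggregated_band_pool(heatmap, bandwidths=(3, 5)):
    """Fuse the vectors from several bandwidths (assumed: element-wise mean)."""
    pooled = [band_pool(heatmap, b) for b in bandwidths]
    n = len(pooled)
    x_vec = [sum(p[0][i] for p in pooled) / n for i in range(len(pooled[0][0]))]
    y_vec = [sum(p[1][i] for p in pooled) / n for i in range(len(pooled[0][1]))]
    return x_vec, y_vec


def decode(x_vec, y_vec):
    """Two 1D argmax scans, O(w + h), instead of a 2D scan over h * w pixels."""
    x = max(range(len(x_vec)), key=lambda i: x_vec[i])
    y = max(range(len(y_vec)), key=lambda i: y_vec[i])
    return x, y


# Usage: a 5 x 7 Gaussian-like heatmap peaked at column 4, row 2.
heat = [[math.exp(-((r - 2) ** 2 + (c - 4) ** 2) / 2.0) for c in range(7)]
        for r in range(5)]
x_vec, y_vec = aggregated_band_pool(heat)
landmark = decode(x_vec, y_vec)
```

The two 1D scans in `decode` are where the O(N^2) -> O(N) reduction in post-processing comes from.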
- out of bbox
- landmark location
  - inside: the maximum of the predicted vector lies in the middle of the vector
  - outside: the maximum is close to one of the endpoints
- when the landmark is outside, the predicted vector is assumed to obey the distribution
  - Γ: scale parameter
  - d: distance between s and the peak
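The inside/outside distinction above can be illustrated with a simple check on where the maximum of a predicted vector falls. This sketch covers only that check, not the distribution-based estimate of the outside location. The function name `peak_side` and the `margin` parameter, which decides what counts as "close to an endpoint", are hypothetical.

```python
def peak_side(vector, margin=2):
    """Classify the peak position of a predicted vector.

    Returns "inside" when the maximum is away from both ends, and
    "low_end" / "high_end" when it lies within `margin` elements of an
    endpoint (margin is an assumed threshold, not a value from the slides).
    """
    peak = max(range(len(vector)), key=lambda i: vector[i])
    if peak < margin:
        return "low_end"
    if peak >= len(vector) - margin:
        return "high_end"
    return "inside"
```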
- 2013
- training data include the LFPW, AFW, HELEN, and IBUG datasets
- re-annotated with semi-supervised learning
- 3837 images; many previous works use 3148 images for training and 689 images for validation
- test data is newly collected (300 indoor and 300 outdoor images)
- 68 landmark points
- between predicted and ground truth landmarks
- normalized to eliminate the impact of inconsistent image sizes
- d: normalization factor
- Failure Rate (FR): percentage of failure samples whose NME is larger than a threshold
- Area Under Curve (AUC): the area under the cumulative error distribution (CED) curve
Experiments: Evaluation Metrics
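The three metrics above can be sketched as follows. The sketch assumes the usual definition of NME as the mean Euclidean distance between predicted and ground truth landmarks divided by the normalization factor d, and it normalizes AUC by the threshold so the value lies in [0, 1]; both are stated assumptions, and the function names are hypothetical.

```python
import math


def nme(pred, gt, d):
    """Normalized mean error for one sample; pred and gt are lists of (x, y)."""
    dists = [math.hypot(px - gx, py - gy)
             for (px, py), (gx, gy) in zip(pred, gt)]
    return sum(dists) / (len(dists) * d)


def failure_rate(nmes, threshold):
    """Fraction of samples whose NME is larger than the threshold."""
    return sum(1 for e in nmes if e > threshold) / len(nmes)


def auc(nmes, threshold):
    """Area under the CED curve on [0, threshold], divided by the threshold.

    The CED is a step function, so its integral is exactly the mean of
    max(0, threshold - e) over the samples.
    """
    area = sum(max(0.0, threshold - e) for e in nmes) / len(nmes)
    return area / threshold
```

For example, two samples with NME 0.05 and 0.2 at a threshold of 0.1 give a failure rate of 0.5 and an AUC of 0.25.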
- as the long
- enlarge the bbox in WFLW and COFW by 25% and 10%, respectively
- crop the image and resize it to 256x256
- augmentation: random rotation/scale/occlusion, horizontal flip
- Adam optimizer
- backbone: ResNet50, HRNet
- 4x TITAN X GPUs
Experiments: implementation setup
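The bbox enlargement and the 256x256 crop above can be sketched on coordinates alone. The sketch assumes that "enlarge by 25%" means scaling the width and height by 1.25 around the box center, and it does not reproduce any squaring of the box; the function names are hypothetical.

```python
def enlarge_bbox(bbox, ratio):
    """Grow (x1, y1, x2, y2) around its center; ratio=0.25 enlarges by 25%."""
    x1, y1, x2, y2 = bbox
    dw = (x2 - x1) * ratio / 2.0
    dh = (y2 - y1) * ratio / 2.0
    return (x1 - dw, y1 - dh, x2 + dw, y2 + dh)


def to_crop_coords(points, bbox, size=256):
    """Map landmark (x, y) points from image space into the resized crop."""
    x1, y1, x2, y2 = bbox
    sx = size / (x2 - x1)
    sy = size / (y2 - y1)
    return [((x - x1) * sx, (y - y1) * sy) for x, y in points]
```

Under these assumptions a WFLW box would be passed with `ratio=0.25` and a COFW box with `ratio=0.10` before cropping.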