CodeFest 2019. Борис Лесцов (Mail.Ru Group) — Детектирование людей в толпе

Person detection in crowds Boris Lestsov Mail.Ru Group

Computer Vision Team We solve computer vision problems at Mail.Ru
Projects: 1) Vision (b2b) 2) Cloud 3) Mail 4) ...

Business case

Business case 1) Queue optimisation a) Open the elevator (ski
resort) b) Call the cashier 2) Await time estimation

Business requirements 1) Works in various setups 2) Real time
3) Robust on video

Challenges

Challenges Heavy occlusion

Challenges Pose, illumination, clothing variability.

Metrics and datasets

Intersection Over Union (IoU) Measures detection quality for single bounding
box. Gives FP, FN

Metrics: AP - Average Precision (single class) 1. Compute predictions.
2. Plot Precision-Recall Curve, make non-increasing 3. Compute Area Under Curve Multiclass: mean AP (mAP) Different IoU thresholds

mAP - mean Average Precision (VOC) Compute mean of AP
for all classes. Problem: These detections give the same contribution to mAP.

Metrics: mAP@[.5:.95] (COCO) 1) For each IoU threshold in [.5:.95]
= [0.5, 0.55, 0.6, …, 0.9, 0.95] compute mAP. 2) Average these values to get mAP@[.5:.95]: Also: log average miss-rate (mMR) is used sometimes

Benchmarks CrowdHuman - our main dataset Custom dataset similar to
target domain

CrowdHuman examples

Why create own solution? Existing solutions: • Not adapted to
crowds • Slow inference

Approaches

Approaches 1) Classical CV (HOG, Deformable Part Models, ViolaJones) 2)
Motion-based detection (background subtraction) 3) CNN: a) Two stage - Faster RCNN b) Single stage - SSD, YOLO, RetinaNet. c) Cascaded - MTCNN

Faster RCNN

Faster RCNN Person?

Faster RCNN + Accurate + Bigger resolution => better result
- Slow - More objects => more proposals => slower detection

Single Shot Detector

SSD (Single Shot Detector) Image

SSD (Single Shot Detector) Extract features

Single Shot Detector Reducing height & width => to detect
at different scales

Different Scales Smaller scale Bigger scale

Predict displacement Predict ∆x, ∆y

Refine bbox shape Predict Sx , Sy :

Single Shot Detector Predict bounding boxes

Single Shot Detector Merge similar detections with NMS

Non Maximum Suppression

Problems with SSD • Backbone - VGG-16 • 512x512 =>
breaking aspect ratio

Architecture: RetinaNet 1) Backbone - ResNet 2) Feature Pyramid Network
(FPN) 3) FocalLoss against class disbalance

Feature Pyramid Network Higher level features for smaller scales

FocalLoss Problem: class disbalance 99 : 1 Cross Entropy (CE):
Focal Loss (FL): pt - predicted probability of g.t. class:

Focal Loss 1) Well classified examples => smaller contribution 2)
Analogue of Online Hard Example Mining

CodeFest 2019. Борис Лесцов (Mail.Ru Group) — Д...

CodeFest 2019. Борис Лесцов (Mail.Ru Group) — Детектирование людей в толпе

More Decks by CodeFest

Other Decks in Technology

Featured

Transcript