OpenTalks.AI - Boris Lestsov, Detecting People in a Crowd

OpenTalks.AI
February 15, 2019

Transcript

  1. Computer Vision Team
    We solve computer vision problems at Mail.Ru.
    Projects: 1) Vision (b2b) 2) Cloud 3) Mail 4) ...
  2. Business case
    1) Queue optimisation
      a) Open the elevator (ski resort)
      b) Call the cashier
    2) Waiting time estimation
  3. Approaches
    1) Classical CV (HOG, Deformable Part Models, Viola-Jones)
    2) Motion-based detection (background subtraction)
    3) CNN:
      a) Two-stage: Faster R-CNN
      b) Single-stage: SSD, YOLO, RetinaNet
  4. Faster R-CNN
    + Accurate
    + Bigger resolution => better result
    - Slow
    - More objects => more proposals => slower detection
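  5. Focal Loss
    Problem: class imbalance, 99 : 1.
    Cross Entropy (CE): $\mathrm{CE}(p_t) = -\log(p_t)$
    Focal Loss (FL): $\mathrm{FL}(p_t) = -(1 - p_t)^{\gamma} \log(p_t)$
    $p_t$ - predicted probability of the g.t. class: $p_t = p$ if $y = 1$, otherwise $p_t = 1 - p$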
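A minimal sketch of why focal loss helps with a 99 : 1 imbalance, assuming the standard definitions CE(p_t) = -log(p_t) and FL(p_t) = -(1 - p_t)^gamma * log(p_t). The value gamma = 2 and the example probabilities are illustrative assumptions, not values from the talk, and the optional alpha class weighting is left out.

```python
import math


def cross_entropy(p_t: float) -> float:
    """CE(p_t) = -log(p_t); p_t is the predicted probability of the true class."""
    return -math.log(p_t)


def focal_loss(p_t: float, gamma: float = 2.0) -> float:
    """FL(p_t) = -(1 - p_t)^gamma * log(p_t); gamma = 0 gives plain cross entropy."""
    return (1.0 - p_t) ** gamma * cross_entropy(p_t)


# Assumed example values: an easy, well-classified background anchor
# versus a hard, badly classified one.
easy, hard = 0.95, 0.05

# How much more a hard example weighs than an easy one under each loss.
ce_ratio = cross_entropy(hard) / cross_entropy(easy)
fl_ratio = focal_loss(hard) / focal_loss(easy)

print(f"CE hard/easy ratio: {ce_ratio:.1f}")
print(f"FL hard/easy ratio: {fl_ratio:.1f}")
```

The focal term shrinks the loss of easy examples much more than that of hard ones, so the many easy background anchors stop dominating the gradient.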
  6. Small pedestrians
    Bigger resolution => better result, but slower.
    800x600: 30 fps, ~73.5% AP
    1200x800: 15 fps, ~78.0% AP
  7. Tracking use cases
    1) Tracking itself
    2) Fewer false positives on a video stream
    3) Dealing with “blinking” detections
  8. SORT (Simple Online and Realtime Tracking)
    • Association by IoU
    • Kalman Filters
    • Fast
    We fine-tuned SORT.
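A rough sketch of association by IoU between predicted track boxes and new detections. The published SORT method solves this as an optimal assignment problem; the greedy matching below is a simpler stand-in used only for illustration. The function name `associate`, the box format, and the `min_iou = 0.3` threshold are assumptions, not details from the talk.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(tracks: List[Box], detections: List[Box], min_iou: float = 0.3):
    """Greedily pair tracks with detections in order of decreasing IoU.

    Returns (matches, unmatched_track_indices, unmatched_detection_indices).
    """
    pairs = sorted(
        ((iou(t, d), ti, di)
         for ti, t in enumerate(tracks)
         for di, d in enumerate(detections)),
        reverse=True,
    )
    matches, used_t, used_d = [], set(), set()
    for score, ti, di in pairs:
        if score < min_iou:
            break  # all remaining pairs overlap even less
        if ti in used_t or di in used_d:
            continue
        matches.append((ti, di))
        used_t.add(ti)
        used_d.add(di)
    unmatched_t = [i for i in range(len(tracks)) if i not in used_t]
    unmatched_d = [i for i in range(len(detections)) if i not in used_d]
    return matches, unmatched_t, unmatched_d


tracks = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [(21, 21, 31, 31), (1, 0, 11, 10), (50, 50, 60, 60)]
matches, lost_tracks, new_detections = associate(tracks, detections)
print(matches, lost_tracks, new_detections)
```

Unmatched detections would typically start new tracks, and tracks that stay unmatched for several frames would be dropped.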
  9. Conclusion
    1) Two-stage detectors are more accurate, but slower
    2) Bigger resolution => better accuracy, but slower
    3) ResNet, FPN, Focal Loss => better result
  10. Metrics: AP - Average Precision (single class)
    • False Positive (FP) - a predicted bbox that has IoU > 0.5 with no g.t. box.
    • False Negative (FN) - a g.t. bbox that has IoU > 0.5 with no predicted box.
    1) Compute predictions.
    2) Plot Precision-Recall Curve, make
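A minimal sketch of single-class AP using the FP definition above (IoU > 0.5 against a g.t. box). The slide text is cut off after the precision-recall step, so the last step here is an assumption: AP is taken as the area under the precision-recall curve after making precision monotonically non-increasing. For brevity, all boxes are treated as belonging to one image; the function name and data layout are illustrative.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: List[Tuple[float, Box]], gts: List[Box],
                      thr: float = 0.5) -> float:
    """AP for one class; preds are (confidence, box) pairs."""
    if not gts:
        return 0.0
    matched, tp_flags = set(), []
    # Walk predictions from most to least confident.
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            if j in matched:
                continue  # each g.t. box can be matched only once
            v = iou(box, gt)
            if v > best_iou:
                best_iou, best_j = v, j
        if best_iou > thr:
            matched.add(best_j)
            tp_flags.append(True)   # true positive
        else:
            tp_flags.append(False)  # false positive
    # Precision and recall after each prediction.
    precisions, recalls, tp = [], [], 0
    for k, flag in enumerate(tp_flags, start=1):
        tp += flag
        precisions.append(tp / k)
        recalls.append(tp / len(gts))
    # Precision envelope: make the curve non-increasing from left to right.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the stepwise precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (20, 20, 30, 30))]
ap = average_precision(preds, gts)
print(f"AP = {ap:.3f}")
```

G.t. boxes left unmatched at the end are the false negatives; they lower AP by capping the recall the curve can reach.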
  11. Intuition about Kalman Filter in SORT
    Box is represented with a vector: $x = [u, v, s, r, \dot{u}, \dot{v}, \dot{s}]^T$
    • u, v - coordinates of the center
    • s - box scale
    • r - box aspect ratio
    • $\dot{u}, \dot{v}, \dot{s}$ - corresponding derivatives
    Notes:
    1. Linear prediction from frame to frame, with correction from the detector output.
    2. In general, a Kalman filter can model a broad range of dynamic systems (fluid in a tank, the temperature of a car engine).
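A toy sketch of the predict/correct loop, reduced to a single coordinate (position plus velocity) so the matrix algebra can be written out by hand. SORT itself filters the full box state; the class name, the noise values `q` and `r`, and the initial covariance are assumptions made for illustration.

```python
class ConstantVelocityKalman1D:
    """Kalman filter with state [position, velocity]; only position is observed."""

    def __init__(self, pos: float, q: float = 1e-2, r: float = 1.0):
        self.pos, self.vel = pos, 0.0
        self.P = [[10.0, 0.0], [0.0, 10.0]]  # state covariance (assumed initial value)
        self.q, self.r = q, r                # process / measurement noise (assumed)

    def predict(self, dt: float = 1.0) -> float:
        """Linear prediction: x <- F x, P <- F P F^T + q*I, with F = [[1, dt], [0, 1]]."""
        (p00, p01), (p10, p11) = self.P
        self.pos += dt * self.vel
        a = p00 + dt * p10
        b = p01 + dt * p11
        self.P = [[a + dt * b + self.q, b],
                  [p10 + dt * p11, p11 + self.q]]
        return self.pos

    def update(self, z: float) -> None:
        """Correction from a detector measurement z of the position."""
        (p00, p01), (p10, p11) = self.P
        s = p00 + self.r                # innovation variance
        k0, k1 = p00 / s, p10 / s       # Kalman gain
        y = z - self.pos                # innovation
        self.pos += k0 * y
        self.vel += k1 * y
        self.P = [[(1 - k0) * p00, (1 - k0) * p01],
                  [p10 - k1 * p00, p11 - k1 * p01]]


kf = ConstantVelocityKalman1D(pos=0.0)
for t in range(1, 21):
    kf.predict()
    kf.update(2.0 * t)  # detector reports the centre moving 2 px per frame

# The detector "blinks" for three frames: predict without correction.
for _ in range(3):
    predicted = kf.predict()

print(f"velocity estimate: {kf.vel:.2f}, predicted position: {predicted:.2f}")
```

During the missed frames the filter keeps extrapolating the box with the learned velocity, which is how a tracker can bridge “blinking” detections.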
  12. Appendix: Repulsion Loss
    Three components:
    1) Attraction to the matched g.t. box.
    2) Repulsion from other g.t. boxes.
    3) Repulsion from other predicted boxes.
    Technically, IoU is
  13. RetinaMask
    1) RetinaNet adapted to instance segmentation.
    2) Mask prediction gives a good improvement in detection quality (~2.3% mAP on COCO).
    3) Masks are predicted in a Faster-RCNN manner. Mask prediction can be discarded during inference to speed up the detector.
    4) Tune masks on the COCO “Person” category, detection on CrowdHuman.
    5) Code and models available!
  14. Resolution
    1) SGD training instead of Adam.
    2) Replacing SSD with the RetinaNet architecture.
    3) Focal Loss
    4) Bigger resolution (current models: 800x600 and 1200x800)
    5) scale_by_aspect instead of simple resize.
    6) Anchor box tuning.
    7) Crop augmentations
    8) Joint training with head detection.
    9) Removing strides from convolutions in last
  15. Things that did NOT work out
    1) MTCNN for detecting small people.
    2) Prediction of the full bounding box instead of the visible one.