Deep Learning-Based Object Detection with YOLOv2

Jumabek Alikhanov @ Information Security Research Lab, Inha University

YOLO9000: Better, Faster, Stronger (CVPR 2017, Best Paper Honorable Mention)

CONTENTS
1. Introduction & Previous Work
2. Better detection performance
3. Faster processing speed
4. Detecting more classes (object types)
5. Conclusion

Task & Evaluation Metric
mAP: mean Average Precision
https://github.com/rafaelpadilla/Object-Detection-Metrics
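To make the metric concrete, here is a minimal sketch of average precision (AP) for one class and of mAP as the mean over classes. It assumes detections have already been matched to ground truth (each one flagged true or false positive) and uses all-point interpolation of the precision-recall curve; the function names are illustrative, and Pascal VOC 2007 itself uses an 11-point variant.

```python
import numpy as np

def average_precision(scores, is_true_positive, num_ground_truth):
    """Area under the interpolated precision-recall curve for one class."""
    # Rank detections by descending confidence.
    order = np.argsort(-np.asarray(scores, dtype=float))
    tp = np.asarray(is_true_positive, dtype=float)[order]
    fp = 1.0 - tp
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(fp)
    recall = cum_tp / num_ground_truth
    precision = cum_tp / (cum_tp + cum_fp)
    # Add sentinel points, then make precision non-increasing in recall.
    recall = np.concatenate(([0.0], recall, [1.0]))
    precision = np.concatenate(([0.0], precision, [0.0]))
    for i in range(len(precision) - 2, -1, -1):
        precision[i] = max(precision[i], precision[i + 1])
    # Sum rectangle areas at the points where recall changes.
    idx = np.where(recall[1:] != recall[:-1])[0]
    return float(np.sum((recall[idx + 1] - recall[idx]) * precision[idx + 1]))

def mean_average_precision(ap_per_class):
    """mAP is the unweighted mean of the per-class AP values."""
    return float(np.mean(list(ap_per_class.values())))

# Three detections for one class, two ground-truth objects.
ap = average_precision([0.9, 0.8, 0.7], [1, 0, 1], num_ground_truth=2)
```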

YOLO v1 Network
Output shape = (S, S, B×5 + C) = (7, 7, 2×5 + 20) = (7, 7, 30)
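The shape arithmetic above can be checked with a few lines. This is only a sketch: the function name is hypothetical, and the factor of 5 stands for the four box coordinates plus one confidence score per predicted box.

```python
def yolo_v1_output_shape(S=7, B=2, C=20):
    """Output tensor shape for an S x S grid, B boxes per cell, C classes."""
    values_per_box = 5  # x, y, w, h, confidence
    return (S, S, B * values_per_box + C)

print(yolo_v1_output_shape())  # (7, 7, 30)
```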

YOLOv1: Loss Function
Loss terms: Localization, Confidence, Classification
p_i: conditional class probability
C_i: box confidence score
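For reference, the three terms can be written out as in the original YOLO paper. The formula itself is not in the slide text, so treat this as a reconstruction from the paper: hats denote predictions, the indicator 1_ij^obj selects the j-th box in cell i that is responsible for an object, and the paper sets λ_coord = 5 and λ_noobj = 0.5.

```latex
\begin{aligned}
\mathcal{L} ={}
% Localization
& \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}}
  \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
& + \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}}
  \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
% Confidence
& + \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left( C_i - \hat{C}_i \right)^2
  + \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{noobj}} \left( C_i - \hat{C}_i \right)^2 \\
% Classification
& + \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}} \sum_{c \in \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
```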

Previously

Detector       Pascal 2007 mAP   Speed (FPS)   Time per image
DPM v5         33.7              0.07          14 s
R-CNN          66.0              0.05          20 s
Fast R-CNN     70.0              0.5           2 s
Faster R-CNN   73.2              7             140 ms
YOLO           63.4              45            22 ms


Better Performance

YOLO: Train on ImageNet, resize network, fine-tune on detection

Fine-tune 448x448 Classifier: +3.5% mAP
Train on ImageNet; resize and fine-tune on ImageNet; fine-tune on detection

Anchor boxes use static initialization

Use k-means clustering to find better initializations https://github.com/Jumabek/darknet_scripts
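A minimal sketch of the clustering idea, assuming the distance d(box, centroid) = 1 - IoU(box, centroid) described in the YOLO9000 paper and boxes given as (width, height) pairs. The function names and the toy data are illustrative and are not taken from the linked repository.

```python
import numpy as np

def iou_wh(boxes, centroids):
    """Pairwise IoU of (w, h) pairs, treating all boxes as sharing one corner."""
    inter_w = np.minimum(boxes[:, None, 0], centroids[None, :, 0])
    inter_h = np.minimum(boxes[:, None, 1], centroids[None, :, 1])
    inter = inter_w * inter_h
    area_b = (boxes[:, 0] * boxes[:, 1])[:, None]
    area_c = (centroids[:, 0] * centroids[:, 1])[None, :]
    return inter / (area_b + area_c - inter)

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster box dimensions with k-means using 1 - IoU as the distance."""
    boxes = np.asarray(boxes, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct training boxes.
    centroids = boxes[rng.choice(len(boxes), size=k, replace=False)]
    for _ in range(iters):
        assign = np.argmin(1.0 - iou_wh(boxes, centroids), axis=1)
        new = np.array([
            boxes[assign == j].mean(axis=0) if np.any(assign == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids

# Toy data: three small square boxes and three tall boxes.
toy_boxes = [[1, 1], [1.1, 0.9], [0.9, 1.1], [5, 10], [5.2, 9.8], [4.8, 10.2]]
anchors = kmeans_anchors(toy_boxes, k=2)
```

Using IoU rather than Euclidean distance keeps large boxes from dominating the error, which is the motivation the paper gives for this choice.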

Static Anchors vs Dimension Clusters

Box Location Prediction
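The slide text does not include the equations, so the sketch below follows the formulation in the YOLO9000 paper: the network outputs (t_x, t_y, t_w, t_h), the cell offset is (c_x, c_y), and the anchor prior is (p_w, p_h). The function name is hypothetical, and all values are in grid-cell units.

```python
import math

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    """Turn raw network outputs into a box centre and size (grid-cell units)."""
    sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))
    bx = sigmoid(tx) + cx   # centre x stays inside cell cx
    by = sigmoid(ty) + cy   # centre y stays inside cell cy
    bw = pw * math.exp(tw)  # width scales the anchor prior
    bh = ph * math.exp(th)  # height scales the anchor prior
    return bx, by, bw, bh

# Zero outputs give the cell centre and the unmodified anchor size.
print(decode_box(0, 0, 0, 0, cx=3, cy=4, pw=2.0, ph=5.0))  # (3.5, 4.5, 2.0, 5.0)
```

Because the sigmoid bounds the centre offset to (0, 1), each prediction is tied to its own grid cell, which the paper reports makes training more stable than unconstrained offsets.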

Dimension Clusters: +5% mAP

Multi-scale training: +1.5% mAP
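As a rough illustration of the schedule, the sketch below picks a new input resolution every few batches. The specific values (sizes from 320 to 608 in steps of 32, a new size every 10 batches) come from the YOLO9000 paper rather than the slide, and the function name is hypothetical.

```python
import random

def multiscale_sizes(num_batches, min_size=320, max_size=608,
                     stride=32, interval=10, seed=0):
    """Return the square input size to use for each training batch."""
    rng = random.Random(seed)
    # The network downsamples by 32, so sizes are multiples of 32.
    choices = list(range(min_size, max_size + 1, stride))
    size = rng.choice(choices)
    sizes = []
    for batch in range(num_batches):
        if batch % interval == 0:
            size = rng.choice(choices)  # resample every `interval` batches
        sizes.append(size)
    return sizes

schedule = multiscale_sizes(30)
```

This works because the network is fully convolutional, so the same weights can be applied at any of these resolutions.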

YOLOv2: Fast, Accurate Detection

Huang, Jonathan, et al. "Speed/accuracy trade-offs for modern convolutional object
detectors." arXiv preprint arXiv:1611.10012 (2016).


Faster Detection Speed

Speed is not just parameter counts or FLOPs

Network               Top-1   Top-5   FLOPs      GPU speed
VGG-16                70.5    90.0    30.95 Bn   100 FPS
Extraction (YOLOv1)   72.5    90.8    8.52 Bn    180 FPS
Resnet50              75.3    92.2    7.66 Bn    90 FPS

Darknet19: A good balance of speed and accuracy

Network               Top-1   Top-5   FLOPs      GPU speed
VGG-16                70.5    90.0    30.95 Bn   100 FPS
Extraction (YOLOv1)   72.5    90.8    8.52 Bn    180 FPS
Resnet50              75.3    92.2    7.66 Bn    90 FPS
Darknet19             74.0    91.8    5.58 Bn    200 FPS

Why is it fast?
- Simple & efficient architecture
- C implementation

Stronger - Detecting more classes

Classification dataset (ImageNet):
- 14 million images
- 22k classes
- Classification labels

Detection dataset (COCO):
- 100k images
- 80 classes
- Detection labels

Example label: Golden eagle

Typically use softmax over all classes

Can’t just mash classes together...

WordNet has structure but it’s messy

... Each node is a conditional probability
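One way to read this: the absolute probability of a class is the product of the conditional probabilities along its path to the root. The sketch below uses a tiny made-up tree with made-up probabilities; the node names, numbers, and function name are all hypothetical and are not taken from WordNet or from the YOLO9000 model.

```python
# Hypothetical hierarchy: each node maps to its parent (None marks the root).
PARENT = {
    "physical object": None,
    "animal": "physical object",
    "dog": "animal",
    "terrier": "dog",
}

# Hypothetical P(node | parent) values; the root has probability 1.
COND_PROB = {
    "physical object": 1.0,
    "animal": 0.9,
    "dog": 0.8,
    "terrier": 0.5,
}

def absolute_probability(node):
    """Multiply conditional probabilities from `node` up to the root."""
    prob = 1.0
    while node is not None:
        prob *= COND_PROB[node]
        node = PARENT[node]
    return prob

p_terrier = absolute_probability("terrier")  # 0.5 * 0.8 * 0.9 * 1.0
```

In this reading, a softmax is applied over the children of each node rather than over all classes at once, so sibling classes compete only with each other.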

Conclusion
• YOLOv2 and YOLO9000 are real-time detection systems
• YOLOv2 is state of the art and faster than other systems
• YOLO9000 detects 9K object categories

References
1. CVPR paper - https://pjreddie.com/media/files/papers/YOLO9000.pdf
2. Article - https://medium.com/@jonathan_hui/real-time-object-detection-with-yolo-yolov2-28b1b93e2088
3. Author’s Presentation - https://docs.google.com/presentation/d/14qBAiyhMOFl_wZW4dA1CkixgXwf0zKGbpw_0oHK8yEM/edit#slide=id.g1f9fb98e4b_0_132
