Deep Learning for Object Detection

Yiqi Yan
November 10, 2017

My presentation for the Spring 2017 course Pattern Recognition and Machine Learning

Transcript

  1. Part I: Fundamental Backgrounds
  2. Convolution: receptive field
    - Receptive field of a single convolution: 3 × 3
    - The deeper the network goes, the larger the receptive fields will be
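
The growth of the receptive field with depth can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `receptive_field` and the `(kernel_size, stride)` layer description are assumptions made for this example, not something taken from the slides.

```python
def receptive_field(layers):
    """Return the receptive field (in input pixels) of the last layer.

    `layers` is a list of (kernel_size, stride) pairs, first layer first.
    This layer description is a hypothetical convention for the sketch.
    """
    rf = 1    # one input pixel sees only itself
    jump = 1  # spacing, in input pixels, between neighbouring outputs
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the view by (k - 1) jumps
        jump *= stride             # striding spreads later outputs further apart
    return rf


# Stacking 3 x 3 convolutions with stride 1: the field grows with depth.
for depth in (1, 2, 3, 5):
    print(depth, receptive_field([(3, 1)] * depth))
```

With stride 1 every extra 3 × 3 layer adds two pixels to the field; a strided or pooling layer in between makes the later layers grow it faster.
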
  3. Part II: State-of-the-art methods for object detection
  4. Group One: R-CNN & Modifications
    - (CVPR 2014) R-CNN: Region-Based CNN (TOO SLOW!)
    - (ECCV 2014) SPP-net: Spatial Pyramid Pooling in CNN
    - (ICCV 2015) Fast R-CNN
    - (NIPS 2015) Faster R-CNN (Online Object Detection)

    Group Two: Fast! Online Object Detection
    - (ECCV 2016) SSD: Single Shot MultiBox Detector
    - (CVPR 2016) YOLO
    - (CVPR 2017) YOLO9000

    Group Three: Deformable Convolutional Filters
    - (CVPR 2015) DeepID-Net
    - (arXiv 2017) Deformable Convolutional Networks

    Group Four: Detection + Segmentation
    - (arXiv 2017) Mask R-CNN
  5. Part III: Region-Based CNN
  6. Motivation
    - Problem One: CNNs seem unsuited to object detection
        - Unlike image classification, detection requires localizing objects within an image
        - Deep CNNs have very large receptive fields, which makes precise localization very challenging
    - Problem Two: Deep networks need large datasets to train
  7. Framework
    - Solve Problem One
        - Extract region proposals
        - Recognition using regions
    - Solve Problem Two
        - Supervised pre-training on ImageNet
        - Domain-specific fine-tuning
  8. Region Proposals: Selective Search
    - Uijlings et al., "Selective Search for Object Recognition", IJCV 2013
    - Over-segmentation, then region merging
  9. Training method, Step 1: Supervised pre-training
    - Train a classification model on ImageNet (AlexNet, VGGNet, etc.)
    - Classification alone provides no localization, so this step only pre-trains the parameters
    - Figure: VGG-19 (2015)
  10. Training method, Step 2: Domain-specific fine-tuning
    - Change the network architecture
        - Instead of 1000 ImageNet classes, we want N object classes + background (N+1 classes)
        - Need to reinitialize the soft-max layer
    - Fine-tune using region proposals
        - Keep training the model using positive / negative regions from detection images
        - This time, use detection datasets (VOC, ILSVRC, COCO, etc.)
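
One way the positive / negative regions for fine-tuning could be chosen is by overlap with the ground-truth boxes. The sketch below assumes the rule described in the R-CNN paper (a proposal is a positive for a class when its IoU with a ground-truth box of that class is at least 0.5, otherwise it is background); the slides do not state the threshold, and the helper names `iou` and `label_proposals` are hypothetical.

```python
BACKGROUND = 0  # class index reserved for background (the "+1" in N+1)


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_proposals(proposals, ground_truth, pos_threshold=0.5):
    """Assign each proposal a class index in 0..N.

    ground_truth is a list of (box, class_index) pairs with class_index in 1..N.
    pos_threshold=0.5 is an assumption taken from the R-CNN paper.
    """
    labels = []
    for proposal in proposals:
        best_iou, best_cls = 0.0, BACKGROUND
        for box, cls in ground_truth:
            overlap = iou(proposal, box)
            if overlap > best_iou:
                best_iou, best_cls = overlap, cls
        # Proposals that overlap no object well enough become background.
        labels.append(best_cls if best_iou >= pos_threshold else BACKGROUND)
    return labels


truth = [((0, 0, 10, 10), 3)]
print(label_proposals([(1, 1, 10, 10), (20, 20, 30, 30)], truth))
```
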
  11. Run Detection: now that we have the trained network, let us test it!
    - Step 1: Extract region proposals for all images
    - Step 2: (for each region) run through the CNN, save pool5 features
    - Step 3: Use binary SVMs to classify region features (why not just use soft-max?)
    - Step 5: Bounding-box regression: for each class, train a linear regression model to make up for "slightly wrong" proposals
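
The bounding-box regression step can be made concrete with the parameterization used in the R-CNN paper, which is assumed here because the slides do not spell it out: centre shifts are scaled by the proposal size, and width / height changes are predicted in log space. The function names and the `(center_x, center_y, width, height)` box format are choices made for this sketch.

```python
import math


def regression_targets(proposal, truth):
    """Targets the per-class linear regressor is trained to predict.

    Both boxes are (center_x, center_y, width, height).
    """
    px, py, pw, ph = proposal
    gx, gy, gw, gh = truth
    return ((gx - px) / pw, (gy - py) / ph,
            math.log(gw / pw), math.log(gh / ph))


def apply_deltas(proposal, deltas):
    """Correct a "slightly wrong" proposal with predicted deltas."""
    px, py, pw, ph = proposal
    dx, dy, dw, dh = deltas
    return (px + pw * dx, py + ph * dy,
            pw * math.exp(dw), ph * math.exp(dh))


proposal = (50.0, 50.0, 20.0, 40.0)
truth = (55.0, 48.0, 30.0, 36.0)
deltas = regression_targets(proposal, truth)
print(deltas)
print(apply_deltas(proposal, deltas))  # recovers the ground-truth box
```

Scaling by the proposal size keeps the targets comparable across small and large regions, which is why a simple linear model per class is enough for this correction.
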
  12. R-CNN Problems: too slow!
    - Training is a multi-stage pipeline: R-CNN ⟶ SVMs ⟶ bounding-box regression
    - Training is expensive in space and time
        - CNN features are stored for training the SVMs and the regressors
        - ~200 GB of disk space for the PASCAL dataset!
    - Object detection is slow: features are extracted from each object proposal
        - 47 s / image on a GPU!
  13. ROI (region of interest) Pooling
    - Pool each region into a fixed size (7 × 7 in the paper)
    - Back-propagation works as in traditional max pooling
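
A minimal pure-Python sketch of ROI max pooling follows. It assumes the region is already given in feature-map cells with an end-exclusive `(row0, col0, row1, col1)` format, and it splits the region into bins with floor / ceil boundaries; both conventions are choices made for this example rather than the exact scheme of the Fast R-CNN code.

```python
import math


def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool one region of a 2D feature map to a fixed out_h x out_w grid.

    feature_map: list of rows (2D list of numbers)
    roi: (row0, col0, row1, col1) in feature-map cells, end-exclusive
    """
    r0, c0, r1, c1 = roi
    h, w = r1 - r0, c1 - c0
    pooled = []
    for i in range(out_h):
        # Bin i covers rows [floor(i*h/out_h), ceil((i+1)*h/out_h)).
        rs = r0 + (i * h) // out_h
        re = r0 + math.ceil((i + 1) * h / out_h)
        row = []
        for j in range(out_w):
            cs = c0 + (j * w) // out_w
            ce = c0 + math.ceil((j + 1) * w / out_w)
            row.append(max(feature_map[r][c]
                           for r in range(rs, re)
                           for c in range(cs, ce)))
        pooled.append(row)
    return pooled


fmap = [[r * 6 + c for c in range(6)] for r in range(4)]  # 4 x 6 toy map
print(roi_max_pool(fmap, (0, 0, 4, 6), 2, 2))
print(roi_max_pool(fmap, (1, 1, 4, 4), 2, 2))
```

Regions of different sizes both come out as 2 × 2 grids here, which is the property that lets a fixed-size fully connected layer follow the pooling.
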
  14. Multi-task loss
    - The network outputs two vectors per ROI
        1. Soft-max probabilities for classification
        2. Per-class bounding-box regression offsets
    - 1st term: traditional cross-entropy loss for soft-max
    - 2nd term: error between predicted and true bounding box
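
The two terms can be written down for a single ROI as below. This is a sketch under assumptions: the box error uses the smooth L1 function from the Fast R-CNN paper, the balancing weight `lam` defaults to 1, the box term is skipped for background ROIs (class index 0), and all function and parameter names are made up for the example.

```python
import math


def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear for large errors."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def multi_task_loss(probs, true_class, pred_offsets, true_offsets, lam=1.0):
    """Loss for one ROI.

    probs: soft-max probabilities over N+1 classes (index 0 = background)
    pred_offsets: pred_offsets[k] is the 4-tuple of offsets predicted for class k
    true_offsets: 4-tuple of regression targets for the true class
    """
    cls_loss = -math.log(probs[true_class])  # 1st term: cross-entropy
    if true_class == 0:
        return cls_loss                      # background has no box to regress
    loc_loss = sum(smooth_l1(p - t)          # 2nd term: box error
                   for p, t in zip(pred_offsets[true_class], true_offsets))
    return cls_loss + lam * loc_loss


probs = [0.1, 0.7, 0.2]
offsets = [None, (0.5, 0.0, 0.0, 2.0), (0.0, 0.0, 0.0, 0.0)]
print(multi_task_loss(probs, 1, offsets, (0.0, 0.0, 0.0, 0.0)))
```
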
  15. Why superior to R-CNN: problems #1 & #2
    - R-CNN problem #1: Training is a multi-stage pipeline
    - R-CNN problem #2: Training is expensive in space and time
    - Fast R-CNN: end-to-end training; no need to store features
  16. Why superior to R-CNN: problem #3
    - R-CNN problem #3: Object detection is slow because features are extracted from each object proposal
    - Fast R-CNN: just run the whole image through the CNN once; regions are extracted from the feature map
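
Extracting regions from the shared feature map means mapping each proposal from image pixels to feature-map cells. The sketch below assumes a total network stride of 16 (the value for VGG-16 up to conv5) and a simple floor / ceil rounding so the projected region still covers the proposal; the released Fast R-CNN code rounds differently, and the name `project_roi` is hypothetical.

```python
import math


def project_roi(box, stride=16):
    """Map an image-space box (x1, y1, x2, y2) to feature-map cells.

    stride=16 is an assumption (VGG-16 up to conv5). The start is rounded
    down and the end is rounded up so the cells cover the whole proposal.
    """
    x1, y1, x2, y2 = box
    return (x1 // stride, y1 // stride,
            math.ceil(x2 / stride), math.ceil(y2 / stride))


# The image goes through the CNN once; every proposal only needs this mapping.
for proposal in [(32, 48, 200, 120), (0, 0, 16, 16)]:
    print(proposal, "->", project_roi(proposal))
```
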
  17. Part V: Faster R-CNN: Online Detection!
  18. Fast R-CNN is not fast enough
    - Main bottleneck: Selective Search region proposals
    - Faster R-CNN: why not just make the CNN do region proposals too?
  19. Faster R-CNN framework
    - Insert a Region Proposal Network (RPN) trained to produce region proposals directly
    - ROI pooling, the soft-max classifier, and bounding-box regression are just like Fast R-CNN
  20. Region Proposal Network
    - Similar to the multi-task training in Fast R-CNN: classification + bounding-box prediction
    - The difference is that only two-class classification is needed here: object vs. not object
  21. End-to-end joint training!
    - RPN classification
    - RPN bbox regression
    - Fast R-CNN classification
    - Fast R-CNN bbox regression
  22. Part VI: YOLO: You Only Look Once
  23. YOLO Framework
    - Divide the image into an S × S grid (7 × 7 in the paper)
    - Within each grid cell predict:
        - B boxes: 4 coordinates + confidence
        - C class scores
    - Regression from the image to a 7 × 7 × (5 * B + C) tensor
    - Direct prediction using a CNN
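
The size of the output tensor follows directly from S, B and C. In the sketch below the defaults B = 2 and C = 20 are assumptions taken from the PASCAL VOC setting of the YOLO paper, and the ordering of values inside a cell (all boxes first, then class scores) is a choice made for this example, not necessarily the layout used by darknet.

```python
def yolo_output_shape(S=7, B=2, C=20):
    """Shape of the tensor regressed from the image: S x S x (5 * B + C)."""
    return (S, S, 5 * B + C)


def split_cell(cell, B=2, C=20):
    """Split one grid cell's vector into B boxes and C class scores.

    Each box is (x, y, w, h, confidence). The boxes-then-classes ordering
    is an assumption for this sketch.
    """
    boxes = [tuple(cell[5 * b:5 * b + 5]) for b in range(B)]
    class_scores = list(cell[5 * B:5 * B + C])
    return boxes, class_scores


print(yolo_output_shape())             # 7 x 7 grid, 30 values per cell
boxes, scores = split_cell(list(range(30)))
print(boxes)
print(len(scores))
```
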
  24. Great Contributors
    - Ross Girshick: http://www.rossgirshick.info/
    - Kaiming He: http://kaiminghe.com/
    - Joseph Chet Redmon: https://pjreddie.com/

    Code
    - R-CNN (Caffe+MATLAB): https://github.com/rbgirshick/rcnn
    - Fast R-CNN (Caffe+MATLAB): https://github.com/rbgirshick/fast-rcnn
    - Faster R-CNN (Caffe+MATLAB): https://github.com/ShaoqingRen/faster_rcnn
    - Faster R-CNN (Caffe+Python): https://github.com/rbgirshick/py-faster-rcnn
    - YOLO: http://pjreddie.com/darknet/yolo/