Deep Learning for Object Detection

Yiqi Yan
November 10, 2017

My presentation for the Spring 2017 course Pattern Recognition and Machine Learning

Transcript

  1. Part I: Fundamental Backgrounds
  2. Convolution: receptive field (3 X 3)

    The deeper the network goes, the larger the receptive fields become
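The growth of the receptive field with depth can be checked with a small calculation. This is a minimal sketch, assuming each layer is described only by its kernel size and stride (no dilation); the function name `receptive_field` is illustrative and not from the slides.

```python
def receptive_field(layers):
    """Return the receptive field size (in input pixels) after each layer.

    `layers` is a list of (kernel_size, stride) pairs, applied in order.
    """
    rf = 1    # receptive field of a single input pixel
    jump = 1  # distance in input pixels between neighboring outputs
    sizes = []
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
        sizes.append(rf)
    return sizes


# Three stacked 3 x 3 convolutions with stride 1
print(receptive_field([(3, 1), (3, 1), (3, 1)]))  # [3, 5, 7]
```

Each extra 3 X 3 layer widens the field by two pixels, and a strided layer makes every later layer widen it faster.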
  3. Part II: State-of-the-art methods for object detection
  4. Group One: RCNN & Modifications

    - (CVPR 2014) RCNN: Region Based CNN (TOO SLOW!)
    - (ECCV 2014) SPP-net: Spatial Pyramid Pooling in CNN
    - (ICCV 2015) Fast RCNN
    - (NIPS 2015) Faster RCNN (Online Object Detection)

    Group Two: Fast! Online Object Detection

    - (ECCV 2016) SSD: Single Shot MultiBox Detector
    - (CVPR 2016) YOLO
    - (CVPR 2017) YOLO9000

    Group Three: Deformable Convolutional Filters

    - (CVPR 2015) DeepID-Net
    - (arXiv 2017) Deformable Convolutional Networks

    Group Four: Detection + Segmentation

    - (arXiv 2017) Mask R-CNN
  5. Part III: Region Based CNN
  6. Motivation

    Problem One: CNN seems unsuited for Object Detection

    - Unlike image classification, detection requires localizing objects within an image
    - Deep CNNs have very large receptive fields, which makes precise localization very challenging

    Problem Two: Deep networks need large datasets to train
  7. Framework

    Solve Problem One

    - Extract region proposals
    - Recognition using regions

    Solve Problem Two

    - Supervised pre-training on ImageNet
    - Domain-specific fine-tuning
  8. Region Proposals: Selective Search

    Uijlings et al., “Selective Search for Object Recognition”, IJCV 2013

    over-segmentation ⟶ region-merging
  9. Training method

    Step 1: Supervised Pre-training

    - Train a classification model for ImageNet (AlexNet, VGGNet, etc.)
    - No localization is learned at this stage; it only pre-trains the parameters

    Figure: VGG-19 (2015)
  10. Training method

    Step 2: Domain-specific fine-tuning

    Change network architecture

    - Instead of 1000 ImageNet classes, want N object classes + background (N+1 classes)
    - Need to reinitialize the soft-max layer

    Fine-tuning using Region Proposals

    - Keep training the model using positive / negative regions from detection images
    - This time, use detection datasets (VOC, ILSVRC, COCO, etc.)
  11. Run Detection

    Now that we have the trained network, let us test it!

    - Step 1: Extract region proposals for all images
    - Step 2: (for each region) run through CNN, save pool5 features
    - Step 3: use binary SVMs to classify region features (WHY NOT just use soft-max?)
    - Step 5: bounding box regression: for each class, train a linear regression model to make up for “slightly wrong” proposals
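The correction that bounding-box regression learns can be sketched as a pair of encode/decode functions. This assumes the parameterization from the R-CNN paper (center offsets scaled by the proposal size, log-scale width and height). The function names and the (cx, cy, w, h) box layout are illustrative choices, and the per-class linear regressor that predicts the offsets from pool5 features is left out.

```python
import math


def encode(proposal, truth):
    """Regression targets that map a proposal box onto the ground-truth box.

    Both boxes are (center_x, center_y, width, height).
    """
    px, py, pw, ph = proposal
    gx, gy, gw, gh = truth
    return ((gx - px) / pw, (gy - py) / ph,
            math.log(gw / pw), math.log(gh / ph))


def decode(proposal, offsets):
    """Apply predicted offsets to a proposal to get the corrected box."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = offsets
    return (px + tx * pw, py + ty * ph,
            pw * math.exp(tw), ph * math.exp(th))


proposal = (50.0, 50.0, 20.0, 40.0)
truth = (54.0, 46.0, 30.0, 40.0)
targets = encode(proposal, truth)
corrected = decode(proposal, targets)  # recovers `truth` up to rounding
```

Because the offsets are relative to the proposal's own size, the same regressor can correct both small and large proposals.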
  12. R-CNN Problems: too slow!

    - Training is a multi-stage pipeline: RCNN⟶SVMs⟶bounding-box regression
    - Training is expensive in space and time: CNN features are stored for training the SVMs and the regressors (~200GB of disk space for the PASCAL dataset!)
    - Object detection is slow: features are extracted from each object proposal (47s / image on a GPU!)
  13. ROI (region of interest) Pooling

    - Pool each region into a fixed size (7 X 7 in the paper)
    - Back-propagate as in traditional max pooling
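ROI pooling can be sketched in plain Python on a single-channel feature map. This is a minimal illustration, assuming the ROI is given in feature-map coordinates as (x0, y0, x1, y1) with exclusive upper bounds and covers at least one cell; the function names and the bin-boundary rounding (floor for the start, ceiling for the end) are assumptions, not the paper's reference code.

```python
def ceil_div(a, b):
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def roi_max_pool(feature_map, roi, out_h=7, out_w=7):
    """Max-pool one ROI of a 2D feature map into an out_h x out_w grid."""
    x0, y0, x1, y1 = roi
    h, w = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_h):
        r0 = y0 + (i * h) // out_h           # first row of bin i
        r1 = y0 + ceil_div((i + 1) * h, out_h)  # one past the last row
        row = []
        for j in range(out_w):
            c0 = x0 + (j * w) // out_w
            c1 = x0 + ceil_div((j + 1) * w, out_w)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# 4 x 4 feature map with values 0..15, pooled to 2 x 2
fmap = [[r * 4 + c for c in range(4)] for r in range(4)]
print(roi_max_pool(fmap, (0, 0, 4, 4), out_h=2, out_w=2))  # [[5, 7], [13, 15]]
```

Whatever the ROI size, the output grid has the same shape, which is what lets regions of different sizes feed the same fully connected layers.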
  14. Multi-task loss

    The network outputs two vectors per ROI:

    1. Soft-max probabilities for classification
    2. Per-class bounding-box regression offsets

    The loss has two terms:

    - 1st term: traditional cross-entropy loss for soft-max
    - 2nd term: error between predicted and true bounding-box
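The two terms can be combined as in the sketch below. It assumes the form used in the Fast R-CNN paper: a log loss on the true class plus a smooth L1 loss on the box offsets, with the box term switched off for background ROIs (class 0 here). The function names, the weight `lam`, and the argument layout are illustrative.

```python
import math


def smooth_l1(x):
    """Quadratic near zero, linear for |x| >= 1."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5


def multitask_loss(probs, true_class, pred_offsets, true_offsets, lam=1.0):
    """Classification loss plus box-regression loss for one ROI.

    probs: soft-max probabilities over N + 1 classes (index 0 = background).
    pred_offsets: the 4 predicted offsets for the true class.
    """
    cls_loss = -math.log(probs[true_class])
    if true_class == 0:
        # Background ROIs have no ground-truth box to regress to
        return cls_loss
    loc_loss = sum(smooth_l1(p - t)
                   for p, t in zip(pred_offsets, true_offsets))
    return cls_loss + lam * loc_loss


loss = multitask_loss([0.2, 0.8], 1, (0.5, 0.0, 0.0, 2.0), (0.0, 0.0, 0.0, 0.0))
```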
  15. Why superior to RCNN: problem #1 & #2

    - RCNN problem #1: Training is a multi-stage pipeline
    - RCNN problem #2: Training is expensive in space and time
    - Fast RCNN: end-to-end training; no need to store features
  16. Why superior to RCNN: problem #3

    - RCNN problem #3: Object detection is slow because features are extracted from each object proposal
    - Fast RCNN: just run the whole image through the CNN; regions are extracted from the feature map
  17. Part V: Faster RCNN, Online Detection!
  18. Fast RCNN is not fast enough

    - Main bottleneck: Selective Search region proposals
    - Faster RCNN: Why not just make the CNN do region proposals too?
  19. Faster RCNN framework

    - Insert a Region Proposal Network (RPN) trained to produce region proposals directly
    - ROI Pooling, soft-max classifier and bounding box regression are just like Fast RCNN
  20. Region Proposal Network

    Similar to the multi-task training in Fast RCNN: classification + bounding box prediction. The difference is that we only need two-class classification here: object & not object
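The object / not-object labels for RPN training can be sketched as below. This assumes the rule from the Faster R-CNN paper, where candidate boxes (called anchors there, a term the slide does not use) are labeled by their overlap with ground-truth boxes: IoU of at least 0.7 is positive, below 0.3 is negative, and anything in between is ignored. The paper's extra rule (the highest-IoU anchor for each ground-truth box is also positive) is omitted, and the function names are illustrative.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def label_anchor(anchor, gt_boxes, hi=0.7, lo=0.3):
    """Return 1 (object), 0 (not object) or -1 (ignored during training)."""
    best = max((iou(anchor, g) for g in gt_boxes), default=0.0)
    if best >= hi:
        return 1
    if best < lo:
        return 0
    return -1


anchor = (0, 0, 10, 10)
print(label_anchor(anchor, [(0, 0, 10, 10)]))    # 1
print(label_anchor(anchor, [(20, 20, 30, 30)]))  # 0
print(label_anchor(anchor, [(0, 0, 10, 5)]))     # -1
```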
  21. End-to-end joint training!

    - RPN classification
    - RPN bounding-box regression
    - Fast RCNN classification
    - Fast RCNN bounding-box regression
  22. Part VI: YOLO, You Only Look Once
  23. YOLO Framework

    - Divide the image into an S x S grid (7 X 7 in the paper)
    - Within each grid cell predict:
        - B boxes: 4 coordinates + confidence
        - C class scores
    - Regression from the image to a 7 x 7 x (5 * B + C) tensor
    - Direct prediction using a CNN
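The size of the output tensor follows directly from S, B and C. The sketch below uses B = 2 and C = 20, the values the YOLO paper reports for PASCAL VOC, which gives a 7 x 7 x 30 tensor. The per-cell layout in `split_cell` (all boxes first, then class scores) is an assumption made for illustration and may not match the order used by any particular implementation.

```python
def yolo_output_shape(S=7, B=2, C=20):
    """Shape of the prediction tensor: one (5 * B + C) vector per grid cell."""
    return (S, S, 5 * B + C)


def split_cell(cell, B=2, C=20):
    """Split one cell's vector into B boxes and C class scores.

    Assumed layout: B groups of (x, y, w, h, confidence), then C scores.
    """
    assert len(cell) == 5 * B + C
    boxes = [tuple(cell[5 * b:5 * b + 5]) for b in range(B)]
    class_scores = list(cell[5 * B:])
    return boxes, class_scores


print(yolo_output_shape())  # (7, 7, 30)
boxes, class_scores = split_cell(list(range(30)))
print(boxes[1])             # (5, 6, 7, 8, 9)
```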
  24. Great Contributors

    - Ross Girshick: http://www.rossgirshick.info/
    - Kaiming He: http://kaiminghe.com/
    - Joseph Chet Redmon: https://pjreddie.com/

    Code

    - R-CNN (Caffe+MATLAB): https://github.com/rbgirshick/rcnn
    - Fast R-CNN (Caffe+MATLAB): https://github.com/rbgirshick/fast-rcnn
    - Faster R-CNN (Caffe+MATLAB): https://github.com/ShaoqingRen/faster_rcnn
    - Faster R-CNN (Caffe+Python): https://github.com/rbgirshick/py-faster-rcnn
    - YOLO: http://pjreddie.com/darknet/yolo/