Tryolabs Workshop: Object Detection with Deep Learning

Tryolabs
November 17, 2018


Learn the inner workings of Faster R-CNN and implement it yourself.

Then learn about Luminoth, an open-source computer vision toolkit that implements this algorithm.





  1. 3.

    Introduction
    Must have:
    • Familiarity with Python.
    • Basic knowledge of machine learning.
    • Familiarity with numpy and Jupyter Notebooks.
    Recommended:
    • Familiarity with TensorFlow.
    Helpful to have:
    • Basics of deep learning and convolutional neural networks (CNNs).
  2. 4.

    Introduction
    • Fundamentals: image classification and object detection (Faster R-CNN).
    • Hands-on: implement components of the Faster R-CNN model.
    • Hands-on: use the Luminoth toolkit for a real-world problem.
  3. 7.

    Image classification
    • Classification: “There’s a cat in the photo.”
    • Localization: “There’s a cat, and it’s here.”
    • Detection: “There are two cats, here and here.”
  4. 10.

    Image classification
    • Classical models: extract features from the images and use them as input to a simple classification algorithm.
    • Deep learning models: use the images directly as input to a more complex classification algorithm.
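To make the "classical" pipeline concrete, here is a minimal sketch assuming a hand-crafted color-histogram feature and a nearest-centroid classifier. Both choices, and the helper names `color_histogram`, `fit_centroids` and `predict`, are illustrative assumptions, not something the workshop prescribes.

```python
import numpy as np

def color_histogram(image, bins=4):
    """Hand-crafted feature: normalized per-channel intensity histogram."""
    # image: (H, W, 3) array with values in [0, 1]
    per_channel = [np.histogram(image[..., c], bins=bins, range=(0.0, 1.0))[0]
                   for c in range(image.shape[-1])]
    features = np.concatenate(per_channel).astype(float)
    return features / features.sum()

def fit_centroids(features, labels):
    """'Simple classifier': remember the mean feature vector of each class."""
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(classes, centroids, feature):
    """Assign the class whose centroid is closest in feature space."""
    distances = np.linalg.norm(centroids - feature, axis=1)
    return classes[np.argmin(distances)]

rng = np.random.default_rng(0)
# Toy dataset: "dark" images (class 0) vs. "bright" images (class 1).
dark = [rng.uniform(0.0, 0.4, size=(8, 8, 3)) for _ in range(5)]
bright = [rng.uniform(0.6, 1.0, size=(8, 8, 3)) for _ in range(5)]
X = np.stack([color_histogram(img) for img in dark + bright])
y = np.array([0] * 5 + [1] * 5)

classes, centroids = fit_centroids(X, y)
new_image = rng.uniform(0.7, 1.0, size=(8, 8, 3))
print(predict(classes, centroids, color_histogram(new_image)))  # 1 (bright)
```

A deep learning model would replace both `color_histogram` and the centroid rule with layers learned directly from the pixels.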
  5. 11.

    Image classification
    Neural networks for classification were already widely used in the 80s. The convolutional neural network (Yann LeCun, 1989) turned out to be very good at pattern recognition with minimal preprocessing, e.g. handwritten digit recognition with LeNet-5 (Yann LeCun, 1998).
  6. 12.

    Image classification
    A filter looks at a small region of the image and activates more strongly in the presence of a certain pattern. Several filters together can detect more complex patterns.
  7. 13.

    Image classification
    Slide each filter across the image to produce an activation map. Then use more filters to detect patterns over the activation maps (patterns over patterns over patterns…).
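A minimal numpy sketch of sliding one filter over a grayscale image. It assumes no padding and stride 1, and, like most deep learning libraries, computes cross-correlation (the kernel is not flipped). The vertical-edge kernel is a hand-picked example, not a filter taken from the slides.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over a 2D `image` (no padding, stride 1).

    Each output value is the dot product between the kernel and the image
    patch under it, so it is large where the patch resembles the kernel.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge filter: responds to dark-to-bright transitions (left to right).
edge = np.array([[-1.0, 0.0, 1.0]] * 3)
image = np.zeros((5, 6))
image[:, 3:] = 1.0  # dark left half, bright right half

activation = conv2d_valid(image, edge)
print(activation)
# [[0. 3. 3. 0.]
#  [0. 3. 3. 0.]
#  [0. 3. 3. 0.]]
```

The activation map lights up only in the columns where the edge is, which is exactly the "activates more strongly in the presence of a pattern" behavior.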
  8. 14.
  9. 15.

    Image classification
    Max pooling: each 2x2 region becomes 1x1 (the maximum in each region), which halves the width and height, e.g. a 1900 x 1300 x 64 activation map becomes 950 x 650 x 64.
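A vectorized numpy sketch of 2x2 max pooling with stride 2 on an (H, W, C) activation map. It assumes trailing odd rows/columns are simply dropped; real frameworks let you choose the padding behavior.

```python
import numpy as np

def max_pool_2x2(activations):
    """2x2 max pooling (stride 2) on an (H, W, C) array.

    Each non-overlapping 2x2 block becomes its maximum, halving height and
    width while keeping the number of channels.
    """
    h, w, c = activations.shape
    h2, w2 = h // 2, w // 2
    # Split rows and columns into (block index, position inside block).
    blocks = activations[:h2 * 2, :w2 * 2].reshape(h2, 2, w2, 2, c)
    return blocks.max(axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4, 1)
print(max_pool_2x2(x)[..., 0])
# [[ 5.  7.]
#  [13. 15.]]
```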
  10. 17.

    Image classification
    How do we know which filters/patterns to set up?
    → We learn them. They are regular weights of the network, trained with backpropagation.
    How do we know how many filters to use in each layer?
    → It is a hyperparameter of the network (try and see what works best).
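To show that a filter is "just weights", here is a minimal sketch that learns a 3x3 filter by gradient descent. It assumes a single linear filter with no nonlinearity and a synthetic target produced by a known (hypothetical) Sobel-like kernel, so the learned result can be checked. A real CNN computes the same kind of gradient through many layers via backpropagation.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
image = rng.normal(size=(20, 20))
# The filter we pretend not to know; the target is produced with it.
true_kernel = np.array([[1.0, 0.0, -1.0],
                        [2.0, 0.0, -2.0],
                        [1.0, 0.0, -1.0]])

# All 3x3 patches of the image: shape (18, 18, 3, 3).
patches = sliding_window_view(image, (3, 3))
target = np.einsum("ijkl,kl->ij", patches, true_kernel)

kernel = np.zeros((3, 3))  # the filter weights we want to learn
learning_rate = 0.1
for step in range(300):
    prediction = np.einsum("ijkl,kl->ij", patches, kernel)
    error = prediction - target
    # Gradient of the mean squared error with respect to each kernel weight.
    grad = 2 * np.einsum("ij,ijkl->kl", error, patches) / error.size
    kernel -= learning_rate * grad

print(np.round(kernel, 3))
```

The number of filters per layer, by contrast, is not learned: it is fixed before training, which is why it is a hyperparameter.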
  11. 18.

    Image classification
    Learning combinations of filters that are activated (from the activation maps) makes it a lot easier to find complex patterns!
  12. 22.

    Object detection
    Image credits:
    • CT scan of a lung cancer patient at the Jingdong Zhongmei private hospital in Yanjiao, China's Hebei Province (AP Photo/Andy Wong).
    • Hsieh et al., “Drone-based Object Counting by Spatially Regularized Regional Proposal Networks”, ICCV 2017.
    • Source: Pinterest.
  13. 23.
  14. 24.

    Object detection
    Regression-based methods
    • Single-stage prediction of object classes and bounding boxes.
    • Examples: You Only Look Once (YOLO, YOLOv2, YOLOv3); Single Shot MultiBox Detector (SSD).
    Region proposal-based methods
    • Two stages:
      1. Generate candidate locations using some algorithm.
      2. Adjust the bounding boxes and classify them.
    • Examples: R-CNN, Fast R-CNN, Faster R-CNN.
  15. 25.
  16. 26.

    Faster R-CNN
    An evolution of methods proposed in previous years:
    • 2014: R-CNN (Girshick et al.).
    • 2015: Fast R-CNN (Girshick).
    • 2015: Faster R-CNN (Ren, He, Girshick, Sun, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks”, NIPS 2015).
  17. 27.

    Faster R-CNN
    1. Propose interesting regions (Region Proposal Network, RPN). Where should we look?
    2. Analyze the proposals and adjust them (Region-based CNN, R-CNN). Is this an object? If so, which class (e.g. person, bicycle)?
  18. 29.

    Faster R-CNN: region proposal
    There are potentially hundreds or thousands of proposals! Regions are agnostic to particular object classes.
    → “There might be something here!”
  19. 30.

    Faster R-CNN
    Resize all the regions to the same dimensions (through Region of Interest Pooling).
  20. 31.

    Faster R-CNN
    • Classification: what type of object is it (or is it background)? → probability distribution.
    • Regression: how should I resize the box to better enclose the object? → do it per class.
  21. 32.

    Faster R-CNN
    • Get a feature map with a CNN.
    • Use it to propose interesting regions worth exploring, and associate an objectness score with each one.
    • Classify the regions and discard those that are background (i.e. keep good scores only).
    • Learn how to further adjust the boxes for each class of object.
  22. 36.
  23. 37.

    Faster R-CNN: base network
    Image of arbitrary size → feature map.
    Common architectures:
    • VGG (16, 19)
    • ResNet (50, 101, 152, ...)
    • Inception (V2, V3)
    • Xception
    • MobileNet
    • ...
    For ResNet-101, the feature map is 1/16 of the image size spatially and 1024 channels deep: an 800 x 600 x 3 image (width x height x channels) becomes a 50 x 37 x 1024 feature map.
  24. 38.

    Faster R-CNN: region proposal
    Idea:
    1. Look at a spatial position and its vicinity.
    2. Predict 2 points, (x1, y1) and (x2, y2), for each location.
    Issues:
    • Can we make the network predict exact pixel coordinates?
    • Image dimensions are variable.
  25. 39.

    Faster R-CNN: region proposal
    1. Take a single spatial position.
    2. Define a fixed-size reference box (called an anchor).
    3. Find the “closest” ground-truth (GT) box.
    4. Predict the “objectness” of the region.
    5. Learn how to modify the anchor in relative terms (e.g. “double its width”).
    6. Repeat for every spatial position.
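Step 5 can be sketched with the anchor-relative parameterization from the Faster R-CNN paper: center offsets scaled by the anchor size, plus log ratios of width and height. This sketch assumes (x1, y1, x2, y2) boxes in continuous coordinates (some implementations add +1 to widths and heights); `encode` and `decode` are hypothetical helper names.

```python
import numpy as np

def _centers(boxes):
    """(x1, y1, x2, y2) boxes -> widths, heights, center x, center y."""
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    return w, h, boxes[:, 0] + 0.5 * w, boxes[:, 1] + 0.5 * h

def encode(anchors, gt_boxes):
    """Express ground-truth boxes relative to their anchors as (tx, ty, tw, th)."""
    aw, ah, ax, ay = _centers(anchors)
    gw, gh, gx, gy = _centers(gt_boxes)
    return np.stack([(gx - ax) / aw, (gy - ay) / ah,
                     np.log(gw / aw), np.log(gh / ah)], axis=1)

def decode(anchors, deltas):
    """Apply (tx, ty, tw, th) deltas to anchors, inverting `encode`."""
    aw, ah, ax, ay = _centers(anchors)
    cx = ax + deltas[:, 0] * aw
    cy = ay + deltas[:, 1] * ah
    w = aw * np.exp(deltas[:, 2])
    h = ah * np.exp(deltas[:, 3])
    return np.stack([cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h], axis=1)

anchor = np.array([[0.0, 0.0, 10.0, 10.0]])
gt = np.array([[0.0, 0.0, 20.0, 10.0]])  # same left edge, twice as wide
deltas = encode(anchor, gt)
print(deltas)                  # tx = 0.5, ty = 0, tw = log(2), th = 0
print(decode(anchor, deltas))  # recovers the ground-truth box (0, 0, 20, 10)
```

Because the targets are relative to the anchor, the network never has to predict absolute pixel coordinates, which addresses both issues raised on the previous slide.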
  26. 40.

    Faster R-CNN: region proposal
    For each spatial position of the feature map, generate k fixed anchors with the same center, e.g. 3 scales and 3 aspect ratios (k = 9). But choose what’s best for your dataset.
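A sketch of generating the k = 9 reference anchors for one position, centered at the origin. The sizes (128, 256, 512 pixels, i.e. areas 128², 256², 512²) and ratios (1:2, 1:1, 2:1) follow the defaults reported in the Faster R-CNN paper; treat them as assumptions to tune for your dataset. `reference_anchors` is a hypothetical helper name.

```python
import numpy as np

def reference_anchors(sizes=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """k = len(sizes) * len(ratios) anchors centered at (0, 0), as (x1, y1, x2, y2).

    Each anchor keeps area size**2 and has height / width == ratio.
    """
    anchors = []
    for size in sizes:
        for ratio in ratios:
            w = size / np.sqrt(ratio)
            h = size * np.sqrt(ratio)
            anchors.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(anchors)

anchors = reference_anchors()
print(anchors.shape)  # (9, 4)
```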
  27. 41.

    Faster R-CNN: region proposal
    [Figure: anchor centers in the original image; reference anchors (9 per position); all anchors.]
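A sketch of placing the reference anchors at every feature map position. It assumes a stride of 16 (the 1/16 downsampling of the base network) and maps position (i, j) to pixel (16·j, 16·i). That is one common convention; others add an offset of stride/2. `all_anchors` and the two reference anchors in the example are hypothetical.

```python
import numpy as np

def all_anchors(reference, feat_h, feat_w, stride=16):
    """Place every reference anchor (k, 4) at every feature map position.

    Returns an array of shape (feat_h * feat_w * k, 4), ordered row by row,
    with the k anchors of each position kept together.
    """
    xs = np.arange(feat_w) * stride
    ys = np.arange(feat_h) * stride
    cx, cy = np.meshgrid(xs, ys)                       # each (feat_h, feat_w)
    shifts = np.stack([cx, cy, cx, cy], axis=-1).reshape(-1, 1, 4)
    return (shifts + reference[np.newaxis, :, :]).reshape(-1, 4)

# Two hypothetical reference anchors (k = 2) on a 2 x 3 feature map.
reference = np.array([[-8.0, -8.0, 8.0, 8.0],
                      [-16.0, -8.0, 16.0, 8.0]])
anchors = all_anchors(reference, feat_h=2, feat_w=3)
print(anchors.shape)  # (12, 4)
print(anchors[2])     # position (x=16, y=0), first anchor -> (8, -8, 24, 8)
```

For the 50 x 37 ResNet-101 feature map with k = 9, this gives 50 · 37 · 9 = 16,650 anchors, which is why proposals need filtering later.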
  28. 42.

    Faster R-CNN: region proposal
    Feature map → rectangular proposals + “objectness” scores. The RPN consists of:
    • A 3x3 conv (padding 1, 512 output channels), feeding two sibling layers:
      ◦ 1x1 conv (2k output channels) → 2k objectness scores.
      ◦ 1x1 conv (4k output channels) → 4k box regression scores.
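A numpy sketch of the RPN forward pass, mainly to make the shapes concrete. It assumes a ReLU after the 3x3 conv, random untrained weights, and small hypothetical sizes instead of the slide's 512 hidden channels; `rpn_head` and its parameter names are illustrative, not the workshop's API.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def rpn_head(features, w3, b3, w_cls, b_cls, w_reg, b_reg):
    """Forward pass of the RPN layers on an (H, W, C) feature map.

    3x3 conv (padding 1) + ReLU, then two sibling 1x1 convs.
    A 1x1 conv is just a matrix multiplication over the channel axis.
    """
    padded = np.pad(features, ((1, 1), (1, 1), (0, 0)))
    windows = sliding_window_view(padded, (3, 3), axis=(0, 1))  # (H, W, C, 3, 3)
    hidden = np.einsum("hwcij,ijcm->hwm", windows, w3) + b3
    hidden = np.maximum(hidden, 0.0)                             # ReLU
    objectness = hidden @ w_cls + b_cls                          # (H, W, 2k)
    box_deltas = hidden @ w_reg + b_reg                          # (H, W, 4k)
    return objectness, box_deltas

# Hypothetical small sizes; the slide uses 512 hidden channels.
H, W, C, hidden_channels, k = 5, 4, 8, 16, 9
rng = np.random.default_rng(0)
features = rng.normal(size=(H, W, C))
objectness, box_deltas = rpn_head(
    features,
    w3=rng.normal(size=(3, 3, C, hidden_channels)), b3=np.zeros(hidden_channels),
    w_cls=rng.normal(size=(hidden_channels, 2 * k)), b_cls=np.zeros(2 * k),
    w_reg=rng.normal(size=(hidden_channels, 4 * k)), b_reg=np.zeros(4 * k),
)
print(objectness.shape, box_deltas.shape)  # (5, 4, 18) (5, 4, 36)
```

Padding 1 keeps the output the same spatial size as the feature map, so every position gets exactly 2k scores and 4k deltas, one set per anchor.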
  29. 43.

    Faster R-CNN: region proposal
    We need positive (foreground) vs. negative (background) anchors. Use Intersection over Union (IoU) with the ground truth.
    [Figure: all positive anchors (IoU > 0.7); an anchor batch with positive (green) and negative (red) anchors.]
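A vectorized sketch of IoU and anchor labeling. The 0.7 positive threshold is from the slide; the 0.3 negative threshold and the "best anchor per ground-truth box is positive" rule follow the Faster R-CNN paper. Anchors in between are ignored (label -1). The helper names are hypothetical.

```python
import numpy as np

def iou(boxes_a, boxes_b):
    """Pairwise IoU between (N, 4) and (M, 4) boxes in (x1, y1, x2, y2) format."""
    x1 = np.maximum(boxes_a[:, None, 0], boxes_b[None, :, 0])
    y1 = np.maximum(boxes_a[:, None, 1], boxes_b[None, :, 1])
    x2 = np.minimum(boxes_a[:, None, 2], boxes_b[None, :, 2])
    y2 = np.minimum(boxes_a[:, None, 3], boxes_b[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (boxes_a[:, 2] - boxes_a[:, 0]) * (boxes_a[:, 3] - boxes_a[:, 1])
    area_b = (boxes_b[:, 2] - boxes_b[:, 0]) * (boxes_b[:, 3] - boxes_b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def label_anchors(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """1 = foreground, 0 = background, -1 = ignored during training."""
    overlaps = iou(anchors, gt_boxes)            # (num_anchors, num_gt)
    max_overlap = overlaps.max(axis=1)
    labels = np.full(len(anchors), -1)
    labels[max_overlap < neg_thresh] = 0
    labels[max_overlap > pos_thresh] = 1
    # Make sure every ground-truth box gets at least one positive anchor.
    labels[overlaps.argmax(axis=0)] = 1
    return labels

anchors = np.array([[0, 0, 10, 10], [5, 5, 15, 15],
                    [20, 20, 30, 30], [0, 0, 10, 20]], dtype=float)
gt_boxes = np.array([[0, 0, 10, 10]], dtype=float)
print(label_anchors(anchors, gt_boxes))  # [ 1  0  0 -1]
```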
  30. 44.

    Faster R-CNN: region proposal
    Multi-task loss:
    • Classification: standard cross-entropy for 2 classes.
    • Box regression: smooth L1 between the differences of coordinates (positive anchors only).
    Filtering of proposals:
    • Use Non-Maximum Suppression (NMS).
    • Keep only the top proposals by “objectness”.
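A numpy sketch of the RPN multi-task loss: 2-class cross-entropy on labeled anchors plus smooth L1 on the deltas of positive anchors. The normalization (plain means) and the weight `box_weight=1.0` are simplifying assumptions; the paper normalizes and balances the two terms differently. `rpn_loss` is a hypothetical name.

```python
import numpy as np

def smooth_l1(x):
    """Quadratic for small errors (|x| < 1), linear for large ones."""
    abs_x = np.abs(x)
    return np.where(abs_x < 1.0, 0.5 * x ** 2, abs_x - 0.5)

def rpn_loss(cls_logits, labels, pred_deltas, target_deltas, box_weight=1.0):
    """cls_logits: (N, 2) background/foreground logits; labels: (N,) in {1, 0, -1}."""
    valid = labels >= 0                                   # -1 = ignored anchor
    logits = cls_logits[valid]
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    cls_loss = -log_probs[np.arange(len(logits)), labels[valid]].mean()

    positive = labels == 1                                # regress positives only
    if positive.any():
        diff = pred_deltas[positive] - target_deltas[positive]
        reg_loss = smooth_l1(diff).sum(axis=1).mean()
    else:
        reg_loss = 0.0
    return cls_loss + box_weight * reg_loss

labels = np.array([1, 0, -1])
cls_logits = np.zeros((3, 2))            # undecided: p(foreground) = 0.5
pred_deltas = np.zeros((3, 4))
target_deltas = np.array([[0.5, 0.0, 0.0, 0.0],
                          [1.0, 1.0, 1.0, 1.0],
                          [2.0, 2.0, 2.0, 2.0]])
print(rpn_loss(cls_logits, labels, pred_deltas, target_deltas))  # log(2) + 0.125
```

Only the first anchor contributes to the regression term; the background anchor's targets and the ignored anchor are skipped, as the slide describes.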
  31. 45.

    Faster R-CNN: region proposal
    1. Run the image through the base network to get the feature map.
    2. Run the feature map through the RPN convolutional layers (3x3, 1x1 & 1x1).
       a. Obtain objectness and box regression scores for each anchor type and spatial position.
       b. Use the regression scores to adjust each anchor.
    3. Sort the proposals by objectness score.
    4. Apply NMS to remove redundant proposals.
    Result: a set of proposals with associated objectness scores.
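Steps 3 and 4 can be sketched with greedy NMS in numpy. The IoU threshold of 0.7 and the pre/post-NMS limits (6000 and 300) are assumed defaults in line with common Faster R-CNN test-time settings, not values from the slides; `nms` and `filter_proposals` are hypothetical helper names.

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.7):
    """Greedy non-maximum suppression. Returns indices of the kept boxes."""
    order = np.argsort(scores)[::-1]           # best score first
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(best)
        rest = order[1:]
        x1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        overlap = inter / (areas[best] + areas[rest] - inter)
        order = rest[overlap <= iou_threshold]  # drop boxes that overlap too much
    return np.array(keep, dtype=int)

def filter_proposals(boxes, scores, pre_nms_top_n=6000, post_nms_top_n=300,
                     iou_threshold=0.7):
    """Sort by objectness, keep the top ones, apply NMS, keep the top survivors."""
    order = np.argsort(scores)[::-1][:pre_nms_top_n]
    boxes, scores = boxes[order], scores[order]
    keep = nms(boxes, scores, iou_threshold)[:post_nms_top_n]
    return boxes[keep], scores[keep]

boxes = np.array([[0, 0, 10, 10], [0, 0, 10, 9], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
kept_boxes, kept_scores = filter_proposals(boxes, scores)
print(kept_scores)  # [0.9 0.7]: the second box overlaps the first with IoU 0.9
```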
  32. 46.
  33. 47.

    Faster R-CNN: region proposal
    • Read, read, read. Docstrings have crucial implementation details, such as shapes and types.
    • Comments have hints to help you.
      ◦ We can help you too, don’t be shy and ask! :D
    Priorities:
    1. Make it work (whatever it takes!).
    2. Implement it with vectorized numpy.
    3. Implement it in pure TensorFlow.
       a. It can be compiled and run on a GPU.
       b. You would have to do this for a real implementation.
  34. 48.
  35. 49.

    Faster R-CNN
    To learn how to adjust anchors, we looked at a small part of the feature map. To decide better, we need to look at all the activations corresponding to each region.
  36. 50.

    Faster R-CNN
    Turn arbitrarily sized proposals into fixed-size vectors / “squares” (e.g. 7 x 7). This process is called RoI pooling, and it allows us to feed the regions into the fully connected layers of the network.
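A minimal numpy sketch of RoI max pooling for a single proposal. It assumes the RoI is already in feature map coordinates (image coordinates divided by the stride) with inclusive corners, and splits it into a 7 x 7 grid of roughly equal bins. This is the classic Fast R-CNN style operation; some implementations use a bilinear crop-and-resize instead. `roi_pool` is a hypothetical name.

```python
import numpy as np

def roi_pool(feature_map, roi, output_size=7):
    """Max-pool one region of an (H, W, C) feature map into (output_size, output_size, C).

    `roi` is (x1, y1, x2, y2) with inclusive corners, in feature map coordinates.
    """
    x1, y1, x2, y2 = (int(round(v)) for v in roi)
    region = feature_map[y1:y2 + 1, x1:x2 + 1]
    h, w, c = region.shape
    # Grid of roughly equal bins; floor/ceil make every bin cover at least
    # one cell (neighboring bins may overlap slightly).
    ys = np.linspace(0, h, output_size + 1)
    xs = np.linspace(0, w, output_size + 1)
    out = np.empty((output_size, output_size, c), dtype=feature_map.dtype)
    for i in range(output_size):
        y_start, y_end = int(np.floor(ys[i])), int(np.ceil(ys[i + 1]))
        for j in range(output_size):
            x_start, x_end = int(np.floor(xs[j])), int(np.ceil(xs[j + 1]))
            out[i, j] = region[y_start:y_end, x_start:x_end].max(axis=(0, 1))
    return out

feature_map = np.arange(20 * 20 * 2, dtype=float).reshape(20, 20, 2)
pooled = roi_pool(feature_map, roi=(2, 3, 15, 17))
print(pooled.shape)  # (7, 7, 2), whatever the size of the RoI
```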
  37. 51.

    Faster R-CNN
    Fixed-size outputs of RoI pooling (7x7x1024) → flatten → FC → FC, followed by:
    • Softmax → probability distribution over N+1 classes (e.g. bicycle, p=0.96).
    • Bounding box regressions (N classes).
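A numpy sketch of this head with random, untrained weights, to show the shapes. It assumes ReLU after each FC layer, omits biases, puts the background class in column 0, and uses small hypothetical sizes instead of 7x7x1024. `rcnn_head` is a hypothetical name.

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def rcnn_head(pooled, w_fc1, w_fc2, w_cls, w_reg):
    """pooled: (R, 7, 7, C) RoI pooling outputs for R proposals (biases omitted)."""
    x = pooled.reshape(len(pooled), -1)               # flatten
    x = np.maximum(x @ w_fc1, 0.0)                    # FC + ReLU
    x = np.maximum(x @ w_fc2, 0.0)                    # FC + ReLU
    class_probs = softmax(x @ w_cls)                  # (R, N + 1); column 0 = background
    box_deltas = (x @ w_reg).reshape(len(x), -1, 4)   # (R, N, 4): one adjustment per class
    return class_probs, box_deltas

# Hypothetical sizes: R proposals, C channels, N object classes.
R, C, hidden, N = 3, 8, 32, 5
rng = np.random.default_rng(0)
scale = 0.01  # small random weights, just to keep activations moderate
class_probs, box_deltas = rcnn_head(
    rng.normal(size=(R, 7, 7, C)),
    w_fc1=rng.normal(size=(7 * 7 * C, hidden)) * scale,
    w_fc2=rng.normal(size=(hidden, hidden)) * scale,
    w_cls=rng.normal(size=(hidden, N + 1)) * scale,
    w_reg=rng.normal(size=(hidden, 4 * N)) * scale,
)
best = class_probs.argmax(axis=1)  # predicted class per proposal (0 = background)
# For a non-background prediction c, its box adjustment is box_deltas[r, c - 1].
print(class_probs.shape, box_deltas.shape)  # (3, 6) (3, 5, 4)
```

Regression has N outputs rather than N+1 because there is no box to adjust for the background class.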
  38. 53.
  39. 54.
  40. 55.

    Building a toolkit
    Luminoth: an open-source deep learning library/toolkit for computer vision object detection.
    • CLI tools
    • Pre-defined models
    • Cloud integration
  41. 57.

    Building a toolkit
        $ pip install luminoth
        $ lumi predict video.mp4 -k car
        Found 1 files to predict.
        Neither checkpoint not config specified, assuming `accurate`.
        Predicting video.mp4 [#############] 100% fps: 5.9
  42. 59.

    Building a toolkit
        $ pip install luminoth
        $ lumi --help
        Usage: lumi [OPTIONS] COMMAND [ARGS]...
        Options:
          -h, --help  Show this message and exit.
        Commands:
          checkpoint  Groups of commands to manage checkpoints
          cloud       Groups of commands to train models in the cloud
          dataset     Groups of commands to manage datasets
          eval        Evaluate trained (or training) models
          predict     Obtain a model's predictions
          server      Groups of commands to serve models
          train       Train models
  43. 60.

    Building a toolkit
        $ lumi dataset transform --type pascal --data-dir /data/pascal --output /data/  # Create tfrecords for optimized data consumption.
        $ lumi train --config pascal-fasterrcnn.yml  # Hours of training...
        $ tensorboard --logdir jobs/  # On another GPU/machine/CPU.
        $ lumi eval --config pascal-fasterrcnn.yml  # Checks for new checkpoints and writes logs.
        # Finally:
        $ lumi server web --config pascal-fasterrcnn.yml  # Looks for a checkpoint and loads it into a simple frontend/JSON API server.
  44. 61.
  45. 62.

    Luminoth for real-world object detection
    Environment setup: follow the Luminoth Tutorial.
  46. 63.

    Learning more
    • (Stanford) CS231n: Convolutional Neural Networks for Visual Recognition
    • Deep Learning (Goodfellow, Bengio, Courville)
    •