PyImageConf Workshop: Object Detection with Deep Learning

Tryolabs
August 28, 2018

Learn the inner workings of Faster R-CNN, and implement it yourself.

After this, learn about Luminoth, an open source toolkit for Computer Vision which implements this algorithm.




  1. Workshop: Object Detection with Deep Learning

    Understand, implement from scratch & apply.
  2. Introduction: who we are

    Tryolabs (@tryolabs). Agustín Azzinnari, Lead Research Engineer (@ganitsu). Alan Descoins, CTO (@dekked_).
  3. Introduction: prerequisites for this workshop

    Must: • Familiarity with Python. Recommended: • Familiarity with numpy, TensorFlow and Jupyter Notebooks. • Access to Microsoft-sponsored DSVMs (ssh/Jupyter). Helpful to have: • Basic knowledge of Machine Learning. • Basics of Deep Learning and Convolutional Neural Networks (CNN).
  4. Introduction: agenda

    • Fundamentals: image classification and object detection (Faster R-CNN). • Hands on: implement components of the Faster R-CNN model. • Hands on: using the Luminoth toolkit for a real world problem.
  5. Theory 1: Image classification Aka, label this picture for me,

  6. Image classification: what can I do with an image?

    There’s a cat. (millions of operations later...) (many wasted kWh later...)
  7. Image classification: from classification to detection

    Classification: there’s a cat in the photo. Localization: there’s a cat and it’s here. Detection: there are two cats, here and here.
  8. Image classification: what is so hard about this problem?

    [Figure: a 1900 x 1300 image shown as its red, green and blue channels.]
  9. Image classification: challenges of image classification

  10. Image classification: the Machine Learning approach

    Classical models: extract features from the images and use them as input to a simple classification algorithm. Deep Learning models: use the images directly as input to a more complex classification algorithm. [Figure label: DATASET]
  11. Image classification: neural networks and Deep Learning

    Neural networks for classification were widely used in the 80s. Convolutional Neural Networks (Yann LeCun, 1989) are really good at pattern recognition with minimal preprocessing. Example: handwritten digit recognition with LeNet-5 (Yann LeCun, 1998).
  12. Image classification: convolutional filters

    A filter looks at a small region and activates more strongly in the presence of a certain pattern. Several filters together can detect more complex patterns.
  13. Image classification: the convolution operation

    Slide each filter through the image to produce an activation map. Use more filters to detect patterns over activation maps (patterns over patterns over patterns…).
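The sliding-filter idea can be sketched in a few lines of numpy. This is only an illustration, assuming a single-channel image, stride 1 and no padding; `convolve2d` and the hand-made `vertical_edge` filter are hypothetical names, and, as in most deep learning frameworks, the filter is not flipped (strictly speaking this is a cross-correlation).

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the activation map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    activation = np.zeros((out_h, out_w), dtype=np.float64)
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the filter and the patch under it, summed.
            activation[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return activation

# Toy image: dark left half, bright right half (a vertical edge).
image = np.zeros((5, 6))
image[:, 3:] = 1.0

# Hand-made filter that responds to dark-to-bright transitions (left to right).
vertical_edge = np.array([[-1.0, 0.0, 1.0],
                          [-1.0, 0.0, 1.0],
                          [-1.0, 0.0, 1.0]])

activation_map = convolve2d(image, vertical_edge)
print(activation_map[0])  # [0. 3. 3. 0.]: strongest where the edge is
```

In a real network the filter values are not hand-made but learned, and each layer applies many filters across all input channels.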
  14. Image classification: result of a convolutional layer

    Original image: 1900 x 1300 x 3. One filter produces a 1900 x 1300 x 1 activation map; 64 filters produce 1900 x 1300 x 64.
  15. Image classification: summarizing information with pooling

    1900 x 1300 x 64 → 950 x 650 x 64. 2x2 regions become 1x1 (max in each).
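A minimal numpy sketch of 2x2 max pooling, assuming stride 2 and even spatial dimensions; `max_pool_2x2` is a hypothetical helper. Each spatial dimension is halved (1900 x 1300 becomes 950 x 650) while the depth is untouched.

```python
import numpy as np

def max_pool_2x2(activations):
    """2x2 max pooling with stride 2 over an (H, W, C) array; H and W must be even."""
    h, w, c = activations.shape
    # Split each spatial axis into (blocks, 2), then take the max inside every 2x2 block.
    blocks = activations.reshape(h // 2, 2, w // 2, 2, c)
    return blocks.max(axis=(1, 3))

# Small example where the values can be checked by hand.
small = np.arange(16).reshape(4, 4, 1)
pooled_small = max_pool_2x2(small)
print(pooled_small[..., 0])
# [[ 5  7]
#  [13 15]]

# Shape check with the sizes used in the slides (a single channel, to keep it light).
pooled_big = max_pool_2x2(np.zeros((1900, 1300, 1), dtype=np.uint8))
print(pooled_big.shape)  # (950, 650, 1)
```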
  16. Image classification: what (max) pooling looks like

    Information corresponding to regions of the image is summarized.
  17. Image classification: remaining questions

    How do we know which filters/patterns to set up? → We learn them. They are regular weights of the network (use backpropagation). How do we know how many filters to use in each layer? → It is a hyperparameter of the network (try and see what works best).
  18. Image classification: finding interesting filters

    Learning combinations of filters that are activated (from the activation map) makes it a lot easier to find complex patterns!
  19. Image classification: visualizing a convolutional network

    [Figure: final activation map of a pre-trained network.]
  20. Theory 2: From classification to detection

    Aka, tell me what and where, for all you see.
  21. Object detection as supervised learning

    [Figure: annotated example (Pascal VOC) with boxes labeled person, bicycle, bird, bird, bird, stick, door.]
  22. Object detection: applications of object detection

    [Figures] CT scan of a lung cancer patient at the Jingdong Zhongmei private hospital in Yanjiao, China's Hebei Province (AP Photo/Andy Wong). Hsieh et al., “Drone-based Object Counting by Spatially Regularized Regional Proposal Networks”, ICCV 2017. Source: Pinterest.
  23. None
  24. Object detection: types of object detection models

    Regression based methods: single stage prediction of object classes and bounding boxes. Examples: • You Only Look Once (YOLO, YOLOv2, YOLOv3) • Single Shot MultiBox Detector (SSD). Region proposal based methods: two stages: 1. Generate candidate locations using some algorithm. 2. Adjust the bounding boxes and classify. Examples: • R-CNN, Fast R-CNN, Faster R-CNN.
  25. Faster R-CNN Aka, Deep Learning model and its variants work

    really well.
  26. Faster R-CNN: background

    Evolution of methods proposed in previous years: • 2014: R-CNN (Girshick et al.). • 2015: Fast R-CNN (Girshick). • 2015: Faster R-CNN (Ren, He, Girshick and Sun, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks”, NIPS 2015).
  27. Faster R-CNN: a two stage object detector

    1. Propose interesting regions (Region Proposal Network, RPN): where should we look? 2. Analyze proposals & adjust (Region-based CNN, R-CNN): is this an object? If so, which class? [Figure: detections labeled person and bicycle.]
  28. Faster R-CNN, base network: first stage (1), intuition behind regions

    The combination of activation maps encodes spatial information!
  29. Faster R-CNN, region proposal: first stage (2), resulting set of regions

    Potentially hundreds or thousands! Regions are agnostic to particular object classes. → “There might be something here!”
  30. Faster R-CNN: second stage (1), deal with variable sizes

    Resize all the regions to the same dimensions (through Region of Interest Pooling).
  31. Faster R-CNN: second stage (2), classification and adjustment

    Classification: what type of object is it (or is it background)? → probability distribution. Regression: how should I resize the box to better enclose the object? → do it per class.
  32. Faster R-CNN: summary of how the method works

    • Get an activation map with a CNN. • Use it to propose interesting regions worth exploring, and associate an objectness score with each of them. • Classify regions. Discard those that are background (i.e. keep good scores only). • Learn how to further adjust the box for each class of object.
  33. Faster R-CNN: overview of the network

    1. Pre-trained base network. 2. Region Proposal Network (RPN). 3. Region of Interest (RoI) Pooling. 4. Region-based CNN (R-CNN).
  34. Hands-on: play around with a Convolutional Network

    Visualize the inner workings of a ResNet.
  35. Hands on 1: set up your environment

    Using Microsoft DSVMs: SSH access to your instance: ssh <your-vm-ip> cd ~/notebooks/ git clone cd object-detection-workshop ./ Access Jupyter Hub and pick a notebook in the object-detection-workshop folder: https://<your-vm-ip>:8000/user/<your-username>/ Using your own laptop: see the README:
  36. Theory 3: Region Proposal Network (RPN) First stage of Faster

  37. Faster R-CNN, base network: what we have so far...

    Image of arbitrary size → feature map. Common architectures: • VGG (16, 19) • ResNet (50, 101, 152, ...) • Inception (V2, V3) • Xception • MobileNet • ... The feature map is 1/16 of the image spatially and 1024 channels deep for ResNet 101. Example: a 600 x 800 x 3 image goes through the CNN (ResNet) and becomes a 37 x 50 x 1024 feature map.
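The 600 x 800 → 37 x 50 figures follow from the 1/16 spatial reduction. A minimal sketch of that arithmetic, assuming sizes are rounded down (the exact rounding depends on the padding used by the base network); `feature_map_shape` is a hypothetical helper, not part of any library.

```python
def feature_map_shape(image_height, image_width, stride=16, depth=1024):
    """Approximate feature-map shape for a base network with total stride `stride`."""
    # Integer division: assumes partial cells at the image border are dropped.
    return image_height // stride, image_width // stride, depth

shape = feature_map_shape(600, 800)
print(shape)  # (37, 50, 1024)
```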
  38. Faster R-CNN, region proposal: proposing regions (1), initial idea

    Idea: 1. Look at a spatial position and its vicinity. 2. Predict 2 points (x1, y1), (x2, y2) for each location. Issues: • Can we make the network predict exact pixel coordinates? • Image dimensions are variable.
  39. Faster R-CNN, region proposal: proposing regions (2), a better way

    1. Take a single spatial position. 2. Define a fixed-size reference box (anchor). 3. Find the “closest” ground-truth (GT) box. 4. Predict the “objectness” of the region. 5. Learn how to modify the reference box (in relative terms, e.g. “double its width”). 6. Repeat for every spatial position.
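Modifying a reference box "in relative terms" can be sketched as follows. The parametrization (center shifts scaled by the anchor size, log-scale width and height changes) is assumed to be the one from the Faster R-CNN paper; `decode` is a hypothetical helper and boxes are assumed to be (x1, y1, x2, y2).

```python
import numpy as np

def decode(anchors, deltas):
    """Apply relative (dx, dy, dw, dh) deltas to anchors given as (x1, y1, x2, y2)."""
    widths = anchors[:, 2] - anchors[:, 0]
    heights = anchors[:, 3] - anchors[:, 1]
    ctr_x = anchors[:, 0] + 0.5 * widths
    ctr_y = anchors[:, 1] + 0.5 * heights

    # Center shifts are expressed as fractions of the anchor size.
    new_ctr_x = ctr_x + deltas[:, 0] * widths
    new_ctr_y = ctr_y + deltas[:, 1] * heights
    # Size changes are expressed in log space, so 0 means "keep the size".
    new_w = widths * np.exp(deltas[:, 2])
    new_h = heights * np.exp(deltas[:, 3])

    return np.stack([new_ctr_x - 0.5 * new_w, new_ctr_y - 0.5 * new_h,
                     new_ctr_x + 0.5 * new_w, new_ctr_y + 0.5 * new_h], axis=1)

anchor = np.array([[100.0, 100.0, 200.0, 200.0]])
# "Double its width": dw = log(2), everything else unchanged.
double_width = np.array([[0.0, 0.0, np.log(2.0), 0.0]])
proposal = decode(anchor, double_width)
print(proposal)  # approximately [[ 50. 100. 250. 200.]]
```

Because the deltas are relative to each anchor, the same predicted values mean the same thing regardless of image size or anchor position.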
  40. Faster R-CNN, region proposal: anchor boxes

    For each spatial position of the feature map, generate k fixed anchors (with the same center), e.g. 3 scales and 3 aspect ratios (k=9). But choose what’s best for your dataset.
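A sketch of anchor generation with numpy. The sizes (128, 256, 512), the ratios (0.5, 1, 2) and the stride of 16 are assumptions taken from common Faster R-CNN setups, and both function names are hypothetical; placing centers at multiples of the stride is also just one possible convention.

```python
import numpy as np

def anchor_reference(sizes=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """k = len(sizes) * len(ratios) reference boxes (x1, y1, x2, y2) centered at (0, 0)."""
    anchors = []
    for size in sizes:
        for ratio in ratios:  # ratio = height / width
            w = size / np.sqrt(ratio)
            h = size * np.sqrt(ratio)  # keeps the area equal to size ** 2
            anchors.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(anchors)

def generate_anchors(feat_h, feat_w, stride=16):
    """Copy the reference anchors to every spatial position of the feature map."""
    reference = anchor_reference()                           # (k, 4)
    shift_x, shift_y = np.meshgrid(np.arange(feat_w) * stride,
                                   np.arange(feat_h) * stride)
    shifts = np.stack([shift_x.ravel(), shift_y.ravel(),
                       shift_x.ravel(), shift_y.ravel()], axis=1)  # (H * W, 4)
    # Broadcast: every position gets all k reference boxes.
    return (shifts[:, None, :] + reference[None, :, :]).reshape(-1, 4)

reference = anchor_reference()
all_anchors = generate_anchors(37, 50)
print(reference.shape)    # (9, 4)
print(all_anchors.shape)  # (16650, 4), i.e. 37 * 50 * 9
```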
  41. Faster R-CNN, region proposal: visualizing anchor boxes

    [Figure panels: anchor centers in original image; anchors reference (9 anchors per position); all anchors.]
  42. Faster R-CNN, region proposal: Region Proposal Network (RPN)

    Feature map → rectangular proposals + “objectness” score. RPN layers: a 3x3 conv (pad 1, 512 output channels) followed by two parallel 1x1 convs, one with 2k output channels (2k objectness scores) and one with 4k output channels (4k box regression scores).
  43. Faster R-CNN, region proposal: how does this learn? RPN anchor targets

    Need positive (foreground) vs negative (background) anchors. Use Intersection over Union (IoU) with ground truth. [Figure panels: all positive anchors (IoU > 0.7); anchors batch, positive (green) and negative (red).]
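Labeling anchors by IoU with the ground truth can be sketched like this. The positive threshold (IoU > 0.7) comes from the slide; the negative threshold of 0.3 and the "ignore" label for anchors in between are assumptions borrowed from the Faster R-CNN paper, which additionally marks the best anchor for each ground-truth box as positive (omitted here). Function names are hypothetical and boxes are assumed to have positive area.

```python
import numpy as np

def iou_matrix(boxes_a, boxes_b):
    """Pairwise IoU between (N, 4) and (M, 4) boxes in (x1, y1, x2, y2) format."""
    x1 = np.maximum(boxes_a[:, None, 0], boxes_b[None, :, 0])
    y1 = np.maximum(boxes_a[:, None, 1], boxes_b[None, :, 1])
    x2 = np.minimum(boxes_a[:, None, 2], boxes_b[None, :, 2])
    y2 = np.minimum(boxes_a[:, None, 3], boxes_b[None, :, 3])
    # Clip at zero so non-overlapping boxes get an empty intersection.
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (boxes_a[:, 2] - boxes_a[:, 0]) * (boxes_a[:, 3] - boxes_a[:, 1])
    area_b = (boxes_b[:, 2] - boxes_b[:, 0]) * (boxes_b[:, 3] - boxes_b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def label_anchors(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Return 1 (foreground), 0 (background) or -1 (ignored) for each anchor."""
    max_iou = iou_matrix(anchors, gt_boxes).max(axis=1)  # best GT match per anchor
    labels = np.full(len(anchors), -1, dtype=np.int64)
    labels[max_iou < neg_thresh] = 0
    labels[max_iou > pos_thresh] = 1
    return labels

gt_boxes = np.array([[0.0, 0.0, 100.0, 100.0]])
anchors = np.array([[0.0, 0.0, 100.0, 100.0],       # IoU 1.0 -> positive
                    [0.0, 0.0, 100.0, 50.0],        # IoU 0.5 -> ignored
                    [200.0, 200.0, 300.0, 300.0]])  # IoU 0.0 -> negative
labels = label_anchors(anchors, gt_boxes)
print(labels)  # [ 1 -1  0]
```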
  44. Faster R-CNN, region proposal: what’s missing

    Multi-task loss: • Classification: standard cross-entropy for 2 classes. • Box regression: Smooth L1 between the difference of coordinates (positive anchors only). Filtering of proposals: • Use Non-Maximum Suppression (NMS). • Keep only the top proposals by “objectness”.
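The box regression term can be sketched as below: Smooth L1 is quadratic for small errors and linear for large ones, and only positive anchors contribute. The transition point of 1.0 and the plain mean over positive anchors are assumptions (actual implementations differ in how they normalize and weight this term); function names are hypothetical.

```python
import numpy as np

def smooth_l1(predicted, target):
    """Element-wise Smooth L1: quadratic below an absolute error of 1, linear above."""
    diff = np.abs(predicted - target)
    return np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5)

def rpn_box_loss(predicted_deltas, target_deltas, labels):
    """Mean Smooth L1 loss over positive anchors only (labels == 1)."""
    positive = labels == 1
    if not positive.any():
        return 0.0
    # Sum over the 4 box coordinates, then average over positive anchors.
    per_anchor = smooth_l1(predicted_deltas[positive], target_deltas[positive]).sum(axis=1)
    return float(per_anchor.mean())

predicted = np.array([[0.5, 0.0, 0.0, 0.0],   # small error  -> 0.5 * 0.5**2 = 0.125
                      [2.0, 0.0, 0.0, 0.0],   # large error  -> 2.0 - 0.5 = 1.5
                      [9.0, 9.0, 9.0, 9.0]])  # negative anchor, ignored by the loss
targets = np.zeros((3, 4))
labels = np.array([1, 1, 0])
loss = rpn_box_loss(predicted, targets, labels)
print(loss)  # 0.8125
```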
  45. Faster R-CNN, region proposal: RPN summary, what does a forward pass look like?

    1. Run the image through the base network to get a feature map. 2. Run the feature map through the RPN convolutional layers (3x3, 1x1 & 1x1). a. Obtain objectness and box regression scores for each anchor type and spatial position. b. Use the regression scores to adjust each anchor. 3. Sort proposals by objectness score. 4. Apply NMS to remove redundant proposals. Result: a set of proposals with associated objectness scores.
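The sorting, NMS and top-scoring filter at the end of the forward pass can be sketched with a greedy numpy NMS. The IoU threshold of 0.7 and the `top_n` of 300 are assumptions, not values from the slides; function names are hypothetical and boxes are assumed to be (x1, y1, x2, y2) with positive area.

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.7):
    """Greedy NMS. boxes: (N, 4); returns the kept indices, best score first."""
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order = np.argsort(scores)[::-1]  # highest objectness first
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        # Intersection of the best box with every remaining box.
        x1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        iou = inter / (areas[best] + areas[rest] - inter)
        # Drop boxes that overlap the best one too much; the others stay for later rounds.
        order = rest[iou <= iou_threshold]
    return keep

def filter_proposals(boxes, scores, iou_threshold=0.7, top_n=300):
    """Apply NMS, then keep at most `top_n` proposals by objectness."""
    keep = nms(boxes, scores, iou_threshold)[:top_n]
    return boxes[keep], scores[keep]

boxes = np.array([[0.0, 0.0, 100.0, 100.0],
                  [5.0, 5.0, 105.0, 105.0],       # heavy overlap with the first box
                  [200.0, 200.0, 300.0, 300.0]])
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores)
print(kept)  # [0, 2]
```

A TensorFlow implementation would typically rely on a built-in op for this step instead of a Python loop.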
  46. Hands-on: implementing a Region Proposal Network (RPN)

    First stage of Faster R-CNN, in actual code.
  47. Hands on: RPN implementation

    • Read, read, read. Docstrings have crucial implementation details, such as shapes and types. • Comments have hints to help you. ◦ We can help you too, don’t be shy and ask! :D Priorities: 1. Make it work (whatever it takes!). 2. Implement it with vectorized numpy. 3. Implement it in pure TensorFlow. a. It can compile and run on GPU. b. You would have to do this for a real implementation.
  48. Theory 4: RoI Pooling and R-CNN Second stage of Faster

  49. Faster R-CNN: second stage (1), using our proposals

    To learn how to adjust anchors, we looked at a small part of the activation map. To decide better, we need to look at all activations corresponding to the regions.
  50. Faster R-CNN: second stage (2), all proposals made equal

    Turn arbitrarily sized proposals into fixed size vectors / “squares” (7 x 7). The process is called RoI pooling. It allows us to feed them into a fully connected layer of the NN.
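One way to turn an arbitrarily sized region into a fixed 7 x 7 grid is to split it into bins and take the max of each bin. The sketch below assumes the proposal is already expressed in feature-map cells and covers at least one cell; `roi_pool` is a hypothetical helper, and real implementations may instead crop, resize and pool with framework ops.

```python
import numpy as np

def roi_pool(feature_map, roi, output_size=7):
    """Max-pool the region roi = (x1, y1, x2, y2), in feature-map cells, to a fixed grid."""
    x1, y1, x2, y2 = roi
    region = feature_map[y1:y2, x1:x2, :]
    h, w, c = region.shape
    # Bin edges that split the region into output_size x output_size roughly equal cells.
    y_edges = np.linspace(0, h, output_size + 1).astype(int)
    x_edges = np.linspace(0, w, output_size + 1).astype(int)
    pooled = np.zeros((output_size, output_size, c), dtype=feature_map.dtype)
    for i in range(output_size):
        for j in range(output_size):
            # Force every bin to cover at least one cell (regions smaller than the grid).
            ys, ye = y_edges[i], max(y_edges[i + 1], y_edges[i] + 1)
            xs, xe = x_edges[j], max(x_edges[j + 1], x_edges[j] + 1)
            pooled[i, j] = region[ys:ye, xs:xe].max(axis=(0, 1))
    return pooled

# Toy feature map with 2 channels (a real one would have e.g. 1024).
feature_map = np.arange(37 * 50 * 2, dtype=np.float32).reshape(37, 50, 2)
large = roi_pool(feature_map, (10, 5, 30, 25))  # 20 x 20 region
small = roi_pool(feature_map, (0, 0, 3, 4))     # 4 x 3 region, smaller than 7 x 7
print(large.shape, small.shape)  # (7, 7, 2) (7, 7, 2)
```

Whatever the size of the proposal, the output has the same shape, which is what allows it to be flattened and fed to fully connected layers.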
  51. Faster R-CNN: Region-based CNN (R-CNN)

    Fixed-size outputs of RoI Pooling (7x7x1024) → Flatten, FC, FC → probability distribution (N+1 classes, Softmax; e.g. bicycle p=0.96) and bounding box regressions (N classes).
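How the two outputs combine for a single proposal can be sketched as follows: a softmax over the N+1 class scores, and one set of 4 regression values per object class. Placing background at index 0 is an assumption of this sketch, and `pick_detection` is a hypothetical helper.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def pick_detection(class_logits, box_deltas):
    """class_logits: (N + 1,), index 0 = background. box_deltas: (N, 4), one row per class."""
    probs = softmax(class_logits)
    label = int(np.argmax(probs))
    if label == 0:
        return None  # background: discard the proposal
    # Use the box adjustment learned for the predicted class only.
    return label, float(probs[label]), box_deltas[label - 1]

# Two object classes (N = 2), so three scores including background.
class_logits = np.array([0.0, 5.0, 1.0])
box_deltas = np.array([[0.1, 0.0, 0.2, 0.0],    # adjustment if the object is class 1
                       [0.0, 0.3, 0.0, -0.1]])  # adjustment if the object is class 2
detection = pick_detection(class_logits, box_deltas)
print(detection[0], round(detection[1], 2))  # 1 0.98
```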
  52. Faster R-CNN: person (0.99), bicycle (0.97)

  53. Hands-on: implementing the R-CNN

    Second stage of Faster R-CNN, in actual code.
  54. Building a toolkit

  55. Building a toolkit: what is Luminoth?

    Open-source deep learning library/toolkit for computer vision object detection. • CLI tools • Pre-defined models • Cloud integration
  56. Building a toolkit: objectives

    • “Out-of-the-box” usage • Production ready • Open source • Readable code • Extensible and modular
  57. Building a toolkit: simplicity as a goal

    $ pip install luminoth
    $ lumi predict video.mp4 -k car
    Found 1 files to predict.
    Neither checkpoint not config specified, assuming `accurate`.
    Predicting video.mp4 [#############] 100% fps: 5.9
  58. Building a toolkit: beyond the model

    • Data pipeline • Debugging • Training • Data visualization • Evaluation • Deployment • Distributed • Unit testing • Monitoring • Model
  59. Building a toolkit: using Luminoth

    $ pip install luminoth
    $ lumi --help
    Usage: lumi [OPTIONS] COMMAND [ARGS]...

    Options:
      -h, --help  Show this message and exit.

    Commands:
      checkpoint  Groups of commands to manage checkpoints
      cloud       Groups of commands to train models in the cloud
      dataset     Groups of commands to manage datasets
      eval        Evaluate trained (or training) models
      predict     Obtain a model's predictions
      server      Groups of commands to serve models
      train       Train models
  60. Building a toolkit: Luminoth use cycle

    $ lumi dataset transform --type pascal --data-dir /data/pascal --output /data/
    # Create tfrecords for optimizing data consumption.
    $ lumi train --config pascal-fasterrcnn.yml
    # Hours of training...
    $ tensorboard --logdir jobs/
    # On another GPU/Machine/CPU.
    $ lumi eval --config pascal-fasterrcnn.yml
    # Checks for new checkpoints and writes logs.
    # Finally.
    $ lumi server web --config pascal-fasterrcnn.yml
    # Looks for checkpoint and loads it into a simple frontend/json API server.
  61. Hands-on: Luminoth Build and train a Deep Learning model for

  62. Hands on 2: Luminoth for real world object detection

    Environment setup: read the content in GitHub.
  63. Learn more

    • (Stanford) CS231n: Convolutional Neural Networks for Visual Recognition • Deep Learning (Goodfellow, Bengio, Courville)