Tryolabs
October 14, 2017

Building an Object Detection toolkit with TensorFlow (ODSC Europe 2017)

From academic papers to open source implementation.

In recent years, models based on Convolutional Neural Networks (CNNs) have revolutionized the entire field of computer vision. Problems like image classification can now be considered solved, and implementations are easy to build with any modern Deep Learning framework by fine-tuning weights pre-trained on datasets such as ImageNet. However, the harder problems of object detection and segmentation require much more complex methods. Object detection consists of locating the objects in an image and drawing a rectangular bounding box around each one, while segmentation aims to identify the exact pixels that belong to each object. One of the main differences from image classification is that the same image may contain several objects, which can vary widely in proportions, size and lighting, and may be partially occluded.

In this talk, we will discuss how state-of-the-art object detection techniques work. In particular, we will take a look at Region Proposal Networks (RPNs), which propose candidate object locations (“proposals”) that are later refined to achieve precise localization. We will then look at the architecture of an object detection system and the performance considerations of different algorithms. Finally, we will explore the implementation of an open-source Python object detection toolkit based on TensorFlow, going through the details and tricks of the trade when using such an ecosystem in production-ready applications.

Transcript

  1. Who we are (@tryolabs)

    • Javier Rey, Lead Research Engineer (@vierja)
    • Alan Descoins, CTO (@dekked_)
    Next section: Introduction
  2. Introduction

    • Felzenszwalb et al., “Object Detection with Discriminatively Trained Part Based Models”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 32, 2010.
    • Detected objects in a sample image (from the COCO dataset) (2017). Source: Google Research Blog.
    Labels shown: sofa, bottle, sofa.
  3. Introduction: Agenda

    • Challenges and applications of object detection
    • Demystifying it: dive into Faster R-CNN
    • Luminoth: our open-source toolkit for computer vision
  4. Introduction: Applications of object detection

    Image credits:
    • CT scan of a lung cancer patient at the Jingdong Zhongmei private hospital in Yanjiao, China's Hebei Province (AP Photo/Andy Wong).
    • Hsieh et al., “Drone-based Object Counting by Spatially Regularized Regional Proposal Networks”, ICCV 2017.
    • Source: Pinterest.
  5. Introduction: A hard problem with lots of applications

    Make it accessible! Build a toolkit!
  6. Deep Learning & Object detection: Types of object detection models

    Regression based methods: single-stage prediction of object classes and bounding boxes. Examples:
    • You Only Look Once (YOLO)
    • Single Shot MultiBox Detector (SSD)

    Region proposal based methods: two stages.
    1. Generate candidate locations using some algorithm.
    2. Adjust the bounding boxes and classify them.
    Examples:
    • R-CNN, Fast R-CNN, Faster R-CNN
  7. Faster R-CNN: Background

    Evolution of methods proposed in previous years:
    • 2014: R-CNN (Girshick et al.)
    • 2015: Fast R-CNN (Girshick)
    • 2016: Faster R-CNN (Ren, Girshick et al., “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks”, CVPR 2016)
  8. Faster R-CNN: Overview

    1. Pre-trained base network
    2. Region Proposal Network (RPN)
    3. Region of Interest (RoI) Pooling (RoIP)
    4. Region-based CNN (R-CNN)
  9. Faster R-CNN: Pre-trained base network

    Image of arbitrary size → feature map.
    Common architectures:
    • VGG (16, 19)
    • ResNet (v1, v2)
    • Inception (V2, V3)
    The feature map encodes information for object detection.
    Figure: 600x800x3 image → CNN (VGG16) → 37x50x512 feature map.
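The 600x800 → 37x50 sizes on this slide can be checked with a little arithmetic. The sketch below assumes a VGG16 truncated before its last pooling layer, so that only four 2x2, stride-2 poolings shrink the image while the "same"-padded convolutions keep its size. That truncation point and the function name `feature_map_size` are assumptions made for illustration, not details taken from the slides.

```python
def feature_map_size(height, width, num_poolings=4):
    """Spatial size after `num_poolings` 2x2, stride-2 max-pooling layers.

    Assumes the convolutions in between keep the spatial size and that
    each pooling rounds down when the size is odd.
    """
    for _ in range(num_poolings):
        height //= 2
        width //= 2
    return height, width


print(feature_map_size(600, 800))  # (37, 50)
```

Each feature-map cell therefore summarizes roughly a 16x16-pixel patch of the input, which is why later stages can map positions on the feature map back to the original image.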
  10. Faster R-CNN: Region proposals

    Image (feature map) → proposals:
    • variable number.
    • different scales and aspect ratios.
    • efficient process.
    • project bounding boxes to original image.
    Idea: start with reference boxes, later adjust. How many reference boxes? A lot!
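The "start with reference boxes, later adjust" idea can be sketched as a function that moves and resizes a reference box according to four predicted numbers. The (dx, dy, dw, dh) parametrization below is the one commonly used in the R-CNN family of papers. It is an assumption here, since the slide does not state the encoding, and `apply_deltas` is a hypothetical helper name.

```python
import math


def apply_deltas(reference_box, deltas):
    """Adjust an (x1, y1, x2, y2) reference box with (dx, dy, dw, dh) deltas.

    dx and dy shift the center by a fraction of the box size; dw and dh
    rescale the width and height on a logarithmic scale.
    """
    x1, y1, x2, y2 = reference_box
    width, height = x2 - x1, y2 - y1
    center_x, center_y = x1 + width / 2, y1 + height / 2

    dx, dy, dw, dh = deltas
    new_center_x = center_x + dx * width
    new_center_y = center_y + dy * height
    new_width = width * math.exp(dw)
    new_height = height * math.exp(dh)

    return (new_center_x - new_width / 2, new_center_y - new_height / 2,
            new_center_x + new_width / 2, new_center_y + new_height / 2)


# Zero deltas leave the reference box untouched.
print(apply_deltas((0, 0, 10, 10), (0, 0, 0, 0)))  # (0.0, 0.0, 10.0, 10.0)
```

Predicting relative deltas instead of absolute coordinates keeps the regression targets small and comparable across boxes of very different sizes.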
  11. Faster R-CNN: Anchor boxes

    For each spatial position of the feature map, generate k fixed anchors (with same center).
    3 scales, 3 aspect ratios (k=9).
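A minimal sketch of anchor generation for one position follows. The slide only says "3 scales, 3 aspect ratios", so the concrete values (scales 128, 256, 512 and ratios 0.5, 1, 2, as used in the Faster R-CNN paper) are assumptions, and `generate_anchors` is a hypothetical name.

```python
import math


def generate_anchors(center_x, center_y, scales=(128, 256, 512),
                     aspect_ratios=(0.5, 1.0, 2.0)):
    """Return k = len(scales) * len(aspect_ratios) boxes as (x1, y1, x2, y2).

    Every anchor shares the same center, has area scale**2 and a
    height/width quotient equal to the aspect ratio.
    """
    anchors = []
    for scale in scales:
        for ratio in aspect_ratios:
            width = scale / math.sqrt(ratio)
            height = scale * math.sqrt(ratio)
            anchors.append((center_x - width / 2, center_y - height / 2,
                            center_x + width / 2, center_y + height / 2))
    return anchors


anchors = generate_anchors(center_x=8, center_y=8)
print(len(anchors))  # 9
```

Repeating this for every feature-map position, with centers projected back onto the original image, yields the full set of reference boxes.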
  12. Faster R-CNN: Visualizing anchor boxes (1)

    Figure panels:
    • Anchor centers in original image
    • Anchors reference (9 anchors per position)
    • Anchors on top of single point
  13. Faster R-CNN: Region Proposal Network (RPN)

    Feature map → rectangular proposals + “objectness” score.
    • 3x3 conv (pad 1, 512 output channels)
    • 1x1 conv (2k output channels) → 2k objectness scores
    • 1x1 conv (4k output channels) → 4k box regression scores
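To get a feel for the sizes involved, the sketch below computes the output shapes of the two 1x1 convolutions for a given feature map. It assumes the padded 3x3 convolution and both 1x1 convolutions keep the spatial size, and it reuses the 37x50 feature map from the earlier VGG16 example. `rpn_output_shapes` is a hypothetical helper, not part of any library.

```python
def rpn_output_shapes(feature_height, feature_width, k=9):
    """Shapes of the two RPN heads, with k anchors per feature-map position."""
    # Two scores (object vs. background) for each of the k anchors.
    objectness = (feature_height, feature_width, 2 * k)
    # Four box-regression values for each of the k anchors.
    box_regression = (feature_height, feature_width, 4 * k)
    total_anchors = feature_height * feature_width * k
    return objectness, box_regression, total_anchors


print(rpn_output_shapes(37, 50))
# ((37, 50, 18), (37, 50, 36), 16650)
```

With k=9 a single 600x800 image already produces more than sixteen thousand candidate boxes, which is why the proposals need to be filtered before the next stage.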
  14. Faster R-CNN: RPN anchor targets

    Need positive (foreground) vs negative (background) anchors. Use Intersection over Union (IoU) with ground truth.
    Figure panels:
    • All positive anchors (IoU > 0.7)
    • Anchors batch: positive (green), negative (purple)
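The IoU test used to label anchors can be sketched in a few lines. The 0.7 positive threshold comes from the slide. The 0.3 negative threshold and the "ignore everything in between" rule follow the Faster R-CNN paper and are assumptions here, as are the function names. This simplified version also omits the paper's extra rule that marks the best-matching anchor of each ground-truth box as positive.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def label_anchor(anchor, ground_truth_boxes, positive_iou=0.7, negative_iou=0.3):
    """Return 1 (foreground), 0 (background) or -1 (ignored in training)."""
    best = max((iou(anchor, gt) for gt in ground_truth_boxes), default=0.0)
    if best > positive_iou:
        return 1
    if best < negative_iou:
        return 0
    return -1


print(label_anchor((0, 0, 10, 10), [(0, 0, 10, 10)]))    # 1
print(label_anchor((0, 0, 10, 10), [(20, 20, 30, 30)]))  # 0
```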
  15. Faster R-CNN: What’s missing

    Multi-task loss:
    • Classification: standard logarithmic loss for 2 classes.
    • Box regression: Smooth L1 between difference of coordinates (positive anchors).
    Filtering of proposals:
    • Use Non-Maximum Suppression (NMS).
    • Keep only the top-scoring proposals by “objectness”.
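Both the Smooth L1 term and the NMS filtering step are short enough to sketch in plain Python. The 0.7 overlap threshold and the optional `top_n` cut-off are assumptions borrowed from common Faster R-CNN settings, since the slide gives no values, and the function names are hypothetical. A real pipeline would use a vectorized or built-in implementation instead.

```python
def smooth_l1(difference):
    """Smooth L1 of one coordinate difference: quadratic near 0, linear beyond 1."""
    abs_diff = abs(difference)
    return 0.5 * abs_diff ** 2 if abs_diff < 1.0 else abs_diff - 0.5


def _iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def non_maximum_suppression(boxes, scores, iou_threshold=0.7, top_n=None):
    """Return indices of the boxes to keep, highest score first.

    A box is dropped when it overlaps an already kept, higher-scoring
    box by more than `iou_threshold`.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for index in order:
        if all(_iou(boxes[index], boxes[k]) <= iou_threshold for k in kept):
            kept.append(index)
        if top_n is not None and len(kept) == top_n:
            break
    return kept


boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(non_maximum_suppression(boxes, scores))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.82, so only the higher-scoring of the two survives.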
  16. Faster R-CNN: RoI Pooling layer

    Arbitrarily-sized proposals → fixed spatial size.
    • Can feed output to fully connected layers.
    • Very similar to max pooling.
    Figure: proposal → project onto the feature map (512 channels) → RoI → pool → 7x7x512.
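The "arbitrary size in, fixed size out" behaviour can be illustrated on a single channel with plain lists. This is a simplified sketch, not the implementation used in the talk: it assumes integer RoI coordinates already expressed in feature-map cells and an RoI at least as large as the output grid. `roi_max_pool` is a hypothetical name.

```python
def roi_max_pool(feature_map, roi, output_size=7):
    """Max-pool one channel of `feature_map` inside `roi` to a fixed grid.

    feature_map: list of rows (2-D, one channel).
    roi: (x1, y1, x2, y2) in feature-map cells, end-exclusive.
    Assumes the RoI spans at least `output_size` cells in each direction,
    so that no pooling bin is empty.
    """
    x1, y1, x2, y2 = roi
    roi_height, roi_width = y2 - y1, x2 - x1
    pooled = []
    for i in range(output_size):
        row_start = y1 + (i * roi_height) // output_size
        row_end = y1 + ((i + 1) * roi_height) // output_size
        pooled_row = []
        for j in range(output_size):
            col_start = x1 + (j * roi_width) // output_size
            col_end = x1 + ((j + 1) * roi_width) // output_size
            pooled_row.append(max(feature_map[r][c]
                                  for r in range(row_start, row_end)
                                  for c in range(col_start, col_end)))
        pooled.append(pooled_row)
    return pooled


feature_map = [[row * 10 + col for col in range(10)] for row in range(8)]
print(roi_max_pool(feature_map, roi=(0, 0, 4, 4), output_size=2))
# [[11, 13], [31, 33]]
```

Applying the same pooling to each of the 512 channels gives the 7x7x512 block that the fully connected layers expect, whatever the size of the original proposal.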
  17. Faster R-CNN: Region-based CNN (R-CNN)

    Fixed-size outputs of RoI Pooling (7x7x512) → Flatten → FC → FC →
    • probability distribution (N+1 classes) via Softmax, e.g. bicycle p=0.96
    • bounding box regressions (N classes)
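The classification output can be sketched as a softmax over N+1 scores followed by picking the most likely label. Placing the extra "background" class at index 0 is an assumption made for this illustration, as are the function names and the example scores.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    highest = max(scores)
    exps = [math.exp(score - highest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def classify(scores, class_names):
    """Return (label, probability) for N+1 scores; index 0 is background."""
    probabilities = softmax(scores)
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    label = 'background' if best == 0 else class_names[best - 1]
    return label, probabilities[best]


label, probability = classify([0.0, 5.0, 1.0], ['bicycle', 'car'])
print(label)  # bicycle
```

Proposals whose best label is background are discarded, while the remaining ones keep the box regression that corresponds to their predicted class.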
  18. Building a toolkit: What is Luminoth?

    Open-source deep learning library/toolkit for computer vision (object detection).
    • CLI tools
    • Pre-defined models
    • Cloud integration
  19. Building a toolkit: TensorFlow + Sonnet

        import sonnet as snt

        class RPN(snt.AbstractModule):
            def __init__(self, *args, name='rpn'):
                # Register the module name with Sonnet.
                super(RPN, self).__init__(name=name)
                [...]  # submodules init, config

            def _build(self, inputs):
                # TensorFlow code that computes `outputs` goes here.
                return outputs
  20. Building a toolkit: “Model oriented programming”

    • Follow OOP good practices.
    Faster R-CNN:
    • RPN. In: feature map, anchors. Out: proposals.
    • RCNN. In: proposals, pooled feature maps. Out: objects, labels, probabilities.
  21. Building a toolkit: Hierarchical structure

    Diagram labels, in the order extracted: ObjectDetection, RPN, R-CNN, RoIP, FasterRCNN, RPN, RPNTargets, RPNProposals, TFRecordDataset, ObjectDetectionDataset, RoIPooling, RCNN, RCNNTargets, RCNNProposals, TruncatedNetwork, VGG/ResNet.
  22. Building a toolkit: Challenges of coding from papers

    • Small implementation details have no room in academic papers.
    • Papers tend to remain frozen in time.
    • Many ways to implement it.
  23. Building a toolkit: Challenges of Faster R-CNN implementation

    • Multiple moving parts
    • Module dependencies
    • Multi-task training
  24. Building a toolkit: Beyond the model

    Diagram labels, in the order extracted: Data pipeline, Debugging, Training, Data visualization, Evaluation, Deployment, Distributed, Unit testing, Monitoring, Model.
  25. Building a toolkit: Using Luminoth

    https://github.com/tryolabs/luminoth

        $ pip install luminoth
        $ lumi --help
        Usage: lumi [OPTIONS] COMMAND [ARGS]...

        Options:
          -h, --help  Show this message and exit.

        Commands:
          cloud     Groups of commands to train models in the cloud
          dataset   Groups of commands to manage datasets
          evaluate  Evaluate trained (or training) models
          server    Groups of commands to serve models
          train     Train models
  26. Building a toolkit: Luminoth cycle

        $ lumi dataset transform --type pascal --data-dir /data/pascal --output /data/
        # Create tfrecords for optimizing data consumption.

        $ lumi train --config pascal-fasterrcnn.yml
        # Hours of training

        $ tensorboard --logdir jobs/

        # On another GPU/Machine/CPU
        $ lumi evaluate --config pascal-fasterrcnn.yml
        # Checks for new checkpoints and writes logs

        # Finally
        $ lumi server web --config pascal-fasterrcnn.yml
        # Looks for checkpoint and loads it into a simple frontend/json API server.
  27. Building a toolkit: Luminoth’s future

    • Fine-tune trained models
    • More models & problems
    • Tagging ↔ Training integration
    • Distributed deployment