Slide 1

Slide 1 text

@ODSC OPEN DATA SCIENCE CONFERENCE San Francisco | November 2nd - 4th, 2017

Slide 2

Slide 2 text

Building an Object Detection toolkit with TensorFlow
From academic papers to open source implementation

Slide 3

Slide 3 text

Introduction
Who we are (@tryolabs)
● Javier Rey, Lead Research Engineer (@vierja)
● Alan Descoins, CTO (@dekked_)

Slide 4

Slide 4 text

Then vs now

Slide 5

Slide 5 text

● Felzenszwalb et al., “Object Detection with Discriminatively Trained Part Based Models”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 32, 2010.
● Detected objects in a sample image from the COCO dataset (2017). Source: Google Research Blog.
Figure labels: sofa, bottle, sofa

Slide 6

Slide 6 text

Agenda
● Challenges and applications of object detection
● Demystifying it: dive into Faster R-CNN
● Luminoth: our open-source toolkit for computer vision

Slide 7

Slide 7 text

Challenges of object detection

Slide 8

Slide 8 text

Applications of object detection
● CT scan of a lung cancer patient at the Jingdong Zhongmei private hospital in Yanjiao, China's Hebei Province (AP Photo/Andy Wong)
● Hsieh et al., “Drone-based Object Counting by Spatially Regularized Regional Proposal Networks”, ICCV 2017.
● Source: Pinterest

Slide 9

Slide 9 text

A hard problem with lots of applications
Make it accessible! Build a toolkit!

Slide 10

Slide 10 text

Deep Learning & Object detection

Slide 11

Slide 11 text

Power of ConvNets as feature extractors
Pre-train: [figure: convolutional feature map]

Slide 12

Slide 12 text

Type of object detection models
Regression based methods
Single stage prediction of object classes and bounding boxes.
Examples:
● You Only Look Once (YOLO)
● Single Shot MultiBox Detector (SSD)
Region proposal based methods
Two stages:
1. Generate candidate locations using some algorithm.
2. Adjustment of bounding boxes and classification.
Examples:
● R-CNN, Fast R-CNN, Faster R-CNN

Slide 13

Slide 13 text

Faster R-CNN

Slide 14

Slide 14 text

Background
Evolution of methods proposed in previous years:
● 2014: R-CNN (Girshick et al.)
● 2015: Fast R-CNN (Girshick)
● 2016: Faster R-CNN
Ren, Girshick et al., “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks”, CVPR 2016.

Slide 15

Slide 15 text


Slide 16

Slide 16 text

Overview
1. Pre-trained base network
2. Region Proposal Network (RPN)
3. Region of Interest (RoI) Pooling (RoIP)
4. Region-based CNN (R-CNN)

Slide 17

Slide 17 text

Faster R-CNN 1. Pre-trained base network

Slide 18

Slide 18 text

Pre-trained base network
Image of arbitrary size → feature map.
Common architectures:
● VGG (16, 19)
● ResNet (v1, v2)
● Inception (V2, V3)
Feature map encodes information for object detection.
Figure: 800x600x3 image → CNN (VGG16) → 50x37x512 feature map.
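
The 800x600 → 50x37 sizes on this slide are consistent with a total stride of 16. A minimal sketch of that arithmetic, assuming VGG16 is truncated after four 2x2, stride-2 max-pooling layers with floor rounding (the function name and its default are illustrative, not part of Luminoth):

```python
def feature_map_size(height, width, num_poolings=4):
    """Spatial size after `num_poolings` 2x2, stride-2 poolings (floor rounding).

    Assumption: convolutions preserve spatial size, so only the poolings
    shrink the feature map.
    """
    for _ in range(num_poolings):
        height //= 2
        width //= 2
    return height, width


print(feature_map_size(600, 800))  # (37, 50)
```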

Slide 19

Slide 19 text

Faster R-CNN 2. Region Proposal Network (RPN)

Slide 20

Slide 20 text

Region proposals
Image (feature map) → proposals:
● variable number.
● different scales and aspect ratios.
● efficient process.
● project bounding boxes to original image.
Idea: start with reference boxes, later adjust.
How many reference boxes? A lot!

Slide 21

Slide 21 text

Anchor boxes
For each spatial position of the feature map, generate k fixed anchors (with the same center).
3 scales, 3 aspect ratios (k=9)
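
Anchor generation can be sketched in a few lines. The scales (128, 256, 512) and ratios (0.5, 1, 2) below are assumptions taken from the defaults in the Faster R-CNN paper, the stride of 16 assumes a VGG16 base network, and both function names are hypothetical:

```python
import math


def reference_anchors(scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return k = len(scales) * len(ratios) boxes (x1, y1, x2, y2) centered at (0, 0)."""
    anchors = []
    for scale in scales:
        for ratio in ratios:  # ratio = height / width
            w = scale / math.sqrt(ratio)
            h = scale * math.sqrt(ratio)  # w * h == scale ** 2
            anchors.append((-w / 2, -h / 2, w / 2, h / 2))
    return anchors


def all_anchors(fm_height, fm_width, stride=16):
    """Place the k reference anchors at every feature-map position."""
    base = reference_anchors()
    boxes = []
    for row in range(fm_height):
        for col in range(fm_width):
            # Center of this feature-map cell, in original image coordinates.
            cx = col * stride + stride / 2
            cy = row * stride + stride / 2
            for x1, y1, x2, y2 in base:
                boxes.append((x1 + cx, y1 + cy, x2 + cx, y2 + cy))
    return boxes


print(len(all_anchors(37, 50)))  # 37 * 50 * 9 = 16650
```

For a 50x37 feature map this already gives more than sixteen thousand reference boxes.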

Slide 22

Slide 22 text

Visualizing anchor boxes (1)
● Anchor centers in original image
● Anchors reference (9 anchors per position)
● Anchors on top of single point

Slide 23

Slide 23 text

Visualizing anchor boxes (2)
● All anchors superimposed
● Ground truth boxes (labels: person, bicycle)

Slide 24

Slide 24 text

Region Proposal Network (RPN)
Feature map → rectangular proposals + “objectness” score
● 3x3 conv (pad 1, 512 output channels)
● 1x1 conv (2k output channels) → 2k objectness scores
● 1x1 conv (4k output channels) → 4k box regression scores
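
The 4k box regression outputs are offsets relative to each anchor, not absolute coordinates. A sketch of how four such offsets could turn one anchor into a proposal, assuming the (dx, dy, dw, dh) parametrization described in the Faster R-CNN paper; `decode` is a hypothetical helper, not Luminoth's API:

```python
import math


def decode(anchor, deltas):
    """Apply (dx, dy, dw, dh) offsets to an anchor given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = anchor
    wa, ha = x2 - x1, y2 - y1            # anchor width and height
    xa, ya = x1 + wa / 2, y1 + ha / 2    # anchor center
    dx, dy, dw, dh = deltas
    cx = xa + dx * wa                    # center shift is relative to anchor size
    cy = ya + dy * ha
    w = wa * math.exp(dw)                # size change is predicted in log space
    h = ha * math.exp(dh)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


# Zero offsets leave the anchor unchanged.
print(decode((0, 0, 10, 20), (0, 0, 0, 0)))
```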

Slide 25

Slide 25 text

RPN anchor targets
Need positive (foreground) vs negative (background) anchors.
Use Intersection over Union (IoU) with ground truth.
Figures: all positive anchors (IoU > 0.7); anchors batch with positive (green) and negative (red) anchors.
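
IoU-based labeling can be sketched as follows. The 0.7 positive threshold comes from the slide; the 0.3 negative threshold and the "ignore" label are assumptions based on the Faster R-CNN paper, and the paper's extra rule (the best anchor for each ground truth box is also positive) is left out for brevity:

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_anchor(anchor, gt_boxes, pos_threshold=0.7, neg_threshold=0.3):
    """Return 1 (foreground), 0 (background) or -1 (ignored during training)."""
    best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
    if best > pos_threshold:
        return 1
    if best < neg_threshold:
        return 0
    return -1


ground_truth = [(0, 0, 10, 10)]
print(label_anchor((0, 0, 10, 10), ground_truth))    # 1
print(label_anchor((20, 20, 30, 30), ground_truth))  # 0
print(label_anchor((5, 0, 15, 10), ground_truth))    # -1
```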

Slide 26

Slide 26 text

What’s missing
Multi-task loss
● Classification: standard logarithmic loss for 2 classes.
● Box regression: Smooth L1 between difference of coordinates (positive anchors).
Filtering of proposals
● Use Non-Maximum Suppression (NMS).
● Keep only the top proposals by “objectness” score.
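
Both pieces can be sketched in plain Python. This is an illustration under stated assumptions, not Luminoth's implementation: the Smooth L1 form is the one from the Fast R-CNN paper, NMS is the usual greedy variant, and the `iou_threshold` and `top_n` parameters are hypothetical names and values:

```python
def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear elsewhere."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5


def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.7, top_n=None):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        # Drop a box if it overlaps too much with an already kept, better box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
        if top_n is not None and len(kept) == top_n:
            break
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores, iou_threshold=0.5))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.68, so it is suppressed at a 0.5 threshold.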

Slide 27

Slide 27 text


Slide 28

Slide 28 text

Faster R-CNN 3. Region of Interest (RoI) Pooling

Slide 29

Slide 29 text

RoI Pooling layer
Arbitrarily-sized proposals → fixed spatial size
● Can feed output to fully connected layers.
● Very similar to max pooling.
Figure: proposal → project onto the feature map (512 channels) → RoI → pool → 7x7x512.
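
A minimal sketch of the idea on a single-channel feature map, assuming the RoI is already projected to integer feature-map cells and is at least one cell wide and tall. The function name and the bin-splitting rule are illustrative; real implementations work on all channels at once and may resize crops instead:

```python
def roi_max_pool(feature_map, roi, output_size=7):
    """Max-pool one RoI into an output_size x output_size grid.

    feature_map: list of rows (single channel).
    roi: (x1, y1, x2, y2) in feature-map cells, x2 and y2 exclusive.
    """
    x1, y1, x2, y2 = roi
    height, width = y2 - y1, x2 - x1
    pooled = []
    for i in range(output_size):
        # Bin rows: floor for the start, ceil for the end, so no bin is empty.
        r0 = y1 + (i * height) // output_size
        r1 = y1 + -(-((i + 1) * height) // output_size)
        row = []
        for j in range(output_size):
            c0 = x1 + (j * width) // output_size
            c1 = x1 + -(-((j + 1) * width) // output_size)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


fmap = [[r * 4 + c for c in range(4)] for r in range(4)]  # values 0..15
print(roi_max_pool(fmap, (0, 0, 4, 4), output_size=2))  # [[5, 7], [13, 15]]
```

Whatever the RoI size, the output grid has the same shape, which is what lets the result feed fully connected layers.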

Slide 30

Slide 30 text

Faster R-CNN 4. Region-based CNN (R-CNN)

Slide 31

Slide 31 text

Region-based CNN (R-CNN)
Fixed-size outputs of RoI Pooling (7x7x512) → Flatten → FC → FC →
● probability distribution (N+1 classes), via Softmax (e.g. bicycle, p=0.96)
● bounding box regressions (N classes)
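
The classification output can be illustrated with a plain softmax over N+1 scores. The class list, the choice of index 0 for background, and the example scores are all hypothetical:

```python
import math

# Hypothetical label set: N = 2 object classes plus background at index 0.
CLASSES = ["background", "person", "bicycle"]


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores):
    """Return (label, probability) for the most likely of the N+1 classes."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs[best]


# Raw scores as they might come out of the last FC layer for one proposal.
label, prob = classify([0.5, 1.0, 4.5])
print(label, round(prob, 2))
```

Proposals whose most likely class is background would be discarded rather than reported as detections.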

Slide 32

Slide 32 text

Detections: person (0.99), bicycle (0.97)

Slide 33

Slide 33 text

Building a toolkit

Slide 34

Slide 34 text

What is Luminoth?
Open-source deep learning library/toolkit for computer vision object detection.
● CLI tools
● Pre-defined models
● Cloud integration

Slide 35

Slide 35 text

The goal
$ pip install luminoth
$ lumi train # Magic

Slide 36

Slide 36 text

Objectives
● “Out-of-the-box” usage
● Production ready
● Open source
● Readable code
● Extensible and modular

Slide 37

Slide 37 text

TensorFlow + Sonnet
import sonnet as snt

class RPN(snt.AbstractModule):
    def __init__(self, *args, name='rpn'):
        [...]  # submodules init, config

    def _build(self, inputs):
        # TensorFlow code.
        return outputs

Slide 38

Slide 38 text

“Model oriented programming”
● Follow OOP good practices
Faster R-CNN
● RPN: in → feature map, anchors; out → proposals
● RCNN: in → proposals, pooled feature maps; out → objects, labels, probabilities

Slide 39

Slide 39 text

Hierarchical structure
Diagram blocks: RPN, R-CNN, RoIP
Classes: ObjectDetection, FasterRCNN, RPN, RPNTargets, RPNProposals, TFRecordDataset, ObjectDetectionDataset, RoIPooling, RCNN, RCNNTargets, RCNNProposals, TruncatedNetwork (VGG/ResNet)

Slide 40

Slide 40 text

Challenges of coding from papers
● Small implementation details have no room in academic papers
● Papers tend to remain frozen in time
● Many ways to implement it

Slide 41

Slide 41 text

Challenges of Faster R-CNN implementation
● Multiple moving parts
● Module dependencies
● Multi-task training

Slide 42

Slide 42 text

Beyond the model
● Model
● Data pipeline
● Debugging
● Training
● Data visualization
● Evaluation
● Deployment
● Distributed
● Unit testing
● Monitoring

Slide 43

Slide 43 text

Using Luminoth
$ pip install luminoth
$ lumi --help
Usage: lumi [OPTIONS] COMMAND [ARGS]...
Options:
  -h, --help  Show this message and exit.
Commands:
  cloud     Groups of commands to train models in the cloud
  dataset   Groups of commands to manage datasets
  evaluate  Evaluate trained (or training) models
  server    Groups of commands to serve models
  train     Train models

Slide 44

Slide 44 text

Luminoth cycle
$ lumi dataset transform --type pascal --data-dir /data/pascal --output /data/
# Create tfrecords for optimizing data consumption.
$ lumi train --config pascal-fasterrcnn.yml
# Hours of training
$ tensorboard --logdir jobs/
# On another GPU/Machine/CPU
$ lumi evaluate --config pascal-fasterrcnn.yml
# Checks for new checkpoints and writes logs
# Finally
$ lumi server web --config pascal-fasterrcnn.yml
# Looks for checkpoint and loads it into a simple frontend/json API server.

Slide 45

Slide 45 text


Slide 46

Slide 46 text

Luminoth’s future
● Fine-tune trained models
● More models & problems
● Tagging ↔ Training integration
● Distributed deployment

Slide 47

Slide 47 text

Thanks for listening! Questions? Learn more & contribute