@ODSC
OPEN
DATA
SCIENCE
CONFERENCE
San Francisco | November 2nd - 4th, 2017

Building an Object Detection
toolkit with TensorFlow
From academic papers to open
source implementation

@tryolabs
Who we are
Javier Rey
Lead Research Engineer
@vierja
Alan Descoins
CTO
@dekked_
3 Introduction

Then vs now

Felzenszwalb et al., “Object Detection with Discriminatively Trained Part Based Models”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 32, 2010.
Detected objects in a sample image (from the COCO dataset) (2017). Source: Google Research Blog.
[Figure: detections labeled sofa, bottle, sofa]

Agenda
● Challenges and applications of object detection
● Demystifying it: dive into Faster R-CNN
● Luminoth: our open-source toolkit for computer vision

Challenges of object detection

Applications of object detection
CT scan of a lung cancer patient at the Jingdong Zhongmei private hospital in
Yanjiao, China's Hebei Province (AP Photo/Andy Wong)
Hsieh et al., “Drone-based Object Counting by Spatially Regularized Regional Proposal Networks”, ICCV 2017.
Source: Pinterest

A hard problem with lots of applications
Make it accessible! Build a toolkit!

Deep Learning &
Object detection

Power of ConvNets as feature extractors
Pre-train:
Convolutional feature map
Figure from https://blog.heuritech.com/2016/02/29/a-brief-report-of-the-heuritech-deep-learning-meetup-5/

Type of object detection models
Regression based methods
Single stage prediction of object classes and bounding boxes.
Examples:
● You Only Look Once (YOLO)
● Single Shot MultiBox Detector (SSD)
Region proposal based methods
Two stages:
1. Generate candidate locations using some algorithm.
2. Adjustment of bounding boxes and classification.
Examples:
● R-CNN, Fast R-CNN, Faster R-CNN

Faster R-CNN

Background
Evolution of methods proposed in previous years:
● 2014: R-CNN (Girshick et al.)
● 2015: Fast R-CNN (Girshick)
● 2015: Faster R-CNN (Ren et al.)
Ren, He, Girshick and Sun, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks”, NIPS 2015.


Overview
1. Pre-trained base network
2. Region Proposal Network (RPN)
3. Region of Interest (RoI) Pooling
4. Region-based CNN (R-CNN)

Faster R-CNN
1. Pre-trained base network

Pre-trained base network
Image of arbitrary size → feature map.
Common architectures:
● VGG (16, 19)
● ResNet (v1, v2)
● Inception (V2, V3)
Feature map encodes information for object detection.
[Figure: an 800x600x3 image passes through the CNN (VGG16) to produce a 50x37x512 feature map.]
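The sizes in the figure can be checked with a little arithmetic. This is a minimal sketch, assuming the base network is VGG16 truncated after its fourth 2x2 pooling layer (total stride 16) and that each pooling floors odd sizes; `feature_map_size` is a hypothetical helper, not part of any library.

```python
def feature_map_size(height, width, num_poolings=4):
    """Spatial size of the feature map after `num_poolings` 2x2 poolings.

    Assumption: every pooling halves each dimension and floors odd sizes.
    """
    for _ in range(num_poolings):
        height //= 2
        width //= 2
    return height, width


# A 600x800 (height x width) image, as in the figure.
print(feature_map_size(600, 800))  # (37, 50)
```

The channel depth (512) comes from the last convolutional layer of VGG16 and does not depend on the image size.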

Faster R-CNN
2. Region Proposal Network (RPN)

Region proposals
Image (feature map) → proposals:
● variable number.
● different scales and aspect ratios.
● efficient process.
● project bounding boxes to original image.
Idea: start with reference boxes, later adjust.
How many reference boxes? A lot!

Anchor boxes
For each spatial position of the feature map, generate k fixed anchors (with same center).
3 scales, 3 aspect ratios (k=9)
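A rough sketch of anchor generation follows. The scales (128, 256, 512), the ratios (0.5, 1, 2) and the stride of 16 are assumptions taken from the defaults in the Faster R-CNN paper, and both function names are hypothetical. It is not Luminoth's implementation.

```python
import math


def generate_base_anchors(scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return k = len(scales) * len(ratios) boxes centered at (0, 0).

    Boxes are (x1, y1, x2, y2). `ratio` is height / width and each
    box keeps an area of scale ** 2.
    """
    anchors = []
    for scale in scales:
        for ratio in ratios:
            w = scale / math.sqrt(ratio)
            h = scale * math.sqrt(ratio)
            anchors.append((-w / 2, -h / 2, w / 2, h / 2))
    return anchors


def generate_all_anchors(fm_height, fm_width, stride=16):
    """Place the k base anchors at every feature map position.

    Centers are projected to original image coordinates using `stride`.
    """
    base = generate_base_anchors()
    all_anchors = []
    for row in range(fm_height):
        for col in range(fm_width):
            cx = col * stride + stride / 2
            cy = row * stride + stride / 2
            for x1, y1, x2, y2 in base:
                all_anchors.append((cx + x1, cy + y1, cx + x2, cy + y2))
    return all_anchors


anchors = generate_all_anchors(37, 50)
print(len(anchors))  # 37 * 50 * 9 = 16650
```

For the 50x37 feature map used earlier this already gives 16650 reference boxes, many of which extend past the image border.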

Visualizing anchor boxes (1)
Anchor centers in original image
Anchors reference (9 anchors per position)
Anchors on top of single point

Visualizing anchor boxes (2)
All anchors superimposed
Ground truth boxes (labels: person, bicycle)

Region Proposal Network (RPN)
Feature map → rectangular proposals + “objectness” score
RPN:
● 3x3 conv (pad 1, 512 output channels)
● 1x1 conv (2k output channels) → 2k objectness scores
● 1x1 conv (4k output channels) → 4k box regression scores
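To make the 4k box regression outputs concrete, here is a minimal sketch of how four predicted values could adjust one anchor. It assumes the (tx, ty, tw, th) parametrization described in the Faster R-CNN paper; `decode_box` and `rpn_output_shapes` are hypothetical helper names.

```python
import math


def decode_box(anchor, deltas):
    """Adjust an anchor (x1, y1, x2, y2) with deltas (tx, ty, tw, th).

    Assumption: tx, ty shift the center relative to the anchor size,
    and tw, th scale the width and height in log space.
    """
    ax1, ay1, ax2, ay2 = anchor
    aw, ah = ax2 - ax1, ay2 - ay1
    acx, acy = ax1 + aw / 2, ay1 + ah / 2
    tx, ty, tw, th = deltas
    cx, cy = acx + tx * aw, acy + ty * ah
    w, h = aw * math.exp(tw), ah * math.exp(th)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def rpn_output_shapes(fm_height, fm_width, k=9):
    """Shapes of the two 1x1 conv outputs for a given feature map."""
    return {
        "objectness": (fm_height, fm_width, 2 * k),
        "box_regression": (fm_height, fm_width, 4 * k),
    }


# All-zero deltas leave the anchor unchanged.
print(decode_box((0, 0, 10, 10), (0, 0, 0, 0)))
print(rpn_output_shapes(37, 50))
```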

RPN anchor targets
Need positive (foreground) vs negative (background) anchors.
Use Intersection over Union (IoU) with ground truth.
All positive anchors: IoU > 0.7
Anchors batch: positive (green), negative (red)
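The labeling rule can be sketched in a few lines. The 0.7 positive threshold is from the slide. The 0.3 negative threshold and the "ignore" label for anchors in between are assumptions taken from the Faster R-CNN paper. The paper's extra rule (the best anchor for each ground truth box is also positive) is left out for brevity.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def label_anchors(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Return 1 (foreground), 0 (background) or -1 (ignored) per anchor."""
    labels = []
    for anchor in anchors:
        best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
        if best > pos_thresh:
            labels.append(1)
        elif best < neg_thresh:
            labels.append(0)
        else:
            labels.append(-1)
    return labels


gt = [(0, 0, 100, 100)]
anchors = [(0, 0, 100, 100), (0, 0, 100, 80), (50, 50, 150, 150), (0, 0, 100, 50)]
print(label_anchors(anchors, gt))  # [1, 1, 0, -1]
```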

What’s missing
Filtering of proposals
● Use Non-Maximum Suppression (NMS).
● Keep top in “objectness” only.
Multi-task loss
● Classification: standard logarithmic loss for 2 classes.
● Box regression: Smooth L1 between difference of coordinates (positive anchors).
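Both remaining pieces are small enough to sketch in plain Python. This is an illustrative version under stated assumptions: the 0.7 NMS threshold and the Smooth L1 switch point at 1.0 follow common Faster R-CNN defaults, and `top_n` is a hypothetical parameter name for the "keep top" step.

```python
import math


def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.7, top_n=None):
    """Greedy NMS: visit boxes by descending score, drop heavy overlaps.

    Returns the indices of the kept boxes, at most `top_n` of them.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
            if top_n is not None and len(keep) == top_n:
                break
    return keep


def smooth_l1(diff):
    """Smooth L1 on one coordinate difference (quadratic near zero)."""
    a = abs(diff)
    return 0.5 * a * a if a < 1.0 else a - 0.5


def log_loss(prob_of_true_class):
    """Logarithmic loss for a single anchor."""
    return -math.log(prob_of_true_class)


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores, iou_threshold=0.5))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.68, so it is suppressed at a threshold of 0.5 and kept at the 0.7 default.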


Faster R-CNN
3. Region of Interest (RoI) Pooling

RoI Pooling layer
Arbitrarily-sized proposals → fixed spatial size
● Can feed output to fully connected layers.
● Very similar to max pooling.
[Figure: each proposal is projected onto the feature map (512 channels) to obtain a RoI, which RoI Pool turns into a 7x7x512 output.]
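A minimal single-channel sketch of the idea: split the RoI into a fixed grid of bins and take the maximum of each bin. The inclusive feature map coordinates and the floor/ceil bin boundaries are assumptions made for this example. Real implementations differ in details, and `roi_max_pool` is a hypothetical name.

```python
def ceil_div(a, b):
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def roi_max_pool(feature_map, roi, out_h=7, out_w=7):
    """Max-pool one RoI of a 2D feature map into an out_h x out_w grid.

    `roi` is (x1, y1, x2, y2) in feature map cells, inclusive.
    """
    x1, y1, x2, y2 = roi
    roi_h, roi_w = y2 - y1 + 1, x2 - x1 + 1
    pooled = []
    for i in range(out_h):
        y_start = y1 + (i * roi_h) // out_h
        y_end = y1 + ceil_div((i + 1) * roi_h, out_h)
        row = []
        for j in range(out_w):
            x_start = x1 + (j * roi_w) // out_w
            x_end = x1 + ceil_div((j + 1) * roi_w, out_w)
            row.append(max(
                feature_map[y][x]
                for y in range(y_start, y_end)
                for x in range(x_start, x_end)
            ))
        pooled.append(row)
    return pooled


# 4x4 feature map with values 0..15, pooled to 2x2.
fm = [[r * 4 + c for c in range(4)] for r in range(4)]
print(roi_max_pool(fm, (0, 0, 3, 3), out_h=2, out_w=2))  # [[5, 7], [13, 15]]
```

Whatever the RoI size, the output grid is always out_h x out_w, which is what lets the result feed fully connected layers.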

Faster R-CNN
4. Region-based CNN (R-CNN)

Region-based CNN (R-CNN)
Fixed-size outputs of RoI Pooling (7x7x512) → Flatten → FC → FC →
● Softmax: probability distribution (N+1 classes), e.g. bicycle, p=0.96
● bounding box regressions (N classes)
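The two outputs can be combined per proposal roughly as follows. This is a sketch that assumes index 0 of the N+1 class scores is the background class and that there is one set of four regression values per foreground class. `classify_roi` is a hypothetical helper and the input numbers are made up for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_roi(class_logits, box_deltas, class_names):
    """Pick the most likely class for one RoI.

    class_logits: N+1 scores, index 0 assumed to be background.
    box_deltas:   N tuples of 4 values, one per foreground class.
    Returns None for background, else (name, probability, deltas).
    """
    probs = softmax(class_logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    if best == 0:
        return None
    return class_names[best - 1], probs[best], box_deltas[best - 1]


names = ["bicycle", "person"]
deltas = [(0.1, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0)]
print(classify_roi([0.0, 5.0, 1.0], deltas, names))
```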

[Figure: final detections on the sample image: person (0.99), bicycle (0.97)]

Building a toolkit

What is Luminoth?
Open-source deep learning library/toolkit for computer vision object detection.
● CLI tools
● Pre-defined models
● Cloud integration

The goal
$ pip install luminoth
$ lumi train
# Magic

Objectives
● “Out-of-the-box” usage
● Production ready
● Open source
● Readable code
● Extensible and modular

TensorFlow + Sonnet
import sonnet as snt

class RPN(snt.AbstractModule):
    def __init__(self, *args, name='rpn'):
        # Sonnet modules must call the parent constructor.
        super(RPN, self).__init__(name=name)
        [...]  # submodules init, config

    def _build(self, inputs):
        # TensorFlow code.
        return outputs

“Model oriented programming”
● Follow OOP good practices
Faster R-CNN
    RPN   in:  feature map, anchors
          out: proposals
    RCNN  in:  proposals, pooled feature maps
          out: objects, labels, probabilities
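The in/out contracts above can be illustrated with plain Python classes. This is only a sketch of the composition pattern: the `Module` base class is a tiny stand-in for a Sonnet-style module, every method body is placeholder logic with dummy values, and none of it is Luminoth's actual code.

```python
class Module:
    """Stand-in for a Sonnet-style module: calling it runs _build."""

    def __init__(self, name):
        self.name = name

    def __call__(self, *inputs):
        return self._build(*inputs)

    def _build(self, *inputs):
        raise NotImplementedError


class RPN(Module):
    """in: feature map, anchors. out: proposals."""

    def __init__(self, num_proposals=2, name="rpn"):
        super().__init__(name)
        self.num_proposals = num_proposals

    def _build(self, feature_map, anchors):
        # Placeholder: a real RPN scores and adjusts every anchor.
        return anchors[: self.num_proposals]


class RCNN(Module):
    """in: proposals, pooled feature maps. out: objects, labels, probabilities."""

    def __init__(self, name="rcnn"):
        super().__init__(name)

    def _build(self, proposals, pooled_feature_maps):
        # Placeholder: a real R-CNN classifies and refines each proposal.
        objects = list(proposals)
        labels = ["object"] * len(objects)
        probabilities = [1.0] * len(objects)  # dummy values
        return objects, labels, probabilities


class FasterRCNN(Module):
    """Composes the submodules and only wires inputs to outputs."""

    def __init__(self, name="fasterrcnn"):
        super().__init__(name)
        self.rpn = RPN()
        self.rcnn = RCNN()

    def _build(self, feature_map, anchors):
        proposals = self.rpn(feature_map, anchors)
        pooled = [feature_map for _ in proposals]  # stand-in for RoI pooling
        return self.rcnn(proposals, pooled)


model = FasterRCNN()
objects, labels, probabilities = model([[0.0]], [(0, 0, 10, 10), (5, 5, 20, 20), (1, 1, 2, 2)])
print(objects, labels, probabilities)
```

Because each module only depends on its declared inputs, a submodule can be tested or swapped without touching the others.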

Hierarchical structure
ObjectDetection
FasterRCNN
● TruncatedNetwork (VGG/ResNet)
● RPN: RPNTargets, RPNProposals
● RoIPooling
● RCNN: RCNNTargets, RCNNProposals
Datasets: TFRecordDataset, ObjectDetectionDataset

Challenges of coding from papers
● Small implementation details have no room in academic papers
● Papers tend to remain frozen in time
● Many ways to implement it

Challenges of Faster R-CNN implementation
● Multiple moving parts
● Module dependencies
● Multi-task training

Beyond the model
● Model
● Data pipeline
● Debugging
● Training
● Data visualization
● Evaluation
● Deployment
● Distributed
● Unit testing
● Monitoring

Using Luminoth
https://github.com/tryolabs/luminoth
$ pip install luminoth
$ lumi --help
Usage: lumi [OPTIONS] COMMAND [ARGS]...

Options:
  -h, --help  Show this message and exit.

Commands:
  cloud     Groups of commands to train models in the cloud
  dataset   Groups of commands to manage datasets
  evaluate  Evaluate trained (or training) models
  server    Groups of commands to serve models
  train     Train models

Luminoth cycle
$ lumi dataset transform --type pascal --data-dir /data/pascal --output /data/
# Create tfrecords for optimizing data consumption.
$ lumi train --config pascal-fasterrcnn.yml
# Hours of training
$ tensorboard --logdir jobs/
# On another GPU/Machine/CPU
$ lumi evaluate --config pascal-fasterrcnn.yml
# Checks for new checkpoints and writes logs
# Finally
$ lumi server web --config pascal-fasterrcnn.yml
# Looks for checkpoint and loads it into a simple frontend/JSON API server.


Luminoth’s future
● Fine-tune trained models
● More models & problems
● Tagging ↔ Training integration
● Distributed deployment

Thanks for listening! Questions?
Learn more & contribute
github.com/tryolabs/luminoth