Tryolabs Workshop: Object Detection with Deep Learning

Slide 1

Slide 1 text

Workshop

Slide 2

Slide 2 text

Introduction | @tryolabs
Agustín Azzinnari, Lead Research Engineer, @ganitsu
Alan Descoins, CTO, @dekked_

Slide 3

Slide 3 text

Introduction
Must
● Familiarity with Python.
● Basic knowledge of Machine Learning.
● Familiarity with numpy and Jupyter Notebooks.
Recommended
● Familiarity with TensorFlow.
Helpful to have
● Basics of Deep Learning and Convolutional Neural Networks (CNN).

Slide 4

Slide 4 text

Introduction
Fundamentals: image classification and object detection (Faster R-CNN)
Hands-on: implement components of the Faster R-CNN model
Hands-on: use the Luminoth toolkit on a real-world problem

Slide 5

Slide 5 text

Theory 1: label

Slide 6

Slide 6 text

Image classification There’s a cat. (millions of operations later...) (many wasted kWh later...)

Slide 7

Slide 7 text

Image classification
Classification: There’s a cat in the photo
Localization: There’s a cat and it’s here
Detection: There are two cats, here and here

Slide 8

Slide 8 text

Image classification
[Figure: a 1900 x 1300 image split into its red, green and blue channels]

Slide 9

Slide 9 text

Image classification

Slide 10

Slide 10 text

Image classification
Classical models: Extract features from the images and use them as input to a simple classification algorithm.
Deep Learning models: Use the images directly as input to a more complex classification algorithm.

Slide 11

Slide 11 text

Image classification
Neural networks for classification were widely used in the 80s.
Convolutional Neural Networks (Yann LeCun, 1989) are really good at pattern recognition with minimal preprocessing.
Handwritten digit recognition: LeNet-5, Yann LeCun, 1998.

Slide 12

Slide 12 text

Image classification
A filter looks at a small region and activates more strongly in the presence of a certain pattern.
Several filters can detect more complex patterns:

Slide 13

Slide 13 text

Image classification
Slide each filter across the image to produce an activation map.
Source: https://github.com/vdumoulin/conv_arithmetic
Use more filters to detect patterns over activation maps (patterns over patterns over patterns…)
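
A minimal sketch of sliding a single filter over a single-channel image, assuming no padding and a stride of 1. The function name `activation_map` and the hand-made edge filter are illustrative only; in a real network the filter values are learned. As in most deep learning libraries, the operation is technically a cross-correlation (the filter is not flipped).

```python
import numpy as np

def activation_map(image, kernel):
    """Slide `kernel` over a 2-D `image` (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Filter response here: elementwise product with the patch, then sum.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy image: dark left half, bright right half (a vertical edge in the middle).
image = np.zeros((5, 6))
image[:, 3:] = 1.0

# Hypothetical hand-made filter that responds to dark-to-bright vertical edges.
vertical_edge = np.array([[-1.0, 0.0, 1.0],
                          [-1.0, 0.0, 1.0],
                          [-1.0, 0.0, 1.0]])

response = activation_map(image, vertical_edge)
print(response)
```

The response is largest at the positions where the filter window straddles the edge and zero over the flat regions, which is the "activates more strongly in the presence of a pattern" behavior.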

Slide 14

Slide 14 text

Image classification
[Figure: original image 1900 x 1300 x 3 → one activation map 1900 x 1300 x 1 → 64 activation maps 1900 x 1300 x 64]

Slide 15

Slide 15 text

Image classification
[Figure: 1900 x 1300 x 64 → 950 x 650 x 64]
2x2 regions become 1x1 (max in each).
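
The pooling step can be sketched in numpy as follows, assuming an array laid out as (height, width, channels) and non-overlapping 2x2 windows. The helper name `max_pool_2x2` is made up for this example.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on an array of shape (H, W, C)."""
    h, w, c = x.shape
    # Drop a trailing row/column if H or W is odd.
    x = x[:h - h % 2, :w - w % 2, :]
    # Group pixels into 2x2 blocks, then keep the maximum of each block.
    blocks = x.reshape(h // 2, 2, w // 2, 2, c)
    return blocks.max(axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4, 1)
pooled = max_pool_2x2(x)
print(pooled[:, :, 0])
```

Each spatial dimension is halved while the number of channels stays the same.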

Slide 16

Slide 16 text

Image classification Information corresponding to regions of the image is summarized.

Slide 17

Slide 17 text

Image classification
How do we know which filters/patterns to set up? → We learn them. They are regular weights of the network (use backpropagation).
How do we know how many filters to use in each layer? → It is a hyperparameter of the network (try and see what works best).
Source: https://cs231n.github.io/understanding-cnn/

Slide 18

Slide 18 text

Image classification Learning combinations of filters that are activated (from the activation map) makes it a lot easier to find complex patterns!

Slide 19

Slide 19 text

Feature map Source: https://blog.heuritech.com/2016/02/29/a-brief-report-of-the-heuritech-deep-learning-meetup-5/ Pre-trained: Image classification

Slide 20

Slide 20 text

Theory 2: what where,

Slide 21

Slide 21 text

Object detection person bicycle bird bird bird stick door (Pascal VOC)

Slide 22

Slide 22 text

Object detection
CT scan of a lung cancer patient at the Jingdong Zhongmei private hospital in Yanjiao, China's Hebei Province (AP Photo/Andy Wong)
Hsieh et al., “Drone-based Object Counting by Spatially Regularized Regional Proposal Networks”, ICCV 2017.
Source: Pinterest

Slide 23

Slide 23 text

No content

Slide 24

Slide 24 text

Object detection
Regression based methods
Single stage prediction of object classes and bounding boxes.
Examples:
● You Only Look Once (YOLO, YOLOv2, YOLOv3)
● Single Shot MultiBox Detector (SSD)
Region proposal based methods
Two stages:
1. Generate candidate locations using some algorithm.
2. Adjustment of bounding boxes and classification.
Examples:
● R-CNN, Fast R-CNN, Faster R-CNN

Slide 25

Slide 25 text

No content

Slide 26

Slide 26 text

Faster R-CNN
Evolution of methods proposed in previous years:
2014 R-CNN - Girshick et al.
2015 Fast R-CNN - Girshick.
2015 Faster R-CNN - Ren et al.
Ren, He, Girshick, Sun. “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks”, NIPS 2015.

Slide 27

Slide 27 text

Faster R-CNN
1. Propose interesting regions (Region Proposal Network, RPN). Where should we look?
2. Analyze proposals & adjust (Region-based CNN, R-CNN). Is this an object? If so, which class?
[Figure: detections labeled person, bicycle]

Slide 28

Slide 28 text

Faster R-CNN: base network Combination of activation maps encodes spatial information!

Slide 29

Slide 29 text

Faster R-CNN: region proposal Potentially hundreds or thousands! Regions are agnostic to particular object classes. → “There might be something here!”

Slide 30

Slide 30 text

Faster R-CNN Resize all the regions to the same dimensions (through Region of Interest Pooling). ... ...

Slide 31

Slide 31 text

Faster R-CNN
Classification: what type of object is it (or is it background)? → probability distribution
Regression: how should I resize the box to better enclose the object? → do it per class

Slide 32

Slide 32 text

Faster R-CNN
● Get a feature map with a CNN.
● Use it to propose interesting regions worth exploring. Associate an objectness score with them.
● Classify regions. Discard those that are background (i.e. keep good scores only). Learn how to further adjust for each class of object.

Slide 33

Slide 33 text

Faster R-CNN RoIP R-CNN RPN

Slide 34

Slide 34 text

Hands-on ResNet

Slide 35

Slide 35 text

Environment setup Using your own laptop See the README: https://github.com/tryolabs/object-detection-workshop

Slide 36

Slide 36 text

Theory 3:

Slide 37

Slide 37 text

Faster R-CNN: base network
Image of arbitrary size → feature map.
Common architectures:
● VGG (16, 19)
● ResNet (50, 101, 152, ...)
● Inception (V2, V3)
● Xception
● MobileNet
● ...
1/16 spatially, 1024 deep for ResNet 101.
[Figure: 800 x 600 x 3 image → CNN (ResNet) → 50 x 37 x 1024 feature map]

Slide 38

Slide 38 text

Faster R-CNN: region proposal
Idea:
1. Look at spatial position and its vicinity.
2. Predict 2 points (x1, y1), (x2, y2) for each location.
Issues:
● Can we make the network predict exact pixel coordinates?
● Image dimensions are variable.

Slide 39

Slide 39 text

Faster R-CNN: region proposal
1. Take a single spatial position.
2. Define a fixed-size reference box (called an anchor).
3. Find the “closest” ground truth (GT) box.
4. Predict the “objectness” of the region.
5. Learn how to modify the anchor (in relative terms, i.e. “double its width”).
6. Repeat for every spatial position.
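
One common way to express "modify the anchor in relative terms" is the parameterization from the Faster R-CNN paper: center offsets scaled by the anchor size, and log ratios for width and height. The sketch below assumes boxes given as (x1, y1, x2, y2); the function names `encode` and `decode` are illustrative and not taken from the workshop code.

```python
import math

def to_center_form(box):
    """(x1, y1, x2, y2) -> (center_x, center_y, width, height)."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return x1 + w / 2, y1 + h / 2, w, h

def encode(anchor, gt_box):
    """Relative targets (tx, ty, tw, th) that turn `anchor` into `gt_box`."""
    ax, ay, aw, ah = to_center_form(anchor)
    gx, gy, gw, gh = to_center_form(gt_box)
    return ((gx - ax) / aw, (gy - ay) / ah,
            math.log(gw / aw), math.log(gh / ah))

def decode(anchor, deltas):
    """Apply predicted (tx, ty, tw, th) to `anchor` to get a proposal box."""
    ax, ay, aw, ah = to_center_form(anchor)
    tx, ty, tw, th = deltas
    cx, cy = ax + tx * aw, ay + ty * ah
    w, h = aw * math.exp(tw), ah * math.exp(th)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

anchor = (0.0, 0.0, 100.0, 100.0)
gt_box = (-50.0, 0.0, 150.0, 100.0)   # same center, twice as wide
deltas = encode(anchor, gt_box)
restored = decode(anchor, deltas)
print(deltas, restored)
```

Here "double its width" shows up as tw = log(2) with all other targets at zero, regardless of the absolute pixel size of the anchor.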

Slide 40

Slide 40 text

Faster R-CNN: region proposal
For each spatial position of the feature map, generate k fixed anchors (with the same center).
E.g. 3 scales x 3 aspect ratios (k=9), but choose what’s best for the dataset.

Slide 41

Slide 41 text

Faster R-CNN: region proposal
[Figure panels: anchor centers in original image; reference anchors (9 anchors per position); all anchors]

Slide 42

Slide 42 text

Faster R-CNN: region proposal
Feature map → rectangular proposals + “objectness” score
RPN:
● 3x3 conv (pad 1, 512 output channels)
● 1x1 conv (2k output channels) → 2k objectness scores
● 1x1 conv (4k output channels) → 4k box regression scores
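
To make the output shapes concrete, here is a shape-only sketch of the two 1x1 convolutions. It assumes k=9 and a 50 x 37 feature map, uses random numbers as a stand-in for the output of the 3x3 convolution, and uses random (untrained) weights, so only the shapes are meaningful.

```python
import numpy as np

k = 9                      # anchors per spatial position (assumed)
feat_h, feat_w = 37, 50    # feature map size (assumed)
rng = np.random.default_rng(0)

# Stand-in for the output of the 3x3 conv (512 channels).
hidden = rng.standard_normal((feat_h, feat_w, 512))

# A 1x1 convolution is a matrix product applied independently at each position.
w_cls = rng.standard_normal((512, 2 * k)) * 0.01
w_reg = rng.standard_normal((512, 4 * k)) * 0.01
objectness_map = hidden @ w_cls    # (feat_h, feat_w, 2k)
regression_map = hidden @ w_reg    # (feat_h, feat_w, 4k)

# Flatten to one row per anchor: 2 scores and 4 box deltas each.
objectness = objectness_map.reshape(-1, 2)
deltas = regression_map.reshape(-1, 4)
print(objectness.shape, deltas.shape)
```

Because every layer is convolutional, the same weights work for any feature map size, which is how the RPN copes with images of variable dimensions.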

Slide 43

Slide 43 text

Faster R-CNN: region proposal
Need positive (foreground) vs negative (background) anchors.
Use Intersection over Union (IoU) with ground truth.
[Figure panels: all positive anchors (IoU > 0.7); anchors batch, positive (green) and negative (red)]
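
A small sketch of IoU and of labeling one anchor against the ground truth boxes. The 0.7 positive threshold is the one shown on the slide; the 0.3 negative threshold and the "ignore" label for anchors in between are assumptions borrowed from the Faster R-CNN paper. Function names are illustrative.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def label_anchor(anchor, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Return 1 (positive), 0 (negative) or -1 (ignored during training)."""
    best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
    if best > pos_thresh:
        return 1
    if best < neg_thresh:
        return 0
    return -1

gt_boxes = [(0.0, 0.0, 10.0, 10.0)]
print(label_anchor((0.0, 0.0, 10.0, 10.0), gt_boxes))    # perfect overlap
print(label_anchor((5.0, 0.0, 15.0, 10.0), gt_boxes))    # partial overlap
print(label_anchor((20.0, 20.0, 30.0, 30.0), gt_boxes))  # no overlap
```

Anchors with an intermediate IoU are neither clearly foreground nor clearly background, so in this sketch they do not contribute to the loss.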

Slide 44

Slide 44 text

Faster R-CNN: region proposal
Multi-task loss
● Classification: standard cross-entropy for 2 classes.
● Box regression: Smooth L1 on the difference of coordinates (positive anchors only).
Filtering of proposals
● Use Non-Maximum Suppression (NMS).
● Keep only the top proposals by “objectness”.
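
A sketch of the box regression term, assuming the usual Smooth L1 definition (quadratic below 1, linear above) and labels where 1 marks a positive anchor. Averaging over positive anchors is one possible normalization chosen for this example; real implementations may normalize differently.

```python
import numpy as np

def smooth_l1(x):
    """0.5 * x^2 where |x| < 1, and |x| - 0.5 elsewhere."""
    abs_x = np.abs(x)
    return np.where(abs_x < 1.0, 0.5 * x ** 2, abs_x - 0.5)

def box_regression_loss(pred_deltas, target_deltas, labels):
    """Smooth L1 over the 4 box coordinates, positive anchors only."""
    positive = labels == 1
    if not positive.any():
        return 0.0
    diff = pred_deltas[positive] - target_deltas[positive]
    per_anchor = smooth_l1(diff).sum(axis=1)
    return float(per_anchor.mean())

pred = np.array([[0.5, 0.0, 0.0, 0.0],
                 [2.0, 0.0, 0.0, 0.0],
                 [9.0, 9.0, 9.0, 9.0]])   # negative anchor, ignored below
target = np.zeros((3, 4))
labels = np.array([1, 1, 0])
loss = box_regression_loss(pred, target, labels)
print(loss)
```

The linear tail keeps a few badly placed anchors from dominating the gradient, while the negative anchor contributes nothing because there is no box to regress towards.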

Slide 45

Slide 45 text

Faster R-CNN: region proposal
1. Run image through base network to get feature map.
2. Run feature map through RPN convolutional layers (3x3, 1x1 & 1x1).
   a. Obtain objectness and box regression scores for each anchor type and spatial position.
   b. Use regression scores to adjust each anchor.
3. Sort proposals by objectness score.
4. Apply NMS to remove redundant proposals.
Result: set of proposals with associated objectness scores.
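
Steps 3 and 4 can be sketched as a greedy NMS in numpy. The default IoU threshold of 0.7 and the optional `top_n` cut-off are assumptions for this example (the slides do not give values), and boxes are assumed to have non-zero area.

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.7, top_n=None):
    """Greedy NMS. boxes: (M, 4) as (x1, y1, x2, y2); scores: (M,).

    Returns the indices of the kept boxes, best score first.
    """
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order = np.argsort(-scores)   # step 3: sort by objectness, best first
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        # Intersection of the best box with every remaining box.
        x1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        iou = inter / (areas[best] + areas[rest] - inter)
        # Step 4: drop proposals that overlap the kept one too much.
        order = rest[iou <= iou_threshold]
    return keep[:top_n]

boxes = np.array([[0.0, 0.0, 10.0, 10.0],
                  [1.0, 1.0, 11.0, 11.0],    # near-duplicate of the first
                  [50.0, 50.0, 60.0, 60.0]])
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores, iou_threshold=0.5)
print(kept)
```

The second box overlaps the first with an IoU of about 0.68, so with a threshold of 0.5 it is suppressed and only the first and third proposals survive.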

Slide 46

Slide 46 text

Hands-on

Slide 47

Slide 47 text

Faster R-CNN: region proposal
● Read, read, read. Docstrings have crucial implementation details, such as shapes and types.
● Comments have hints to help you.
  ○ We can help you too, don’t be shy and ask! :D
Priorities:
1. Make it work (whatever it takes!).
2. Implement it with vectorized numpy.
3. Implement it in pure TensorFlow.
   a. Can compile and run on GPU.
   b. You would have to do this for a real implementation.

Slide 48

Slide 48 text

Theory 4:

Slide 49

Slide 49 text

Faster R-CNN
To learn how to adjust anchors, we looked at a small part of the feature map.
To decide better, we need to look at all the activations corresponding to each region.

Slide 50

Slide 50 text

Faster R-CNN
Turn arbitrarily sized proposals into fixed size vectors / “squares” (7 x 7 in the figure).
The process is called RoI pooling.
It allows us to feed them into a fully connected layer of the NN.
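
A simplified sketch of RoI pooling: split the region into a 7 x 7 grid of bins and keep the maximum of each bin per channel. It assumes the RoI is already expressed in integer feature map coordinates with exclusive x2/y2 and covers at least one cell; real implementations handle fractional coordinates, and some use a crop-and-resize variant instead. The function name `roi_pool` is made up for this example.

```python
import numpy as np

def roi_pool(feature_map, roi, output_size=7):
    """Max-pool the region `roi` of a (H, W, C) feature map to a fixed size.

    `roi` is (x1, y1, x2, y2) in feature map cells, x2 and y2 exclusive.
    """
    x1, y1, x2, y2 = roi
    region = feature_map[y1:y2, x1:x2, :]
    h, w, c = region.shape
    # Bin boundaries along each axis (may be fractional).
    y_edges = np.linspace(0, h, output_size + 1)
    x_edges = np.linspace(0, w, output_size + 1)
    out = np.zeros((output_size, output_size, c), dtype=feature_map.dtype)
    for i in range(output_size):
        ys = int(np.floor(y_edges[i]))
        ye = max(int(np.ceil(y_edges[i + 1])), ys + 1)  # at least one row
        for j in range(output_size):
            xs = int(np.floor(x_edges[j]))
            xe = max(int(np.ceil(x_edges[j + 1])), xs + 1)  # at least one column
            out[i, j] = region[ys:ye, xs:xe].max(axis=(0, 1))
    return out

feature_map = np.arange(20 * 30 * 4).reshape(20, 30, 4)
pooled = roi_pool(feature_map, (3, 2, 17, 16))   # a 14 x 14 region
small = roi_pool(feature_map, (0, 0, 5, 3))      # a 5 x 3 region
print(pooled.shape, small.shape)
```

Both regions come out as 7 x 7 x C regardless of their original size, which is what makes the following fully connected layers possible.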

Slide 51

Slide 51 text

Faster R-CNN
Fixed-size outputs of RoI Pooling (7x7x1024) → Flatten → FC layers →
● probability distribution (N+1 classes), via Softmax (e.g. bicycle p=0.96)
● bounding box regressions (N classes)
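
A sketch of how the two outputs might be combined for a single proposal. It assumes index 0 of the class scores is the background class, so the N rows of box regressions line up with classes 1..N; the function `pick_detection` and the toy numbers are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def pick_detection(class_logits, box_deltas, class_names):
    """class_logits: (N+1,), index 0 = background; box_deltas: (N, 4)."""
    probs = softmax(class_logits)
    best = int(np.argmax(probs))
    if best == 0:
        return None  # background: discard this proposal
    # Use the box adjustment learned for the winning class.
    return class_names[best - 1], float(probs[best]), box_deltas[best - 1]

class_names = ["person", "bicycle"]
class_logits = np.array([0.0, 1.0, 5.0])
box_deltas = np.array([[0.0, 0.0, 0.0, 0.0],
                       [0.1, 0.0, 0.2, 0.0]])
detection = pick_detection(class_logits, box_deltas, class_names)
print(detection)
```

Keeping one set of box deltas per class lets the network learn, for instance, that bicycles tend to need wider boxes than people.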

Slide 52

Slide 52 text

Faster R-CNN person (0.99) bicycle (0.97)

Slide 53

Slide 53 text

Hands-on

Slide 54

Slide 54 text

No content

Slide 55

Slide 55 text

Building a toolkit
Open-source deep learning library/toolkit for computer vision object detection.
CLI tools
Pre-defined models
Cloud integration

Slide 56

Slide 56 text

Building a toolkit “Out-of-the-box” usage Production ready Open source Readable code Extensible and modular

Slide 57

Slide 57 text

Building a toolkit
$ pip install luminoth
$ lumi predict video.mp4 -k car
Found 1 files to predict.
Neither checkpoint not config specified, assuming `accurate`.
Predicting video.mp4 [#############] 100% fps: 5.9

Slide 58

Slide 58 text

Building a toolkit
Data pipeline, Debugging, Training, Data visualization, Evaluation, Deployment, Distributed, Unit testing, Monitoring, Model

Slide 59

Slide 59 text

Building a toolkit
https://github.com/tryolabs/luminoth
$ pip install luminoth
$ lumi --help
Usage: lumi [OPTIONS] COMMAND [ARGS]...
Options:
  -h, --help  Show this message and exit.
Commands:
  checkpoint  Groups of commands to manage checkpoints
  cloud       Groups of commands to train models in the cloud
  dataset     Groups of commands to manage datasets
  eval        Evaluate trained (or training) models
  predict     Obtain a model's predictions
  server      Groups of commands to serve models
  train       Train models

Slide 60

Slide 60 text

Building a toolkit
$ lumi dataset transform --type pascal --data-dir /data/pascal --output /data/
# Create tfrecords for optimizing data consumption.
$ lumi train --config pascal-fasterrcnn.yml
# Hours of training...
$ tensorboard --logdir jobs/
# On another GPU/Machine/CPU.
$ lumi eval --config pascal-fasterrcnn.yml
# Checks for new checkpoints and writes logs.
# Finally.
$ lumi server web --config pascal-fasterrcnn.yml
# Looks for checkpoint and loads it into a simple frontend/json API server.

Slide 61

Slide 61 text

Hands-on

Slide 62

Slide 62 text

Luminoth for real world object detection
Environment setup: follow the Luminoth Tutorial
https://luminoth.readthedocs.io/en/stable/tutorial/index.html

Slide 63

Slide 63 text

Learning more
● https://tryolabs.com/blog/
● (Stanford) CS231n: Convolutional Neural Networks for Visual Recognition http://cs231n.stanford.edu/
● Deep Learning (Goodfellow, Bengio, Courville) http://www.deeplearningbook.org/
● https://distill.pub/