Introduction to Object Detection - PyCon APAC 2018

by Tryolabs

Slide 1

Slide 1 text

Introduction to Object Detection: How computers can see and how your Python app can, too. PyCon APAC

Slide 2

Slide 2 text

Introduction: Who I am
Alan Descoins, CTO
@dekked_ | @tryolabs

Slide 3

Slide 3 text

Introduction
Source: https://xkcd.com/1425/ (24 September 2014)

Slide 4

Slide 4 text

Introduction
“In the 60s, Marvin Minsky assigned a couple of undergrads to spend the summer programming a computer to use a camera to identify objects in a scene. He figured they'd have the problem solved by the end of the summer. Half a century later, we're still working on it.”

Slide 5

Slide 5 text

Introduction: Fast forward to 2018: bird or not?

    # Imports needed by the snippet (Keras 2 module paths)
    import numpy as np
    from keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions
    from keras.preprocessing.image import load_img, img_to_array

    # Initialize pre-trained model (ImageNet)
    model = ResNet50()

    # Load image and transform for model input
    x = load_img(img_path, target_size=(224, 224))
    x = img_to_array(x)
    x = np.expand_dims(x, axis=0)
    x = preprocess_input(x)

    # Predict and decode
    preds = model.predict(x)
    print('Pred:', decode_predictions(preds, top=1))

● Python
● < 50 lines (with comments)
  ○ Core solver is just 7 lines
● Keras library
● ImageNet

https://github.com/dekked/bird-or-not

Bird! NOT Bird!

Slide 6

Slide 6 text

Agenda
● Introduction
● Understanding images: the basics
● Intuitions behind modern object detection
● Luminoth: Python toolkit

Slide 7

Slide 7 text

Image classification: Aka, label this picture for me, please.

Slide 8

Slide 8 text

Image classification: What is so hard about this problem?
A 1900 x 1300 image is three grids of numbers: red, green, blue.

Slide 9

Slide 9 text

Image classification: The Machine Learning approach
Dataset → Algorithm (Convolutional NN) → Trained model

Slide 10

Slide 10 text

Image classification: Neural networks and Deep Learning
Neural networks for classification were widely used in the 80s.
Convolutional Neural Networks (Yann LeCun, 1989) are really good at pattern recognition with minimal preprocessing.
Handwritten digit recognition: LeNet-5, Yann LeCun, 1998.

Slide 11

Slide 11 text

Image classification: Convolutional filters
A filter looks at a small region and activates more strongly in the presence of a certain pattern.
Several filters can detect more complex patterns.

Slide 12

Slide 12 text

Image classification: The convolution operation
Slide each filter through the image to produce an activation map.
Use more filters to detect patterns over activation maps (patterns over patterns over patterns…)
Source: https://github.com/vdumoulin/conv_arithmetic
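A minimal NumPy sketch of sliding one filter over an image to get an activation map. It assumes a single-channel image, no padding, and a stride of 1; the hand-written edge filter is illustrative only (in a real network the filter values are learned). As in most deep learning libraries, the filter is not flipped, so strictly speaking this is a cross-correlation.

```python
import numpy as np


def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the activation map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    activation = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the filter with the patch under it, then sum.
            patch = image[i:i + kh, j:j + kw]
            activation[i, j] = np.sum(patch * kernel)
    return activation


# Toy 5x5 image with a vertical edge: dark columns on the left, bright on the right.
image = np.tile(np.array([0.0, 0.0, 1.0, 1.0, 1.0]), (5, 1))

# Illustrative filter that responds to dark-to-bright transitions from left to right.
vertical_edge = np.array([[-1.0, 0.0, 1.0],
                          [-1.0, 0.0, 1.0],
                          [-1.0, 0.0, 1.0]])

activation_map = convolve2d(image, vertical_edge)
print(activation_map)
```

The map is strongest where the filter sits on the edge and zero over the flat bright area, which is the "activates more strongly in the presence of a pattern" behaviour.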

Slide 13

Slide 13 text

Image classification: Remaining questions
How do we know which filters/patterns to set up? → We learn them. They are regular weights of the network (use backpropagation).
How do we know how many filters to use in each layer? → It is a hyperparameter of the network (try and see what works best).
Source: https://cs231n.github.io/understanding-cnn/

Slide 14

Slide 14 text

Image classification: Result of a convolutional layer
Original image: 1900 x 1300 x 3
One filter: 1900 x 1300 x 1
64 filters: 1900 x 1300 x 64

Slide 15

Slide 15 text

Image classification: Summarizing information: pooling
1900 x 1300 x 1 → 950 x 650 x 1
2x2 regions become 1x1 (max in each)
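A small sketch of 2x2 max pooling in NumPy, assuming a single-channel activation map and non-overlapping regions (stride 2). Each spatial dimension is halved; an odd trailing row or column is simply dropped here, which is one of several possible conventions.

```python
import numpy as np


def max_pool_2x2(activation):
    """Replace every non-overlapping 2x2 region with its maximum value."""
    h, w = activation.shape
    h2, w2 = h // 2, w // 2
    # Drop a trailing odd row/column so the map splits evenly into 2x2 blocks.
    cropped = activation[:h2 * 2, :w2 * 2]
    # Axes 1 and 3 index the position inside each 2x2 block.
    blocks = cropped.reshape(h2, 2, w2, 2)
    return blocks.max(axis=(1, 3))


activation = np.array([[1.0, 3.0, 2.0, 0.0],
                       [4.0, 2.0, 1.0, 1.0],
                       [0.0, 0.0, 5.0, 6.0],
                       [1.0, 2.0, 7.0, 8.0]])

pooled = max_pool_2x2(activation)
print(pooled)  # 2x2 result: one maximum per 2x2 region
```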

Slide 16

Slide 16 text

Image classification: What (max) pooling looks like
Information corresponding to regions of the image is summarized.

Slide 17

Slide 17 text

Image classification: Visualizing a convolutional network (pre-trained)
Activation map
Source: https://blog.heuritech.com/2016/02/29/a-brief-report-of-the-heuritech-deep-learning-meetup-5/

Slide 18

Slide 18 text

Image classification: Finding interesting filters
Learning combinations of filters that are activated (from the activation map) makes it a lot easier to find complex patterns!

Slide 19

Slide 19 text

Image classification: Why didn’t this happen in the 80s?
ReLUs, Dropout, BatchNorm, skip connections, ...

Slide 20

Slide 20 text

From classification to detection: Aka, tell me what and where, for all you see.

Slide 21

Slide 21 text

Object detection: Object detection as supervised learning
Labels: person, bicycle, bird, bird, bird, stick, door (Pascal VOC)

Slide 22

Slide 22 text

Object detection: Applications of object detection
● CT scan of a lung cancer patient at the Jingdong Zhongmei private hospital in Yanjiao, China's Hebei Province (AP Photo/Andy Wong)
● Hsieh et al., “Drone-based Object Counting by Spatially Regularized Regional Proposal Networks”, ICCV 2017.
● Source: Pinterest

Slide 23

Slide 23 text

No content

Slide 24

Slide 24 text

Faster R-CNN: Aka, a Deep Learning model that, along with its variants, works really well.

Slide 25

Slide 25 text

Faster R-CNN: Background
Evolution of methods proposed in previous years:
2014: R-CNN - Girshick et al.
2015: Fast R-CNN - Girshick.
2015: Faster R-CNN - Ren, He, Girshick, Sun. “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks”, NIPS 2015.

Slide 26

Slide 26 text

Faster R-CNN: A two-stage object detector
1. Propose interesting regions (RPN): Where should we look?
2. Analyze proposals & adjust (R-CNN): Is this an object? If so, which class?
Example labels: person, bicycle

Slide 27

Slide 27 text

Faster R-CNN: First stage (1): Intuition behind regions
Combination of activation maps encodes spatial information!

Slide 28

Slide 28 text

Faster R-CNN: First stage (2): Proposing regions
1. Take a single spatial position.
2. Define reference box (anchor).
3. Learn if it’s an object or not, and adjust.
4. Repeat for every spatial position.

Slide 29

Slide 29 text

Faster R-CNN: First stage (3): Visualizing anchor boxes
● Anchor centers in original image
● Anchors reference (9 anchors per position)
● All anchors
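A hedged sketch of how 9 reference anchors per position could be generated and then repeated over every spatial position. The scales (128, 256, 512), the aspect ratios (0.5, 1, 2), and the stride of 16 between anchor centers are assumptions taken from common Faster R-CNN setups, not values stated in the talk; `reference_anchors` and `all_anchors` are hypothetical helper names.

```python
import numpy as np


def reference_anchors(scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return anchors centered at (0, 0) as rows of [x1, y1, x2, y2]."""
    anchors = []
    for scale in scales:
        for ratio in ratios:
            # ratio = height / width; the area stays at scale ** 2.
            width = scale / np.sqrt(ratio)
            height = scale * np.sqrt(ratio)
            anchors.append([-width / 2, -height / 2, width / 2, height / 2])
    return np.array(anchors)


def all_anchors(feature_h, feature_w, stride=16):
    """Place the reference anchors at every position of the activation map."""
    base = reference_anchors()
    # Anchor centers expressed in original-image pixel coordinates.
    xs = (np.arange(feature_w) + 0.5) * stride
    ys = (np.arange(feature_h) + 0.5) * stride
    cx, cy = np.meshgrid(xs, ys)
    shifts = np.stack([cx, cy, cx, cy], axis=-1).reshape(-1, 1, 4)
    return (shifts + base[None, :, :]).reshape(-1, 4)


# Hypothetical 4 x 6 activation map: 24 positions x 9 anchors each.
anchors = all_anchors(feature_h=4, feature_w=6)
print(anchors.shape)
```

Even this tiny activation map yields hundreds of boxes, which is why the network has to learn which anchors are worth keeping.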

Slide 30

Slide 30 text

Faster R-CNN: First stage (4): Resulting set of regions
Potentially hundreds or thousands!

Slide 31

Slide 31 text

Faster R-CNN: Second stage (1): Using our proposals
To learn how to adjust anchors, we looked at a small part of the activation map.
To decide better, we need to look at all activations corresponding to the regions.

Slide 32

Slide 32 text

Faster R-CNN: Second stage (2): All proposals made equal
Turn arbitrarily sized proposals into fixed-size vectors / “squares” (7 x 7).
This process is called RoI pooling.
It allows us to feed them into a fully connected layer.
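A simplified sketch of RoI pooling for a single-channel activation map, assuming the proposal is already given in integer activation-map coordinates as (x1, y1, x2, y2) with an exclusive end. The region is split into a 7 x 7 grid of bins and the maximum of each bin is kept; how bin borders are rounded differs between implementations, so treat this as one possible variant.

```python
import numpy as np


def roi_max_pool(feature_map, roi, output_size=7):
    """Turn an arbitrarily sized region into an output_size x output_size grid."""
    x1, y1, x2, y2 = roi
    region = feature_map[y1:y2, x1:x2]
    h, w = region.shape
    # Bin borders along each axis; bins may overlap slightly when sizes do not divide evenly.
    y_edges = np.linspace(0, h, output_size + 1)
    x_edges = np.linspace(0, w, output_size + 1)
    pooled = np.zeros((output_size, output_size))
    for i in range(output_size):
        y_start, y_end = int(np.floor(y_edges[i])), int(np.ceil(y_edges[i + 1]))
        for j in range(output_size):
            x_start, x_end = int(np.floor(x_edges[j])), int(np.ceil(x_edges[j + 1]))
            pooled[i, j] = region[y_start:y_end, x_start:x_end].max()
    return pooled


# Hypothetical 20 x 30 activation map and two proposals of different sizes.
feature_map = np.arange(20 * 30, dtype=float).reshape(20, 30)
small = roi_max_pool(feature_map, (2, 3, 16, 17))   # 14 x 14 region
large = roi_max_pool(feature_map, (0, 0, 30, 20))   # whole map
print(small.shape, large.shape)
```

Both proposals come out with the same shape, so they can be flattened and fed to the same fully connected layer.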

Slide 33

Slide 33 text

Faster R-CNN: Second stage (3): Classification and adjustment
Classification: what type of object is it (or is it background)? → probability distribution
Regression: how should I resize the box to better enclose the object? → do it per class
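A small sketch of both outputs for one proposal: a softmax turns class scores into a probability distribution, and the adjustment predicted for the winning class is applied to the box. The (dx, dy, dw, dh) parameterization is the one commonly used in the R-CNN family and is an assumption here; the scores and deltas are made-up numbers, not model outputs.

```python
import numpy as np


def softmax(scores):
    """Convert raw class scores into a probability distribution."""
    exp = np.exp(scores - np.max(scores))  # subtract max for numerical stability
    return exp / exp.sum()


def apply_deltas(box, deltas):
    """Shift the box center and rescale its size; box is [x1, y1, x2, y2]."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + w / 2, y1 + h / 2
    dx, dy, dw, dh = deltas
    new_cx, new_cy = cx + dx * w, cy + dy * h
    new_w, new_h = w * np.exp(dw), h * np.exp(dh)
    return np.array([new_cx - new_w / 2, new_cy - new_h / 2,
                     new_cx + new_w / 2, new_cy + new_h / 2])


classes = ['background', 'person', 'bicycle']
class_scores = np.array([0.5, 3.0, 1.0])            # hypothetical network output
per_class_deltas = np.array([[0.0, 0.0, 0.0, 0.0],  # one adjustment per class
                             [0.1, 0.0, np.log(2.0), 0.0],
                             [0.0, 0.0, 0.0, 0.0]])
proposal = np.array([100.0, 100.0, 200.0, 300.0])

probs = softmax(class_scores)
best = int(np.argmax(probs))
refined = apply_deltas(proposal, per_class_deltas[best])
print(classes[best], probs[best], refined)
```

If the winning class were 'background', the proposal would simply be discarded instead of adjusted.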

Slide 34

Slide 34 text

Faster R-CNN: Method summary
● Get an activation map with a CNN.
● Use it to propose interesting regions worth exploring. Associate an objectness score to them.
● Classify regions. Discard those that are background (i.e., keep good scores only). Learn how to further adjust for each class of object.

Slide 35

Slide 35 text

Building a toolkit

Slide 36

Slide 36 text

Building a toolkit: What is Luminoth?
Open-source deep learning library/toolkit for computer vision object detection.
● CLI tools
● Pre-defined models
● Cloud integration

Slide 37

Slide 37 text

Building a toolkit: Objectives
● “Out-of-the-box” usage
● Production ready
● Open source
● Readable code
● Extensible and modular

Slide 38

Slide 38 text

Building a toolkit: Simplicity as a goal

    $ pip install luminoth
    $ lumi predict video.mp4 -k car
    Found 1 files to predict.
    Neither checkpoint not config specified, assuming `accurate`.
    Predicting video.mp4 [#############] 100% fps: 5.9

Slide 39

Slide 39 text

Building a toolkit: Beyond the model
Model, Data pipeline, Debugging, Training, Data visualization, Evaluation, Deployment, Distributed, Unit testing, Monitoring

Slide 40

Slide 40 text

Building a toolkit: Using Luminoth
https://github.com/tryolabs/luminoth

    $ pip install luminoth
    $ lumi --help
    Usage: lumi [OPTIONS] COMMAND [ARGS]...

    Options:
      -h, --help  Show this message and exit.

    Commands:
      checkpoint  Groups of commands to manage checkpoints
      cloud       Groups of commands to train models in the cloud
      dataset     Groups of commands to manage datasets
      eval        Evaluate trained (or training) models
      predict     Obtain a model's predictions
      server      Groups of commands to serve models
      train       Train models

Slide 41

Slide 41 text

Building a toolkit: Invoking from your Python app

    from PIL import Image
    from luminoth.tools.checkpoint import get_checkpoint_config
    from luminoth.utils.predicting import PredictorNetwork

    image = Image.open('bird.jpg').convert('RGB')

    config = get_checkpoint_config('accurate')
    network = PredictorNetwork(config)
    objects = network.predict_image(image)
    # [{'bbox': [778, 323, 1124, 814], 'label': 'bird', 'prob': 0.9998}]
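Once the predictions are plain Python dicts like the one in the comment above, post-processing needs no Luminoth code at all. A minimal sketch, assuming `bbox` is [x_min, y_min, x_max, y_max] in pixels (the slide does not say so explicitly); `filter_objects` and `bbox_area` are hypothetical helpers, and the second and third detections are made up for illustration.

```python
def filter_objects(objects, label=None, min_prob=0.5):
    """Keep detections with the given label (if any) and a high enough probability."""
    return [
        obj for obj in objects
        if (label is None or obj['label'] == label) and obj['prob'] >= min_prob
    ]


def bbox_area(bbox):
    """Area in pixels, assuming bbox is [x_min, y_min, x_max, y_max]."""
    x_min, y_min, x_max, y_max = bbox
    return max(0, x_max - x_min) * max(0, y_max - y_min)


objects = [
    {'bbox': [778, 323, 1124, 814], 'label': 'bird', 'prob': 0.9998},  # from the slide
    {'bbox': [10, 20, 60, 90], 'label': 'bird', 'prob': 0.31},         # made up
    {'bbox': [200, 400, 480, 700], 'label': 'person', 'prob': 0.87},   # made up
]

birds = filter_objects(objects, label='bird', min_prob=0.5)
print(len(birds), [bbox_area(obj['bbox']) for obj in birds])
```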

Slide 42

Slide 42 text

Building a toolkit: Luminoth use cycle

    $ lumi dataset transform --type pascal --data-dir /data/pascal --output /data/
    # Create tfrecords for optimizing data consumption.

    $ lumi train --config pascal-fasterrcnn.yml
    # Hours of training...
    $ tensorboard --logdir jobs/

    # On another GPU/Machine/CPU.
    $ lumi eval --config pascal-fasterrcnn.yml
    # Checks for new checkpoints and writes logs.

    # Finally.
    $ lumi server web --config pascal-fasterrcnn.yml
    # Looks for checkpoint and loads it into a simple frontend/json API server.

Slide 43

Slide 43 text

Luminoth: Demo

Slide 44

Slide 44 text

Learning more
● https://tryolabs.com/blog/
● (Stanford) CS231n: Convolutional Neural Networks for Visual Recognition http://cs231n.stanford.edu/
● Deep Learning (Goodfellow, Bengio, Courville) http://www.deeplearningbook.org/
● https://distill.pub/

Slide 45

Slide 45 text

Thanks for listening! Questions?
Alan Descoins
@tryolabs | @dekked_
github.com/tryolabs/luminoth