Introduction to Object Detection - PyCon APAC 2018

Tryolabs
June 04, 2018

In recent years, models based on Convolutional Neural Networks (CNNs) have revolutionized the entire field of computer vision. Problems like image classification can now be considered solved, and it is easy to build an implementation in any modern deep learning framework by fine-tuning weights pre-trained on datasets such as ImageNet.

In this talk, we will explore how and why these techniques work, getting an understanding of the intuitive aspects of what the networks are actually doing. Moreover, this intuition will enable us to understand how to jump from image classification to the more complex problem of object detection, explaining the workings of the Faster R-CNN algorithm in the process.

We will also speak about Luminoth, an open source Python object detection toolkit based on TensorFlow, going over the motivation behind it and showing how it can be integrated into your application.

Transcript

  1. Introduction
    “In the 60s, Marvin Minsky assigned a couple of undergrads to spend the summer programming a computer to use a camera to identify objects in a scene. He figured they'd have the problem solved by the end of the summer. Half a century later, we're still working on it.”
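  2. Introduction: Fast forward to 2018: bird or not?
        # Imports needed by the snippet (not shown on the slide)
        import numpy as np
        from keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions
        from keras.preprocessing.image import load_img, img_to_array

        # Initialize pre-trained model (ImageNet)
        model = ResNet50()

        # Load image and transform for model input (img_path is the path to the image file)
        x = load_img(img_path, target_size=(224, 224))
        x = img_to_array(x)
        x = np.expand_dims(x, axis=0)
        x = preprocess_input(x)

        # Predict and decode
        preds = model.predict(x)
        print('Pred:', decode_predictions(preds, top=1))
    • Python
    • < 50 lines (with comments)
      ◦ Core solver is just 7 lines
    • Keras library
    • ImageNet
    https://github.com/dekked/bird-or-not
    [Example outputs: "Bird!" and "NOT Bird!"]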
  3. Introduction: What is so hard about this problem?
    [Figure: an image shown as red, green, and blue channels, 1900 x 1300 pixels.]
    Image classification
  4. Image classification: Neural networks and Deep Learning
    Neural networks for classification were widely used in the 80s. The Convolutional Neural Network (Yann LeCun, 1989) is really good for pattern recognition with minimal preprocessing.
    [Figure: handwritten digit recognition with LeNet-5, Yann LeCun, 1998.]
  5. Image classification: Convolutional filters
    A filter looks at a small region and activates more strongly in the presence of a certain pattern. Several filters can detect more complex patterns.
  6. Image classification: The convolution operation
    Slide each filter through the image to produce an activation map. Use more filters to detect patterns over activation maps (patterns over patterns over patterns…).
    Source: https://github.com/vdumoulin/conv_arithmetic
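To make the sliding-filter idea concrete, here is a minimal pure-Python sketch. It is not code from the talk: the function name `convolve2d`, the toy image, and the hand-written `vertical_edge` filter are illustrative assumptions, and it uses stride 1 with no padding. In a real network the filter values would be learned rather than written by hand.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the activation map.

    Like most deep learning frameworks, this computes a cross-correlation:
    the kernel is not flipped before sliding.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    activation = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the small region currently under the filter.
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        activation.append(row)
    return activation


# Toy 4x4 single-channel image: dark left half, bright right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hypothetical hand-written filter that responds to dark-to-bright vertical edges.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

activation_map = convolve2d(image, vertical_edge)
for row in activation_map:
    print(row)
```

The activation map is strongest in the column where the filter sits on top of the edge and zero over the flat regions, which is the "activates more strongly in the presence of a certain pattern" behavior described on the slides.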
  7. Image classification: Remaining questions
    How do we know which filters/patterns to set up? → We learn them. They are regular weights of the network (use backpropagation).
    How do we know how many filters to use in each layer? → It is a hyperparameter of the network (try and see what works best).
    Source: https://cs231n.github.io/understanding-cnn/
  8. Image classification: Result of a convolutional layer
    [Figure: original image (1900 x 1300 x 3) → a single activation map (1900 x 1300 x 1) → ... → the stacked activation maps (1900 x 1300 x 64).]
  9. Image classification: Summarizing information: pooling
    1900 x 1300 x 1 → 950 x 650 x 1
    2x2 regions become 1x1 (max in each).
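As a rough illustration of the pooling step, the sketch below applies 2x2 max pooling with stride 2 to a small activation map, which halves each spatial dimension. The function name `max_pool_2x2` and the sample values are assumptions made for this example, not material from the slides.

```python
def max_pool_2x2(activation):
    """Replace every non-overlapping 2x2 region by its maximum value.

    A trailing odd row or column is dropped, which is one common convention.
    """
    height, width = len(activation), len(activation[0])
    pooled = []
    for i in range(0, height - 1, 2):
        row = []
        for j in range(0, width - 1, 2):
            row.append(max(
                activation[i][j], activation[i][j + 1],
                activation[i + 1][j], activation[i + 1][j + 1],
            ))
        pooled.append(row)
    return pooled


# Made-up 4x4 activation map.
activation = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
]

pooled = max_pool_2x2(activation)
print(pooled)
```

Only the strongest response in each 2x2 region survives, so the map shrinks while keeping the information about where a pattern was detected most strongly.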
  10. Image classification: Finding interesting filters
    Learning combinations of filters that are activated (from the activation map) makes it a lot easier to find complex patterns!
  11. Image classification: Why didn’t this happen in the 80s?
    ReLUs, Dropout, BatchNorm, skip connections, ...
  12. Object detection: Applications of object detection
    Image credits:
    • CT scan of a lung cancer patient at the Jingdong Zhongmei private hospital in Yanjiao, China's Hebei Province (AP Photo/Andy Wong)
    • Hsieh et al., “Drone-based Object Counting by Spatially Regularized Regional Proposal Networks”, ICCV 2017.
    • Source: Pinterest
  13. Faster R-CNN: Background
    Evolution of methods proposed in previous years:
    • 2014: R-CNN (Girshick et al.)
    • 2015: Fast R-CNN (Girshick)
    • 2016: Faster R-CNN
    Ren, Girshick et al. “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks”, CVPR 2016.
  14. Faster R-CNN: A two-stage object detector
    1. Propose interesting regions (RPN). Where should we look?
    2. Analyze proposals & adjust (R-CNN). Is this an object? If so, which class?
    [Figure: example detections labeled "person" and "bicycle".]
  15. Faster R-CNN: First stage (1): Intuition behind regions
    Combination of activation maps encodes spatial information!
  16. Faster R-CNN: First stage (2): Proposing regions
    1. Take a single spatial position.
    2. Define a reference box (anchor).
    3. Learn if it’s an object or not, and adjust.
    4. Repeat for every spatial position.
  17. Faster R-CNN: First stage (3): Visualizing anchor boxes
    [Figure panels: anchor centers in the original image; anchor reference (9 anchors per position); all anchors.]
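The anchor idea from the first-stage slides can be sketched as follows. The slides only state that there are 9 anchors per position; the three scales (128, 256, 512), the three aspect ratios (1:2, 1:1, 2:1), and the stride of 16 pixels used here are assumed values for illustration, as are the function names `anchors_at` and `all_anchors`.

```python
import math


def anchors_at(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return reference boxes [x_min, y_min, x_max, y_max] centered at (cx, cy).

    `ratio` is height / width; each box keeps an area of scale ** 2.
    The default scales and ratios are assumptions, giving 3 x 3 = 9 anchors.
    """
    boxes = []
    for scale in scales:
        for ratio in ratios:
            width = scale / math.sqrt(ratio)
            height = scale * math.sqrt(ratio)
            boxes.append([cx - width / 2, cy - height / 2,
                          cx + width / 2, cy + height / 2])
    return boxes


def all_anchors(feature_h, feature_w, stride=16):
    """Repeat the reference boxes for every spatial position of the activation map.

    `stride` (assumed to be 16) maps an activation-map cell back to image pixels.
    """
    anchors = []
    for row in range(feature_h):
        for col in range(feature_w):
            cx = col * stride + stride / 2
            cy = row * stride + stride / 2
            anchors.extend(anchors_at(cx, cy))
    return anchors


anchors = all_anchors(feature_h=3, feature_w=4)
print(len(anchors))  # 3 * 4 positions, 9 anchors each
```

The network then learns, for each of these fixed boxes, whether it contains an object and how the box should be adjusted.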
  18. Faster R-CNN: Second stage (1): Using our proposals
    To learn how to adjust anchors, we looked at a small part of the activation map. To decide better, we need to look at all activations corresponding to the regions.
  19. Faster R-CNN: Second stage (2): All proposals made equal
    Turn arbitrarily sized proposals into fixed-size vectors / “squares” (7 x 7 in the figure). The process is called RoI pooling. It allows us to feed the result into a fully connected layer.
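A simplified sketch of RoI pooling is shown below, assuming a single-channel activation map stored as nested lists and proposals given in integer activation-map coordinates with an exclusive maximum. The names `roi_max_pool` and `ceil_div` and the binning scheme (floor for the start of a bin, ceiling for its end) are choices made for this example; real implementations differ in how they quantize bin boundaries.

```python
def ceil_div(a, b):
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def roi_max_pool(feature_map, roi, output_size=7):
    """Max-pool the region `roi` into an output_size x output_size grid.

    `roi` is (x_min, y_min, x_max, y_max) in activation-map cells, max exclusive,
    and must cover at least one cell in each dimension.
    """
    x_min, y_min, x_max, y_max = roi
    roi_h, roi_w = y_max - y_min, x_max - x_min
    pooled = []
    for i in range(output_size):
        # Bin boundaries along y; neighbouring bins may overlap for small regions.
        y0 = y_min + (i * roi_h) // output_size
        y1 = y_min + ceil_div((i + 1) * roi_h, output_size)
        row = []
        for j in range(output_size):
            x0 = x_min + (j * roi_w) // output_size
            x1 = x_min + ceil_div((j + 1) * roi_w, output_size)
            row.append(max(feature_map[y][x]
                           for y in range(y0, y1)
                           for x in range(x0, x1)))
        pooled.append(row)
    return pooled


# Made-up 14x14 activation map whose values encode their own position.
feature_map = [[r * 14 + c for c in range(14)] for r in range(14)]

whole = roi_max_pool(feature_map, (0, 0, 14, 14))  # a 14 x 14 proposal
small = roi_max_pool(feature_map, (2, 3, 12, 8))   # a 10 x 5 proposal
print(len(whole), len(whole[0]), len(small), len(small[0]))
```

Both proposals end up as 7 x 7 grids despite their different sizes, which is what lets them share the same fully connected layer.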
  20. Faster R-CNN: Second stage (3): Classification and adjustment
    Classification: what type of object is it (or is it background)? → probability distribution
    Regression: how should I resize the box to better enclose the object? → do it per class
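The two outputs of the second stage can be illustrated with a small sketch. It assumes the common box parameterization in which the regression output is a center shift relative to the box size plus a log-scale change of width and height; the class list, scores, and deltas are made-up numbers, and `softmax` and `apply_deltas` are names chosen for this example.

```python
import math


def softmax(scores):
    """Turn raw class scores into a probability distribution."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def apply_deltas(box, deltas):
    """Adjust `box` = [x_min, y_min, x_max, y_max] with deltas (tx, ty, tw, th).

    Assumed parameterization: tx, ty shift the center by a fraction of the
    box size; tw, th rescale width and height on a log scale.
    """
    x_min, y_min, x_max, y_max = box
    width, height = x_max - x_min, y_max - y_min
    cx, cy = x_min + width / 2, y_min + height / 2
    tx, ty, tw, th = deltas
    new_cx, new_cy = cx + tx * width, cy + ty * height
    new_w, new_h = width * math.exp(tw), height * math.exp(th)
    return [new_cx - new_w / 2, new_cy - new_h / 2,
            new_cx + new_w / 2, new_cy + new_h / 2]


classes = ['background', 'person', 'bicycle']
scores = [0.5, 3.0, 1.0]                   # made-up raw scores for one proposal
deltas_per_class = {                       # made-up regression output, one per class
    'person': (0.1, 0.0, 0.0, 0.0),
    'bicycle': (0.0, 0.0, 0.2, 0.2),
}

probs = softmax(scores)
best = max(range(len(classes)), key=lambda k: probs[k])
label = classes[best]

proposal = [0, 0, 10, 10]
if label != 'background':
    # Use the adjustment predicted for the winning class only.
    adjusted = apply_deltas(proposal, deltas_per_class[label])
    print(label, adjusted)
```

Keeping one set of deltas per class lets the detector resize a box differently depending on what it believes the object is.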
  21. Faster R-CNN: Method summary
    • Get an activation map with a CNN.
    • Use it to propose interesting regions worth exploring. Associate an objectness score to them.
    • Classify regions. Discard those that are background (i.e., keep good scores only). Learn how to further adjust for each class of object.
  22. Building a toolkit: What is Luminoth?
    Open-source deep learning library/toolkit for computer vision object detection.
    • CLI tools
    • Pre-defined models
    • Cloud integration
  23. Building a toolkit: Simplicity as a goal
        $ pip install luminoth
        $ lumi predict video.mp4 -k car
        Found 1 files to predict.
        Neither checkpoint not config specified, assuming `accurate`.
        Predicting video.mp4 [#############] 100% fps: 5.9
  24. Building a toolkit: Beyond the model
    [Diagram labels: Data pipeline, Debugging, Training, Data visualization, Evaluation, Deployment, Distributed, Unit testing, Monitoring, Model.]
  25. Building a toolkit: Using Luminoth
    https://github.com/tryolabs/luminoth
        $ pip install luminoth
        $ lumi --help
        Usage: lumi [OPTIONS] COMMAND [ARGS]...

        Options:
          -h, --help  Show this message and exit.

        Commands:
          checkpoint  Groups of commands to manage checkpoints
          cloud       Groups of commands to train models in the cloud
          dataset     Groups of commands to manage datasets
          eval        Evaluate trained (or training) models
          predict     Obtain a model's predictions
          server      Groups of commands to serve models
          train       Train models
  26. Building a toolkit: Invoking from your Python app
        from PIL import Image
        from luminoth.tools.checkpoint import get_checkpoint_config
        from luminoth.utils.predicting import PredictorNetwork

        image = Image.open('bird.jpg').convert('RGB')
        config = get_checkpoint_config('accurate')
        network = PredictorNetwork(config)
        objects = network.predict_image(image)
        # [{'bbox': [778, 323, 1124, 814], 'label': 'bird', 'prob': 0.9998}]
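Once the predictions are available as a list of dictionaries in the format shown on the slide, an application will typically filter them. The helper below is a hypothetical sketch that does not call Luminoth at all: `filter_detections` and its parameters are names invented for this example, and the second detection is made up to show the filtering.

```python
def filter_detections(objects, label=None, min_prob=0.5):
    """Keep detections above `min_prob`, optionally restricted to one label."""
    return [
        obj for obj in objects
        if obj['prob'] >= min_prob and (label is None or obj['label'] == label)
    ]


objects = [
    # Detection in the format shown on the slide.
    {'bbox': [778, 323, 1124, 814], 'label': 'bird', 'prob': 0.9998},
    # Made-up low-confidence detection for illustration.
    {'bbox': [10, 20, 200, 180], 'label': 'person', 'prob': 0.31},
]

birds = filter_detections(objects, label='bird', min_prob=0.9)
for obj in birds:
    x_min, y_min, x_max, y_max = obj['bbox']
    print(obj['label'], obj['prob'], (x_max - x_min, y_max - y_min))
```

The right threshold depends on the application; a higher `min_prob` trades missed objects for fewer false detections.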
  27. Building a toolkit: Luminoth use cycle
        $ lumi dataset transform --type pascal --data-dir /data/pascal --output /data/
        # Create tfrecords for optimizing data consumption.
        $ lumi train --config pascal-fasterrcnn.yml
        # Hours of training...
        $ tensorboard --logdir jobs/
        # On another GPU/Machine/CPU.
        $ lumi eval --config pascal-fasterrcnn.yml
        # Checks for new checkpoints and writes logs.
        # Finally.
        $ lumi server web --config pascal-fasterrcnn.yml
        # Looks for checkpoint and loads it into a simple frontend/json API server.
  28. Learning more
    • https://tryolabs.com/blog/
    • (Stanford) CS231n: Convolutional Neural Networks for Visual Recognition http://cs231n.stanford.edu/
    • Deep Learning (Goodfellow, Bengio, Courville) http://www.deeplearningbook.org/
    • https://distill.pub/