Python.
Recommended
• Familiarity with NumPy, TensorFlow and Jupyter Notebooks.
• Access to Microsoft-sponsored DSVMs (SSH/Jupyter).
Helpful to have
• Basic knowledge of Machine Learning.
• Basics of Deep Learning and Convolutional Neural Networks (CNNs).
features from the images and use them as input to a simple classification algorithm.
Deep Learning models
Use the images directly as input to a more complex classification algorithm.
classification, widely used in the 80s.
Convolutional Neural Network (Yann LeCun, 1989): very good at pattern recognition with minimal preprocessing.
Handwritten digit recognition: LeNet-5, Yann LeCun, 1998.
image to produce an activation map.
Source: https://github.com/vdumoulin/conv_arithmetic
Use more filters to detect patterns over activation maps (patterns over patterns over patterns…)
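To make the idea of sliding a filter over an image concrete, here is a minimal NumPy sketch. The function name `conv2d_valid`, the toy image, and the hand-picked edge filter are illustrative assumptions, not part of the workshop code. In a real CNN the filter weights are learned, and frameworks use padding, strides, and many channels.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the activation map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    activation = np.zeros((out_h, out_w), dtype=np.float32)
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the filter element-wise with the patch under it and sum.
            activation[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return activation


# Toy 5x5 image: dark on the left, bright on the right (a vertical edge).
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=np.float32)

# Hand-picked vertical-edge filter (hypothetical; a CNN would learn these weights).
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=np.float32)

activation_map = conv2d_valid(image, kernel)
print(activation_map)  # high values where the edge is, zero on the flat region
```

Stacking more layers applies the same operation to activation maps instead of raw pixels, which is what "patterns over patterns" refers to.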
to set up?
→ We learn them. They are regular weights of the network (use backpropagation).
How do we know how many filters to use in each layer?
→ It is a hyperparameter of the network (try and see what works best).
Source: https://cs231n.github.io/understanding-cnn/
a lung cancer patient at the Jingdong Zhongmei private hospital in Yanjiao, China's Hebei Province (AP Photo/Andy Wong)
Hsieh et al., “Drone-based Object Counting by Spatially Regularized Regional Proposal Networks”, ICCV 2017.
Source: Pinterest
methods
Single stage prediction of object classes and bounding boxes.
Examples:
• You Only Look Once (YOLO, YOLOv2, YOLOv3)
• Single Shot MultiBox Detector (SSD)

Region proposal based methods
Two stages:
1. Generate candidate locations using some algorithm.
2. Adjust the bounding boxes and classify them.
Examples:
• R-CNN, Fast R-CNN, Faster R-CNN
regions (Region Proposal Network, RPN)
Where should we look?
2. Analyze proposals & adjust (Region-based CNN, R-CNN)
Is this an object? If so, which class?
what type of object is it (or is it background)? → probability distribution
Regression: how should I resize the box to better enclose the object? → do it per class
with a CNN.
• Use it to propose interesting regions worth exploring. Associate an objectness score with each of them.
• Classify regions. Discard those that are background (i.e. keep good scores only). Learn how to further adjust the box for each class of object.
DSVMs
SSH access to your instance:
    ssh <your-vm-ip>
    cd ~/notebooks/
    git clone https://github.com/tryolabs/object-detection-workshop.git
    cd object-detection-workshop
    ./download_checkpoint.sh
Access JupyterHub and pick the notebook in the object-detection-workshop folder.
    https://<your-vm-ip>:8000/user/<your-username>/
Using your own laptop
See the README: https://github.com/tryolabs/object-detection-workshop
Idea:
1. Look at a spatial position and its vicinity.
2. Predict 2 points (x1, y1), (x2, y2) for each location.
Issues:
• Can we make the network predict exact pixel coordinates?
• Image dimensions are variable.
proposal
1. Take a single spatial position.
2. Define a fixed-size reference box (anchor).
3. Find the “closest” ground-truth (GT) box.
4. Predict the “objectness” of the region.
5. Learn how to modify the reference box (in relative terms, i.e. “double its width”).
6. Repeat for every spatial position.
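Step 5 above, modifying the anchor "in relative terms", can be sketched as follows. This assumes the (tx, ty, tw, th) parametrization described in the Faster R-CNN paper, where the center shift is expressed as a fraction of the anchor size and the width and height are scaled on a log scale. The function name `apply_deltas` is hypothetical, and the workshop notebook may organize this differently.

```python
import numpy as np


def apply_deltas(anchor, deltas):
    """Adjust an anchor (x1, y1, x2, y2) with relative deltas (tx, ty, tw, th)."""
    x1, y1, x2, y2 = anchor
    tx, ty, tw, th = deltas

    # Anchor width, height and center.
    wa, ha = x2 - x1, y2 - y1
    cxa, cya = x1 + 0.5 * wa, y1 + 0.5 * ha

    # Shift the center by a fraction of the anchor size.
    cx = cxa + tx * wa
    cy = cya + ty * ha
    # Scale width and height; a delta of 0 leaves the size unchanged.
    w = wa * np.exp(tw)
    h = ha * np.exp(th)

    return np.array([cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h])


anchor = (10.0, 10.0, 30.0, 30.0)
# "Double its width": tw = log(2), everything else untouched.
doubled = apply_deltas(anchor, (0.0, 0.0, np.log(2.0), 0.0))
print(doubled)
```

Predicting relative adjustments rather than absolute pixel coordinates keeps the targets in a similar range regardless of image size.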
negative (red)
Faster R-CNN
How does this learn?
RPN anchor targets
Need positive (foreground) vs negative (background) anchors. Use Intersection over Union (IoU) with the ground truth.
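A minimal sketch of the IoU criterion for labeling anchors. The helper names `iou` and `label_anchor` are hypothetical, and the 0.7 and 0.3 thresholds are assumed defaults taken from the Faster R-CNN paper. Anchors that fall between the two thresholds are marked as ignored here, which is one common convention.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap (zero if the boxes do not intersect).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def label_anchor(anchor, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Return 1 (positive), 0 (negative) or -1 (ignored) for one anchor."""
    best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
    if best >= pos_thresh:
        return 1
    if best < neg_thresh:
        return 0
    return -1  # Ambiguous overlap: not used for training.


gt_boxes = [(0, 0, 10, 10)]
anchors = [(0, 0, 10, 10), (5, 0, 15, 10), (20, 20, 30, 30)]
labels = [label_anchor(a, gt_boxes) for a in anchors]
print(labels)
```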
of proposals
• Use Non-Maximum Suppression (NMS).
• Keep only the top proposals by “objectness”.
Classification
Standard cross-entropy for 2 classes.
Box regression
Smooth L1 on the difference of coordinates (positive anchors only).
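The box regression loss above can be sketched in NumPy as follows. The function name `smooth_l1`, the `beta` transition point of 1.0, and the toy values are assumptions for illustration. The key point is that only anchors labeled positive contribute to the loss.

```python
import numpy as np


def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 per anchor: quadratic for small errors, linear for large ones."""
    diff = np.abs(pred - target)
    loss = np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)
    return loss.sum(axis=-1)  # Sum over the 4 box coordinates.


# Toy predicted and target deltas for two anchors (hypothetical values).
pred_deltas = np.array([[0.5, 2.0, 0.0, 0.0],
                        [3.0, 3.0, 3.0, 3.0]])
target_deltas = np.zeros((2, 4))
labels = np.array([1, 0])  # 1 = positive (foreground), 0 = negative (background).

per_anchor = smooth_l1(pred_deltas, target_deltas)
# Negative anchors have no box to regress to, so they are masked out.
box_loss = per_anchor[labels == 1].mean()
print(per_anchor, box_loss)
```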
forward pass look like?
1. Run the image through the base network to get a feature map.
2. Run the feature map through the RPN convolutional layers (3x3, 1x1 & 1x1).
   a. Obtain objectness and box regression scores for each anchor type and spatial position.
   b. Use the regression scores to adjust each anchor.
3. Sort proposals by objectness score.
4. Apply NMS to remove redundant proposals.
Result
Set of proposals with associated objectness scores.
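Steps 3 and 4 above (sort by objectness, then apply NMS) can be sketched with vectorized NumPy as below. This is a greedy NMS under assumed defaults: the function name `nms`, the `iou_threshold` value, and the optional `top_n` cut-off are illustrative, not the workshop's reference solution.

```python
import numpy as np


def nms(boxes, scores, iou_threshold=0.7, top_n=None):
    """Greedy NMS. boxes: (N, 4) as (x1, y1, x2, y2); scores: (N,). Returns kept indices."""
    order = np.argsort(-scores)  # Highest objectness first.
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]

        # IoU between the best remaining box and all the others.
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[rest] - inter)

        # Drop boxes that overlap too much with the one just kept.
        order = rest[iou <= iou_threshold]
    return keep[:top_n]


boxes = np.array([[0, 0, 10, 10],
                  [1, 1, 11, 11],
                  [20, 20, 30, 30]], dtype=np.float32)
scores = np.array([0.8, 0.9, 0.7])
kept = nms(boxes, scores, iou_threshold=0.5)
print(kept)
```

In the example, the first two boxes overlap heavily, so only the higher-scoring one survives, while the third box is far away and is kept.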
crucial implementation details, such as shapes and types.
• Comments have hints to help you.
  ◦ We can help you too, don’t be shy and ask! :D
Priorities:
1. Make it work (whatever it takes!).
2. Implement it with vectorized NumPy.
3. Implement it in pure TensorFlow.
   a. Can compile and run on the GPU.
   b. You would have to do this for a real implementation.
learn how to adjust anchors, we looked at a small part of the activation map. To decide better, we need to look at all the activations corresponding to each region.
Turn arbitrarily sized proposals into fixed-size vectors / “squares” (labeled 7 × 7 in the slide's figure). The process is called RoI pooling. It allows us to feed them into a fully connected layer of the network.
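A minimal single-channel sketch of RoI pooling, assuming max pooling over a 7 × 7 grid of bins and integer RoI coordinates already expressed on the feature map. The function name `roi_max_pool` and the binning scheme are illustrative assumptions. Real implementations handle many channels, batches, and sub-pixel coordinates.

```python
import numpy as np


def roi_max_pool(feature_map, roi, output_size=7):
    """Max-pool the RoI (x1, y1, x2, y2; x2 and y2 exclusive) into a fixed grid."""
    x1, y1, x2, y2 = roi
    region = feature_map[y1:y2, x1:x2]
    h, w = region.shape

    # Bin edges that split the region into output_size x output_size cells.
    ys = np.linspace(0, h, output_size + 1)
    xs = np.linspace(0, w, output_size + 1)

    pooled = np.zeros((output_size, output_size), dtype=feature_map.dtype)
    for i in range(output_size):
        r0 = int(np.floor(ys[i]))
        r1 = max(int(np.ceil(ys[i + 1])), r0 + 1)  # At least one row per bin.
        for j in range(output_size):
            c0 = int(np.floor(xs[j]))
            c1 = max(int(np.ceil(xs[j + 1])), c0 + 1)  # At least one column per bin.
            pooled[i, j] = region[r0:r1, c0:c1].max()
    return pooled


feature_map = np.arange(14 * 21, dtype=np.float32).reshape(14, 21)

# Two proposals of different sizes produce outputs of the same shape.
pooled_full = roi_max_pool(feature_map, (0, 0, 21, 14))
pooled_small = roi_max_pool(feature_map, (2, 3, 12, 11))
print(pooled_full.shape, pooled_small.shape)
```

Because every proposal ends up with the same shape, the results can be flattened and fed to the fully connected layers.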
Found 1 files to predict.
Neither checkpoint not config specified, assuming `accurate`.
Predicting video.mp4  [#############]  100%  fps: 5.9
Simplicity as a goal
Building a toolkit
luminoth
$ lumi --help
Usage: lumi [OPTIONS] COMMAND [ARGS]...

Options:
  -h, --help  Show this message and exit.

Commands:
  checkpoint  Groups of commands to manage checkpoints
  cloud       Groups of commands to train models in the cloud
  dataset     Groups of commands to manage datasets
  eval        Evaluate trained (or training) models
  predict     Obtain a model's predictions
  server      Groups of commands to serve models
  train       Train models
# Create tfrecords for optimizing data consumption.
$ lumi train --config pascal-fasterrcnn.yml
# Hours of training...
$ tensorboard --logdir jobs/
# On another GPU/Machine/CPU.
$ lumi eval --config pascal-fasterrcnn.yml
# Checks for new checkpoints and writes logs.
# Finally.
$ lumi server web --config pascal-fasterrcnn.yml
# Looks for checkpoint and loads it into a simple frontend/json API server.
Luminoth use cycle