Deep Learning for Computer Vision by Alex Conway

Deep Learning for Computer Vision Executive-ML 2017/09/21 Neither Proprietary nor
Confidential – Please Distribute ;) Alex Conway alex @ numberboost.com @alxcnwy PyConZA’17

Hands up!

Check out the Deep Learning Indaba videos & practicals! http://www.deeplearningindaba.com/videos.html
http://www.deeplearningindaba.com/practicals.html

Deep Learning is Sexy (for a reason!)

Image Classification http://yann.lecun.com/exdb/mnist/ https://github.com/fchollet/keras/blob/master/examples/mnist_cnn.py (99.25% test accuracy in 192 seconds and 70 lines of code)

Image Classification

Image Classification: ImageNet Classification with Deep Convolutional Neural Networks, Krizhevsky et al., Advances in Neural Information Processing Systems 25 (NIPS 2012)

Object detection https://research.googleblog.com/2017/06/supercharge-your-computer-vision-models.html

https://www.youtube.com/watch?v=VOC3huqHrss Object detection

Object detection

Image Captioning & Visual Attention https://einstein.ai/research/knowing-when-to-look-adaptive-attention-via-a-visual-sentinel-for-image-captioning

Image Q&A https://arxiv.org/pdf/1612.00837.pdf

Video Q&A https://www.youtube.com/watch?v=UeheTiBJ0Io

Pix2Pix https://affinelayer.com/pix2pix/ https://github.com/affinelayer/pix2pix-tensorflow

Pix2Pix https://medium.com/towards-data-science/face2face-a-pix2pix-demo-that-mimics-the-facial-expression-of-the-german-chancellor-b6771d65bf66

Rear Window (1954): Original input | Pix2pix output (Fully Automated) | Remastered (Painstakingly by Hand) https://hackernoon.com/remastering-classic-films-in-tensorflow-with-pix2pix-f4d551fa0503

Style Transfer https://github.com/junyanz/CycleGAN

Style Transfer SORCERY https://github.com/junyanz/CycleGAN

Real Fake News https://www.youtube.com/watch?v=MVBe6_o4cMI

Deep learning is Magic

Deep learning is EASY!

1. What is a neural network?
2. What is a convolutional neural network?
3. How to use a convolutional neural network
4. More advanced methods
5. Case studies & applications

Big Shout Outs
• Jeremy Howard & Rachel Thomas http://course.fast.ai
• Andrej Karpathy http://cs231n.github.io
• François Chollet (Keras lead dev) https://keras.io/

1. What is a neural network?

What is a neuron?
• 3 inputs [x1, x2, x3]
• 3 weights [w1, w2, w3]
• Element-wise multiply and sum
• Apply activation function f
• Often add a bias too (a weight on a constant input of 1) – not shown
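The neuron described on this slide can be sketched in a few lines of plain Python. This is only an illustration: the function name `neuron`, the choice of tanh as the activation f, and the input and weight values are assumptions, not taken from the talk.

```python
import math


def neuron(inputs, weights, bias=0.0, activation=math.tanh):
    """Multiply inputs by weights element-wise, sum, add bias, apply activation."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(weighted_sum)


x = [1.0, 2.0, 3.0]      # 3 inputs [x1, x2, x3] (made-up values)
w = [0.5, -0.25, 0.1]    # 3 weights [w1, w2, w3] (made-up values)
output = neuron(x, w, bias=0.2)
print(output)
```

Passing a different `activation` (for example a sigmoid or ReLU function) changes only the last step; the weighted sum stays the same.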

What is an Activation Function?
• Sigmoid, Tanh, ReLU
• Nonlinearities … “squashing functions” … transform neuron’s output
• NB: sigmoid output in [0,1]
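As a rough sketch, the three activation functions named on the slide can be written with the standard library alone. The sample inputs are arbitrary, and this simple sigmoid is only meant for moderate input values.

```python
import math


def sigmoid(z):
    """Squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z):
    """Squashes any real number into the range (-1, 1)."""
    return math.tanh(z)


def relu(z):
    """Keeps positive values and replaces negative values with 0."""
    return max(0.0, z)


for z in (-2.0, 0.0, 2.0):
    print(z, sigmoid(z), tanh(z), relu(z))
```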

What is a (Deep) Neural Network?
Inputs → hidden layer 1 → hidden layer 2 → hidden layer 3 → outputs
Outputs of one layer are inputs into the next layer
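A minimal sketch of "outputs of one layer are inputs into the next layer", assuming fully connected layers with tanh activations. The layer sizes, the random weights and the helper names (`dense_layer`, `random_layer`, `forward`) are illustrative choices, not from the talk.

```python
import math
import random


def dense_layer(inputs, weights, biases):
    """One fully connected layer: each row of `weights` feeds one neuron."""
    return [
        math.tanh(sum(x * w for x, w in zip(inputs, row)) + b)
        for row, b in zip(weights, biases)
    ]


def random_layer(n_in, n_out, rng):
    """Create untrained (random) weights and zero biases for one layer."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(inputs, layers):
    """Feed the outputs of each layer into the next one."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations


rng = random.Random(0)
sizes = [3, 4, 4, 4, 2]  # 3 inputs, three hidden layers of 4 neurons, 2 outputs
layers = [random_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
outputs = forward([0.5, -1.0, 2.0], layers)
print(outputs)
```

Because the weights are random, the outputs are meaningless until the network is trained.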

How does a neural network learn?
• We need labelled examples (“training data”)
• We initialize network weights randomly and initially get random predictions
• For each labelled training data point, we calculate the error between the network’s predictions and the ground-truth labels
• Use ‘backpropagation’ (chain rule) to update the network parameters (weights + convolutional filters) in the direction that reduces the error, i.e. opposite to the gradient of the error

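How does a neural network learn?
New weight = Old weight − (Learning rate × Gradient of Error with respect to weight)
In symbols, with learning rate $\eta$ and error $E$: $w_{\text{new}} = w_{\text{old}} - \eta \, \frac{\partial E}{\partial w}$
Gradient = “How much error increases when we increase this weight”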
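The update rule can be tried on a toy problem. The sketch below assumes a made-up error function E(w) = (w − 3)², whose gradient is 2(w − 3); a real network computes its gradients with backpropagation instead.

```python
def gradient_descent_step(weight, gradient, learning_rate):
    """New weight = old weight - learning rate * gradient of the error."""
    return weight - learning_rate * gradient


def error_gradient(w):
    """Gradient of the toy error E(w) = (w - 3)^2 (an assumed example)."""
    return 2.0 * (w - 3.0)


w = 0.0  # arbitrary starting weight
for step in range(50):
    w = gradient_descent_step(w, error_gradient(w), learning_rate=0.1)
print(w)  # moves towards 3, where the toy error is smallest
```

With a learning rate that is too large the weight overshoots the minimum and can diverge; with one that is too small it needs many more steps.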

Gradient Descent Interpretation http://scs.ryerson.ca/~aharley/neural-networks/

http://playground.tensorflow.org

What is a Neural Network? For much more detail, see:
1. Michael Nielsen’s Neural Networks & Deep Learning free online book http://neuralnetworksanddeeplearning.com/chap1.html
2. Andrej Karpathy’s CS231n Notes http://cs231n.github.io

2. What is a convolutional neural network?

What is a Convolutional Neural Network?
“like an ordinary neural network but with special types of layers that work well on images” (math works on numbers)
• Pixel = 3 colour channels (R, G, B)
• Pixel intensity ∈ [0,255]
• Image has width w and height h
• Therefore image is w x h x 3 numbers
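To make "an image is w x h x 3 numbers" concrete, here is a small sketch that builds an image as nested lists and counts its values. The 224 x 224 size and the orange colour are assumptions chosen for illustration.

```python
def make_image(width, height, colour=(0, 0, 0)):
    """Build an image as nested lists indexed [row][column][channel]."""
    return [[list(colour) for _ in range(width)] for _ in range(height)]


def count_numbers(image):
    """Count every channel value of every pixel."""
    return sum(len(pixel) for row in image for pixel in row)


# Assumed example: a 224 x 224 image filled with one (R, G, B) colour.
image = make_image(width=224, height=224, colour=(255, 128, 0))
print(image[0][0])           # one pixel = 3 intensities in [0, 255]
print(count_numbers(image))  # width * height * 3
```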

Example Architecture: This is VGGNet – don’t panic, we’ll break it down piece by piece

How does a neural network learn?
• Stop and think how remarkable it is that we can recognise all of these MNIST digits as a 3
• Different pixels!

Convolutions http://setosa.io/ev/image-kernels/

Convolutions http://deeplearning.net/software/theano/tutorial/conv_arithmetic.html

New Layer Type: Convolutional Layer
• 2-d weighted sum: multiply the kernel element-wise over each pixel patch and add up the results
• We slide the kernel over all pixels of the image (handle borders)
• Kernel starts off with “random” values and network updates (learns) the kernel values (using backpropagation) to try to minimize loss
• Kernels shared across the whole image (parameter sharing)
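The sliding-kernel operation can be sketched in plain Python as below. Some simplifying assumptions: a single-channel image, stride 1, and no border handling (the output shrinks instead of being padded). Like most deep learning libraries, the sketch does not flip the kernel. The vertical-edge kernel is hand-picked, whereas a network would learn its kernel values.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over a 2-d `image` (no padding, stride 1)."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the pixel patch under the kernel.
            total = 0.0
            for a in range(k_h):
                for b in range(k_w):
                    total += image[i + a][j + b] * kernel[a][b]
            row.append(total)
        output.append(row)
    return output


# Toy image: dark on the left, bright on the right.
image = [
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
]
# Hand-picked kernel that responds to vertical edges.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
feature_map = convolve2d(image, vertical_edge)
print(feature_map)
```

The same nine kernel values are reused at every position, which is the parameter sharing mentioned on the slide.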

Many Kernels = Many “Activation Maps” = Volume http://cs231n.github.io/convolutional-networks/

New Layer Type: Convolutional Layer

Convolutions https://github.com/fchollet/keras/blob/master/examples/conv_filter_visualization.py

Convolutions

Convolutions

Convolutions

Convolutions Learn Hierarchical Features

Great vid https://www.youtube.com/watch?v=AgkfIQ4IGaM

New Layer Type: Max Pooling

New Layer Type: Max Pooling http://cs231n.github.io/convolutional-networks/
• Reduces dimensionality from one layer to next
• …by replacing NxN sub-area with max value
• Makes network “look” at larger areas of the image at a time
• e.g. Instead of identifying fur, identify cat
• Reduces overfitting, since losing information helps the network generalize
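A minimal sketch of max pooling, assuming non-overlapping N x N blocks (N = 2 here) on a single 2-d activation map. The numbers in the map are made up.

```python
def max_pool2d(feature_map, size=2):
    """Replace each non-overlapping size x size block with its maximum."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            block = [
                feature_map[i + a][j + b]
                for a in range(size)
                for b in range(size)
            ]
            row.append(max(block))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
pooled = max_pool2d(feature_map)
print(pooled)  # 4 x 4 map reduced to 2 x 2
```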

New Layer Type: Max Pooling

Stack Conv + Pooling Layers and Go Deep: Convolution + max pooling + fully connected + softmax

Stack These Layers and Go Deep: Convolution + max pooling + fully connected + softmax

Flatten the Final “Bottleneck” Layer: Convolution + max pooling + fully connected + softmax
• Flatten the final 7 x 7 x 512 max pooling layer
• Add fully-connected dense layer on top
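Flattening simply unrolls the 7 x 7 x 512 volume into one long vector that a dense layer can consume. The sketch below uses nested lists filled with zeros as a stand-in for real activations.

```python
def flatten(volume):
    """Unroll a height x width x channels nested list into one flat vector."""
    return [value for row in volume for pixel in row for value in pixel]


# Stand-in for the final 7 x 7 x 512 max pooling output (all zeros here).
volume = [[[0.0] * 512 for _ in range(7)] for _ in range(7)]
vector = flatten(volume)
print(len(vector))  # 7 * 7 * 512 inputs for the dense layer on top
```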

Bringing it all together: Convolution + max pooling + fully connected + softmax

Softmax
• Convert scores ∈ ℝ to probabilities ∈ [0,1]
• Final output prediction = highest probability class
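A short sketch of softmax, assuming three made-up class scores. Subtracting the maximum score before exponentiating is a common numerical-stability trick and does not change the result.

```python
import math


def softmax(scores):
    """Convert real-valued scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # shift for stability
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.1]              # made-up scores for 3 classes
probs = softmax(scores)
predicted = probs.index(max(probs))   # prediction = highest probability class
print(probs, predicted)
```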

We need labelled training data!

ImageNet http://image-net.org/explore
• 1000 object categories
• 1.2 million training images

ImageNet

ImageNet

ImageNet Top 5 Error Rate
• Traditional Image Processing Methods
• AlexNet (8 layers)
• ZFNet (8 layers)
• GoogLeNet (22 layers)
• ResNet (152 layers)
• SENet (ensemble)
• TSNet (ensemble)

3. How to use a convolutional neural network

Using a Pre-Trained ImageNet-Winning CNN
• We’ve been looking at “VGGNet”
• Oxford Visual Geometry Group (VGG)
• ImageNet 2014 Runner-up
• Network is 16 layers (deep!)
• Easy to fine-tune
https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html

Example: Classifying Product Images https://github.com/alexcnwy/CTDL_CNN_TALK_20170620 Classifying products into 9 categories

Start with Pre-Trained ImageNet Model https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html Consumers vs Producers of Machine Learning

“Transfer Learning” is a game changer

Fine-tuning A CNN To Solve A New Problem
• Cut off last layer of pre-trained ImageNet-winning CNN
• Keep learned network (convolutions) but replace final layer
• Can learn to predict new (completely different) classes
• Fine-tuning is re-training the new final layer so it learns the new task

Fine-tuning A CNN To Solve A New Problem

Before Fine-Tuning

After Fine-Tuning

Fine-tuning A CNN To Solve A New Problem
• Fix weights in convolutional layers (set trainable=False)
• Remove final dense layer that predicts 1000 ImageNet classes
• Replace with new dense layer to predict 9 categories
88% accuracy in under 2 minutes for classifying products into categories

Fine-tuning is awesome! Insert obligatory brain analogy
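The fine-tuning recipe (freeze the learned layers, replace and train only the final layer) can be illustrated without any deep learning library. The sketch below is a toy stand-in, not the Keras code from the talk: `FROZEN_WEIGHTS` plays the role of the pre-trained convolutional base, the new "head" is a single logistic output for two classes instead of 9, and the data points are invented.

```python
import math

# Stand-in for the pre-trained base: these weights are never updated.
FROZEN_WEIGHTS = [[1.0, -1.0], [0.5, 0.5]]


def frozen_features(x):
    """Compute features with the fixed (frozen) base weights."""
    return [
        math.tanh(sum(xi * wi for xi, wi in zip(x, row)))
        for row in FROZEN_WEIGHTS
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def head_output(features, head_weights, head_bias):
    """New final layer: one logistic unit on top of the frozen features."""
    return sigmoid(sum(f * w for f, w in zip(features, head_weights)) + head_bias)


def train_new_head(examples, labels, epochs=200, learning_rate=0.5):
    """Gradient descent on the new final layer only."""
    head_weights, head_bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            features = frozen_features(x)
            # For a logistic unit with cross-entropy loss, the gradient
            # with respect to the pre-activation is (prediction - label).
            error = head_output(features, head_weights, head_bias) - y
            head_weights = [
                w - learning_rate * error * f
                for w, f in zip(head_weights, features)
            ]
            head_bias -= learning_rate * error
    return head_weights, head_bias


def predict(x, head_weights, head_bias):
    return head_output(frozen_features(x), head_weights, head_bias) > 0.5


# Invented "new task" data: two classes the base was never trained on.
examples = [[2.0, 0.0], [1.5, 0.5], [0.0, 2.0], [0.5, 1.5]]
labels = [1, 1, 0, 0]
head_weights, head_bias = train_new_head(examples, labels)
print([predict(x, head_weights, head_bias) for x in examples])
```

Only `head_weights` and `head_bias` change during training, which is why fine-tuning is fast: the expensive feature extractor is reused as-is.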

Visual Similarity https://memeburn.com/2017/06/spree-image-search/
• Chop off last 2 VGG layers
• Use dense layer with 4096 activations
• Compute nearest neighbours in the space of these activations
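The nearest-neighbour step can be sketched as follows. The 4-d vectors and product names are hypothetical stand-ins for the 4096-d dense-layer activations, and Euclidean distance is an assumed choice of metric (cosine distance is another common option).

```python
import math


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbours(query, catalogue, k=10):
    """Return the names of the k catalogue items closest to the query vector."""
    ranked = sorted(
        catalogue,
        key=lambda name: euclidean_distance(query, catalogue[name]),
    )
    return ranked[:k]


# Hypothetical activation vectors for images already in the catalogue.
catalogue = {
    "red_shoe": [0.9, 0.1, 0.0, 0.2],
    "pink_shoe": [0.8, 0.2, 0.1, 0.2],
    "blue_shirt": [0.0, 0.9, 0.8, 0.1],
    "green_hat": [0.1, 0.3, 0.9, 0.7],
}
# Hypothetical activations for a new input image.
query = [0.9, 0.1, 0.05, 0.2]
most_similar = nearest_neighbours(query, catalogue, k=2)
print(most_similar)
```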

https://github.com/alexcnwy/CTDL_CNN_TALK_20170620

Input: image not seen by model. Results: top 10 most “visually similar”

Final Convolutional Layer = Semantic Vector
• The final convolutional layer encodes everything the network needs to make predictions
• The dense layer added on top and the softmax layer both have lower dimensionality

4. More Advanced Methods

Use a Better Architecture (or all of them!)
“Ensembles win”: learn a weighted average of many models’ predictions
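A weighted average of model predictions can be sketched as below. The three probability vectors and the model weights are invented, and the weights are fixed by hand here, whereas the slide suggests learning them (for example on held-out data).

```python
def ensemble_predict(model_probs, model_weights):
    """Weighted average of several models' class probabilities."""
    total_weight = sum(model_weights)
    n_classes = len(model_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(model_weights, model_probs))
        / total_weight
        for c in range(n_classes)
    ]


# Invented predictions of three models for one image over three classes.
model_probs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.5, 0.4, 0.1],
]
model_weights = [0.5, 0.25, 0.25]  # hand-picked, not learned
combined = ensemble_predict(model_probs, model_weights)
predicted = combined.index(max(combined))
print(combined, predicted)
```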

There are MANY Computer Vision Tasks http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf

Semantic Segmentation
Long et al. “Fully Convolutional Networks for Semantic Segmentation”, CVPR 2015
Noh et al. “Learning Deconvolution Network for Semantic Segmentation”, IEEE International Conference on Computer Vision (ICCV), 2015

http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf Object detection

Object detection

https://www.youtube.com/watch?v=VOC3huqHrss Object detection

Style Transfer http://blog.romanofoti.com/style_transfer/
Johnson et al. “Perceptual Losses for Real-Time Style Transfer and Super-Resolution”, 2016
[figure: f(image, image) = image]

https://www.youtube.com/watch?v=LhF_56SxrGk

Pixelated | Original | Output https://arstechnica.com/information-technology/2017/02/google-brain-super-resolution-zoom-enhance/

This image is 3.8 kb Super-Resolution

Visual Attention https://github.com/tdeboissiere/VGG16CAM-keras

Image Captioning https://einstein.ai/research/knowing-when-to-look-adaptive-attention-via-a-visual-sentinel-for-image-captioning Karpathy & Li. IEEE Transactions on Pattern
Analysis and Machine Intelligence, 39(4), pp. 664–676

Image Q&A http://iamaaditya.github.io/2016/04/visual_question_answering_demo_notebook

CNN + Word2Vec = AWESOME
king + woman – man ≈ queen
Frome et al. (2013) ‘DeViSE: A Deep Visual-Semantic Embedding Model’, Advances in Neural Information Processing Systems, pp. 2121–2129

DeViSE: A Deep Visual-Semantic Embedding Model
Before: Encode labels as 1-hot vector

DeViSE: A Deep Visual-Semantic Embedding Model
After: Encode labels as word2vec vectors (FROM A SEPARATE MODEL)
• 300-d word2vec vectors
• Can look these up for all the nouns in ImageNet

DeViSE: A Deep Visual-Semantic Embedding Model https://www.youtube.com/watch?v=uv0gmrXSXVg
wv* = (wv(fish) + wv(net)) / 2
…get nearest neighbours to wv*
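The "average two word vectors, then look up nearest neighbours" idea can be sketched with toy data. The 3-d vectors below are made up to stand in for 300-d word2vec vectors, the candidate words are hypothetical, and cosine similarity is an assumed (though common) choice for comparing word vectors.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def average(a, b):
    """Element-wise mean of two vectors."""
    return [(x + y) / 2.0 for x, y in zip(a, b)]


# Made-up 3-d vectors standing in for 300-d word2vec vectors.
word_vectors = {
    "fish": [1.0, 0.0, 0.0],
    "net": [0.0, 1.0, 0.0],
    "trawler": [0.7, 0.7, 0.1],
    "piano": [0.0, 0.1, 1.0],
}

query = average(word_vectors["fish"], word_vectors["net"])
candidates = {
    word: vec for word, vec in word_vectors.items() if word not in ("fish", "net")
}
best = max(candidates, key=lambda word: cosine_similarity(query, candidates[word]))
print(best)
```

In the DeViSE setting the candidates would be image embeddings predicted by the CNN, so the nearest neighbours of the averaged vector are images rather than words.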

5. Case Studies

Estimating Accident Repair Cost from Photos
• Prototype for large SA insurer
• Detect car make & model from registration disk
• Predict repair cost using learned model

Image & Video Moderation
Large international gay dating app with tens of millions of users uploading hundreds of thousands of photos per day

Segmenting Medical Images

Counting People
• Count shoppers, segment on age & gender
• Facial recognition loyalty is next

Counting Cars

Detecting Potholes

Extracting Data from Product Catalogues

Real-time ATM Video Classification

Optical Sorting https://www.youtube.com/watch?v=Xf7jaxwnyso

We are hiring! alex @ numberboost.com

GET IN TOUCH! Alex Conway alex @ numberboost.com @alxcnwy

https://www.autoblog.com/2017/08/04/self-driving-car-sign-hack-stickers/
