Deep Learning for Computer Vision by Alex Conway

Pycon ZA
October 06, 2017

The state of the art in image classification has skyrocketed thanks to deep convolutional neural networks and to the growing amounts of data and computing power available to train them. The top-5 error rate in the international ImageNet competition, which asks which of 1000 classes an image belongs to, has plummeted from 28% in 2010, before deep learning, to just 2.25% in 2017 (human-level error is around 5%).
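
Top-5 error counts a prediction as correct when the true class is anywhere among the model's five highest-scoring classes. A minimal sketch of the metric, assuming per-example score lists and integer labels; the toy scores below are made up for illustration and use 10 classes instead of ImageNet's 1000:

```python
def top5_error(scores, labels):
    """Fraction of examples whose true label is NOT among the 5 highest-scoring classes."""
    misses = 0
    for class_scores, label in zip(scores, labels):
        # Indices of the 5 largest scores for this example
        ranked = sorted(range(len(class_scores)), key=lambda c: class_scores[c], reverse=True)
        if label not in ranked[:5]:
            misses += 1
    return misses / len(labels)


# Two hypothetical examples over 10 classes
scores = [
    [0.9, 0.05, 0.01, 0, 0, 0, 0, 0, 0.04, 0],                  # true class 8 is ranked 3rd: hit
    [0.1, 0.2, 0.3, 0.15, 0.1, 0.05, 0.04, 0.03, 0.02, 0.01],   # true class 9 is ranked last: miss
]
labels = [8, 9]
print(top5_error(scores, labels))  # 0.5
```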

In addition to being able to classify objects in images (including not hotdogs), deep learning can be used to automatically generate captions for images, convert photos into paintings, detect cancer in pathology slide images, and help self-driving cars ‘see’.

The talk will give an overview of the cutting edge in the field and some of the core mathematical concepts behind the models. It will also include a short code-first tutorial to show how easy it is to get started using deep learning for computer vision in Python…

Transcript

  1. Deep Learning for Computer Vision
    Executive-ML 2017/09/21. Neither Proprietary nor Confidential, Please Distribute ;) Alex Conway (alex @ numberboost.com, @alxcnwy), PyConZA’17
  2. Image Classification
    Krizhevsky et al., “ImageNet Classification with Deep Convolutional Neural Networks”, Advances in Neural Information Processing Systems 25 (NIPS 2012)
  3. Rear Window (1954): original input, pix2pix output (fully automated), and remastered painstakingly by hand
    https://hackernoon.com/remastering-classic-films-in-tensorflow-with-pix2pix-f4d551fa0503
  4. Outline
    1) What is a neural network?
    2) What is a convolutional neural network?
    3) How to use a convolutional neural network
    4) More advanced methods
    5) Case studies & applications
  5. Big Shout Outs
    • Jeremy Howard & Rachel Thomas: http://course.fast.ai
    • Andrej Karpathy: http://cs231n.github.io
    • François Chollet (Keras lead dev): https://keras.io/
  6. What is a neuron?
    • 3 inputs [x1, x2, x3]
    • 3 weights [w1, w2, w3]
    • Element-wise multiply and sum
    • Apply activation function f
    • Often add a bias too (a weight on a constant input of 1), not shown
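
The neuron above can be sketched in a few lines of plain Python. This is an illustrative sketch, not code from the talk: the default activation (tanh) and the example numbers are arbitrary choices, and the bias is included as an optional argument.

```python
import math


def neuron(x, w, b=0.0, f=math.tanh):
    # Element-wise multiply inputs by weights and sum, add the bias, then apply activation f
    z = sum(xi * wi for xi, wi in zip(x, w)) + b
    return f(z)


x = [1.0, 2.0, 3.0]
w = [0.5, -0.25, 0.1]
print(neuron(x, w, f=lambda z: z))  # identity activation: 0.5 - 0.5 + 0.3, i.e. approximately 0.3
print(neuron(x, w))                 # tanh of that same weighted sum
```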
  7. What is an Activation Function?
    • Sigmoid, tanh, ReLU
    • Nonlinearities, or “squashing functions”, that transform the neuron’s output
    • NB: sigmoid output is in (0, 1)
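
A minimal sketch of the three nonlinearities named above, using only the standard library. Note that this naive sigmoid can overflow in `math.exp` for very negative inputs (below roughly -709); library implementations guard against that.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))   # output in (0, 1)


def tanh(z):
    return math.tanh(z)                 # output in (-1, 1)


def relu(z):
    return max(0.0, z)                  # 0 for negative inputs, identity otherwise


for z in (-2.0, 0.0, 2.0):
    print(z, round(sigmoid(z), 3), round(tanh(z), 3), relu(z))
```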
  8. What is a (Deep) Neural Network?
    Diagram: inputs → hidden layer 1 → hidden layer 2 → hidden layer 3 → outputs
    • Outputs of one layer are inputs into the next layer
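
"Outputs of one layer are inputs into the next" is just a loop over layers. A minimal sketch, assuming fully connected layers stored as (weights, biases, activation) tuples; the tiny two-hidden-layer network and its weights are hypothetical:

```python
def relu(z):
    return max(0.0, z)


def identity(z):
    return z


def dense_layer(inputs, weights, biases, f):
    # Each row of `weights` holds one neuron's weights; every neuron sees all inputs
    return [f(sum(x * w for x, w in zip(inputs, row)) + b) for row, b in zip(weights, biases)]


def forward(x, layers):
    # The outputs of one layer become the inputs of the next
    for weights, biases, f in layers:
        x = dense_layer(x, weights, biases, f)
    return x


layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu),                    # hidden layer 1: 2 -> 2
    ([[1.0, 1.0], [-1.0, 2.0], [0.0, 1.0]], [0.0, 0.1, 0.0], relu),   # hidden layer 2: 2 -> 3
    ([[1.0, 1.0, 1.0]], [0.0], identity),                             # output layer: 3 -> 1
]
print(forward([2.0, 1.0], layers))
```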
  9. How does a neural network learn?
    • We need labelled examples (“training data”)
    • We initialize the network weights randomly, so we initially get random predictions
    • For each labelled training data point, we calculate the error between the network’s predictions and the ground-truth labels
    • Use ‘backpropagation’ (the chain rule) to update the network parameters (weights + convolutional filters) in the opposite direction to the gradient of the error
  10. How does a neural network learn?
    • New weight = Old weight − Learning rate × Gradient of Error with respect to weight, i.e. $w_{\text{new}} = w_{\text{old}} - \eta \, \frac{\partial E}{\partial w}$
    • The gradient is “how much the error increases when we increase this weight”
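
The training recipe and update rule above can be sketched on the smallest possible model: one linear neuron trained with squared error. This is a hedged illustration, not the talk's code; the loss, learning rate, epoch count and the noise-free data from y = 2x + 1 are all assumptions chosen so the example converges.

```python
import random


def train_linear_neuron(data, learning_rate=0.05, epochs=200, seed=0):
    """Fit y ~ w*x + b by stochastic gradient descent on E = 0.5 * (prediction - y)**2."""
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)   # random initial weights
    for _ in range(epochs):
        for x, y in data:
            error = (w * x + b) - y
            # Chain rule: dE/dw = error * x and dE/db = error
            grad_w = error * x
            grad_b = error
            # New weight = old weight - learning rate * gradient
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b


data = [(x, 2.0 * x + 1.0) for x in (-1.0, 0.0, 1.0, 2.0)]
w, b = train_linear_neuron(data)
print(round(w, 3), round(b, 3))  # should approach 2.0 and 1.0
```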
  11. What is a Neural Network? For much more detail, see:
    1) Michael Nielsen’s Neural Networks & Deep Learning free online book: http://neuralnetworksanddeeplearning.com/chap1.html
    2) Andrej Karpathy’s CS231n Notes: http://cs231n.github.io
  12. What is a Convolutional Neural Network?
    • “Like an ordinary neural network but with special types of layers that work well on images”
    • (Math works on numbers)
    • Pixel = 3 colour channels (R, G, B)
    • Pixel intensity ∈ [0, 255]
    • Image has width w and height h
    • Therefore an image is w × h × 3 numbers
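
To make "an image is w × h × 3 numbers" concrete, here is a sketch that builds a tiny image as nested lists. The row-major layout (height, then width, then channel) is an assumption; it mirrors how many image libraries store pixels, and the count of numbers is the same either way.

```python
def solid_image(width, height, rgb=(0, 0, 0)):
    # Nested lists indexed as image[row][col][channel]; each intensity is an int in [0, 255]
    return [[list(rgb) for _ in range(width)] for _ in range(height)]


img = solid_image(4, 3, rgb=(255, 128, 0))  # a tiny 4-wide, 3-high orange image
num_values = len(img) * len(img[0]) * len(img[0][0])
print(num_values)  # 4 * 3 * 3 = 36 numbers
```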
  13. Example Architecture
    This is VGGNet. Don’t panic, we’ll break it down piece by piece.
  15. How does a neural network learn?
    • MNIST: stop and think how remarkable it is that we can recognise all of these MNIST digits as a 3
    • Different pixels!
  16. New Layer Type: Convolutional Layer
    • A 2-D weighted sum: multiply the kernel element-wise with each patch of pixels and add up the results
    • We slide the kernel over all pixels of the image (handling the borders)
    • The kernel starts off with “random” values, and the network updates (learns) the kernel values using backpropagation to try to minimize the loss
    • Kernels are shared across the whole image (parameter sharing)
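
A minimal single-channel sketch of the sliding-kernel operation. Assumptions: stride 1 and no padding ("valid" mode, so the output shrinks instead of padding the borders), and a hand-picked edge-detecting kernel standing in for one that a CNN would learn. Strictly speaking this is cross-correlation, which is what most CNN libraries compute under the name convolution.

```python
def conv2d(image, kernel):
    """'Valid' 2-D convolution: no padding, stride 1, single channel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the kernel element-wise with the patch under it and sum
            row.append(sum(kernel[a][b] * image[i + a][j + b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out


image = [
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
]
vertical_edge = [[-1, 1], [-1, 1]]  # hypothetical kernel: responds where intensity rises left to right
print(conv2d(image, vertical_edge))  # [[0, 20, 0], [0, 20, 0]]
```

The same kernel is applied at every position, which is the parameter sharing mentioned in the slide.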
  17. New Layer Type: Max Pooling
    • Reduces dimensionality from one layer to the next
    • …by replacing each N×N sub-area with its max value
    • Makes the network “look” at larger areas of the image at a time, e.g. instead of identifying fur, identify cat
    • Reduces overfitting, since losing information helps the network generalize
    http://cs231n.github.io/convolutional-networks/
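
A sketch of non-overlapping N×N max pooling on a single feature map, assuming stride N (the common 2×2, stride 2 setting) and simply dropping any trailing rows or columns that do not fill a whole window:

```python
def max_pool(feature_map, n=2):
    """Non-overlapping n x n max pooling with stride n."""
    h, w = len(feature_map) // n, len(feature_map[0]) // n
    return [[max(feature_map[i * n + a][j * n + b] for a in range(n) for b in range(n))
             for j in range(w)]
            for i in range(h)]


fm = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
print(max_pool(fm))  # [[4, 2], [2, 8]]: a 4x4 map becomes 2x2
```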
  18. Stack Conv + Pooling Layers and Go Deep
    Convolution + max pooling + fully connected + softmax
  20. Stack These Layers and Go Deep
    Convolution + max pooling + fully connected + softmax
  22. Flatten the Final “Bottleneck” Layer
    Convolution + max pooling + fully connected + softmax
    • Flatten the final 7 × 7 × 512 max pooling layer
    • Add a fully-connected dense layer on top
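
Where does 7 × 7 × 512 come from? A shape-tracing sketch, assuming the standard 224 × 224 × 3 VGG16 input and the published VGG16 layout: five blocks of 3×3 "same"-padded convolutions (which keep height and width) each followed by 2×2 max pooling with stride 2 (which halves them), plus three dense layers at the end.

```python
# VGG16 convolutional blocks: (number of 3x3 conv layers, number of filters).
# Every block ends with a 2x2 max pooling layer (stride 2).
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def vgg16_feature_shape(height=224, width=224, channels=3):
    """Trace the tensor shape through the VGG16 convolutional base."""
    conv_layers = 0
    for num_convs, filters in VGG16_BLOCKS:
        conv_layers += num_convs
        # 'same'-padded 3x3 convolutions keep height and width; depth becomes `filters`
        channels = filters
        # 2x2 max pooling with stride 2 halves height and width
        height, width = height // 2, width // 2
    return (height, width, channels), conv_layers


shape, conv_layers = vgg16_feature_shape()
flat = shape[0] * shape[1] * shape[2]
print(shape, flat)        # (7, 7, 512) 25088
print(conv_layers + 3)    # 13 conv + 3 dense layers = 16
```

Flattening therefore turns the bottleneck into a vector of 25,088 numbers, which is what the fully connected layer on top receives.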
  23. Softmax
    • Convert scores ∈ ℝ to probabilities ∈ [0, 1]
    • Final output prediction = highest probability class
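
A standard-library softmax sketch. Subtracting the maximum score before exponentiating is a common numerical-stability trick and does not change the result; the class names and scores are hypothetical.

```python
import math


def softmax(scores):
    # Subtract the max score first for numerical stability (the probabilities are unchanged)
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


classes = ["cat", "dog", "hotdog"]
probs = softmax([2.0, 1.0, -1.0])
prediction = classes[probs.index(max(probs))]   # highest-probability class
print([round(p, 3) for p in probs], prediction)
```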
  24. ImageNet Top 5 Error Rate
    Chart: traditional image processing methods, AlexNet (8 layers), ZFNet (8 layers), GoogLeNet (22 layers), ResNet (152 layers), SENet (ensemble), TSNet (ensemble)
  25. Using a Pre-Trained ImageNet-Winning CNN
    • We’ve been looking at “VGGNet”
    • Oxford Visual Geometry Group (VGG)
    • ImageNet 2014 Runner-up
    • Network is 16 layers (deep!)
    • Easy to fine-tune: https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
  27. Fine-tuning A CNN To Solve A New Problem
    • Cut off the last layer of a pre-trained ImageNet-winning CNN
    • Keep the learned network (convolutions) but replace the final layer
    • Can learn to predict new (completely different) classes
    • Fine-tuning is re-training the new final layer to learn the new task
  28. Fine-tuning A CNN To Solve A New Problem
    • Fix the weights in the convolutional layers (set trainable=False)
    • Remove the final dense layer that predicts 1000 ImageNet classes
    • Replace it with a new dense layer to predict 9 categories
    88% accuracy in under 2 minutes for classifying products into categories. Fine-tuning is awesome! Insert obligatory brain analogy
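
The fine-tuning recipe can be illustrated without a deep learning framework: a frozen feature extractor whose parameters are never updated, plus a new softmax layer that is the only thing trained. This is a toy sketch under loud assumptions. `frozen_features` is a hypothetical stand-in for the pre-trained convolutional base (in Keras, freezing is what `trainable=False` does), and the 3-class data is made up rather than the 9 product categories from the slide.

```python
import math
import random


def frozen_features(x):
    # HYPOTHETICAL stand-in for a pre-trained conv base: a fixed ReLU feature map, never updated
    return [max(0.0, x[0]), max(0.0, x[1]), max(0.0, -x[0] - x[1]), 1.0]  # last entry = bias input


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def train_new_head(data, num_classes, lr=0.1, epochs=100, seed=0):
    """Train only a new dense softmax layer on top of the frozen features."""
    rng = random.Random(seed)
    dim = len(frozen_features(data[0][0]))
    W = [[rng.gauss(0.0, 0.01) for _ in range(dim)] for _ in range(num_classes)]
    for _ in range(epochs):
        for x, label in data:
            f = frozen_features(x)  # frozen base: no gradient flows into it
            p = softmax([sum(wi * fi for wi, fi in zip(row, f)) for row in W])
            for k in range(num_classes):
                # Cross-entropy gradient w.r.t. class k's score is p_k - 1[k == label]
                g = p[k] - (1.0 if k == label else 0.0)
                W[k] = [wi - lr * g * fi for wi, fi in zip(W[k], f)]
    return W


def predict(W, x):
    f = frozen_features(x)
    scores = [sum(wi * fi for wi, fi in zip(row, f)) for row in W]
    return scores.index(max(scores))


# Hypothetical labelled examples for 3 new classes
data = [((2.0, 0.1), 0), ((1.8, -0.2), 0), ((0.1, 2.0), 1),
        ((-0.2, 1.7), 1), ((-2.0, -1.9), 2), ((-1.7, -2.2), 2)]
W = train_new_head(data, num_classes=3)
print([predict(W, x) for x, _ in data])
```

Because only the small head is trained, each update is cheap, which is the reason fine-tuning a pre-trained network can be fast.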
  29. Visual Similarity
    • Chop off the last 2 VGG layers
    • Use the dense layer with 4096 activations
    • Compute nearest neighbours in the space of these activations
    https://memeburn.com/2017/06/spree-image-search/
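
Once every image is represented by its dense-layer activation vector, visual search is a nearest-neighbour lookup. A sketch assuming cosine similarity as the distance measure (a common choice, not necessarily the one used in the case study) and tiny hypothetical 4-d vectors in place of real 4096-d activations:

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def nearest_neighbours(query, catalogue, k=2):
    """Return the k catalogue ids whose activation vectors are most similar to `query`."""
    ranked = sorted(catalogue, key=lambda item: cosine_similarity(query, catalogue[item]), reverse=True)
    return ranked[:k]


# HYPOTHETICAL 4-d vectors standing in for 4096-d dense-layer activations
catalogue = {
    "red_dress": [0.9, 0.1, 0.0, 0.2],
    "pink_dress": [0.8, 0.2, 0.1, 0.3],
    "black_boots": [0.0, 0.9, 0.8, 0.1],
    "brown_boots": [0.1, 0.8, 0.9, 0.0],
}
query = [0.85, 0.15, 0.05, 0.25]   # activations of a new, dress-like photo
print(nearest_neighbours(query, catalogue))
```

For a large catalogue, the linear scan would usually be replaced by an approximate nearest-neighbour index.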
  30. Final Convolutional Layer = Semantic Vector
    • The final convolutional layer encodes everything the network needs to make predictions
    • The dense layer added on top and the softmax layer both have lower dimensionality
  31. Use a Better Architecture (or all of them!)
    “Ensembles win”: learn a weighted average of many models’ predictions
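
A sketch of combining models by a weighted average of their class probabilities. The three models' outputs and the weights are hypothetical and fixed by hand here; in practice the weights would be learned, for example by fitting them on held-out validation data.

```python
def ensemble(predictions, weights):
    """Weighted average of several models' class-probability vectors."""
    total = sum(weights)
    num_classes = len(predictions[0])
    return [sum(w * p[c] for w, p in zip(weights, predictions)) / total
            for c in range(num_classes)]


model_a = [0.6, 0.3, 0.1]
model_b = [0.2, 0.7, 0.1]
model_c = [0.3, 0.5, 0.2]
combined = ensemble([model_a, model_b, model_c], weights=[0.5, 0.3, 0.2])
print([round(p, 2) for p in combined])   # roughly [0.42, 0.46, 0.12]
print(combined.index(max(combined)))     # class 1 wins, although model_a alone preferred class 0
```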
  32. Semantic Segmentation
    • Long et al., “Fully Convolutional Networks for Semantic Segmentation”, CVPR 2015
    • Noh et al., “Learning Deconvolution Network for Semantic Segmentation”, IEEE on Computer Vision 2016
  33. CNN + Word2Vec = AWESOME
    • king + woman − man ≈ queen
    • Frome et al. (2013) ‘DeViSE: A Deep Visual-Semantic Embedding Model’, Advances in Neural Information Processing Systems, pp. 2121–2129
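
The "king + woman − man ≈ queen" arithmetic can be sketched with toy vectors. The 3-d vectors below are hypothetical and hand-made (dimension 0 is roughly "royalty", 1 is "male", 2 is "female"); real word2vec vectors are learned and typically 300-d. Excluding the three query words from the answer is a common convention in analogy evaluation.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def analogy(a, b, c, vectors):
    """Return the word closest to vec(a) + vec(b) - vec(c), excluding a, b and c."""
    target = [x + y - z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [word for word in vectors if word not in (a, b, c)]
    return max(candidates, key=lambda word: cosine(target, vectors[word]))


# HYPOTHETICAL toy embeddings
vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man": [0.1, 0.9, 0.1],
    "woman": [0.1, 0.2, 0.8],
    "apple": [0.0, 0.5, 0.5],
}
print(analogy("king", "woman", "man", vectors))  # queen
```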
  34. DeViSE: A Deep Visual-Semantic Embedding Model
    • After: encode labels as word2vec vectors (FROM A SEPARATE MODEL)
    • Can look these up for all the nouns in ImageNet
    • 300-d word2vec vectors
  35. Estimating Accident Repair Cost from Photos
    Prototype for a large SA insurer
    • Detect car make & model from the registration disk
    • Predict repair cost using a learned model
  36. Image & Video Moderation
    Large international gay dating app with tens of millions of users uploading hundreds of thousands of photos per day