
Deep Learning for Computer Vision by Alex Conway

PyCon ZA
October 06, 2017


The state-of-the-art in image classification has skyrocketed thanks to the development of deep convolutional neural networks and increases in the amount of data and computing power available to train them. The top-5 error rate in the international ImageNet competition to predict which of 1000 classes an image belongs to has plummeted from 28% error in 2010 before deep learning to just 2.25% in 2017 (human level error is around 5%).

In addition to being able to classify objects in images (including not hotdogs), deep learning can be used to automatically generate captions for images, convert photos into paintings, detect cancer in pathology slide images, and help self-driving cars ‘see’.

The talk will give an overview of the cutting edge in the field and some of the core mathematical concepts behind the models. It will also include a short code-first tutorial to show how easy it is to get started using deep learning for computer vision in python…


Transcript

  1. Deep Learning for Computer Vision Executive-ML 2017/09/21 Neither Proprietary nor

    Confidential – Please Distribute ;) Alex Conway alex @ numberboost.com @alxcnwy PyConZA’17
  2. Hands up!

  3. Check out the Deep Learning Indaba videos & practicals! http://www.deeplearningindaba.com/videos.html

    http://www.deeplearningindaba.com/practicals.html
  4. Deep Learning is Sexy (for a reason!) 4

  5. Image Classification 5 http://yann.lecun.com/exdb/mnist/ https://github.com/fchollet/keras/blob/master/examples/mnist_cnn.py (99.25% test accuracy in 192

    seconds and 70 lines of code)
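
The linked mnist_cnn.py example boils down to a few lines of Keras. The sketch below is a hedged simplification, not the original script: the layer sizes, optimizer and single training epoch are assumptions, and import paths can differ slightly between Keras versions.

```python
# Hedged simplification of a Keras MNIST CNN (layer sizes/optimizer are assumptions).
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.utils import to_categorical

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1).astype("float32") / 255
x_test = x_test.reshape(-1, 28, 28, 1).astype("float32") / 255
y_train, y_test = to_categorical(y_train, 10), to_categorical(y_test, 10)

model = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    Conv2D(64, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(128, activation="relu"),
    Dense(10, activation="softmax"),   # one probability per digit class
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=128, epochs=1, validation_data=(x_test, y_test))
```
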
  6. Image Classification 6

  7. Image Classification 7 ImageNet Classification with Deep Convolutional Neural Networks,

    Krizhevsky et al., Advances in Neural Information Processing Systems 25 (NIPS 2012)
  8. https://research.googleblog.com/2017/06/supercharge-your-computer-vision-models.html Object detection

  9. https://www.youtube.com/watch?v=VOC3huqHrss Object detection

  10. Object detection

  11. Image Captioning & Visual Attention XXX 11 https://einstein.ai/research/knowing-when-to-look-adaptive-attention-via-a-visual-sentinel-for-image-captioning

  12. Image Q&A 12 https://arxiv.org/pdf/1612.00837.pdf

  13. Video Q&A XXX 13 https://www.youtube.com/watch?v=UeheTiBJ0Io

  14. Pix2Pix https://affinelayer.com/pix2pix/ https://github.com/affinelayer/pix2pix-tensorflow 14

  15. Pix2Pix https://medium.com/towards-data-science/face2face-a-pix2pix-demo-that-mimics-the-facial-expression-of-the-german-chancellor-b6771d65bf66 15

  16. 16 Original input Rear Window (1954) Pix2pix output Fully Automated

    Remastered Painstakingly by Hand https://hackernoon.com/remastering-classic-films-in-tensorflow-with-pix2pix-f4d551fa0503
  17. Style Transfer https://github.com/junyanz/CycleGAN 17

  18. Style Transfer SORCERY https://github.com/junyanz/CycleGAN 18

  19. Real Fake News https://www.youtube.com/watch?v=MVBe6_o4cMI 19

  20. Deep learning is Magic Deep learning is Magic Deep learning

    is EASY!
  21. 1. What is a neural network? 2. What is a

    convolutional neural network? 3. How to use a convolutional neural network 4. More Advanced Methods 5. Case studies & applications 21
  22. Big Shout Outs Jeremy Howard & Rachel Thomas http://course.fast.ai Andrej

    Karpathy http://cs231n.github.io François Chollet (Keras lead dev) https://keras.io/ 22
  23. 1.What is a neural network?

  24. What is a neuron? 24 • 3 inputs [x1,x2,x3] •

    3 weights [w1,w2,w3] • Element-wise multiply and sum • Apply activation function f • Often add a bias too (weight of 1) – not shown
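
As a quick illustration of the neuron described above, a minimal NumPy sketch (the numbers are made up):

```python
# A single neuron: element-wise multiply inputs by weights, sum, add bias,
# then apply an activation function (values here are illustrative).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0, 2.0])   # 3 inputs  [x1, x2, x3]
w = np.array([0.1, 0.4, -0.3])   # 3 weights [w1, w2, w3]
b = 1.0                          # bias (often added, not shown on the slide)

output = sigmoid(np.dot(x, w) + b)
print(output)
```
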
  25. What is an Activation Function? 25 Sigmoid Tanh ReLU Nonlinearities

    … “squashing functions” … transform neuron’s output NB: sigmoid output in [0,1]
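
The three squashing functions from the slide, written out in NumPy as a quick reference:

```python
import numpy as np

def sigmoid(z):           # squashes to (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):              # squashes to (-1, 1)
    return np.tanh(z)

def relu(z):              # clips negatives to 0, keeps positives
    return np.maximum(0.0, z)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z), tanh(z), relu(z))
```
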
  26. What is a (Deep) Neural Network? 26 Inputs outputs hidden

    layer 1 hidden layer 2 hidden layer 3 Outputs of one layer are inputs into the next layer
  27. How does a neural network learn? 27 • We need

    labelled examples “training data” • We initialize network weights randomly and initially get random predictions • For each labelled training data point, we calculate the error between the network’s predictions and the ground-truth labels • Use ‘backpropagation’ (chain rule), to update the network parameters (weights + convolutional filters ) in the opposite direction to the error
  28. How does a neural network learn? 28 New weight = Old weight − Learning rate × (Gradient of Error with respect to weight) … “how much the error increases when we increase this weight”
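
In code, the update rule for a single weight looks like this (numbers are illustrative; in practice the gradient comes from backpropagation):

```python
learning_rate = 0.01
old_weight = 0.5
grad_error_wrt_weight = 2.3   # "how much the error increases when we increase this weight"

new_weight = old_weight - learning_rate * grad_error_wrt_weight
print(new_weight)             # 0.477 -- the weight moves opposite to the gradient
```
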
  29. Gradient Descent Interpretation 29 http://scs.ryerson.ca/~aharley/neural-networks/

  30. http://playground.tensorflow.org

  31. What is a Neural Network? For much more detail, see:

    1. Michael Nielsen’s Neural Networks & Deep Learning free online book http://neuralnetworksanddeeplearning.com/chap1.html 2. Andrej Karpathy’s CS231n Notes http://cs231n.github.io/ 31
  32. 2. What is a convolutional neural network?

  33. What is a Convolutional Neural Network? 33 “like an ordinary

    neural network but with special types of layers that work well on images” (math works on numbers) • Pixel = 3 colour channels (R, G, B) • Pixel intensity ∈[0,255] • Image has width w and height h • Therefore image is w x h x 3 numbers
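
You can verify the “image is just w x h x 3 numbers” point with Pillow and NumPy (“photo.jpg” is a placeholder path):

```python
import numpy as np
from PIL import Image

img = np.array(Image.open("photo.jpg").convert("RGB"))
print(img.shape)              # (height, width, 3) -- one number per channel per pixel
print(img.min(), img.max())   # pixel intensities in [0, 255]
```
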
  34. 34 This is VGGNet – don’t panic, we’ll break it

    down piece by piece Example Architecture
  35. 35 This is VGGNet – don’t panic, we’ll break it

    down piece by piece Example Architecture
  36. How does a neural network learn? 36 • MNIST: stop and think how remarkable it is that we can recognise all of these as a 3 • Different pixels!
  37. Convolutions 37 http://setosa.io/ev/image-kernels/

  38. Convolutions 38 http://deeplearning.net/software/theano/tutorial/conv_arithmetic.html

  39. New Layer Type: Convolutional Layer 39 • 2-d weighted average when we multiply the kernel over pixel patches • We slide the kernel over all pixels of the image (handle borders) • Kernel starts off with “random” values and the network updates (learns) the kernel values (using backpropagation) to try to minimize loss • Kernels are shared across the whole image (parameter sharing)
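
A minimal sketch of the sliding-kernel idea (pure NumPy, no padding; in a real convolutional layer the kernel values are learned rather than fixed):

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over every pixel patch and take a weighted sum."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)   # element-wise multiply and sum
    return out

edge_kernel = np.array([[-1, -1, -1],
                        [-1,  8, -1],
                        [-1, -1, -1]])            # a classic edge-detection kernel
image = np.random.rand(8, 8)                      # a fake 8x8 grayscale image
print(conv2d(image, edge_kernel).shape)           # (6, 6) activation map
```
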
  40. Many Kernels = Many “Activation Maps” = Volume 40 http://cs231n.github.io/convolutional-networks/

  41. New Layer Type: Convolutional Layer 41

  42. Convolutions 42 https://github.com/fchollet/keras/blob/master/examples/conv_filter_visualization.py

  43. Convolutions 43

  44. Convolutions 44

  45. Convolutions 45

  46. Convolution Learn Hierarchical Features 46

  47. Great vid 47 https://www.youtube.com/watch?v=AgkfIQ4IGaM

  48. New Layer Type: Max Pooling 48

  49. New Layer Type: Max Pooling • Reduces dimensionality from one

    layer to next • …by replacing NxN sub-area with max value • Makes network “look” at larger areas of the image at a time • e.g. Instead of identifying fur, identify cat • Reduces overfitting since losing information helps the network generalize 49 http://cs231n.github.io/convolutional-networks/
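
A quick sketch of 2x2 max pooling, replacing each sub-area with its maximum:

```python
import numpy as np

def max_pool(feature_map, size=2):
    h, w = feature_map.shape
    out = np.zeros((h // size, w // size))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            block = feature_map[i * size:(i + 1) * size, j * size:(j + 1) * size]
            out[i, j] = block.max()       # keep only the strongest activation
    return out

fm = np.arange(16).reshape(4, 4)
print(max_pool(fm))                       # 4x4 -> 2x2, dimensionality reduced 4x
```
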
  50. New Layer Type: Max Pooling 50

  51. Stack Conv + Pooling Layers and Go Deep 51 Convolution

    + max pooling + fully connected + softmax
  52. 52 Stack Conv + Pooling Layers and Go Deep Convolution

    + max pooling + fully connected + softmax
  53. 53 Stack These Layers and Go Deep Convolution + max

    pooling + fully connected + softmax
  54. 54 Stack These Layers and Go Deep Convolution + max

    pooling + fully connected + softmax
  55. 55 Flatten the Final “Bottleneck” layer Convolution + max pooling

    + fully connected + softmax Flatten the final 7 x 7 x 512 max pooling layer Add fully-connected dense layer on top
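
A hedged Keras sketch of that final step: flatten the 7 x 7 x 512 bottleneck and put fully-connected + softmax layers on top (the 4096/1000 sizes match VGG's classifier head; exact API details vary by Keras version):

```python
from keras.layers import Input, Flatten, Dense
from keras.models import Model

bottleneck = Input(shape=(7, 7, 512))               # final max-pooling volume
x = Flatten()(bottleneck)                           # 7 * 7 * 512 = 25,088 numbers
x = Dense(4096, activation="relu")(x)               # fully-connected layer on top
predictions = Dense(1000, activation="softmax")(x)  # one probability per class

head = Model(inputs=bottleneck, outputs=predictions)
head.summary()
```
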
  56. 56 Bringing it all together Convolution + max pooling +

    fully connected + softmax
  57. Softmax Convert scores ∈ ℝ to probabilities ∈ [0,1] Final

    output prediction = highest probability class 57
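
Softmax itself is a one-liner; a small NumPy sketch:

```python
import numpy as np

def softmax(scores):
    exps = np.exp(scores - scores.max())   # subtract max for numerical stability
    return exps / exps.sum()

scores = np.array([2.0, 1.0, 0.1])         # raw scores in R
probs = softmax(scores)
print(probs, probs.sum())                  # probabilities in [0, 1], summing to 1
print(probs.argmax())                      # final prediction = highest-probability class
```
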
  58. Bringing it all together 58 Convolution + max pooling +

    fully connected + softmax
  59. We need labelled training data!

  60. ImageNet 60 http://image-net.org/explore 1000 object categories 1.2 million training images

  61. ImageNet 61

  62. ImageNet 62

  63. ImageNet Top 5 Error Rate 63 Traditional Image Processing Methods

    AlexNet 8 Layers ZFNet 8 Layers GoogLeNet 22 Layers ResNet 152 Layers SENet Ensemble TSNet Ensemble
  64. 3. How to use a convolutional neural network

  65. Using a Pre-Trained ImageNet-Winning CNN 65 • We’ve been looking

    at “VGGNet” • Oxford Visual Geometry Group (VGG) • ImageNet 2014 Runner-up • Network is 16 layers (deep!) • Easy to fine-tune https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
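
Loading the pre-trained ImageNet VGG16 in Keras really is a one-liner (the weights download automatically on first use):

```python
from keras.applications.vgg16 import VGG16

model = VGG16(weights="imagenet")   # ImageNet-trained VGG16, 16 weight layers
model.summary()
```
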
  66. Example: Classifying Product Images 66 https://github.com/alexcnwy/CTDL_CNN_TALK_20170620 Classifying products into 9

    categories
  67. 67 https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html Start with Pre-Trained ImageNet Model Consumers vs Producers

    of Machine Learning
  68. 68

  69. “Transfer Learning” is a game changer

  70. Fine-tuning A CNN To Solve A New Problem • Cut

    off last layer of pre-trained ImageNet-winning CNN • Keep learned network (convolutions) but replace final layer • Can learn to predict new (completely different) classes • Fine-tuning is re-training the new final layer to learn the new task 70
  71. Fine-tuning A CNN To Solve A New Problem 71

  72. 72 Before Fine-Tuning

  73. 73 After Fine-Tuning

  74. Fine-tuning A CNN To Solve A New Problem • Fix

    weights in convolutional layers (set trainable=False) • Remove final dense layer that predicts 1000 ImageNet classes • Replace with new dense layer to predict 9 categories 74 88% accuracy in under 2 minutes for classifying products into categories Fine-tuning is awesome! Insert obligatory brain analogy
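
A hedged sketch of that recipe in Keras (the 256-unit intermediate layer and the optimizer are assumptions, not the talk's exact code):

```python
from keras.applications.vgg16 import VGG16
from keras.layers import Flatten, Dense
from keras.models import Model

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False                        # fix weights in convolutional layers

x = Flatten()(base.output)
x = Dense(256, activation="relu")(x)               # size of this layer is an assumption
predictions = Dense(9, activation="softmax")(x)    # 9 product categories instead of 1000

model = Model(inputs=base.input, outputs=predictions)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(...) on the labelled product photos
```
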
  75. None
  76. Visual Similarity 76 • Chop off last 2 VGG layers

    • Use dense layer with 4096 activations • Compute nearest neighbours in the space of these activations https://memeburn.com/2017/06/spree-image-search/
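
A sketch of that pipeline, assuming Keras's VGG16 (whose first 4096-unit fully-connected layer is named "fc1") plus scikit-learn for nearest neighbours; file names are placeholders:

```python
import numpy as np
from PIL import Image
from keras.applications.vgg16 import VGG16, preprocess_input
from keras.models import Model
from sklearn.neighbors import NearestNeighbors

base = VGG16(weights="imagenet")
# use the 4096-d "fc1" activations as each image's feature vector
feature_model = Model(inputs=base.input, outputs=base.get_layer("fc1").output)

def featurize(path):
    img = Image.open(path).convert("RGB").resize((224, 224))
    x = preprocess_input(np.array(img, dtype="float32")[np.newaxis])
    return feature_model.predict(x)[0]

catalogue_paths = ["product_001.jpg", "product_002.jpg"]     # placeholder catalogue images
features = np.array([featurize(p) for p in catalogue_paths])

nn = NearestNeighbors(n_neighbors=min(10, len(features))).fit(features)
distances, indices = nn.kneighbors([featurize("query.jpg")])  # most "visually similar" items
```
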
  77. 77 https://github.com/alexcnwy/CTDL_CNN_TALK_20170620

  78. 78 Input Image not seen by model Results Top 10

    most “visually similar”
  79. 79 Final Convolutional Layer = Semantic Vector • The final

    convolutional layer encodes everything the network needs to make predictions • The dense layer added on top and the softmax layer both have lower dimensionality
  80. 4. More Advanced Methods

  81. Use a Better Architecture (or all of them!) 81 “Ensembles

    win” learn a weighted average of many models’ predictions
  82. cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf There are MANY Computer Vision Tasks

  83. Long et al. “Fully Convolutional Networks for Semantic Segmentation” CVPR

    2015 Noh et al. Learning Deconvolution Network for Semantic Segmentation. IEEE International Conference on Computer Vision (ICCV) 2015 Semantic Segmentation
  84. http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf Object detection

  85. Object detection

  86. https://www.youtube.com/watch?v=VOC3huqHrss Object detection

  87. http://blog.romanofoti.com/style_transfer/ Johnson et al. Perceptual losses for real-time style transfer

    and super-resolution. 2016 Style Transfer
  88. https://www.youtube.com/watch?v=LhF_56SxrGk

  89. Pixelated Original Output https://arstechnica.com/information-technology/2017/02/google-brain-super-resolution-zoom-enhance/
  90. This image is 3.8 kb Super-Resolution

  91. https://github.com/tdeboissiere/VGG16CAM-keras Visual Attention

  92. Image Captioning https://einstein.ai/research/knowing-when-to-look-adaptive-attention-via-a-visual-sentinel-for-image-captioning Karpathy & Li. IEEE Transactions on Pattern

    Analysis and Machine Intelligence, 39(4), pp. 664–676
  93. Image Q&A XXX 93 http://iamaaditya.github.io/2016/04/visual_question_answering_demo_notebook

  94. Video Q&A XXX 94 https://www.youtube.com/watch?v=UeheTiBJ0Io

  95. Video Q&A XXX 95 https://www.youtube.com/watch?v=UeheTiBJ0Io

  96. king + woman – man ≈ queen 96 Frome et

    al. (2013) ‘DeViSE: A Deep Visual-Semantic Embedding Model’, Advances in Neural Information Processing Systems, pp. 2121–2129 CNN + Word2Vec = AWESOME
  97. DeViSE: A Deep Visual-Semantic Embedding Model XXX 97 Before: Encode labels

    as 1-hot vector
  98. DeViSE: A Deep Visual-Semantic Embedding Model XXX 98 After: Encode labels

    as word2vec vectors (FROM A SEPARATE MODEL) Can look these up for all the nouns in ImageNet 300-d word2vec vectors
  99. DeViSE: A Deep Visual-Semantic Embedding Model 99 https://www.youtube.com/watch?v=uv0gmrXSXVg wv* = ( wv(fish) + wv(net) ) / 2 … get nearest neighbours to wv*
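
The word-vector arithmetic on the slide can be reproduced with any pre-trained word2vec model; a sketch using gensim's downloadable Google News vectors (the model name is an assumption, not what the talk used):

```python
import gensim.downloader as api

wv = api.load("word2vec-google-news-300")        # 300-d word2vec vectors

target = (wv["fish"] + wv["net"]) / 2            # wv* = (wv(fish) + wv(net)) / 2
print(wv.similar_by_vector(target, topn=5))      # nearest neighbours to wv*
```
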
  100. None
  101. 5. Case Studies

  102. Estimating Accident Repair Cost from Photos TODO 102 Prototype for

    large SA insurer Detect car make & model from registration disk Predict repair cost using learned model
  103. Image & Video Moderation TODO 103 Large international gay dating

    app with tens of millions of users uploading hundreds-of-thousands of photos per day
  104. Segmenting Medical Images TODO 104

  105. Counting People TODO Count shoppers, segment on age & gender

    facial recognition loyalty is next
  106. Counting Cars TODO

  107. Detecting Potholes

  108. Extracting Data from Product Catalogues

  109. Real-time ATM Video Classification 109

  110. Optical Sorting TODO 110 https://www.youtube.com/watch?v=Xf7jaxwnyso

  111. We are hiring! alex @ numberboost.com

  112. GET IN TOUCH! Alex Conway alex @ numberboost.com @alxcnwy

  113. https://www.autoblog.com/2017/08/04/self-driving-car-sign-hack-stickers/