Deep Neural Networks for Video Applications by Alex Conway

Pycon ZA
October 10, 2019

Most CCTV cameras exist as a sort of time machine for insurance purposes. Deep neural networks make it easy to convert video into actionable data that can trigger real-time anomaly alerts and optimize complex business processes. Beyond commercial applications, deep learning can analyze large amounts of video recorded from the point of view of animals, which makes it possible to study complex behavior patterns that would otherwise be impossible to analyze. This talk presents some theory of deep neural networks for video applications, along with academic research and several applied real-world industrial examples, with code examples in Python.

Transcript

  1. DEEP NEURAL NETWORKS for Video Applications

    Alex Conway, alex @ numberboost.com. PYCONZA Keynote 2019. Neither confidential nor proprietary - please distribute ;)
  2. 2016 MultiChoice Innovation Competition 1st Prize Winners; 2017 Mercedes-Benz Innovation Competition 1st Prize Winners;

    2018 Lloyd’s Register Innovation Competition 1st Prize Winners; 2019 NTT & Dimension Data Innovation Competition 1st Prize Winners
  3. HANDS UP!

  4. None
  5. https://www.youtube.com/watch?v=Gz0QZP2RKWA

  6. None
  7. None
  8. https://twitter.com/goodfellow_ian/status/1084973596236144640

  9. https://twitter.com/quasimondo/status/1100016467213516801

  10. https://www.youtube.com/watch?feature=youtu.be&v=r6zZPn-6dPY&app=desktop

  11. ORIGINAL FILM: Rear Window (1954). PIX2PIX MODEL OUTPUT: Fully Automated. RE-MASTERED BY HAND: Painstakingly.

    https://hackernoon.com/remastering-classic-films-in-tensorflow-with-pix2pix-f4d551fa0503
  12. INPUT OUTPUT ORIGINAL https://arstechnica.com/information-technology/2017/02/google-brain-super-resolution-zoom-enhance/

  13. https://techcrunch.com/2016/06/20/twitter-is-buying-magic-pony-technology-which-uses-neural-networks-to-improve-images/

  14. CONTENT IMAGE + STYLE IMAGE = STYLE TRANSFER OUTPUT

    https://arxiv.org/abs/1508.06576
  15. https://github.com/junyanz/CycleGAN

  16. https://news.developer.nvidia.com/ai-can-transform-anyone-into-a-professional-dancer/

  17. https://github.com/JoYoungjoo/SC-FEGAN

  18. https://www.linkedin.com/feed/update/urn:li:activity:6498172448196820993

  19. https://motherboard.vice.com/en_us/article/gydydm/gal-gadot-fake-ai-porn

  20. https://www.youtube.com/watch?v=MVBe6_o4cMI

  21. https://twitter.com/XHNews/status/1098173090448629760

  22. https://www.youtube.com/watch?v=aE1kA0Jy0Xg

  23. https://www.youtube.com/watch?v=xhp47v5OBXQ

  24. None
  25. https://www.reddit.com/r/Cyberpunk/comments/ddplms/hk_wearable_face_projector_to_avoid_face/

  26. https://twitter.com/x0rz/status/1104744170529439744

  27. None
  28. None
  29. None
  30. None
  31. None
  32. None
  33. None
  34. None
  35. None
  36. None
  37. None
  38. None
  39. None
  40. f (video) = useful data

  41. f (video) = useful data

  42. None
  43. f (video) = clip label

  44. f (video) = frame label

  45. f (video) = object count

  46. f (video) = object activity

  47. f (video) = object poses

  48. f (video) = facial expressions

  49. f (video) = higher res video

  50. f (video) = video with new faces

  51. Neural Networks Crash Course

  52. NEURAL NETWORKS: a set of connected neurons with randomly initialized weights

    and non-linear activation functions, arranged in a network whose weights are optimized (learned) using training data to minimize prediction error
  53. http://playground.tensorflow.org

  54. WHAT IS A NEURON?

  55. LINEAR

  56. NON-LINEAR

  57. NON-LINEAR ACTIVATION FUNCTIONS Tanh Sigmoid ReLU
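The three activation functions named on slide 57 can be written in a few lines. This is only an illustrative sketch using Python's standard library; the function names and the sample inputs are assumptions, not code from the talk.

```python
import math


def tanh(x: float) -> float:
    """Hyperbolic tangent: squashes any input into the range (-1, 1)."""
    return math.tanh(x)


def sigmoid(x: float) -> float:
    """Logistic sigmoid: squashes any input into the range (0, 1)."""
    # Two branches so that exp() never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def relu(x: float) -> float:
    """Rectified linear unit: passes positive inputs, zeroes out the rest."""
    return max(0.0, x)


# Sample inputs chosen for illustration only.
for x in (-2.0, 0.0, 2.0):
    print(f"x={x:+.1f} tanh={tanh(x):+.3f} sigmoid={sigmoid(x):.3f} relu={relu(x):.1f}")
```

All three are non-linear, which is what lets stacked layers represent more than a single linear map.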

  58. (DEEP) NEURAL NETWORKS: inputs, hidden layer 1, hidden layer 2, hidden layer 3, outputs

    Note: Outputs of one layer are inputs into the next layer. This (non-convolutional) architecture is called a “multi-layered perceptron”.
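A rough sketch of the multi-layered perceptron on slide 58, showing how the outputs of one layer become the inputs of the next. The layer sizes, the ReLU hidden activation, the linear output layer, and all function names are assumptions made for illustration; the slide does not specify them.

```python
import random


def relu(value):
    return max(0.0, value)


def identity(value):
    return value


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(weights . inputs + bias) per neuron."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + bias)
        for row, bias in zip(weights, biases)
    ]


def build_network(layer_sizes, seed=0):
    """Create randomly initialized layers; sizes are hypothetical."""
    rng = random.Random(seed)
    layers = []
    last = len(layer_sizes) - 2
    for index, (n_in, n_out) in enumerate(zip(layer_sizes, layer_sizes[1:])):
        weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        # Hidden layers use ReLU; the final layer is left linear (an assumption).
        activation = identity if index == last else relu
        layers.append((weights, biases, activation))
    return layers


def forward(inputs, layers):
    """Feed the outputs of each layer into the next one."""
    activations = inputs
    for weights, biases, activation in layers:
        activations = dense(activations, weights, biases, activation)
    return activations


# 4 inputs, three hidden layers of 5 neurons, 2 outputs.
layers = build_network([4, 5, 5, 5, 2])
outputs = forward([0.5, -1.0, 2.0, 0.0], layers)
print(outputs)
```

The weights here are random, so the outputs are meaningless until the network is trained.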
  59. HOW DOES A NEURAL NETWORK LEARN?

    New weight = Old weight - (Learning rate) x (“How much error increases when we increase this weight”)
  60. GRADIENT DESCENT http://scs.ryerson.ca/~aharley/neural-networks/
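The update rule from slide 59 can be tried on the smallest possible model: one weight, one training example, squared error. "How much error increases when we increase this weight" is the derivative of the error with respect to the weight. The data point, learning rate, and step count below are made-up values for illustration.

```python
def error(weight, x, target):
    """Squared prediction error of the one-weight model: prediction = weight * x."""
    return (weight * x - target) ** 2


def gradient(weight, x, target):
    """Derivative of the squared error with respect to the weight."""
    return 2.0 * x * (weight * x - target)


# Hypothetical training example: the ideal weight is 6 / 2 = 3.
x, target = 2.0, 6.0
learning_rate = 0.1
weight = 0.0

for step in range(50):
    # New weight = old weight - learning rate x gradient.
    weight = weight - learning_rate * gradient(weight, x, target)

print(weight, error(weight, x, target))
```

With these values each step shrinks the distance to the ideal weight by a factor of five, so the loop converges quickly. A learning rate that is too large would overshoot and diverge.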

  61. 1; 1, 3, 3, 7, …; [[1, 2, 3], [3, 2, 1], [3, 4, 5], [7, 8, 9], …]

    (the matrix [[1, 2, 3], [3, 2, 1], [3, 4, 5], [7, 8, 9], …] appears three times on the slide)
  62. None
  63. image tensor: 500 x 500 x 3 = 750’000

    60 second video at 10 FPS tensor: 500 x 500 x 3 x 10 x 60 = 450’000’000
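The arithmetic on slide 63 can be checked directly. The sketch below only multiplies out the tensor shapes. The one-byte-per-value storage figure is an added assumption (uint8 pixels), not something stated on the slide.

```python
import math

# Shapes from the slide: height x width x colour channels.
image_shape = (500, 500, 3)

# A 60 second video at 10 frames per second has 10 * 60 = 600 frames.
frames_per_second = 10
seconds = 60
video_shape = image_shape + (frames_per_second * seconds,)

image_values = math.prod(image_shape)
video_values = math.prod(video_shape)

print(f"image: {image_values:,} values")
print(f"video: {video_values:,} values")

# Assumption: one byte per value (uint8), so the raw video needs about 450 MB.
print(f"video size at 1 byte per value: {video_values / 1e6:.0f} MB")
```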
  64. None
  65. Convolutional Neural Networks (CNNs)

  66. None
  67. INPUT 28 x 28 pixel grayscale images = 784 numbers

  68. 2 LAYER NEURAL NETWORK (outputs: 0 1 2 3 4 5 6 7 8 9)
  69. https://www.youtube.com/watch?v=aircAruvnKk 3 LAYER NEURAL NETWORK

  70. https://www.youtube.com/watch?v=aircAruvnKk 3 LAYER NEURAL NETWORK

  71. https://www.youtube.com/watch?v=aircAruvnKk 3 LAYER NEURAL NETWORK

  72. https://github.com/fchollet/keras/blob/master/examples/mnist_cnn.py

    (99.25% test accuracy in 192 seconds and 46 lines of code)
  73. None
  74. None
  75. 3 KEY CONVOLUTIONAL NETWORK ARCHITECTURE IDEAS:

    1. Local receptive fields 2. Shared weights 3. Subsampling
  76. VGGNet

  77. http://setosa.io/ev/image-kernels
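To make "local receptive fields" and "shared weights" concrete, here is a minimal sketch of sliding one small kernel over an image. Each output value looks only at a 3 x 3 patch, and the same nine weights are reused at every position. The image, the kernel values, and the function name are illustrative assumptions. As in most deep learning libraries, the kernel is not flipped, so this is strictly a cross-correlation.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image with no padding and stride 1."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            # Local receptive field: only the patch under the kernel contributes.
            for di in range(kernel_h):
                for dj in range(kernel_w):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 4 x 6 grayscale image: dark on the left, bright on the right.
image = [[0, 0, 0, 10, 10, 10] for _ in range(4)]

# A simple vertical-edge kernel; the same weights are shared across all positions.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

edges = convolve2d(image, kernel)
for row in edges:
    print(row)
```

The output is non-zero only where the patch straddles the dark-to-bright boundary. In a CNN the kernel values are learned rather than hand-picked.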

  78. http://deeplearning.net/software/theano/tutorial/conv_arithmetic.html

  79. None

  80. Zeiler, M.D. and Fergus, R., 2014, September. Visualizing and understanding convolutional networks. In European conference on computer vision (pp. 818-833).

  81. Zeiler, M.D. and Fergus, R., 2014, September. Visualizing and understanding convolutional networks. In European conference on computer vision (pp. 818-833).

  82. Convolutional Nets Learn Hierarchical Features

  83. SUBSAMPLING aka “POOLING”
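One common form of subsampling is max pooling, which keeps only the largest value in each small window and so shrinks the feature map. The slide does not say which pooling variant was shown, so the 2 x 2 max pooling below, along with the sample values and function name, is an assumption for illustration.

```python
def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling: keep the largest value in each size x size window."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            window = [
                feature_map[i + di][j + dj]
                for di in range(size)
                for dj in range(size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


# Hypothetical 4 x 4 feature map.
feature_map = [
    [1, 3, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]

pooled = max_pool2d(feature_map)
print(pooled)
```

The 4 x 4 input becomes a 2 x 2 output, a quarter of the values, while the strongest response in each region is kept.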

  84. VGGNet

  85. None
  86. we need labelled training data

  87. ImageNet: 14,197,122 images, 21,841 synsets indexed

    ILSVRC: 1’200’000 images, 1000 categories
  88. None
  89. ImageNet

  90. ImageNet

  91. IMAGENET TOP-5 ERROR RATE: Traditional Image Processing Methods; AlexNet (8 Layers);

    ZFNet (8 Layers); GoogLeNet (22 Layers); ResNet (152 Layers); SENet (Ensemble); TSNet (Ensemble)
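Top-5 error counts a prediction as wrong only when the true label is missing from the model's five highest-scoring classes. The sketch below computes it for a handful of made-up score vectors. The scores, labels, and function name are illustrative assumptions, not ImageNet results.

```python
def top_k_error(scores, labels, k=5):
    """Fraction of examples whose true label is not among the k highest scores."""
    misses = 0
    for class_scores, label in zip(scores, labels):
        ranked = sorted(range(len(class_scores)), key=lambda c: class_scores[c], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(labels)


# Hypothetical scores over 7 classes for 4 examples.
descending = [0.30, 0.25, 0.20, 0.10, 0.08, 0.05, 0.02]
ascending = list(reversed(descending))
scores = [descending, descending, ascending, ascending]
labels = [4, 6, 6, 0]

top5 = top_k_error(scores, labels, k=5)
top1 = top_k_error(scores, labels, k=1)
print(f"top-5 error: {top5:.2f}, top-1 error: {top1:.2f}")
```

Top-5 error is never higher than top-1 error on the same predictions, because a correct top-1 guess is also inside the top five.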
  92. None
  93. https://arxiv.org/abs/1611.01578

  94. None
  95. https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html

  96. Example: Use CNN to Classify Product Images https://github.com/alexcnwy/DeepLearning4ComputerVision

  97. None

  98. TRANSFER LEARNING

  99. USING A CNN AS A FEATURE EXTRACTOR

    Feature Extractor (“ENCODER”), Classifier
  100. Extracting Features from an Image

  101. feature vector = f ( )

  102. Adding a New Classifier

  103. Fine-tuning A CNN To Solve A New Problem

    96.3% accuracy in under 2 minutes for classifying products into categories (WITH ONLY 3467 TRAINING IMAGES!!1!)
  104. https://www.youtube.com/watch?v=X4Q6C915sUY

  105. https://www.pyimagesearch.com/2019/06/03/fine-tuning-with-keras-and-deep-learning/

  106. IMAGE & VIDEO MODERATION TODO

  107. Object Detection

  108. None
  109. None
  110. None
  111. https://www.youtube.com/watch?v=VOC3huqHrss

  112. 1.5 million object instances 80 object categories http://cocodataset.org

  113. https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md

  114. DEMO (HOLD THUMBS)

  115. https://github.com/tzutalin/labelImg CUSTOM OBJECT DETECTION

  116. None
  117. https://towardsdatascience.com/how-to-train-your-own-object-detector-with-tensorflows-object-detector-api-bec72ecfe1d9

  118. None
  119. None
  120. None
  121. None
  122. None
  123. CNN … P(A) = 0.005, P(B) = 0.002, P(C) = 0.98,

    P(9) = 0.001, P(0) = 0.03
  124. None
  125. None
  126. None
  127. None
  128. None
  129. None
  130. None
  131. None
  132. None
  133. None
  134. https://www.reddit.com/r/southafrica/comments/asl4n5/when_a_little_is_just_not_enough/

  135. None
  136. https://www.pyimagesearch.com/2018/07/23/simple-object-tracking-with-opencv/ ID #2 ID #1 “CENTROID TRACKING”

  137. https://www.pyimagesearch.com/2018/07/23/simple-object-tracking-with-opencv/ “CENTROID TRACKING”

  138. https://www.pyimagesearch.com/2018/07/23/simple-object-tracking-with-opencv/ “CENTROID TRACKING”

    For each object with an ID in frame t, compute the distance to the centroid of every object in frame t + 1. Assign the same ID if the distance is less than a threshold; otherwise assign a new ID.
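The centroid tracking rule on slide 138 can be sketched as follows. This is a simplified illustration, not the implementation from the linked tutorial: it matches closest pairs greedily, drops an object as soon as it goes unmatched, and uses a hypothetical threshold of 50 pixels. The function name, threshold, and sample centroids are all assumptions.

```python
import math


def update_tracks(tracks, detections, next_id, max_distance=50.0):
    """Carry IDs from frame t to frame t + 1 by nearest-centroid matching.

    tracks: dict mapping object ID -> (x, y) centroid in frame t.
    detections: list of (x, y) centroids found in frame t + 1.
    Returns the tracks for frame t + 1 and the next unused ID.
    """
    # Every (distance, tracked ID, detection index) pair, closest first.
    pairs = sorted(
        (math.dist(centroid, detection), object_id, index)
        for object_id, centroid in tracks.items()
        for index, detection in enumerate(detections)
    )

    new_tracks = {}
    matched = set()
    for distance, object_id, index in pairs:
        if distance >= max_distance:
            break  # all remaining pairs are even further apart
        if object_id in new_tracks or index in matched:
            continue  # this ID or this detection is already taken
        new_tracks[object_id] = detections[index]
        matched.add(index)

    # Detections that matched nothing get a new ID.
    for index, detection in enumerate(detections):
        if index not in matched:
            new_tracks[next_id] = detection
            next_id += 1
    return new_tracks, next_id


# Hypothetical centroids for three consecutive frames.
frames = [
    [(10, 10), (100, 100)],
    [(104, 98), (12, 11)],
    [(14, 12), (300, 300)],
]

tracks, next_id = {}, 1
history = []
for detections in frames:
    tracks, next_id = update_tracks(tracks, detections, next_id)
    history.append(dict(tracks))
    print(tracks)
```

In the third frame the detection at (300, 300) is too far from any known object, so it receives ID 3, while object 2 disappears because nothing matched it. A production tracker would usually keep unmatched objects alive for a few frames before dropping them.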
  139. https://www.pyimagesearch.com/2018/07/23/simple-object-tracking-with-opencv/ “CENTROID TRACKING”

  140. https://www.pyimagesearch.com/2018/07/23/simple-object-tracking-with-opencv/ ID #1 ID #2 “CENTROID TRACKING”

  141. https://www.youtube.com/watch?v=FfU22I-_dI4

  142. https://www.youtube.com/watch?v=NW-rXqCl7us

  143. Recurrent Neural Networks (RNNs)

  144. None

  145. None
  146. SPATIO-TEMPORAL

  147. None
  148. SPORTS 1-M

  149. SPATIAL … THEN TEMPORAL

  150. http://colah.github.io/posts/2015-08-Understanding-LSTMs/

  151. feature vector = f ( )

  152. None
  153. Frame model accuracy <<< Video model accuracy

  154. None
  155. None
  156. https://i.imgur.com/mGXdpdp.gifv

  157. None
  158. Frame-level Action Recognition (7 classes)

  159. Frame model accuracy <<< Video model accuracy

  160. None
  161. https://github.com/alxcnwy/Deep-Neural-Networks-for-Video-Classification

  162. MORE (CRAZY) APPLICATIONS

  163. https://www.youtube.com/watch?v=UeheTiBJ0Io VIDEO Q&A

  164. https://www.youtube.com/watch?v=UeheTiBJ0Io VIDEO Q&A

  165. https://www.youtube.com/watch?v=UeheTiBJ0Io VIDEO Q&A

  166. None
  167. https://github.com/wuhuikai/FaceSwap FACE SWAP

  168. None
  169. Few-Shot Adversarial Learning of Realistic Neural Talking Head Models https://www.youtube.com/watch?v=p1b5aiTrGzY

  170. Few-Shot Adversarial Learning of Realistic Neural Talking Head Models https://www.youtube.com/watch?v=p1b5aiTrGzY

  171. Few-Shot Adversarial Learning of Realistic Neural Talking Head Models

    Network 1: CNN embedder compresses faces & landmarks to vector
  172. Few-Shot Adversarial Learning of Realistic Neural Talking Head Models

    Network 2: Generator takes landmarks and synthesizes photo
  173. Few-Shot Adversarial Learning of Realistic Neural Talking Head Models

    Network 3: Discriminator learns to tell apart real and synthesized photos
  174. POSE ESTIMATION https://www.youtube.com/watch?v=pW6nZXeWlGM

  175. POSE ESTIMATION https://www.youtube.com/watch?v=pW6nZXeWlGM

  176. POSE ESTIMATION https://www.youtube.com/watch?v=pW6nZXeWlGM

  177. POSE ESTIMATION https://www.youtube.com/watch?v=pW6nZXeWlGM

  178. POSE ESTIMATION https://www.youtube.com/watch?v=pW6nZXeWlGM

  179. https://github.com/CMU-Perceptual-Computing-Lab/openpose

  180. https://www.affectiva.com/product/affectiva-automotive-ai-for-driver-monitoring-solutions/ DISTRACTED DRIVING DETECTION

  181. SELF-DRIVING CARS https://www.youtube.com/watch?v=nuMQ4LNMWu8

  182. https://arstechnica.com/cars/2019/08/elon-musk-says-driverless-cars-dont-need-lidar-experts-arent-so-sure/

  183. REMEMBER

  184. f (video) = useful data

  185. Don’t be scared to git clone functions and use deep learning!
  186. GREAT FREE RESOURCES

    Deep Learning Indaba: http://www.deeplearningindaba.com
    Jeremy Howard & Rachel Thomas: http://course.fast.ai
    Andrej Karpathy’s Class on Computer Vision: http://cs231n.github.io
    Richard Socher’s Class on NLP (great RNN resource): http://web.stanford.edu/class/cs224n/
    Keras docs: https://keras.io/
  187. THANK YOU! @alxcnwy alex @ numberboost.com