
GDG DevFest 2019 - "Deep Neural Networks for Video Applications"

Alex Conway
November 30, 2019

Deep neural networks can be used to convert real-time video into actionable data which can be used to trigger real-time anomaly alerts and optimize complex business processes.

In addition to commercial applications, deep learning can be used to analyze large amounts of video recorded from the point of view of animals to study complex behavior patterns otherwise impossible to analyze.

This talk will give an intuition for how deep neural networks work and how anyone can get started using them for video applications. Several real-world examples will be presented, with code examples in Python and details on how to train and deploy models using Google Cloud Platform.

Feel free to ask me questions on Twitter: @alxcnwy

#numberboost #deeplearning #machinelearning #deeplearningforvideo #convolutionalneuralnetworks #recurrentneuralnetworks #centroidtracking #objectdetection #deepfakes #poseestimation #videomachinelearning

Talk details here: https://devfest.co.za/schedule/2019-11-30?sessionId=412


Transcript

  1. DEEP NEURAL NETWORKS for Video Applications. Alex Conway, alex @ numberboost.com. Google Developer Group Cape Town #DevFest 2019. Neither confidential nor proprietary - please distribute ;)
  2. Note about these slides: the original version of this presentation is over 500 MB (lots of videos), which is too big to upload, but there are links to most of the videos in this compressed PDF version* (*you might have to download the PDF to click on the links).
  3. 2016 MultiChoice Innovation Competition 1st Prize Winners; 2017 Mercedes-Benz Innovation Competition 1st Prize Winners; 2018 Lloyd’s Register Innovation Competition 1st Prize Winners; 2019 NTT & Dimension Data Innovation Competition 1st Prize Winners
  4. [Figure: an image represented as matrices of pixel-intensity values (e.g. 13.37), one width × height grid per colour channel]
  5. Video tensor (60-second clip @ 24 fps): 500 x 500 x 3 x 24 x 60 = 1,080,000,000 values. 4 dimensions: width, height, colour, time.
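To make the arithmetic concrete, here is a minimal NumPy sketch of the slide's numbers; the time-first axis order is an assumption (conventions vary between libraries):

```python
import numpy as np

# Video tensor shape: time x height x width x colour channels
shape = (24 * 60, 500, 500, 3)   # 1440 frames of 500x500 RGB
n_values = int(np.prod(shape))
print(n_values)                  # 1080000000, as on the slide

# The full clip is ~1 GB even at 1 byte per value, so in practice
# models usually process one (much smaller) frame at a time:
frame = np.zeros((500, 500, 3), dtype=np.uint8)
```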
  6. HOW DOES A NEURAL NETWORK LEARN? The gradient answers: “How much does the error increase when we increase this weight?” New weight = Old weight - Learning rate x (Gradient of Error with respect to Weight)
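The update rule can be sketched in plain Python. The one-weight model y = w * x and the single data point below are hypothetical, chosen only to show the rule converging:

```python
# One gradient-descent step for a single weight, fitting y = w * x
# to the point (x=2.0, y=4.0) with squared error.
def grad_step(w, x, y, lr=0.1):
    error = w * x - y          # prediction minus target
    grad = 2 * error * x       # d(error^2)/dw
    return w - lr * grad       # new weight = old weight - lr * gradient

w = 0.0
for _ in range(50):
    w = grad_step(w, x=2.0, y=4.0)
print(round(w, 3))             # 2.0 - the weight that makes the error zero
```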
  7. [image-only slide]

  8. Zeiler, M.D. and Fergus, R., 2014. Visualizing and understanding convolutional networks. In European Conference on Computer Vision (pp. 818-833).
  9. Zeiler, M.D. and Fergus, R., 2014. Visualizing and understanding convolutional networks. In European Conference on Computer Vision (pp. 818-833).
  10. IMAGENET TOP-5 ERROR RATE: Traditional Image Processing Methods; AlexNet (8 layers); ZFNet (8 layers); GoogLeNet (22 layers); ResNet (152 layers); SENet Ensemble; TSNet Ensemble
  11. class probabilities = f_CNN(image): a vector of 1000 rows, one probability per class
  12. [Figure: the same f(image) → class-probabilities mapping, shown for an example image]
  13. [image-only slide]

  14. feature vector = f_CNN(image): the “encoding”
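A toy sketch of the idea, with no dependence on any particular framework: a hand-rolled "encoder" with two hard-coded filters, ReLU, and global average pooling maps an image to a short feature vector. A real application would use a pretrained CNN (e.g. via Keras) instead of these made-up filters:

```python
import numpy as np

# Toy "CNN encoder": convolution + ReLU + global average pooling
# turns an image into a 1-D feature vector (the "encoding").
def encode(image, filters):
    """image: (H, W) array; filters: list of (k, k) kernels."""
    k = filters[0].shape[0]
    H, W = image.shape
    features = []
    for f in filters:
        out = np.zeros((H - k + 1, W - k + 1))   # valid convolution
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(image[i:i + k, j:j + k] * f)
        out = np.maximum(out, 0)                 # ReLU
        features.append(out.mean())              # global average pooling
    return np.array(features)

img = np.random.rand(8, 8)
filters = [np.ones((3, 3)), np.eye(3)]           # made-up filters
vec = encode(img, filters)
print(vec.shape)                                 # (2,) - one number per filter
```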
  15. Fine-tuning a CNN to solve a new problem: 96.3% accuracy in under 2 minutes for classifying products into categories (WITH ONLY 3467 TRAINING IMAGES!!1!)
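A hedged sketch of the fine-tuning recipe in Keras: the tiny convolutional "base" below is a stand-in for a real pretrained backbone (e.g. keras.applications.ResNet50 with weights="imagenet"), and the 10 categories and 64x64 input size are illustrative only:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Stand-in for a pretrained convolutional base; in practice load
# e.g. keras.applications.ResNet50(weights="imagenet", include_top=False).
base = keras.Sequential([
    keras.Input(shape=(64, 64, 3)),
    layers.Conv2D(8, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
])
base.trainable = False          # freeze the "pretrained" weights

n_categories = 10               # hypothetical number of product categories
model = keras.Sequential([
    base,
    layers.Dense(n_categories, activation="softmax"),  # new classification head
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(train_images, train_labels, epochs=5)  # train only the new head
```

Freezing the base means only the small new head is trained, which is why a few thousand images and a couple of minutes can be enough.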
  16. list of bounding boxes + classes = f_ObjectDetectionModel(image)
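The detector's output can be post-processed in plain Python; the dict format and (x1, y1, x2, y2) box convention below are assumptions for illustration, not any particular library's API:

```python
# A detector returns (box, label, score) triples; a common first step
# is to keep only confident detections.
def filter_detections(detections, min_score=0.5):
    return [d for d in detections if d["score"] >= min_score]

# Hypothetical raw detector output for one frame:
detections = [
    {"box": (10, 10, 50, 80), "label": "person", "score": 0.97},
    {"box": (60, 20, 90, 70), "label": "person", "score": 0.31},
    {"box": (5, 40, 30, 60),  "label": "ball",   "score": 0.88},
]
kept = filter_detections(detections)
print([d["label"] for d in kept])   # ['person', 'ball']
```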
  17. CNN outputs one probability per character class … P(A) = 0.005, P(B) = 0.002, P(C) = 0.98, P(9) = 0.001, P(0) = 0.03
  18. OBJECT TRACKING (https://www.pyimagesearch.com/2018/07/23/simple-object-tracking-with-opencv/): start by assigning each object in the first frame an ID (ID #1, ID #2); the centroid is shown as a dot at the center of each object
  19. OBJECT TRACKING: for each object in frame t+1, compute the Euclidean distance between its centroid and the centroid of every object in frame t
  20. OBJECT TRACKING: assign each object in frame t+1 the ID of the nearest object from frame t, provided the distance is less than a threshold (ID #1, ID #2)
  21. OBJECT TRACKING: if no object from the previous frame is within the threshold distance, assign a new ID (ID #1, ID #2, ID #3)
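The steps above (nearest centroid, distance threshold, fresh IDs) can be sketched as a minimal tracker; the 50-pixel threshold is an arbitrary example value:

```python
import math

# Minimal centroid tracker: match each new centroid to the nearest
# existing ID within a distance threshold, else assign a fresh ID.
class CentroidTracker:
    def __init__(self, max_distance=50.0):
        self.next_id = 1
        self.objects = {}                     # id -> (x, y) centroid
        self.max_distance = max_distance

    def update(self, centroids):
        new_objects = {}
        for (x, y) in centroids:
            best_id, best_dist = None, self.max_distance
            for obj_id, (px, py) in self.objects.items():
                d = math.hypot(x - px, y - py)     # Euclidean distance
                if d < best_dist and obj_id not in new_objects:
                    best_id, best_dist = obj_id, d
            if best_id is None:                    # no match -> new ID
                best_id = self.next_id
                self.next_id += 1
            new_objects[best_id] = (x, y)
        self.objects = new_objects
        return new_objects

tracker = CentroidTracker()
print(tracker.update([(10, 10), (100, 100)]))  # {1: (10, 10), 2: (100, 100)}
print(tracker.update([(12, 11), (103, 99)]))   # {1: (12, 11), 2: (103, 99)}
```

This greedy nearest-centroid matching is simple and fast; production trackers add refinements like tolerating objects that disappear for a few frames.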
  22. [Diagram: a CNN classifies each frame independently (“Rugby”), then the per-frame CNN outputs feed into an RNN]
  23. clip / frame prediction = f_RNN(CNN frame encodings): the CNN encodes each frame into a feature vector, and the RNN consumes the sequence of encodings
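A hedged Keras sketch of the CNN + RNN pattern: a small CNN encodes each frame, TimeDistributed applies it across the time axis, and an LSTM produces one clip-level prediction. All shapes and sizes (16 frames of 64x64 RGB, 5 classes) are illustrative:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Per-frame CNN encoder (toy-sized; a real model would be much deeper).
frame_encoder = keras.Sequential([
    keras.Input(shape=(64, 64, 3)),
    layers.Conv2D(8, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),           # -> 8-dim frame encoding
])

model = keras.Sequential([
    keras.Input(shape=(16, 64, 64, 3)),        # (time, H, W, colour)
    layers.TimeDistributed(frame_encoder),     # CNN applied to every frame
    layers.LSTM(16),                           # RNN over frame encodings
    layers.Dense(5, activation="softmax"),     # one prediction per clip
])
```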
  24. [image-only slide]

  25. Frame CNN model test accuracy = 71%; Video CNN + RNN model test accuracy = 93%
  26. Frame CNN model test accuracy = 51%; Video CNN + RNN model test accuracy = 87%
  27. Network 1: CNN embedder compresses faces & landmarks to a vector (Few-Shot Adversarial Learning of Realistic Neural Talking Head Models)
  28. Network 3: Discriminator learns to tell apart real and synthesized photos (Few-Shot Adversarial Learning of Realistic Neural Talking Head Models)
  29. GREAT FREE RESOURCES: Deep Learning Indaba http://www.deeplearningindaba.com; Jeremy Howard & Rachel Thomas http://course.fast.ai; Andrej Karpathy’s class on Computer Vision http://cs231n.github.io; Richard Socher’s class on NLP (great RNN resource) http://web.stanford.edu/class/cs224n/; Keras docs https://keras.io/