PyConZA 2019 Keynote - Deep Neural Networks for Video Applications

Slides from my PyConZA 2019 Keynote on "Deep Neural Networks for Video Applications"

Don't be afraid of A.I. - just git clone the relevant function (deep learning model) and start building things! I also do consulting if you get stuck or need help ;)

"Most CCTV video cameras exist as a sort of time machine for insurance purposes. Deep neural networks make it easy to convert video into actionable data which can be used to trigger real-time anomaly alerts and optimize complex business processes. In addition to commercial applications, deep learning can be used to analyze large amounts of video recorded from the point of view of animals to study complex behavior patterns impossible to otherwise analyze. This talk will present some theory of deep neural networks for video applications as well as academic research and several applied real-world industrial examples, with code examples in python."

#python #deeplearning #machinelearning #deeplearningforvideo #convolutionalneuralnetworks #recurrentneuralnetworks #centroidtracking #objectdetection #deepfakes #poseestimation #videomachinelearning #numberboost

Alex Conway

October 10, 2019

Transcript

  1. DEEP NEURAL NETWORKS for Video Applications. Alex Conway, alex @ numberboost.com. PYCONZA Keynote 2019. Neither confidential nor proprietary - please distribute ;)
  2. 2016 MultiChoice Innovation Competition 1st Prize Winners; 2017 Mercedes-Benz Innovation Competition 1st Prize Winners; 2018 Lloyd’s Register Innovation Competition 1st Prize Winners; 2019 NTT & Dimension Data Innovation Competition 1st Prize Winners
  3. ORIGINAL FILM: Rear Window (1954). PIX2PIX MODEL OUTPUT: Fully Automated. RE-MASTERED BY HAND: Painstakingly. https://hackernoon.com/remastering-classic-films-in-tensorflow-with-pix2pix-f4d551fa0503
  4. NEURAL NETWORKS: A set of Neurons with randomly initialized weights and non-linear activation functions, connected in a Network, whose weights are optimized (learned) using training data to minimize prediction error
  5. (DEEP) NEURAL NETWORKS: Inputs, hidden layer 1, hidden layer 2, hidden layer 3, outputs. Note: Outputs of one layer are inputs into the next layer. This (non-convolutional) architecture is called a “multi-layered perceptron”
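The layer-to-layer flow on this slide can be sketched in a few lines of NumPy. This is only an illustration, not code from the talk: the layer sizes, the ReLU activation, and the helper names `init_mlp` and `forward` are all assumptions.

```python
import numpy as np

def relu(x):
    # Non-linear activation: negative values become zero.
    return np.maximum(0.0, x)

def init_mlp(layer_sizes, seed=0):
    # One (weights, bias) pair per connection between consecutive layers,
    # with randomly initialized weights.
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = rng.normal(0.0, 0.1, size=(n_in, n_out))
        bias = np.zeros(n_out)
        params.append((weights, bias))
    return params

def forward(params, x):
    # The outputs of one layer are the inputs into the next layer.
    h = x
    for weights, bias in params[:-1]:
        h = relu(h @ weights + bias)
    weights, bias = params[-1]
    return h @ weights + bias  # raw outputs of the final layer

# Hypothetical sizes: 4 inputs, three hidden layers of 8 neurons, 2 outputs.
params = init_mlp([4, 8, 8, 8, 2])
batch = np.ones((5, 4))          # 5 examples with 4 input features each
outputs = forward(params, batch)
print(outputs.shape)  # (5, 2)
```

The weights here are random, so the outputs are meaningless until they are trained.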
  6. HOW DOES A NEURAL NETWORK LEARN? New weight = Old weight - (Learning rate) x (“How much error increases when we increase this weight”)
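As a rough sketch of that update rule, here is plain gradient descent fitting a single weight `w` so that `w * x` matches `y`. The toy data, the mean squared error, the learning rate, and the function names are assumptions made for illustration.

```python
def gradient_descent_step(weight, gradient, learning_rate):
    # New weight = old weight - learning rate x gradient, where the gradient is
    # "how much error increases when we increase this weight".
    return weight - learning_rate * gradient

def fit_single_weight(xs, ys, learning_rate=0.05, steps=200):
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # Derivative of mean((w*x - y)^2) with respect to w.
        grad = sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w = gradient_descent_step(w, grad, learning_rate)
    return w

# Toy data generated by y = 2 * x, so the learned weight should approach 2.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
learned_w = fit_single_weight(xs, ys)
print(round(learned_w, 4))  # 2.0
```

A real network applies the same step to every weight at once, with the gradients computed by backpropagation.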
  7. 1; 1, 3, 3, 7, …; [[1, 2, 3] [3, 2, 1] [3, 4, 5] [7, 8, 9] …] [[1, 2, 3] [3, 2, 1] [3, 4, 5] [7, 8, 9] …] [[1, 2, 3] [3, 2, 1] [3, 4, 5] [7, 8, 9] …]
  8. Image tensor: 500 x 500 x 3 = 750’000. 60-second video at 10 FPS tensor: 500 x 500 x 3 x 10 x 60 = 450’000’000
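The arithmetic can be checked directly. The sketch below assumes a (frames, height, width, channels) layout for the video and one byte per value (`uint8`); neither is stated on the slide.

```python
import math
import numpy as np

height, width, channels = 500, 500, 3
fps, seconds = 10, 60

# A single RGB frame as a tensor of 8-bit values.
frame = np.zeros((height, width, channels), dtype=np.uint8)
image_values = frame.size
print(image_values)  # 750000

# The video shape is only computed, not allocated, to keep the example light.
video_shape = (fps * seconds, height, width, channels)
video_values = math.prod(video_shape)
print(video_values)  # 450000000

# At one byte per value, the raw video would take about 450 MB.
video_megabytes = video_values * frame.itemsize / 1e6
print(video_megabytes)  # 450.0
```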
  10. Zeiler, M.D. and Fergus, R., 2014, September. Visualizing and understanding convolutional networks. In European conference on computer vision (pp. 818-833).
  11. Zeiler, M.D. and Fergus, R., 2014, September. Visualizing and understanding convolutional networks. In European conference on computer vision (pp. 818-833).
  12. IMAGENET TOP-5 ERROR RATE: Traditional Image Processing Methods; AlexNet (8 Layers); ZFNet (8 Layers); GoogLeNet (22 Layers); ResNet (152 Layers); SENet (Ensemble); TSNet (Ensemble)
  14. Fine-tuning A CNN To Solve A New Problem: 96.3% accuracy in under 2 minutes for classifying products into categories (WITH ONLY 3467 TRAINING IMAGES!!1!)
  15. CNN … P(A) = 0.005, P(B) = 0.002, P(C) = 0.98, P(9) = 0.001, P(0) = 0.03
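One common way a classifier ends up with per-class probabilities like these is a softmax over the final layer's raw scores. The slide does not say which output layer was used, so treat the sketch below as an assumption; the class list and the scores are made up for illustration.

```python
import numpy as np

def softmax(logits):
    # Subtract the maximum first for numerical stability; the result is unchanged.
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

# Hypothetical raw scores (logits) from the last layer, one per candidate character.
classes = ["A", "B", "C", "9", "0"]
logits = np.array([0.5, -0.4, 5.8, -1.1, 2.3])

probs = softmax(logits)                      # non-negative, sums to 1
predicted = classes[int(np.argmax(probs))]   # most probable class
print(predicted)  # C
```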
  16. HOW DOES IT WORK? Video Frame Stream → Person Detector + Door Detector → Person Tracking → Vanish in Door? → Passenger Count
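The last step of this pipeline might look like the sketch below: a person is counted once their track has ended and its last known centroid lies inside the detected door box. This is a guess at the logic, not the talk's implementation; the function names, the box format `(x_min, y_min, x_max, y_max)`, and the sample data are all hypothetical.

```python
def inside(box, point):
    # box is (x_min, y_min, x_max, y_max); point is (x, y).
    x_min, y_min, x_max, y_max = box
    x, y = point
    return x_min <= x <= x_max and y_min <= y <= y_max

def count_vanished_in_door(last_seen, active_ids, door_box):
    # Count tracks that are no longer active and were last seen inside the door box.
    return sum(
        1
        for track_id, centroid in last_seen.items()
        if track_id not in active_ids and inside(door_box, centroid)
    )

# Hypothetical detector and tracker outputs.
door_box = (0, 0, 100, 200)
last_seen = {1: (50, 120), 2: (400, 300), 3: (80, 40)}  # track ID -> last centroid
active_ids = {3}                                        # tracks still visible

passengers = count_vanished_in_door(last_seen, active_ids, door_box)
print(passengers)  # 1
```

Track 1 vanished inside the door and is counted. Track 2 vanished elsewhere, and track 3 is still visible, so neither is counted.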
  17. OBJECT TRACKING: Start by assigning each object in the first frame an ID (ID #1, ID #2). The centroid is shown as a dot in the center of each object. https://www.pyimagesearch.com/2018/07/23/simple-object-tracking-with-opencv/
  18. OBJECT TRACKING: For each object in frame t+1, compute the Euclidean distance between its centroid and the centroid of every object in frame t
  19. OBJECT TRACKING: Assign each object in frame t+1 the ID of the nearest object from frame t, provided the distance is less than the threshold distance (ID #1 stays ID #1, ID #2 stays ID #2)
  20. OBJECT TRACKING: If no object from frame t is within the threshold distance, then assign a new ID (ID #1 and ID #2 are kept; the new object becomes ID #3)
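The tracking steps above can be sketched as a small class. This is a simplified illustration, not the code from the linked tutorial or the talk. The class name `CentroidTracker`, the `max_distance` parameter, and the sample coordinates are assumptions. Two details go beyond the slides: each existing ID is handed out at most once per frame, and objects that disappear are dropped immediately.

```python
import numpy as np

class CentroidTracker:
    def __init__(self, max_distance=50.0):
        self.max_distance = max_distance  # threshold distance in pixels (assumed unit)
        self.next_id = 1
        self.objects = {}                 # object ID -> centroid from the previous frame

    def update(self, centroids):
        """Match the new frame's centroids to known IDs and return {ID: centroid}."""
        new = [tuple(float(v) for v in c) for c in centroids]
        old_ids = list(self.objects)
        assigned = {}                     # index into `new` -> existing object ID
        if old_ids and new:
            old = np.array([self.objects[i] for i in old_ids])
            # Euclidean distance between every old and every new centroid.
            diffs = old[:, None, :] - np.array(new)[None, :, :]
            dists = np.linalg.norm(diffs, axis=2)
            used_old = set()
            # Visit (old, new) pairs from closest to farthest.
            for flat in np.argsort(dists, axis=None):
                o, n = (int(i) for i in np.unravel_index(flat, dists.shape))
                if dists[o, n] >= self.max_distance:
                    break                 # all remaining pairs are too far apart
                if o in used_old or n in assigned:
                    continue              # one of the two is already matched
                assigned[n] = old_ids[o]
                used_old.add(o)
        current = {}
        for n, centroid in enumerate(new):
            if n in assigned:
                object_id = assigned[n]
            else:
                # No existing object within the threshold: assign a new ID.
                object_id = self.next_id
                self.next_id += 1
            current[object_id] = centroid
        self.objects = current
        return current

tracker = CentroidTracker(max_distance=50.0)
frame_t = tracker.update([(100, 100), (300, 120)])
frame_t1 = tracker.update([(305, 125), (110, 105), (600, 400)])
print(sorted(frame_t1.items()))
# [(1, (110.0, 105.0)), (2, (305.0, 125.0)), (3, (600.0, 400.0))]
```

In the second frame the two nearby centroids keep IDs 1 and 2, and the far-away centroid receives the new ID 3. A production tracker would usually keep a vanished object around for a few frames before dropping its ID.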
  22. Few-Shot Adversarial Learning of Realistic Neural Talking Head Models. Network 1: CNN embedder compresses faces & landmarks to a vector
  23. Few-Shot Adversarial Learning of Realistic Neural Talking Head Models. Network 2: Generator takes landmarks and synthesizes a photo
  24. Few-Shot Adversarial Learning of Realistic Neural Talking Head Models. Network 3: Discriminator learns to tell apart real and synthesized photos
  25. GREAT FREE RESOURCES: Deep Learning Indaba http://www.deeplearningindaba.com; Jeremy Howard & Rachel Thomas http://course.fast.ai; Andrej Karpathy’s Class on Computer Vision http://cs231n.github.io; Richard Socher’s Class on NLP (great RNN resource) http://web.stanford.edu/class/cs224n/; Keras docs https://keras.io/