Slide 1

Python & Caffe: Getting Started with Deep Learning
Saurabh Kumar
SciPy India 2016

Slide 2

Machine Learning Applications
● Cognition:
  ● Face Recognition (Facebook)
  ● Image Classification
  ● Speech Recognition
● Anomaly Detection
● Genetics
● Weather Forecasting
● Spam Detection
● Ad placement on web pages
Teaching machines to do a task by observing how it's done, rather than explicitly programming them for it!
Courtesy coursera.org

Slide 3

Types of Machine Learning
Supervised
● Learning a desired behavior from labeled data.
● Making sense of new data based on prior data.
● E.g. regression and classification.
Unsupervised
● Making inferences without any labeled data.
● Discovering unknown or hidden patterns.
● E.g. clustering and dimensionality reduction.
Reinforcement
● Acting in an environment to maximize a reward.
● Building autonomous agents that learn.
● E.g. recommendation systems, game playing and robot navigation.

Slide 4

Deep Learning
More advanced tasks:
● Self-driving cars
● Machine Translation
● Image Caption Generation
● Sentiment Analysis
● Text Generation
● Already reported to match or beat human performance on specific benchmarks in:
  ● Image Recognition
  ● Speech Recognition
  ● Board Games
Courtesy google.com

Slide 5

What will we talk about today?
● Perceptron
● Artificial Neural Networks
● Deep Neural Networks
● Caffe
● A simple image recognition Deep Net with Caffe
● 3D shape recognition with cascaded Deep Nets

Slide 6

Perceptron: The building block
● Proposed by Frank Rosenblatt at the Cornell Aeronautical Laboratory in 1957; the Mark I Perceptron hardware was demonstrated in 1960
● Inspired by the architecture of a biological neuron
● Multiplies each of its inputs by a weight and sums these products.
● This final sum is then passed through an activation function.
Courtesy cs.utexas.edu

Slide 7

Basic Logic Gates with Perceptrons
Courtesy inf.ed.ac.uk
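A single perceptron can implement AND, OR and NOT by choosing suitable weights and a bias. The values below are one hand-picked choice (an assumption for illustration; the weights in the original figure may differ), using a step activation that fires when the weighted sum plus bias is at least 0:

```python
def perceptron(inputs, weights, bias):
    """Fire (return 1) if the weighted sum of the inputs plus the bias is >= 0."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if z >= 0 else 0

# Hand-picked weights and biases (illustrative, not learned).
def and_gate(a, b):
    return perceptron((a, b), (1, 1), -1.5)  # fires only when both inputs are 1

def or_gate(a, b):
    return perceptron((a, b), (1, 1), -0.5)  # fires when at least one input is 1

def not_gate(a):
    return perceptron((a,), (-1,), 0.5)      # fires only when the input is 0

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(f"a={a} b={b}  AND={and_gate(a, b)}  OR={or_gate(a, b)}")
    print(f"NOT 0={not_gate(0)}  NOT 1={not_gate(1)}")
```

XOR is the classic gate a single perceptron cannot represent, because its true and false cases are not linearly separable.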

Slide 8

Artificial Neural Networks
● A large number of interconnected perceptrons
● Inspired by the architecture of the mammalian brain
● The structure is organized in layers
● Has an input layer, an output layer and a few hidden layers
Courtesy codeproject.com
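Stacking perceptrons into layers makes the network more expressive. A minimal sketch of a layered forward pass: one hidden layer of two threshold units feeding one output unit, with hand-picked weights (an illustrative assumption, no training involved) that make the network compute XOR, which a single perceptron cannot:

```python
import numpy as np

def step(z):
    """Element-wise threshold activation."""
    return (z >= 0).astype(int)

# Hidden layer: two perceptrons (row 0 behaves like OR, row 1 like NAND).
W_hidden = np.array([[1.0, 1.0],
                     [-1.0, -1.0]])
b_hidden = np.array([-0.5, 1.5])

# Output layer: one perceptron that ANDs the two hidden units.
W_out = np.array([[1.0, 1.0]])
b_out = np.array([-1.5])

def forward(x):
    """Propagate an input vector layer by layer: input -> hidden -> output."""
    hidden = step(W_hidden @ x + b_hidden)
    output = step(W_out @ hidden + b_out)
    return int(output[0])

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, forward(np.array(x)))
```

In practice the weights are learned from data rather than chosen by hand; the matrix form shows how each layer is just many perceptrons evaluated at once.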

Slide 9

Deep Neural Networks
● Became popular as computing power (especially GPUs) grew
● Large number of hidden layers, e.g. ResNet-152 has 152 layers
● Popular architectures:
  ● Convolutional Neural Net (CNN)
  ● Recurrent Neural Net (RNN)
  ● Fully Connected Neural Net
  ● Autoencoders
  ● Generative Adversarial Net (GAN)
Courtesy quora.com

Slide 10

Convolutional Neural Net
ConvNets are mainly used for image recognition/classification.
They learn features directly from pixels, so there is no need for difficult hand-crafted feature engineering.
They have pushed top-5 accuracy on the ImageNet (ILSVRC) benchmark above 92%.
Main parts of a CNN:
● Convolutional Layer
● Fully Connected Layer
● Pooling Layer
● ReLU Layer
Courtesy wikipedia.org
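The four layer types listed above can be sketched in plain NumPy on a single-channel image. This is an illustrative toy, assuming a hand-picked edge-detecting kernel and hypothetical fully connected weights; real CNN layers work on many channels and learn their kernels:

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation (what deep-learning libraries call convolution)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """ReLU layer: replace negative values with 0."""
    return np.maximum(x, 0)

def max_pool2x2(x):
    """Non-overlapping 2x2 max pooling (an odd trailing row/column is dropped)."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    x = x[:h, :w]
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)  # 5x5 image with a vertical edge
kernel = np.array([[-1.0, 1.0],
                   [-1.0, 1.0]])                       # responds to dark-to-bright edges

feature_map = relu(conv2d(image, kernel))  # convolution + ReLU: 4x4 map
pooled = max_pool2x2(feature_map)          # pooling: 2x2 map
fc_weights = np.full(pooled.size, 0.5)     # hypothetical fully connected weights
score = pooled.ravel() @ fc_weights        # fully connected layer: one output score

print(pooled)
print(score)
```

The convolution lights up only where the edge is, pooling keeps the strongest response in each 2x2 block, and the fully connected layer turns the pooled features into a score.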

Slide 11

Building Your Own Deep Net!

Slide 12

Caffe
● One of the fastest deep learning frameworks available (as of 2016)
● Can process over 60M images per day with a single NVIDIA K40 GPU
● That is 1 ms/image for inference and 4 ms/image for learning.
● Switching between CPU and GPU is as easy as setting a flag!
● Simple, declarative layer definitions in Protocol Buffer text format (.prototxt)
● Pretrained models are available (Caffe Model Zoo)
● Completely open source
● Developed by the Berkeley Vision and Learning Center (BVLC)

Slide 13

Working with Caffe
● Layers are defined in prototxt files.
● Only two files need to be written:
  ● NetArchitecture.prototxt
  ● Solver.prototxt
● Input data can be raw images or a database (e.g. LMDB or LevelDB).
● Bulk image transformation is built in.
● Can generate a visualization of the net.
● Available layers include: Input, Convolutional, Fully Connected, ReLU, Pooling, Softmax, Accuracy and LRN (Local Response Normalization).
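A minimal sketch of what a NetArchitecture.prototxt might contain, kept in a Python string and written to disk. The layer types (Data, Convolution, ReLU, Pooling, InnerProduct, SoftmaxWithLoss) are standard Caffe layer types; the net name TinyNet, the paths train_lmdb and mean.binaryproto, and all layer sizes are hypothetical placeholders:

```python
from pathlib import Path

# Hypothetical net definition; point "source" and "mean_file" at your own data.
NET_PROTOTXT = """\
name: "TinyNet"
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  transform_param {
    mean_file: "mean.binaryproto"
    crop_size: 227
    mirror: true
  }
  data_param {
    source: "train_lmdb"
    backend: LMDB
    batch_size: 32
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param { num_output: 32 kernel_size: 5 stride: 1 }
}
layer { name: "relu1" type: "ReLU" bottom: "conv1" top: "conv1" }
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param { pool: MAX kernel_size: 2 stride: 2 }
}
layer {
  name: "fc1"
  type: "InnerProduct"
  bottom: "pool1"
  top: "fc1"
  inner_product_param { num_output: 10 }
}
layer { name: "loss" type: "SoftmaxWithLoss" bottom: "fc1" bottom: "label" top: "loss" }
"""

def write_net(path="NetArchitecture.prototxt"):
    """Write the net definition to disk and return its path."""
    Path(path).write_text(NET_PROTOTXT)
    return path
```

Each layer names its inputs (bottom) and outputs (top), which is how Caffe wires the graph together; the in-place ReLU reuses conv1 as both bottom and top. Caffe's python/draw_net.py script can render such a file to an image, which is the visualization mentioned above.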

Slide 14

Writing the prototxt files
Net Architecture
● Provide the location of the input data
● Provide a mean image file
● Design the individual layers
● Provide any image transformations, if necessary
Solver
● Location of the net architecture file
● Learning parameters
● Preferences
● Maximum number of iterations (max_iter)
● Saving the intermediate models (snapshots)
● CPU/GPU flag (solver_mode)
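The solver items above map onto fields of Caffe's solver definition. A minimal sketch that renders a Solver.prototxt from Python; the field names (net, base_lr, momentum, weight_decay, lr_policy, gamma, stepsize, display, max_iter, snapshot, snapshot_prefix, solver_mode) are Caffe solver fields, while every value and the function solver_prototxt itself are illustrative assumptions:

```python
def solver_prototxt(net_path, max_iter=10000, base_lr=0.01, use_gpu=True,
                    snapshot_every=5000, snapshot_prefix="snapshots/tinynet"):
    """Render a minimal Caffe solver definition as prototxt text (illustrative values)."""
    fields = [
        ("net", f'"{net_path}"'),                      # location of the net architecture
        ("base_lr", base_lr),                          # learning parameters
        ("momentum", 0.9),
        ("weight_decay", 0.0005),
        ("lr_policy", '"step"'),                       # drop the learning rate in steps
        ("gamma", 0.1),                                # ...by this factor
        ("stepsize", 5000),                            # ...every this many iterations
        ("display", 100),                              # preference: log every N iterations
        ("max_iter", max_iter),                        # maximum number of iterations
        ("snapshot", snapshot_every),                  # save intermediate models every N iterations
        ("snapshot_prefix", f'"{snapshot_prefix}"'),
        ("solver_mode", "GPU" if use_gpu else "CPU"),  # the CPU/GPU flag (an enum, so unquoted)
    ]
    return "\n".join(f"{key}: {value}" for key, value in fields) + "\n"

if __name__ == "__main__":
    print(solver_prototxt("NetArchitecture.prototxt", use_gpu=False))
```

After saving the output as Solver.prototxt (and creating the snapshot directory), training would typically be started with the caffe command-line tool, e.g. `caffe train --solver=Solver.prototxt`.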

Slide 15

Simple Image Recognition Convolutional Neural Net
● One of the examples that come with the Caffe installation
● Trained model and mean image are available
● BVLC Reference CaffeNet: a slightly modified AlexNet trained on ILSVRC 2012
● Uses 227 x 227 input images
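Feeding an image to CaffeNet needs some preprocessing before it matches the 227 x 227 input. A NumPy-only sketch of the steps Caffe's Python classification example performs with caffe.io.Transformer (HWC to CHW, RGB to BGR, scaling to [0, 255], mean subtraction), using a center crop instead of a resize. The per-channel mean (roughly the BGR means of the ILSVRC 2012 mean image) and the 256 x 256 input size are assumptions:

```python
import numpy as np

def center_crop(img, size=227):
    """Take the central size x size patch of an H x W x C image."""
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

def preprocess(img_rgb01, mean_bgr=(104.0, 117.0, 123.0), size=227):
    """H x W x 3 RGB image in [0, 1] -> 3 x size x size BGR array with the mean removed."""
    x = center_crop(img_rgb01, size)
    x = x.transpose(2, 0, 1)                             # HWC -> CHW (Caffe's layout)
    x = x[::-1]                                          # RGB -> BGR channel order
    x = x * 255.0                                        # [0, 1] -> [0, 255]
    x = x - np.asarray(mean_bgr).reshape(3, 1, 1)        # per-channel mean subtraction
    return x

img = np.zeros((256, 256, 3))
img[..., 0] = 1.0                  # a pure-red test image
blob = preprocess(img)[np.newaxis]  # add a batch dimension: 1 x 3 x 227 x 227
print(blob.shape)
```

With pycaffe, a blob like this would then be copied into the network's input, e.g. `net.blobs['data'].data[...] = blob`, before calling `net.forward()`.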

Slide 16

3D Object Recognition using Multi-View Convolutional Neural Networks
● The Princeton ModelNet10 dataset was used.
● 12 views were rendered of each mesh object.
● The first CNN extracts feature descriptors from the views.
● The second CNN takes these descriptors and outputs the class labels.
● An accuracy of 88.1% was attained.
Su, Hang, et al. "Multi-view convolutional neural networks for 3D shape recognition." Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2015.
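In the cited MVCNN paper, the per-view descriptors from the first CNN are merged by a "view pooling" layer, an element-wise maximum across views, before the second CNN produces the class scores. A minimal NumPy sketch of that pooling step alone (not the project's own code; the feature sizes and values are illustrative):

```python
import numpy as np

def view_pool(view_features):
    """Merge per-view descriptors by an element-wise max over the view axis.

    view_features: array of shape (n_views, feature_dim), one row per rendered view.
    Returns one (feature_dim,) descriptor for the whole 3D shape.
    """
    return np.max(np.asarray(view_features), axis=0)

# Tiny example: 3 views with 4-dimensional features (illustrative numbers).
views = np.array([[0.1, 0.9, 0.3, 0.0],
                  [0.7, 0.2, 0.3, 0.5],
                  [0.4, 0.1, 0.8, 0.2]])
shape_descriptor = view_pool(views)
print(shape_descriptor)  # [0.7 0.9 0.8 0.5]
```

Because a maximum ignores ordering, the pooled descriptor does not depend on which rendered view comes first, and it has the same size whether 3 or 12 views are used.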

Slide 17

This talk is based on a course project done in collaboration with Nikunj Patel, Abhinav Kumar and Chandra Mohan Sharma for the course CS725: Foundations of Machine Learning at the Computer Science Department, IIT Bombay, in Spring 2016.

Slide 18

Thank You!