Python & Caffe: Getting Started with Deep Learning

Saurabh Kumar
December 11, 2016

Transcript

  1. Machine Learning Applications
    • Cognition:
    • Face Recognition (Facebook)
    • Image Classification
    • Speech Recognition
    • Anomaly Detection
    • Genetics
    • Weather Forecasting
    • Spam Detection
    • Ad placement on web pages
    Teaching machines to do a task by observing how it's done rather than being programmed for it!
    Courtesy coursera.org
  2. Types of Machine Learning
    Supervised
    • Learning a desired behavior with labeled data.
    • Make sense of new data based on prior data.
    • E.g. Regression and Classification
    Unsupervised
    • Making inferences without any labeled data.
    • Discover unknown or hidden patterns.
    • E.g. Clustering and Dimensionality Reduction
    Reinforcement
    • Act in an environment to maximize reward.
    • Build autonomous agents that learn.
    • E.g. Recommendation Systems, Game Playing and Robot Navigation.
  3. Deep Learning
    More advanced tasks
    • Self-driving cars
    • Machine Translation
    • Image Caption Generation
    • Sentiment Analysis
    • Text Generation
    • Already better than humans in:
      • Image Recognition
      • Speech Recognition
      • Board Games
    Courtesy google.com
  4. What will we talk about today?
    • Perceptron
    • Artificial Neural Networks
    • Deep Neural Networks
    • Caffe
    • A simple image recognition Deep Net with Caffe
    • 3D shape recognition with cascaded Deep Nets
  5. Perceptron: The building block
    • Introduced by Frank Rosenblatt at the Cornell Aeronautical Laboratory in the late 1950s
    • Inspired by the architecture of a neuron
    • Multiplies each of its inputs by a corresponding weight and sums these products.
    • This final sum is then passed through an activation function.
    Courtesy cs.utexas.edu
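
The weighted-sum-then-activation step described above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the step activation, the bias term, and the hand-picked AND weights are assumptions made for the example, not something taken from the slides.

```python
def step(x):
    """Threshold activation: output 1 when the weighted sum is non-negative."""
    return 1 if x >= 0 else 0


def perceptron(inputs, weights, bias):
    """Multiply each input by its weight, sum the products, then activate."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return step(total)


# Hypothetical hand-picked parameters that make the unit act like a logical AND.
AND_WEIGHTS = [1.0, 1.0]
AND_BIAS = -1.5

and_table = {(a, b): perceptron([a, b], AND_WEIGHTS, AND_BIAS)
             for a in (0, 1) for b in (0, 1)}
```

In practice the weights are learned from data rather than chosen by hand; the fixed values here just make the behavior easy to verify.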
  6. Artificial Neural Networks
    • A large number of perceptrons interconnected with each other
    • Inspired by the architecture of the mammalian brain
    • The structure is organized in the form of layers
    • Has an input layer, an output layer and a few hidden layers
    Courtesy codeproject.com
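
A layered network of such units might be sketched as below. The sigmoid activation, the 2-3-1 layout, and all weight values are arbitrary assumptions chosen for illustration; the sketch only shows how the output of one layer feeds the next.

```python
import math


def sigmoid(x):
    """Smooth activation that squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dense_layer(inputs, weights, biases):
    """One layer: each row of `weights` belongs to one perceptron-like unit."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(inputs, layers):
    """Feed the input through every layer in order and return the final output."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations


# Hypothetical network: 2 inputs -> 3 hidden units -> 1 output unit.
NETWORK = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),  # hidden layer
    ([[0.7, -0.5, 0.2]], [0.05]),                                # output layer
]

output = forward([1.0, 0.5], NETWORK)
```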
  7. Deep Neural Networks
    • Became popular as more computing power became available
    • Large number of hidden layers, e.g. ResNet has ~150 layers (ResNet-152)
    • Popular architectures:
      • Convolutional Neural Net (CNN)
      • Recurrent Neural Net (RNN)
      • Fully Connected Neural Net
      • Autoencoders
      • Generative Adversarial Net (GAN)
    Courtesy quora.com
  8. Convolutional Neural Net
    ConvNets are mainly used for image recognition/classification.
    No need for difficult feature engineering.
    Has pushed image recognition accuracy to ~92%.
    Main parts of a CNN:
    • Convolutional Layer
    • Fully Connected Layer
    • Pooling Layer
    • ReLU Layer
    Courtesy wikipedia.org
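
To make three of the listed parts concrete, here is a minimal pure-Python sketch of a convolution, a ReLU, and a max-pooling step applied in sequence. The toy image and the edge-detecting kernel are made up for the example, and the fully connected layer is left out for brevity.

```python
def conv2d(image, kernel):
    """'Valid' sliding-window product (cross-correlation, as in most DL libraries)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(fmap):
    """Replace negative responses with zero."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    return [[max(fmap[i + di][j + dj]
                 for di in range(size) for dj in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]


# Toy 5x5 image: dark on the left, bright on the right.
IMAGE = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hypothetical kernel that responds to dark-to-bright vertical edges.
EDGE_KERNEL = [[-1, 1],
               [-1, 1]]

features = max_pool(relu(conv2d(IMAGE, EDGE_KERNEL)))
```

The pooled feature map is non-zero only in the column that contains the edge, which is the kind of feature a trained convolutional layer discovers on its own instead of having it engineered by hand.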
  9. Caffe
    • Among the fastest of the available convnet implementations
    • Can process over 60M images per day with a single NVIDIA K40 GPU
    • That is 1 ms/image for inference and 4 ms/image for learning.
    • Using CPU/GPU is as easy as switching a flag!
    • Simple JSON-style (protobuf text) definition of layers
    • Pretrained models are available for use
    • Completely open source
    • Developed by the Berkeley Vision and Learning Center (BVLC)
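
The per-image timings can be cross-checked against the daily throughput with simple arithmetic. The sketch below only converts milliseconds per image into images per day; it suggests that the "over 60M images per day" figure corresponds to the inference timing, which is an interpretation rather than something the slide states.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 milliseconds in a day


def images_per_day(ms_per_image):
    """Images processed in 24 hours at a constant per-image cost."""
    return MS_PER_DAY / ms_per_image


inference_per_day = images_per_day(1)  # 1 ms/image -> 86.4 million images
training_per_day = images_per_day(4)   # 4 ms/image -> 21.6 million images
```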
  10. Working with Caffe
    • Layers are defined in prototxt files.
    • Just have to write two files:
      • NetArchitecture.prototxt
      • Solver.prototxt
    • Input data can be raw images or read from a database.
    • Bulk image transformation is built in.
    • Can generate a visualization of our net.
    • Available layers: Input, Convolutional, Fully Connected, ReLU, Pooling, Softmax, Accuracy, LRN.
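
As a rough idea of what a net definition looks like, the sketch below holds a small prototxt as a Python string and lists the layer types it declares. The layer type names (`Data`, `Convolution`, `ReLU`, `Pooling`, `InnerProduct`, `SoftmaxWithLoss`) are Caffe's own; the net name, file paths, and sizes are hypothetical placeholders, and this is not the network used in the talk.

```python
import re

# Hypothetical minimal architecture; paths and sizes are placeholders.
NET_PROTOTXT = '''
name: "TinyNet"
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  transform_param { crop_size: 227 mean_file: "mean.binaryproto" }
  data_param { source: "train_lmdb" batch_size: 64 backend: LMDB }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 32
    kernel_size: 5
    stride: 1
    weight_filler { type: "xavier" }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param { pool: MAX kernel_size: 2 stride: 2 }
}
layer {
  name: "fc1"
  type: "InnerProduct"
  bottom: "pool1"
  top: "fc1"
  inner_product_param {
    num_output: 10
    weight_filler { type: "xavier" }
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc1"
  bottom: "label"
  top: "loss"
}
'''

# Collect the layer types, i.e. the `type:` fields that start their own line.
layer_types = re.findall(r'^\s*type: "(\w+)"', NET_PROTOTXT, flags=re.M)
```

Each layer names its inputs (`bottom`) and outputs (`top`), which is how the layers are wired into a network. Saving the string to a file such as NetArchitecture.prototxt would give Caffe something to load.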
  11. Writing the prototxt files
    Net Architecture
    • Provide the location of the input data
    • Provide a mean image file
    • Design the individual layers
    • Provide any image transformations if necessary
    Solver
    • Location of the net architecture file
    • Learning parameters
    • Preferences
    • Max iterations
    • Saving the intermediate models
    • CPU/GPU flag
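
The solver items above map onto a handful of fields in the solver file. The sketch below renders such a file from a Python dict; the field names (`net`, `base_lr`, `lr_policy`, `max_iter`, `snapshot`, `snapshot_prefix`, `solver_mode`, and so on) are real Caffe solver fields, while the values, the paths, and the `render_solver` helper are assumptions made up for illustration.

```python
def render_solver(settings):
    """Render a flat dict as protobuf text; strings are quoted, enum values are not."""
    enum_fields = {"solver_mode"}  # enum values such as GPU must stay unquoted
    lines = []
    for key, value in settings.items():
        if isinstance(value, str) and key not in enum_fields:
            value = f'"{value}"'
        lines.append(f"{key}: {value}")
    return "\n".join(lines) + "\n"


# Hypothetical settings; tune them for the actual dataset and network.
SOLVER = {
    "net": "NetArchitecture.prototxt",       # location of the net architecture
    "base_lr": 0.01,                         # learning parameters
    "momentum": 0.9,
    "weight_decay": 0.0005,
    "lr_policy": "step",
    "gamma": 0.1,
    "stepsize": 10000,
    "display": 100,                          # preferences: log every 100 iterations
    "max_iter": 50000,                       # maximum number of iterations
    "snapshot": 5000,                        # save an intermediate model this often
    "snapshot_prefix": "snapshots/tinynet",
    "solver_mode": "GPU",                    # the CPU/GPU flag
}

solver_text = render_solver(SOLVER)
```

Written to Solver.prototxt, a file like this would typically be passed to the command-line tool as `caffe train --solver=Solver.prototxt`; changing `solver_mode` to `CPU` is the flag switch mentioned earlier.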
  12. Simple Image Recognition Convolutional Neural Net
    • One of the examples shipped with the Caffe installation
    • Trained model and mean image are available
    • BVLC Reference CaffeNet: AlexNet trained on ILSVRC 2012
    • Uses 227 x 227 image data
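
Before an image reaches a net like this, it is usually mean-subtracted and cropped to the expected input size. The sketch below shows that step on a tiny single-channel example in plain Python; the 256 x 256 stored size is an assumption based on the usual CaffeNet setup, and the helper names are hypothetical.

```python
def center_crop_offset(full, crop):
    """Index at which a centered window of length `crop` starts."""
    return (full - crop) // 2


def preprocess(image, mean_image, crop):
    """Subtract the mean image pixel-wise, then keep the central crop x crop window."""
    top = center_crop_offset(len(image), crop)
    left = center_crop_offset(len(image[0]), crop)
    return [[image[r][c] - mean_image[r][c]
             for c in range(left, left + crop)]
            for r in range(top, top + crop)]


# Tiny stand-ins for a real image and mean image (single channel, 4 x 4).
image = [[r * 4 + c for c in range(4)] for r in range(4)]
mean_image = [[1] * 4 for _ in range(4)]

patch = preprocess(image, mean_image, crop=2)

# For a 256 x 256 image and a 227 x 227 input, the crop starts 14 pixels in.
offset_227 = center_crop_offset(256, 227)
```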
  13. 3D Object Recognition using Multi-View Convolutional Neural Networks
    • The Princeton ModelNet10 dataset was used.
    • 12 views were rendered of each mesh object.
    • The first CNN extracts the feature descriptors.
    • The second CNN uses these and outputs the class labels.
    • An accuracy of 88.1% was attained.
    Su, Hang, et al. "Multi-view convolutional neural networks for 3d shape recognition." Proceedings of the IEEE International Conference on Computer Vision. 2015.
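
One way to combine the 12 per-view descriptors into a single shape descriptor is the element-wise max ("view pooling") used in the cited paper. The sketch below illustrates that idea in plain Python; the 3-dimensional descriptors, the linear scorer standing in for the second CNN, and the class weights are all made up for the example and are not the project's actual model.

```python
def view_pool(view_descriptors):
    """Element-wise max across the per-view feature vectors."""
    return [max(values) for values in zip(*view_descriptors)]


def classify(shape_descriptor, class_weights):
    """Stand-in for the second network: linear score per class, highest wins."""
    scores = {label: sum(w * f for w, f in zip(weights, shape_descriptor))
              for label, weights in class_weights.items()}
    return max(scores, key=scores.get)


# 12 hypothetical descriptors, one per rendered view of the same object.
views = [[v, 11 - v, 5] for v in range(12)]

# Hypothetical weights for two of the ModelNet10 categories.
CLASS_WEIGHTS = {"chair": [1, 0, 0], "table": [0, 0, 1]}

pooled = view_pool(views)
label = classify(pooled, CLASS_WEIGHTS)
```

Because the max is taken per feature, the pooled descriptor does not depend on the order in which the views were rendered.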
  14. This talk is based on the course project done in collaboration with Nikunj Patel, Abhinav Kumar and Chandra Mohan Sharma for the course CS725: Foundations of Machine Learning at the Computer Science Department, IIT Bombay in Spring 2016.