• Image Classification
• Speech Recognition
• Anomaly Detection
• Genetics
• Weather Forecasting
• Spam Detection
• Ad placement on web pages

Teaching machines to do a task by observing how it is done rather than by programming them for it!

Courtesy coursera.org
behavior with labeled data.
• Make sense of new data based on prior data.
• E.g. Regression and Classification

Unsupervised
• Making inferences without any labeled data.
• Discover unknown or hidden patterns.
• E.g. Clustering and Dimensionality Reduction

Reinforcement
• Act in an environment to maximize reward.
• Build autonomous agents that learn.
• E.g. Recommendation Systems, Game Playing and Robot Navigation.
Artificial Neural Networks
• Deep Neural Networks
• Caffe
• A simple image recognition Deep Net with Caffe
• 3D shape recognition with cascaded Deep Nets
in 1960
• Inspired by the architecture of a neuron
• Multiplies each of its inputs by a corresponding weight and sums these products.
• This final sum is then passed through an activation function.

Courtesy cs.utexas.edu
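The weighted-sum-then-activation step described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the step activation, the bias term and the `AND_WEIGHTS` / `AND_BIAS` values are assumptions chosen so that the unit behaves like a logical AND.

```python
def step(z):
    """Threshold activation (assumed here): 1 if z is non-negative, else 0."""
    return 1 if z >= 0 else 0


def perceptron(inputs, weights, bias=0.0, activation=step):
    """Multiply each input by its weight, sum the products, then activate."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(weighted_sum)


# Hypothetical hand-picked parameters that make the unit compute AND.
AND_WEIGHTS = [1.0, 1.0]
AND_BIAS = -1.5

for a in (0, 1):
    for b in (0, 1):
        print(a, b, perceptron([a, b], AND_WEIGHTS, AND_BIAS))
```

Only the input pair (1, 1) pushes the weighted sum above the threshold, so the unit fires for that case alone.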
with each other
• Inspired by the architecture of the mammalian brain
• The structure is organized in the form of layers
• Has an input layer, an output layer and a few hidden layers

Courtesy codeproject.com
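The layered structure can be illustrated with a small forward pass: each layer is a list of neurons, and the outputs of one layer become the inputs of the next. The sigmoid activation, the 2-3-1 layer sizes and every weight in `NETWORK` are hypothetical values picked for this sketch.

```python
import math


def sigmoid(z):
    """Smooth activation that squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense_layer(inputs, weights, biases):
    """One layer: every neuron (one weight row each) sees all inputs."""
    return [
        sigmoid(sum(x * w for x, w in zip(inputs, row)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Feed the input layer through each subsequent layer in order."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations


# Hypothetical network: 2 inputs -> 3 hidden neurons -> 1 output neuron.
NETWORK = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),
    ([[0.7, -0.5, 0.2]], [0.05]),
]

print(forward([1.0, 0.0], NETWORK))
```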
of increasing computing power
• Large number of hidden layers, e.g. ResNet has ~150 layers
• Popular architectures:
  • Convolutional Neural Net (CNN)
  • Recurrent Neural Net (RNN)
  • Fully Connected Neural Net
  • Autoencoders
  • Generative Adversarial Net (GAN)

Courtesy quora.com
recognition/classification.
No need for difficult feature engineering.
Has pushed image recognition accuracy to ~92%.

Main parts of a CNN:
• Convolutional Layer
• Fully Connected Layer
• Pooling Layer
• ReLU Layer

Courtesy wikipedia.org
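Three of the parts listed above (convolution, ReLU and pooling) can be sketched in plain Python on a single-channel image. This is only an illustration under simplifying assumptions: no padding, stride 1 for the convolution, 2 x 2 max pooling, and a hypothetical hand-written edge-detecting kernel instead of learned weights. As in most CNN frameworks, the "convolution" here is a sliding dot product (cross-correlation). The fully connected layer is omitted.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image and take a dot product at each spot."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Replace every negative response with zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    return [
        [
            max(feature_map[i + a][j + b]
                for a in range(size) for b in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


# Hypothetical 5 x 5 image: dark on the left, bright on the right.
IMAGE = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hypothetical kernel that responds to a dark-to-bright vertical edge.
KERNEL = [[-1, 1], [-1, 1]]

pooled = max_pool(relu(conv2d(IMAGE, KERNEL)))
print(pooled)  # [[2, 0], [2, 0]]
```

The non-zero entries sit in the left column of the pooled map, which is where the edge lies in the input.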
process over 60M images per day with a single NVIDIA K40 GPU
• That is 1 ms/image for inference and 4 ms/image for learning.
• Switching between CPU and GPU is as easy as changing a flag!
• Simple JSON-style definition of layers (protobuf text format, .prototxt)
• Pretrained models are available for use
• Completely open source
• Developed by the Berkeley Vision and Learning Center (BVLC)
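The per-day and per-image figures quoted above are related by simple arithmetic, which the following sketch makes explicit. The helper names are hypothetical, and the calculation assumes the GPU runs continuously for 24 hours.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 milliseconds in a day


def images_per_day(ms_per_image):
    """Throughput for a given per-image latency, running non-stop."""
    return MS_PER_DAY / ms_per_image


def ms_per_image(images_in_a_day):
    """Average per-image latency implied by a daily throughput."""
    return MS_PER_DAY / images_in_a_day


print(images_per_day(1))    # inference at 1 ms/image
print(images_per_day(4))    # learning at 4 ms/image
print(ms_per_image(60e6))   # latency implied by 60M images/day
```

At 1 ms/image the GPU would handle about 86.4M images per day, which is consistent with "over 60M"; at 4 ms/image the figure is about 21.6M per day, so the 60M claim fits the inference rate rather than the learning rate.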
• Just have to write two files:
  • NetArchitecture.prototxt
  • Solver.prototxt
• Input data can be in the form of raw images or come from a database.
• Bulk image transformation is built in.
• Can generate a visualization of our net.
• Available layers: Input, Convolutional, Fully Connected, ReLU, Pooling, Softmax, Accuracy, LRN.
of input data
• Provide a mean image file
• Design individual layers
• Provide any image transformations if necessary

Solver
• Location of netArchitecture
• Learning Parameters
• Preferences
• MaxIterations
• Saving the intermediate models
• CPU/GPU flag
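The Solver items above map onto fields of a Caffe solver definition. As a hedged sketch, the Python helper below only assembles the text of such a file; it does not call Caffe. The field names (`net`, `base_lr`, `lr_policy`, `max_iter`, `snapshot`, `snapshot_prefix`, `solver_mode`) are standard Caffe solver fields, while the function name, the fixed learning-rate policy, the file names and every numeric value are assumptions made for illustration.

```python
def solver_prototxt(net_path, base_lr, max_iter,
                    snapshot_every, snapshot_prefix, use_gpu):
    """Build the text of a minimal solver file (hypothetical helper)."""
    lines = [
        f'net: "{net_path}"',                    # location of the net architecture
        f"base_lr: {base_lr}",                   # learning parameter
        'lr_policy: "fixed"',                    # assumed: constant learning rate
        f"max_iter: {max_iter}",                 # maximum number of iterations
        f"snapshot: {snapshot_every}",           # save a model every N iterations
        f'snapshot_prefix: "{snapshot_prefix}"', # where intermediate models go
        f'solver_mode: {"GPU" if use_gpu else "CPU"}',  # the CPU/GPU flag
    ]
    return "\n".join(lines) + "\n"


# Hypothetical values, for illustration only.
text = solver_prototxt(
    net_path="NetArchitecture.prototxt",
    base_lr=0.01,
    max_iter=10000,
    snapshot_every=1000,
    snapshot_prefix="snapshots/net",
    use_gpu=True,
)
print(text)
```

Writing `text` to `Solver.prototxt` and changing only `use_gpu` is what switching between CPU and GPU amounts to in this sketch.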
the available examples in the Caffe installation
• Trained model and mean image available
• BVLC Reference CaffeNet: AlexNet trained on ILSVRC 2012
• Uses 227 x 227 image data
Princeton ModelNet10 dataset was used.
• 12 views rendered of each of the mesh objects
• First CNN extracts the feature descriptors
• Second CNN uses these and gives out the class labels
• Accuracy of 88.1% attained.

Su, Hang, et al. "Multi-view convolutional neural networks for 3d shape recognition." Proceedings of the IEEE International Conference on Computer Vision. 2015.
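The slide does not say how the 12 per-view descriptors are combined before the second CNN. The cited paper merges them with an element-wise maximum across views ("view pooling"), so the sketch below assumes the same. The function name and the tiny 3-view, 4-dimensional descriptors are hypothetical stand-ins for real CNN features.

```python
def view_pool(view_descriptors):
    """Element-wise max across views: one pooled descriptor per shape."""
    return [max(values) for values in zip(*view_descriptors)]


# Hypothetical descriptors for 3 rendered views of one object
# (a real setup would have 12 views and far longer vectors).
views = [
    [0.1, 0.9, 0.0, 0.3],
    [0.5, 0.2, 0.0, 0.7],
    [0.3, 0.4, 0.6, 0.1],
]

print(view_pool(views))  # [0.5, 0.9, 0.6, 0.7]
```

The pooled vector has the same length as a single view's descriptor, so the second network sees a fixed-size input regardless of how many views were rendered.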
in collaboration with Nikunj Patel, Abhinav Kumar and Chandra Mohan Sharma for the course CS725: Foundations of Machine Learning at the Computer Science Department, IIT Bombay, in Spring 2016.