Outline for today...
1. Why deep learning now?
2. How to adopt a practical approach?
   a. Learning
   b. Data
   c. Tools & Deploy
3. Where do you go from here?
Challenge: Robust Functions

Traditional programming: Input → f(x) → Output. You write the function.
Example: a user types some text → update the database.
Challenge: testing to ensure the function is robust for all possible inputs.
Challenge: Hand-crafted Features

Classical machine learning: Input → Features → g(x) → Output. You create the features; the function is learned.
Example: a user types some text → update the database.
Challenge: how do I hand-craft the right set of features so the function can be learned?
When to use which paradigm?
- Structure of data: tabular, text, image, video, sound
- Amount of data: none, small, medium, large
- Knowledge of domain: limited, expert
First model: Transfer Learning

Pre-trained model:
- A model built on a large dataset (e.g. ImageNet)
- Most libraries ship a model zoo: architectures with final trained weights
Transfer Learning: Practicalities

Same domain, less data → retrain the last classifier layer
Same domain, more data → fine-tune the last few layers
Different domain, less data → TROUBLE !!
Different domain, more data → fine-tune a larger number of layers
Pre-trained models: Initial results

We started here! Using pre-trained models we achieved 88% accuracy with under 10 minutes of training time.
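The "same domain, less data" cell above — retrain only the last classifier layer — can be sketched without any framework: treat the pre-trained network as a frozen feature extractor and fit just a fresh logistic head on top. A minimal numpy sketch; the "frozen backbone" here is a stand-in random projection on toy data, not real ImageNet weights:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen pre-trained backbone: a fixed projection that is
# never updated. In practice this would be an ImageNet CNN with frozen weights.
W_frozen = rng.normal(size=(20, 8))

def features(x):
    """Frozen feature extractor: gradients never flow past this point."""
    return np.tanh(x @ W_frozen)

# Toy binary classification data.
X = rng.normal(size=(200, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# The only trainable part: a new logistic-regression head.
w = np.zeros(8)
b = 0.0
lr = 0.5

for _ in range(500):
    F = features(X)                          # (200, 8) frozen features
    p = 1.0 / (1.0 + np.exp(-(F @ w + b)))   # sigmoid
    grad = p - y                             # dLoss/dlogits for cross-entropy
    w -= lr * F.T @ grad / len(X)            # update head weights only
    b -= lr * grad.mean()

acc = ((p > 0.5) == y).mean()
print(f"head-only training accuracy: {acc:.2f}")
```

Because only an 8-parameter head is trained, this is fast and works with little data — exactly why the table recommends it for the small-data, same-domain case.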
Training: Data Parallelism

Each worker holds a full copy of the model and a different shard of the data, so gradients need to be synchronized during the backward pass. MXNet uses data parallelism by default.
http://timdettmers.com/2014/10/09/deep-learning-data-parallelism/
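The gradient-synchronization step can be illustrated framework-free: each worker computes gradients on its own data shard, then an all-reduce averages them so every replica applies the same update. A minimal numpy sketch; `all_reduce_mean` is an illustrative stand-in, not a real API:

```python
import numpy as np

rng = np.random.default_rng(1)

# One model replicated on every worker: y = w * x, squared-error loss.
w = 0.0
X = rng.normal(size=(12,))
Y = 3.0 * X                                # ground-truth slope is 3
shards = np.split(X, 4), np.split(Y, 4)    # 4 workers, 4 equal data shards

def local_gradient(w, x, y):
    """Gradient of mean squared error on this worker's shard only."""
    return np.mean(2 * (w * x - y) * x)

def all_reduce_mean(grads):
    """Stand-in for the synchronization step (e.g. a ring all-reduce)."""
    return sum(grads) / len(grads)

for _ in range(100):
    # The backward pass runs independently on each worker...
    grads = [local_gradient(w, x, y) for x, y in zip(*shards)]
    # ...then gradients are synchronized before anyone updates.
    g = all_reduce_mean(grads)
    w -= 0.1 * g

print(f"learned slope: {w:.3f}")   # converges toward 3
```

With equal shard sizes, the averaged gradient equals the full-batch gradient, so all replicas stay identical — the core invariant of data parallelism.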
Training: Model Parallelism

The model itself is split across devices, so workers need to synchronize during both the forward pass and the backward pass.
http://timdettmers.com/2014/10/09/deep-learning-data-parallelism/
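Why both passes need synchronization is easy to see in a two-layer toy model split across two "devices": the forward pass must hand the activation from device 0 to device 1, and the backward pass must hand the input-gradient back. A hedged numpy sketch (the "devices" are just separate variables here):

```python
import numpy as np

rng = np.random.default_rng(2)

# A two-layer linear model split across two "devices".
W1 = rng.normal(size=(4, 3)) * 0.5   # lives on device 0
W2 = rng.normal(size=(3, 2)) * 0.5   # lives on device 1

x = rng.normal(size=(4,))

# Forward pass: device 0 computes its partition, then must SEND the
# activation h to device 1 before the next partition can run.
h = x @ W1            # device 0
y = h @ W2            # device 1 (blocked until h arrives)

# Backward pass: device 1 computes the gradient w.r.t. its input and must
# SEND it back to device 0 before device 0 can finish its backward step.
g_y = np.ones(2)               # pretend upstream gradient dLoss/dy
g_W2 = np.outer(h, g_y)        # local to device 1
g_h = W2 @ g_y                 # communicated from device 1 to device 0
g_W1 = np.outer(x, g_h)        # local to device 0

print(g_W1.shape, g_W2.shape)
```

Contrast with data parallelism above: there, communication happens once per step (gradient averaging); here, every forward and backward pass stalls on an inter-device transfer, which is why model parallelism is usually reserved for models too large to fit on one device.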
Software: Computational Graphs

Static: the model architecture is defined, the computational graph is compiled once, then the model is trained.
Dynamic: the model architecture is defined, and the computational graph is created anew for every run.
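The distinction shows up clearly in a toy sketch: a static framework fixes the graph once and replays it, while a dynamic one just executes the host language, so the graph can differ on every run (data-dependent control flow, variable depth). This is an illustration only, not any real framework's API:

```python
# Static style: the graph is defined once, then executed for every input.
# Here the "compiled graph" is simply a fixed list of named operations.
static_graph = [("square", lambda v: v * v), ("add_one", lambda v: v + 1)]

def run_static(x):
    for _name, op in static_graph:   # identical ops for every input
        x = op(x)
    return x

# Dynamic style: the sequence of operations is decided while running,
# so each input can produce a different graph.
def run_dynamic(x):
    trace = []                       # the "graph" exists only for this run
    if x < 0:                        # data-dependent control flow
        trace.append("negate"); x = -x
    while x < 10:                    # even the graph's depth depends on x
        trace.append("double"); x = x * 2
    return x, trace

print(run_static(3))                 # always square -> add_one
print(run_dynamic(-3))               # this run's graph: negate, double, double
```

The static style lets a framework optimize and deploy the graph ahead of time; the dynamic style makes models with loops and branches natural to write and debug — which is the trade-off behind the TensorFlow-vs-PyTorch discussion that follows.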
Software: TensorFlow vs PyTorch

TensorFlow: good for productionizing. PyTorch: good for rapid prototyping of ideas.

Some pointers on making the choice:
- TensorFlow does have eager execution and Fold, but PyTorch is more Pythonic and quite popular with researchers
- Horovod is quite good for distributed training on TensorFlow
- MXNet has distributed training at its core, but no widespread adoption yet
Deploy: Cloud vs Edge vs Browser
- Easier to update on the cloud
- Faster prediction on the edge, but mind energy consumption!
- Model size is HUGE! Ways to shrink it:
  - Pruning
  - Quantization (typically 8-bit)
  - Compact architectures such as SqueezeNet
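The 8-bit quantization bullet can be sketched in a few lines: map float weights linearly onto the int8 range, store only the integers plus one scale factor, and dequantize at inference. A minimal symmetric-quantization sketch in numpy (real toolchains add per-channel scales, zero-points, and calibration):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric linear quantization of a float array to int8 + a scale."""
    scale = np.abs(w).max() / 127.0          # map the largest weight to +/-127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights at inference time."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(3)
w = rng.normal(size=1000).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# 4x smaller than float32 storage, at the cost of a small rounding error.
print("max abs error:", np.abs(w - w_hat).max())
print("bytes: float32 =", w.nbytes, " int8 =", q.nbytes)
```

Storage drops from 4 bytes to 1 byte per weight, and the worst-case rounding error is half a quantization step — the reason 8-bit is the typical sweet spot for edge deployment.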