Deep Learning for Images
A practitioner’s perspective
Amit Kapoor
amitkaps.com
Bargava Subramanian
bargava.com
Slide 2
Practitioners
Amit & Bargava
Slide 3
Outline for today...
1. Why deep learning now?
2. How to adopt a practical approach?
a. Learning
b. Data
c. Tools & Deploy
3. Where do you go from here?
Slide 4
Outline for today...
1. Why deep learning now?
2. How to adopt a practical approach?
a. Learning
b. Data
c. Tools & Deploy
3. Where do you go from here?
Slide 5
Classical Programming Paradigm
Input → ? → Output
Slide 6
Classical Programming Paradigm
Input → ? → Output
Example: the user types the text "4" → update the database with "4"
Slide 7
Task: Write the function
Input → f(x) → Output
Write the function f(x) by hand. Example: the user types the text "4" → update the database with "4"
Slide 8
Challenge: Robust Functions
Input → f(x) → Output
Write the function f(x) by hand. Example: the user types the text "4" → update the database with "4"
Challenge: test to ensure it is robust for all possible inputs
Slide 9
Learning Paradigm
Input → ? → Output
Learn the function from examples. Example: the user types the text "4" → update the database with "4"
Slide 10
Task: Create Features & Learn
Input → Features → g(x) → Output
Create the features by hand, then learn the function g(x). Example: the user types the text "4" → update the database with "4"
Slide 11
Challenge: Hand-crafted features
Input → Features → g(x) → Output
Create the features by hand, then learn the function g(x). Example: the user types the text "4" → update the database with "4"
Challenge: how do I hand-craft the right set of features to learn the function?
When to use which paradigm?
- Structure of Data (tabular, text, image, video, sound)
- Amount of Data (none, small, medium, large)
- Knowledge of Domain (limited, expert)
Slide 14
Learning Paradigm
Traditional Machine Learning: Input → create Features (by hand) → learn g(x) → Output
Slide 15
Traditional Machine Learning: Input → create Features (by hand) → learn g(x) → Output
Deep Learning: Input → learn Features → learn h(x) → Output
Slide 16
What is deep about it?
Example: the user types the text "4" → Layer 1 → Layer 2 → Layer 3 → Layer 4 → update the database with "4"
Learning higher-order representations
Slide 17
What is deep about it?
Source: Deep Learning by Francois Chollet
Slide 18
Why now?
- Access to more Data
- Faster Compute (using GPUs)
- Clever Algorithmic choices
Slide 19
Open Discussion on Use Cases
- Tabular
- Text
- Image
- Video
- Speech
Slide 20
Outline for today...
1. Why deep learning now?
2. How to adopt a practical approach?
a. Learning
b. Data
c. Tools & Deploy
3. Where do you go from here?
Slide 21
Image: Logo Detection
Industry: Ad Tech
Objective: User engagement
Outcome: Targeted ads on digital media
Slide 22
Image: Traffic Sign Detection
Industry: Self-driving cars
Objective: Traffic sign adherence
Outcome: Traffic sign shown in the native language
Model: Convolutional Neural Network
Key ideas for images
1. Local Receptive Fields
2. Shared Weights
3. Sub-sampling
Slide 25
Local Receptive Fields: Convolution
Input Image → Conv Kernel → Output
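As a rough sketch of what a convolution does (plain NumPy here is an assumption; the slides show only a diagram), the kernel slides over the image and produces a weighted sum at every position:

    import numpy as np

    def conv2d(image, kernel):
        # Slide the kernel over the image; each output pixel is a weighted sum
        # of the local receptive field it covers.
        kh, kw = kernel.shape
        oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
        out = np.zeros((oh, ow))
        for i in range(oh):
            for j in range(ow):
                out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
        return out

    image = np.random.rand(5, 5)            # toy grayscale image
    kernel = np.array([[1, 0, -1],
                       [1, 0, -1],
                       [1, 0, -1]])         # simple vertical-edge detector
    print(conv2d(image, kernel).shape)      # (3, 3)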
Slide 26
Shared Weights: localized feature maps
- One feature map detects a single kind of localized feature
- Use several feature maps
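One way to see weight sharing (Keras is assumed here; the talk does not prescribe it): the parameter count of a convolutional layer depends only on the kernel size and the number of feature maps, never on the image size.

    import tensorflow as tf

    inp = tf.keras.Input(shape=(224, 224, 3))
    out = tf.keras.layers.Conv2D(32, (3, 3))(inp)   # 32 feature maps, 3x3 kernels
    model = tf.keras.Model(inp, out)
    # 3*3*3*32 weights + 32 biases = 896 parameters, the same for any image size
    print(model.count_params())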
Slide 27
Sub-sampling: Max Pooling
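A minimal NumPy sketch of max pooling (an illustration, not code from the talk): keep only the strongest activation in each non-overlapping window.

    import numpy as np

    def max_pool(feature_map, size=2):
        # Split the map into size x size windows and keep the max of each.
        h, w = feature_map.shape
        x = feature_map[:h - h % size, :w - w % size]
        return x.reshape(h // size, size, w // size, size).max(axis=(1, 3))

    x = np.arange(16).reshape(4, 4)
    print(max_pool(x))   # [[ 5  7]
                         #  [13 15]]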
Slide 28
CNN: Architecture
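Putting the three ideas together, a minimal Keras sketch of a CNN (illustrative only; the layer sizes and class count are assumptions, not the architecture used in the project):

    import tensorflow as tf
    from tensorflow.keras import layers

    model = tf.keras.Sequential([
        layers.Conv2D(32, (3, 3), activation='relu',
                      input_shape=(64, 64, 3)),    # local receptive fields + shared weights
        layers.MaxPooling2D((2, 2)),               # sub-sampling
        layers.Conv2D(64, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation='relu'),
        layers.Dense(10, activation='softmax'),    # assume 10 output classes
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])
    model.summary()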
Slide 29
Outline for today...
1. Why deep learning now?
2. How to adopt a practical approach?
a. Learning
b. Data
c. Tools & Deploy
3. Where do you go from here?
Slide 30
Data: Input Structure
- Varied input sizes
- Color images
- Around 20k images
Slide 31
Input: Pre-processing
- Zero-centered
X = X - np.mean(X, axis = 0)
- Normalization
X = X / np.std(X, axis = 0)
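As a self-contained sketch (the array shape below is an assumption: images already resized to a fixed size and stacked into one NumPy array):

    import numpy as np

    # Stand-in for the real dataset: (num_images, height, width, channels)
    X = np.random.rand(100, 224, 224, 3).astype(np.float32)

    X -= np.mean(X, axis=0)          # zero-center every pixel position across the dataset
    X /= np.std(X, axis=0) + 1e-8    # normalize; the epsilon guards against zero variance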
Slide 32
In the wild: CNN from scratch (1/2)
- Define architecture
- Smart weight initialization (e.g. Xavier)
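In Keras (assumed here), Xavier initialization goes by the name Glorot and is in fact the default for convolutional and dense layers; a sketch of setting it explicitly:

    from tensorflow.keras import layers

    # 'glorot_uniform' is the Keras name for Xavier initialization
    conv = layers.Conv2D(32, (3, 3), activation='relu',
                         kernel_initializer='glorot_uniform')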
Slide 33
In the wild: CNN from scratch (2/2)
Could we do better?
Slide 34
First model: Transfer Learning
Pre-trained model
- Model built on a large dataset (e.g. ImageNet)
- Most libraries have a model zoo: architectures with final trained weights
Slide 35
Pre-trained model: VGG16
One of the first models to approach human-level performance on ImageNet
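A sketch of pulling VGG16 from a model zoo (Keras assumed; other libraries offer the same thing under their own APIs):

    from tensorflow.keras.applications import VGG16

    # Convolutional base with ImageNet weights; drop the final classifier layers
    base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
    base.summary()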
Slide 36
Transfer Learning: Practicalities
- Same domain, less data: retrain the last classifier layer
- Same domain, more data: fine-tune the last few layers
- Different domain, less data: TROUBLE !!
- Different domain, more data: fine-tune a number of layers
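For the "same domain, less data" case, a minimal sketch (Keras assumed, class count hypothetical): freeze the pre-trained base and retrain only a new classifier head.

    from tensorflow.keras import layers, models
    from tensorflow.keras.applications import VGG16

    num_classes = 10                     # hypothetical number of target classes

    base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
    base.trainable = False               # freeze all pre-trained convolutional layers

    model = models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dense(256, activation='relu'),
        layers.Dense(num_classes, activation='softmax'),   # new classifier head
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])

Fine-tuning the last few layers is the same idea, with only the top convolutional block left trainable.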
Slide 37
Pre-trained models: Initial results
- Same domain, less data: retrain the last classifier layer
- Same domain, more data: fine-tune the last few layers
- Different domain, less data: TROUBLE !!
- Different domain, more data: fine-tune a number of layers
We started here! Using pre-trained models we achieved 88% accuracy, with < 10 min train time.
Slide 38
Client needed 95% accuracy
Needed more data!
Slide 39
Outline for today...
1. Why deep learning now?
2. How to adopt a practical approach?
a. Learning
b. Data
c. Tools & Deploy
3. Where do you go from here?
Generation: Why?
- Need images in different conditions, e.g. snow, rain, fog
- Models and compute are better than manually coding many possibilities
Slide 43
Data: Generation
- Neural Style Transfer
- Generative Adversarial Network
Slide 44
Generation: Neural Style Transfer
The content of one image fused with the style of another image
*This is illustrative. Not real output from the model(s)
Slide 45
Generation: GAN
Slide 46
Training: Challenges
- Training takes a lot of time
- More data
- Complex model
Slide 47
Training: Parallelism
- Data Parallelism
- Model Parallelism
Slide 48
Training: Data parallelization
http://timdettmers.com/2014/10/09/deep-learning-data-parallelism/
Need to synchronize gradients during the backward pass.
MXNet uses data parallelism by default.
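A one-line illustration of data parallelism (PyTorch here is an assumption; the slide notes MXNet does this by default): every GPU gets a slice of each batch, and gradients are synchronized during the backward pass.

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))  # toy model
    if torch.cuda.device_count() > 1:
        # Replicates the model on each GPU, splits every batch across them,
        # and averages gradients on the backward pass.
        model = nn.DataParallel(model)
    model = model.to('cuda' if torch.cuda.is_available() else 'cpu')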
Slide 49
Training: Model parallelization
Need to synchronize for both the forward pass and the backward pass.
http://timdettmers.com/2014/10/09/deep-learning-data-parallelism/
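A minimal model-parallel sketch (PyTorch assumed, two hypothetical GPUs): different layers live on different devices, so activations cross devices in the forward pass and gradients cross back in the backward pass.

    import torch
    import torch.nn as nn

    class TwoDeviceNet(nn.Module):
        # Assumes two GPUs are available as cuda:0 and cuda:1
        def __init__(self):
            super().__init__()
            self.part1 = nn.Linear(512, 256).to('cuda:0')   # first half on GPU 0
            self.part2 = nn.Linear(256, 10).to('cuda:1')    # second half on GPU 1

        def forward(self, x):
            x = torch.relu(self.part1(x.to('cuda:0')))
            return self.part2(x.to('cuda:1'))               # activations hop devices here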
Slide 50
Outline for today...
1. Why deep learning now?
2. How to adopt a practical approach?
a. Learning
b. Data
c. Tools & Deploy
3. Where do you go from here?
Slide 51
Code: Tools
- Hardware
- Software
Slide 52
Hardware: GPU (no brainer)
- Single GPU?
- Cluster?
- Cloud?
- Build your own?
It depends on the problem(s)
Slide 53
Software: Computational Graph
- Static: model architecture defined → computational graph compiled → model trained
- Dynamic: model architecture defined → computational graph created for every run
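A tiny illustration of the dynamic style (PyTorch assumed): the graph is built as the Python code runs, so ordinary control flow can change it on every forward pass.

    import torch

    x = torch.randn(4, requires_grad=True)
    # The graph is recorded as this code executes; the branch taken can differ per run.
    y = (x * 2).sum() if x.sum() > 0 else (x ** 2).sum()
    y.backward()
    print(x.grad)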
Slide 54
Software: TensorFlow vs PyTorch
TensorFlow: good for productionizing
PyTorch: good for rapid prototyping of ideas
Some pointers on making the choice:
- TensorFlow does have eager execution and Fold, but PyTorch is more Pythonic and quite popular with researchers
- Horovod is quite good for distributed training on TensorFlow
- MXNet has distributed training at its core, but no widespread adoption yet
Deploy: Cloud vs Edge vs Browser
- Easier to update on cloud
- Faster prediction on edge
- Energy consumption!
- Model size is HUGE!
- Pruning
- Quantization (typically 8-bit; see the sketch below)
- SqueezeNet
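A sketch of 8-bit post-training quantization for edge deployment (TensorFlow Lite is assumed here; the talk only names the technique):

    import tensorflow as tf

    # Stand-in for a trained Keras model (e.g. the fine-tuned network from earlier)
    model = tf.keras.Sequential([tf.keras.layers.Dense(10, input_shape=(64,))])

    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]   # quantizes weights to 8 bits
    tflite_model = converter.convert()
    with open('model_quantized.tflite', 'wb') as f:
        f.write(tflite_model)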
Slide 57
Outline for today...
1. Why deep learning now?
2. How to adopt a practical approach?
a. Learning
b. Data
c. Tools & Deploy
3. Where do you go from here?
Slide 58
Where do you go from here?
- Learn deep learning: resource link
- Practice, Practice, Practice!
- Take an iterative approach
Slide 59
Deep Learning
A practitioner’s perspective
Amit Kapoor
amitkaps.com
Bargava Subramanian
bargava.com