Deep Learning for Image

A practitioner's perspective on doing Deep Learning for Image

Amit Kapoor

March 10, 2018

Transcript

  1. Deep Learning
    for Image
    A practitioner’s perspective
    Amit Kapoor
    amitkaps.com
    Bargava Subramanian
    bargava.com

  2. Practitioners
    Amit Bargava

  3. Outline for today...
    1. Why deep learning now?
    2. How to adopt a practical approach?
    a. Learning
    b. Data
    c. Tools & Deploy
    3. Where do you go from here?

  4. Outline for today...
    1. Why deep learning now?
    2. How to adopt a practical approach?
    a. Learning
    b. Data
    c. Tools & Deploy
    3. Where do you go from here?

  5. Classical Programming Paradigm
    Input → ? → Output

  6. Classical Programming Paradigm
    Input → ? → Output
    e.g. the user types the text "4" → update database with 4
  7. Task: Write the function
    Input → f(x) → Output
    Write the function: e.g. the user types the text "4" → update database with 4
  8. Challenge: Robust Functions
    Input → f(x) → Output
    e.g. the user types the text "4" → update database with 4
    Challenge: test to ensure the function is robust for all possible inputs

  9. Learning Paradigm
    Input → ? → Output
    Learn the function: e.g. the user types the text → update database with 4

  10. Task: Create Features & Learn
    Input → Features → g(x) → Output
    Create features, then learn the function: e.g. the user types the text → update database with 4

  11. Challenge: Hand-crafted features
    Input → Features → g(x) → Output
    Create features, then learn the function
    Challenge: how do I hand-craft the right set of features to learn the function?

  12. Two Paradigms
    Programming Paradigm: Input → f(x) (create) → Output
    Learning Paradigm: Input → Features (create) → g(x) (learn) → Output

  13. When to use which paradigm?
    - Structure of Data (tabular, text, image, video, sound)
    - Amount of Data (none, small, medium, large)
    - Knowledge of Domain (limited, expert)

  14. Learning Paradigm
    Traditional Machine Learning: Input → Features (create) → g(x) (learn) → Output

  15. Deep Learning
    Traditional Machine Learning: Input → Features (create) → g(x) (learn) → Output
    Deep Learning: Input → Features (learn) → h(x) (learn) → Output

  16. What is deep about it?
    Layer 1 → Layer 2 → Layer 3 → Layer 4
    Learning Higher Order Representations: e.g. the user types the text → update database with 4

  17. What is deep about it?
    Source: Deep Learning by Francois Chollet

  18. Why now?
    - Access to more Data
    - Faster Compute (using GPUs)
    - Clever Algorithmic choices

  19. Open Discussion on Use Cases
    - Tabular
    - Text
    - Image
    - Video
    - Speech

  20. Outline for today...
    1. Why deep learning now?
    2. How to adopt a practical approach?
    a. Learning
    b. Data
    c. Tools & Deploy
    3. Where do you go from here?

  21. Image: Logo Detection
    Industry: Ad Tech
    Objective: User engagement
    Outcome: Targeted ads on digital media

  22. Image: Traffic Sign Detection
    Industry: Self-driving cars
    Objective: Traffic Sign Adherence
    Outcome: Traffic Sign in native language

  23. Architecture Design
    Spatial structure
    Weights/Overfitting

  24. Model: Convolutional Neural Network
    Key ideas for image
    1. Local Receptive Fields
    2. Shared Weights
    3. Sub-sampling

  25. Local Receptive Fields: Convolution
    Input Image → Conv Kernel → Output
  26. Shared Weights: localized feature maps
    - One feature map detects a single kind of localized feature
    - Use several feature maps

  27. Sub-sampling: Max Pooling
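Sub-sampling by max pooling keeps only the strongest activation in each small window, halving the spatial size. A minimal numpy sketch for the common 2x2 window with stride 2 (assuming even input dimensions):

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2: keep the largest value in each
    non-overlapping 2x2 window (assumes even height and width)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[1., 3., 2., 0.],
              [4., 2., 1., 1.],
              [0., 1., 5., 6.],
              [2., 2., 7., 8.]])
pooled = max_pool_2x2(x)
print(pooled)
# [[4. 2.]
#  [2. 8.]]
```

The reshape groups each 2x2 window onto its own axes so a single `max` reduces them; pooling has no weights to learn, it only discards spatial detail.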

  28. CNN: Architecture

  29. Outline for today...
    1. Why deep learning now?
    2. How to adopt a practical approach?
    a. Learning
    b. Data
    c. Tools & Deploy
    3. Where do you go from here?

  30. Data: Input Structure
    - Varied input sizes
    - Color images
    - Around 20k images

  31. Input: Pre-processing
    (assuming import numpy as np)
    - Zero-centered
    X = X - np.mean(X, axis=0)
    - Normalization
    X = X / np.std(X, axis=0)

  32. In the wild: CNN from scratch (1/2)
    - Define architecture
    - Smart weight initialization (e.g. Xavier)
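Smart initialization keeps activation magnitudes roughly constant across layers. The Xavier/Glorot scheme mentioned above draws zero-mean weights with variance 2 / (fan_in + fan_out); a minimal numpy sketch:

```python
import numpy as np

def xavier_init(fan_in, fan_out, rng=None):
    """Xavier/Glorot initialization: zero-mean Gaussian weights with
    variance 2 / (fan_in + fan_out), chosen so activation and gradient
    magnitudes stay roughly stable as depth grows."""
    if rng is None:
        rng = np.random.default_rng(0)
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_in, fan_out))

W = xavier_init(512, 256)
print(W.shape)  # (512, 256)
```

Naive large random weights saturate activations; weights that are too small shrink signals layer by layer, and this scaling is the compromise.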

  33. In the wild: CNN from scratch (2/2)
    Could we do better?

  34. First model: Transfer Learning
    Pre-trained model
    - Model built on a large dataset (e.g. ImageNet)
    - Most libraries have a model zoo - architectures with final trained weights
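Retraining only the last classifier layer amounts to fitting a simple classifier on the fixed features the pre-trained network produces for each image. A minimal numpy sketch of that final layer, using made-up feature vectors in place of real extracted features (a real pipeline would get them from a model-zoo network such as VGG16):

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-ins for features from a frozen pre-trained network:
# 256 "images", 64-d feature vectors, binary labels (synthetic data)
n, d = 256, 64
features = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
labels = (features @ true_w > 0).astype(float)

# Train only the new classifier layer (logistic regression) on frozen features
w, b = np.zeros(d), 0.0
lr = 0.5
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(features @ w + b)))   # sigmoid output
    w -= lr * features.T @ (p - labels) / n         # gradient of log-loss
    b -= lr * np.mean(p - labels)

accuracy = np.mean(((features @ w + b) > 0) == labels)
print(accuracy)
```

Because only this small layer is trained, training is fast even on modest hardware, which is consistent with the short train times reported later in the deck.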

  35. Pre-trained model: VGG16
    One of the first models to approach human-level performance on ImageNet

  36. Transfer Learning: Practicalities
                         Less Data                        More Data
    Same Domain          Retrain last classifier layer    Fine tune last few layers
    Different Domain     TROUBLE !!                       Fine tune a number of layers

  37. Pre-trained models: Initial results
                         Less Data                        More Data
    Same Domain          Retrain last classifier layer    Fine tune last few layers
    Different Domain     TROUBLE !!                       Fine tune a number of layers
    We started here! Using pre-trained models we achieved 88% accuracy, with < 10 min train time.

  38. Client needed 95% accuracy
    Needed more data!

  39. Outline for today...
    1. Why deep learning now?
    2. How to adopt a practical approach?
    a. Learning
    b. Data
    c. Tools & Deploy
    3. Where do you go from here?

  40. More data: Approaches
    - Augmentation
    - Generation

  41. Data: Augmentation
    Any combination(s) of:
    - Horizontal/Vertical Flips
    - Scale
    - Random cropping
    - Jitter
    - Translation
    - Rotation
    - Stretching
    - Shearing
    - Lens distortion
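Several of these augmentations are one-liners on a numpy image array. A sketch producing flipped, rotated, and randomly cropped variants of a single image (the 32x32 image and 24x24 crop sizes are arbitrary choices for illustration):

```python
import numpy as np

def augment(image, rng=None):
    """Return a few label-preserving variants of one grayscale image:
    flips, a 90-degree rotation, and a random crop."""
    if rng is None:
        rng = np.random.default_rng(0)
    variants = [
        np.fliplr(image),   # horizontal flip
        np.flipud(image),   # vertical flip
        np.rot90(image),    # 90-degree rotation
    ]
    # random 24x24 crop out of the (assumed 32x32) image
    top = rng.integers(0, image.shape[0] - 24 + 1)
    left = rng.integers(0, image.shape[1] - 24 + 1)
    variants.append(image[top:top + 24, left:left + 24])
    return variants

image = np.arange(32 * 32, dtype=float).reshape(32, 32)
variants = augment(image)
for v in variants:
    print(v.shape)
```

Each variant keeps the original label, so a 20k-image dataset can yield several times as many training examples; in practice a framework utility would apply these on the fly per batch.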

  42. Generation: Why?
    - Need images in different conditions, e.g. snow, rain, fog
    - Generating them with models and compute beats manually coding the many possibilities

  43. Data: Generation
    - Neural Style Transfer
    - Generative Adversarial Network

  44. Generation: Neural Style Transfer
    Content of an image fused with style of another image
    *This is illustrative. Not real output from the model(s)

  45. Generation: GAN

  46. Training: Challenges
    - Training takes a lot of time
    - More data
    - Complex model

  47. Training: Parallelism
    - Data Parallelism
    - Model Parallelism

  48. Training: Data parallelization
    - Need to synchronize gradients during the backward pass
    - MXNet uses data parallelism by default
    http://timdettmers.com/2014/10/09/deep-learning-data-parallelism/
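The gradient synchronization can be seen in a toy numpy example: each "worker" computes the gradient on its shard of the batch, and averaging the shard gradients reproduces the full-batch gradient exactly, because the loss is a mean over examples (a conceptual sketch, not a distributed implementation):

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(64, 8))   # one batch of 64 examples, 8 features
y = rng.normal(size=64)
w = rng.normal(size=8)         # current model weights

def grad_mse(Xs, ys, w):
    """Gradient of the mean squared error 0.5 * mean((Xs @ w - ys)**2)."""
    return Xs.T @ (Xs @ w - ys) / len(ys)

# "Workers": split the batch into 4 equal shards, one gradient each
shard_grads = [grad_mse(Xs, ys, w)
               for Xs, ys in zip(np.split(X, 4), np.split(y, 4))]

# Synchronization step: average the shard gradients
synced = np.mean(shard_grads, axis=0)
full = grad_mse(X, y, w)
print(np.allclose(synced, full))  # True
```

The equality holds exactly only for equal shard sizes; in real systems the averaging is the all-reduce step that must complete during each backward pass.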

  49. Training: Model parallelization
    - Need to synchronize for both the forward pass and the backward pass
    http://timdettmers.com/2014/10/09/deep-learning-data-parallelism/

  50. Outline for today...
    1. Why deep learning now?
    2. How to adopt a practical approach?
    a. Learning
    b. Data
    c. Tools & Deploy
    3. Where do you go from here?

  51. Code: Tools
    - Hardware
    - Software

  52. Hardware: GPU (no brainer)
    - Single GPU?
    - Cluster?
    - Cloud?
    - Build your own?
    It depends on the problem(s)

  53. Software: Computational Graph
    Static: model architecture defined → computational graph compiled → model trained
    Dynamic: model architecture defined → computational graph created for every run
  54. Software: TensorFlow vs PyTorch
    TensorFlow: good for productionizing
    PyTorch: good for rapid prototyping of ideas
    Some pointers on making the choice:
    - TensorFlow does have eager execution and Fold - but PyTorch is more Pythonic and quite popular with researchers
    - Horovod is quite good for distributed training on TensorFlow
    - MXNet has distributed training at its core - but no widespread adoption yet

  55. Deploy: Production
    - Cloud (REST API - TensorFlow Serving, Flask)
    - Edge (CoreML, TensorFlow Lite)
    - Browser (deeplearn.js / mxnet.js / keras.js)

  56. Deploy: Cloud vs Edge vs Browser
    - Easier to update on cloud
    - Faster prediction on edge
    - Energy consumption!
    - Model size is HUGE! Ways to shrink it:
      - Pruning
      - Quantization (typical: 8 bit)
      - SqueezeNet
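Affine 8-bit quantization, the typical scheme noted above, maps a float tensor onto 256 integer levels. A minimal numpy sketch showing the 4x size reduction and the bounded round-trip error:

```python
import numpy as np

def quantize_8bit(w):
    """Map float weights onto 256 uint8 levels (affine quantization):
    store integers plus a per-tensor scale and offset."""
    lo, hi = w.min(), w.max()
    scale = (hi - lo) / 255.0
    q = np.round((w - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize(q, scale, lo):
    """Recover approximate float weights from the stored integers."""
    return q.astype(np.float32) * scale + lo

w = np.random.default_rng(0).normal(size=1000).astype(np.float32)
q, scale, lo = quantize_8bit(w)
w_hat = dequantize(q, scale, lo)

print(w.nbytes, q.nbytes)  # 4000 1000 -> 4x smaller than float32
```

Rounding to the nearest level keeps the per-weight error within half a quantization step, which is why accuracy usually degrades little; frameworks add per-channel scales and calibration on top of this idea.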

  57. Outline for today...
    1. Why deep learning now?
    2. How to adopt a practical approach?
    a. Learning
    b. Data
    c. Tools & Deploy
    3. Where do you go from here?

  58. Where do you go from here?
    - Learn deep learning: resource link
    - Practice, Practice, Practice!
    - Take an iterative approach

  59. Deep Learning
    A practitioner’s perspective
    Amit Kapoor
    amitkaps.com
    Bargava Subramanian
    bargava.com
