How to Patch Image Classifiers

A workshop on the architecture and benefits of patched classification in the context of images of food. At Cookpad, we needed to automatically filter images of food versus... everything else. But we ran into some images that confused our binary classifier. Patched classification came to the rescue.

Presented at NIKKEI's AI Summit 2019

Leszek Rybicki

April 23, 2019

Transcript

  1. tinyurl.com/aisum-cookpad
    How to Patch Image Classifiers
    Leszek Rybicki, AI/SUM, 2019.04.23

  2. tinyurl.com/aisum-cookpad
    The Introduction
    Hello, my name is …

  3. tinyurl.com/aisum-cookpad
    Who am I?
    レシェック・リビツキ
    Leszek Rybicki
    Originally from Torun, Poland
    2005 M.Sc. in Machine Learning, Nicolaus Copernicus University
    2010 RIKEN Brain Science Institute
    2016 Cookpad R&D

  4. tinyurl.com/aisum-cookpad
    Make Everyday Cooking Fun
    Established Oct 1st 1997
    web, iOS, Android, print magazines, Cookpad TV, OiCy, mart, Komerco
    54 million monthly users in Japan
    over 5 million recipes worldwide
    71 countries, 26 languages, 11 offshore offices

  10. tinyurl.com/aisum-cookpad
    Cookpad R&D
    Researchers (NLP, CV), Engineers,
    Annotators and Project Managers
    Alexa skill in Japanese and Spanish
    Food image filtering
    Recipe similarity search
    Ingredient autocomplete

  12. tinyurl.com/aisum-cookpad
    The Outline
    • The Concept
    • The Background
    • The Architecture
    • The Benefits
    • The Summary
    Photo by Monika Grabkowska on Unsplash

  13. tinyurl.com/aisum-cookpad
    The Concept
    What an AI can see in images

  14. tinyurl.com/aisum-cookpad
    Classification
    This is an image of food.
    There is food in this image.
    Photo by Cayla1 on Unsplash

  15. tinyurl.com/aisum-cookpad
    Object Detection
    There are two dishes.
    This is where they are.

  16. tinyurl.com/aisum-cookpad
    Segmentation
    Here are the exact pixels that belong to dishes.

  17. tinyurl.com/aisum-cookpad
    Patched Classification
    These patches of the image are occupied by food dishes to varying degrees.

  18. tinyurl.com/aisum-cookpad
    The Background
    We needed to filter out images of food…

  19. tinyurl.com/aisum-cookpad
    FOOD

  20. tinyurl.com/aisum-cookpad
    NOT FOOD

  21. tinyurl.com/aisum-cookpad
    Binary Image Classifier
    We used an Inception v3 DCNN model, pre-trained on ImageNet.
    We replaced the top layers with our own:
    • a fully connected layer
    • a softmax layer that outputs a one-hot vector
    Diagram: RGB x 240 x 240 pixels → DCNN → 2048 features x 8 x 8 → global pooling → 2048 features → fully connected → one-hot class vector (food / not)
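The head described on this slide can be sketched in NumPy. This is a minimal illustration under stated assumptions, not Cookpad's code: the feature map is random stand-in data for the DCNN output, the weights are untrained, and `softmax`, `classify_head`, `weights` and `bias` are hypothetical names.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def classify_head(feature_map, weights, bias):
    """Global average pooling, then a fully connected layer, then softmax.

    feature_map: (height, width, channels) output of the convolutional base
    weights:     (channels, n_classes)
    bias:        (n_classes,)
    """
    pooled = feature_map.mean(axis=(0, 1))  # (channels,), spatial layout is gone
    logits = pooled @ weights + bias         # (n_classes,)
    return softmax(logits)

rng = np.random.default_rng(0)
feature_map = rng.normal(size=(8, 8, 2048))       # stand-in for the DCNN output
weights = rng.normal(scale=0.01, size=(2048, 2))  # untrained, illustration only
bias = np.zeros(2)

probs = classify_head(feature_map, weights, bias)  # [p(food), p(not food)]
```

In practice a softmax produces two fractions that sum to 1 rather than a strict one-hot vector, which is why a threshold is applied afterwards.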

  22. tinyurl.com/aisum-cookpad
    Threshold
    The model outputs fractions, but we need a yes or no answer.
    We found a threshold ϑ = 0.81 that maximises accuracy on our test dataset.
    • If the model’s output value is above the threshold, it’s food;
    • otherwise, it’s not food.
    Diagram: global pooling → 2048 features → fully connected → one-hot class vector (food / not), with threshold ϑ
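One way to find such a threshold is a simple sweep over candidate values. The sketch below uses made-up scores and labels, and the helper names `accuracy_at` and `best_threshold` are hypothetical; the talk does not say how the 0.81 value was actually searched for.

```python
def accuracy_at(threshold, scores, labels):
    """Fraction of images classified correctly when score > threshold means food."""
    correct = sum((s > threshold) == is_food for s, is_food in zip(scores, labels))
    return correct / len(scores)

def best_threshold(scores, labels, candidates):
    """Return the candidate threshold with the highest accuracy (first one on ties)."""
    return max(candidates, key=lambda t: accuracy_at(t, scores, labels))

# Toy data: model outputs for the "food" class and ground-truth labels.
scores = [0.95, 0.90, 0.85, 0.70, 0.60, 0.30, 0.10]
labels = [True, True, True, False, False, False, False]

candidates = [i / 100 for i in range(1, 100)]
theta = best_threshold(scores, labels, candidates)
```

Any threshold between the highest non-food score and the lowest food score separates this toy set perfectly; real data rarely has such a clean gap.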

  23. tinyurl.com/aisum-cookpad

  24. tinyurl.com/aisum-cookpad
    food / nonfood accuracy
    96%
    precision: 0.97 recall: 0.89
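For reference, these three numbers can be computed from predictions as sketched below. The data is made up, `binary_metrics` is a hypothetical helper, and treating "food" as the positive class is an assumption (the slide does not say which class is positive).

```python
def binary_metrics(predicted, actual):
    """Accuracy, precision and recall, with True ("food") as the positive class."""
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)          # food, accepted
    fp = sum(p and not a for p, a in pairs)      # not food, accepted
    fn = sum(a and not p for p, a in pairs)      # food, rejected
    tn = sum(not p and not a for p, a in pairs)  # not food, rejected
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

predicted = [True, True, True, False, False, False, False, False]
actual = [True, True, False, True, False, False, False, False]
metrics = binary_metrics(predicted, actual)
```

Under that assumption, a recall of 0.89 would mean that roughly one in nine food photos is rejected, even though almost everything that is accepted really is food.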

  25. tinyurl.com/aisum-cookpad

  26. tinyurl.com/aisum-cookpad
    4%
    96%

  27. tinyurl.com/aisum-cookpad

  28. tinyurl.com/aisum-cookpad
    EXPECTATION

  29. tinyurl.com/aisum-cookpad
    REALITY

  30. tinyurl.com/aisum-cookpad
    Problem 1
    Food that looks like Things
    Things that look like Food

  31. tinyurl.com/aisum-cookpad
    twitter: @teenybiscuit

  32. tinyurl.com/aisum-cookpad
    It happens quite a lot!
    Character bento ❤ Despicable Me's Gru and Minions bento ❤
    by ちゃ〜き
    Omu-subi: Pikachu character bento
    by みやきっちん
    Assorted variations: easy sliced-cheese rose flowers
    by mew⁂mam

  33. tinyurl.com/aisum-cookpad
    Problem 2
    Food and Things together

  34. tinyurl.com/aisum-cookpad
    On the Fence
    Food with other items
    Food with text
    Food, but small in the photo
    People with food
    Photo by Jason Leung on Unsplash

  35. tinyurl.com/aisum-cookpad
    Food is Social
    We cook and eat with our
    family and friends
    Food is at the centre of
    important events
    When we use a binary
    classifier for food photos,
    aren’t we losing something?
    Photo by Jonathan Borba on Unsplash

  36. tinyurl.com/aisum-cookpad
    The Real Problem
    with our binary classifier is…

  37. tinyurl.com/aisum-cookpad
    It is equally confused by these images.
    Photo by Petful on Flickr
    Photo by Charles on Unsplash

  38. tinyurl.com/aisum-cookpad
    Threshold is not enough
    Our model can’t distinguish between these two kinds of images using a threshold alone.
    Either both are accepted, or both are rejected.
    Diagram: food / not output with threshold ϑ
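A simplified picture of why this happens: once scores are averaged over the whole image, two very different images can collapse to the same number. The grids below are made-up per-patch food scores, and `pooled_score` is a hypothetical helper; the real model pools features rather than scores, so treat this as an analogy.

```python
def pooled_score(patch_scores):
    """Average the per-patch food scores, as global average pooling would."""
    flat = [score for row in patch_scores for score in row]
    return sum(flat) / len(flat)

# Every patch looks half like food (for example, an object that resembles food).
ambiguous = [[0.5, 0.5],
             [0.5, 0.5]]

# Half of the patches are clearly food, the other half clearly something else.
mixed = [[1.0, 1.0],
         [0.0, 0.0]]

same_score = pooled_score(ambiguous) == pooled_score(mixed)
```

Both grids pool to 0.5, so no single threshold on the pooled value can accept one and reject the other.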

  39. tinyurl.com/aisum-cookpad
    Rule of Thumb
    It’s an image of food if…
    1. The image can be cropped to contain mostly food…
    2. and the cropped image is 30% or more of the original image surface.
    Photo by Kevin Wolf on Unsplash
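The second condition is plain area arithmetic. A small sketch, with hypothetical helper names and pixel sizes chosen only for illustration (it does not check the first condition, that the crop contains mostly food):

```python
def crop_area_fraction(crop_size, image_size):
    """Share of the original image surface covered by the crop."""
    crop_w, crop_h = crop_size
    image_w, image_h = image_size
    return (crop_w * crop_h) / (image_w * image_h)

def passes_area_rule(crop_size, image_size, min_fraction=0.30):
    """True when the crop covers at least min_fraction of the original image."""
    return crop_area_fraction(crop_size, image_size) >= min_fraction

large_crop_ok = passes_area_rule((150, 120), (240, 240))  # 18000 / 57600 = 0.3125
small_crop_ok = passes_area_rule((100, 100), (240, 240))  # 10000 / 57600 is about 0.17
```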

  40. tinyurl.com/aisum-cookpad
    Solution
    Patch the Classifier!

  41. tinyurl.com/aisum-cookpad
    food / nonfood accuracy
    96%
    precision: 0.97 recall: 0.89

  43. tinyurl.com/aisum-cookpad
    98%
    food / nonfood accuracy
    precision: 0.98 recall: 0.96

  44. tinyurl.com/aisum-cookpad
    The Architecture
    How to build a Patched Classifier

  45. tinyurl.com/aisum-cookpad
    Convolutional Layers
    • Convert from colour space to feature space
    • From red, green, blue to tomato, basil and blue cheese
    • Their output scales with image size
    Diagram: RGB x 240 x 240 pixels → DCNN → 2048 features x 8 x 8 → global pooling → 2048 features → fully connected → one-hot class vector (food / not)

  46. tinyurl.com/aisum-cookpad
    Pooling
    • Allows the network to focus on the most important features
    • Reduces data size and computation time
    • Global Average Pooling removes all spatial information
    Diagram: RGB x 240 x 240 pixels → DCNN → 2048 features x 8 x 8 → global pooling → 2048 features → fully connected → one-hot class vector (food / not)

  47. tinyurl.com/aisum-cookpad
    Fully Connected Layers
    • The traditional neural network layers
    • Also called Dense Layers
    • Don’t scale with image size
    • Generate the classification
    Diagram: RGB x 240 x 240 pixels → DCNN → 2048 features x 8 x 8 → global pooling → 2048 features → fully connected → one-hot class vector (food / not)

  48. tinyurl.com/aisum-cookpad
    Diagram labels (with 1x1 conv. layer): RGB x 240 x 240 pixels, DCNN, 2048 features x 8 x 8, global pooling, 1x1 conv. layer, one-hot class vector (food / not)
    Diagram (with fully connected layer): RGB x 240 x 240 pixels → DCNN → 2048 features x 8 x 8 → global pooling → 2048 features → fully connected → one-hot class vector (food / not)

  49. tinyurl.com/aisum-cookpad
    Convolutional Classifier
    • Fully convolutional
    • Scales with image size
    • Outputs a one-hot vector
    • Can be trained on a dataset designed for classification
    Diagram labels: RGB x 240 x 240 pixels, DCNN, 2048 features x 8 x 8, global pooling, 1x1 conv. layer, one-hot class vector (food / not)
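The reason a classification dataset still works can be checked numerically: averaging and a linear layer commute, so "pool, then fully connected" gives the same pre-softmax scores as "1x1 convolution on every patch, then pool". The sketch below assumes the 1x1 convolution comes before the pooling step, and uses random stand-in features and weights with hypothetical variable names.

```python
import numpy as np

rng = np.random.default_rng(0)
feature_map = rng.normal(size=(8, 8, 2048))       # stand-in for the DCNN output
weights = rng.normal(scale=0.01, size=(2048, 2))  # shared by both heads
bias = rng.normal(size=2)

# Traditional head: pool first, then a fully connected layer.
dense_logits = feature_map.mean(axis=(0, 1)) @ weights + bias

# Convolutional head: a 1x1 convolution is the same matrix product applied
# to every patch independently; pooling happens afterwards.
patch_logits = feature_map @ weights + bias       # (8, 8, 2)
conv_logits = patch_logits.mean(axis=(0, 1))

same = np.allclose(dense_logits, conv_logits)

# The convolutional head accepts a larger feature map without any change.
bigger_map = rng.normal(size=(12, 16, 2048))
bigger_logits = (bigger_map @ weights + bias).mean(axis=(0, 1))
```

The equality holds for the scores before the softmax; applying the softmax per patch and then averaging would give a different result.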

  50. tinyurl.com/aisum-cookpad
    Patched Classifier
    • Fully convolutional
    • Outputs a “Class Activation Map” or “Heat-map”
    • More interpretable than a traditional classifier
    • Can implement our “30% area” rule, and more!
    Diagram: RGB x 240 x 240 pixels → DCNN → 2048 features x 8 x 8 → 1x1 conv. layer
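With the pooling step removed, each patch gets its own score, and an area rule becomes a few lines of code. The sketch below is one possible reading of the "30% area" rule (count confident food patches), not necessarily the rule used in production; the logits are hand-made, channel 0 standing for "food" is an assumption, and `food_heatmap` and `is_food_image` are hypothetical names.

```python
import numpy as np

def food_heatmap(patch_logits):
    """Per-patch softmax; returns the food probability for every patch."""
    shifted = patch_logits - patch_logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=-1, keepdims=True)
    return probs[..., 0]  # assumption: channel 0 is "food"

def is_food_image(heatmap, patch_threshold=0.5, min_area=0.30):
    """Accept when at least min_area of the patches look like food."""
    food_area = (heatmap > patch_threshold).mean()
    return bool(food_area >= min_area)

# Hypothetical 8x8 grid of logits: food fills the top-left 5x5 patches,
# which is 25 of 64 patches, or about 39% of the image.
patch_logits = np.zeros((8, 8, 2))
patch_logits[..., 1] = 4.0     # default: confidently "not food"
patch_logits[:5, :5, 0] = 8.0  # food region

heatmap = food_heatmap(patch_logits)
accepted = is_food_image(heatmap)
```

Unlike a crop, this count does not require the food patches to be contiguous; a stricter version could look for a rectangular region instead.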

  51. tinyurl.com/aisum-cookpad
    Fine Print
    Train Patch Classifiers on non-ambiguous images in which the subject takes up more than 80% of the space.
    You may want to add more than one 1x1 Convolutional Layer, or make it 3x3.
    Add 20% to 30% Dropout before Global Pooling, during training only, to achieve a smooth mapping.
    Use Patched Classifiers with caution, they are very powerful!
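The dropout tip might look like the sketch below: during training, a random share of the per-patch values is zeroed before averaging, so the pooled score cannot lean on a few patches. Plain element-wise (inverted) dropout is an assumption here; the talk does not specify the variant, and `dropout` is a hypothetical helper.

```python
import numpy as np

def dropout(values, rate, rng, training=True):
    """Inverted dropout: zero a random fraction of values, rescale the rest."""
    if not training or rate == 0.0:
        return values
    keep_mask = rng.random(values.shape) >= rate
    return values * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
patch_logits = rng.normal(size=(8, 8, 2))  # stand-in for the 1x1 conv output

# Training: drop about 25% of the patch values, then pool.
train_logits = dropout(patch_logits, rate=0.25, rng=rng).mean(axis=(0, 1))

# Inference: dropout is switched off, so pooling sees every patch.
eval_logits = dropout(patch_logits, rate=0.25, rng=rng, training=False).mean(axis=(0, 1))
```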

  52. tinyurl.com/aisum-cookpad
    The Benefits
    What you can do with a Patched Classifier

  53. tinyurl.com/aisum-cookpad
    Hot Dog or…
    Not Hot Dog
    We heard that detecting hot dogs is in demand thanks to an American TV show called “Silicon Valley”.

  54. tinyurl.com/aisum-cookpad
    Hot Dog Detector
    • MobileNet base
    • Image dataset from Kaggle
    • Trains in 2 hours on a laptop
    • Works with webcam stream

  55. tinyurl.com/aisum-cookpad
    https://tokyo-ml.github.io/hotdog-tf-js/

  56. tinyurl.com/aisum-cookpad
    Smart Framing
    Crop, Resize or Reshape the
    image, while keeping the food
    item in the centre of attention.
    Dahl med kylling
    by Ingeborg Andersen
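One way smart framing could work is to slide a crop window over the heat-map and keep the position with the most food in it. The talk does not describe the actual algorithm, so the brute-force search below, the `best_crop` helper and the toy heat-map are all assumptions.

```python
def best_crop(heatmap, crop_rows, crop_cols):
    """Return (top, left) of the crop window containing the most food heat."""
    rows, cols = len(heatmap), len(heatmap[0])
    best_position, best_heat = (0, 0), float("-inf")
    for top in range(rows - crop_rows + 1):
        for left in range(cols - crop_cols + 1):
            heat = sum(
                heatmap[r][c]
                for r in range(top, top + crop_rows)
                for c in range(left, left + crop_cols)
            )
            if heat > best_heat:
                best_position, best_heat = (top, left), heat
    return best_position

# Toy heat-map: the dish sits towards the right-hand side of the photo.
heatmap = [
    [0.0, 0.0, 0.1, 0.0],
    [0.0, 0.2, 0.9, 0.8],
    [0.0, 0.1, 0.9, 0.9],
    [0.0, 0.0, 0.1, 0.0],
]
top, left = best_crop(heatmap, crop_rows=2, crop_cols=2)
```

The patch coordinates would then be scaled back to pixel coordinates of the original photo before cropping.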

  57. tinyurl.com/aisum-cookpad

  58. tinyurl.com/aisum-cookpad

  59. tinyurl.com/aisum-cookpad

  60. tinyurl.com/aisum-cookpad
    Multi-Class
    Classify more than one kind of thing at a time
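With more than two output channels, every patch can vote for its own class, which gives a rough share of the image per class. The class names and scores below are invented for illustration, and `class_areas` is a hypothetical helper.

```python
from collections import Counter

def class_areas(patch_scores, class_names):
    """Share of patches won by each class (highest score per patch)."""
    winners = Counter()
    n_patches = 0
    for row in patch_scores:
        for scores in row:
            best = max(range(len(scores)), key=scores.__getitem__)
            winners[class_names[best]] += 1
            n_patches += 1
    return {name: winners[name] / n_patches for name in class_names}

class_names = ["food", "person", "background"]  # hypothetical classes
patch_scores = [
    [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1]],
    [[0.1, 0.8, 0.1], [0.2, 0.1, 0.7]],
]
areas = class_areas(patch_scores, class_names)
```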

  61. tinyurl.com/aisum-cookpad
    CLASSIFY THIS!

  62. tinyurl.com/aisum-cookpad

  63. tinyurl.com/aisum-cookpad

  64. tinyurl.com/aisum-cookpad

  65. tinyurl.com/aisum-cookpad

  66. tinyurl.com/aisum-cookpad

  67. tinyurl.com/aisum-cookpad

  68. tinyurl.com/aisum-cookpad
    test images from https://snappygoat.com/

  69. tinyurl.com/aisum-cookpad

  70. tinyurl.com/aisum-cookpad
    The Summary
    The Last Slide

  71. tinyurl.com/aisum-cookpad
    Today we Learned
    • Sometimes AI has to deal with data that is not easy to classify.
    • Food photos are an especially fun challenge!
    • Patched Classifiers work great in this case.
    • They are less “black-boxy” than normal classifiers,
    • quick and easy to build and train (just flip some layers around),
    • and can be used for many things other than classification.

  72. tinyurl.com/aisum-cookpad
    tinyurl.com/aisum-cookpad
