How to Patch Image Classifiers

A workshop on the architecture and benefits of patched classification, in the context of food images. At Cookpad, we needed to automatically separate images of food from... everything else. But some images confused our binary classifier. Patched classification came to the rescue.

Presented at NIKKEI's AI Summit 2019

Leszek Rybicki

April 23, 2019

Transcript

  1. Who am I?

    tinyurl.com/aisum-cookpad
    Leszek Rybicki
    Originally from Torun, Poland
    2005: M.Sc. in Machine Learning, Nicolaus Copernicus University
    2010: RIKEN Brain Science Institute
    2016: Cookpad R&D
  2. Make Everyday Cooking Fun

    Established Oct 1st 1997
    web, iOS, Android, print magazines, Cookpad TV
    OiCy, mart, Komerco
    54 million monthly users in Japan
    over 5 million recipes worldwide
    71 countries, 26 languages, 11 offshore offices
  8. Cookpad R&D

    Researchers (NLP, CV), Engineers, Annotators and Project Managers
    • Alexa skill in Japanese and Spanish
    • Food image filtering
    • Recipe similarity search
    • Ingredient autocomplete
  10. The Outline

    • The Concept
    • The Background
    • The Architecture
    • The Benefits
    • The Summary
    Photo by Monika Grabkowska on Unsplash
  11. Classification

    This is an image of food. There is food in this image.
    Photo by Cayla1 on Unsplash
  12. Binary Image Classifier

    We used an Inception v3 DCNN model, pre-trained on ImageNet. We replaced the top layers with our own:
    • a fully connected layer
    • a softmax layer that outputs a probability for each class (food / not food), trained against one-hot class vectors
    [Diagram: RGB image, 240 × 240 pixels → DCNN → 2048 features × 8 × 8 → global pooling → 2048 features → fully connected → one-hot class vector (food / not)]
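A minimal NumPy sketch of the replaced top layers, assuming the backbone has already produced the 2048 × 8 × 8 feature map (stood in for here by random numbers) and using hypothetical random weights rather than trained ones. In Keras, roughly the same head would be `tf.keras.layers.GlobalAveragePooling2D()` followed by `tf.keras.layers.Dense(2, activation="softmax")` on top of `tf.keras.applications.InceptionV3(include_top=False, weights="imagenet")`.

```python
import numpy as np

def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def classifier_head(features: np.ndarray, weights: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Global average pooling -> fully connected layer -> softmax.

    features: backbone output of shape (H, W, C), e.g. (8, 8, 2048)
    weights:  fully connected weights of shape (C, num_classes)
    bias:     biases of shape (num_classes,)
    Returns class probabilities of shape (num_classes,).
    """
    pooled = features.mean(axis=(0, 1))   # (C,): spatial layout is discarded here
    logits = pooled @ weights + bias      # (num_classes,)
    return softmax(logits)

rng = np.random.default_rng(0)
features = rng.random((8, 8, 2048))                 # stand-in for the Inception v3 feature map
weights = rng.normal(scale=0.01, size=(2048, 2))    # hypothetical: column 0 = food, 1 = not food
bias = np.zeros(2)
probs = classifier_head(features, weights, bias)
print(probs, probs.sum())
```

The output is a vector of fractions that sums to 1, not a literal one-hot vector, which is why a threshold is needed to turn it into a yes/no answer.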
  13. Threshold

    The model outputs fractions, but we need a yes-or-no answer. We found a threshold ϑ = .81 that maximises accuracy on our test dataset:
    • if the model’s output value is above the threshold, it’s food;
    • otherwise, it’s not food.
    [Diagram: global pooling → 2048 features → fully connected → one-hot class vector (food / not), split at threshold ϑ]
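One way such a threshold could be found is to sweep candidate values and keep the one with the best accuracy. This is a sketch with a hypothetical function name and toy data; the slide’s ϑ = .81 came from Cookpad’s own test set, which is not reproduced here.

```python
import numpy as np

def best_threshold(scores, labels):
    """Return (threshold, accuracy) maximising accuracy of the rule `score > threshold`.

    scores: model outputs for the "food" class, in (0, 1]
    labels: 1 for food, 0 for not food
    Ties are broken in favour of the lowest threshold.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # Candidates: 0 (everything is food) plus every observed score. With a strict ">",
    # a threshold equal to a score puts that image on the "not food" side.
    candidates = np.concatenate(([0.0], np.unique(scores)))
    best_t, best_acc = 0.0, -1.0
    for t in candidates:
        acc = np.mean((scores > t).astype(int) == labels)
        if acc > best_acc:
            best_t, best_acc = float(t), float(acc)
    return best_t, best_acc

# Toy data: food-class scores and ground-truth labels.
scores = [0.95, 0.90, 0.85, 0.70, 0.40, 0.30]
labels = [1, 1, 0, 1, 0, 0]
t, acc = best_threshold(scores, labels)
print(t, acc)   # picks 0.4 here (5 of 6 correct)
```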
  14. On the Fence

    • Food with other items
    • Food with text
    • Food, but small in the photo
    • People with food
    Photo by Jason Leung on Unsplash
  15. Food is Social

    We cook and eat with our family and friends. Food is at the centre of important events. When we use a binary classifier for food photos, aren’t we losing something?
    Photo by Jonathan Borba on Unsplash
  17. Threshold is not enough

    Our model can’t distinguish between these two kinds of images using the threshold alone. Either both get accepted, or both get rejected.
    [Diagram: food / not scale with threshold ϑ]
  18. Rule of Thumb

    It’s an image of food if…
    1. the image can be cropped to contain mostly food…
    2. and the cropped image is 30% or more of the original image surface.
    Photo by Kevin Wolf on Unsplash
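The rule of thumb can be written down directly if we assume a per-pixel boolean food mask is available (hypothetical here; in practice someone would have to annotate it). This sketch simplifies “can be cropped” by always taking the crop to be the bounding box of all food pixels, and reads “mostly food” as more than half of the crop; both choices are assumptions, not part of the original rule.

```python
import numpy as np

def rule_of_thumb(food_mask, min_area=0.30, min_food=0.5):
    """Apply the two-step rule to a boolean food mask of shape (H, W).

    1. Crop to the bounding box of all food pixels; the crop must be
       "mostly food" (assumed: more than `min_food` of its pixels).
    2. The crop must cover at least `min_area` of the original image.
    """
    mask = np.asarray(food_mask, dtype=bool)
    if not mask.any():
        return False                       # no food at all
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    crop = mask[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
    mostly_food = crop.mean() > min_food
    area_fraction = crop.size / mask.size
    return bool(mostly_food and area_fraction >= min_area)

big = np.zeros((10, 10), dtype=bool)
big[2:8, 2:8] = True       # a 6 x 6 dish: 36% of the image
small = np.zeros((10, 10), dtype=bool)
small[4:6, 4:6] = True     # a 2 x 2 dish: 4% of the image
print(rule_of_thumb(big), rule_of_thumb(small))   # True False
```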
  19. Convolutional Layers

    • Convert from colour space to feature space
    • From red, green, blue to tomato, basil and blue cheese
    • Their output scales with image size
    [Diagram: the same classifier pipeline as above: DCNN → global pooling → fully connected]
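To make “their output scales with image size” concrete: along each spatial axis, a convolution with kernel size k, stride s and padding p maps n input positions to floor((n + 2p - k) / s) + 1 output positions. The kernel, stride and padding values below are arbitrary examples, not Inception v3’s actual layers.

```python
def conv_output_size(n: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Number of output positions along one axis: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1

# Doubling the input doubles the feature map; the layer's weights stay the same.
for size in (240, 480):
    print(size, conv_output_size(size, kernel=3, stride=1, padding=1),
          conv_output_size(size, kernel=3, stride=2, padding=1))
```

Because the number of weights depends only on the kernel and channel counts, the same convolutional layers accept larger images and simply produce larger feature maps.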
  20. Pooling

    • Allows the network to focus on the most important features
    • Reduces data size and computation time
    • Global Average Pooling removes all spatial information
    [Diagram: the same classifier pipeline as above: DCNN → global pooling → fully connected]
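A small NumPy illustration of why Global Average Pooling removes all spatial information: two hypothetical feature maps with the same activation in opposite corners pool to the identical vector, and the pooled size does not depend on the map’s height or width.

```python
import numpy as np

def global_average_pool(fmap):
    """(H, W, C) feature map -> (C,) vector: average over both spatial axes."""
    return fmap.mean(axis=(0, 1))

# Hypothetical 8 x 8 x 3 feature maps with one strong activation in channel 0,
# once in the top-left corner and once in the bottom-right corner.
a = np.zeros((8, 8, 3))
a[0, 0, 0] = 64.0
b = np.zeros((8, 8, 3))
b[7, 7, 0] = 64.0
print(global_average_pool(a))   # [1. 0. 0.]
print(global_average_pool(b))   # [1. 0. 0.]  same vector: the position is lost
```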
  21. Fully Connected Layers

    • The traditional neural network layers
    • Also called Dense Layers
    • Don’t scale with image size
    • Generate the classification
    [Diagram: the same classifier pipeline as above: DCNN → global pooling → fully connected]
  22. [Diagram: two classifier heads compared.
    Convolutional head: RGB image, 240 × 240 pixels → DCNN → 2048 features × 8 × 8 → 1x1 conv. layer → global pooling → one-hot class vector (food / not)
    Fully connected head: RGB image, 240 × 240 pixels → DCNN → 2048 features × 8 × 8 → global pooling → 2048 features → fully connected → one-hot class vector (food / not)]
  23. Convolutional Classifier

    • Fully convolutional
    • Scales with image size
    • Outputs a one-hot vector
    • Can be trained on a dataset designed for classification
    [Diagram: RGB image, 240 × 240 pixels → DCNN → 2048 features × 8 × 8 → 1x1 conv. layer → global pooling → one-hot class vector (food / not)]
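Why the convolutional classifier can be trained on an ordinary classification dataset: a 1x1 convolution is the same dense layer applied at every position, and averaging is linear, so “1x1 conv, then global average pooling” yields exactly the same logits as “global average pooling, then fully connected”. This sketch assumes average (not max) pooling and no nonlinearity between the two steps; the weights are random placeholders.

```python
import numpy as np

def conv1x1(fmap, weights, bias):
    """1x1 convolution: the same dense layer applied at every (y, x) position.
    fmap: (H, W, C), weights: (C, K), bias: (K,) -> (H, W, K)"""
    return fmap @ weights + bias

rng = np.random.default_rng(0)
fmap = rng.random((8, 8, 2048))          # stand-in backbone features
weights = rng.normal(size=(2048, 2))     # hypothetical food / not-food weights
bias = rng.normal(size=2)

conv_then_pool = conv1x1(fmap, weights, bias).mean(axis=(0, 1))
pool_then_fc = fmap.mean(axis=(0, 1)) @ weights + bias
print(np.allclose(conv_then_pool, pool_then_fc))   # True
```

With max pooling, or with an activation function after the 1x1 layer, the two heads would no longer be equivalent, although the convolutional one can still be trained on image-level labels.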
  24. Patched Classifier

    • Fully convolutional
    • Outputs a “Class Activation Map” or “Heat-map”
    • More interpretable than a traditional classifier
    • Can implement our “30% area” rule, and more!
    [Diagram: RGB image, 240 × 240 pixels → DCNN → 2048 features × 8 × 8 → 1x1 conv. layer → heat-map]
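A sketch of how the heat-map could feed a simplified “30% area” rule: the patched classifier keeps the 1x1 convolution but skips pooling, applying a softmax at every one of the 8 × 8 positions. It then counts the fraction of patches whose food probability exceeds a per-patch threshold. The 0.5 per-patch threshold and the toy 2-channel features are assumptions; only the 30% figure comes from the talk. This version ignores the “crop to mostly food” step, and each patch only roughly corresponds to a region of the photo, since receptive fields overlap.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along `axis`."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def heat_map(fmap, weights, bias):
    """(H, W, C) features -> (H, W, K) per-patch class probabilities.
    The 1x1 convolution is a matrix product over channels; the softmax is
    applied separately at every spatial position (no global pooling)."""
    return softmax(fmap @ weights + bias, axis=-1)

def food_area_fraction(heat, food_class=0, threshold=0.5):
    """Fraction of patches whose food probability is above `threshold`."""
    return float((heat[..., food_class] > threshold).mean())

# Toy example: hypothetical 2-channel "features" with identity weights, so
# channel 0 acts as the food logit and channel 1 as the not-food logit.
fmap = np.zeros((8, 8, 2))
fmap[..., 1] = 2.0           # "not food" everywhere by default
fmap[0:4, 0:5, 0] = 4.0      # food in a 4 x 5 block of patches
heat = heat_map(fmap, np.eye(2), np.zeros(2))
fraction = food_area_fraction(heat)
print(fraction)              # 0.3125 (20 of 64 patches)
print(fraction >= 0.30)      # True: passes the "30% area" rule
```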
  25. Fine Print

    • Train Patch Classifiers on non-ambiguous images in which the subject takes up more than 80% of the space.
    • You may want to add more than one 1x1 convolutional layer, or make it 3x3.
    • Add 20% ~ 30% dropout before global pooling, just for training, to achieve a smooth mapping.
    • Use Patched Classifiers with caution: they are very powerful!
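The dropout advice could look like the sketch below. It assumes the dropout sits between the 1x1 convolution and global average pooling, which is one reading of “before Global Pooling”. The rate of 25% is picked from the suggested range, and the function names are hypothetical. Dropout is active only when `training=True`, so the heat-map used at inference time is unaffected.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: during training, zero each activation with probability
    `rate` and scale the survivors by 1 / (1 - rate); at inference, return x as-is."""
    if not training or rate == 0.0:
        return x
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)

def pooled_logits(fmap, weights, bias, rate=0.25, training=False, rng=None):
    """1x1 conv -> dropout (training only) -> global average pooling."""
    if rng is None:
        rng = np.random.default_rng()
    patch_logits = fmap @ weights + bias          # (H, W, K) heat-map logits
    patch_logits = dropout(patch_logits, rate, training, rng)
    return patch_logits.mean(axis=(0, 1))         # (K,) image-level logits
```

Randomly removing patch-level outputs during training means no single patch can carry the whole decision, which is one plausible explanation for the smoother maps the slide mentions.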
  26. Hot Dog or… Not Hot Dog

    We heard that detecting hot dogs is in demand thanks to an American TV show called “Silicon Valley”.
  27. Hot Dog Detector

    • MobileNet base
    • Image dataset from Kaggle
    • Trains in 2 hours on a laptop
    • Works with a webcam stream
  28. Smart Framing

    Crop, resize or reshape the image while keeping the food item at the centre of attention.
    Dahl med kylling by Ingeborg Andersen
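One possible way to use the heat-map for smart framing: centre a fixed-size crop on the food heat-map’s centre of mass, then shift it so it stays inside the image. The function name, the centre-of-mass choice and the fallback to the image centre are all assumptions for illustration; the sketch also assumes the crop is no larger than the image.

```python
import numpy as np

def smart_crop_box(heat, image_h, image_w, crop_h, crop_w):
    """Return (top, left, bottom, right) of a crop_h x crop_w box centred on the
    food heat-map's centre of mass, shifted as needed to stay inside the image.

    heat: (h, w) food probabilities on the patched classifier's coarse grid.
    """
    h, w = heat.shape
    total = heat.sum()
    if total == 0:
        cy, cx = image_h / 2, image_w / 2    # no food found: use the image centre
    else:
        ys, xs = np.mgrid[0:h, 0:w]
        # Weighted average of patch centres, converted to pixel coordinates.
        cy = ((ys + 0.5) * image_h / h * heat).sum() / total
        cx = ((xs + 0.5) * image_w / w * heat).sum() / total
    top = int(round(cy - crop_h / 2))
    left = int(round(cx - crop_w / 2))
    top = min(max(top, 0), image_h - crop_h)     # clamp inside the image
    left = min(max(left, 0), image_w - crop_w)
    return top, left, top + crop_h, left + crop_w

heat = np.zeros((8, 8))
heat[1, 6] = 1.0    # food detected near the top-right corner
print(smart_crop_box(heat, 240, 240, 120, 120))   # (0, 120, 120, 240)
```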
  29. Today we Learned

    • Sometimes AI has to deal with data that is not easy to classify.
    • Food photos are an especially fun challenge!
    • Patched Classifiers work great in this case.
    • They are less “black-boxy” than normal classifiers,
    • quick and easy to build and train (just flip some layers around),
    • and can be used for many things other than classification.