How to Patch Image Classifiers

A workshop on the architecture and benefits of patched classification, applied to images of food. At Cookpad, we needed to automatically separate images of food from... everything else. But some images confused our binary classifier, and patched classification came to the rescue.

Presented at NIKKEI's AI Summit 2019

Leszek Rybicki

April 23, 2019

Transcript

  1. tinyurl.com/aisum-cookpad How to Patch Image Classifiers, Leszek Rybicki, AI/SUM 2019.04.23
  2. The Introduction: Hello, my name is …

  3. Who am I? Leszek Rybicki
     • Originally from Torun, Poland
     • 2005: M.Sc. in Machine Learning, Nicolaus Copernicus University
     • 2010: RIKEN Brain Science Institute
     • 2016: Cookpad R&D
  4. Make Everyday Cooking Fun
     • Established Oct 1st, 1997
     • Web, iOS, Android, print magazines, Cookpad TV, OiCy, mart, Komerco
     • 54 million monthly users in Japan
     • Over 5 million recipes worldwide
     • 71 countries, 26 languages, 11 offshore offices
  10. Cookpad R&D: Researchers (NLP, CV), Engineers, Annotators and Project Managers
     • Alexa skill in Japanese and Spanish
     • Food image filtering
     • Recipe similarity search
     • Ingredient autocomplete
  12. The Outline
     • The Concept
     • The Background
     • The Architecture
     • The Benefits
     • The Summary
     (Photo by Monika Grabkowska on Unsplash)
  13. The Concept: What an AI can see in images
  14. Classification: This is an image of food. There is food in this image. (Photo by Cayla1 on Unsplash)
  15. Object Detection: There are two dishes. This is where they are.
  16. Segmentation: Here are the exact pixels which belong to dishes.
  17. Patched Classification: These patches of the image are occupied by food dishes to varying degrees.
  18. The Background: We needed to filter out images of food…
  19. FOOD
  20. NOT FOOD
  21. Binary Image Classifier: We used an Inception v3 DCNN model, pre-trained on ImageNet, and replaced the top layers with our own:
     • a fully connected layer
     • a softmax layer that outputs class probabilities (approximating a one-hot class vector: food / not)
     (Diagram: RGB × 240 × 240 pixels → DCNN → 2048 features × 8 × 8 → global pooling → 2048 features → fully connected → one-hot class vector: food / not)
  22. Threshold: The model outputs fractions, but we need a yes-or-no answer. We found a threshold ϑ = 0.81 which maximises accuracy on our test dataset.
     • If the model's output value is above the threshold, it's food.
     • Otherwise, it's not food.
     (Diagram: global pooling → 2048 features → fully connected → one-hot class vector (food / not), cut at threshold ϑ)
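The threshold search on slide 22 can be sketched in plain Python as a search over candidate cut-offs. This is a minimal illustration, not Cookpad's code: it assumes the model's food score and the true label of each test image are available as lists, and the scores below are made-up toy values (so the chosen threshold differs from the slide's ϑ = 0.81). The helper names are hypothetical.

```python
def accuracy_at(threshold, scores, labels):
    """Fraction of images classified correctly when 'score > threshold' means food."""
    correct = sum((score > threshold) == label for score, label in zip(scores, labels))
    return correct / len(labels)


def best_threshold(scores, labels):
    """Try each observed score (plus 'accept everything') as the cut-off; keep the most accurate."""
    candidates = [float("-inf")] + sorted(set(scores))
    return max(candidates, key=lambda t: accuracy_at(t, scores, labels))


def is_food(score, threshold):
    """Yes/no decision: above the threshold it's food, otherwise it's not food."""
    return score > threshold


# Hypothetical toy data: the model's food score per test image, and whether it really is food.
scores = [0.95, 0.90, 0.85, 0.70, 0.40, 0.10]
labels = [True, True, True, False, False, False]

theta = best_threshold(scores, labels)
print(theta, accuracy_at(theta, scores, labels))  # 0.7 1.0
```

As on the slide, the threshold is chosen to maximise accuracy on the same dataset it is evaluated on.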
  24. Food / non-food: accuracy 96%, precision 0.97, recall 0.89
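The numbers on slide 24 are the standard binary metrics, with food as the positive class. A minimal sketch of how they follow from the four confusion-matrix counts; the toy predictions below are hypothetical, not Cookpad's test set.

```python
def binary_metrics(predicted, actual):
    """Return (accuracy, precision, recall), treating True ('food') as the positive class."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # food accepted as food
    fp = sum(1 for p, a in pairs if p and not a)      # non-food accepted as food
    fn = sum(1 for p, a in pairs if not p and a)      # food rejected
    tn = sum(1 for p, a in pairs if not p and not a)  # non-food rejected
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical toy example: 4 food images (one missed) and 4 non-food images.
predicted = [True, True, True, False, False, False, False, False]
actual = [True, True, True, True, False, False, False, False]
print(binary_metrics(predicted, actual))  # (0.875, 1.0, 0.75)
```

Precision above recall, as on slide 24, means the classifier rejects more food images than it wrongly accepts non-food ones.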

  26. 96% correct, 4% misclassified

  28. EXPECTATION
  29. REALITY

  30. Problem 1: Food that looks like things, and things that look like food
  31. twitter: @teenybiscuit

  32. It happens quite a lot! (Example recipe photos: a Minion bento, a Pikachu character bento, sliced-cheese roses; mew⁂mam)
  33. Problem 2: Food and things together
  34. On the Fence
     • Food with other items
     • Food with text
     • Food, but small in the photo
     • People with food
     (Photo by Jason Leung on Unsplash)
  35. Food is Social: We cook and eat with our family and friends. Food is at the centre of important events. When we use a binary classifier for food photos, aren't we losing something? (Photo by Jonathan Borba on Unsplash)
  36. The Real Problem with our binary classifier is…
  37. It is equally confused by these images. (Photo by Petful on Flickr; photo by Charles on Unsplash)
  38. Threshold is not enough: Our model can't distinguish between these two kinds of images using a threshold alone. Either both are accepted, or both are rejected.
     (Diagram: the food / not score with threshold ϑ)
  39. Rule of Thumb: It's an image of food if…
     1. the image can be cropped to contain mostly food…
     2. and the cropped image covers 30% or more of the original image surface.
     (Photo by Kevin Wolf on Unsplash, annotated "30%")
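The rule of thumb on slide 39 can be written as a check on one candidate crop. A minimal sketch, assuming a per-pixel food mask is available (a binary classifier does not provide one; it only makes the rule concrete) and taking "mostly food" to mean more than half of the crop, an assumed cut-off the talk does not specify. The function name is hypothetical, and searching over possible crops is left out.

```python
def passes_rule_of_thumb(food_mask, crop, mostly=0.5, min_area=0.3):
    """Check slide 39's two conditions for one crop box.

    food_mask: 2D list of 0/1 values (1 = food pixel), all rows the same length.
    crop: (top, left, bottom, right) in pixels, bottom/right exclusive.
    mostly: assumed fraction of food needed inside the crop ("mostly food").
    min_area: minimum crop area as a fraction of the whole image (30%).
    """
    top, left, bottom, right = crop
    image_area = len(food_mask) * len(food_mask[0])
    crop_area = (bottom - top) * (right - left)
    if crop_area <= 0:
        return False
    food_pixels = sum(food_mask[r][c] for r in range(top, bottom) for c in range(left, right))
    mostly_food = food_pixels / crop_area > mostly    # condition 1
    big_enough = crop_area / image_area >= min_area   # condition 2
    return mostly_food and big_enough


# Hypothetical 10 x 10 masks: a 6 x 6 dish (36% of the image) and a 3 x 3 dish (9%).
big_dish = [[1 if 2 <= r < 8 and 2 <= c < 8 else 0 for c in range(10)] for r in range(10)]
small_dish = [[1 if r < 3 and c < 3 else 0 for c in range(10)] for r in range(10)]
print(passes_rule_of_thumb(big_dish, (2, 2, 8, 8)))    # True
print(passes_rule_of_thumb(small_dish, (0, 0, 3, 3)))  # False
```

For the small dish, enlarging the crop to reach 30% of the image dilutes it below "mostly food", so neither crop passes.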
  40. Solution: Patch the Classifier!
  41. Food / non-food: accuracy 96%, precision 0.97, recall 0.89
  43. Food / non-food: accuracy 98%, precision 0.98, recall 0.96
  44. The Architecture: How to build a Patched Classifier
  45. Convolutional Layers
     • Convert from colour space to feature space
     • From red, green, blue to tomato, basil and blue cheese
     • Their output scales with image size
     (Diagram as on slide 21)
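Slide 45's point that convolutional outputs scale with image size comes from weight sharing: the same small kernel is slid over every position, so a bigger input simply yields a bigger feature map while the weights stay the same. A minimal single-channel sketch (no padding, stride 1 in `conv2d`; real layers such as those in Inception v3 have many channels, strides and padding):

```python
def conv2d(image, kernel):
    """'Valid' 2D convolution (cross-correlation, as in CNN libraries) of one channel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def output_size(n, kernel, stride=1, padding=0):
    """Output length along one axis for a convolution with the given settings."""
    return (n + 2 * padding - kernel) // stride + 1


edge = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]  # one fixed 3 x 3 kernel
small = [[1] * 5 for _ in range(5)]          # 5 x 5 input
large = [[1] * 9 for _ in range(9)]          # 9 x 9 input, same kernel
print(len(conv2d(small, edge)), len(conv2d(large, edge)))            # 3 7
print(output_size(240, 3, stride=2), output_size(480, 3, stride=2))  # 119 239
```

A 1×1 kernel (`output_size(8, 1) == 8`) keeps the spatial size unchanged, which matters for the patched classifier later on.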
  46. Pooling
     • Allows the network to focus on the most important features
     • Reduces data size and computation time
     • Global Average Pooling removes all spatial information
     (Diagram as on slide 21)
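Global average pooling, as described on slide 46, collapses each channel to its mean over all positions. The sketch below (plain Python, with feature maps stored as H × W grids of per-position feature vectors, an assumed layout) shows both consequences: where a feature fires no longer matters, and the output length equals the channel count whatever the input size, which is what lets a fixed-size fully connected layer follow.

```python
def global_average_pool(feature_map):
    """Average an H x W grid of feature vectors into one vector (one value per channel)."""
    cells = [vector for row in feature_map for vector in row]
    return [sum(channel) / len(cells) for channel in zip(*cells)]


# Two 2 x 2 maps with 2 channels; the same activation sits in opposite corners.
top_left = [[[4.0, 0.0], [0.0, 0.0]],
            [[0.0, 0.0], [0.0, 0.0]]]
bottom_right = [[[0.0, 0.0], [0.0, 0.0]],
                [[0.0, 0.0], [4.0, 0.0]]]
print(global_average_pool(top_left), global_average_pool(bottom_right))  # [1.0, 0.0] [1.0, 0.0]

# A larger 3 x 3 map still pools to a vector with one entry per channel.
bigger = [[[1.0, 2.0]] * 3 for _ in range(3)]
print(global_average_pool(bigger))  # [1.0, 2.0]
```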
  47. Fully Connected Layers
     • The traditional neural network layers
     • Also called Dense Layers
     • Don't scale with image size
     • Generate the classification
     (Diagram as on slide 21)
  48. (Diagram: the architecture from slide 21 shown next to a variant in which the fully connected layer is replaced by a 1×1 conv. layer; labels: DCNN, global pooling, 1×1 conv. layer, RGB × 240 × 240 pixels, 2048 features × 8 × 8, one-hot class vector, food / not)
  49. Convolutional Classifier
     • Fully convolutional
     • Scales with image size
     • Outputs a one-hot vector
     • Can be trained on a dataset designed for classification
     (Diagram: DCNN, global pooling and a 1×1 conv. layer; RGB × 240 × 240 pixels → 2048 features × 8 × 8 → one-hot class vector: food / not)
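A 1×1 convolution is a fully connected layer applied independently at every spatial position. Since both it and average pooling are linear, pooling first and then applying the dense layer gives the same logits as applying it per position and pooling afterwards. The sketch below checks this on toy numbers. It is one reading of slide 49, assuming average (not max) pooling and a softmax applied only after pooling; the weights and features are made-up values.

```python
import math


def dense(vector, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * x for w, x in zip(row, vector)) + b for row, b in zip(weights, bias)]


def conv1x1(feature_map, weights, bias):
    """1x1 convolution: the same dense layer applied at each position of an H x W grid."""
    return [[dense(vector, weights, bias) for vector in row] for row in feature_map]


def global_average_pool(feature_map):
    """Average an H x W grid of vectors into a single vector."""
    cells = [vector for row in feature_map for vector in row]
    return [sum(channel) / len(cells) for channel in zip(*cells)]


# Toy 2 x 2 feature map with 3 features per position, and a 2-class head (food, not food).
features = [[[1, 0, 2], [0, 1, 0]],
            [[3, 1, 0], [0, 2, 2]]]
weights = [[1, -1, 0.5],
           [-1, 1, 0]]
bias = [0, 1]

pool_then_dense = dense(global_average_pool(features), weights, bias)  # slide 21 style head
conv_then_pool = global_average_pool(conv1x1(features, weights, bias))  # convolutional head
print(pool_then_dense, conv_then_pool)  # [0.5, 1.0] [0.5, 1.0]
assert all(math.isclose(a, b) for a, b in zip(pool_then_dense, conv_then_pool))
```

This equivalence is why the convolutional classifier can be trained on an ordinary classification dataset: with average pooling it computes the same function as the dense head.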
  50. Patched Classifier
     • Fully convolutional
     • Outputs a "Class Activation Map" or "Heat-map"
     • More interpretable than a traditional classifier
     • Can implement our "30% area" rule, and more!
     (Diagram: DCNN and a 1×1 conv. layer; RGB × 240 × 240 pixels → 2048 features × 8 × 8)
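Dropping global pooling from the convolutional classifier on slide 49 turns it into a patched classifier: the 1×1 layer's output at each position, passed through a softmax, becomes a per-patch food probability, i.e. the heat-map on slide 50. A minimal sketch, including one way to express the "30% area" rule on that map. The 0.5 per-patch cut-off, the toy feature vectors and the helper names are assumptions; with the slide's 8 × 8 grid, each patch would cover 1/64 of the image.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def patch_heatmap(feature_map, weights, bias, food_index=0):
    """Per-patch food probability: 1x1 conv (dense layer per position) + softmax, no pooling."""
    heatmap = []
    for row in feature_map:
        heat_row = []
        for vector in row:
            logits = [sum(w * x for w, x in zip(w_row, vector)) + b
                      for w_row, b in zip(weights, bias)]
            heat_row.append(softmax(logits)[food_index])
        heatmap.append(heat_row)
    return heatmap


def food_area(heatmap, patch_threshold=0.5):
    """Fraction of patches whose food probability exceeds the (assumed) per-patch cut-off."""
    cells = [p for row in heatmap for p in row]
    return sum(p > patch_threshold for p in cells) / len(cells)


def is_food_image(heatmap, min_area=0.3, patch_threshold=0.5):
    """The '30% area' rule: enough of the image must be covered by food patches."""
    return food_area(heatmap, patch_threshold) >= min_area


# Toy 2 x 2 map, 2 features per patch; identity weights make feature 0 the food logit.
weights = [[1.0, 0.0], [0.0, 1.0]]  # rows: food, not food
bias = [0.0, 0.0]
food, other = [5.0, 0.0], [0.0, 5.0]
one_dish = [[food, other], [other, other]]
two_dishes = [[food, food], [other, other]]
print(food_area(patch_heatmap(one_dish, weights, bias)))        # 0.25
print(is_food_image(patch_heatmap(one_dish, weights, bias)))    # False
print(is_food_image(patch_heatmap(two_dishes, weights, bias)))  # True
```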
  51. Fine Print
     • Train Patched Classifiers on non-ambiguous images in which the subject takes up more than 80% of the space.
     • You may want to add more than one 1×1 convolutional layer, or make it 3×3.
     • Add 20% to 30% dropout before global pooling, just for training, to achieve a smooth mapping.
     • Use Patched Classifiers with caution: they are very powerful!
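Slide 51's training tip places 20% to 30% dropout in front of global pooling and uses it only during training. A minimal plain-Python sketch of that ordering, assuming standard (inverted) element-wise dropout; `rate=0.25` and the helper names are illustrative choices, and the sketch does not demonstrate the smoother heat-maps the slide mentions.

```python
import random


def dropout(feature_map, rate=0.25, training=True, rng=random):
    """Inverted dropout on an H x W grid of feature vectors.

    During training each value is zeroed with probability `rate` and survivors are
    scaled by 1 / (1 - rate), so the expected value is unchanged; at inference it is a no-op.
    """
    if not training or rate == 0:
        return feature_map
    keep = 1.0 - rate
    return [[[x / keep if rng.random() >= rate else 0.0 for x in vector] for vector in row]
            for row in feature_map]


def global_average_pool(feature_map):
    """Average an H x W grid of vectors into a single vector."""
    cells = [vector for row in feature_map for vector in row]
    return [sum(channel) / len(cells) for channel in zip(*cells)]


features = [[[1.0, 1.0] for _ in range(8)] for _ in range(8)]  # toy 8 x 8 x 2 feature map
rng = random.Random(0)
train_pooled = global_average_pool(dropout(features, rate=0.25, rng=rng))  # training path
infer_pooled = global_average_pool(dropout(features, training=False))      # inference path
print(train_pooled, infer_pooled)
```

At inference the patched classifier drops both the dropout and the pooling, so only the 1×1 layer's per-patch outputs remain.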
  52. The Benefits: What you can do with a Patched Classifier
  53. Hot Dog or… Not Hot Dog: We heard that detecting hot dogs is in demand thanks to an American TV show called "Silicon Valley".
  54. Hot Dog Detector
     • MobileNet base
     • Image dataset from Kaggle
     • Trains in 2 hours on a laptop
     • Works with a webcam stream
  55. https://tokyo-ml.github.io/hotdog-tf-js/

  56. Smart Framing: Crop, resize or reshape the image while keeping the food item at the centre of attention. (Dahl med kylling, "dahl with chicken", by Ingeborg Andersen)
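The talk does not say how smart framing was implemented. One simple possibility, sketched below under that assumption: slide a fixed-size window over the patch heat-map, keep the window with the highest total food probability, and convert it back to pixel coordinates. The helper names and the toy heat-map are hypothetical.

```python
def best_window(heatmap, win_h, win_w):
    """(top, left) of the win_h x win_w block of patches with the largest total food score."""
    grid_h, grid_w = len(heatmap), len(heatmap[0])
    best_total, best_pos = float("-inf"), (0, 0)
    for top in range(grid_h - win_h + 1):
        for left in range(grid_w - win_w + 1):
            total = sum(heatmap[r][c]
                        for r in range(top, top + win_h)
                        for c in range(left, left + win_w))
            if total > best_total:
                best_total, best_pos = total, (top, left)
    return best_pos


def window_to_pixels(pos, win_h, win_w, image_h, image_w, grid_h, grid_w):
    """Map a window in patch units to a pixel crop box (top, left, bottom, right)."""
    top, left = pos
    patch_h, patch_w = image_h / grid_h, image_w / grid_w
    return (round(top * patch_h), round(left * patch_w),
            round((top + win_h) * patch_h), round((left + win_w) * patch_w))


heatmap = [[0.0, 0.0, 0.1, 0.0],
           [0.0, 0.1, 0.2, 0.1],
           [0.0, 0.2, 0.9, 0.8],
           [0.0, 0.1, 0.8, 0.9]]
pos = best_window(heatmap, 2, 2)
print(pos, window_to_pixels(pos, 2, 2, 240, 240, 4, 4))  # (2, 2) (120, 120, 240, 240)
```

Choosing the window shape to match a target aspect ratio would give a crop or reshape that keeps the dish in frame.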
  60. Multi-Class: Classify more than one kind of thing at a time
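With one output channel per class in the 1×1 layer, the same patch-wise idea extends to several classes at once. A minimal sketch, assuming per-patch class probabilities are already available (for example from a per-patch softmax); the class names and numbers are hypothetical.

```python
def label_patches(patch_probs, class_names):
    """Assign each patch the class with the highest probability."""
    return [[class_names[max(range(len(probs)), key=probs.__getitem__)] for probs in row]
            for row in patch_probs]


def class_areas(labels):
    """Fraction of the image (in patches) assigned to each class that appears."""
    cells = [name for row in labels for name in row]
    return {name: cells.count(name) / len(cells) for name in sorted(set(cells))}


classes = ["cake", "salad", "background"]  # hypothetical classes
patch_probs = [[[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]],
               [[0.2, 0.1, 0.7], [0.6, 0.3, 0.1]]]
labels = label_patches(patch_probs, classes)
print(labels)               # [['cake', 'salad'], ['background', 'cake']]
print(class_areas(labels))  # {'background': 0.25, 'cake': 0.5, 'salad': 0.25}
```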
  61. CLASSIFY THIS!

  67. ⏳

  68. Test images from https://snappygoat.com/

  70. The Summary: The Last Slide

  71. Today we Learned
     • Sometimes AI has to deal with data that is not easy to classify.
     • Food photos are an especially fun challenge!
     • Patched Classifiers work great in this case.
     • They are less "black-boxy" than normal classifiers,
     • quick and easy to build and train (just flip some layers around),
     • and can be used for many things other than classification.
  72. tinyurl.com/aisum-cookpad