How to Patch Image Classifiers

tinyurl.com/aisum-cookpad
How to Patch Image Classifiers
Leszek Rybicki
AI/SUM, 2019.04.23

The Introduction
Hello, my name is …

Who am I?
レシェック・リビツキ (Leszek Rybicki)
Originally from Torun, Poland
2005: M.Sc. in Machine Learning, Nicolaus Copernicus University
2010: RIKEN Brain Science Institute
2016: Cookpad R&D

Make Everyday Cooking Fun
Established Oct 1st 1997
web, iOS, Android, print magazines, Cookpad TV, OiCy, mart, Komerco
54 million monthly users in Japan
over 5 million recipes worldwide
71 countries, 26 languages, 11 offshore offices

Cookpad R&D
Researchers (NLP, CV), Engineers, Annotators and Project Managers
Alexa skill in Japanese and Spanish
Food image filtering
Recipe similarity search
Ingredient autocomplete

The Outline
• The Concept
• The Background
• The Architecture
• The Benefits
• The Summary
Photo by Monika Grabkowska on Unsplash

The Concept
What an AI can see in images

Classification
This is an image of food. There is food in this image.
Photo by Cayla1 on Unsplash

Object Detection
There are two dishes. This is where they are.

Segmentation
Here are the exact pixels that belong to dishes.

Patched Classification
These patches of the image are, to varying degrees, occupied by food dishes.

The Background
We needed to filter out images of food…

FOOD

NOT FOOD

Binary Image Classifier
We used an Inception v3 DCNN model, pre-trained on ImageNet.
We replaced the top layers with our own:
• a fully connected layer
• a softmax layer that outputs a one-hot vector
[Diagram: RGB x 240 x 240 pixels → DCNN → 2048 features x 8 x 8 → global pooling → 2048 features → fully connected → one-hot class vector (food / not)]

Threshold
The model outputs fractions; we need a yes or no answer.
We found a threshold ϑ = 0.81, which maximises the accuracy for our test dataset.
• if the model’s output value is above the threshold, it’s food
• otherwise, not food.
[Diagram: 2048 features → global pooling → fully connected → one-hot class vector (food / not) → threshold ϑ]
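The threshold search described on this slide can be sketched in a few lines. This is a minimal illustration, not the code used at Cookpad: the function names `accuracy_at` and `best_threshold`, the scan resolution, and the example scores and labels are all assumptions made for the sketch.

```python
def accuracy_at(threshold, scores, labels):
    """Fraction of examples classified correctly when score > threshold means food."""
    correct = sum((s > threshold) == bool(y) for s, y in zip(scores, labels))
    return correct / len(scores)


def best_threshold(scores, labels, steps=100):
    """Scan thresholds 0.00, 0.01, ..., 1.00 and return the most accurate one.

    On ties, the lowest such threshold is returned.
    """
    candidates = [i / steps for i in range(steps + 1)]
    return max(candidates, key=lambda t: accuracy_at(t, scores, labels))


# Hypothetical model outputs (the "food" value) and true labels (1 = food).
scores = [0.95, 0.90, 0.85, 0.70, 0.60, 0.20]
labels = [1, 1, 1, 0, 0, 0]

theta = best_threshold(scores, labels)
is_food = [s > theta for s in scores]
```

In practice the scan would run over a held-out test set, as the slide says, and the chosen value depends entirely on that data.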

food / nonfood
accuracy: 96%
precision: 0.97
recall: 0.89
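For reference, the three figures on this slide follow the standard definitions below. The confusion-matrix counts in the example are invented for illustration and are not the talk's data; `classification_metrics` is a hypothetical helper name.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from confusion-matrix counts.

    tp: food correctly accepted      fp: non-food wrongly accepted
    fn: food wrongly rejected        tn: non-food correctly rejected
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # of everything accepted, how much is food
    recall = tp / (tp + fn)      # of all food, how much was accepted
    return accuracy, precision, recall


# Made-up counts, only to show the arithmetic.
accuracy, precision, recall = classification_metrics(tp=80, fp=5, fn=10, tn=105)
```

A recall lower than the precision, as on the slide, means the classifier rejects real food more often than it accepts non-food.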

4% 96%

EXPECTATION

REALITY

Problem 1
Food that looks like Things
Things that look like Food

twitter: @teenybiscuit

It happens quite a lot!
[Example recipes: character bento (kyaraben) featuring Minions and Pikachu; a rose made from sliced cheese]

Problem 2
Food and Things together

On the Fence
Food with other items
Food with text
Food, but small in the photo
People with food
Photo by Jason Leung on Unsplash

Food is Social
We cook and eat with our family and friends
Food is at the centre of important events
When we use a binary classifier for food photos, aren’t we losing something?
Photo by Jonathan Borba on Unsplash

The Real Problem
with our binary classifier is…

It is equally confused by these images.
Photo by Petful on Flickr
Photo by Charles on Unsplash

Threshold is not enough
Our model can’t distinguish between these two kinds of images using the threshold alone.
Either both get accepted, or both are rejected.
[Diagram: food / not, threshold ϑ]

Rule of Thumb
It’s an image of food if…
1. The image can be cropped to contain mostly food…
2. …and the cropped image is 30% or more of the original image surface.
Photo by Kevin Wolf on Unsplash

Solution
Patch the Classifier!

food / nonfood
accuracy: 96%
precision: 0.97
recall: 0.89

food / nonfood
accuracy: 98%
precision: 0.98
recall: 0.96

The Architecture
How to build a Patched Classifier

Convolutional Layers
• Convert from colour space to feature space
• From red, green, blue to tomato, basil and blue cheese
• Their output scales with image size

Pooling
• Allows the network to focus on most important features
• Reduces data size and computation time
• Global Average Pooling removes all spatial information
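The last point can be checked numerically. The sketch below uses NumPy with a random array standing in for the 2048 features x 8 x 8 output of the DCNN; the channels-last layout and the helper name `global_average_pooling` are assumptions for illustration.

```python
import numpy as np


def global_average_pooling(features):
    """Average a (height, width, channels) feature map over its spatial axes."""
    return features.mean(axis=(0, 1))


rng = np.random.default_rng(0)
features = rng.random((8, 8, 2048))          # stand-in for the DCNN output
pooled = global_average_pooling(features)    # one value per feature channel

# Shuffle the 64 spatial positions: the features move around in the image,
# but their average does not change.
shuffled = features.reshape(64, 2048)[rng.permutation(64)].reshape(8, 8, 2048)
same = bool(np.allclose(pooled, global_average_pooling(shuffled)))
```

Because the pooled vector is identical for the original and the shuffled map, nothing computed after this layer can tell where in the image a feature appeared.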

Fully Connected Layers
• The traditional neural network layers
• Also called Dense Layers
• Don’t scale with image size
• Generate the classification

[Diagram, convolutional head: DCNN, global pooling, 1x1 conv. layer; RGB x 240 x 240 pixels; 2048 features x 8 x 8; one-hot class vector (food / not)]
[Diagram, original head: RGB x 240 x 240 pixels → DCNN → 2048 features x 8 x 8 → global pooling → 2048 features → fully connected → one-hot class vector (food / not)]

Convolutional Classifier
• Fully convolutional
• Scales with image size
• Outputs a one-hot vector
• Can be trained on a dataset designed for classification
[Diagram: DCNN, global pooling, 1x1 conv. layer; RGB x 240 x 240 pixels; 2048 features x 8 x 8; one-hot class vector (food / not)]
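One way to see why the convolutional classifier can be trained on an ordinary classification dataset: a 1x1 convolution applies the same linear map at every spatial position, and a linear map commutes with averaging. So applying the weights per patch and then pooling gives the same logits as pooling and then applying a fully connected layer. The NumPy sketch below checks this with random stand-in features and weights. All names and shapes are assumptions, and the exact layer order in the talk's model is not stated in the transcript.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
features = rng.standard_normal((8, 8, 2048))      # stand-in DCNN output
weights = rng.standard_normal((2048, 2)) * 0.01   # shared by both heads
bias = rng.standard_normal(2)

# Original head: global average pooling, then fully connected, then softmax.
fc_output = softmax(features.mean(axis=(0, 1)) @ weights + bias)

# Convolutional head: 1x1 convolution (the same weights at every patch),
# then global average pooling over the per-patch logits, then softmax.
patch_logits = features @ weights + bias          # shape (8, 8, 2)
conv_output = softmax(patch_logits.mean(axis=(0, 1)))

heads_agree = bool(np.allclose(fc_output, conv_output))
```

Note that the equivalence holds for the logits; applying softmax per patch before pooling would give a different result.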

Patched Classifier
• Fully convolutional
• Outputs a “Class Activation Map” or “Heat-map”
• More interpretable than a traditional classifier
• Can implement our “30% area” rule, and more!
[Diagram: RGB x 240 x 240 pixels → DCNN → 2048 features x 8 x 8 → 1x1 conv. layer]
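A rough sketch of how a heat-map could drive the “30% area” rule. The rule is simplified here to "at least 30% of the patches look like food", which is an approximation of the cropping rule, not the talk's exact implementation. The function names, the 0.5 per-patch threshold, and the choice of class index 0 for food are assumptions.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def food_heat_map(features, weights, bias, food_class=0):
    """Per-patch food probability: 1x1 convolution plus softmax, no pooling."""
    patch_probs = softmax(features @ weights + bias)
    return patch_probs[..., food_class]


def passes_area_rule(heat_map, patch_threshold=0.5, min_area=0.30):
    """True if the share of patches that look like food is at least min_area."""
    food_fraction = (heat_map > patch_threshold).mean()
    return bool(food_fraction >= min_area)


rng = np.random.default_rng(0)
features = rng.standard_normal((8, 8, 2048))      # stand-in DCNN output
weights = rng.standard_normal((2048, 2)) * 0.01   # stand-in 1x1 conv weights
heat_map = food_heat_map(features, weights, np.zeros(2))

# Hand-made heat-maps: food covering 20 of 64 patches, and 4 of 64 patches.
mostly_food = np.zeros((8, 8))
mostly_food[2:6, 2:7] = 0.9
tiny_food = np.zeros((8, 8))
tiny_food[0:2, 0:2] = 0.9
```

With these hand-made maps, `passes_area_rule(mostly_food)` accepts the image (20/64 is about 31%) and `passes_area_rule(tiny_food)` rejects it (4/64 is about 6%).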

Fine Print
Train Patch Classifiers on non-ambiguous images in which the subject takes up more than 80% of the space.
You may want to add more than one 1x1 Convolutional Layer, or make it 3x3.
Add 20% ~ 30% Dropout before Global Pooling, just for training, to achieve smooth mapping.
Use Patched Classifiers with caution, they are very powerful!

The Benefits
What you can do with a Patched Classifier

Hot Dog or… Not Hot Dog
We heard that detecting hot dogs is in demand thanks to an American TV show called “Silicon Valley”.

Hot Dog Detector
• MobileNet base
• Image dataset from Kaggle
• Trains in 2 hours on a laptop
• Works with webcam stream

https://tokyo-ml.github.io/hotdog-tf-js/

Smart Framing
Crop, Resize or Reshape the image, while keeping the food item in the centre of attention.
Dahl med kylling by Ingeborg Andersen
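As an illustration of smart framing, a heat-map can be turned into a crop box by taking the bounding box of all patches that look like food and scaling it to pixel coordinates. This is a sketch under assumptions: `food_crop_box`, the 0.5 threshold, and the hand-made heat-map are hypothetical, and a real implementation would likely add padding and respect a target aspect ratio.

```python
import numpy as np


def food_crop_box(heat_map, image_size, patch_threshold=0.5):
    """Return (left, top, right, bottom) in pixels around food patches, or None."""
    rows, cols = np.nonzero(heat_map > patch_threshold)
    if rows.size == 0:
        return None  # no patch looks like food, nothing to frame
    width, height = image_size
    patch_h = height / heat_map.shape[0]
    patch_w = width / heat_map.shape[1]
    left = int(cols.min() * patch_w)
    top = int(rows.min() * patch_h)
    right = int((cols.max() + 1) * patch_w)
    bottom = int((rows.max() + 1) * patch_h)
    return left, top, right, bottom


# Hand-made 8 x 8 heat-map for a 240 x 240 image (30-pixel patches).
heat_map = np.zeros((8, 8))
heat_map[2:5, 3:7] = 0.9   # food in rows 2..4, columns 3..6

box = food_crop_box(heat_map, image_size=(240, 240))
```

The resulting tuple uses the (left, top, right, bottom) order that many image libraries expect for cropping.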

Multi-Class
Classify more than one kind of thing at a time
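With more than two output channels, each patch gets its own class, and the image can be summarised by how much area each class covers. The class list and the tiny 2 x 2 probability map below are invented for the sketch; they are not the classes used in the talk.

```python
import numpy as np

CLASS_NAMES = ["food", "person", "text", "other"]   # hypothetical classes


def patch_labels(patch_probs):
    """Index of the most likely class for every patch."""
    return patch_probs.argmax(axis=-1)


def class_coverage(patch_probs, class_names):
    """Fraction of patches assigned to each class."""
    labels = patch_labels(patch_probs)
    return {name: float((labels == i).mean()) for i, name in enumerate(class_names)}


# Hand-made per-patch probabilities for a 2 x 2 grid of patches.
patch_probs = np.array([
    [[0.70, 0.10, 0.10, 0.10], [0.60, 0.20, 0.10, 0.10]],
    [[0.10, 0.80, 0.05, 0.05], [0.10, 0.10, 0.10, 0.70]],
])

coverage = class_coverage(patch_probs, CLASS_NAMES)
```

Here half of the patches are labelled food, a quarter person and a quarter other, so rules such as "food with a person next to it" become simple checks on the coverage dictionary.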

CLASSIFY THIS!

⏳

test images from https://snappygoat.com/

The Summary
The Last Slide

Today we Learned
• Sometimes AI has to deal with data that is not easy to classify.
• Food photos especially are a fun challenge!
• Patched Classifiers work great in this case.
• They are less “black-boxy” than normal classifiers,
• Quick and easy to build and train (just flip some layers around),
• And can be used for many things other than classification.

tinyurl.com/aisum-cookpad
