Slide 1

Slide 1 text

tinyurl.com/aisum-cookpad
How to Patch Image Classifiers
Leszek Rybicki
AI/SUM 2019.04.23

Slide 2

Slide 2 text

The Introduction
Hello, my name is …

Slide 3

Slide 3 text

Who am I?
レシェック・リビツキ (Leszek Rybicki)
Originally from Torun, Poland
2005 M.Sc. in Machine Learning, Nicolaus Copernicus University
2010 RIKEN Brain Science Institute
2016 Cookpad R&D

Slide 4

Slide 4 text

Make Everyday Cooking Fun
Established Oct 1st 1997
web, iOS, Android, print magazines, Cookpad TV
OiCy, mart, Komerco
54 million monthly users in Japan
over 5 million recipes worldwide
71 countries, 26 languages, 11 offshore offices

Slide 10

Slide 10 text

Cookpad R&D
Researchers (NLP, CV), Engineers, Annotators and Project Managers
Alexa skill in Japanese and Spanish
Food image filtering
Recipe similarity search
Ingredient autocomplete

Slide 12

Slide 12 text

The Outline
• The Concept
• The Background
• The Architecture
• The Benefits
• The Summary
Photo by Monika Grabkowska on Unsplash

Slide 13

Slide 13 text

The Concept
What an AI can see in images

Slide 14

Slide 14 text

Classification
This is an image of food. There is food in this image.
Photo by Cayla1 on Unsplash

Slide 15

Slide 15 text

Object Detection
There are two dishes. This is where they are.

Slide 16

Slide 16 text

Segmentation
Here are the exact pixels that belong to dishes.

Slide 17

Slide 17 text

Patched Classification
These patches of the image are, to varying degrees, occupied by food dishes.

Slide 18

Slide 18 text

The Background
We needed to filter out images of food…

Slide 19

Slide 19 text

FOOD

Slide 20

Slide 20 text

NOT FOOD

Slide 21

Slide 21 text

Binary Image Classifier
We used an Inception v3 DCNN model, pre-trained on ImageNet.
We replaced the top layers with our own:
• a fully connected layer
• a softmax layer that outputs a one-hot vector
Diagram: RGB x 240 x 240 pixels → DCNN → 2048 features x 8 x 8 → global pooling → 2048 features → fully connected → one-hot class vector (food / not)

Slide 22

Slide 22 text

Threshold
The model outputs fractions, but we need a yes or no answer.
We found a threshold ϑ = 0.81, which maximises the accuracy on our test dataset.
• If the model’s output value is above the threshold, it’s food.
• Otherwise, it’s not food.
Diagram: global pooling → 2048 features → fully connected → one-hot class vector (food / not) → threshold ϑ
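As a minimal sketch of how such a threshold could be found, the Python below scans candidate values and keeps the one with the highest accuracy. The scores and labels are made-up placeholders, not Cookpad data, and the 0.01 step size is an assumption; the slides do not say how ϑ = 0.81 was searched for.

```python
def accuracy_at(threshold, scores, labels):
    """Fraction of images classified correctly when score > threshold means food."""
    correct = sum((s > threshold) == bool(y) for s, y in zip(scores, labels))
    return correct / len(scores)


def best_threshold(scores, labels, steps=100):
    """Scan thresholds 0.00, 0.01, ..., 1.00 and keep the most accurate one."""
    candidates = [i / steps for i in range(steps + 1)]
    return max(candidates, key=lambda t: accuracy_at(t, scores, labels))


# Hypothetical model outputs (probability of "food") and true labels (1 = food).
scores = [0.95, 0.90, 0.85, 0.70, 0.60, 0.30, 0.10]
labels = [1, 1, 1, 0, 0, 0, 0]

theta = best_threshold(scores, labels)

# The decision rule from the slide: above the threshold is food, otherwise not.
is_food = [s > theta for s in scores]
```

When several thresholds tie on accuracy, this sketch keeps the lowest one; a real search might prefer the middle of the tied range.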

Slide 23

Slide 23 text


Slide 24

Slide 24 text

food / nonfood
accuracy: 96%
precision: 0.97
recall: 0.89
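For reference, these three figures can be computed from predictions as sketched below. The example predictions are hypothetical and only illustrate the definitions. With precision higher than recall, as on this slide, a classifier rarely accepts non-food but rejects some real food photos.

```python
def binary_metrics(predicted, actual):
    """Return (accuracy, precision, recall) for lists of 0/1 values, 1 = food."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # food accepted
    fp = sum(1 for p, a in pairs if p and not a)      # non-food accepted
    fn = sum(1 for p, a in pairs if not p and a)      # food rejected
    tn = sum(1 for p, a in pairs if not p and not a)  # non-food rejected
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical predictions for ten images.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

accuracy, precision, recall = binary_metrics(predicted, actual)
```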

Slide 25

Slide 25 text


Slide 26

Slide 26 text

4% 96%

Slide 27

Slide 27 text


Slide 28

Slide 28 text

EXPECTATION

Slide 29

Slide 29 text

REALITY

Slide 30

Slide 30 text

Problem 1
Food that looks like Things
Things that look like Food

Slide 31

Slide 31 text

twitter: @teenybiscuit

Slide 32

Slide 32 text

It happens quite a lot!
Character bento: Gru and Minions lunch box
Omelette rice ball: Pikachu character bento
Easy sliced-cheese roses, by mew⁂mam

Slide 33

Slide 33 text

Problem 2
Food and Things together

Slide 34

Slide 34 text

On the Fence
Food with other items
Food with text
Food, but small in the photo
People with food
Photo by Jason Leung on Unsplash

Slide 35

Slide 35 text

Food is Social
We cook and eat with our family and friends
Food is at the centre of important events
When we use a binary classifier for food photos, aren’t we losing something?
Photo by Jonathan Borba on Unsplash

Slide 36

Slide 36 text

The Real Problem
with our binary classifier is…

Slide 37

Slide 37 text

It is equally confused by these images.
Photo by Petful on Flickr
Photo by Charles on Unsplash

Slide 38

Slide 38 text

Threshold is not enough
Our model can’t distinguish between these two kinds of images using a threshold alone.
Either both are accepted, or both are rejected.
Diagram: food / not, threshold ϑ

Slide 39

Slide 39 text

Rule of Thumb
It’s an image of food if…
1. The image can be cropped to contain mostly food…
2. and the cropped image is 30% or more of the original image surface.
Photo by Kevin Wolf on Unsplash

Slide 40

Slide 40 text

Solution
Patch the Classifier!

Slide 41

Slide 41 text

food / nonfood
accuracy: 96%
precision: 0.97
recall: 0.89

Slide 43

Slide 43 text

food / nonfood
accuracy: 98%
precision: 0.98
recall: 0.96

Slide 44

Slide 44 text

The Architecture
How to build a Patched Classifier

Slide 45

Slide 45 text

Convolutional Layers
• Convert from colour space to feature space
• From red, green, blue to tomato, basil and blue cheese
• Their output scales with image size
Diagram: RGB x 240 x 240 pixels → DCNN → 2048 features x 8 x 8 → global pooling → 2048 features → fully connected → one-hot class vector (food / not)

Slide 46

Slide 46 text

Pooling
• Allows the network to focus on the most important features
• Reduces data size and computation time
• Global Average Pooling removes all spatial information
Diagram: RGB x 240 x 240 pixels → DCNN → 2048 features x 8 x 8 → global pooling → 2048 features → fully connected → one-hot class vector (food / not)
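To illustrate the last bullet, here is a small sketch of Global Average Pooling in plain Python. The 2 x 2 maps with two channels are toy values standing in for the 8 x 8 x 2048 feature map; the point is only that the same activations in different positions pool to the same vector.

```python
def global_average_pool(feature_map):
    """Average a height x width x channels feature map over all positions."""
    cells = [cell for row in feature_map for cell in row]
    channels = len(cells[0])
    return [sum(cell[c] for cell in cells) / len(cells) for c in range(channels)]


# Toy feature maps: identical activations, placed in different corners.
map_a = [[[4.0, 0.0], [0.0, 0.0]],
         [[0.0, 0.0], [0.0, 8.0]]]
map_b = [[[0.0, 8.0], [0.0, 0.0]],
         [[0.0, 0.0], [4.0, 0.0]]]

pooled_a = global_average_pool(map_a)
pooled_b = global_average_pool(map_b)
# Both pool to the same per-channel averages, so position is lost.
```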

Slide 47

Slide 47 text

Fully Connected Layers
• The traditional neural network layers
• Also called Dense Layers
• Don’t scale with image size
• Generate the classification
Diagram: RGB x 240 x 240 pixels → DCNN → 2048 features x 8 x 8 → global pooling → 2048 features → fully connected → one-hot class vector (food / not)

Slide 48

Slide 48 text

With a 1x1 conv. layer: RGB x 240 x 240 pixels → DCNN → 2048 features x 8 x 8 → global pooling → 1x1 conv. layer → one-hot class vector (food / not)
With a fully connected layer: RGB x 240 x 240 pixels → DCNN → 2048 features x 8 x 8 → global pooling → 2048 features → fully connected → one-hot class vector (food / not)

Slide 49

Slide 49 text

Convolutional Classifier
• Fully convolutional
• Scales with image size
• Outputs a one-hot vector
• Can be trained on a dataset designed for classification
Diagram: RGB x 240 x 240 pixels → DCNN → 2048 features x 8 x 8 → global pooling → 1x1 conv. layer → one-hot class vector (food / not)
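One way to see why the layers can be flipped around: a 1x1 convolution is a dense layer applied at every position, and a dense layer without activation is linear, so pooling first and classifying second gives the same scores as classifying every patch and averaging. The sketch below checks this on toy numbers in plain Python. The weights and features are arbitrary, and the identity holds for the raw scores before softmax, not after it.

```python
def linear(vector, weights, bias):
    """One dense layer without activation; weights is classes x features."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def conv1x1(feature_map, weights, bias):
    """A 1x1 convolution: the same dense layer applied at every position."""
    return [[linear(cell, weights, bias) for cell in row] for row in feature_map]


def global_average_pool(feature_map):
    """Average a height x width x channels map over all positions."""
    cells = [cell for row in feature_map for cell in row]
    return [sum(cell[c] for cell in cells) / len(cells)
            for c in range(len(cells[0]))]


# Toy 2 x 2 feature map with 3 features, and a 2-class layer (arbitrary values).
features = [[[1.0, 0.0, 2.0], [3.0, 1.0, 0.0]],
            [[0.0, 4.0, 1.0], [2.0, 2.0, 2.0]]]
weights = [[1.0, -1.0, 0.5], [-1.0, 1.0, 0.0]]
bias = [0.5, -0.5]

pool_then_classify = linear(global_average_pool(features), weights, bias)
patch_scores = conv1x1(features, weights, bias)  # one score vector per patch
classify_then_pool = global_average_pool(patch_scores)
```

The per-patch scores in `patch_scores` are what a patched classifier exposes as its heat-map.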

Slide 50

Slide 50 text

Patched Classifier
• Fully convolutional
• Outputs a “Class Activation Map” or “Heat-map”
• More interpretable than a traditional classifier
• Can implement our “30% area” rule, and more!
Diagram: RGB x 240 x 240 pixels → DCNN → 2048 features x 8 x 8 → 1x1 conv. layer
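A rough sketch of how the “30% area” rule might be applied to a heat-map follows. It simplifies the rule to counting patches whose food score exceeds a per-patch threshold; the 0.5 patch threshold, the function names, and the 4 x 4 example maps are assumptions for illustration, not values from the talk.

```python
def food_area_fraction(heatmap, patch_threshold=0.5):
    """Share of patches whose food score is above the per-patch threshold."""
    patches = [score for row in heatmap for score in row]
    return sum(1 for score in patches if score > patch_threshold) / len(patches)


def is_food_image(heatmap, patch_threshold=0.5, min_area=0.30):
    """Accept the image if food covers at least min_area of the patches."""
    return food_area_fraction(heatmap, patch_threshold) >= min_area


# Hypothetical 4 x 4 heat-maps (the real map would be 8 x 8).
plate = [[0.1, 0.2, 0.1, 0.0],
         [0.2, 0.9, 0.8, 0.1],
         [0.1, 0.9, 0.9, 0.2],
         [0.0, 0.7, 0.6, 0.1]]
snack = [[0.1, 0.1, 0.0, 0.0],
         [0.1, 0.9, 0.2, 0.0],
         [0.0, 0.8, 0.1, 0.0],
         [0.0, 0.0, 0.0, 0.0]]

plate_is_food = is_food_image(plate)  # 6 of 16 patches are food
snack_is_food = is_food_image(snack)  # 2 of 16 patches are food
```

Unlike a single thresholded score, this separates a photo where food fills a third of the frame from one where it is a small detail.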

Slide 51

Slide 51 text

Fine Print
Train Patch Classifiers on non-ambiguous images in which the subject takes up more than 80% of the space.
You may want to add more than one 1x1 Convolutional Layer, or make it 3x3.
Add 20% to 30% Dropout before Global Pooling, just for training, to achieve smooth mapping.
Use Patched Classifiers with caution, they are very powerful!

Slide 52

Slide 52 text

The Benefits
What you can do with a Patched Classifier

Slide 53

Slide 53 text

Hot Dog or… Not Hot Dog
We heard that detecting hot dogs is in demand thanks to an American TV show called “Silicon Valley”.

Slide 54

Slide 54 text

Hot Dog Detector
• MobileNet base
• Image dataset from Kaggle
• Trains in 2 hours on a laptop
• Works with webcam stream

Slide 55

Slide 55 text

https://tokyo-ml.github.io/hotdog-tf-js/

Slide 56

Slide 56 text

Smart Framing
Crop, Resize or Reshape the image, while keeping the food item in the centre of attention.
Dahl med kylling by Ingeborg Andersen
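One possible way to derive a crop from the heat-map is to take the bounding box of the patches that score as food and scale it to pixel coordinates, as sketched below. The function name, the 0.5 patch threshold, and the mapping of each heat-map cell to an equal tile of the image are assumptions; in a real network the patches overlap, so this is only an approximation.

```python
def food_bounding_box(heatmap, image_width, image_height, patch_threshold=0.5):
    """Return (left, top, right, bottom) in pixels around the food patches,
    or None if no patch is above the threshold."""
    rows, cols = len(heatmap), len(heatmap[0])
    hot = [(r, c) for r in range(rows) for c in range(cols)
           if heatmap[r][c] > patch_threshold]
    if not hot:
        return None
    top = min(r for r, _ in hot)
    bottom = max(r for r, _ in hot) + 1
    left = min(c for _, c in hot)
    right = max(c for _, c in hot) + 1
    # Treat each heat-map cell as an equal tile of the image (simplification).
    patch_w = image_width / cols
    patch_h = image_height / rows
    return (round(left * patch_w), round(top * patch_h),
            round(right * patch_w), round(bottom * patch_h))


# Hypothetical 4 x 4 heat-map for a 240 x 240 image: food on the right side.
heatmap = [[0.0, 0.1, 0.2, 0.1],
           [0.1, 0.2, 0.9, 0.8],
           [0.0, 0.1, 0.7, 0.9],
           [0.0, 0.0, 0.1, 0.2]]

crop_box = food_bounding_box(heatmap, 240, 240)
```

The resulting box can then be padded or adjusted to the target aspect ratio before cropping.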

Slide 57

Slide 57 text


Slide 58

Slide 58 text


Slide 59

Slide 59 text


Slide 60

Slide 60 text

Multi-Class
Classify more than one kind of thing at a time
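With more than two classes, each patch gets one score per class, and the patch can be labelled with the highest-scoring one. The sketch below shows that step in plain Python; the class names and the 2 x 2 score map are invented for illustration and are not the classes used in the talk.

```python
from collections import Counter

CLASS_NAMES = ["food", "person", "other"]  # hypothetical classes


def label_map(patch_scores, class_names=CLASS_NAMES):
    """Pick the highest-scoring class for every patch."""
    def best(cell):
        return class_names[max(range(len(cell)), key=lambda i: cell[i])]
    return [[best(cell) for cell in row] for row in patch_scores]


def class_shares(labels):
    """Fraction of patches assigned to each class."""
    flat = [name for row in labels for name in row]
    counts = Counter(flat)
    return {name: counts[name] / len(flat) for name in counts}


# Hypothetical per-patch scores: one value per class at each position.
patch_scores = [[[2.0, 0.1, 0.3], [1.5, 0.2, 0.1]],
                [[0.1, 3.0, 0.2], [0.0, 0.2, 1.0]]]

labels = label_map(patch_scores)
shares = class_shares(labels)
```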

Slide 61

Slide 61 text

CLASSIFY THIS!

Slide 62

Slide 62 text


Slide 63

Slide 63 text


Slide 64

Slide 64 text


Slide 65

Slide 65 text


Slide 66

Slide 66 text


Slide 67

Slide 67 text

⏳

Slide 68

Slide 68 text

test images from https://snappygoat.com/

Slide 69

Slide 69 text


Slide 70

Slide 70 text

The Summary
The Last Slide

Slide 71

Slide 71 text

Today we Learned
• Sometimes AI has to deal with data that is not easy to classify.
• Food photos especially are a fun challenge!
• Patched Classifiers work great in this case.
• They are less “black-boxy” than normal classifiers,
• Quick and easy to build and train (just flip some layers around),
• And can be used for many things other than classification.

Slide 72

Slide 72 text

tinyurl.com/aisum-cookpad