Creating Recipes from Videos

Misha Fain, Cookpad Ltd.
CSS Bristol guest lecture, 27 November 2018, Bristol University

Recipe Creation on Cookpad Platform

1. Start cooking your meal
2. Take pictures along the way
3. Eat your food and have a nap
4. Open Cookpad website or app and start creating a recipe
5. Enter all the ingredients and quantities
6. Write down all the steps
7. Attach the images to the corresponding steps

Recipe Creation on Cookpad Platform

What if you could do it like this instead?
1. Start recording video
2. Cook your meal
3. Eat it
4. Have your recipe generated for you

How to generate a step from a short video clip?

Encoder (from video to vector)
[Diagram: video frames over time -> CNN (ResNet) -> recurrent network (GRU) -> internal representation, a vector: 0.1, -0.34, 0.435, ..., -0.01, 1.324, 0.32]

Encoder: image-to-vector (Convolutional Neural Network)
[Figure adapted from https://towardsdatascience.com/build-your-own-convolution-neural-network-in-5-mins-4217c2cf964f: an image is mapped to a vector: 0.1, -0.34, 0.435, ..., -0.01, 1.324, 0.32]

Encoder (from sequence of image vectors to vector)
[Figure from http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/]

Encoder (from video to vector)
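The second half of the encoder (folding a sequence of per-frame vectors into a single clip vector) can be sketched roughly as below. This is only an illustration under stated assumptions, not the model from the talk: the frame vectors are random stand-ins for ResNet outputs, the weights are untrained, and the names `GRUEncoder`, `step` and `encode`, as well as the dimensions, are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GRUEncoder:
    """Minimal GRU that folds a sequence of frame vectors into one vector."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        gates = "zrh"  # update gate, reset gate, candidate state
        # Untrained random weights; a real model would learn these.
        self.W = {g: rng.normal(scale=0.1, size=(hidden_dim, input_dim)) for g in gates}
        self.U = {g: rng.normal(scale=0.1, size=(hidden_dim, hidden_dim)) for g in gates}
        self.b = {g: np.zeros(hidden_dim) for g in gates}
        self.hidden_dim = hidden_dim

    def step(self, x, h):
        z = sigmoid(self.W["z"] @ x + self.U["z"] @ h + self.b["z"])  # update gate
        r = sigmoid(self.W["r"] @ x + self.U["r"] @ h + self.b["r"])  # reset gate
        candidate = np.tanh(self.W["h"] @ x + self.U["h"] @ (r * h) + self.b["h"])
        # Blend old state and candidate (one of the two common GRU conventions).
        return (1.0 - z) * h + z * candidate

    def encode(self, frame_vectors):
        h = np.zeros(self.hidden_dim)
        for x in frame_vectors:  # one vector per video frame, in time order
            h = self.step(x, h)
        return h  # final hidden state = internal representation of the clip


# Stand-in for CNN output: 8 frames, each already mapped to a 16-dim vector.
frame_vectors = np.random.default_rng(1).normal(size=(8, 16))
encoder = GRUEncoder(input_dim=16, hidden_dim=4)
clip_vector = encoder.encode(frame_vectors)
print(clip_vector.shape)  # (4,)
```

In practice one would more likely use a deep learning framework's GRU layer and train it jointly with the decoder; the hand-written cell is here only to make the data flow visible.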

Decoder (from vector to text)
[Diagram: internal representation (0.1, -0.34, 0.435, ..., -0.01, 1.324, 0.32) -> recurrent network (GRU) unrolled over time -> words: cut the potatoes slices <stop> <stop>]
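The decoding loop can be sketched as follows: emit one word per time step and halt at the first `<stop>`. This is a minimal sketch assuming greedy decoding (the slides do not say which decoding strategy was used). The `<start>` token, the `greedy_decode` function and the `toy_step` stand-in are hypothetical; a trained GRU decoder would take the place of `toy_step`.

```python
def greedy_decode(step, initial_state, start_token="<start>",
                  stop_token="<stop>", max_len=20):
    """Call `step` repeatedly, collecting words until the stop token appears.

    `step(prev_token, state)` must return `(next_token, new_state)`.
    """
    words = []
    token, state = start_token, initial_state
    for _ in range(max_len):  # hard cap in case <stop> is never produced
        token, state = step(token, state)
        if token == stop_token:
            break
        words.append(token)
    return words


# Toy stand-in for a trained decoder: the "state" is just a position counter
# that replays a fixed word sequence.
SEQUENCE = ["cut", "the", "potatoes", "<stop>", "<stop>"]


def toy_step(prev_token, state):
    return SEQUENCE[state], state + 1


print(" ".join(greedy_decode(toy_step, initial_state=0)))  # cut the potatoes
```

With a real model, `initial_state` would be the encoder's internal representation, and `step` would run one GRU step and pick the most probable next word.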

Data
Public dataset YouCookII: annotated YouTube cooking videos
http://cmos.eecs.umich.edu/static/YouCookII/youcookii_readme.pdf

Training Procedure
Training = Optimization
w_opt = argmin_w L(w)
w: neural network parameters, millions of them
L: loss function, problem-dependent (maps the parameters to a single number)
Optimization method: gradient descent, aka steepest descent (intuition: a reckless runner with a short attention span gets lost in the fog in the mountains)
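To make "training = optimization" concrete, here is a minimal sketch of plain gradient descent on a toy two-parameter loss. The quadratic loss, the learning rate and the step count are assumptions chosen for illustration. A real network has millions of parameters, and its gradient is computed by backpropagation, usually on mini-batches of data.

```python
def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from w0."""
    w = list(w0)
    for _ in range(steps):
        g = grad(w)
        # Move each parameter a small step in the direction of steepest descent.
        w = [wi - learning_rate * gi for wi, gi in zip(w, g)]
    return w


# Toy loss L(w) = (w[0] - 3)^2 + (w[1] + 1)^2, minimized at w = (3, -1).
def loss(w):
    return (w[0] - 3.0) ** 2 + (w[1] + 1.0) ** 2


def grad(w):
    return [2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)]


w_opt = gradient_descent(grad, w0=[0.0, 0.0])
print([round(wi, 4) for wi in w_opt], round(loss(w_opt), 8))
```

On this convex toy loss the iterates converge to the minimum. On a neural network's loss surface the same procedure only follows the local slope, which is what the "runner in the fog" intuition is about.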

Results by Evander DaCosta

Results - what does not work by Evander DaCosta

Summary
• We have seen how a Sequence-to-Sequence model can be used to predict recipe text from videos
• Predicting the actions is easier than identifying the ingredients
• While it somewhat works, it is still not nearly as good as we'd like. Lots of research is going on elsewhere to get better results, e.g. https://arxiv.org/pdf/1804.00819.pdf
