Creating Recipes from Videos

Cookpad Bristol
November 27, 2018

Guest talk for the CSS Bristol students at Bristol University, 27 November 2018

Transcript

  1. Creating Recipes from Videos. Misha Fain, Cookpad Ltd. CSS Bristol guest lecture, 27 November 2018, Bristol University
  2. Recipe Creation on Cookpad Platform

  3. Recipe Creation on Cookpad Platform
    1. Start cooking your meal
    2. Take pictures along the way
    3. Eat your food and have a nap
    4. Open Cookpad website or app and start creating a recipe
    5. Enter all the ingredients and quantities
    6. Write down all the steps
    7. Attach the images to the corresponding steps
  4. Recipe Creation on Cookpad Platform

  5. What if you could do it like this instead?
    1. Start recording video
    2. Cook your meal
    3. Eat it
    4. Have your recipe generated for you
  6. How to generate a step from a short video clip?

  7. How to generate a step from a short video clip?

  8. Encoder (from video to vector)
    [Diagram: video frames over time -> CNN (ResNet) -> recurrent network (GRU) -> internal representation vector (0.1, -0.34, 0.435, ..., -0.01, 1.324, 0.32)]
  9. Encoder (from video to vector)
    [Diagram: video frames over time -> CNN (ResNet) -> recurrent network (GRU) -> internal representation vector (0.1, -0.34, 0.435, ..., -0.01, 1.324, 0.32)]
  10. Encoder: image-to-vector (Convolutional Neural Network)
    Adapted from https://towardsdatascience.com/build-your-own-convolution-neural-network-in-5-mins-4217c2cf964f
    Vector: 0.1, -0.34, 0.435, ..., -0.01, 1.324, 0.32
  11. Encoder (from video to vector)
    [Diagram: video frames over time -> CNN (ResNet) -> recurrent network (GRU) -> internal representation vector (0.1, -0.34, 0.435, ..., -0.01, 1.324, 0.32)]
  12. Encoder (from video to vector)
    [Diagram: video frames over time -> CNN (ResNet) -> recurrent network (GRU) -> internal representation vector (0.1, -0.34, 0.435, ..., -0.01, 1.324, 0.32)]
  13. Encoder (from sequence of image vectors to vector)
    From http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/

  14. Encoder (from video to vector)
    [Diagram: video frames over time -> CNN (ResNet) -> recurrent network (GRU) -> internal representation vector (0.1, -0.34, 0.435, ..., -0.01, 1.324, 0.32)]
  15. Encoder (from video to vector)

  16. How to generate a step from a short video clip?

  17. How to generate a step from a short video clip?

  18. Decoder (from vector to text)
    [Diagram: internal representation vector (0.1, -0.34, 0.435, ..., -0.01, 1.324, 0.32) -> recurrent network (GRU) unrolled over time -> output words: cut the potatoes slices <stop> <stop>]
  19. How to generate a step from a short video clip?

  20. Data
    Public dataset YouCookII: annotated YouTube cooking videos
    http://cmos.eecs.umich.edu/static/YouCookII/youcookii_readme.pdf

  21. Training Procedure
    Training = Optimization: w_opt = argmin(L(w))
    w: neural network parameters, millions of them
    L: loss function, problem-dependent (params -> number)
    Optimization method: gradient descent, aka steepest descent (intuition: a reckless runner with a short attention span gets lost in the fog in the mountains)
  22. Results by Evander DaCosta

  23. Results - what does not work by Evander DaCosta

  24. Summary
    • We have seen how a Sequence-to-Sequence model can be used to predict recipe text from videos
    • Predicting the actions is easier than identifying the ingredients
    • While it somewhat works, it is still not nearly as good as we'd like. Lots of research is going on elsewhere to get better results, e.g. https://arxiv.org/pdf/1804.00819.pdf
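
The training procedure on slide 21 (w_opt = argmin(L(w)), found by gradient descent) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy two-parameter loss, its hand-written gradient, the function names, the learning rate, and the step count are hypothetical and are not taken from the talk. A real network has millions of parameters, and its gradient comes from backpropagation, not from a closed-form expression.

```python
def gradient_descent(grad, w0, learning_rate=0.1, steps=200):
    """Repeatedly step against the gradient, starting from w0."""
    w = list(w0)
    for _ in range(steps):
        g = grad(w)
        # Move each parameter a small distance in the downhill direction.
        w = [wi - learning_rate * gi for wi, gi in zip(w, g)]
    return w


# Hypothetical toy loss: L(w) = (w[0] - 3)^2 + (w[1] + 1)^2, minimised at (3, -1).
def loss(w):
    return (w[0] - 3.0) ** 2 + (w[1] + 1.0) ** 2


# Gradient of the toy loss, written out by hand.
def loss_grad(w):
    return [2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)]


w_opt = gradient_descent(loss_grad, w0=[0.0, 0.0])
print(w_opt, loss(w_opt))
```

Each update only looks at the local slope, which is the "short attention span" in the slide's intuition. On this bowl-shaped toy loss that is enough to reach the minimum near (3, -1); on the non-convex loss of a neural network the same procedure may settle somewhere that is not the global optimum.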