Creating Recipes from Videos

Cookpad Bristol
November 27, 2018

Guest talk for the CSS Bristol students at Bristol University, 27 November 2018

Transcript

  1. Creating Recipes from Videos
    Misha Fain, Cookpad Ltd.
    CSS Bristol guest lecture, 27 November 2018, Bristol University
  2. Recipe Creation on Cookpad Platform
    1. Start cooking your meal
    2. Take pictures along the way
    3. Eat your food and have a nap
    4. Open Cookpad website or app and start creating a recipe
    5. Enter all the ingredients and quantities
    6. Write down all the steps
    7. Attach the images to the corresponding steps
  3. What if you could do it like this instead?
    1. Start recording video
    2. Cook your meal
    3. Eat it
    4. Have your recipe generated for you
  4. Encoder (from video to vector)
    [Diagram: video frames over time -> CNN (ResNet) -> recurrent network (GRU) -> internal representation, a vector such as (0.1, -0.34, 0.435, ..., -0.01, 1.324, 0.32)]
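To make the encoder slide concrete, here is a minimal sketch in plain Python of a GRU folding a sequence of per-frame feature vectors into one fixed-size vector. It is an illustration under assumptions, not the code behind the talk: the names `GRUCell` and `encode` are hypothetical, the weights are random rather than trained, bias terms are omitted, and the per-frame features that a CNN such as ResNet would produce are replaced by random numbers.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    # Plain matrix-vector product on nested lists.
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Hypothetical, untrained GRU cell without bias terms."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        # One input matrix W and one recurrent matrix U per gate:
        # z = update gate, r = reset gate, h = candidate state.
        self.W = {g: rand_matrix(hidden_size, input_size, rng) for g in "zrh"}
        self.U = {g: rand_matrix(hidden_size, hidden_size, rng) for g in "zrh"}

    def step(self, x, h):
        z = [sigmoid(a + b)
             for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b)
             for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b)
                for a, b in zip(matvec(self.W["h"], x), matvec(self.U["h"], reset_h))]
        # Blend old state and candidate; some references swap z and (1 - z).
        return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def encode(frame_features, cell):
    """Run the GRU over all frames and return the final hidden state."""
    h = [0.0] * cell.hidden_size
    for x in frame_features:
        h = cell.step(x, h)
    return h


# Stand-in for CNN output: 5 frames, 8 made-up features each.
rng = random.Random(1)
frames = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(5)]
vector = encode(frames, GRUCell(input_size=8, hidden_size=4))
print(len(vector))
```

The final hidden state plays the role of the "internal representation" on the slide: its length depends only on the hidden size, not on how many frames the video has.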
  9. Decoder (from vector to text)
    [Diagram: internal representation (0.1, -0.34, 0.435, ..., -0.01, 1.324, 0.32) -> recurrent network (GRU) unrolled over time -> words: cut the potatoes slices <stop> <stop>]
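The decoder slide can be read as a loop: start from the encoder's vector, ask the recurrent network for the next word, feed that word back in, and stop at `<stop>`. The sketch below shows only that greedy loop. The names `greedy_decode` and `toy_step` are hypothetical, and `toy_step` is a scripted stand-in for a trained GRU plus output layer, so its "predictions" are hard-coded for illustration.

```python
from typing import Callable, Dict, List, Tuple

State = List[float]
StepFn = Callable[[State, str], Tuple[State, Dict[str, float]]]


def greedy_decode(initial_state: State, step: StepFn,
                  start_token: str = "<start>", stop_token: str = "<stop>",
                  max_len: int = 20) -> List[str]:
    """Repeatedly pick the highest-scoring word until the stop token appears."""
    state, word = initial_state, start_token
    words: List[str] = []
    for _ in range(max_len):
        state, scores = step(state, word)
        word = max(scores, key=scores.get)
        if word == stop_token:
            break
        words.append(word)
    return words


VOCAB = ["cut", "the", "potatoes", "<stop>"]


def toy_step(state: State, prev_word: str) -> Tuple[State, Dict[str, float]]:
    # Assumption: the state is just a time-step counter, and the "model"
    # scores the t-th vocabulary word highest. A real decoder would compute
    # the new state and the scores from learned GRU weights.
    t = int(state[0])
    target = VOCAB[min(t, len(VOCAB) - 1)]
    scores = {w: (1.0 if w == target else 0.0) for w in VOCAB}
    return [state[0] + 1.0], scores


print(greedy_decode([0.0], toy_step))
```

The `max_len` cap matters in practice: an undertrained decoder may never emit the stop token, so the loop needs a bound.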
  10. Training Procedure
    Training = Optimization: w_opt = argmin_w L(w)
    • w: neural network parameters, millions of them
    • L: loss function, problem-dependent (params -> number)
    • Optimization method: gradient descent, aka steepest descent (intuition: a reckless runner with a short attention span gets lost in the fog in the mountains)
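As a small worked instance of w_opt = argmin L(w), the sketch below runs plain gradient descent on a two-parameter quadratic loss whose minimum is known in advance. The loss, its gradient, the learning rate, and the function names are all assumptions chosen for illustration; a real network has millions of parameters and a loss computed from training data.

```python
def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from w0."""
    w = list(w0)
    for _ in range(steps):
        g = grad(w)
        w = [wi - learning_rate * gi for wi, gi in zip(w, g)]
    return w


def loss(w):
    # Toy loss (params -> number) with its minimum at w = (3, -1).
    return (w[0] - 3.0) ** 2 + (w[1] + 1.0) ** 2


def loss_grad(w):
    # Analytic gradient of the toy loss above.
    return [2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)]


w_opt = gradient_descent(loss_grad, [0.0, 0.0])
print(w_opt, loss(w_opt))
```

Each step uses only the local slope at the current point, which is the "short attention span" in the slide's runner analogy: on this bowl-shaped loss that is enough, but on a rugged loss surface the same procedure can settle somewhere that is not the global minimum.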
  11. Summary
    • We have seen how a Sequence-to-Sequence model can be used to predict recipe text from videos
    • Predicting the actions is easier than identifying the ingredients
    • While it somewhat works, it is still not nearly as good as we'd like. Lots of research is going on elsewhere to get better results, e.g. https://arxiv.org/pdf/1804.00819.pdf