Creating Recipes from Videos

Cookpad Bristol
November 27, 2018

Guest talk for the CSS Bristol students at Bristol University, 27 November 2018

Transcript

  1. Creating Recipes from Videos
    Misha Fain, Cookpad Ltd.
    CSS Bristol guest lecture, 27 November 2018, Bristol University

  2. Recipe Creation on Cookpad Platform

  3. Recipe Creation on Cookpad Platform
    1. Start cooking your meal
    2. Take pictures along the way
    3. Eat your food and have a nap
    4. Open Cookpad website or app and start creating a recipe
    5. Enter all the ingredients and quantities
    6. Write down all the steps
    7. Attach the images to the corresponding steps

  4. Recipe Creation on Cookpad Platform

  5. What if you could do it like this instead?
    1. Start recording video
    2. Cook your meal
    3. Eat it
    4. Have your recipe generated for you

  6. How to generate a step from a short video clip?

  7. How to generate a step from a short video clip?

  8. Encoder (from video to vector)
    Diagram: video frames -> CNN (ResNet) -> recurrent network (GRU), unrolled over time -> internal representation (vector: 0.1 -0.34 0.435 ... -0.01 1.324 0.32)

  9. Encoder (from video to vector)

  10. Encoder: image-to-vector (Convolutional Neural Network)
    Adapted from https://towardsdatascience.com/build-your-own-convolution-neural-network-in-5-mins-4217c2cf964f
    Vector: 0.1 -0.34 0.435 ... -0.01 1.324 0.32
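
As a rough illustration of the image-to-vector idea on this slide, the sketch below runs a few convolution kernels over a tiny grayscale image, applies ReLU, and averages each feature map into a single number. It is a toy stand-in, not ResNet: the 8x8 random image, the four random 3x3 kernels, and all function names are assumptions made for illustration, and nothing here is trained.

```python
import random


def conv2d(image, kernel):
    """Valid 2-D cross-correlation of a grayscale image with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Replace negative activations with zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def global_average_pool(feature_map):
    """Collapse a whole feature map into one number."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)


def image_to_vector(image, kernels):
    """One output number per kernel: conv -> ReLU -> global average pooling."""
    return [global_average_pool(relu(conv2d(image, k))) for k in kernels]


rng = random.Random(0)
# Hypothetical 8x8 grayscale "frame" and four untrained 3x3 kernels.
frame = [[rng.random() for _ in range(8)] for _ in range(8)]
kernels = [
    [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
    for _ in range(4)
]
frame_vector = image_to_vector(frame, kernels)
print(frame_vector)  # four numbers, one per kernel
```

A real network stacks many such layers and learns the kernel values from data; the point here is only that an image of any size ends up as a fixed-length list of numbers.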

  11. Encoder (from video to vector)

  12. Encoder (from video to vector)

  13. Encoder (from sequence of image vectors to vector)
    From http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
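
To make the "sequence of image vectors to vector" step more concrete, here is a minimal sketch of a GRU cell written with the standard library only. The sizes, the random untrained weights, and the names `GRUCell` and `encode` are assumptions for illustration; biases are omitted, and the update convention used (new state = (1 - z) * old state + z * candidate) is one of two common ones.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Minimal GRU cell (no biases); weights are random, not trained."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        # One input matrix (W) and one recurrent matrix (U) per gate:
        # z = update gate, r = reset gate, h = candidate state.
        self.W = {g: random_matrix(hidden_size, input_size, rng) for g in "zrh"}
        self.U = {g: random_matrix(hidden_size, hidden_size, rng) for g in "zrh"}

    def step(self, x, h):
        z = [sigmoid(a + b)
             for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b)
             for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        gated_h = [ri * hi for ri, hi in zip(r, h)]
        candidate = [
            math.tanh(a + b)
            for a, b in zip(matvec(self.W["h"], x), matvec(self.U["h"], gated_h))
        ]
        # Blend the old state with the candidate, gate by gate.
        return [(1.0 - zi) * hi + zi * ci
                for zi, hi, ci in zip(z, h, candidate)]


def encode(frame_vectors, cell):
    """Feed the frame vectors in order; the final state summarises the clip."""
    h = [0.0] * cell.hidden_size
    for x in frame_vectors:
        h = cell.step(x, h)
    return h


rng = random.Random(1)
# Hypothetical clip: 5 frames, each already turned into a 6-number vector.
frame_vectors = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(5)]
cell = GRUCell(input_size=6, hidden_size=4)
clip_vector = encode(frame_vectors, cell)
print(clip_vector)
```

However many frames the clip has, the output keeps the same length (here 4), which is what lets a decoder consume it.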

  14. Encoder (from video to vector)

  15. Encoder (from video to vector)

  16. How to generate a step from a short video clip?

  17. How to generate a step from a short video clip?

  18. Decoder (from vector to text)
    Diagram: internal representation (vector: 0.1 -0.34 0.435 ... -0.01 1.324 0.32) -> recurrent network (GRU), unrolled over time -> words: cut the potatoes slices
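
One common way to turn the vector into words is greedy decoding: start from the encoder's vector, emit the highest-scoring word, feed it back in, and repeat until an end marker appears. The sketch below shows only that loop. A plain tanh recurrence stands in for the GRU to keep it short, and the tiny vocabulary, the `ToyDecoder` class, and the six-number input vector are assumptions for illustration. The weights are random, so the words it prints are meaningless.

```python
import math
import random

# Hypothetical vocabulary; a real one would have thousands of words.
VOCAB = ["<start>", "<end>", "cut", "the", "potatoes", "slices"]


def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


class ToyDecoder:
    """Untrained recurrent decoder; a tanh cell stands in for the GRU."""

    def __init__(self, hidden_size, vocab_size, seed=0):
        rng = random.Random(seed)

        def rand(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                    for _ in range(rows)]

        self.embed = rand(vocab_size, hidden_size)  # one row per word
        self.W = rand(hidden_size, hidden_size)     # input -> hidden
        self.U = rand(hidden_size, hidden_size)     # hidden -> hidden
        self.out = rand(vocab_size, hidden_size)    # hidden -> word scores

    def step(self, word_id, h):
        x = self.embed[word_id]
        h = [math.tanh(a + b)
             for a, b in zip(matvec(self.W, x), matvec(self.U, h))]
        return h, matvec(self.out, h)


def greedy_decode(decoder, clip_vector, max_len=10):
    """Pick the top-scoring word at each step until <end> or max_len."""
    h = list(clip_vector)  # the encoder output is the initial state
    word_id = VOCAB.index("<start>")
    words = []
    for _ in range(max_len):
        h, scores = decoder.step(word_id, h)
        word_id = max(range(len(scores)), key=scores.__getitem__)
        if VOCAB[word_id] == "<end>":
            break
        words.append(VOCAB[word_id])
    return words


# Illustrative six-number vector; the real representation is much longer.
clip_vector = [0.1, -0.34, 0.435, -0.01, 1.324, 0.32]
decoder = ToyDecoder(hidden_size=6, vocab_size=len(VOCAB))
words = greedy_decode(decoder, clip_vector)
print(" ".join(words))
```

After training, the same loop would be expected to produce a sentence close to the annotated step; alternatives such as beam search keep several candidate sentences instead of one.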

  19. How to generate a step from a short video clip?

  20. Data
    Public dataset YouCookII: annotated YouTube cooking videos
    http://cmos.eecs.umich.edu/static/YouCookII/youcookii_readme.pdf

  21. Training Procedure
    Training = Optimization
    w_opt = argmin_w L(w)
    w: neural network parameters (millions of them)
    L: loss function, problem-dependent (maps parameters to a number)
    Optimization method: gradient descent, a.k.a. steepest descent
    (intuition: a reckless runner with a short attention span gets lost in the fog in the mountains)
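
The "training = optimization" idea can be tried on a toy problem. The sketch below minimizes a simple quadratic loss with plain gradient descent. The loss, its target values, the learning rate, and the step count are all assumptions chosen so that the answer is easy to check; a real network has millions of parameters and a loss computed from data.

```python
def gradient_descent(grad, w, learning_rate=0.1, steps=200):
    """Repeatedly step against the gradient: w <- w - learning_rate * grad(w)."""
    for _ in range(steps):
        g = grad(w)
        w = [wi - learning_rate * gi for wi, gi in zip(w, g)]
    return w


# Hypothetical loss with a known minimum at TARGET.
TARGET = [3.0, -2.0]


def loss(w):
    return sum((wi - ti) ** 2 for wi, ti in zip(w, TARGET))


def loss_grad(w):
    # Derivative of (wi - ti)^2 with respect to wi.
    return [2.0 * (wi - ti) for wi, ti in zip(w, TARGET)]


w_opt = gradient_descent(loss_grad, [0.0, 0.0])
print(w_opt, loss(w_opt))
```

On this bowl-shaped loss every step shrinks the distance to the minimum by a constant factor, so the runner cannot get lost. Neural network losses are not bowl-shaped, which is where the fog in the slide's analogy comes in.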

  22. Results
    by Evander DaCosta

  23. Results - what does not work
    by Evander DaCosta

  24. Summary
    ● We have seen how a Sequence-to-Sequence model can be used to predict recipe text from videos
    ● Predicting the actions is easier than identifying the ingredients
    ● While it somewhat works, it is still not nearly as good as we'd like. Lots of research is going on elsewhere to get better results, e.g. https://arxiv.org/pdf/1804.00819.pdf
