
[Paper Reading] Inverse Cooking: Recipe Generation from Food Images

Huy Van
August 18, 2019

Presentation of the paper "Inverse Cooking: Recipe Generation from Food Images" at the Paper Reading study group.


  1. Inverse Cooking: Recipe Generation from Food Images
    Amaia Salvador, Michal Drozdzal, Xavier Giro-i-Nieto, Adriana Romero
    Universitat Politecnica de Catalunya, Facebook AI Research @ CVPR 2019
    Presented by Huy Van, Paper Reading 2019
  2. 1. Introduction
    Problem
    • There are plenty of food photos, but little detailed information about the food in them
    Solution
    • Inverse Cooking: infer the ingredients and cooking instructions from a food photo
  3. Contributions
    • Presents a system that generates cooking instructions conditioned on an image and its ingredients
    • Studies ingredients both as a list and as a set, and proposes a new architecture for ingredient prediction that exploits co-dependencies among ingredients without imposing an order
    • Outperforms image-to-recipe retrieval approaches in ingredient prediction
  4. 2. Related Work
    Food Understanding
    • Large-scale datasets: Food-101 and Recipe1M
    • Visual food recognition
    • Estimating the number of calories given a food image
    • Predicting the list of ingredients present
    • Finding the recipe for a given image
    Multi-label Classification
    Conditional Text Generation
  5. 3. Generating Recipes from Images

  6. Cooking Instruction Transformer

  7. Ingredient Decoder
    • Ingredients as a list (ordered)
    • Ingredients as a set (unordered)
  8. Ingredients as a List
    • Represent each ingredient as a one-hot vector
    • Use a transformer model
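The list formulation can be illustrated with a minimal Python sketch, assuming a toy ingredient vocabulary and an `<eos>` end token (both hypothetical, not taken from the paper). Each ingredient becomes one one-hot vector per decoding step; the transformer that would consume these vectors is omitted.

```python
# Minimal sketch (not the authors' code): encode an ordered ingredient list
# as a sequence of one-hot vectors over a fixed vocabulary.
# The vocabulary and the "<eos>" end token are hypothetical examples.

def build_index(vocab):
    """Map each ingredient name to an integer id."""
    return {name: i for i, name in enumerate(vocab)}

def one_hot(idx, size):
    """Return a vector of length `size` with a single 1 at position `idx`."""
    vec = [0] * size
    vec[idx] = 1
    return vec

def encode_list(ingredients, index):
    """One one-hot vector per time step, terminated by the <eos> token."""
    size = len(index)
    steps = list(ingredients) + ["<eos>"]
    return [one_hot(index[name], size) for name in steps]

vocab = ["<eos>", "flour", "sugar", "egg", "butter"]
index = build_index(vocab)
seq = encode_list(["flour", "egg", "sugar"], index)
# The same ingredients in another order give a different target sequence.
seq_reordered = encode_list(["sugar", "flour", "egg"], index)
```

Because `seq` and `seq_reordered` differ, a list decoder trained step by step is penalized for predicting the right ingredients in the "wrong" order, which is the motivation for the set formulations on the next slide.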
  9. Ingredients as a Set
    • Method 1: set transformer
      • Use the transformer as above
      • To remove the order, aggregate the outputs across different time steps with a max pooling operation
    • Method 2:
      • Represent the ingredient output as a binary set, then convert it to a target distribution
      • Use a feed-forward network with a cross-entropy loss
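Both set formulations can be sketched in plain Python. This is an illustrative sketch, not the authors' code: the per-step probabilities and logits are made-up numbers, and dividing the binary vector by the number of ingredients is an assumed way of obtaining the target distribution. `max_pool_over_time` shows Method 1's aggregation; the other helpers show Method 2's binary set, target distribution, and softmax cross-entropy.

```python
import math

def max_pool_over_time(step_probs):
    """Method 1: element-wise max over decoder time steps, one score per ingredient.

    max is order-independent, so permuting the time steps gives the same result.
    """
    return [max(column) for column in zip(*step_probs)]

def multi_hot(ingredients, index):
    """Method 2: represent the ingredient set as a binary vector over the vocabulary."""
    vec = [0.0] * len(index)
    for name in ingredients:
        vec[index[name]] = 1.0
    return vec

def target_distribution(binary):
    """Normalize the binary vector so it sums to 1 (assumed normalization)."""
    total = sum(binary)
    return [b / total for b in binary]

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(target, probs, eps=1e-12):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, probs))

# Toy vocabulary and hypothetical per-step decoder outputs.
index = {"flour": 0, "sugar": 1, "egg": 2, "butter": 3}
steps = [[0.7, 0.1, 0.1, 0.1],
         [0.1, 0.2, 0.6, 0.1]]
pooled = max_pool_over_time(steps)
pooled_reversed = max_pool_over_time(steps[::-1])  # same steps, reversed order

target = target_distribution(multi_hot(["flour", "egg"], index))
loss_good = cross_entropy(target, softmax([2.0, -1.0, 2.0, -1.0]))  # favors flour, egg
loss_bad = cross_entropy(target, softmax([-1.0, 2.0, -1.0, 2.0]))   # favors sugar, butter
```

Method 2 scores every ingredient in a single forward pass, so there is no order to remove; Method 1 keeps the autoregressive transformer but discards its ordering by pooling.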
  10. Optimization in 2 stages:
    - Pre-train the image encoder and ingredient decoder
    - Train the ingredient encoder and instruction decoder
  11. 4. Experiments
    Dataset
    • Recipe1M: ~1M recipes
    • Preprocessing:
      • Rule-based processing reduces the number of ingredients from 16,823 to 1,488
      • Tokenize the raw text and remove infrequent words → vocabulary of 23,231 words
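The frequency-based vocabulary step can be sketched as follows. The regex tokenizer, the `min_count` threshold, and the `<pad>`/`<unk>` special tokens are hypothetical choices for illustration; the slide does not state the tokenizer or cutoff used, and this toy corpus does not reproduce the 23,231-word vocabulary.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase, then split into word and punctuation tokens (simple stand-in tokenizer)."""
    return re.findall(r"[a-z0-9]+|[^\sa-z0-9]", text.lower())

def build_vocab(texts, min_count):
    """Keep tokens seen at least `min_count` times; sort them for a deterministic order."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    kept = sorted(tok for tok, c in counts.items() if c >= min_count)
    return ["<pad>", "<unk>"] + kept

def encode(text, vocab):
    """Map tokens to ids; tokens dropped from the vocabulary become <unk>."""
    index = {tok: i for i, tok in enumerate(vocab)}
    return [index.get(tok, index["<unk>"]) for tok in tokenize(text)]

instructions = [
    "Preheat the oven.",
    "Mix the flour and sugar.",
    "Bake the cake in the oven.",
]
vocab = build_vocab(instructions, min_count=2)  # hypothetical cutoff
ids = encode("Preheat the oven.", vocab)
```

Here only "the", "oven", and "." occur at least twice, so rare words such as "preheat" map to `<unk>`; raising the cutoff shrinks the vocabulary at the cost of more unknown tokens.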
  12. Results: Recipe Generation

  13. Results: Ingredient Prediction

  14. Results: Generation vs Retrieval

  15. Results: User Studies

  16. 5. Conclusion
    • Introduced an image-to-recipe generation system that takes a food image and produces a recipe consisting of a title, ingredients, and a sequence of cooking instructions
    • First predicted sets of ingredients from food images, showing that modeling dependencies among ingredients matters
    • Then explored instruction generation conditioned on images and inferred ingredients, highlighting the importance of reasoning about both modalities at the same time
    • Finally, user study results confirm the difficulty of the task and demonstrate the system's superiority over state-of-the-art image-to-recipe retrieval approaches