Finding beans in burgers: paper reading notes

Notes from my reading of the CVPR 2018 paper: "Finding beans in burgers:
Deep semantic-visual embedding with localization"

Leszek Rybicki

July 07, 2018

Transcript

  1. Finding beans in burgers
    Deep semantic-visual embedding with localization
    @lunardog
    Kanto Computer Vision Study Group, 2018.07.07

  2. Self-introduction
    ● Leszek
    ● Polish
    ● 2005~ machine learning researcher
    ● 2010~ came to Japan
    ● 2016~ joined Cookpad
    ● github: @lunardog
    twitter: @_lunardog_

  3. CVPR 2018
    SIGIR 2018
    MsCOCO
    Recipe1M

  4. Learning Cross-modal
    Embeddings for Cooking
    Recipes and Food Images
    ● CVPR 2017
    ● joint embedding of images and recipes

  5. MsCOCO -> MsCOCO
    MsCOCO -> Flickr30K

  6. triplet loss
    WELDON pooling

  7. Triplet Loss

  8. FaceNet: A Unified Embedding
    for Face Recognition and Clustering
    Florian Schroff, Dmitry Kalenichenko, James Philbin

  9. FaceNet: A Unified Embedding for Face Recognition and Clustering
    Florian Schroff, Dmitry Kalenichenko, James Philbin

  10. [Figure: triplet diagram; labels: y, z, z′, "1-" (twice), α]
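
The symbols on this slide (y, z, z′, α) fit the usual margin-based triplet loss, so a small worked example may help. This is only a sketch under assumptions the slide text does not confirm: that the "1-" labels stand for cosine distance (1 minus cosine similarity), that y is the anchor, z the matching item and z′ the non-matching one, and that α = 0.2. The function names are hypothetical.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cos(a, b) for two non-zero vectors given as lists."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)


def triplet_loss(y, z, z_neg, alpha=0.2):
    """Hinge loss on distances (alpha=0.2 is an assumed value).

    The loss is zero once the non-matching item z_neg is at least
    alpha farther from the anchor y than the matching item z.
    """
    return max(0.0, alpha + cosine_distance(y, z) - cosine_distance(y, z_neg))


y = [1.0, 0.0]
z = [1.0, 0.0]        # matching item, same direction as the anchor
z_far = [0.0, 1.0]    # non-matching item, orthogonal to the anchor
z_near = [1.0, 0.1]   # non-matching item, almost aligned with the anchor

print("far negative: ", triplet_loss(y, z, z_far))
print("near negative:", triplet_loss(y, z, z_near))
```

The far negative already satisfies the margin and contributes nothing, while the near negative violates it and yields a positive loss. That is the situation the "≥α" annotations on the following slides appear to describe.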

  11. [Figure: six margin annotations, each labeled ≥α]

  12. [Figure: six margin annotations, each labeled ≥α]

  13. triplet loss
    WELDON pooling

  14. [Figure: six margin annotations, each labeled ≥α]
    Instance Loss

  15. [Figure: six margin annotations, each labeled ≥α]
    Semantic Loss

  16. WELDON Pooling

  17. Global Average
    Pooling
    Linear
    Typical Image Classifier

  18. [Figure: one 7x7 activation map, shown three times on the slide]
    0.0  0.0  0.0  0.0  0.0  0.0  0.0
    0.0  0.0  0.0  0.0  0.0  0.0  0.0
    0.05 0.3  0.1  0.0  0.0  0.0  0.0
    0.5  1.0  1.0  0.3  0.0  0.0  0.0
    0.5  1.0  1.0  1.0  0.01 0.0  0.0
    0.2  1.0  1.0  1.0  0.0  0.0  0.0
    0.0  0.05 0.4  0.2  0.0  0.0  0.0
    Global MAX Pooling
    Global Average Pooling
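
To make the contrast on this slide concrete, here is a minimal sketch that applies global max pooling and global average pooling to the 7x7 map shown above. The function names are hypothetical, and the "00" entry in the transcript is assumed to be 0.0.

```python
# The 7x7 activation map from the slide (the "00" entry is read as 0.0).
ACTIVATION_MAP = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.05, 0.3, 0.1, 0.0, 0.0, 0.0, 0.0],
    [0.5, 1.0, 1.0, 0.3, 0.0, 0.0, 0.0],
    [0.5, 1.0, 1.0, 1.0, 0.01, 0.0, 0.0],
    [0.2, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.05, 0.4, 0.2, 0.0, 0.0, 0.0],
]


def global_max_pool(feature_map):
    """Reduce a 2D map to its single largest activation."""
    return max(value for row in feature_map for value in row)


def global_average_pool(feature_map):
    """Reduce a 2D map to the mean of all its activations."""
    values = [value for row in feature_map for value in row]
    return sum(values) / len(values)


print("global max pooling:    ", global_max_pool(ACTIVATION_MAP))
print("global average pooling:", round(global_average_pool(ACTIVATION_MAP), 4))
```

On this map the max is 1.0, driven by a single location, while the average is roughly 0.22 because the many zero cells dilute the active region.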

  19. [Figure: one 7x7 activation map, shown twice on the slide]
    0.0  0.0  0.0  0.0  0.0  0.0  0.0
    0.0  0.0  0.0  0.0  0.0  0.0  0.0
    0.05 0.3  0.1  0.0  0.0  0.0  0.0
    0.5  1.0  1.0  0.3  0.0  0.0  0.0
    0.5  1.0  1.0  1.0  0.01 0.0  0.0
    0.2  1.0  1.0  1.0  0.0  0.0  0.0
    0.0  0.05 0.4  0.2  0.0  0.0  0.0
    min + max Pooling
    bottom m
    top k
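
The "min + max" idea with "top k" and "bottom m" can be sketched as below. This is an illustration under assumptions, not the paper's exact formulation: the pooled score is taken here as the mean of the k highest activations plus the mean of the m lowest ones, and the function name and default values of k and m are hypothetical.

```python
# The 7x7 activation map from the slide (the "00" entry is read as 0.0).
ACTIVATION_MAP = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.05, 0.3, 0.1, 0.0, 0.0, 0.0, 0.0],
    [0.5, 1.0, 1.0, 0.3, 0.0, 0.0, 0.0],
    [0.5, 1.0, 1.0, 1.0, 0.01, 0.0, 0.0],
    [0.2, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.05, 0.4, 0.2, 0.0, 0.0, 0.0],
]


def min_max_pool(feature_map, k=3, m=3):
    """Mean of the top-k activations plus mean of the bottom-m activations.

    Averaging (rather than summing) each group is an assumption of this sketch.
    """
    values = sorted(value for row in feature_map for value in row)
    if not (1 <= k <= len(values) and 1 <= m <= len(values)):
        raise ValueError("k and m must be between 1 and the number of cells")
    top_k = values[-k:]     # the k largest activations
    bottom_m = values[:m]   # the m smallest activations
    return sum(top_k) / k + sum(bottom_m) / m


# On the slide's map the lowest cells are all 0.0, so only the top-k term counts.
print(min_max_pool(ACTIVATION_MAP, k=3, m=3))

# With negative evidence present, the bottom-m term pulls the score down.
SIGNED_MAP = [[2.0, -1.0], [0.5, -3.0]]
print(min_max_pool(SIGNED_MAP, k=1, m=1))
```

With k = m = 1 this reduces to max plus min. Larger k and m make the score depend on several regions instead of a single location.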

  20. https://tokyo-ml.github.io/hotdog-tf-js/
    http://techlife.cookpad.com/entry/2018/04/06/124455
